Dropout Prediction by Machine Learning and Natural Language Processing to Improve Student Engagement in Online Education Services
Abstract: In the education industry, the demand for online learning is increasing significantly. As a result, using data analysis to keep students engaged is becoming more crucial. The purpose of this paper is to analyze student satisfaction survey results and investigate which factors affect students' course satisfaction, in order to prevent future dropout in online schools. The surveys were analyzed with a machine learning model that predicts which students are likely to churn, and feature importance was used to keep the predictions interpretable. In addition, sentiment analysis was performed on the free-text comments using natural language processing, which revealed keywords that frequently appear in positive and negative comments. The experimental results show that the accuracy of the machine learning model reached 90% and that its AUC is above 90%. The model can be used to predict whether a student may drop out of the school, and it provides valuable information for understanding which factors affect student satisfaction.
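To make the two reported metrics concrete, the following is a minimal sketch of how accuracy and AUC can be computed from a model's dropout scores. It is not the authors' implementation: the function names `accuracy` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions introduced for illustration.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of students whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in y_score]
    correct = sum(p == t for p, t in zip(predictions, y_true))
    return correct / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random dropout is scored above a random non-dropout.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data (not from the paper): 1 = dropped out, 0 = stayed.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3]

print("accuracy:", accuracy(y_true, y_score))
print("AUC:", roc_auc(y_true, y_score))
```

Accuracy depends on the chosen threshold, whereas AUC only depends on how the scores rank dropouts relative to non-dropouts, which is why the two are usually reported together for churn-style problems.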