Weng, Wenting (2022-03). Applying Statistical Methods in Clustered Educational Data. Doctoral Dissertation.

abstract

  • As technologies are increasingly used in education, data are generated within those technologies or collected outside them, and how educators can make use of these data has become a challenge. The first study therefore conducted a systematic literature review of previous learning analytics and educational data mining studies, noting their impact and uncovering their sample and methodological characteristics. The findings covered the studies' research objectives, learning environments, education levels, data preprocessing tasks, data analysis methods, data tools, sample sizes, and feature information. Additionally, big data in education can support the application of learning theories in practice, and these theories can underpin the design and improvement of technologies. The second study applied and compared mixed effects Random Forest (MERF), the random effects expectation-maximization recursive partitioning method (RE-EM Tree), hierarchical linear modeling (HLM), and regular Random Forest (RF). The comparison showed that MERF generated the most accurate models, while RE-EM Tree and HLM achieved similar accuracy. The advantages and disadvantages of each method were explained. The results indicated that MERF was more appropriate than RF for clustered data and that the choice of method depends on the purpose of the research or project. When the purpose is to predict students' learning performance, MERF can be the optimal choice. When the purpose is to detect the relationship between predictor and response variables and examine each variable's impact, RE-EM Tree and HLM better serve the purpose. The choice between RE-EM Tree and HLM can depend on the dimensionality of the data.
Considering the data dimension, HLM was applied in the third study to examine the relationship between students' information and communications technology (ICT)-related factors and their learning performance in mathematics and science, as moderated by school-level factors. The results showed the importance of ICT-related factors and indicated that schools whose students had higher socio-economic status yielded better learning outcomes in mathematics and science and better supported ICT use. The shortage of school resources had an interaction effect with students' ICT use at school. School size was also important for students' mathematics achievement.

publication date

  • March 2022