Smith, Cameron Maurice (2018-08). A Toolset for Mining GitHub Repositories in Educational Software Projects. Master's Thesis.

abstract

  • In this research, we investigate how the mining of student software repository data can be useful in capturing development analytics in educational software projects. Our methodology was to demonstrate the feasibility of extracting and analyzing software repository data automatically and show examples of how to analyze the obtained data. We designed an application toolset that works with GitHub, a web-based version control platform that Texas A&M University makes freely available to the students. Our toolset can be used with GitHub software repositories hosting programming assignments developed by students as part of their coursework. We consider how the analysis of information available in a software repository revision history can enable inspection of student programming assignment progression behaviors. For example, for a given programming assignment, using analytics derived from a set of corresponding student software repository changelogs, one can generate assignment progression statistics. As a result of the exploratory phase of this research, we demonstrate usage of our toolset with anonymized student GitHub repository data from two previous courses taught at Texas A&M University. We conclude that it is feasible to automate the extraction of student GitHub repository data that may lead to valuable observations about student software project development patterns. We make our tools available to the community so that other relevant questions regarding the relationship between software development analytics and student learning can be explored.
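As a rough illustration of what "assignment progression statistics" could look like, the sketch below summarizes one student's commit timestamps relative to an assignment deadline. It is not taken from the thesis toolset: the function name `progression_stats`, the chosen metrics, and the assumption that commit times have already been extracted from a repository changelog are all hypothetical.

```python
from datetime import datetime, timedelta


def progression_stats(commit_times, deadline):
    """Summarize how one repository's commits are spread before a deadline.

    commit_times: list of datetime objects, one per commit (hypothetical input,
                  assumed to be already extracted from the revision history).
    deadline:     datetime of the assignment due date.
    """
    times = sorted(commit_times)
    if not times:
        # No commits: counts are zero and the ratios are undefined.
        return {
            "commits": 0,
            "days_started_before_deadline": None,
            "share_in_last_day": None,
        }

    # Count commits made in the 24 hours leading up to the deadline.
    last_day_start = deadline - timedelta(days=1)
    in_last_day = sum(1 for t in times if last_day_start <= t <= deadline)

    return {
        "commits": len(times),
        # How long before the deadline the first commit was made, in days.
        "days_started_before_deadline": (deadline - times[0]).total_seconds() / 86400,
        # Fraction of all commits that landed in the final day.
        "share_in_last_day": in_last_day / len(times),
    }


if __name__ == "__main__":
    due = datetime(2018, 3, 10, 23, 59)
    commits = [
        datetime(2018, 3, 1, 12, 0),
        datetime(2018, 3, 9, 12, 0),
        datetime(2018, 3, 10, 20, 0),
        datetime(2018, 3, 10, 22, 0),
    ]
    print(progression_stats(commits, due))
```

Applying a function like this to every repository for an assignment would give a per-student table that can be aggregated across a course; which metrics are actually informative is a question the thesis leaves open to exploration.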

publication date

  • August 2018