UPM Institutional Repository

Leveraging data lake architecture for predicting academic student performance


Citation

Abdul Rahim, Shameen Aina and Sidi, Fatimah and Affendey, Lilly Suriani and Ishak, Iskandar and Nurlankyzy, Appak Yessirkep (2024) Leveraging data lake architecture for predicting academic student performance. International Journal on Advanced Science, Engineering and Information Technology, 14 (6). pp. 2121-2129. ISSN 2088-5334; eISSN: 2460-6952

Abstract

In today's rapidly evolving landscape of higher education, managing and analyzing academic data effectively has become increasingly challenging, particularly given the 3Vs of Big Data: volume, variety, and velocity. Educational institutions now produce dramatically more data, including student records. This data comes from multiple sources, such as learning management systems and student information systems, and takes several forms. As a result, data analytics and predictive modeling have become increasingly important for gaining insight into student performance, for example by identifying at-risk students who are most likely to fail their courses. This study proposes a novel approach for predicting student academic performance, particularly for identifying at-risk students, by leveraging a data lake architecture. The proposed methodology comprises the ingestion, transformation, and quality assessment of combined data from Universiti Putra Malaysia's Student Information System and learning management system within the data lake environment. This centralized data repository, with its scalability and parallel processing capabilities, supports the training and evaluation of several machine learning models for forecasting student performance, including Support Vector Classifier, Naive Bayes, and Decision Trees. This study lays a solid groundwork for using data lake architecture to improve students' performance.
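The abstract does not describe how the pipeline is implemented. The Python below is a minimal, standard-library-only sketch of the same sequence of steps under loudly stated assumptions. The records, the field names (`student_id`, `cgpa`, `logins`, `submissions`, `label`), and the quality rules are hypothetical and are not the UPM schema. The data lake is reduced to in-memory lists with no parallel processing. The sketch combines SIS and LMS records, runs a simple quality assessment, and flags at-risk students with Gaussian Naive Bayes, one of the three algorithms the abstract names, written from scratch here. It is an illustration, not the authors' method.

```python
import math
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only.
SIS = [  # student information system: academic results plus known outcome
    {"student_id": "S1", "cgpa": 3.6, "label": "pass"},
    {"student_id": "S2", "cgpa": 3.2, "label": "pass"},
    {"student_id": "S2", "cgpa": 3.2, "label": "pass"},  # duplicate record
    {"student_id": "S3", "cgpa": 3.4, "label": "pass"},
    {"student_id": "S4", "cgpa": 2.1, "label": "at_risk"},
    {"student_id": "S5", "cgpa": 1.8, "label": "at_risk"},
    {"student_id": "S6", "cgpa": 2.4, "label": "at_risk"},
    {"student_id": "S7", "cgpa": None, "label": "pass"},  # missing value
    {"student_id": "S8", "cgpa": 3.0, "label": "pass"},  # no LMS record
]
LMS = [  # learning management system: activity counts
    {"student_id": "S1", "logins": 42, "submissions": 10},
    {"student_id": "S2", "logins": 35, "submissions": 9},
    {"student_id": "S3", "logins": 38, "submissions": 8},
    {"student_id": "S4", "logins": 10, "submissions": 3},
    {"student_id": "S5", "logins": 6, "submissions": 2},
    {"student_id": "S6", "logins": 12, "submissions": 4},
    {"student_id": "S7", "logins": 20, "submissions": 5},
]
FEATURES = ("cgpa", "logins", "submissions")


def ingest_and_join(sis, lms):
    """Combine the two sources with an inner join on student_id."""
    lms_by_id = {r["student_id"]: r for r in lms}
    return [{**s, **lms_by_id[s["student_id"]]}
            for s in sis if s["student_id"] in lms_by_id]


def quality_check(rows):
    """Drop rows with missing features or a repeated student_id; count each."""
    seen, kept = set(), []
    report = {"missing": 0, "duplicate": 0}
    for r in rows:
        if any(r.get(f) is None for f in FEATURES):
            report["missing"] += 1
        elif r["student_id"] in seen:
            report["duplicate"] += 1
        else:
            seen.add(r["student_id"])
            kept.append(r)
    return kept, report


def fit_gaussian_nb(rows):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for r in rows:
        by_class[r["label"]].append([r[f] for f in FEATURES])
    model = {}
    for label, xs in by_class.items():
        stats = []
        for j in range(len(FEATURES)):
            col = [x[j] for x in xs]
            mu = sum(col) / len(col)
            # Small constant keeps the variance positive for constant features.
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mu, var))
        model[label] = (math.log(len(xs) / len(rows)), stats)
    return model


def predict(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_posterior(label):
        log_prior, stats = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (row[f] - mu) ** 2 / (2 * var)
            for f, (mu, var) in zip(FEATURES, stats))
    return max(model, key=log_posterior)


clean, report = quality_check(ingest_and_join(SIS, LMS))
model = fit_gaussian_nb(clean)
print(report)  # {'missing': 1, 'duplicate': 1}
print(predict(model, {"cgpa": 2.0, "logins": 8, "submissions": 2}))  # at_risk
```

Replacing `fit_gaussian_nb` with a support vector or decision-tree learner, such as those in scikit-learn, would leave the join and quality-check steps unchanged. That separation between data preparation and modeling is the one the abstract attributes to the data lake.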


File

117456.pdf - Published Version (1MB)
Available under License Creative Commons Attribution.

Additional Metadata

Item Type: Article
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.18517/ijaseit.14.6.12408
Publisher: Insight Society
Keywords: Data analytics; Data lake; Machine learning algorithms; Predictive modeling; Student performance
Depositing User: Ms. Nur Faseha Mohd Kadim
Date Deposited: 23 May 2025 08:56
Last Modified: 23 May 2025 08:56
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.18517/ijaseit.14.6.12408
URI: http://psasir.upm.edu.my/id/eprint/117456