UPM Institutional Repository

Comparison of machine learning model performance for predicting the climate variables in Johor Bahru, Malaysia


Citation

Che Rose, Farid Zamani and Rosili, Nur Aqilah Khadijah and Marsani, Muhammad Fadhil (2025) Comparison of machine learning model performance for predicting the climate variables in Johor Bahru, Malaysia. Scientific Reports, 15 (1). art. no. 23465. pp. 1-20. ISSN 2045-2322

Abstract

Accurately predicting climate variables such as air temperature, humidity and precipitation plays a crucial role in air quality management. This research aims to provide preliminary information that can inform local stakeholders' climate adaptation strategies in Johor Bahru city, Malaysia. Five machine learning (ML) models, viz. Support Vector Regression (SVR), Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost) and Prophet, were employed to analyze 15,888 daily climate time-series records for Johor Bahru city. The six climate variables, obtained from NASA's Prediction of Worldwide Energy Resources (POWER), are Temperature at 2 m (T2M), Dew/Frost Point at 2 m (T2MDEW), Wet Bulb Temperature at 2 m (T2MWET), Specific Humidity at 2 m (QV2M), Relative Humidity at 2 m (RH2M) and Precipitation (PREC). Results showed that RF outperforms the other ML models, exhibiting the lowest errors on both the training and testing data. RF fits the training data particularly well for T2M, T2MDEW and T2MWET, with R² above 0.90, demonstrating strong predictive capability. RF achieves the lowest prediction errors for T2M (RMSE: 0.2182, MAE: 0.1679), T2MDEW (RMSE: 0.2291, MAE: 0.1750), T2MWET (RMSE: 0.1621, MAE: 0.1251), QV2M (RMSE: 0.3502, MAE: 0.2701) and RH2M (RMSE: 1.4444, MAE: 1.1090). RF also shows particularly strong Nash–Sutcliffe efficiency (NSE) scores, up to 0.94 in the training phase, especially for the temperature-related variables, indicating high explanatory power and stability. In contrast, SVR generalizes best in the testing phase, with the highest Kling–Gupta efficiency (KGE) value (0.88), supporting its reliability for out-of-sample forecasting. The findings of this research provide transparent, data-driven insights that can inform policymakers and guide the development of robust public policies and strategic investments in Johor Bahru.
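The abstract compares the models using RMSE, MAE, R², NSE and KGE. The short Python sketch that follows shows how these metrics are commonly computed from paired observed and predicted series; it is an illustration, not the authors' code. Two conventions here are assumptions: R² is taken as the squared Pearson correlation (under the alternative 1 − SS_res/SS_tot convention it would coincide with NSE), and KGE uses the original form with variability ratio α = σ_sim/σ_obs and bias ratio β = μ_sim/μ_obs. All function names are hypothetical.

```python
import math


def _validate(obs, sim):
    """Require two equal-length sequences with at least two values."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("obs and sim must have the same length (>= 2)")


def _mean(xs):
    return sum(xs) / len(xs)


def _std(xs):
    """Population standard deviation."""
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def rmse(obs, sim):
    """Root mean squared error, in the units of the variable."""
    _validate(obs, sim)
    return math.sqrt(_mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    """Mean absolute error, in the units of the variable."""
    _validate(obs, sim)
    return _mean([abs(o - s) for o, s in zip(obs, sim)])


def pearson_r(obs, sim):
    """Pearson correlation between observed and predicted values."""
    _validate(obs, sim)
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    norm = math.sqrt(sum((o - mo) ** 2 for o in obs) * sum((s - ms) ** 2 for s in sim))
    return cov / norm


def r2(obs, sim):
    """R^2 as the squared Pearson correlation (assumed convention)."""
    return pearson_r(obs, sim) ** 2


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 is no better than mean(obs)."""
    _validate(obs, sim)
    mo = _mean(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability and bias terms."""
    r = pearson_r(obs, sim)
    alpha = _std(sim) / _std(obs)  # variability ratio
    beta = _mean(sim) / _mean(obs)  # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0]
    sim = [2.0, 3.0, 4.0]  # every prediction is off by a constant +1
    for name, fn in [("RMSE", rmse), ("MAE", mae), ("R2", r2), ("NSE", nse), ("KGE", kge)]:
        print(f"{name}: {fn(obs, sim):.3f}")
```

In this toy case a constant +1 bias leaves R² at 1.000, while NSE falls to -0.500 and KGE to 0.500. This is one reason studies like this one report several complementary metrics rather than R² alone: correlation-based scores ignore systematic bias, whereas NSE and KGE penalize it.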


Download File

120149.pdf - Published Version (7MB)

Additional Metadata

Item Type: Article
Divisions: Faculty of Science; Institute for Mathematical Research
DOI Number: https://doi.org/10.1038/s41598-025-08033-y
Publisher: Nature Research
Keywords: Adaptation; Climate; Humidity; Machine learning; Precipitation; Prediction; Temperature
Depositing User: Ms. Zaimah Saiful Yazan
Date Deposited: 24 Sep 2025 02:02
Last Modified: 24 Sep 2025 02:02
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1038/s41598-025-08033-y
URI: http://psasir.upm.edu.my/id/eprint/120149