Tree-based Machine Learning in Classifying Reverse Migration

  • Azreen Anuar
  • Nur Huzeima Mohd Hussain
  • Hugh Byrd


Reverse migration is an increasingly urgent issue, driven by factors such as economic crises, political turmoil, natural disasters, and the COVID-19 pandemic. Predicting reverse migration can give policymakers and stakeholders valuable insights for designing appropriate interventions. However, few studies have applied machine learning algorithms to this problem. This paper aims to fill that gap by discussing the application of machine learning algorithms to predicting reverse migration. The study compares the performance of three tree-based algorithms (Decision Tree, Random Forest, Gradient Boosted Trees) with three linear-based algorithms (Logistic Regression, Fast Large Margin, Generalized Linear Model). In addition to accuracy, the study measures the area under the curve (AUC), a metric seldom explored in previous research on reverse migration prediction. The findings show that the tree-based algorithms performed slightly better than the linear-based algorithms in prediction accuracy, with an improvement of approximately 1%. Based on the accuracy and AUC results, Gradient Boosted Trees was selected as the best algorithm. These findings suggest that machine learning can provide valuable insights into predicting reverse migration. With appropriate machine learning algorithms, policymakers and stakeholders can make more informed decisions to address the challenges posed by reverse migration.
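
To illustrate why reporting AUC alongside accuracy can matter, the following is a minimal sketch in plain Python, not the study's implementation. The labels, the two sets of predicted scores, the 0.5 threshold, and the function names are all hypothetical and chosen only for illustration; the abstract does not describe the dataset or tooling.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Share of cases whose thresholded score matches the true label."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


def auc(y_true, y_score):
    """Probability that a random positive case outscores a random
    negative case, counting ties as half (rank-based AUC)."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels: 1 = reverse migrant, 0 = not a reverse migrant.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]

# Hypothetical predicted probabilities from two unnamed classifiers.
model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
model_b = [0.6, 0.6, 0.6, 0.7, 0.1, 0.1, 0.1, 0.1]

for name, scores in [("model_a", model_a), ("model_b", model_b)]:
    print(name, accuracy(y_true, scores), auc(y_true, scores))
```

In this toy example both classifiers reach the same accuracy (0.875) at the 0.5 threshold, yet their AUC values differ (1.0 versus 0.8), because AUC reflects how well the scores rank positive cases above negative ones across all thresholds.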


How to Cite
ANUAR, Azreen; MOHD HUSSAIN, Nur Huzeima; BYRD, Hugh. Tree-based Machine Learning in Classifying Reverse Migration. Mathematical Sciences and Informatics Journal, [S.l.], v. 4, n. 1, p. 49-56, may 2023. ISSN 2735-0703. Available at: <>. Date accessed: 29 feb. 2024. doi:
