FORECASTING THE UNEMPLOYMENT RATE IN MALAYSIA DURING COVID-19 PANDEMIC USING ARIMA AND ARFIMA MODELS

The unemployment issue is one of the most common problems faced by many countries around the world. The unemployment rates in developed countries often fluctuate throughout time. Similarly, Malaysia is also affected by the inconsistent unemployment rate especially during the COVID-19 pandemic. Therefore, in order to understand the trend better, ARIMA and ARFIMA were used to model and forecast the unemployment rate in Malaysia in this study. The dataset on the unemployment rate in Malaysia from January 2010 until July 2021 was obtained from Bank Negara Malaysia (BNM) official portal. The best time series models found were ARIMA (2, 1, 2) and ARFIMA (0, −0.2339, 0). The performance of the models was evaluated using mean absolute percentage error (MAPE), mean absolute error (MAE) and root mean square error (RMSE). It appeared that the ARFIMA model emerged as a better forecast model since it had better performance compared to ARIMA in forecasting the unemployment rate in Malaysia.


Introduction
The unemployment issue is one of the most common problems faced by many countries around the world. Unemployment affects the standard of living of all groups of people and negatively affects the country's economy. Its negative impacts fall not only on individuals but also on the community and the government. According to the Department of Statistics Malaysia (DOSM) (2020), the unemployment rate refers to the proportion of the unemployed population to the total population in the labour force. Meanwhile, unemployment refers to the situation in which persons are available for work and actively searching for a job but are unable to find one. The unemployment rate is an indicator of the health of the economy: the higher the unemployment rate, the greater the negative impact on the labour market.
Coronavirus disease, also known as COVID-19, is a dangerous and contagious disease caused by a newly identified coronavirus (World Health Organization, 2020). In Malaysia, a huge number of people have become jobless as a result of the COVID-19 pandemic. The highest unemployment rate for the year 2020 was 5.3% in May, then reduced to 4.6% by September. However, the unemployment rate began to rise again in October and November 2020, reaching 4.7% in October and 4.8% in November (Tan, 2021). The unemployment rate in Malaysia recorded in February 2021 was 4.8% which was lower compared to 4.9% in January 2021, with an approximate decline from 782,500 persons to 777,500 persons based on the recent article by DOSM (Azman, 2021). Unemployment was a major economic issue with substantial negative social consequences that caused individuals to become poor (Didiharyono & Syukri, 2020). Many are still searching for jobs and worrying about losing their jobs. If the unemployment issue is not resolved, it may contribute to an increase in the number of jobless graduates, negatively harming society and the country (Hossain et al., 2018). Future projections of the unemployment rate are crucial for economic policy in identifying, planning, and controlling any continuing growth in the unemployment rate in the nation, including Malaysia.
In this context, it becomes even more important to be able to provide future predictions of the unemployment rate in Malaysia. Time series models are used in this study to comprehend the patterns and trends in particular events that occurred over time for the purpose of predictions. The time series will then be used to forecast future trends using these patterns (Farrelly, 2017). One common type of univariate forecasting model used in various studies is the autoregressive integrated moving average (ARIMA) model. This type of modelling is essentially flexible and was commonly used to analyse data series, forecasting, and control. According to Perone (2020), this model was straightforward and easy to fit. Furthermore, it can provide an initial understanding of the pattern of the unemployment rate. This method employs an iterative approach to determining the most appropriate model or the best model among all potential models (Didiharyono & Bakhtiar, 2018).
The ARIMA model has been used extensively by scholars, for example to investigate the unemployment rates in Indonesia (Mahmudah, 2017), in South Sulawesi (Didiharyono & Syukri, 2020), and in Malaysia (Ramli et al., 2018; Nor et al., 2018; Lip et al., 2021). In a study by Ramli et al. (2018), the ARIMA (2,1,2) model was shown to be the most accurate in forecasting the unemployment rate in Malaysia using the dataset from 1960 until 2016. Meanwhile, a recent time series study by Lip et al. (2021) on forecasting the unemployment rate in Malaysia, which used the dataset from January 2012 until December 2018, found that the most appropriate model under the Box-Jenkins method was the ARIMA (2,1,3) model, since it had the smallest values of all error measures compared to the other models.
However, in certain situations, time series data also exhibit long memory, particularly during recessions and pandemics. To explain the persistence of unemployment rates, it would be better to deploy a class of long-memory models. One method that can be applied in this situation is the autoregressive fractionally integrated moving average (ARFIMA) model. According to Peerajit et al. (2018), the ARFIMA model allows non-integer values of the differencing parameter in the presence of long memory and has been utilized in numerous studies. It is also called a 'long memory' process because its autocorrelation function converges to zero slowly as the lag increases (Brockwell and Davis, 2016). In a recent study in the United States, long memory characteristics were found in the COVID-19 cases, and the study concluded that pandemic diseases have a long-lasting effect on the dynamics of unemployment (Monge, 2021). A similar study in Nigeria revealed that unemployment in Nigeria has long memory, with the differencing parameter d estimated at 0.4, a value within the mean-reverting range (Tule et al., 2018). In addition, the ARFIMA model was reported to be a reliable forecasting device for the unemployment rate in Japan (Kurita, 2010). Since few studies have applied the ARFIMA model to the unemployment rate in Malaysia and new data have become available, this study focuses on modelling and predicting the unemployment rate using both the ARIMA model (updating the earlier models) and the ARFIMA model.
The rest of the paper is organized as follows. Section 2 explains the dataset used in this study as well as the methodology for the ARIMA and ARFIMA models and their time series properties. Section 3 discusses the major findings on the unemployment rate that compares the forecasting performance of both models. Finally, Section 4 concludes the paper and proposes some recommendations for further research.

Methodology
The dataset of this study, the time series model used for this study, and the time series modelling procedure are presented in this section.

The Dataset
This study used the unemployment rate data that were obtained from Bank Negara Malaysia (BNM)'s official portal which was the monthly data from January 2010 until December 2020. The monthly data of the unemployment rate in Malaysia consisted of 132 observations, as shown in Figure 1. According to Figure 1, the unemployment rate in Malaysia began at 3.5% in January 2010 and then varied wildly from month to month between two lines which were 2.5% and 3.5%, until January 2014. Since then, the unemployment rate has gradually decreased to the lowest level until August 2014 and remained constant at 2.7% until November 2014. In July 2015, the unemployment rate started to fluctuate again between 3% and 3.5%. Then, the unemployment rate rose and peaked at 5.3% during May 2020. This was due to the COVID-19 pandemic, where the government had no other choice but to implement the MCO (Tan, 2021). The graph then showed a decrease to 4.6% by September 2020 and started to rise again in October 2020 to 4.7% and 4.8% in November 2020.

Time Series Model
The univariate time series models used, ARIMA (p, d, q) and ARFIMA (p, d, q) model, will be discussed in this part, followed by the time series modelling procedure for both the ARIMA and ARFIMA models.

The ARIMA (p, d, q) model
An ARIMA model combines autoregressive (AR) and moving average (MA) models. It is used to represent series that become stationary after differencing. The general form of this model is ARIMA (p, d, q), where p and q represent the orders of the AR process and the MA process, respectively, while d represents the number of times the series needs to be differenced to achieve stationarity. Because d is restricted to non-negative integers, the ARIMA model is known as a short memory model. The series $\{X_t\}$ is an ARIMA (p, d, q) process if $Y_t = (1 - B)^d X_t$ is a causal ARMA (p, q) process. The ARIMA model is given by Equation (1),
$$\phi(B)(1 - B)^d X_t = \theta(B) Z_t, \quad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2), \tag{1}$$
where $\{Z_t\}$ is a white noise sequence with mean 0 and variance $\sigma^2$, while $X_t$ denotes the time series value at time t. $\phi(z)$ and $\theta(z)$ are polynomials of degrees p and q, respectively, with $\phi(z) \neq 0$ for $|z| \le 1$, and B is the backward shift operator. The process is stationary if and only if d = 0, in which case it reduces to an ARMA (p, q) process (Brockwell and Davis, 2016).
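As a small illustration of the differencing operator $(1 - B)^d$ in Equation (1), the following Python sketch applies lag-1 differencing d times to a series. The function name `difference` and the toy series are illustrative assumptions and are not part of the original study, which used ITSM2000.

```python
def difference(series, d=1):
    """Apply the operator (1 - B) to `series` d times (d a non-negative integer)."""
    values = list(series)
    for _ in range(d):
        # (1 - B) X_t = X_t - X_{t-1}; each pass shortens the series by one value.
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# Toy series with a quadratic trend (hypothetical values, not the unemployment data).
toy = [1, 4, 9, 16, 25]
first_diff = difference(toy, d=1)   # a linear trend remains
second_diff = difference(toy, d=2)  # the trend is removed
```

In this toy case one round of differencing leaves a linear trend and a second round leaves a constant, which is why d counts how many times differencing is needed before the series looks stationary.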

The ARFIMA (p, d, q) model
An ARFIMA model is a stationary process with a much more slowly decreasing autocorrelation function (ACF) than that of an ARMA process. In ARFIMA (p, d, q), p and q represent the AR order and the MA order, respectively. The ARFIMA (p, d, q) process is a generalized form of the ARIMA process in which d is allowed to take non-integer values in order to capture long memory. An ARFIMA (p, d, q) process with |d| < 0.5 is defined as in Equation (2),
$$(1 - B)^d \phi(B) X_t = \theta(B) Z_t, \quad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2), \tag{2}$$
where $\{Z_t\}$ is a white noise sequence with mean 0 and variance $\sigma^2$. $\phi(z)$ and $\theta(z)$ are polynomials of degrees p and q, respectively, satisfying $\phi(z) \neq 0$ and $\theta(z) \neq 0$ for all z such that $|z| \le 1$, and B is the backward shift operator (Brockwell & Davis, 2016).
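To make the fractional operator $(1 - B)^d$ more concrete, the sketch below computes the coefficients of its binomial expansion and applies the resulting filter to a series. It is a minimal illustration only: the function names are hypothetical, the filter is truncated at the start of the sample, and this is not necessarily how ITSM2000 evaluates the model.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients pi_j of (1 - B)^d = sum_j pi_j B^j, for j = 0..n_weights-1."""
    weights = [1.0]
    for j in range(1, n_weights):
        # Recursion from the binomial expansion: pi_j = pi_{j-1} * (j - 1 - d) / j.
        weights.append(weights[-1] * (j - 1 - d) / j)
    return weights


def frac_diff(series, d):
    """Apply (1 - B)^d to `series`, using only the observations available at each t."""
    weights = frac_diff_weights(d, len(series))
    return [
        sum(weights[j] * series[t - j] for j in range(t + 1))
        for t in range(len(series))
    ]


# First few weights for the value of d reported in this study.
w = frac_diff_weights(-0.2339, 4)
```

With d = 1 the weights reduce to 1, −1, 0, 0, ..., which recovers ordinary lag-1 differencing. With a non-integer d the weights never vanish exactly and decay slowly, so distant observations keep a small influence.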

Time Series Modelling Procedure
For the purpose of modelling, the 139 observations of the unemployment rate in Malaysia were divided into two parts. The first 132 observations (95% of the total data), from January 2010 until December 2020, were used to fit the ARIMA and ARFIMA models. The remaining 7 observations (5% of the total data), from January to July 2021, were used to validate the models. The accuracy of the forecasted values was assessed using mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE) (Lazim, 2018). These accuracy measures are given in Equations (3), (4) and (5),
$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|, \tag{3}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|, \tag{4}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2}, \tag{5}$$
where $y_t$ are the actual observed values, $\hat{y}_t$ are the predicted values, and n is the number of predicted values. The sample ACF plot can be used to assess whether the data have long memory. The plot in Figure 2 shows the presence of long memory due to the slowly decreasing autocorrelation function, as explained in Section 2.2.2 above. However, according to Bhansali and Kokoszka (2003), long memory processes are closely related to short memory processes, and an appropriate short memory modelling technique can also be applied to forecast series with long memory characteristics. Therefore, the long memory process can also be modelled using the ARIMA model.
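The three accuracy measures referred to as Equations (3) to (5) can be sketched in a few lines of Python, using their standard definitions. The function names and the two-point example are illustrative assumptions; the numbers are hypothetical and are not the forecasts reported in this study.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error, in the units of the series."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root mean square error, in the units of the series."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical hold-out values and forecasts, for illustration only.
actual = [4.0, 5.0]
predicted = [4.4, 4.5]
scores = (mape(actual, predicted), mae(actual, predicted), rmse(actual, predicted))
```

All three measures are computed on the hold-out period only, so a model with lower values on every measure forecasts the unseen months more closely.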
In order to fit the time series models, the data must satisfy the assumption of stationarity. If a series fluctuates randomly around some fixed value, it is said to be stationary. This fixed value could be the series' mean, a constant, or zero (Lazim, 2018). The series is not stationary if there exists a trend or seasonality. Thus, differencing needs to be performed on a non-stationary series to remove the non-stationarity. Some simple tests are used to determine whether the residuals are independent and identically distributed (iid) random variables. If the iid hypothesis is rejected, the theory of stationary processes is used to identify a suitable model for the dependence. Therefore, the portmanteau test of Ljung-Box is used to determine the existence of autocorrelation in the time series (Brockwell & Davis, 2016). Another portmanteau test developed by McLeod and Li (1983) may also be used to evaluate the iid hypothesis. If the corresponding p-value is larger than the significance level, the hypothesis is not rejected.
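The Ljung-Box statistic mentioned above is built from the sample autocorrelations, $Q = n(n+2)\sum_{k=1}^{h} \hat{\rho}(k)^2/(n-k)$. The sketch below computes the sample ACF and Q in plain Python. The function names and the alternating toy series are assumptions for illustration; converting Q into a p-value needs the chi-square distribution, which is left to a statistics library and is not implemented here.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations rho(0), ..., rho(max_lag)."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    gamma0 = sum(c * c for c in centred) / n
    acf = []
    for h in range(max_lag + 1):
        # Sample autocovariance at lag h, with divisor n.
        gamma_h = sum(centred[t] * centred[t + h] for t in range(n - h)) / n
        acf.append(gamma_h / gamma0)
    return acf


def ljung_box_q(series, max_lag):
    """Ljung-Box portmanteau statistic using lags 1..max_lag."""
    n = len(series)
    acf = sample_acf(series, max_lag)
    return n * (n + 2) * sum(acf[k] ** 2 / (n - k) for k in range(1, max_lag + 1))


# Strongly autocorrelated toy series (hypothetical, not model residuals).
toy = [1.0, -1.0] * 4
q_stat = ljung_box_q(toy, max_lag=1)
```

A large Q relative to the chi-square reference distribution points to remaining autocorrelation, so the iid hypothesis would be rejected; residuals from an adequate model should give a small Q and a large p-value.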
A Box-Cox transformation with the parameter value $\lambda = 0$ was used to stabilize the variability in the series (Brockwell & Davis, 2016). The Box-Cox transformation is denoted as follows:
$$f_\lambda(U_t) = \begin{cases} \lambda^{-1}\left(U_t^{\lambda} - 1\right), & U_t \ge 0,\ \lambda > 0, \\ \ln U_t, & U_t > 0,\ \lambda = 0. \end{cases}$$
The transformation is useful when the variability of the data increases or decreases with the level. The variability can commonly be made almost constant by selecting a suitable $\lambda$. In particular, the variability can be stabilised by choosing $\lambda = 0$ for positive data whose standard deviation increases linearly with level (Box and Cox, 1964). The plot in Figure 1 shows non-stationary data with a linear trend. In order to deal with trend and seasonality, an additional step called 'differencing' was needed to eliminate the trend and seasonal components (Shaadan et al., 2019). Therefore, differencing at Lag 1 was applied in this study. The mean was then subtracted to produce a zero-mean stationary series, as shown in Figure 3.
The 'Autofit' function in ITSM2000 software was used to fit the ARMA (p, q) model by specifying the upper and lower limits for p and q. For any fixed values of p and q, the maximum likelihood estimates of $\phi$ and $\theta$ are the values that minimize the bias-corrected Akaike Information Criterion (AICC). The smaller the AICC value, the better the model (Brockwell & Davis, 2016). The AICC statistic and the Gaussian likelihood L for an ARMA process are given in Equations (6) and (7),
$$\mathrm{AICC} = -2 \ln L(\hat{\phi}, \hat{\theta}, \hat{\sigma}^2) + \frac{2(p + q + 1)n}{n - p - q - 2}, \tag{6}$$
$$L(\phi, \theta, \sigma^2) = \frac{1}{\sqrt{(2\pi\sigma^2)^n\, r_0 \cdots r_{n-1}}} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{j=1}^{n} \frac{(X_j - \hat{X}_j)^2}{r_{j-1}} \right\}, \tag{7}$$
where $\hat{X}_j$ is the one-step predictor of $X_j$ and $r_{j-1} = E(X_j - \hat{X}_j)^2/\sigma^2$.
The ACF and PACF of the residuals are used to assess whether the fitted model is appropriate. In particular, the sample autocorrelations of the residuals in the ACF and PACF graphs should lie within the bounds $\pm 1.96/\sqrt{n}$ roughly 95% of the time for the model to be considered appropriate (Brockwell & Davis, 2016). If more than 5% of the correlations lie outside the bounds, the model is not adequate.
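The preprocessing and selection steps described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the AICC uses the formula from Brockwell and Davis (2016), the value of −2 ln L is supplied by the caller rather than computed, and the numbers in the example are made up, not the unemployment data.

```python
import math


def preprocess(rates):
    """Box-Cox with lambda = 0 (natural log), lag-1 differencing, mean correction."""
    logged = [math.log(r) for r in rates]
    diffed = [curr - prev for prev, curr in zip(logged, logged[1:])]
    mean = sum(diffed) / len(diffed)
    return [v - mean for v in diffed], mean


def aicc(neg2_loglik, n, p, q):
    """AICC = -2 ln L + 2 (p + q + 1) n / (n - p - q - 2)."""
    return neg2_loglik + 2.0 * (p + q + 1) * n / (n - p - q - 2)


def acf_bound(n):
    """Approximate 95% bound for residual autocorrelations, 1.96 / sqrt(n)."""
    return 1.96 / math.sqrt(n)


# Hypothetical positive series, for illustration only.
centred, removed_mean = preprocess([1.0, math.e, math.e ** 3])

# Hypothetical -2 ln L for an ARMA(2, 2) fitted to 131 differenced values.
example_aicc = aicc(-400.0, n=131, p=2, q=2)
```

Differencing 132 monthly observations once leaves 131 values, which is why n = 131 is used in the example; the penalty term grows with p + q, so a larger model must improve the likelihood enough to offset it.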
The residual ACF and partial autocorrelation function (PACF) plots displayed in Figure 4 suggest that the fitted model was appropriate, as all the spikes lie within the two horizontal bounds. Next, tests of randomness were performed to validate the model before forecasting the unemployment rate. The fitted model passes the test if the p-value for the Ljung-Box or McLeod-Li statistic is larger than the significance level of α = 0.05. The null hypothesis of both the Ljung-Box and McLeod-Li tests is that the data are independently distributed. Hence, failure to reject the null hypothesis was required in order to make sure that the residuals of the time series model were independent (Brockwell & Davis, 2016). Finally, the fitted model was applied to predict the unemployment rate seven months ahead, from January 2021 to July 2021. The same steps used to fit the ARIMA model were used to fit the ARFIMA model. However, an additional step was needed because the non-integer value of d has to be specified first (Mah et al., 2018). Therefore, the model was specified as a fractionally integrated model by restricting the non-integer d value to lie between −0.5 and 0.5.

Results
This section presents a monthly forecast of Malaysia's unemployment rate using the ARIMA and ARFIMA models from January 2021 to July 2021.

The ARIMA (p, d, q) Model
Based on the lowest AICC value obtained using the 'Autofit' option in ITSM2000, the ARIMA (2,1,2) model appeared to be the best model in its class to predict the unemployment rate in Malaysia. For the fitted model, the white noise sequence is $\{Z_t\} \sim \mathrm{WN}(0, 0.002956)$ and the AICC value is −379.434. This model is valid based on the Ljung-Box statistic, whose p-value of 0.82939 is greater than the significance level of α = 0.05. The actual values, forecasted values, and the approximate 95% prediction bounds for the unemployment rate in Malaysia using the ARIMA (2,1,2) model are shown in Table 1 and Figure 5 below.

The ARFIMA (p, d, q) Model
According to the lowest AICC value, ARFIMA (0, −0.2339, 0) was found to be the best ARFIMA model to predict the unemployment rate in Malaysia. The actual values and forecasted values, together with their 95% forecast boundaries from January until July 2021, are shown in Table 2 and Figure 6 below.

Comparison Between the ARIMA and ARFIMA Models
The forecast accuracy of the ARIMA (2,1,2) and ARFIMA (0, −0.2339, 0) models in predicting the unemployment rate in Malaysia was compared based on the MAPE, MAE, and RMSE values, which are tabulated in Table 3. It is evident that the ARFIMA (0, −0.2339, 0) model performs better in predicting the unemployment rate in Malaysia, since it has lower values of MAPE, MAE, and RMSE than ARIMA (2,1,2).

Conclusion and Recommendation
This paper aimed to forecast the unemployment rate in Malaysia using two time series models. The lowest AICC value was used to select the best time series model in each class, and both ARIMA (2,1,2) and ARFIMA (0, −0.2339, 0) fulfilled this selection criterion. The performance of each model was then assessed using the MAE, RMSE, and MAPE values, with lower values indicating better forecasts. The ARFIMA model provided a more accurate depiction of Malaysia's unemployment rate than the ARIMA model. This finding is in line with the prior study by Kurita (2010), which found the ARFIMA model to be an adequate technique for forecasting the unemployment rate compared to the AR model. Monge (2021) also discovered a long memory characteristic in unemployment data during a pandemic, which suggests the appropriateness of using the ARFIMA model. The findings presented in this study may help policymakers and economic analysts to better understand the context of unemployment during and after the coronavirus crisis by comparing previous economic and financial crises with the most recent sub-periods affected by a pandemic. In this study, both the ARIMA and ARFIMA models were useful in forecasting the unemployment rate in Malaysia. However, when unforeseen events occur, such as the COVID-19 pandemic, the existing models may no longer be appropriate. Therefore, we suggest that an intervention time series model be considered in future studies to analyse the effect of sudden events on time series data. Since few previous studies have applied time series modelling to the unemployment rate, we also suggest that other time series models, such as Holt's model, be included in further analysis.