PARTIAL LEAST SQUARES (PLS) LATENT VARIABLE MODELLING APPROACH FOR MEASURING DURATION OF ORTHODONTIC TREATMENT

In today's society, the quest for aesthetic perfection is no longer just an aspiration of the young. As a result, an increasing number of adult patients seek orthodontic treatment to improve not only the function but also the appearance of their teeth. Patients who are about to wear braces are curious about how long the orthodontic treatment will take, and those who complete treatment on time may be more satisfied. Therefore, this retrospective study aims to model the factors that affect the duration of orthodontic treatment using Partial Least Squares (PLS) regression. Demographic profile, severity of malocclusion, treatment planning and patient compliance data are collected from the folders of patients who have completed orthodontic treatment. The PLS regression results indicate that twelve variables, namely patient's age, patient's gender, proposed treatment planning, seven malocclusion characteristics, clinician experience and oral hygiene condition, contribute significantly to treatment duration. This study also demonstrates the application of Variable Importance in Projection (VIP) to select significant predictor variables. The final PLS model with one extracted factor explains 89.96% of the variation in the duration of orthodontic treatment.


Introduction
Orthodontics is the branch of dentistry concerned with facial growth and with the development of the dentition and occlusion. It also involves the diagnosis, interception and treatment of occlusal anomalies. Orthodontic treatment is a dental treatment that corrects irregularities of the teeth and of the relation of the teeth to the surrounding anatomy. Treatment is usually carried out with braces or mechanical aids (Mitchell, 2013). In today's society, the quest for aesthetic perfection is no longer just an aspiration of the young. As a result, increasing numbers of adult patients seek orthodontic treatment to improve not only the function but also the appearance of their teeth. According to Zhang (2008), the number of adult patients increased by 46% over the past decade, reaching 3.8 million in 2008.
New orthodontic patients need information on how long they will have to wear braces, because wearing braces changes their lifestyle. They have to take care of their braces, the type of food they eat and their oral hygiene throughout the period of wearing braces to avoid problems with their teeth. Moreover, a longer overall treatment time increases the treatment cost, which burdens patients, and for psychological reasons patients may think twice before opting for braces. Therefore, a greater understanding of the factors that affect treatment time is useful for several reasons. Orthodontic treatment time usually ranges from one to three years. However, accurate estimation of the time needed to complete orthodontic treatment can determine the success of an orthodontic practice (Mavreas et al., 2008). Keim et al. (2004) conducted a survey on orthodontic practice in 2003 and found that finishing a case within the estimated time is considered an important practice technique. Longer orthodontic treatment duration may reduce the effectiveness of the treatment.
Skidmore et al. (2006) also found that predicting orthodontic treatment duration remains an unsolved problem because little research has evaluated the factors that affect treatment duration. Only a few studies have attempted to evaluate these factors, especially in Malaysia. Thus, the main purpose of this study is to identify the factors that influence the duration of fixed appliance orthodontic treatment and to model these factors using Partial Least Squares regression.

Material and Methods
This retrospective study comprises patients who had completed orthodontic treatment at the orthodontic specialist clinic in UiTM Shah Alam. Demographic profile, severity of malocclusion, treatment planning and patient compliance data are collected from the patients' folders. Demographic variables consist of the gender and age of the patient at the start of treatment. The orthodontist records the severity of malocclusion when evaluating the study models and the patient. The patient's overjet, overbite, crowding in the lower and upper arches, buccal occlusion and incisors classification are recorded. The rotation of the teeth away from the normal arch is evaluated through the degree of crowding for each patient. Overjet is measured as the horizontal distance between the mandibular and maxillary central incisors in occlusion using a metal ruler, while overbite is measured as the overlap of the mandibular and maxillary central incisors using a metal ruler (Franklin et al., 1996). The incisors classifications, which are class I, class II/1, class II/2 and class III, are recorded using Angle's definitions. Treatment planning and mechanics variables include whether the patient is treated by extraction or non-extraction. The number of teeth extracted when treatment starts and the specialist assigned to each patient are also noted (Johansson & Lundström, 2012). Whether the second molars are banded depends on the position of the teeth relative to the opposite second molar and the first molar and on the clinical practicalities of banding the teeth. Patient compliance variables comprise oral hygiene, wire breakage and bracket breakage. A patient's oral hygiene is recorded as poor if two or more teeth have plaque or irritated gingiva. Orthodontic treatment time, which is the dependent variable in this study, is recorded from the start of treatment until the patient is debonded. The date when the initial brackets are placed is taken as the start of treatment, and the date when the fixed appliances are removed is taken as the end of treatment.
This study uses Partial Least Squares (PLS) regression in SAS 9.3 to analyse the data because some of the independent variables are highly correlated with each other. PLS is most useful when the number of predictors exceeds the number of observations and when predictor variables are highly correlated. When two independent variables are highly correlated, they may provide the same information, which implies that the two variables are collinear. Collinearity causes problems when fitting regression models (Liao & Valliant, 2012). Ordinary Least Squares (OLS) regression can over-fit the data when there are many predictor variables; alternative regression methods that use a small number of extracted factors can therefore give better predictions for new observations.
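As a rough illustration of how highly correlated predictors might be screened before choosing PLS, the sketch below computes pairwise Pearson correlations and flags pairs above a cut-off. The function names, the 0.8 threshold and the toy values are assumptions made for illustration only; they are not taken from the study data or from SAS.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)

def collinear_pairs(columns, threshold=0.8):
    """Return (name_1, name_2, r) for predictor pairs with |r| >= threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    names = list(columns)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(columns[names[i]], columns[names[j]])
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], round(r, 3)))
    return pairs

# Hypothetical predictor values for five patients (not the study data).
columns = {
    "overjet": [2.0, 4.0, 5.0, 7.0, 9.0],
    "overbite": [1.0, 2.5, 3.0, 4.0, 5.5],
    "age": [18.0, 12.0, 22.0, 15.0, 20.0],
}
flagged = collinear_pairs(columns)
print(flagged)
```

In this toy data only the overjet and overbite columns move together closely enough to be flagged, which is the situation in which OLS estimates become unstable and a latent-factor method is attractive.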
Partial Least Squares (PLS) regression involves several steps. First, this study chooses the optimal number of latent factors through cross validation. Determining the number of latent components to include is the most important part of fitting a PLS model, because an excessive number of latent components may over-fit the model. As a guide, one latent component can be added to the PLS model for every additional five or six observations in the training dataset (Rhiel et al., 2002). The number of latent factors chosen is the one with the lowest predicted residual sum of squares (PRESS). PLS uses cross validation (CV) to select the number of significant latent components by dividing the data set into groups, usually five to nine (Wold, 1995). The model is fitted to all groups except one, and its ability to predict the responses is then checked on the omitted group. This process is repeated for each group to measure the overall predictive capability of the model. Five types of cross validation are considered: test set validation, one-at-a-time cross validation, blocked validation, split sample cross validation and random sample cross validation. This study runs all five methods and chooses the one that produces the smallest root mean PRESS.
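The PRESS calculation under one-at-a-time cross validation can be sketched as follows. This is a minimal illustration, not the SAS implementation: it compares only a zero-factor model (predicting the training mean) with a one-factor PLS model, it centres but does not scale the variables, and the function names and toy data are hypothetical.

```python
import math

def fit_one_factor(X, y):
    """Fit a one-factor PLS model on training data; returns (x_means, y_mean, b)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector proportional to X'y, normalised to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = Xw, then regress y on t to obtain q; coefficients b = w * q.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_means, y_mean, [wj * q for wj in w]

def predict(model, row, n_factors):
    """Predict one response; n_factors = 0 falls back to the training mean."""
    x_means, y_mean, b = model
    if n_factors == 0:
        return y_mean
    return y_mean + sum((xj - mj) * bj for xj, mj, bj in zip(row, x_means, b))

def loo_press(X, y, n_factors):
    """PRESS from one-at-a-time cross validation: leave out each observation in turn."""
    press = 0.0
    for i in range(len(X)):
        model = fit_one_factor(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(model, X[i], n_factors)) ** 2
    return press

# Hypothetical data: two nearly collinear predictors and one response.
X = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1], [6.0, 12.0]]
y = [3.1, 5.9, 9.2, 11.8, 15.1, 18.0]
press = {k: loo_press(X, y, k) for k in (0, 1)}
best = min(press, key=press.get)  # number of factors with the lowest PRESS
print(press, best)
```

A root mean PRESS can be derived from each PRESS value (for example as the square root of PRESS divided by the number of observations, which is an assumption here rather than the exact SAS definition); the ranking of candidate models is the same either way.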
After determining the number of significant latent components, this study plots the factor weights. The factor weights are used to determine the Variable Importance in Projection (VIP), a statistic that can be used to choose the independent variables that matter most to the model. Variable selection examines both the coefficients and the loadings of the predictor variables. Predictors with large coefficients are important for predicting the response, while large loadings show that the predictors are important in modelling X (Wold et al., 2001). When the absolute value of a coefficient is small, the contribution of that predictor to the model is often considered small, and the variable may be deleted from the model. However, in PLS a predictor with a small coefficient may still have a large VIP value. This implies that the predictor contributes significantly to the model and is important for prediction, so it has to be kept in the model. A variable should be deleted from the model only when both its VIP value and its coefficient are small. A VIP value is considered small if it is less than 0.8 (Wold, 1995).
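The VIP calculation can be sketched from the weights, scores and response loadings of a fitted PLS model. The sketch assumes the commonly used definition in which the VIP of predictor j is the square root of p times the weighted average, over factors, of its squared normalised weight, with each factor weighted by the response variation it explains. The function name and the numbers are hypothetical and are not the weights reported in this study.

```python
import math

def vip_scores(W, T, Q):
    """VIP for each predictor.

    W: p x c weight matrix, T: n x c score matrix, Q: list of c response loadings.
    """
    p, c = len(W), len(W[0])
    # Response variation explained by factor a: q_a^2 * t_a' t_a.
    ss = [Q[a] ** 2 * sum(row[a] ** 2 for row in T) for a in range(c)]
    # Squared length of each weight column, used to normalise the weights.
    w_norm2 = [sum(W[j][a] ** 2 for j in range(p)) for a in range(c)]
    total = sum(ss)
    return [
        math.sqrt(p * sum(ss[a] * W[j][a] ** 2 / w_norm2[a] for a in range(c)) / total)
        for j in range(p)
    ]

# Hypothetical one-factor model with three predictors.
W = [[0.8], [0.59], [0.1]]
T = [[-1.2], [0.1], [1.1]]
Q = [1.5]
vip = vip_scores(W, T, Q)
above_cutoff = [j for j, v in enumerate(vip) if v >= 0.8]  # 0.8 cut-off from Wold (1995)
print(vip, above_cutoff)
```

Under the rule described above, a predictor below the 0.8 cut-off would be dropped only if its regression coefficient is also small, so the list of indices is a first screen rather than the final selection.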
After dropping unimportant variables, this study checks the data for outliers. Identifying potential outliers is important in a PLS model because they may affect the parameter estimates. This study uses box plots to detect outliers. A box plot displays a distribution graphically through its median and its lower and upper quartiles, and it also aids in detecting outliers. The difference between the lower quartile, Q1, and the upper quartile, Q3, is called the interquartile range, IQ. A point beyond an inner fence (below Q1 - 1.5*IQ or above Q3 + 1.5*IQ) is considered a mild outlier. A point beyond an outer fence (below Q1 - 3*IQ or above Q3 + 3*IQ) is considered an extreme outlier (Elmassad, 2013).
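The fence rule above can be sketched in a few lines. The quartiles here come from Python's statistics.quantiles with its default method, which is an assumption, since software packages differ slightly in how they compute quartiles; the function name and the durations are hypothetical.

```python
import statistics

def classify_outliers(values):
    """Split values into mild and extreme outliers using box plot fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # lower quartile, median, upper quartile
    iq = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iq, q3 + 1.5 * iq
    outer_low, outer_high = q1 - 3 * iq, q3 + 3 * iq
    mild, extreme = [], []
    for v in values:
        if v < outer_low or v > outer_high:
            extreme.append(v)  # beyond an outer fence
        elif v < inner_low or v > inner_high:
            mild.append(v)  # beyond an inner fence but within the outer fences
    return mild, extreme

# Hypothetical treatment durations in months (not the study data).
durations = [24, 26, 27, 28, 29, 30, 31, 32, 33, 45, 70]
mild, extreme = classify_outliers(durations)
print(mild, extreme)
```

With these values the quartiles are 27 and 33, so the inner fences sit at 18 and 42 and the outer fences at 9 and 51.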
Lastly, the final Partial Least Squares (PLS) model is established in a few steps. Let X be the n by p matrix of independent variables, where n is the number of observations, p is the number of predictors and c is the number of extracted factors. First, PLS regression produces a p by c weight matrix, W, for X. Then, an n by c factor score matrix, T, is produced such that T=XW. The weights are chosen to maximize the covariance between the dependent variable and the factor scores. Next, ordinary least squares regression of Y on T is performed to produce Q such that Y=TQ+E, where Q is the matrix of regression coefficients for T. Once Q is computed, the matrix of beta coefficients for X in the PLS model is obtained as B=WQ. Finally, the predictive regression model Y=XB+E is complete.
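The steps above can be sketched for the simplest case of a single response and one extracted factor (c = 1). The sketch assumes that the columns of X and the response y have already been centred, and it takes the weight vector proportional to X'y, which is the usual first PLS weight for one response. The function name and data are hypothetical, and the output is not meant to reproduce the SAS results.

```python
import math

def pls_one_factor(X, y):
    """One-factor PLS for a single response; X (n x p) and y are assumed centred."""
    n, p = len(X), len(X[0])
    # Weight matrix W (p x 1): proportional to X'y, so the scores covary maximally with y.
    w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Factor scores T = XW (n x 1).
    t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
    # OLS regression of y on t gives Q (a scalar here), as in Y = TQ + E.
    q = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    # Coefficients for X: B = WQ.
    b = [wj * q for wj in w]
    return w, t, q, b

# Hypothetical centred data: 4 observations, 2 correlated predictors.
X = [[-1.5, -1.0], [-0.5, -1.0], [0.5, 1.0], [1.5, 1.0]]
y = [-2.0, -1.0, 1.0, 2.0]
w, t, q, b = pls_one_factor(X, y)
# Predictions from Y = XB; these equal t * q by construction.
y_hat = [sum(xij * bj for xij, bj in zip(row, b)) for row in X]
print(w, q, b, y_hat)
```

Further factors would require deflating X and repeating the weight step, which is omitted here because the final model in this study uses a single factor.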

Results
This study uses proc univariate for the descriptive analysis of continuous variables. Table 1 shows that the average duration of orthodontic treatment in the clinic is 29.73 months, which means that patients take 2 to 3 years to complete treatment. The average patient age is 19.87 years, or approximately 20 years, and ranges from 10 to 24 years. Table 1 also shows that the average overbite is 3.0 while the average overjet is 4.867, which means that the patients had mild malocclusion. The average number of teeth extracted is 2.867, or approximately 3 teeth per treatment. The average number of wire breakages is 0.400 while the average number of bracket breakages is 0.333.
The PLS procedure in SAS extracts at most 15 factors by default and lists the individual and cumulative variation accounted for by each extracted factor. Table 2 shows that almost all of the variation is explained by the first extracted factor, which explains 36.42% of the variation in the predictor variables and 89.54% of the variation in the response variable. PLS uses cross validation (CV) to determine the number of significant factors, and the number chosen is the one with the minimum predicted residual sum of squares (PRESS). Based on Table 3, one-at-a-time cross validation produces the smallest root mean PRESS, which is 0.382. To choose the fewest latent components that should be included in the model, Van der Voet's randomization-based model comparison test is performed. Based on Van der Voet's test in Table 4, the absolute minimum root mean PRESS of 0.382 is achieved with the smallest number of factors, 1. Therefore, only one extracted factor is included in the model. After determining the number of significant latent components, this study plots the factor weights to determine which variables are important to the model.
Figure 1 shows a cluster of X-variables that are weighted at nearly zero for both factors. These variables make a small contribution to the model fit, and removing them may improve the model's predictive capability. Only the variables inside the box are included in the model. To explore further which predictors can be eliminated from the analysis, this study considers the regression coefficients for the standardized data. Usually, predictors whose coefficients are small in absolute value make a small contribution to predicting the response variable. However, predictors with small regression coefficients need not be excluded. Another way to summarize the contribution of a variable is the Variable Importance in Projection (VIP), which can be used to choose the independent variables that matter most to the model. A variable should be deleted from the model only when both its VIP value and its coefficient are small, and a VIP value is considered small if it is less than 0.8. Table 5 shows that only 9 variables have VIP values greater than 0.8: type I right canine, age, number of teeth extracted, type I left molar classification, male patient, overbite, type I right molar, overjet and good oral hygiene. However, type III lower arch, type I incisors classification and clinician2 are also retained because they have large coefficients. Therefore, twelve variables are included in the model. The other variables, which have small absolute coefficients and small VIP values, are dropped from the model.
Choosing the most relevant predictors leads to an improvement of R² in the final model. Based on Table 6, the R² value is slightly higher in the model with twelve significant variables than in the original model with all predictor variables. The total variation in the predictor variables explained by the first extracted factor increases substantially, from 36.42% to 70.86%, after the insignificant variables are excluded.
The total variation in the response variable explained by the model with one latent factor also increases slightly, from 89.54% to 89.96%. Table 7 shows that the root mean PRESS decreases from 0.382 to 0.3463, indicating that the generated model has about the right number of components. For the final model, only one factor is required to explain almost all of the variation in both the predictors and the response.
To obtain the final equation for the PLS model, the factor score matrix T is computed such that T=XW, where W is the 12x1 weight matrix for X. The weight matrix W is computed so that the factor scores have maximal covariance with the response variable. Table 8 summarises the weights of the independent variables, which form the entries of W. The first PLS factor score, T, is then determined by linearly combining the standardized predictor variables using the weights in W. Hence, the factor score matrix T=XW is a 15x1 matrix. Next, this study runs an ordinary least squares regression of Y on T such that Y=TQ+E to produce Q, the matrix of regression coefficients (loadings) for T. Based on the regression output in Table 9, the regression coefficient Q is 1.486.
Among the regression coefficients, the highest value is 0.497 for type I right canine classification, the second highest is 0.485 for age and the third highest is 0.478 for number of teeth extracted. Thus, these three variables contribute most to the variation in orthodontic treatment duration. This study found that when age increases by 1 year, the orthodontic treatment time increases by 0.485 months, which means that older patients undergo longer treatment. Moreover, when the number of teeth extracted increases by 1, the treatment time increases by 0.478 months, which means that patients with extracted teeth have a longer treatment duration than patients with no teeth extracted.
Good treatment planning is important. This can be seen from malocclusion variables such as overjet and overbite, which are found to lengthen the treatment time by 0.423 months and 0.44 months respectively. This study also found a significant difference in treatment duration between male and female patients: treatment time for male patients is 0.447 months longer than for female patients. Clinician experience also contributes significantly to treatment duration. There is also a significant difference in treatment duration for type I incisors classification compared to other types: treatment for type I incisors classification takes 0.348 months longer than for other types. Similarly, there is a significant difference in treatment duration for severe crowding in the lower arch, for which treatment takes 0.356 months longer than for mild crowding in the lower arch. The differences in treatment duration for type I molar classification and type I canine classification are also found to be significant. Treatment for type I left molar classification takes 0.467 months longer than for other types, while treatment for type I right molar classification takes 0.43 months longer than for other types.
Patient compliance also plays an important role. This study found significant differences between patients with good, fair and poor oral hygiene. However, the treatment time for patients with good oral hygiene is found to be 0.425 months longer than for patients with fair or poor oral hygiene. This finding contradicts previous studies, which found that good oral hygiene leads to shorter treatment duration. It may have arisen because most of the patients in this study have good oral hygiene but also have quite severe malocclusion problems. Therefore, future studies should increase the sample size, which may capture a wider range of patient conditions and improve the reliability of the model.

Conclusion
In conclusion, although treatment time can be predicted using a small number of personal characteristics and treatment planning decisions, it remains crucial for researchers to carry out as many investigations as possible into the factors that cause variation in orthodontic treatment time. From the analysis discussed in the previous section, it can be concluded that the duration of orthodontic treatment is influenced by several patient characteristics, malocclusion characteristics, treatment plan decisions and patient compliance. This study found that 89.96% of the variation in orthodontic treatment time is explained by 12 variables, which are patient's age, patient's gender, 7 malocclusion characteristics, proposed treatment planning, clinician experience and oral hygiene. The severity of the initial malocclusion characteristics, which are overbite and overjet measurements, buccal occlusion classification in the right molar, left molar and right canine, severe crowding in the lower arch and class I incisors classification, seems to play an important role in determining the duration of orthodontic treatment.
This study also concluded that treatment duration is longer as the patient's age increases. Male patients are also found to have a longer treatment duration than female patients. Extraction treatments are found to take longer than non-extraction cases, and the duration of treatment increases as the number of teeth extracted increases. Patient compliance with oral hygiene is also found to contribute to treatment duration. Although this finding contradicts previous studies, in this study it may be caused by the small number of patients with fair and poor oral hygiene. It is possible to predict treatment time for a patient based on the patient's characteristics and compliance.
Partial Least Squares (PLS) regression is particularly useful when the number of predictors exceeds the number of observations and when predictor variables are highly correlated. The final PLS model with one extracted factor explains almost all of the variation in both the predictors and the response. In future studies, researchers can use the Peer Assessment Rating (PAR) index as a measure of the severity of malocclusion at the beginning of treatment and of treatment difficulty. The PAR index is an occlusal index that has been shown to be applicable and reliable in other research. The index assesses the difference in scores between the pre-treatment and post-treatment study models, which measures the degree of improvement.