Web Scraping and Regression Analysis based on Machine Learning for COVID-19 with Rapid Software Platform
Since the onset of the global COVID-19 pandemic, experts from different domains, including scientists, clinicians, and healthcare professionals, have been exploring technologies to manage COVID-19 data. Up-to-date and accurate data collection is critical for them to make effective and efficient decisions on any aspect of the emergency and its consequences. Although some of them are not expert data scientists, web data extraction and analysis are important skills for obtaining recent COVID-19 data. While a large body of literature is available in academic databases, it is difficult to find a report that presents the fundamental methods for implementing web data analysis in a simple way with a rapid software platform. This paper demonstrates a simple framework for extracting web data (web scraping) and analyzing it in a rapid software platform. The Python scripting language is used as a simple tool for web scraping, while RapidMiner is the rapid software platform used for data visualization and analysis. Simple linear regression, a machine learning approach, is implemented in RapidMiner to predict COVID-19 deaths from the collected data. This paper will be useful for academicians and industry practitioners who wish to conduct more robust data analysis and address more challenging issues, such as big data analytics, in any domain.
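To make the two steps of the framework concrete, the following is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not the workflow used in this paper: the regression here is done in RapidMiner, whereas the sketch computes an ordinary least-squares fit by hand. The HTML snippet, the column names (cases, deaths), the numbers, and the helper names `TableParser`, `parse_rows`, and `fit_line` are all hypothetical placeholders; the values are made up and are not real COVID-19 statistics.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a scraped web page. The numbers are made-up
# placeholders chosen for illustration, NOT real COVID-19 statistics.
SAMPLE_HTML = """
<table>
  <tr><th>cases</th><th>deaths</th></tr>
  <tr><td>100</td><td>3</td></tr>
  <tr><td>200</td><td>5</td></tr>
  <tr><td>300</td><td>7</td></tr>
  <tr><td>400</td><td>9</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the numeric <td> cells of each <tr> row (header rows are skipped)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # rows with only <th> cells stay empty and are dropped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        text = data.strip()
        if self._in_cell and self._row is not None and text:
            self._row.append(float(text))


def parse_rows(html):
    """Web-scraping step: extract table rows as lists of floats."""
    parser = TableParser()
    parser.feed(html)
    return parser.rows


def fit_line(xs, ys):
    """Simple linear regression y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


rows = parse_rows(SAMPLE_HTML)
cases = [r[0] for r in rows]
deaths = [r[1] for r in rows]
slope, intercept = fit_line(cases, deaths)
```

In practice the HTML would come from a live page rather than an embedded string, and the extracted rows could be exported (for example as CSV) for visualization and regression in RapidMiner.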