Machine Learning – A Building Block of Data Science

March 04, 2021
By Ashish Aggarwal

The term “Machine Learning” (a subfield of “Artificial Intelligence”) was coined back in the 1950s. Over the last five years, however, it has gained popularity faster than almost any other technology in computer science. What is the reason? Let us find out.

Today, an estimated 1.75 MB of data is generated per person every second, and companies analyse and process this data at a very fine-grained level. This is where data science comes in.

Data science is the discipline of working with data: extracting deep insights from raw data using statistics, machine learning algorithms, AI tools, and related disciplines. Companies use these insights to solve business problems and improve their business processes.

This data passes through several stages. To understand where machine learning fits into data science, consider an example from the healthcare industry, where these stages are used to predict a patient’s health and help doctors recommend preventive measures.

Identification of Business Problem

First, a data scientist should understand the kind of business problem they are dealing with. In this case, the task is to identify patients who are likely to develop diseases or health issues in the future.

Data Acquisition Process

Once the problem has been identified, data is collected from various sources and file formats (Excel, CSV, or a database). Here, the collected data might include name, age, gender, weight, height, waist-hip ratio, blood pressure, sugar level, etc. It may be gathered from web server logs or the company’s database.
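As a rough illustration of the acquisition step, the sketch below reads patient records from CSV text using only Python's standard library. The column names and the sample rows are hypothetical stand-ins for a real export; in practice a library such as pandas would often be used instead.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file exported from a hospital database.
RAW_CSV = """name,age,gender,weight_kg,height_cm,systolic_bp,sugar_mg_dl
Patient A,54,F,82,160,148,180
Patient B,31,M,70,175,118,92
Patient C,47,M,,168,135,
"""

def load_patients(source):
    """Read patient rows from a CSV file-like object into a list of dicts."""
    return list(csv.DictReader(source))

patients = load_patients(io.StringIO(RAW_CSV))
print(len(patients), "records with columns:", list(patients[0].keys()))
```

Every value arrives as a string, and missing measurements show up as empty strings, which is why a cleaning step is needed before any analysis.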

Data Scrubbing Stage

This is the stage where the data is cleaned. Because the collected data is completely raw, it is likely to contain missing values that can distort predictions and prevent doctors from arriving at accurate recommendations.
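One common way to deal with missing values is to replace them with the average of the observed values in the same column (mean imputation). The sketch below assumes hypothetical records in which `None` marks a missing measurement. It is only one of several possible strategies; dropping incomplete rows or using the median are alternatives.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing measurement.
records = [
    {"age": 54, "weight_kg": 82.0, "systolic_bp": 148},
    {"age": 31, "weight_kg": None, "systolic_bp": 118},
    {"age": 47, "weight_kg": 70.0, "systolic_bp": None},
    {"age": None, "weight_kg": 64.0, "systolic_bp": 122},
]

def impute_with_mean(rows, column):
    """Return copies of rows with missing values in `column` replaced by the column mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]

cleaned = records
for col in ("age", "weight_kg", "systolic_bp"):
    cleaned = impute_with_mean(cleaned, col)

for row in cleaned:
    print(row)
```

The original `records` list is left untouched, so the raw data can still be inspected after cleaning.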

Exploratory Data Analysis (EDA)

In this stage, data scientists explore the medical dataset and try to identify patterns and characteristics within it, for example by dividing the dataset into two groups: Healthy Patients and Disease-Prone Patients. Significant attributes are identified and the relationships between them are established. Attributes such as age, weight, and height may be treated as critical factors that play an important role in determining the health of patients.
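A simple first step in this kind of exploration is to compare average attribute values between the two groups. The sketch below assumes a small, hypothetical set of already-labelled records; the group names, column names, and values are illustrative only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cleaned records, each already labelled with a group.
patients = [
    {"group": "healthy", "age": 29, "weight_kg": 62.0},
    {"group": "healthy", "age": 35, "weight_kg": 68.0},
    {"group": "disease_prone", "age": 58, "weight_kg": 88.0},
    {"group": "disease_prone", "age": 62, "weight_kg": 92.0},
]

def group_means(rows, group_key, columns):
    """Average each column separately for every distinct value of group_key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group_key]].append(row)
    return {
        name: {col: mean(r[col] for r in members) for col in columns}
        for name, members in buckets.items()
    }

summary = group_means(patients, "group", ["age", "weight_kg"])
for name, stats in summary.items():
    print(name, stats)
```

A large gap between the group averages for an attribute suggests it may be worth keeping as a feature when the model is built.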

Data Modelling Stage

This is the stage of data science where machine learning comes into the picture. After EDA, a model is built by choosing an appropriate machine learning algorithm. Machine learning includes various families of algorithms, e.g. regression, clustering, and decision trees, and the data scientist chooses one depending on the prediction or analysis required. To build the model, the dataset is split randomly in a chosen ratio: the first portion is used to train the model, and the remaining portion is used to test the trained model and check its performance.
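The random split into training and test portions can be sketched as follows. The 80/20 ratio, the tiny `(age, at_risk)` dataset, and the single-threshold "model" are all assumptions made for illustration. A real project would use many more features and an established library such as scikit-learn.

```python
import random

def train_test_split(rows, train_ratio=0.8, seed=0):
    """Shuffle a copy of rows and split it into (train, test) by train_ratio."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def accuracy(rows, threshold):
    """Fraction of rows where 'age >= threshold' matches the at-risk label."""
    hits = sum((age >= threshold) == at_risk for age, at_risk in rows)
    return hits / len(rows)

def fit_threshold(train):
    """Pick the age threshold that classifies the training rows most accurately."""
    candidates = sorted({age for age, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))

# Hypothetical (age, at_risk) pairs; real data would have many more features.
data = [(25, False), (31, False), (38, False), (42, False), (45, True),
        (51, True), (56, True), (60, True), (64, True), (29, False)]

train, test = train_test_split(data, train_ratio=0.8, seed=42)
threshold = fit_threshold(train)
print("learned threshold:", threshold)
print("training accuracy:", accuracy(train, threshold))
print("test accuracy:", accuracy(test, threshold))
```

The model only ever sees the training rows while it is being fitted, so the accuracy on the held-out test rows gives a more honest estimate of how it would behave on new patients.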

Visualization & Deployment of Model

Once the model has been built and evaluated, its results are visualized using statistical and business intelligence visualization tools. The model is then deployed for use by doctors or hospitals to assess the health of patients.

Hence, data science is a comprehensive and increasingly important process that uses AI, machine learning, and deep learning algorithms according to the business area and its requirements.