What is Machine Learning in Data Science?
A commonly asked question these days is, ‘What is machine learning in data science?’
To answer this question, let us first understand what data science is.
Data science is a broad field that encompasses data collection, data processing, data analytics, and data modelling. The first two of these (data collection and data processing) form data engineering, whereas the last two (data analytics and data modelling) constitute machine learning.
To understand machine learning, we first need to learn about one of its predecessors, data mining. Data mining is a set of algorithms used to uncover patterns in data, and these patterns can then help predict future outcomes. Data mining was a hot topic in the early 2000s. One of its best-known techniques is association rule mining, which finds products that are likely to be bought together. Supermarkets used this technique extensively to design the layout of their stores and to create marketing campaigns.
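To make “products that are likely to be bought together” concrete, here is a minimal Python sketch that scores every pair of items in a handful of made-up shopping baskets by support (the share of baskets containing both items) and confidence (how often the right-hand item appears in baskets that contain the left-hand one). The baskets, the thresholds and the function names are illustrative assumptions. The brute-force pair enumeration is also a simplification: practical association rule miners such as Apriori prune the search so that it scales to large datasets.

```python
from itertools import combinations

# Hypothetical shopping baskets: one set of purchased items per transaction.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket)


def pairwise_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Brute-force search for rules 'lhs -> rhs' between pairs of items."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    rules = []
    for a, b in combinations(items, 2):
        pair_count = count_containing({a, b}, transactions)
        support = pair_count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Try both directions, since confidence is not symmetric.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_count / count_containing({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


if __name__ == "__main__":
    for lhs, rhs, support, confidence in pairwise_rules(TRANSACTIONS):
        print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With the default thresholds only the bread and butter pair survives: every basket containing butter also contains bread (confidence 1.00), while three of the four bread baskets contain butter (confidence 0.75).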
Machine learning is also a set of algorithms used to learn patterns in data. These patterns are then used to make predictions or to discover structure in the data. But one might wonder what the difference between data mining and machine learning is, since both seem to learn patterns in data.
Machine learning goes one step further: it requires that the algorithm's performance on its task improve as it processes more data related to the problem, and that this improvement happen automatically, without a programmer rewriting the algorithm. In short, what distinguishes a machine learning algorithm from a data mining algorithm is that it should improve its performance over time as it is exposed to more and more data relevant to the problem.
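The idea that performance should improve with more data can be illustrated with a toy experiment. The Python sketch below assumes a hypothetical task: deciding whether a number in [0, 1) is at least 0.37. The learner never sees that rule; it only sees labelled training examples and places its decision threshold midway between the largest negative and the smallest positive example. Scoring it on a fixed test set shows accuracy rising as the number of training examples grows. The task, the threshold rule and all names are illustrative assumptions, not a standard or production algorithm.

```python
# Hypothetical "true" rule the learner must discover from examples.
TRUE_THRESHOLD = 0.37

# Fixed test set: 1,000 evenly spaced points in [0, 1).
TEST_XS = [(j + 0.5) / 1000 for j in range(1000)]


def true_label(x):
    return 1 if x >= TRUE_THRESHOLD else 0


def fit_threshold(xs, ys):
    """Place the decision threshold midway between the largest negative
    example and the smallest positive example seen in training."""
    negatives = [x for x, y in zip(xs, ys) if y == 0]
    positives = [x for x, y in zip(xs, ys) if y == 1]
    lo = max(negatives, default=0.0)
    hi = min(positives, default=1.0)
    return (lo + hi) / 2


def accuracy(threshold):
    """Fraction of test points the learned threshold classifies correctly."""
    correct = sum(1 for x in TEST_XS if (1 if x >= threshold else 0) == true_label(x))
    return correct / len(TEST_XS)


def train_and_evaluate(n):
    """Train on n evenly spaced examples, then score on the fixed test set."""
    xs = [i / n for i in range(n)]
    ys = [true_label(x) for x in xs]
    return accuracy(fit_threshold(xs, ys))


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(f"{n:>3} training examples -> accuracy {train_and_evaluate(n)}")
```

On this setup the printed accuracies are 0.88, 0.98 and 0.995 for 2, 10 and 100 training examples: the same algorithm, given more experience, performs better on the same task.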
With this basic understanding of what machine learning is, let us now look at some of its popular applications. Machine learning algorithms and techniques are applied in many areas; however, the two most visible and most talked-about are computer vision (CV) and natural language processing (NLP).
Computer vision is a field that enables computers to make sense of the images and video captured by cameras. Cameras have been around for a long time, but for most of that time computers were unable to recognise or detect the objects appearing in those images or videos. This problem has been largely solved thanks to a family of machine learning models called “deep neural networks”. A driverless car, for example, makes extensive use of deep neural networks to detect objects in its camera feed. Deep neural networks have become so popular that they are now studied as a separate subject called Deep Learning.
Just as detecting objects in images and video was a problem for many years, so was detecting patterns in language, whether written or spoken. This field, initially called computational linguistics, is now called natural language processing or NLP. NLP has made rapid strides in recent years thanks to deep neural networks trained on large datasets of text. These networks have become very good at predicting the next word given a few starting words, called a “prompt”, supplied by the user. The best-known system built on such models is, of course, ChatGPT, which has taken the world by storm since its release in 2022.
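To illustrate what “predicting the next word from a prompt” means, here is a deliberately tiny Python sketch. It is a bigram model, which simply counts which word follows which in a made-up corpus, and it extends a prompt by repeatedly appending the most frequent follower of the last word (greedy decoding). This is a simplified stand-in, not how ChatGPT works: large language models use deep neural networks that condition on the whole prompt rather than just the last word, and they learn from vastly more text. The corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict

# A tiny, made-up training corpus (hypothetical; real models use far more text).
CORPUS = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."


def train_bigrams(text):
    """Count, for each word, how often every other word follows it."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def continue_prompt(prompt, model, max_words=5):
    """Greedily extend `prompt` by appending the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = model.get(words[-1]) if words else None
        if not candidates:
            break  # empty prompt, or last word never followed by anything
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(continue_prompt("the dog", model, max_words=3))  # prints "the dog sat on the"
```

Because this model only looks one word back, longer continuations quickly lose the thread, which is one reason modern language models condition on much longer contexts.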
Although NLP and CV are the most talked-about applications of machine learning, there are several other areas, such as finance, medicine, and marketing, where machine learning is substantially improving the performance of predictive models. It remains to be seen whether NLP, computer vision, or some other application area will have the largest impact. But we can be fairly sure that, in aggregate, machine learning is going to change how we live over the next decade, just as the internet changed our lives.
About the Author
Dr Aditya Narvekar is the Deputy Director of the Bachelor of Data Science program at SP Jain School of Global Management. His areas of specialisation are programming languages, databases and data warehousing.