What is data science, and what is machine learning? This is a short overview for someone who has never heard of either.
What Is Data Science?
In the abstract, data science is an interdisciplinary field that seeks to use algorithms to organize, process, and analyze data. It represents a shift towards using computer programing, specifically machine learning algorithms, and other, related computational tools to process and analyze data.
By 2008, companies starting using the term data scientists to refer to a growing group of professionals utilizing advanced computing to organize and analyze large datasets,[i] and thus from the get-go, the practical needs of professional contexts have shaped the field. Data science combines strands from computer science, mathematics (particularly statistics and linear algebra), engineering, the social sciences, and several other fields to address specific real-world data problems.
On a practical level, I consider a data scientist someone who helps develop machine learning algorithms to analyze data. Machine learning algorithms form the central techniques/tools around what constitutes data science. For me personally, if it does not involve machine learning, it is not data science.
What Is Machine Learning?
Machine learning is a complex term: What to say that a machine “learns”? Overtime data scientists have provided many intricate definitions of machine learning, but its most basic, machine learning algorithms are algorithms that adapt/modify how their approach to a task based on new data/information overtime.
Herbert Simon provides a commonly used technical definition: “Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the task or tasks drawn from the same population more efficiently and more effectively the next time.”[ii] As this definition implies, machine learning algorithms adapt by iteratively testing its performance against the same or similar data. Data scientists (and others) have developed several types of machine learning algorithms, including decision tree modeling, neural networks, logistic regression, collaborative filtering, support vector machines, cluster analysis, and reinforcement learning among others.
Data scientists generally split machine learning algorithms into two categories: supervised and unsupervised learning. Both involve training the algorithm to complete a given task but differ on how they test the algorithm’s performance. In supervised learning, the developer(s) provide a clear set of answers as a basis for whether the prediction is correct; while for unsupervised learning, whether the algorithm’s performance is much more open-ended. I liken the difference to be like the exams teachers gave us in school: some tests, like multiple choice exams, have clear, right and wrong answers or solutions, but other exams, like essays, are open-ended with qualitative means of determining goodness. Just like the nature of the curriculum determines the best type of exam, which type of learning to performs depends on the project context and nature of the data.
Here are four instances where machine learning algorithms are useful in these types of tasks:
- Autonomy: To teach computers to do a task without the direct aid/intervention of humans (e.g. autonomous vehicles)
- Fluctuation: Help machines adjust when the requirements or data change over time
- Intuitive Processing: Conduct (or assist in) tasks humans do naturally but are unable to explain how computationally/algorithmically (e.g. image recognition)
- Big Data: Breaking down data that is too large to handle otherwise
Machine learning algorithms have proven to be a very powerful set of tools. See this article for a more detailed discussion of when machine learning is useful.
[i] Berkeley School of Information. (2019). What is Data Science? Retrieved from https://datascience.berkeley.edu/about/what-is-data-science/.
[ii] Simon in Kononenko, I., & Kukar, M. (2007). Machine Learning and Data Mining. Elsevier: Philadelphia.
Photo credit #1: Frank V at https://unsplash.com/photos/zbLW0FG8XU8
Photo credit #2: Brett
Jordan at https://unsplash.com/photos/HzOclMmYryc
One thought on “What Is Data Science and Machine Learning? A Short Guide for the Unsure”