Machine Learning Archives - Page 2 of 4

Ethno-Data: Introduction to My Blog

Hello, my name is Stephen Paff. I am a data scientist and an ethnographer. The goal of this blog is to explore the integration of data science and ethnography as an exciting and innovative way to understand people, whether consumers, users, fellow employees, or anyone else.

I want to think publicly. Ideas worth having develop in conversation, and through this blog, I hope to present my integrative vision so that others can potentially use it to develop their own visions and in turn help shape mine.

Please Note: Because my blog straddles two technical areas, I will split my posts based on how in-depth they go into each area of technical expertise. Many posts I will write for a general audience. I will write some posts, though, for data scientists discussing technical matters within that field, and other posts will focus on technical topics within ethnography for anthropologists and other ethnographers. At the top of each post, I will provide the following disclaimers:

Data Science Technical Level:	None, Moderate, or Advanced
Ethnography Technical Level:	None, Moderate, or Advanced

Integrating Ethnography and Data Science

As a data scientist and ethnographer, I have worked on many types of research projects. In professional and business settings, I am excited by the enormous growth in both data science and ethnography but have been frustrated by how, despite recent developments that make them more similar, their respective teams seem to be growing apart and competing against each other.

Within academia, quantitative and qualitative research methods have developed historically as distinct and competing approaches, as if one has to choose which direction to take when doing research: departments or individual researchers specialize in one or the other and fight over scarce research funding. One major justification for this division has been the perception that quantitative approaches tend to be prescriptive and top-down compared with qualitative approaches, which tend to be descriptive and bottom-up. That many professional research contexts have inherited this division is unfortunate.

Recent developments in data science draw parallels with qualitative research and, if anything, could be a starting point for collaborative intermingling. What has developed as “traditional” statistics taught in introductory statistics courses is generally top-down, assuming that data follows a prescribed, ideal model and asking regimented questions based on that ideal model. Within the development of machine learning, there has been a shift towards models uniquely tailored to the data and context in question, developed and refined iteratively.[i] These trends may show signs of breaking down the top-down nature of traditional statistics work.

If there was ever a time to integrate quantitative data science and qualitative ethnographic research, it is now. In the increasingly important “data economy,” understanding users/consumers is vital to developing strategic business practices. In the business world, both socially-oriented data scientists and ethnographers are experts in understanding users/consumers, but separating them into competing groups only prevents true synthesis of their insights. Integrating the two should not just include combining the respective research teams and their projects but also encouraging researchers to develop expertise in both instead of simply specializing in one or the other. New creative energy could burst forth when we no longer treat these as distinct methodologies or specialties.

[i] Nafus, D., & Knox, H. (2018). Ethnography for a Data-Saturated World. Manchester: Manchester University Press, 11-12.

Photo credit #1: Frank V at https://unsplash.com/photos/IFLgWYlT2fI

Photo credit #2: Arif Wahid at https://unsplash.com/photos/y3FkHW1cyBE

Why Business Anthropologists Should Reconsider Machine Learning

Photo by Alex Knight on Pexels.com

This article is a follow-up to my previous article – “Integrating Ethnography and Data Science” – written specifically for anthropologists and other ethnographers.

As an anthropologist and data scientist, I often feel caught in the middle of two distinct warring factions. Anthropologists and data scientists inherited a historic debate between quantitative and qualitative methodologies in social research within modern Western societies. At its core, this debate has centered on the difference between objective, prescriptive, top-down techniques and subjective, situational, flexible, descriptive bottom-up approaches.[i] In this ensuing conflict, quantitative research has been demarcated into the top-down faction and qualitative research into the bottom-up faction, to the detriment of understanding both properly.

In my experience on both “sides,” I have seen a tendency among anthropologists to lump together all quantitative social research as prescriptive and top-down and thus miss the important subtleties within data science and other quantitative techniques. Machine learning techniques within the field are a partial shift towards bottom-up, situational, and iterative quantitative analysis, and business anthropologists should explore what data scientists do as a chance to redevelop their relationship with quantitative analysis.

Shifts in Machine Learning

Shifts within machine learning algorithm development give impetus for incorporating quantitative techniques that are local and interpretive. The debate between top-down and bottom-up knowledge production does not need – or at least may no longer need – to divide quantitative and qualitative techniques. Machine learning algorithms “leave open the possibility of situated knowledge production, entangled with narrative,” a clear parallel to qualitative ethnographic techniques.[ii]

At the same time, this shift towards iterative and flexible machine learning techniques is not total within data science: aspects of top-down frameworks remain, in terms of personnel, objectives, habits, strategies, and evaluation criteria. But, seeds of bottom-up thinking definitely exist prominently within data science, with the potential to significantly reshape data science and possibly quantitative analysis in general.

As a discipline, data science is in a uniquely formative and adolescent period, developing into its “standard” practices. This leads to significant fluctuations as the data science community defines its methodology. The set of standard practices that we now typically call “traditional” or “standard” statistics, generally taught in introductory statistics courses, developed over a period of several decades in the late nineteenth and early twentieth centuries, especially in Britain.[iii] Connected with recent computer technology, data science is in a similarly formative period right now – developing its standard techniques and ways of thinking. This formative period is a strategic time for anthropologists to encourage bottom-up quantitative techniques.

Conclusion

Business anthropologists could and should be instrumental in helping to develop and innovatively utilize these situational and iterative machine learning techniques. This is a strategic time for business anthropologists to do the following:

Immerse themselves in data science and encourage and cultivate bottom-up quantitative machine learning techniques within data science
Cultivate and incorporate (when applicable) situational and iterative machine learning approaches in their ethnographies

For both, anthropologists should use the strengths of ethnographic and anthropological thinking to help develop bottom-up machine learning that is grounded in and flexible to specific local contexts. Each requires business anthropologists to reexplore their relationship with data science and machine learning instead of treating it as part of an opposing “methodological clan.” [iv]

[i] Nafus, D., & Knox, H. (2018). Ethnography for a Data-Saturated World. Manchester: Manchester University Press, 11-12

[ii] Ibid, 15-17.

[iii] Mackenzie, D. (1981). Statistics in Britain 1865–1930: The Social Construction of Scientific Knowledge. Edinburgh: Edinburgh University Press.

[iv] Seaver, N. (2015). Bastard Algebra. In T. Boellstorff, & B. Maurer, Data, Now Bigger and Better (pp. 27-46). Chicago: Prickly Paradigm Press, 39.

The Promises and Failures of Current Artificial Intelligence Technology: An Interview with Gemma Clavell at Eticas (Part 1 of 3)

I spoke with Gemma Galdon-Clavell, founder of Eticas Foundation and Eticas Consulting about the social implications of artificial intelligence technologies. In this first part, we discussed the policy strategies for ensuring that our data and artificial intelligence systems built on our data are good quality, safe, and accountable.

Here are Part 2 and Part 3 of the interview.

Dr. Gemma Galdon-Clavell is a leading voice on technology ethics and algorithmic accountability. She is the founder and CEO of Eticas, where her multidisciplinary background in the social, ethical, and legal impact of data-intensive technology allows her and her team to design and implement practical solutions to data protection, ethics, explainability, and bias challenges in AI. She has conceived and architected the Algorithmic Audit Framework which now serves as the foundation for Eticas’s flagship product, the Algorithmic Audit.

To learn more about Gemma’s and Eticas’s work:

For more context on my interview series in general, click here.

The Promises and Failures of Current Artificial Intelligence Technology: An Interview with Gemma Clavell at Eticas (Part 2 of 3)

Here is the second part of three in my conversation with Gemma Clavell. We compared various corporate models – good and bad – for artificial intelligence and how to foster responsible corporate practices in this field.

Here are Part 1 and Part 3 of our interview.

To learn more about Gemma’s and Eticas’s work:

For more context on my interview series in general, click here.

The Promises and Failures of Current Artificial Intelligence Technology: An Interview with Gemma Clavell at Eticas (Part 3 of 3)

This is the third and final part of my conversation with Gemma Clavell. In Part 3, she described the skills and types of people necessary to build and assess artificial intelligence teams.

Here are Part 1 and Part 2 of our interview.

To learn more about Gemma’s and Eticas’s work:

For more context on my interview series in general, click here.

EPIC Data Scientists + Ethnographers Group

I recently organized a professional group called EPIC Data Scientists + Ethnographers along with a few others who are both data scientists and ethnographers. Our goal is to form a virtual community to discuss ways to incorporate ethnography and data science, just like I strive to do on this website.

If you are interested in working with others on this or simply interested in learning more, feel free to join. Whether you are both a data scientist and ethnographer, only one of them, or neither, we would love to hear your perspective.

Thank you, EPIC, for helping to develop this and giving us a platform.

Photo credit: deepak pal at https://www.flickr.com/photos/158301585@N08/46085930481/

Resources on Integrating Data Science and Ethnography

Here is a list of resources about integrating data science and ethnography. Even though it is an up-and-coming field without a consistent list of publications, several fascinating and insightful resources do exist.

If there are any resources about integrating data science and ethnography that you have found useful, feel free to share them as well.

General Overviews:

Curran, John. “Big Data or ‘Big Ethnographic Data’? Positioning Big Data within the Ethnographic Space.” EPIC (2013). (Found here: https://www.epicpeople.org/big-data-or-big-ethnographic-data-positioning-big-data-within-the-ethnographic-space/)
Patel, Neal. “For a Ruthless Criticism of Everything Existing: Rebellion Against the Quantitative-Qualitative Divide.” EPIC (2013): 43-60.
Nick Seaver. “Bastard Algebra.” Boellstorff, Tom and Bill Maurer. Data, Now Bigger and Better. Chicago: Prickly Paradigm Press, 2015. 27-46.
Slobin, Adrian and Todd Cherkasky. “Ethnography in the Age of Analytics.” EPIC (2010).
Nafus, Dawn and Tye Rattenbury. Data Science and Ethnography: What’s Our Common Ground, and Why Does It Matter? 7 3 2018. <https://www.epicpeople.org/data-science-and-ethnography/>.
Nick Seaver. “The nice thing about context is that everyone has it.” Media, Culture & Society (2015).

Books:

Nafus, Dawn and Hannah Knox. Ethnography for a Data-Saturated World. Manchester: Manchester University Press, 2018.
Boellstorff, Tom and Bill Maurer. Data, Now Bigger and Better! Chicago: Prickly Paradigm Press, 2015.
Mackenzie, Adrian. Machine Learners: Archaeology of a Data Practice. Cambridge: The MIT Press, 2017.

Examples and Case Studies:

“Autonomous Drive: Teaching Cars Human Behaviour” by Melissa Cefkin on the Youtube Channel DrivingTheNation: https://www.youtube.com/watch?v=6koKuDegHAM
Eslami, Motahhare, et al. “First I “like” it, then I hide it: Folk Theories of Social Feeds.” Curation and Algorithms (2016).
Giaccardi, Elisa, Chris Speed and Neil Rubens. “Things Making Things: An Ethnography of the Impossible.” (2014).

Elish, M. “The Stakes of Uncertainty: Developing and Integrating Machine Learning in Clinical Care.” EPIC (2018).
Madsen, Mette My, Anders Blok and Morten Axel Pedersen. “Transversal collaboration: an ethnography in/of computational social science.” Nafus, Dawn and Hannah Knox. Ethnography for a Data-Saturated World. Manchester: Manchester University Press, 2018.
Thomas, Suzanne, Dawn Nafus and Jamie Sherman. “Algorithms as fetish: Faith and possibility in algorithmic work.” Big Data & Society (2018): 1-11.

Articles and Blog Posts:

“An Engineering Anthropologist: Why tech companies need to hire software developers with ethnographic skills” by Astrid Countee: http://ethnographymatters.net/blog/2016/06/22/an-engineering-anthropologist-why-tech-companies-need-to-hire-software-developers-with-ethnographic-skills/
“Cross-disciplinary Insights Teams: Integrating Data Scientists and User Researchers at Spotify” by Sara Belt and Peter Gilks: https://www.epicpeople.org/cross-disciplinary-insights-teams-integrating-data-scientists-and-user-researchers-at-spotify/
“Data is a stakeholder” by Schaun Wheeler: https://towardsdatascience.com/data-is-a-stakeholder-31bfdb650af0
“Why Big Data Needs Thick Data” by Tricia Wang: https://medium.com/ethnography-matters/why-big-data-needs-thick-data-b4b3e75e3d7

My Own Articles on This Website:

Podcasts and Lectures:

“Computational Anthropology: Quali-quantitative Analyses of Attention Economies during the Covid-19 Lockdown” by Morten Axel Pedersen: https://www.material.city/recordings/mortenaxelpedersen
“Human-Driven Machine Learning with Saleema Amershi”: https://datastori.es/115-human-driven-machine-learning-with-saleema-amershi/#t=29:00.204
“Welcome to Dataworld, by Alexander Taylor”: https://player.fm/series/camthropod/episode-13-welcome-to-dataworld-by-alex-taylor
“Machine Learning for Artists with Gene Kogan”: https://datastori.es/114-machine-learning-for-artists-with-gene-kogan/#t=34:28.738

Ethical Considerations:

“Caroline Sinders on Ethical Product Design for Machine Learning”: https://design.blog/2017/03/23/caroline-sinders-on-ethical-product-design-for-machine-learning/
“The Trouble with Bias” by Kate Crawford: https://www.youtube.com/watch?v=fMym_BKWQzk
“Justice for ‘Data Janitors’” by Lilly Irani: http://www.publicbooks.org/justice-for-data-janitors/
Elish, Madeleine. “Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction.” Engaging Science, Technology, and Society (2019).
boyd, danah and Kate Crawford. “Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon.” Information, Communication, & Society (2012): 662-679.

Four Innovative Projects that Integrated Data Science and Ethnography

In a previous article, I have discussed the value of integrating data science and ethnography. On LinkedIn, people commented that they were interested and wanted to hear more detail on potential ways to do this. I replied, “I have found explaining how to conduct studies that integrate the two practically is easier to demonstrate through example than abstractly since the details of how to do it vary based on the specific needs of each project.”

In this article, I intend to do exactly that: analyze four innovative projects that in some way integrated data science and ethnography. I hope these will spur your creative juices to help think through how to creatively combine them for whatever project you are working on.

Synopsis:

Project:	How It Integrated Data Science and Ethnography:	Link to Learn More:
No Show Model	Used ethnography to design machine learning software	https://ethno-data.com/show-rate-predictor/
Cybersensitivity Study	Used machine learning to scale up the scope of an ethnographic inquiry to a larger population	https://ethno-data.com/masters-practicum-summary/
Facebook Newsfeed Folk Theories	Used ethnography to understand how users make sense of and behave towards a machine learning system they encounter and how this, in turn, shapes the development of the machine learning algorithm(s)	https://dl.acm.org/doi/10.1145/2858036.2858494
Thing Ethnography	Used machine learning to incorporate objects’ interactions into ethnographic research	https://dl.acm.org/doi/10.1145/2901790.2901905 and https://www.semanticscholar.org/paper/Things-Making-Things%3A-An-Ethnography-of-the-Giaccardi-Speed/2db5feac9cc743767fd23aeded3aa555ec8683a4?p2df

Project 1: No Show Model

A medical clinic at a hospital system in New York City asked me to use machine learning to build a show rate predictor in order to inform and improve its scheduling practices. During the initial construction phase, I used ethnography both to understand in more depth the scheduling problem the clinic faced and to determine an appropriate interface design.

Through an ethnographic inquiry, I discovered the most important question(s) schedulers ask when scheduling their appointments. This was, “Of the people scheduled for a given doctor on a particular day, how many of them are likely to actually show up?” I then built a machine learning model to answer this exact question. My ethnographic inquiry provided me the design requirements for the data science project.
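To make the schedulers' question concrete, here is a minimal sketch rather than the clinic's actual model. It assumes some upstream classifier has already produced a show probability for each appointment; the function name `expected_shows`, the doctor labels, the dates, and the probabilities are all hypothetical. Under that assumption, the expected number of patients who show up for a given doctor on a given day is simply the sum of the individual probabilities.

```python
from collections import defaultdict


def expected_shows(appointments):
    """Sum predicted show probabilities per (doctor, day).

    `appointments` is an iterable of (doctor, day, show_probability) tuples.
    The probabilities are assumed to come from some upstream model.
    """
    totals = defaultdict(float)
    for doctor, day, prob in appointments:
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range: {prob}")
        totals[(doctor, day)] += prob
    return dict(totals)


# Hypothetical appointments with made-up probabilities.
appointments = [
    ("Dr. A", "2020-03-02", 0.9),
    ("Dr. A", "2020-03-02", 0.6),
    ("Dr. A", "2020-03-02", 0.5),
    ("Dr. B", "2020-03-02", 0.8),
]

result = expected_shows(appointments)
for (doctor, day), total in sorted(result.items()):
    print(f"{doctor} on {day}: about {total:.1f} scheduled patients expected to show")
```

Framing the output as a count per doctor per day, rather than a yes/no prediction per patient, mirrors the way the schedulers phrased their question.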

In addition, I used my ethnographic inquiries to design the interface. I observed how schedulers interacted with their current scheduling software, which gave me a sense for what kind of visualizations would work or not work for my app.

This project exemplifies how ethnography can be helpful both in the development stage of a machine learning project, to determine what the machine learning algorithm(s) need to do, and on the frontend, when communicating the algorithm(s) to their users and assessing their success with those users.

As both an ethnographer and a data scientist, I was able to translate my ethnographic insights seamlessly into machine learning modeling and API specifications and also conducted follow-up ethnographic inquiries to ensure that what I was building would meet their needs.

Project 2: Cybersensitivity Study

I conducted this project with Indicia Consulting. Its goal was to explore potential connections between individuals’ energy consumption and their relationship with new technology. This is an example of using ethnography to explore and determine potential social and cultural patterns in-depth with a few people and then using data science to analyze those patterns across a large population.

We started the project by observing and interviewing about thirty participants, but as the study progressed, we needed to develop a scalable method to analyze the patterns across whole communities, counties, and even states.

Ethnography is a great tool for exploring a phenomenon in-depth and for developing initial patterns, but it is resource-intensive and thus difficult to conduct on a large group of people. It is not practical for, say, analyzing thousands of people. Data science, on the other hand, can easily test across an entire population the validity of patterns noticed in smaller ethnographic studies, yet because it often lacks the granularity of ethnography, it would often miss intricate patterns.

Ethnography is also great on the back end for determining whether the implemented machine learning models and their resulting insights make sense on the ground. This forms a type of iterative feedback loop, where data science scales up ethnographic insights and ethnography contextualizes data science models.

Thus, ethnography and data science cover each other’s weaknesses well, forming a great methodological duo for projects centered around trying to understand customers, users, colleagues, or other people in-depth.

Project 3: Facebook Newsfeed Folk Theories

In their study, Motahhare Eslami and her team of researchers conducted an ethnographic inquiry into how various Facebook users conceived of how the Facebook Newsfeed selects which posts/stories rise to the top of their feeds. They analyze several different “folk theories” or working theories by everyday people for the criteria this machine learning system uses to select top stories.

How users think the overall system works influences how they respond to the newsfeed. Users who believe, for example, that the algorithm will prioritize the posts of friends whose posts they have liked in the past will often intentionally like the posts of their closest friends and family so that they can see more of their posts.

Users’ perspectives on how the Newsfeed algorithm works influence how they respond to it, which, in turn, affects the very data the algorithm learns from and thus how the algorithm develops. This creates a cyclic feedback loop that influences the development of the machine learning algorithmic systems over time.

Their research exemplifies the importance of understanding how people think about, respond to, and more broadly relate to machine learning-based software systems. Ethnographies of people’s interactions with such systems are a crucial way to develop this understanding.

In a way, many machine learning algorithms are very social in nature: they – or at least the overall software system in which they exist – often succeed or fail based on how humans interact with them. In such cases, no matter how technically robust a machine learning algorithm is, if potential users cannot positively and productively relate to it, then it will fail.

Ethnographies into the “social life” of machine learning software systems (by which I mean how they become a part of – or in some cases fail to become a part of – individuals’ lives) help us understand how the algorithms are developing or learning and determine whether they are successful in what we intended them to do. Such ethnographies require not only in-depth expertise in ethnographic methodology but also an in-depth understanding of how machine learning algorithms work, in order to understand how social behavior might be influencing their internal development.

Project 4: Thing Ethnography

Elisa Giaccardi and her research team have been pioneering the utilization of data science and machine learning to understand and incorporate the perspective of things into ethnographies. With the development of the internet of things (IoT), she suggests that the data from object sensors could provide fresh insights in ethnographies of how humans relate to their environment by helping to describe how these objects relate to each other. She calls this thing ethnography.

This experimental approach exemplifies one way to use machine learning algorithms within ethnographies as social processes/interactions in and of themselves. This could be an innovative way to analyze the social role of these IoT objects in daily life within ethnographic studies. If Eslami’s work exemplifies a way to graft ethnographic analysis into the design cycle of machine learning algorithms, Giaccardi’s research illustrates one way to incorporate data science and machine learning analysis into ethnographies.

Conclusion

Here are four examples of innovative projects that involve integrating data science and ethnography to meet their respective goals. I intend these not as a complete or exhaustive account of how to integrate these methodologies but as food for thought to spur further creative thinking into how to connect them.

For those who, when they hear the idea of integrating data science and ethnography, ask the reasonable question, “Interesting but what would that look like practically?”, here are four examples of how it could look. Hopefully, they are helpful in developing your own ideas for how to combine them in whatever project you are working on, even if its details are completely different.

Photo credit #1: StartupStockPhotos at https://pixabay.com/photos/startup-meeting-brainstorming-594090/

Photo credit #2: DarkoStojanovic at https://pixabay.com/photos/medical-appointment-doctor-563427/

Photo credit #3: NASA at https://unsplash.com/photos/Q1p7bh3SHj8

Photo credit #4: Kon Karampelas at https://unsplash.com/photos/HUBofEFQ6CA

Photo credit #5: Pixabay at https://www.pexels.com/photo/app-business-connection-device-221185/

Three Key Differences between Data Science and Statistics

Data science’s popularity has grown in the last few years, and many have confused it with its older, more familiar relative: statistics. As someone who has worked both as a data scientist and as a statistician, I frequently encounter such confusion. This post seeks to clarify some of the key differences between them.

Before I get into their differences, though, let’s define them. Statistics as a discipline refers to the mathematical processes of collecting, organizing, analyzing, and communicating data. Within statistics, I generally define “traditional” statistics as the statistical processes taught in introductory statistics courses, like basic descriptive statistics, hypothesis testing, confidence intervals, and so on: generally what people outside of statistics, especially in the business world, think of when they hear the word “statistics.”

Data science in its most broad sense is the multi-disciplinary science of organizing, processing, and analyzing computational data to solve problems. Although they are similar, data science differs from both statistics and “traditional” statistics:

Difference	Statistics	Data Science
#1	Field of Mathematics	Interdisciplinary
#2	Sampled Data	Comprehensive Data
#3	Confirming Hypothesis	Exploratory Hypotheses

Difference #1: Data Science Is More than a Field of Mathematics

Statistics is a field of mathematics, whereas data science refers to more than just math. At its simplest, data science centers around the use of computational data to solve problems,[i] which means it includes the mathematics/statistics needed to break down the computational data but also the computer science and engineering thinking necessary to code those algorithms efficiently and effectively, and the business, policy, or other subject-specific “smarts” to develop strategic decision-making based on that analysis.

Thus, statistics forms a crucial component of data science, but data science includes more than just statistics. Statistics, as a field of mathematics, just includes the mathematical processes of analyzing and interpreting data, whereas data science also includes the algorithmic problem-solving to do the analysis computationally and the art of utilizing that analysis to make decisions that meet the practical needs of the context. In other words, data science generally refers to the entire process of analyzing computational data. On a practical level, many data scientists do not come from a pure statistics background but from a computer science or engineering background, leveraging their coding expertise to develop efficient algorithmic systems.

Difference #2: Comprehensive vs Sample Data

In statistical studies, researchers are often unable to analyze the entire population, that is, the whole group they are studying, so instead they create a smaller, more manageable sample of individuals that they hope represents the population as a whole. Data science projects, however, often involve analyzing big, summative data encapsulating the entire population.

The tools of traditional statistics work well for scientific studies, where one must go out and collect data on the topic in question. Because this is generally very expensive and time-consuming, researchers can only collect data on a subset of the wider population most of the time.

Recent developments in computation, including the ability to gather, store, transfer, and process greater computational data, have expanded the type of quantitative research now possible, and data science has developed to address these new types of research. Instead of gathering a carefully chosen sample of the population based on a heavily scrutinized set of variables, many data science projects require finding meaningful insights from the myriads of data already collected about the entire population.
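The contrast can be illustrated with a small sketch using only Python's standard library. Everything here is an assumption for illustration: the "population" is 10,000 synthetic values, the sample size is 100, and the 95% interval uses the common normal approximation (critical value 1.96). With data on everyone, the mean can be computed directly; with only a sample, one estimates it and reports the uncertainty.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical "entire population": 10,000 synthetic values.
population = [random.gauss(50, 10) for _ in range(10_000)]

# Comprehensive data: compute the quantity of interest directly.
population_mean = statistics.fmean(population)

# Sampled data: estimate the same quantity from 100 individuals
# and quantify the uncertainty of that estimate.
sample = random.sample(population, 100)
sample_mean = statistics.fmean(sample)
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval (normal approximation).
ci = (sample_mean - 1.96 * standard_error,
      sample_mean + 1.96 * standard_error)

print(f"Population mean (all data): {population_mean:.2f}")
print(f"Sample mean (n=100):        {sample_mean:.2f}")
print(f"Approximate 95% interval:   ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval exists only because the sample stands in for a population that was not fully observed; when the whole population is in hand, the question shifts from estimation to finding what is meaningful in the data.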

Difference #3: Exploratory vs Confirming

Data scientists often seek to build models that do something with the data, whereas statisticians seek through their analysis to learn something from the data. Data scientists thus often assess their machine learning models based on how effectively they perform a given task, like how well they optimize a variable, determine the best course of action, correctly identify features of an image, provide a good recommendation for the user, and so on. To do this, data scientists often compare the effectiveness or accuracy of many models based on one or more chosen performance metrics.

In traditional statistics, the questions often center around using data to understand the research topic based on the findings from a sample. Questions then center around what the sample can say about the wider population and how likely its results would represent or apply to that wider population.

In contrast, machine learning models generally do not seek to explain the research topic but to do something, which can lead to a very different research strategy. Data scientists generally try to determine/produce the algorithm with the best performance (given whatever criteria they use to assess how a performance is “better”), testing many models in the process. Statisticians often employ a single model they think represents the context accurately and then draw conclusions based on it.
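As a rough illustration of that model-comparison workflow, the sketch below fits two deliberately simple candidate models and keeps whichever scores better on held-out data. The data points, the two candidates (a constant predictor and a least-squares line), and the choice of mean absolute error as the metric are all assumptions made for the example; real projects typically compare far richer models.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fit_constant(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares line through the training data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Hypothetical data, split into training and held-out test sets.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]
test_x = [7, 8]
test_y = [15.0, 17.1]

candidates = {"constant": fit_constant, "line": fit_line}

# Fit every candidate, then score each on the held-out data.
scores = {}
for name, fit in candidates.items():
    model = fit(train_x, train_y)
    scores[name] = mean_absolute_error(test_y, [model(x) for x in test_x])

best = min(scores, key=scores.get)
print(scores)
print("Best model by held-out error:", best)
```

Nothing in the loop asks why one model wins; the selection rests entirely on the chosen performance metric, which is the point of the contrast with the single-model approach.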

Thus, data science is often a form of exploratory analysis, experimenting with several models to determine the best one for a task, and statistics is often a form of confirmatory analysis, seeking to confirm how reasonable it is to conclude that a given hypothesis or set of hypotheses is true for the wider population.

A lot of scientific research has been theory confirming: a scientist has a model or theory of the world; they design and conduct an experiment to assess this model; then use hypothesis testing to confirm or negate that model based on the results of the experiment. With changes in data availability and computing, the value of exploratory analysis, data mining, and using data to generate hypotheses has increased dramatically (Carmichael 126).

Data science as a discipline has been at the forefront of utilizing increased computing abilities to conduct exploratory work.

Conclusion

A data scientist friend of mine once quipped to me that data science simply is applied computational statistics (cf. this). There is some truth in this: the mathematics of data science work falls within statistics, since it involves collecting, analyzing, and communicating data, and, with its emphasis on and utilization of computational data, would definitely be a part of computational statistics. The mathematics of data science is also very clearly applied: geared towards solving practical problems/needs. Hence, data science and statistics interrelate.

They differ, however, both in their formal definitions and practical understandings. Modern computation and big data technologies have had a major influence on data science. Within statistics, computational statistics also seeks to leverage these resources, but what has become “traditional” statistics does not (yet) incorporate these. I suspect in the next few years or decades, developments in modern computing, data science, and computational statistics will reshape what people consider “traditional” or “standard” statistics to be a bit closer to the data science of today.

For more details, see the following useful resources:

Ian Carmichael’s and J.S. Marron’s “Data science vs. statistics: two cultures?” in the Japanese Journal of Statistics and Data Science: https://link.springer.com/article/10.1007/s42081-018-0009-3

“Data Scientists Versus Statisticians” at https://opendatascience.com/data-scientists-versus-statisticians/ and https://medium.com/odscjournal/data-scientists-versus-statisticians-8ea146b7a47f

“Differences between Data Science and Statistics” at https://www.educba.com/data-science-vs-statistics/

Photo credit #1: Andrea Piacquadio at https://www.pexels.com/photo/woman-draw-a-light-bulb-in-white-board-3758105/

Photo credit #2: Carlos Muza at https://unsplash.com/photos/hpjSkU2UYSU

Photo credit #3: Hans-Peter Gauster at https://unsplash.com/photos/3y1zF4hIPCg

Photo credit #4: Kendall Lane at https://unsplash.com/photos/yEDhhN5zP4o

[i] Carmichael 118.
