Ethno-Data: Introduction to My Blog

Hello, my name is Stephen Paff. I am a data scientist and an ethnographer. The goal of this blog is to explore the integration of data science and ethnography as an exciting and innovative way to understand people, whether consumers, users, fellow employees, or anyone else.

I want to think publicly. Ideas worth having develop in conversation, and through this blog, I hope to present my integrative vision so that others can potentially use it to develop their own visions and in turn help shape mine.

Please Note: Because my blog straddles two technical areas, I will categorize my posts by how deeply they go into each field’s technical expertise. I will write many posts for a general audience. Some posts, though, will discuss technical matters within data science for data scientists, and others will focus on technical topics within ethnography for anthropologists and other ethnographers. At the top of each post, I will provide the following disclaimers:

Data Science Technical Level: None, Moderate, or Advanced
Ethnography Technical Level: None, Moderate, or Advanced

Integrating Ethnography and Data Science

As a data scientist and ethnographer, I have worked on many types of research projects. In professional and business settings, I am excited by the enormous growth in both data science and ethnography but frustrated that, despite recent developments that make the two more similar, their respective teams seem to be growing apart and competing against each other.

Within academia, quantitative and qualitative research methods have developed historically as distinct and competing approaches, as if one has to choose a direction when doing research: departments or individual researchers specialize in one or the other and fight over scarce research funding. One major justification for this division has been the perception that quantitative approaches tend to be prescriptive and top-down, whereas qualitative approaches tend to be descriptive and bottom-up. It is unfortunate that many professional research contexts have inherited this division.

Recent developments in data science parallel qualitative research and, if anything, could be a starting point for collaborative intermingling. What has become “traditional” statistics, as taught in introductory statistics courses, is generally top-down: it assumes that data follow a prescribed, ideal model and asks regimented questions based on that model. Within machine learning, by contrast, there has been a shift toward models uniquely tailored to the data and context in question, developed and refined iteratively.[i] These trends may be a sign that the top-down nature of traditional statistical work is breaking down.

If there was ever a time to integrate quantitative data science and qualitative ethnographic research, it is now. In the increasingly important “data economy,” understanding users/consumers is vital to developing strategic business practices. In the business world, both socially-oriented data scientists and ethnographers are experts in understanding users/consumers, but separating them into competing groups only prevents true synthesis of their insights. Integrating the two should not just include combining the respective research teams and their projects but also encouraging researchers to develop expertise in both instead of simply specializing in one or the other. New creative energy could burst forth when we no longer treat these as distinct methodologies or specialties.


[i] Nafus, D., & Knox, H. (2018). Ethnography for a Data-Saturated World. Manchester: Manchester University Press, 11-12.

Photo credit #1: Frank V at https://unsplash.com/photos/IFLgWYlT2fI

Photo credit #2: Arif Wahid at https://unsplash.com/photos/y3FkHW1cyBE

Why Business Anthropologists Should Reconsider Machine Learning

Photo by Alex Knight on Pexels.com

This article is a follow-up to my previous article – “Integrating Ethnography and Data Science” – written specifically for anthropologists and other ethnographers.

As an anthropologist and data scientist, I often feel caught in the middle of two warring factions. Anthropologists and data scientists have inherited a historic debate between quantitative and qualitative methodologies in social research within modern Western societies. At its core, this debate has centered on the difference between objective, prescriptive, top-down techniques and subjective, situational, flexible, descriptive, bottom-up approaches.[i] In the ensuing conflict, quantitative research has been assigned to the top-down faction and qualitative research to the bottom-up faction, to the detriment of understanding either properly.

In my experience on both “sides,” I have seen a tendency among anthropologists to lump all quantitative social research together as prescriptive and top-down, and thus to miss important subtleties within data science and other quantitative techniques. Machine learning techniques represent a partial shift within data science toward bottom-up, situational, and iterative quantitative analysis, and business anthropologists should explore what data scientists do as a chance to redevelop their relationship with quantitative analysis.

Shifts in Machine Learning


Shifts within machine learning algorithm development give impetus for incorporating quantitative techniques that are local and interpretive. The debate between top-down and bottom-up knowledge production does not need (or at least may no longer need) to divide quantitative and qualitative techniques. Machine learning algorithms “leave open the possibility of situated knowledge production, entangled with narrative,” a clear parallel to qualitative ethnographic techniques.[ii]

At the same time, this shift toward iterative and flexible machine learning techniques is not total within data science: aspects of top-down frameworks remain in its personnel, objectives, habits, strategies, and evaluation criteria. But seeds of bottom-up thinking clearly and prominently exist within data science, with the potential to significantly reshape data science and possibly quantitative analysis in general.

As a discipline, data science is in a uniquely formative and adolescent period, still settling on its “standard” practices. This leads to significant fluctuations as the data science community defines its methodology. The set of practices we now typically call “traditional” or “standard” statistics, generally taught in introductory statistics courses, developed over several decades in the late nineteenth and early twentieth centuries, especially in Britain.[iii] Tied to recent computer technology, data science is in a similarly formative period right now, developing its standard techniques and ways of thinking. This makes it a strategic time for anthropologists to encourage bottom-up quantitative techniques.

Conclusion

Business anthropologists could and should be instrumental in helping to develop and innovatively utilize these situational and iterative machine learning techniques. This is a strategic time for business anthropologists to do the following:

  1. Immerse themselves in data science and encourage and cultivate bottom-up quantitative machine learning techniques within it
  2. Cultivate and incorporate (when applicable) situational and iterative machine learning approaches in their ethnographies

For both, anthropologists should use the strengths of ethnographic and anthropological thinking to help develop bottom-up machine learning that is grounded in, and flexible to, specific local contexts. Each requires business anthropologists to re-examine their relationship with data science and machine learning instead of treating them as part of an opposing “methodological clan.”[iv]


[i] Nafus, D., & Knox, H. (2018). Ethnography for a Data-Saturated World. Manchester: Manchester University Press, 11-12.

[ii] Ibid., 15-17.

[iii] Mackenzie, D. (1981). Statistics in Britain 1865–1930: The Social Construction of Scientific Knowledge. Edinburgh: Edinburgh University Press.

[iv] Seaver, N. (2015). Bastard Algebra. In T. Boellstorff, & B. Maurer, Data, Now Bigger and Better (pp. 27-46). Chicago: Prickly Paradigm Press, 39.

Three Key Differences between Data Science and Statistics


Data science’s popularity has grown in the last few years, and many have confused it with its older, more familiar relative: statistics. As someone who has worked both as a data scientist and as a statistician, I frequently encounter such confusion. This post seeks to clarify some of the key differences between them.

Before I get into their differences, though, let’s define them. Statistics as a discipline refers to the mathematical processes of collecting, organizing, analyzing, and communicating data. Within statistics, I generally define “traditional” statistics as the statistical processes taught in introductory statistics courses, like basic descriptive statistics, hypothesis testing, and confidence intervals: generally what people outside of statistics, especially in the business world, think of when they hear the word “statistics.”

Data science in its broadest sense is the multidisciplinary science of organizing, processing, and analyzing computational data to solve problems. Although they are similar, data science differs from both statistics in general and “traditional” statistics:

Difference | Statistics | Data Science
#1 | Field of Mathematics | Interdisciplinary
#2 | Sampled Data | Comprehensive Data
#3 | Confirming Hypotheses | Exploring Hypotheses

Difference #1: Data Science Is More than a Field of Mathematics

Statistics is a field of mathematics, whereas data science refers to more than just math. At its simplest, data science centers on the use of computational data to solve problems,[i] which means it includes the mathematics/statistics needed to break down the computational data, the computer science and engineering thinking necessary to code those algorithms efficiently and effectively, and the business, policy, or other subject-specific “smarts” needed to make strategic decisions based on that analysis.

Thus, statistics forms a crucial component of data science, but data science includes more than just statistics. Statistics, as a field of mathematics, covers the mathematical processes of analyzing and interpreting data, whereas data science also includes the algorithmic problem-solving needed to do the analysis computationally and the art of using that analysis to make decisions that meet practical needs in context. In short, data science generally refers to the entire process of analyzing computational data, of which statistics is one part. On a practical level, many data scientists come not from a pure statistics background but from computer science or engineering, leveraging their coding expertise to develop efficient algorithmic systems.


Difference #2: Comprehensive vs Sample Data

In statistical studies, researchers are often unable to analyze the entire population (that is, the whole group they are studying), so instead they draw a smaller, more manageable sample of individuals that they hope represents the population as a whole. Data science projects, however, often involve analyzing big, comprehensive data that encapsulates the entire population.

The tools of traditional statistics work well for scientific studies, in which researchers must go out and collect data on the topic in question. Because this is generally expensive and time-consuming, researchers can usually collect data on only a subset of the wider population.

Recent developments in computation, including the ability to gather, store, transfer, and process far more data, have expanded the types of quantitative research now possible, and data science has developed to address these new types of research. Instead of gathering a carefully chosen sample of the population based on a heavily scrutinized set of variables, many data science projects require finding meaningful insights in the vast amounts of data already collected about the entire population.
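To make the contrast concrete, here is a minimal Python sketch on simulated data. The “population” of 100,000 customer spending values, the sample size of 400, and the normal-approximation 95% confidence interval are all illustrative assumptions, not drawn from any real study.

```python
import random
import statistics

# Hypothetical "population": daily spend (in dollars) for 100,000 customers.
# In a data science setting, a table like this might already exist in full.
random.seed(42)
population = [random.gauss(50, 15) for _ in range(100_000)]

# Comprehensive data: with the whole population in hand, the mean is simply computed.
population_mean = statistics.fmean(population)

# Sampled data: a traditional study might only be able to survey 400 customers.
sample = random.sample(population, 400)
sample_mean = statistics.fmean(sample)

# A 95% confidence interval (normal approximation) expresses how far the
# sample mean might plausibly be from the population mean it estimates.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}  (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

With the full population available, the mean is just computed; with only a sample, the work shifts to estimating it and quantifying the uncertainty, which is the kind of question traditional statistics was built to answer.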


Difference #3: Exploratory vs Confirmatory

Data scientists often seek to build models that do something with the data, whereas statisticians seek, through their analysis, to learn something from the data. Data scientists thus often assess their machine learning models by how effectively they perform a given task: how well they optimize a variable, determine the best course of action, correctly identify features of an image, provide good recommendations for the user, and so on. To do this, data scientists often compare the effectiveness or accuracy of many candidate models using one or more chosen performance metrics.

In traditional statistics, the questions often center around using data to understand the research topic based on the findings from a sample. Questions then center around what the sample can say about the wider population and how likely its results would represent or apply to that wider population.

In contrast, machine learning models generally seek not to explain the research topic but to do something, which can lead to a very different research strategy. Data scientists generally try to find or produce the algorithm with the best performance (given whatever criteria they use to judge that one performance is “better”), testing many models in the process. Statisticians often employ a single model they think represents the context accurately and then draw conclusions based on it.

Thus, data science is often a form of exploratory analysis, experimenting with several models to determine the best one for a task, whereas statistics is often a form of confirmatory analysis, assessing how reasonable it is to conclude that a given hypothesis or set of hypotheses holds for the wider population.

Much scientific research has been theory-confirming: a scientist has a model or theory of the world, designs and conducts an experiment to assess it, and then uses hypothesis testing to confirm or reject that model based on the experiment’s results. With changes in data availability and computing, the value of exploratory analysis, data mining, and using data to generate hypotheses has increased dramatically (Carmichael 126).
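The two strategies can be sketched side by side in Python on simulated data. Everything here is a hypothetical illustration: the data-generating line, the three candidate models, mean squared error as the performance metric, and the 150/50 train/test split are assumptions chosen for brevity. Real projects would typically rely on a library such as scikit-learn or statsmodels.

```python
import math
import random
import statistics

random.seed(0)
# Hypothetical data: advertising spend (x) and sales (y) with a linear
# relationship plus random noise.
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [3.0 + 2.0 * x + random.gauss(0, 1.5) for x in xs]
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def mse(predict, x, y):
    """Mean squared error of a prediction function on (x, y) pairs."""
    return statistics.fmean((predict(xi) - yi) ** 2 for xi, yi in zip(x, y))


# Exploratory (data-science style): try several candidate models and keep
# whichever performs best on held-out data according to the chosen metric.
mean_y = statistics.fmean(train_y)
a, b = fit_line(train_x, train_y)
a_log, b_log = fit_line([math.log1p(x) for x in train_x], train_y)
candidates = {
    "constant": lambda x: mean_y,
    "linear": lambda x: a + b * x,
    "log-linear": lambda x: a_log + b_log * math.log1p(x),
}
scores = {name: mse(model, test_x, test_y) for name, model in candidates.items()}
best = min(scores, key=scores.get)

# Confirmatory (traditional-statistics style): commit to one model in advance
# (the linear model) and test the hypothesis that its slope is zero.
a_all, b_all = fit_line(xs, ys)
residuals = [y - (a_all + b_all * x) for x, y in zip(xs, ys)]
residual_var = sum(r * r for r in residuals) / (len(xs) - 2)
x_bar = statistics.fmean(xs)
se_slope = math.sqrt(residual_var / sum((x - x_bar) ** 2 for x in xs))
t_stat = b_all / se_slope

print("held-out MSE by model:", {k: round(v, 2) for k, v in scores.items()})
print("exploratory pick:", best)
print(f"confirmatory test of slope = 0: slope={b_all:.2f}, t={t_stat:.1f}")
```

The exploratory half asks which model does the job best; the confirmatory half fixes one model up front and asks how confidently the data let us reject the hypothesis of no relationship (a t-statistic far from zero).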

Data science as a discipline has been at the forefront of utilizing increased computing abilities to conduct exploratory work.


Conclusion

A data scientist friend of mine once quipped to me that data science is simply applied computational statistics (cf. this). There is some truth in this: the mathematics of data science falls within statistics, since it involves collecting, analyzing, and communicating data, and, with its emphasis on and use of computational data, it would definitely be a part of computational statistics. The mathematics of data science is also very clearly applied: geared toward solving practical problems and needs. Hence, data science and statistics interrelate.

They differ, however, in both their formal definitions and their practical understandings. Modern computation and big data technologies have had a major influence on data science. Within statistics, computational statistics also seeks to leverage these resources, but what has become “traditional” statistics does not (yet) incorporate them. I suspect that in the next few years or decades, developments in modern computing, data science, and computational statistics will reshape what people consider “traditional” or “standard” statistics, bringing it a bit closer to the data science of today.

For more details, see the following useful resources:

Ian Carmichael’s and J.S. Marron’s “Data science vs. statistics: two cultures?” in the Japanese Journal of Statistics and Data Science: https://link.springer.com/article/10.1007/s42081-018-0009-3
“Data Scientists Versus Statisticians” at https://opendatascience.com/data-scientists-versus-statisticians/ and https://medium.com/odscjournal/data-scientists-versus-statisticians-8ea146b7a47f
“Differences between Data Science and Statistics” at https://www.educba.com/data-science-vs-statistics/

Photo credit #1: Andrea Piacquadio at https://www.pexels.com/photo/woman-draw-a-light-bulb-in-white-board-3758105/

Photo credit #2: Carlos Muza at https://unsplash.com/photos/hpjSkU2UYSU

Photo credit #3: Hans-Peter Gauster at https://unsplash.com/photos/3y1zF4hIPCg

Photo credit #4: Kendall Lane at https://unsplash.com/photos/yEDhhN5zP4o


[i] Carmichael 118.

Three Situations When Ethnography Is Useful in a Professional Setting

This is a follow-up to my previous article, “What Is Ethnography,” outlining ways ethnography is useful in professional settings.

To recap, I defined ethnography as a research approach that seeks “to understand the lived experiences of a particular culture, setting, group, or other context by some combination of being with those in that context (also called participant-observation), interviewing or talking with them, and analyzing what is produced in that context.”

Ethnography is a powerful tool, developed by anthropologists and other social scientists over the course of several decades. Here are three types of situations in professional settings in which I have found ethnography especially powerful:

1. To see the given product and/or people in action
2. When brainstorming about a design
3. To understand how people navigate complex, patchwork processes

Situation #1: To See the Given Product and/or People in Action

Ethnography allows you to witness people in action: using your product or service, engaging in the type of activity you are interested in, or in whatever other situation you want to study.

Many other social science research methods involve creating an artificial environment in which to observe how participants act or think. Focus groups, for example, involve assembling potential customers or users in a room, forming a synthetic space to discuss the product or service in question, and in many experimental settings, researchers create a simulated environment to control for and analyze the variables or factors they are interested in.

Ethnography, on the other hand, centers around observing and understanding how people navigate real-world settings. Through it, you can get a sense for how people conduct the activity for which you are designing a product or service and/or how people actually use your product or service.

For example, if you want to understand how people use GPS apps to get around, you can see how they use the app “in the wild”: when rushing through heavy traffic to get to a meeting or while lost in the middle of who knows where. Instead of hearing their processed thoughts in a focus group setting or trying to simulate the environment, you can witness the tumult yourself and develop a sense of how to build a product that helps people in those exact situations.

Situation #2: When Brainstorming about a New Product Design

Ethnography is especially useful during the early stages of designing a product or service, or during a major redesign. Ethnography helps you scope out the needs of your potential customers and how they approach meeting said needs. Thus, it helps you determine how to build a product or service that addresses those needs in a way that would make sense for your users.

During such initial stages of product design, ethnography helps determine the questions you should be asking. During these stages, many people tend to construct designs based on their own perception of people’s needs and desires and miss what customers or users in fact need and desire. Through ethnography, you ground your strategy in the customers’ own mindsets and experiences.

The brainstorming stages of product development also require a lot of flexibility and adaptability: as one determines what the product or service should become, one must be open to multiple potential avenues. Ethnography is a powerful tool for navigating such ambiguity. It centers you on the users, their experiences and mindsets, and the context in which they might use the product or service, providing tools to ask open-ended questions and to generate new and helpful ideas for what to build.

Situation #3: To Understand How People Navigate Complex, Patchwork Processes

At a past company, I analyzed how customer service representatives used the various software systems when talking with customers. Over the years, the company had designed and bought various software programs, each to perform a set of functions and each with unique abilities, limitations, and quirks. Over time, this created a complex web of interlocking apps, databases, and interfaces, which customer service representatives had to navigate when monitoring customers’ accounts. Other employees described the whole scene as the “Wild West”: each customer service representative had to create their own way to use these software systems while on the phone with a (in many cases disgruntled) customer.

Many companies end up building such patchwork systems, whether of software, of departments or teams, of physical infrastructure, or of something else entirely, by stacking several iterations of development over time until they become a hydra of complexity that employees must figure out how to navigate to get their work done.

Ethnography is a powerful tool for making sense of such processes. Instead of relying on official policies for how to conduct various actions and procedures, ethnography helps you understand the unofficial and informal strategies people use to do what they need. Through this, you can get a sense of how the patchwork system really works, which is necessary for developing ways to improve or build on such patchwork processes.

In the customer service research project, my task was to develop strategies to improve the technology customer service representatives used as they talked with customers. Seeing how representatives used the software through ethnographic research helped me understand and focus the analysis on their day-to-day needs and struggles.

Conclusion

Ethnography is a powerful tool, and the business world and other professional settings have increasingly been realizing this (cf. this and this). I have provided three circumstances in which I have personally found ethnography to be invaluable. Ethnography allows you to experience what is happening on the ground and, through that, to shape and inform the research questions you ask and the recommendations or products you build for people in those contexts.

Photo credit #1: DariusSankowski at https://pixabay.com/photos/navigation-car-drive-road-gps-1048294/

Photo credit #2: AbsolutVision at https://unsplash.com/photos/82TpEld0_e4

Photo credit #3: Tony Wan at https://unsplash.com/photos/NSXmh14ccRU

Anthropologist in I.T. (Comic, Funny)

Here’s a fun little comic about some of my experiences working as an anthropologist in I.T. It’s actually a blast.

I wrote this comic for the University of Memphis Anthropology Department, which featured it in its Fall 2018 newsletter.

Thank you, Rusty Haner, for illustrating the panels.

When Is Machine Learning Useful?

In a past blog post, I defined and described what machine learning is. I briefly highlighted four instances where machine learning algorithms are useful. This is what I wrote:

  1. Autonomy: To teach computers to do a task without the direct aid/intervention of humans (e.g. autonomous vehicles)
  2. Fluctuation: Help machines adjust when the requirements and data change over time
  3. Intuitive Processing: Conduct or assist in tasks humans do but are unable to explain how computationally/algorithmically (e.g. image recognition)
  4. Big Data: Breaking down data that is too large to handle otherwise

The goal of this blog post is to explain each in more detail.

Case #1: Autonomy


The first major use of machine learning centers around teaching computers to do a task or tasks without the direct aid or intervention of humans. Self-driving vehicles are a high-profile example of this: teaching a vehicle to drive (scanning the road and determining how to respond to what is around it) without the aid of or with minimal direct oversight from a human driver.

There are two basic types of tasks that machine learning systems might perform autonomously:

  1. Tasks humans frequently perform
  2. Tasks humans are unable to perform.

Self-driving cars exemplify the former: humans drive cars, but self-driving cars would perform all or part of the driving process. Other examples are chatbots and virtual assistants like Alexa, Cortana, and Ok Google, which seek to converse with users independently. Such systems might take over the human activity completely or only partially: for example, some customer service chatbots are designed to determine the customer’s issue but then to transfer the customer to a human when the issue reaches a certain complexity.

Humans have also sought to build autonomous machine learning algorithms to perform tasks that humans are unable to perform. Unlike self-driving cars, which conduct an activity many people do, people might also design a self-driving rover or submarine to drive and operate in a world that humans have so far been unable to inhabit, like other planets in our Solar System or the deep ocean. Search engines are another example: Google uses machine learning to help refine search results, which involves analyzing a massive amount of web data beyond what a human could normally do.

Case #2: Fluctuating Data


Machine learning is also a powerful tool for making sense of and incorporating fluctuating data. Unlike other types of models, which have fixed processes for how they predict their values, machine learning models can learn from current patterns and adjust if those patterns fluctuate over time or if new use cases arise. This can be especially helpful when trying to forecast the future, allowing the model to pick up new trends if and when they emerge. For example, when predicting stock prices, machine learning algorithms can learn from new data and pick up changing trends, which can improve the model’s predictions.

Of course, humans are notorious for changing over time, so this adaptability is often helpful in models that seek to understand human preferences and behavior. For example, recommendation systems, like Netflix’s, Hulu’s, or YouTube’s video recommendations, adjust based on usage over time, enabling them to respond to individual and/or collective changes in interests.
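As a minimal sketch of this idea, the Python below implements one simple form of online learning: a linear model updated by stochastic gradient descent after every new observation. The class name `OnlineLinearModel`, the data stream, its mid-stream change in slope, and the learning rate are all hypothetical; real forecasting or recommendation systems are far more elaborate.

```python
import random


class OnlineLinearModel:
    """Hypothetical minimal model predicting y ~ w*x + b, updated after every
    new observation via stochastic gradient descent on squared error."""

    def __init__(self, learning_rate=0.01):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        error = self.predict(x) - y
        # Gradient of 0.5 * error**2 is error*x for w and error for b.
        self.w -= self.lr * error * x
        self.b -= self.lr * error


random.seed(1)
model = OnlineLinearModel(learning_rate=0.05)

# Hypothetical stream: the relationship between x and y shifts halfway through
# (slope 2 for the first 2,000 observations, slope -1 afterwards).
for step in range(4000):
    x = random.uniform(-1, 1)
    true_slope = 2.0 if step < 2000 else -1.0
    y = true_slope * x + random.gauss(0, 0.1)
    model.update(x, y)
    if step == 1999:
        slope_before_shift = model.w

slope_after_shift = model.w
print(f"learned slope before shift: {slope_before_shift:.2f}")
print(f"learned slope after shift:  {slope_after_shift:.2f}")
```

Because each update weighs the newest observation, the learned slope moves from about 2 to about -1 after the shift without anyone retraining the model from scratch. A larger learning rate would adapt faster but make the estimate noisier.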

Case #3: Intuitive Processing


Data scientists frequently develop machine learning algorithms to teach computers how to do processes that humans do naturally but cannot fully explain computationally. For example, popular applications of machine learning center on replicating some aspect of sensory perception: image recognition, sound or speech recognition, etc. These replicate the process of taking in sensory information (e.g. sight and sound) and processing, classifying, and otherwise making sense of it. Language processing, as in chatbots, is another example. In these contexts, machine learning algorithms learn a process that humans can do intuitively (seeing or hearing stimuli and understanding language) but cannot fully explain how or why.

Many early forms of machine learning arose out of neurological models of how human brains work. The initial intention of neural nets, for instance, was to model our neurological decision-making processes. Since then, however, much neurological scholarship has shown that neural nets do not accurately represent how our brains and minds work.[i] But whether or not they represent how human minds work at all, neural networks have provided a powerful technique for computers to process and classify information and make decisions. Likewise, many machine learning algorithms replicate some activity humans do naturally, even if the way they conduct that task has little to do with how humans would.
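The perceptron, a single artificial neuron and an early ancestor of today’s neural networks, shows the basic idea in miniature: rather than being told a rule, it adjusts its weights until it classifies labeled examples correctly. The sketch below is a plain-Python illustration on made-up two-feature data; the hidden rule x1 + x2 > 1 and the gap around it are assumptions chosen so the two classes are linearly separable. Modern image or speech recognition uses far larger, multi-layer networks.

```python
import random


def predict(weights, bias, features):
    """A single artificial neuron: weighted sum of inputs, then a step function."""
    activation = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(examples, epochs=1000, learning_rate=1.0):
    """Classic perceptron learning rule: nudge the weights whenever a prediction is wrong."""
    n_features = len(examples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * f for w, f in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


# Hypothetical labeled examples: points labeled 1 if they fall well above the
# line x1 + x2 = 1 and 0 if well below it. The rule is never written into the
# model; it has to be inferred from the examples.
random.seed(7)
examples = []
while len(examples) < 200:
    x1, x2 = random.random(), random.random()
    if x1 + x2 > 1.2:
        examples.append(((x1, x2), 1))
    elif x1 + x2 < 0.8:
        examples.append(((x1, x2), 0))

weights, bias = train_perceptron(examples)
accuracy = sum(predict(weights, bias, f) == y for f, y in examples) / len(examples)
print(f"weights={weights}, bias={bias:.2f}, training accuracy={accuracy:.0%}")
```

Because the two classes are separated by a clear gap, the perceptron learning rule is guaranteed to stop making mistakes on the training examples after a finite number of updates.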

Case #4: Big Data


Machine learning is a powerful tool when analyzing data that is too large to break down through conventional computational techniques. Recent computer technologies have expanded the possibilities for data collection, storage, and processing, a major driver of big data. Machine learning has arisen as a major, if not the major, means of analyzing this big data.

Machine learning algorithms can manage a dizzying array of variables and use them to find insightful patterns. Many big data cases involve hundreds, thousands, or even tens or hundreds of thousands of input variables, and many machine learning techniques (like best subset selection, stepwise selection, and lasso regression for linear models) sift through these variables to determine the best ones to use.
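Lasso regression, one of the techniques named above, can be sketched in plain Python with coordinate descent. This is a minimal illustration under stated assumptions: the ten simulated features, the rule that only the first two matter, and the penalty of 0.5 are all hypothetical. In practice one would use a tested library implementation (for example scikit-learn’s `Lasso`) and choose the penalty by cross-validation.

```python
import random
import statistics


def soft_threshold(z, penalty):
    """Shrink z toward zero by `penalty`, snapping to exactly zero if |z| <= penalty."""
    if z > penalty:
        return z - penalty
    if z < -penalty:
        return z + penalty
    return 0.0


def lasso_coordinate_descent(columns, y, penalty, sweeps=200):
    """Minimize (1/2n)*||y - X b||^2 + penalty*||b||_1 by cycling over coefficients.
    Assumes each column is standardized (mean 0, mean square 1) and y is centered."""
    n = len(y)
    coefs = [0.0] * len(columns)
    residual = list(y)  # y - X b, with b = 0 initially
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Correlation of feature j with the residual that excludes feature j's own contribution.
            rho = sum(x * r for x, r in zip(col, residual)) / n + coefs[j]
            new = soft_threshold(rho, penalty)
            if new != coefs[j]:
                delta = new - coefs[j]
                residual = [r - delta * x for r, x in zip(residual, col)]
                coefs[j] = new
    return coefs


def standardize(values):
    mean = statistics.fmean(values)
    scale = statistics.pstdev(values)
    return [(v - mean) / scale for v in values]


random.seed(3)
n, p = 200, 10
raw = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
# Hypothetical target: only features 0 and 1 matter; the other eight are noise.
y = [3.0 * raw[0][i] - 2.0 * raw[1][i] + random.gauss(0, 1) for i in range(n)]

columns = [standardize(col) for col in raw]
y_mean = statistics.fmean(y)
y_centered = [v - y_mean for v in y]

coefs = lasso_coordinate_descent(columns, y_centered, penalty=0.5)
selected = [j for j, c in enumerate(coefs) if c != 0.0]
print("coefficients:", [round(c, 2) for c in coefs])
print("selected features:", selected)
```

The penalty shrinks every coefficient toward zero and sets the weakest ones to exactly zero, so the fitted model itself reports which variables it considers worth keeping.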

Recent developments in computing provide the incredible processing power necessary to do such work (and, debatably, machine learning is currently helping to push computational power forward by creating demand for greater computational abilities). Hand calculations, and computers several decades ago, were often unable to handle the calculations necessary to analyze large datasets. Neural networks illustrate this: computer scientists invented them many decades ago, but they did not become a popular method until recent computer processing made them easy and worthwhile to run.

Tractors and other large-scale agricultural techniques coincided historically with the enlargement of farms: such machinery not only allowed farmers to manage large tracts of land but also made larger farms economically attractive. Likewise, machine learning algorithms provide the main technological means to analyze big data, both enabling and, in turn, being incentivized by the rise of big data in the professional world.

Conclusion

Here I have described four major uses of machine learning algorithms. Machine learning has become popular in many industries because of at least one of these functionalities, but of course, they are not the only potential current uses. In addition, as we develop machine learning tools, we are constantly inventing more. Given machine learning’s newness compared to many other century-old technologies, time will tell all the ways humans utilize it.

Photo credit #1: Mike MacKenzie at https://www.flickr.com/photos/mikemacmarketing/30212411048/

Photo credit #2: julientromeur at https://pixabay.com/illustrations/car-automobile-3d-self-driving-4343635/

Photo credit #3: geralt at https://pixabay.com/illustrations/business-success-curve-hand-draw-1989130/

Photo credit #4: geralt at https://pixabay.com/illustrations/flat-recognition-facial-face-woman-3252983/

Photo credit #5: mohamed_hassan at https://pixabay.com/illustrations/technology-5g-aerial-4816658/


[i] See Nagyfi, Richard. The differences between Artificial and Biological Neural Networks. 4 September 2018. https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7; and Tcheang, Lili. Are Artificial Neural Networks like the Human Brain? And does it matter? 7 November 2018. https://medium.com/digital-catapult/are-artificial-neural-networks-like-the-human-brain-and-does-it-matter-3add0f029273.

What Is Ethnography: A Short Description for the Unsure

What is ethnography, and how has it been used in the professional world? This article is a quick and dirty crash course for someone who has never heard of (or knows little about) ethnography.

Anthropology at its most basic is the study of human cultures and societies. Cultural anthropologists generally seek to understand current cultures and societies by conducting ethnography.

In short, ethnography involves seeking to understand the lived experiences of a particular culture, setting, group, or other context by some combination of being with those in that context (called participant-observation), interviewing or talking with them, and analyzing what happens and what is produced in that context.

It is an umbrella term for a set of methods (including participant-observation, interviews, group interviews or focus groups, digital recording, etc.) employed with that goal, and most ethnographic projects use some subset of these methods given the needs of the specific project. In this sense, it is similar to other umbrella methodologies – like statistics – in that it encapsulates a wide array of different techniques depending on the context.


One conducts ethnographic research to understand something about the lived experiences of a context. In the professional world, for example, ethnography is frequently useful in the following contexts:

  1. Market Research: When trying to understand customers and/or users in depth
  2. Product Design: When trying to design or modify a product by seeing how people use it in action
  3. Organizational Communication and Development: When trying to understand a “people problem” within an organization

In this article, I expound in more detail on situations where ethnographic research is useful in professional settings.

Ethnographies are best understood through examples, so the table below includes excellent example ethnographies and ethnographic researchers in various industries and fields:

Project | Area
Computer Technology Development at Intel | Market Research
Vacuum Cleaner Market Research Examples | Market Research
Psychiatric Wards in Healthcare | Organizational Management
Self-Driving Cars at Nissan | Artificial Intelligence
Training of Ethnography in Business Schools | Education of Ethnography

These, of course, are not the only situations where ethnography might be helpful. Ethnography is a powerful tool to develop a deep understanding of others’ experiences and to develop innovative and strategic insights.

Photo credit #1: Paolo Nicolello at https://unsplash.com/photos/hKVg7ldM5VU.

Photo credit #2: mentatdgt at https://www.pexels.com/photo/two-woman-chatting-1311518/.

What Is Data Science and Machine Learning? A Short Guide for the Unsure

 What is data science, and what is machine learning? This is a short overview for someone who has never heard of either.

What Is Data Science?

In the abstract, data science is an interdisciplinary field that uses algorithms to organize, process, and analyze data. It represents a shift towards using computer programming, specifically machine learning algorithms and other related computational tools, to process and analyze data.

By 2008, companies had started using the term “data scientist” to refer to a growing group of professionals who used advanced computing to organize and analyze large datasets,[i] so from the start, the practical needs of professional contexts have shaped the field. Data science combines strands from computer science, mathematics (particularly statistics and linear algebra), engineering, the social sciences, and several other fields to address specific real-world data problems.

On a practical level, I consider a data scientist to be someone who helps develop machine learning algorithms to analyze data. Machine learning algorithms are the central techniques and tools that define data science. For me personally, if it does not involve machine learning, it is not data science.

What Is Machine Learning?

Machine learning is a complex term: What does it mean to say that a machine “learns”? Over time, data scientists have provided many intricate definitions of machine learning, but at its most basic, a machine learning algorithm is one that adapts or modifies its approach to a task based on new data or information over time.

Herbert Simon provides a commonly used technical definition: “Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the task or tasks drawn from the same population more efficiently and more effectively the next time.”[ii] As this definition implies, machine learning algorithms adapt by iteratively testing their performance against the same or similar data. Data scientists (and others) have developed several types of machine learning algorithms, including decision tree modeling, neural networks, logistic regression, collaborative filtering, support vector machines, cluster analysis, and reinforcement learning, among others.
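
To make Simon’s definition concrete, here is a minimal Python sketch of one algorithm from that list, logistic regression, trained with plain gradient descent. It is an illustration under simple assumptions, not how any particular library implements it; the toy data and the helper names train_logistic_regression and log_loss are hypothetical. On every pass the algorithm measures how wrong it is on the same training data (the log loss) and adjusts its two parameters to reduce that error, so the recorded loss falls from pass to pass.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, xs, ys):
    """Average log loss: lower means the predictions match the answers better."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)


def train_logistic_regression(xs, ys, lr=0.5, epochs=200):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    history = [log_loss(w, b, xs, ys)]  # performance before any learning
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # how far off this prediction is
            grad_w += err * x
            grad_b += err
        # Nudge the parameters in the direction that reduces the error
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
        history.append(log_loss(w, b, xs, ys))  # re-test on the same data
    return w, b, history


# Toy data (hypothetical): weekly hours of product use -> renewed (1) or not (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
renewed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b, history = train_logistic_regression(hours, renewed)
print(f"loss before training: {history[0]:.3f}, after: {history[-1]:.3f}")
print(f"estimated P(renew | 4.5 hours) = {sigmoid(w * 4.5 + b):.2f}")
```

The shrinking loss is the “doing the task more effectively the next time” in Simon’s sense; in practice, data scientists usually rely on well-tested libraries rather than hand-written training loops like this one.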

Data scientists generally split machine learning algorithms into two categories: supervised and unsupervised learning. Both involve training the algorithm to complete a given task but differ in how they test the algorithm’s performance. In supervised learning, the developer(s) provide a clear set of answers as a basis for judging whether a prediction is correct, while in unsupervised learning, judging the algorithm’s performance is much more open-ended. I liken the difference to the exams teachers gave us in school: some tests, like multiple-choice exams, have clear right and wrong answers, but others, like essays, are open-ended and graded by qualitative judgments of quality. Just as the nature of the curriculum determines the best type of exam, which type of learning to use depends on the project context and the nature of the data.
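
The exam analogy can be sketched in a few lines of Python. This is a hedged toy example: the data, the labels, and the function names are hypothetical, and the two algorithms (a nearest-centroid classifier for the supervised case and a bare-bones one-dimensional k-means for the unsupervised case) were chosen only because they are short. The supervised model is graded against known answers; the unsupervised one returns groups with no answer key to check them against.

```python
def nearest_centroid_fit(xs, labels):
    """Supervised: average the training examples that share each known label."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def nearest_centroid_predict(centroids, x):
    """Assign x to the label whose average is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def kmeans_1d(xs, k=2, iterations=20):
    """Unsupervised: split xs into k groups without labels (assumes k <= len(xs))."""
    ordered = sorted(xs)
    step = max(1, len(ordered) // k)
    centers = [ordered[i * step] for i in range(k)]  # simple deterministic start
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Move each center to the mean of its group (keep it if the group is empty)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical data: minutes each visitor spent on a website
minutes = [2, 3, 4, 20, 22, 25]

# Supervised: we also have the "answers" (what each visitor actually did)
labels = ["browser", "browser", "browser", "buyer", "buyer", "buyer"]
centroids = nearest_centroid_fit(minutes, labels)

# Grade predictions against an answer key, like a multiple-choice exam
test_minutes, test_labels = [5, 18], ["browser", "buyer"]
predictions = [nearest_centroid_predict(centroids, x) for x in test_minutes]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print("supervised accuracy:", accuracy)

# Unsupervised: no labels; the algorithm proposes groups on its own
centers, clusters = kmeans_1d(minutes, k=2)
print("clusters found:", clusters)
# No answer key exists here: deciding whether these groups are meaningful
# (and what to call them) is open-ended, more like grading an essay.
```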

Here are four situations in which machine learning algorithms are especially useful:

  1. Autonomy: Teaching computers to do a task without the direct aid or intervention of humans (e.g. autonomous vehicles)
  2. Fluctuation: Helping machines adjust when the requirements or data change over time
  3. Intuitive Processing: Conducting (or assisting in) tasks that humans do naturally but cannot explain computationally or algorithmically (e.g. image recognition)
  4. Big Data: Breaking down and analyzing data that is too large to handle otherwise
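
As an illustration of the “Fluctuation” case, here is a minimal Python sketch of online learning: a one-parameter linear model, y ≈ w * x, updated by stochastic gradient descent one example at a time. The data stream, the mid-stream change in the underlying relationship, the learning rate, and the function names are all hypothetical choices for the sketch. Because the model keeps updating as new data arrive, its estimate follows the data when the relationship shifts instead of staying frozen at what it learned first.

```python
def online_update(w, x, y, lr=0.5):
    """One stochastic-gradient step on squared error for the model y ~ w * x."""
    error = y - w * x
    return w + lr * error * x


def stream(true_w, n):
    """Hypothetical noise-free data stream: x cycles through 0.1, 0.2, ..., 1.0."""
    for i in range(n):
        x = (i % 10 + 1) / 10
        yield x, true_w * x


w = 0.0
for x, y in stream(true_w=2.0, n=500):
    w = online_update(w, x, y)
w_before_shift = w

for x, y in stream(true_w=-1.0, n=500):  # the relationship in the data changes
    w = online_update(w, x, y)
w_after_shift = w

print(f"estimate before shift: {w_before_shift:.3f}")  # close to 2.0
print(f"estimate after shift:  {w_after_shift:.3f}")   # close to -1.0
```

Processing one example at a time like this also means the model never needs the whole dataset in memory at once, which is one common way of coping with data too large to handle otherwise.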

Machine learning algorithms have proven to be a very powerful set of tools. See this article for a more detailed discussion of when machine learning is useful.


[i] Berkeley School of Information. (2019). What is Data Science? Retrieved from https://datascience.berkeley.edu/about/what-is-data-science/.

[ii] Simon, quoted in Kononenko, I., & Kukar, M. (2007). Machine Learning and Data Mining. Philadelphia: Elsevier.

Photo credit #1: Frank V at https://unsplash.com/photos/zbLW0FG8XU8

Photo credit #2: Brett Jordan at https://unsplash.com/photos/HzOclMmYryc