When Is Machine Learning Useful?

In a past blog post, I defined and described what machine learning is. I briefly highlighted four instances where machine learning algorithms are useful. This is what I wrote:

  1. Autonomy: To teach computers to do a task without the direct aid/intervention of humans (e.g. autonomous vehicles)
  2. Fluctuation: Help machines adjust when the requirements and data change over time
  3. Intuitive Processing: Conduct or assist in tasks humans do but are unable to explain how computationally/algorithmically (e.g. image recognition)
  4. Big Data: Breaking down data that is too large to handle otherwise

The goal of this blog post is to explain each in more detail.

Case #1: Autonomy

The first major use of machine learning centers around teaching computers to do a task or tasks without the direct aid or intervention of humans. Self-driving vehicles are a high-profile example of this: teaching a vehicle to drive (scanning the road and determining how to respond to what is around it) without the aid of or with minimal direct oversight from a human driver.

There are two basic types of tasks that machine learning systems might perform autonomously:

  1. Tasks humans frequently perform
  2. Tasks humans are unable to perform

Self-driving cars exemplify the former: humans drive cars, but self-driving cars would perform all or part of the driving process. Another example would be chatbots and virtual assistants like Alexa, Cortana, and Google Assistant, which seek to converse with users independently. Such systems might take over the human activity completely or only partially: for example, some customer service chatbots are designed to determine the customer’s issue but then transfer the customer to a human when the issue reaches a certain complexity.

Humans have also sought to build autonomous machine learning algorithms to perform tasks that humans are unable to perform. Whereas self-driving cars conduct an activity many people do, a self-driving rover or submarine might drive and operate in a world that humans have so far been unable to inhabit, like other planets in our Solar System or the deep ocean. Search engines are another example: Google uses machine learning to help refine search results, which involves analyzing far more web data than a human could.

Case #2: Fluctuating Data

Machine learning is also a powerful tool for making sense of and incorporating fluctuating data. Unlike other types of models, which have fixed processes for how they predict values, machine learning models can learn from current patterns and adjust if the patterns fluctuate over time or if new use cases arise. This can be especially helpful when trying to forecast the future, allowing the model to decipher new trends if and when they emerge. For example, when predicting stock prices, machine learning algorithms can learn from new data and pick up changing trends to make the model better at predicting the future.
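To make the contrast concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the data series is made up, and a simple exponentially weighted average stands in for a real machine learning model. The names `fixed_forecaster` and `OnlineForecaster` are hypothetical. The fixed forecaster is fit once and never changes, while the online forecaster updates its estimate every time a new value arrives.

```python
def fixed_forecaster(history):
    """Fit once: always predict the mean of the initial training window."""
    mean = sum(history) / len(history)
    return lambda: mean


class OnlineForecaster:
    """Exponentially weighted average that keeps updating as data arrives."""

    def __init__(self, start, learning_rate=0.5):
        self.estimate = start
        self.learning_rate = learning_rate

    def predict(self):
        return self.estimate

    def update(self, observed):
        # Move the estimate part of the way toward the newest observation.
        self.estimate += self.learning_rate * (observed - self.estimate)


# Made-up series: the level sits at 10, then the pattern shifts to 20.
training = [10.0] * 5
new_data = [20.0] * 10

fixed = fixed_forecaster(training)
online = OnlineForecaster(start=sum(training) / len(training))

fixed_error = 0.0
online_error = 0.0
for value in new_data:
    fixed_error += abs(value - fixed())
    online_error += abs(value - online.predict())
    online.update(value)  # learn from the new data point

print("total error, fixed model: ", fixed_error)
print("total error, online model:", online_error)
```

After the shift, the fixed model keeps making the same mistake, while the updating model closes the gap within a few observations. Real systems use far richer models, but the principle of adjusting to new data is the same.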

Of course, humans are notorious for changing over time, so the ability to handle fluctuation is often helpful in models that seek to understand human preferences and behavior. For example, user recommendations – like Netflix’s, Hulu’s, or YouTube’s video recommendation systems – adjust based on usage over time, enabling them to respond to individual and/or collective changes in interests.

Case #3: Intuitive Processing

Data scientists frequently develop machine learning algorithms to teach computers how to do processes that humans do naturally but that we are unable to fully explain computationally. For example, popular applications of machine learning center around replicating some aspect of sensory perception: image recognition, sound or speech recognition, etc. These replicate the process of taking in sensory information (e.g. sight and sound) and processing, classifying, and otherwise making sense of that information. Language processing, as in chatbots, forms another example of this. In these contexts, machine learning algorithms learn a process that humans can do intuitively (see or hear stimuli and understand language) but are unable to fully explain how or why.

Many early forms of machine learning arose out of neurological models of how human brains work. The initial intention of neural nets, for instance, was to model our neurological decision-making process or processes. Much contemporary neurological scholarship has since disproven the accuracy of neural nets in representing how our brains and minds work.[i] But, whether or not they represent how human minds work at all, neural networks have provided a powerful technique for computers to process and classify information and make decisions. Likewise, many machine learning algorithms replicate some activity humans do naturally, even if the way they conduct that task has little to do with how humans would.

Case #4: Big Data

Machine learning is a powerful tool when analyzing data that is too large to break down through conventional computational techniques. Recent computer technologies have increased the possibility of data collection, storage, and processing, a major driver in big data. Machine learning has arisen as a major, if not the major, means of analyzing this big data.

Machine learning algorithms can manage a dizzying array of variables and use them to find insightful patterns. Many big data cases involve hundreds, thousands, or even tens or hundreds of thousands of input variables, and many machine learning techniques (like best subsets selection, stepwise selection, and lasso regression) sift through the myriad variables in big data and determine the best ones to use.
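As a small illustration of how a technique like lasso regression picks out the useful variables, here is a minimal sketch in plain Python. It is not a production implementation: the data is made up (six input variables, of which only the first two actually matter), the penalty strength `alpha=0.5` is an arbitrary choice, and the function names are hypothetical. It uses coordinate descent, one common way to fit a lasso model, in which each weight is updated in turn and small weights get pushed to exactly zero.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside the threshold band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/2n) * ||y - Xw||^2 + alpha * ||w||_1 one weight at a time."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0  # how strongly variable j tracks what the others miss
            z = 0.0    # sum of squares of variable j
            for i in range(n):
                others = sum(X[i][k] * weights[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            weights[j] = soft_threshold(rho / n, alpha) / (z / n)
    return weights


# Made-up data: six input variables, but only the first two drive the outcome.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(6)] for _ in range(100)]
y = [3 * row[0] - 2 * row[1] + random.gauss(0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("weights:", [round(w, 2) for w in weights])
print("selected variables:", selected)
```

The variables whose weights end up at exactly zero are the ones the model has dropped. With thousands of candidate variables the same mechanism applies, though in practice one would use an optimized library rather than nested Python loops.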

Recent developments in computing provide the incredible processing power necessary to do such work (and debatably, machine learning is currently helping to push computational power and provide a demand for greater computational abilities). Hand calculations and the computers of several decades ago were often unable to handle the calculations necessary to analyze large datasets. This is demonstrated, for example, by the fact that computer scientists invented the now popular neural networks many decades ago, but they did not gain popularity as a method until recent computer processing made them easy and worthwhile to run.

Tractors and other large-scale agricultural techniques coincided historically with the enlargement of farm property sizes, where such machinery not only allowed farmers to manage large tracts of land but also incentivized larger farms economically. Likewise, machine learning algorithms provide the main technological means to analyze big data, both enabling and in turn incentivized by the rise of big data in the professional world.

Conclusion

Here I have described four major uses of machine learning algorithms. Machine learning has become popular in many industries because of at least one of these functionalities, but of course, they are not the only potential current uses. In addition, as we develop machine learning tools, we are constantly inventing more. Given machine learning’s newness compared to many century-old technologies, time will tell all the ways humans utilize it.

Photo credit #1: Mike MacKenzie at https://www.flickr.com/photos/mikemacmarketing/30212411048/

Photo credit #2: julientromeur at https://pixabay.com/illustrations/car-automobile-3d-self-driving-4343635/

Photo credit #3: geralt at https://pixabay.com/illustrations/business-success-curve-hand-draw-1989130/

Photo credit #4: geralt at https://pixabay.com/illustrations/flat-recognition-facial-face-woman-3252983/

Photo credit #5: mohamed_hassan at https://pixabay.com/illustrations/technology-5g-aerial-4816658/


[i] See Nagyfi, Richard. The differences between Artificial and Biological Neural Networks. 4 September 2018. https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7; and Tcheang, Lili. Are Artificial Neural Networks like the Human Brain? And does it matter? 7 November 2018. https://medium.com/digital-catapult/are-artificial-neural-networks-like-the-human-brain-and-does-it-matter-3add0f029273.

Recently Published Article: “Anthropology by Data Science”

Photo by Ekrulila on Pexels.com

I am pleased to announce that the Annals of Anthropological Practice has accepted my article “Anthropology by Data Science” (https://anthrosource.onlinelibrary.wiley.com/doi/10.1111/napa.12169). In it, I reflect on the relationship anthropologists have cultivated with data science as a discipline and the importance of integrating machine learning techniques into ethnographic practice.

Annals of Anthropological Practice is overseen by the National Association for the Practice of Anthropology (NAPA) within the American Anthropological Association. Thank you, NAPA, for publishing my article and thank you to all the unnamed editors and reviewers in the process.

Interdisciplinary Anthropology and Data Science Master’s Thesis: A Quick and Dirty Project Summary

This is a quick and dirty summary of my master’s practicum research project with Indicia Consulting over the summer of 2018. For anyone interested in more detail, here is a more detailed report, and here is the final report with Indicia. 

Background

My practicum was the sixth stage of a multi-year research project. The California Energy Commission commissioned this larger project to understand the potential relationship between individual energy consumption and technology usage. In stages one through five, we isolated certain clusters of behavior and attitudes around new technology adoption – which Indicia called cybersensitivity – and demonstrated that cybersensitivity tended to associate with a willingness to adopt energy-saving technology like smart meters.

This led to a key question: How can one identify cybersensitivity among a broader population such as a community, county, or state? Answering this question was the main goal of my practicum project.

In the past stages of the research project, the team used ethnographic research to establish criteria for whether someone was a cybersensitive based on several hours of interviews and observations about their technology usage. These interviews and observations certainly helped the research team analyze behavioral and attitudinal patterns, determine what patterns were significant, and develop those into the concept of cybersensitivity, but they are too time- and resource-intensive to perform with an entire population. One generally does not have the ability to interview everyone in a community, county, or state. I sought to address this directly in my project.

| Task | Timeline | Task Name | Research Technique | Description |
|---|---|---|---|---|
| Task 1 | June 2015 – Sept 2018 | General Project Tasks | Administrative (N/A) | Developed project scope and timeline, adjusting as the project unfolds |
| Task 2 | July 2015 – July 2016 | Documenting and analyzing emerging attitudes, emotions, experiences, habits, and practices around technology adoption | Survey | Conducted survey research to observe patterns of attitudes and behaviors among cybersensitives/awares. |
| Task 3 | Sept 2016 – Dec 2016 | Identifying the attributes and characteristics and psychological drivers of cybersensitives | Interviews and Participant-Observation | Conducted in-depth interviews and observations coding for psych factor, energy consumption attitudes and behaviors, and technological device purchasing/usage. |
| Task 4* | Sept 2016 – July 2017 | Assessing cybersensitives’ valence with technology | Statistical Analysis | Tested for statistically significant differences in demographics, behaviors, and beliefs/attitudes between cyber status groups |
| Task 5 | Aug 2017 – Dec 2018 | Developing critical insights for supporting residential engagement in energy efficient behaviors | Statistical Analysis | Analyzed utility data patterns of study participants, comparing it with the general population. |
| Task 6 | March 2018 – Aug 2018 | Recommending an alternative energy efficiency potential model | Decision Tree Modeling | Constructed decision tree models to classify an individual’s cyber status |

Project Goal

The overall goal for the project was to produce a scalable method to assess whether someone exhibits cybersensitivity based on data measurable across an entire population. In doing this, the project also helped address the following research needs:

  1. Created a method that could scale across a larger population, making it possible to assess whether cybersensitives were more willing to adopt energy-saving technologies across a community, county, or state
  2. Provided the infrastructure to determine how much energy-saving campaigns targeting cybersensitives specifically would reduce energy consumption in California
  3. Helped the California Energy Commission determine the best means to reach cybersensitives for specific energy-saving campaigns

The Project

I used machine learning modeling to create a decision-making flow to isolate cybersensitives in a population. Random forests and decision trees produced the best models for Indicia’s needs: random forests in accuracy and robustness and decision trees in human decipherability. Through them, I created a programmable yet human-comprehensible framework to determine whether an individual is cybersensitive based on behaviors and other characteristics that an organization could easily assess within a whole population. Thus, any energy organization could easily understand, replicate, and further develop the model since it was both easy for humans to read and encodable computationally. This way organizations could both use and refine it for their purposes.
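To show what “human-readable and encodable computationally” can look like in general, here is a minimal sketch in Python. It is purely illustrative: the feature names, thresholds, and tree shape are invented for this example and are not the model or the variables from the Indicia project. The same small tree is used two ways: a program walks it to classify a person, and a helper prints it as plain if/else rules a person can read.

```python
# Hypothetical tree: every feature name and threshold here is invented for
# illustration and is NOT taken from the actual project.
TREE = {
    "feature": "smart_devices_owned",
    "threshold": 3,
    "below": {"label": "not cybersensitive"},
    "at_or_above": {
        "feature": "hours_online_per_day",
        "threshold": 4,
        "below": {"label": "not cybersensitive"},
        "at_or_above": {"label": "cybersensitive"},
    },
}


def classify(tree, person):
    """Walk the tree from the root until reaching a leaf label."""
    node = tree
    while "label" not in node:
        meets = person[node["feature"]] >= node["threshold"]
        node = node["at_or_above"] if meets else node["below"]
    return node["label"]


def describe(tree, indent=0):
    """Render the same tree as indented if/else rules for human readers."""
    pad = "  " * indent
    if "label" in tree:
        return [f"{pad}=> {tree['label']}"]
    lines = [f"{pad}if {tree['feature']} < {tree['threshold']}:"]
    lines += describe(tree["below"], indent + 1)
    lines.append(f"{pad}else:")
    lines += describe(tree["at_or_above"], indent + 1)
    return lines


print("\n".join(describe(TREE)))
print(classify(TREE, {"smart_devices_owned": 5, "hours_online_per_day": 6}))
```

A random forest combines many such trees, which tends to improve accuracy and robustness but makes the result much harder to read at a glance. That trade-off is why a single decision tree is attractive when people need to inspect and refine the rules.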

Conclusion

This is a quick overview of my master’s practicum project. For more details on what modeling I did, how I did it, what results it produced, and how it fit within the wider needs of the multi-year research project, please see my full report.

I really appreciated the opportunity it posed to get my hands dirty integrating ethnography and data science to help address a real-world problem. This summary only scratches the surface of what Indicia did with the California Energy Commission to encourage sustainable energy usage societally. Hopefully, though, it will inspire you to integrate ethnography and data science to address whatever complex questions you face. It certainly did for me.

Thank you to Susan Mazur-Stommen and Haley Gilbert for your help in organizing and completing the project. I would like to thank my professorial committee at the University of Memphis – Dr. Keri Brondo, Dr. Ted Maclin, Dr. Deepak Venugopal, and Dr. Katherine Hicks – for their academic support as well.

The Anthropology of Machine Learning

In the spring of 2018, I researched how anthropologists and related social scholars have analyzed data science and machine learning for my Master’s in Anthropology at the University of Memphis. For the project, I assessed the anthropological literature on data science and machine learning to date and explored potential connections between anthropology and data science, based on my perspective as a data scientist and anthropologist. Here is my final report.

Thank you, Dr. Ted Maclin, for your help overseeing and assisting this project.

Anthropology by Data Science: The EPIC Project with Indicia Consulting as an Exploratory Case Study

This is my practicum report with Indicia Consulting. In lieu of a master’s thesis, the University of Memphis Department of Anthropology required that we master’s students conduct a practicum project. For this, we had to partner with an organization and complete a 300+ hour anthropological research project based on the organization’s needs and our skills and interests. My practicum project was Indicia’s EPIC Project with the California Energy Commission (see this link and this link for more details on the EPIC Project). In this report, I outline potential ways to integrate ethnographic/anthropological and data science research in professional settings.

In November 2019, the American Anthropological Association’s Committee for the Anthropology of Science, Technology, and Computing (CASTAC) awarded me the David Hakken Graduate Student Prize for innovative science and technology scholarship.

Full Report:

The Anthropology Department also required that we publicly present our practicum research to the University of Memphis campus. This PowerPoint summarizes my practicum project. If you are not keen to read the 99-page full report, this is a much shorter alternative:

If you are interested in learning more about the project, please check out the following:

  1. Indicia Consulting’s Final Research Report with the California Energy Commission
  2. My Presentation at the 2019 Memphis Data Conference for Data Scientists Specifically

Computerized Knowledge Production: Machine Learning Models as Social Actors

The following is a presentation I gave at the Society for Applied Anthropology’s 2018 annual conference in Philadelphia, PA. In it, I describe how I think anthropologists should understand, analyze, and relate to machine learning and data science.

Memphis Data Conference: Anthropology by Data Science: The EPIC Project with Indicia Consulting as an Exploratory Case Study

Below is a talk I gave at the 2019 Memphis Data Conference, organized by the University of Memphis to discuss data science research in the Memphian community. In this presentation, I summarize a project I did with Indicia Consulting that integrated data science and ethnography.

Check out these articles for a more detailed description of the projects: a short project summary, my master’s thesis about the project, and Indicia’s full report.

Applied Anthropology Conference Presentation: Integrating Anthropology and Data Science

On July 8th, 2021, I presented virtually at the Congress of Anthropologists and Ethnologists of Russia in Tomsk, Siberia, organized by the Association of Anthropologists and Ethnologists of Russia. My talk, titled “Integrating Anthropology and Data Science,” was part of the congress’s subcommittee for applied and business anthropology. I discussed the unique opportunities integrating data science could provide anthropologists and potential strategies for how to integrate the two disciplines.

Here is my original abstract for the conference:

Here is my full presentation:

I had a great time, and I hope you enjoy it as well.