How do we build relatable machine learning models that regular people can understand? This is a presentation about how design principles apply to the development of machine learning systems. Too often in data science, machine learning software is not built with the regular people who will interact with it in mind.
I argue that to make machine learning software relatable, we need to use design thinking to intentionally build in mechanisms that help users form their own mental models of how the software works. Failing to include these mechanisms helps cultivate the common perception among users that machine learning is a black box.
I gave three different versions of this talk at Quant UX Con on June 8th, 2022, the Royal Institute of Anthropology’s annual conference on June 10th, 2022, and Google’s AI + Design Tooling Research Symposium on August 5th, 2022.
I hope you find it interesting; feel free to share any thoughts you might have.
Thank you to the conference and talk organizers for making this happen, and I appreciate all the insightful conversations I had about the role of design thinking in building relatable machine learning.
I worked as a data scientist at a hospital in New York City during the worst of the COVID-19 pandemic. Over the spring and summer, we became overwhelmed as the city turned into (and then left) the global hotspot for COVID-19. I have been processing everything that happened ever since.
The pandemic overwhelmed the entire hospital, particularly my physician colleagues. When I met with them, I could often see the combined effects of physical and emotional exhaustion in their eyes and voices. Many had just arrived from the ICU, where they had spent hours fighting to keep their patients alive, only to watch some of them die in front of them, and I could sense the emotional toll it was taking.
My experiences of the pandemic as a data scientist differed considerably, yet they were exhausting and disturbing in their own way. I spent several months, day in and day out, researching how many of our patients were dying from the pandemic and why: trying to determine what factors contributed to their deaths and what we as a hospital could do to best keep people alive. The patient who died the night before in front of the doctor I was meeting with became, for me, a single row in an already far-too-large data table of COVID-19 fatalities.
I felt like a helicopter pilot overlooking an out-of-control wildfire.[1] In such wildfires, teams of firefighters (the doctors, in this analogy) position themselves at strategic locations on the ground to push back the fire as best they can. They experience the flames and carnage up close. My placement in the helicopter, on the other hand, removes me from ground zero, instead forcing me to see and analyze the fire in its entirety: its sweeping, massive destruction across the whole forest. The helicopter offers a strategic vantage point from which to determine the best ways to fight the fire, while shielding me from the immediate destruction. Nevertheless, witnessing the vastness of the carnage from the air has its own challenges, stress, and emotional toll.
As an anthropologist by training, I am accustomed to being "on the ground." Anthropology is predicated on the idea that to understand a culture or phenomenon, one must understand the everyday experiences of those on the ground amidst it, and my anthropological training has instilled in me an instinct to go straight to those in the thick of it and talk to them.
Yet this experience has taught me that this perception is overly simplistic: the so-called "ground" has many layers, especially in a complex phenomenon like a pandemic. Being in the helicopter is another way of being in the thick of it, just as much as standing before the flames.
Many in the United States have made considerable and commendable efforts to support frontline health workers. Yet, as the pandemic progresses and its societal effects grow in complexity in the coming months, I think we need to broaden our understanding of where the "frontlines" are and who counts as a "frontline worker" worthy of our support.
On the actual battlefields from which the "frontline" metaphor comes, militaries set up layered teams to support the logistical needs of ground soldiers, and those support teams must also frequently put themselves in harm's way in the process. The frontline of this pandemic seems no different.
I think we need to expand our conceptions of what it means to be on the frontlines accordingly. Like anthropology, modern journalism, a key source of pandemic information for many of us, can fall into the trap of overfocusing on the "worst of the worst," potentially ignoring the broader picture and the diversity of "frontline" experiences. For example, interviewing the busiest medical caregivers at the worst-affected hospitals in the hardest-hit places in the world likely promotes viewership, but telling only those stories ignores the experiences and sacrifices of the thousands of others necessary to keep them going.
To be clear, in this post I am not seeking acknowledgement of my own work, nor do I think we should in any way diminish the contributions of these medical professional "ground troops." Rather, in the spirit of "yes, and," we should extend our understanding of "frontline workers" to acknowledge and celebrate the contributions of the many other essential professionals sustaining us during this crisis, such as those in transportation, food distribution, and postal services. I related my own experiences as a data scientist because they helped me learn this, not out of any desire for recognition.
This might help us appreciate the complexity of this crisis and its social effects, and the various kinds of sacrifices people have been making to address it. As it becomes increasingly clear that this pandemic is not going away anytime soon, appreciating the full extent of both could help us come together to buckle down and fight it.
[1] This video helped me understand the logistics of fighting wildfires, a fascinating topic in itself: https://www.youtube.com/watch?v=EodxubsO8EI. Feel free to check it out to understand my analogy in more depth.
This is a quick and dirty summary of my master's practicum research project with Indicia Consulting over the summer of 2018. For anyone who wants more depth, here is my detailed report, and here is the final report with Indicia.
Background
My practicum was the sixth stage of a multi-year research project. The California Energy Commission commissioned the larger project to understand the potential relationship between individual energy consumption and technology usage. In stages one through five, we isolated certain clusters of behaviors and attitudes around new technology adoption – which Indicia called cybersensitivity – and demonstrated that cybersensitivity tended to be associated with a willingness to adopt energy-saving technology like smart meters.
This led to a key question: how can one identify cybersensitivity in a broader population, such as a community, county, or state? Answering this question was the main goal of my practicum project.
In the previous stages of the research project, the team used ethnographic research to establish criteria for whether someone was cybersensitive, based on several hours of interviews and observations about their technology usage. These interviews and observations certainly helped the research team analyze behavioral and attitudinal patterns, determine which patterns were significant, and develop those patterns into the concept of cybersensitivity, but they are too time- and resource-intensive to perform with an entire population: one generally cannot interview everyone in a community, county, or state. I sought to address this directly in my project.
| Task | Timeline | Task Name | Research Technique | Description |
| --- | --- | --- | --- | --- |
| Task 1 | June 2015 – Sept 2018 | General Project Tasks | Administrative (N/A) | Developed project scope and timeline, adjusting as the project unfolded |
| Task 2 | July 2015 – July 2016 | Documenting and analyzing emerging attitudes, emotions, experiences, habits, and practices around technology adoption | Survey | Conducted survey research to observe patterns of attitudes and behaviors among cybersensitives/awares |
| Task 3 | Sept 2016 – Dec 2016 | Identifying the attributes, characteristics, and psychological drivers of cybersensitives | Interviews and Participant-Observation | Conducted in-depth interviews and observations, coding for psychological factors, energy consumption attitudes and behaviors, and technological device purchasing/usage |
| Task 4* | Sept 2016 – July 2017 | Assessing cybersensitives' valence with technology | Statistical Analysis | Tested for statistically significant differences in demographics, behaviors, and beliefs/attitudes between cyber status groups (see the sketch following this table) |
| Task 5 | Aug 2017 – Dec 2018 | Developing critical insights for supporting residential engagement in energy-efficient behaviors | Statistical Analysis | Analyzed utility data patterns of study participants, comparing them with those of the general population |
| Task 6 | March 2018 – Aug 2018 | Recommending an alternative energy efficiency potential model | Decision Tree Modeling | Constructed decision tree models to classify an individual's cyber status |
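To give a concrete sense of the group-comparison testing in Task 4, here is a minimal sketch of how such tests are commonly run in Python. The data file (`participants.csv`) and the column names (`cyber_status`, `owns_smart_thermostat`, `age`) are hypothetical stand-ins for illustration, not the study's actual variables.

```python
# A minimal, hypothetical sketch of Task 4-style group comparisons.
# "participants.csv" and all column names are illustrative stand-ins,
# not the study's actual data or variables.
import pandas as pd
from scipy import stats

df = pd.read_csv("participants.csv")

# Chi-square test: is a categorical behavior (e.g., owning a smart
# thermostat) distributed differently across cyber status groups?
table = pd.crosstab(df["cyber_status"], df["owns_smart_thermostat"])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square: chi2={chi2:.2f}, p={p:.4f}")

# Welch's t-test: does a continuous measure (e.g., age) differ between
# cybersensitives and everyone else?
sensitive = df.loc[df["cyber_status"] == "cybersensitive", "age"]
others = df.loc[df["cyber_status"] != "cybersensitive", "age"]
t, p = stats.ttest_ind(sensitive, others, equal_var=False)
print(f"t-test: t={t:.2f}, p={p:.4f}")
```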
Project Goal
The overall goal for the project was to produce a scalable method to assess whether someone exhibits cybersensitivity based on data measurable across an entire population. In doing this, the project also helped address the following research needs:
Created a method that could scale across a larger population, assessing whether cybersensitives were more willing to adopt energy-saving technologies across a community, county, or state
Provided the infrastructure to determine how much energy-saving campaigns specifically targeting cybersensitives would reduce energy consumption in California
Helped the California Energy Commission determine the best means to reach cybersensitives for specific energy-saving campaigns
The Project
I used machine learning to create a decision-making flow for isolating cybersensitives within a population. Random forests and decision trees produced the best models for Indicia's needs: random forests for accuracy and robustness, decision trees for human decipherability. Through them, I created a programmable yet human-comprehensible framework for determining whether an individual is cybersensitive, based on behaviors and other characteristics that an organization could easily assess across a whole population. Because the framework was both easy for humans to read and straightforward to encode computationally, any energy organization could understand, replicate, and further develop the model, refining it for its own purposes.
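To make this concrete, here is a minimal sketch of fitting and inspecting both model types in Python with scikit-learn. The data file (`study_participants.csv`) and feature names are hypothetical placeholders; the actual features came from the study's survey, interview, and utility data.

```python
# A minimal sketch of the modeling approach using scikit-learn.
# "study_participants.csv" and the feature names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("study_participants.csv")
features = ["num_devices_owned", "daily_screen_hours", "early_adopter_score"]
X, y = df[features], df["cyber_status"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: the more accurate and robust of the two model types.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
print("forest accuracy:", forest.score(X_test, y_test))

# Single decision tree: less powerful, but its decision flow can be
# printed and read directly by a human analyst.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print("tree accuracy:", tree.score(X_test, y_test))
print(export_text(tree, feature_names=features))
```

The printed rule set is what makes the tree both human-readable and programmable: an analyst can follow it like a flowchart, while any organization can re-encode the same splits and thresholds in software.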
Conclusion
This is a quick overview of my master's practicum project. For more detail on the modeling I did, how I did it, what results it produced, and how it fit within the wider needs of the multi-year research project, please see my full report.
I really appreciated the opportunity it provided to get my hands dirty integrating ethnography and data science to help address a real-world problem. This summary only scratches the surface of what Indicia did with the California Energy Commission to encourage sustainable energy usage across society. Hopefully, though, it will inspire you to integrate ethnography and data science to address whatever complex questions you face. It certainly did for me.
Thank you to Susan Mazur-Stommen and Haley Gilbert for their help in organizing and completing the project. I would also like to thank my professorial committee at the University of Memphis – Dr. Keri Brondo, Dr. Ted Maclin, Dr. Deepak Venugopal, and Dr. Katherine Hicks – for their academic support.
The following is a project I completed for a graduate course in Artificial Intelligence at the University of Memphis in the spring of 2019. For the project, I analyzed whether part-of-speech evaluation could modulate Markov chain-based next-word predictors. In particular, I developed and tested two different strategies for incorporating part-of-speech predictions, which I termed the excluder and the multiplier. The multiplier method performed better than the excluder and matched the performance of the control. Hopefully, this is a helpful exploration of ways to use lexical information to improve next-word predictors.
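To give a flavor of the approach, here is a minimal sketch of a bigram (Markov chain) next-word predictor with a multiplier-style part-of-speech re-weighting, assuming NLTK for tagging. This illustrates the general idea only; it is not the project's actual code, and the corpus file and all function names are mine.

```python
# A minimal sketch of a bigram next-word predictor with a multiplier-style
# part-of-speech re-weighting. Illustrative only, not the project's code.
# Requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
from collections import Counter, defaultdict

import nltk

def train(tokens):
    tagged = nltk.pos_tag(tokens)
    word_bigrams = defaultdict(Counter)  # word -> counts of next words
    pos_bigrams = defaultdict(Counter)   # POS  -> counts of next POS tags
    tags = {}                            # word -> most recent POS tag seen
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        word_bigrams[w1][w2] += 1
        pos_bigrams[t1][t2] += 1
        tags[w1], tags[w2] = t1, t2
    return word_bigrams, pos_bigrams, tags

def predict(prev_word, word_bigrams, pos_bigrams, tags):
    prev_tag = tags.get(prev_word)
    scored = {}
    for word, count in word_bigrams[prev_word].items():
        # Multiplier strategy: weight each candidate's bigram count by
        # how often its POS tag follows the previous word's POS tag
        # (add-one smoothing so unseen transitions are not zeroed out).
        pos_weight = pos_bigrams[prev_tag][tags[word]] + 1
        scored[word] = count * pos_weight
    return max(scored, key=scored.get) if scored else None

tokens = nltk.word_tokenize(open("corpus.txt").read().lower())  # hypothetical corpus
word_bigrams, pos_bigrams, tags = train(tokens)
print(predict("the", word_bigrams, pos_bigrams, tags))
```

An excluder-style strategy would presumably drop candidates whose POS transition is unattested, rather than merely down-weighting them as the multiplier does.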