User-Centric Thinking in Data Science: Conversation with Anna Wu at Google Cloud (Part 3 of 3)

I interviewed Anna Wu, a UX researcher and data scientist overseeing Google Cloud’s Compute Engine. In this final part of the conversation, we discuss how design thinking may useful within data science and machine learning.

Here is the first interview if you would like to start from scratch, and here is more information about Interview Series that this is a part of.

Here is Part 1 and Part 2 of our interview.

Anna Wu, established leader in building and leading high-performing data teams to drive changes impacting hundreds of millions of users. Currently as a research manager at Google, she leads a team of quantitative UX researchers applying UX methods and large scale analytics to inform Cloud product development. 

Before this recent chapter, Anna had 10+ years practicing UX and data science at top IT companies and research labs as a UX researcher, data scientist, research scientist at Microsoft, IBM Research and Palo Alto Research Center. She got her PhD in HCI from Penn State and master/bachelor degrees from Tsinghua University.

Resources:

User-Centric Thinking in Data Science: Conversation with Anna Wu at Google Cloud (Part 1 of 3)

I interviewed Anna Wu, a UX researcher and data scientist overseeing Google Cloud’s Compute Engine, as the next installment of my Interview Series,. In this first part of our conversatoin, she discusses her journey from mechanical engineering into UX research and data science and the importance of effective storytelling within these two fields.

Here is Part 2 and Part 3 of our interview.

Anna Wu, established leader in building and leading high-performing data teams to drive changes impacting hundreds of millions of users. Currently as a research manager at Google, she leads a team of quantitative UX researchers applying UX methods and large scale analytics to inform Cloud product development. 

Before this recent chapter, Anna had 10+ years practicing UX and data science at top IT companies and research labs as a UX researcher, data scientist, research scientist at Microsoft, IBM Research and Palo Alto Research Center. She got her PhD in HCI from Penn State and master/bachelor degrees from Tsinghua University.

Resources mentioned:

User-Centric Thinking in Data Science: Conversation with Anna Wu at Google Cloud (Part 2 of 3)

In this second part of my interview with Anna Wu, she describes the interconnections between data science and qualitative UX research.

Here is the first interview if you would like to start from scratch, and here is more information about Interview Series that this is a part of.

Here is Part 1 and Part 3 of our interview.

Anna Wu, established leader in building and leading high-performing data teams to drive changes impacting hundreds of millions of users. Currently as a research manager at Google, she leads a team of quantitative UX researchers applying UX methods and large scale analytics to inform Cloud product development. 

Before this recent chapter, Anna had 10+ years practicing UX and data science at top IT companies and research labs as a UX researcher, data scientist, research scientist at Microsoft, IBM Research and Palo Alto Research Center. She got her PhD in HCI from Penn State and master/bachelor degrees from Tsinghua University.

Resources mentioned:

Three Key Differences between Data Science and Statistics

woman draw a light bulb in white board

Data science’s popularity has grown in the last few years, and many have confused it with its older, more familiar relative: statistics. As someone who has worked both as a data scientist and as a statistician, I frequently encounter such confusion. This post seeks to clarify some of the key differences between them.

Before I get into their differences, though, let’s define them. Statistics as a discipline refers to the mathematical processes of collecting, organizing, analyzing, and communicating data. Within statistics, I generally define “traditional” statistics as the the statistical processes taught in introductory statistics courses like basic descriptive statistics, hypothesis testing, confidence intervals, and so on: generally what people outside of statistics, especially in the business world, think of when they hear the word “statistics.”

Data science in its most broad sense is the multi-disciplinary science of organizing, processing, and analyzing computational data to solve problems. Although they are similar, data science differs from both statistics and “traditional” statistics:

DifferenceStatistics Data Science
#1 Field of Mathematics Interdisciplinary
#2 Sampled Data Comprehensive Data
#3 Confirming Hypothesis Exploratory Hypotheses

Difference #1: Data Science Is More than a Field of Mathematics

Statistics is a field of mathematics; whereas, data science refers to more than just math. At its simplest, data science centers around the use of computational data to solve problems,[i] which means it includes the mathematics/statistics needed to break down the computational data but also the computer science and engineering thinking necessary to code those algorithms efficiently and effectively, and the business, policy, or other subject-specific “smarts” to develop strategic decision-making based on that analysis.

Thus, statistics forms a crucial component of data science, but data science includes more than just statistics. Statistics, as a field of mathematics, just includes the mathematical processes of analyzing and interpreting data; whereas, data science also includes the algorithmic problem-solving to do the analysis computationally and the art of utilizing that analysis to make decisions to meet the practical needs in the context. Statistics clearly forms a crucial part of the process of data science, but data science generally refers to the entire process of analyzing computational data. On a practical level, many data scientists do not come from a pure statistics background but from a computer science or engineering, leveraging their coding expertise to develop efficient algorithmic systems.

laptop computer on glass-top table

Difference #2: Comprehensive vs Sample Data

In statistical studies, researchers are often unable to analyze the entire population, that is the whole group they are analyzing, so instead they create a smaller, more manageable sample of individuals that they hope represents the population as a whole. Data science projects, however, often involves analyzing big, summative data, encapsulating the entire population.

 The tools of traditional statistics work well for scientific studies, where one must go out and collect data on the topic in question. Because this is generally very expensive and time-consuming, researchers can only collect data on a subset of the wider population most of the time.

Recent developments in computation, including the ability to gather, store, transfer, and process greater computational data, have expanded the type of quantitative research now possible, and data science has developed to address these new types of research. Instead of gathering a carefully chosen sample of the population based on a heavily scrutinized set of variables, many data science projects require finding meaningful insights from the myriads of data already collected about the entire population.

stack of jigsaw puzzle pieces

Difference #3: Exploratory vs Confirming  

Data scientists often seek to build models that do something with the data; whereas, statisticians through their analysis seek to learn something from the data. Data scientists thus often assess their machine learning models based on how effectively they perform a given task, like how well it optimizes a variable, determines the best course of action, correctly identifies features of an image, provides a good recommendation for the user, and so on. To do this, data scientists often compare the effectiveness or accuracy of the many models based on a chosen performance metric(s).

In traditional statistics, the questions often center around using data to understand the research topic based on the findings from a sample. Questions then center around what the sample can say about the wider population and how likely its results would represent or apply to that wider population.

In contrast, machine learning models generally do not seek to explain the research topic but to do something, which can lead to very different research strategy. Data scientists generally try to determine/produce the algorithm with the best performance (given whatever criteria they use to assess how a performance is “better”), testing many models in the process. Statisticians often employ a single model they think represents the context accurately and then draw conclusions based on it.

Thus, data science is often a form of exploratory analysis, experimenting with several models to determine the best one for a task, and statistics confirmatory analysis, seeking to confirm how reasonable it is to conclude a given hypothesis or hypotheses to be true for the wider population.

A lot of scientific research has been theory confirming: a scientist has a model or theory of the world; they design and conduct an experiment to assess this model; then use hypothesis testing to confirm or negate that model based on the results of the experiment. With changes in data availability and computing, the value of exploratory analysis, data mining, and using data to generate hypotheses has increased dramatically (Carmichael 126).

Data science as a discipline has been at the forefront of utilizing increased computing abilities to conduct exploratory work.

person holding gold-colored pocket watch

Conclusion

 A data scientist friend of mine once quipped to me that data science simply is applied computational statistics (c.f. this). There is some truth in this: the mathematics of data science work falls within statistics, since it involves collecting, analyzing, and communicating data, and, with its emphasis and utilization of computational data, would definitely be a part of computational statistics. The mathematics of data science is also very clearly applied: geared towards solving practical problems/needs. Hence, data science and statistics interrelate.

They differ, however, both in their formal definitions and practical understandings. Modern computation and big data technologies have had a major influence on data science. Within statistics, computational statistics also seeks to leverage these resources, but what has become “traditional” statistics does not (yet) incorporate these. I suspect in the next few years or decades, developments in modern computing, data science, and computational statistics will reshape what people consider “traditional” or “standard” statistics to be a bit closer to the data science of today.

   For more details, see the following useful resources:

Ian Carmichael’s and J.S. Marron’s “Data science vs. statistics: two cultures?” in the Japanese Journal of Statistics and Data Science: https://link.springer.com/article/10.1007/s42081-018-0009-3
“Data Scientists Versus Statisticians” at https://opendatascience.com/data-scientists-versus-statisticians/ and https://medium.com/odscjournal/data-scientists-versus-statisticians-8ea146b7a47f
“Differences between Data Science and Statistics” at https://www.educba.com/data-science-vs-statistics/

Photo credit #1: Andrea Piacquadio at https://www.pexels.com/photo/woman-draw-a-light-bulb-in-white-board-3758105/

Photo credit #2: Carlos Muza at https://unsplash.com/photos/hpjSkU2UYSU

Photo credit #3: Hans-Peter Gauster at https://unsplash.com/photos/3y1zF4hIPCg

Photo credit #4: Kendall Lane at https://unsplash.com/photos/yEDhhN5zP4o


[i] Carmichael 118.

Data Science and the Myth of the “Math Person”

woman holding books

“Data science is doable,” a fellow attendee of the EPIC’s 2018 conference in Honolulu would exclaim like a mantra. The conference was for business ethnographers and UX researchers interested in understanding and integrating data science and machine learning into their research. She was specifically trying to address a tendency she has noticed– which I have seen as well: qualitative researchers and other so-called “non-math people” frequently believe that data science is far too technical for them. This seems ultimately rooted in cultural myths about math and math-related fields like computer science, engineering, and now data science, and in a similar vein as her statement, my goal in this essay is to discuss these attitudes and show that data science, like math, is relatable and doable if you treat it as such.

The “Math Person”

In the United States, many possess an implied image of a “math person:” a person supposedly naturally gifted at mathematics. And many who do not see themselves as fitting that image simply decry that math simply isn’t for them. The idea that some people are inherently able and unable to do math is false, however, and prevents people from trying to become good at the discipline, even if they might enjoy and/or excel at it.

Most skills in life, including mathematical skills, are like muscles: you do not innately possess or lack that skill, but rather your skill develops as you practice and refine that activity. Anybody can develop a skill if they practice it enough.  

Scholars in anthropology, sociology, psychology, and education have documented how math is implicitly and explicitly portrayed as something some people can do and some cannot do, especially in math classes in grade school. Starting in early childhood, we implicitly and sometimes explicitly learn the idea that some people are naturally gifted at math but for others, math is simply not their thing. Some internalize that they are gifted at math and thus take the time to practice enough to develop and refine their mathematical skills; while others internalize that they cannot do math and thus their mathematical abilities become stagnant. But this is simply not true.

Anyone can learn and do math if he or she practices math and cultivates mathematical thinking. If you do not cultivate your math muscle, then well it will become underdeveloped and, then, yes, math becomes harder to do. Thus, as a cruel irony someone internalizing that he or she cannot do math can turn into a self-fulfilling prophecy: he or she gives up on developing mathematical skills, which leads to its further underdevelopment.

Similarly, we cultivate another false myth that people skilled in mathematics (or math-related fields like computer science, engineering, and data science) in general do not possess strong social and interpersonal communication skills. The root for this stereotype lies in how we think of mathematical and logical thinking than actual characteristics of mathematicians, computer scientists, or engineers. Social scientists who have studied the social skills of mathematicians, computer scientists, and engineers have found no discernable difference in social and interpersonal communication skills with the rest of the world.  

Quantitative and Qualitative Specialties

Anyone can learn and do math if he or she practices math and cultivates mathematical thinking.

The belief that some people are just inherently good at math and that such people do not possess strong social and interpersonal communication skills contributes to the division between quantitative and qualitative social research, in both academic and professional contexts. These attitudes help cultivate the false idea that quantitative research and qualitative research are distinct skill sets for different types of people: that supposedly quantitative research can only be done “math people” and qualitative research by “people people.” They suddenly become separate specialties, even though social research by its very nature involves both. Such a split unnecessarily stifles authentic and holistic understanding of people and society.

In professional and business research contexts, both qualitative and quantitative researchers should work with each other and eventually through that process, slowly learn each other’s skills. If done well, this would incentivize researchers to cultivate both mathematical/quantitative, and interpersonal/qualitative research skills.

It would reward professional researchers who develop both skillsets and leverage them in their research, instead of encouraging researchers to specialize in one or the other. It could also encourage universities to require in-depth training of both to train their students to become future workers, instead of requiring that students choose among disciplines that promote one track over the other.

Working together is only the first step, however, whose success hinges on whether it ultimately leads to the integration of these supposedly separate skillsets. Frequently, when qualitative and quantitative research teams work together, they work mostly independently – qualitative researchers on the qualitative aspect of the project and quantitative researchers on the quantitative aspects of the project – thus reinforcing the supposed distinction between them. Instead, such collaboration should involve qualitative researchers developing quantitative research skills by practicing such methods and quantitative researchers similarly developing qualitative skills.

Conclusion

Anyone can develop mathematics and data science skills if they practice at it. The same goes with the interpersonal skills necessary for ethnographic and other qualitative research. Depicting them as separate specialties – even if they come together to do each of their specialized parts in a single research projects – functions stifles their integration as a singular set of tools for an individual and reinforces the false myths we have been teaching ourselves that data science is for math, programming, or engineering people and that ethnography is for “people people.” This separation stifles holistic and authentic social research, which inevitably involves qualitative and quantitative approaches.

Photo credit #1: Andrea Piacquadio at https://www.pexels.com/photo/woman-holding-books-3768126/

Photo credit #2: Antoine Dautry at https://unsplash.com/photos/_zsL306fDck

Photo credit #3: Mike Lawrence at https://www.flickr.com/photos/157270154@N05/28172146158/ and http://www.creditdebitpro.com/

Photo credit #4: Ryan Jacobson at https://unsplash.com/photos/rOYhgmDIOg8