Articles

When Is Machine Learning Useful?

In a past blog post, I defined and described what machine learning is. I briefly highlighted four instances where machine learning algorithms are useful. This is what I wrote:

  1. Autonomy: To teach computers to do a task without the direct aid/intervention of humans (e.g. autonomous vehicles)
  2. Fluctuation: Help machines adjust when the requirements and data change over time
  3. Intuitive Processing: Conduct or assist in tasks humans do but are unable to explain how computationally/algorithmically (e.g. image recognition)
  4. Big Data: Breaking down data that is too large to handle otherwise

The goal of this blog post is to explain each in more detail.

Case #1: Autonomy

[Image: 3D illustration of a self-driving car]

The first major use of machine learning centers around teaching computers to do a task or tasks without the direct aid or intervention of humans. Self-driving vehicles are a high-profile example: teaching a vehicle to drive (scanning the road and determining how to respond to what is around it) with no or minimal direct oversight from a human driver.

There are two basic types of tasks that machine learning systems might perform autonomously:

  1. Tasks humans frequently perform
  2. Tasks humans are unable to perform

Self-driving cars exemplify the former: humans drive cars, but self-driving cars would perform all or part of the driving process. Another example would be chatbots and virtual assistants like Alexa, Cortana, and Ok Google, which seek to converse with users independently. Such systems might take over the human activity completely or only partially: for example, some customer service chatbots are designed to determine the customer’s issue on their own but to transfer to a human when the issue reaches a certain complexity.

Humans have also sought to build autonomous machine learning algorithms to perform tasks that humans cannot perform themselves. Unlike self-driving cars, which perform an activity many people already do, a self-driving rover or submarine can operate in environments that humans have so far been unable to inhabit, like other planets in our Solar System or the deep ocean. Search engines are another example: Google uses machine learning to help refine search results, which involves analyzing a volume of web data far beyond what a human could process.

Case #2: Fluctuating Data

[Image: hand drawing an upward trend curve]

Machine learning is also a powerful tool for making sense of and incorporating fluctuating data. Unlike models with fixed rules for how they predict their values, machine learning models can learn from current patterns and adjust, whether the patterns fluctuate over time or new use cases arise. This is especially helpful when forecasting the future, since it allows the model to decipher new trends if and when they emerge. For example, when predicting stock prices, a machine learning algorithm can learn from new data and pick up changing trends, making the model better at predicting the future.
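
To make this concrete, here is a minimal sketch in Python of a model that keeps adjusting as new data arrives, assuming scikit-learn’s SGDRegressor and invented synthetic data rather than any real stock feed:

  # Incremental learning: update the existing model with each new batch of
  # data instead of refitting from scratch (synthetic data for illustration).
  import numpy as np
  from sklearn.linear_model import SGDRegressor

  rng = np.random.default_rng(0)
  model = SGDRegressor(learning_rate="constant", eta0=0.01)

  for week in range(52):
      # Pretend each week brings fresh observations whose trend slowly drifts.
      X = rng.normal(size=(100, 3))
      drift = 0.05 * week
      y = X @ np.array([1.0 + drift, -0.5, 0.2]) + rng.normal(scale=0.1, size=100)
      model.partial_fit(X, y)  # adjust the model; no full retraining

  print(model.coef_)  # the coefficients have tracked the drifting trend

A model trained once and then frozen would still reflect the week-one trend; the incrementally updated one follows the drift.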

Of course, humans are notorious for changing over time, so handling fluctuation is often helpful in models that seek to understand human preferences and behavior. For example, user recommendation systems – like Netflix’s, Hulu’s, or YouTube’s video recommenders – adjust based on usage over time, enabling them to respond to individual and/or collective changes in interests.

Case #3: Intuitive Processing

[Image: facial recognition system scanning a woman’s face]

Data scientists frequently develop machine learning algorithms to teach computers to do processes that humans perform naturally but cannot fully explain computationally or algorithmically. For example, popular applications of machine learning center around replicating some aspect of sensory perception: image recognition, sound or speech recognition, and so on. These replicate the process of taking in sensory information (e.g. sight and sound) and processing, classifying, and otherwise making sense of that information. Language processing, as in chatbots, is another example. In these contexts, machine learning algorithms learn a process that humans do intuitively (seeing or hearing stimuli and understanding language) but are unable to fully explain how or why.

Many early forms of machine learning arose out of neurological models of how human brains work. The initial intention of neural nets, for instance, was to model our neurological decision-making process or processes. Much contemporary neurological scholarship has since disputed the accuracy of neural nets as representations of how our brains and minds work.[i] But whether or not they represent how human minds work, neural networks have proven a powerful technique for computers to process and classify information and make decisions. Likewise, many machine learning algorithms replicate some activity humans do naturally, even if the way they conduct that human task has little to do with how humans would.

Case #4: Big Data

[Image: abstract 5G technology background]

Machine learning is a powerful tool for analyzing data that is too large to break down through conventional computational techniques. Recent computer technologies have dramatically increased our capacity for data collection, storage, and processing, a major driver of big data. Machine learning has arisen as a major, if not the major, means of analyzing this big data.

Machine learning algorithms can manage a dizzying array of variables and use them to find insightful patterns (like lasso regression for linear modeling). Many big data cases involve hundreds, thousands, or even hundreds of thousands of input variables, and several machine learning techniques (like best subsets selection, stepwise selection, and lasso regression) sift through these myriad variables and determine the best ones to use.
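
As an illustration, here is a small sketch of lasso regression pruning variables, assuming scikit-learn and synthetic data in which only two of fifty inputs carry real signal:

  # Lasso shrinks the coefficients of uninformative variables to zero,
  # effectively selecting the few that matter (synthetic data for illustration).
  import numpy as np
  from sklearn.linear_model import Lasso

  rng = np.random.default_rng(42)
  X = rng.normal(size=(500, 50))  # 50 candidate input variables
  y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=500)  # only 2 matter

  model = Lasso(alpha=0.1).fit(X, y)
  kept = np.flatnonzero(model.coef_)  # variables with nonzero coefficients
  print(kept)  # roughly [0, 1]: the two real signals survive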

Recent developments in computing provide the incredible processing power necessary to do such work (and, debatably, machine learning is currently helping to drive demand for still greater computational abilities). Hand calculations and the computers of several decades ago often could not handle the computations necessary to analyze large datasets. For example, computer scientists invented the now popular neural networks decades ago, but the method did not gain traction until recent processing power made neural nets practical and worthwhile to run.

Tractors and other large-scale agricultural technologies coincided historically with the enlargement of farm property sizes: such machinery not only allowed farmers to manage large tracts of land but also made larger farms economically attractive. Likewise, machine learning algorithms provide the main technological means to analyze big data, both enabling and in turn being incentivized by the rise of big data in the professional world.

Conclusion

Here I have described four major uses of machine learning algorithms. Machine learning has become popular in many industries because of at least one of these functionalities, but of course, these are not its only potential uses. In addition, as we develop machine learning tools, we are constantly inventing more. Given machine learning’s newness compared to many century-old technologies, time will tell all the ways humans end up utilizing it.

Photo credit #1: Mike MacKenzie at https://www.flickr.com/photos/mikemacmarketing/30212411048/

Photo credit #2: julientromeur at https://pixabay.com/illustrations/car-automobile-3d-self-driving-4343635/

Photo credit #3: geralt at https://pixabay.com/illustrations/business-success-curve-hand-draw-1989130/

Photo credit #4: geralt at https://pixabay.com/illustrations/flat-recognition-facial-face-woman-3252983/

Photo credit #5: mohamed_hassan at https://pixabay.com/illustrations/technology-5g-aerial-4816658/


[i] See Nagyfi, Richard. The Differences between Artificial and Biological Neural Networks. 4 September 2018. https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7; and Tcheang, Lili. Are Artificial Neural Networks like the Human Brain? And Does It Matter? 7 November 2018. https://medium.com/digital-catapult/are-artificial-neural-networks-like-the-human-brain-and-does-it-matter-3add0f029273.

Four Lessons in Time Management: What Graduate School Taught Me about Time Management

[Image: three round analog clocks]

I am a Type-A personality who likes to do a variety of different activities yet cannot help but give each of them my all. Through this, I have learned a ton about time management. In particular, from 2017 to 2019, I was in graduate school at the University of Memphis while working as both a data scientist and a user researcher. I was easily working 70-90 hours a week.

Necessity is often the best teacher, and during this trial by fire, I figured out how to manage my time efficiently and effectively. Here are four personal lessons I learned for how to manage time well:

Lesson #1: Rest Effectively
Lesson #2: Work in Short-Term Sprints
Lesson #3: Complete Tasks during the Optimal Time of Day
Lesson #4: Leverage Different Types of Tasks to Replenish Myself

Lesson #1: Rest Effectively

Developing an effective personal rhythm in which I had time to both work and relax throughout the day was necessary to ensure that I could work productively.

When many people think about time management (or at least when I do), they often focus on strategies/techniques to be productive during work time. Managing one’s time while working is definitely important, but I have found that resting and recuperating effectively is by far the single most important practice for working productively.

[Image: woman meditating on a wooden floor]

Several different activities help me relax: taking walks, exercising, hanging out with friends and colleagues, reading, watching videos, etc. People have a variety of ways to relax, so maybe some of those are great for you, and maybe you do something else entirely.

Generally, to relax I chose an activity that contrasted with and complemented the work I had just been doing. For example, if my work was interviewing people – which I did frequently as a user researcher – I would unwind with quiet, solitary tasks like walking or reading; if my work was solitary, like programming or writing a paper, I might unwind by socializing with others. Relaxing with a different type of activity from my work allowed me to rest and rejuvenate from the specific strains of that work activity.

I have seen a tendency in parts of U.S. work/business culture to constantly push to do more. The goal is usually productivity – that is, to get more done – and it makes sense to think that doing more will, well, lead to getting more things done.

That is only true up to a point, though, at least for me. There comes a point when trying to do more actually prevents me from getting more done. Taking enough time to rest and recuperate instead unwinds my mind so that when I am working, I am ready to go. This leads to greater productivity on all counts:

  1. Quantity: I can complete a greater number of tasks
  2. Quality: The tasks I complete are of better quality
  3. Efficiency: It takes me a lot less time to complete the same task

I think the idea that doing more work leads to greater productivity is a major myth in the modern U.S. workforce. In practice, it leads to overwork, stress, and inefficiency, stifling genuine productivity.

Self-care through incorporating rest into my work rhythm has been necessary not only for my mental health but also for being a productive worker. In discussions around self-care, I have often seen a juxtaposition between being more productive and taking care of oneself, but those two concerns reinforce rather than contradict each other. Overworking without taking enough time to recuperate prevents me from being an effective and productive worker. The real question is how to cultivate life-giving and rejuvenating practices and disciplines so that I can become productive and stay that way.

Lesson #2: Work in Short-Term Sprints

I developed a practice of completing tasks in twenty-five-minute chunks. I would set the timer for twenty-five minutes and work intensely without stopping on the given task/project until the time was up. (My technique has some similarities with the Pomodoro Technique, but without as many rules or requirements.) I realized that twenty-five minutes was how long I could mentally work continuously on a single task without thinking about something else or needing a break. After that, I would start to get tired and inefficient, so giving myself a break let me unwind and rejuvenate.

After one of these twenty-five-minute sprints, I would take a break of at least five minutes: walk around, watch an interesting video, talk with a colleague or friend, whatever I needed to do to unwind. These breaks gave my brain the time it needed to process what I had been doing and reenergize for the next task. Since my day was made up of several of these sprints, after the first one or two I might take only a five-minute break, but after a few more I would take a longer one, since I had more to unwind from.

A crucial skill for this practice has been breaking down a given project into pieces I could complete in the timed chunks. For some projects, I would designate a short-term task or goal to complete in the twenty-five minutes. With my course readings, for example, I generally had to submit a summary and analysis of the readings. Thus, my goal during each twenty-five-minute sprint would be to finish one article or chapter – both reading it and writing the summary and analysis. I would start by reading the most significant subsections, generally the introduction and conclusion, summarizing and analyzing as I read. That generally took up half of the twenty-five minutes, so in whatever time remained, I would read the other sections.

This provided enough time to get a sense of the reading’s argument and complete the assignment, even in the off chance that I did not finish reading the entire article. In only twenty-five minutes, I would knock out a whole reading, including my summary and analysis: one less task to worry about. Spending twenty-five minutes a day is not much of a burden either. Doing this, I would complete all the readings for my courses within the first few weeks of the semester, freeing up time over the following months when my other work picked up.

[Image: aerial photograph of a mountain ridge]

I could not split every activity into short-term tasks to complete in twenty-five minutes, though. For those, the trick was to estimate how much time the overall task would take. For example, if my supervisor gave me a month to complete a project, I would calculate how many twenty-five-minute slots I needed per day based on the total hours I expected the project to require (a project estimated at twenty hours across twenty working days, say, works out to forty-eight twenty-five-minute slots, or two to three per day).

Data science projects are notoriously nonlinear, meaning that I could almost never break them into sets of twenty-five-minute tasks but instead had to budget total time in this way. The various parts of a data science project – data cleaning, building the model(s), and then improving/refining said model – could take widely different amounts of time and often fed into each other anyway. My first data science projects were the hardest to estimate, but after doing many of them, I developed an intuitive sense of how much time to budget.

[Image: toddler standing in front of a concrete stair]

The fear of a blank page and the resulting procrastination were major issues I had to overcome when starting a project. At the beginning of a project, before I had broken down the task and determined the best strategy for completing it, focusing could be difficult. If I was not careful, the stress of the blank page or the complete openness of the new project could distract me into wanting to do something else instead. In more extreme cases, this could lead to procrastinating on getting started at all.

To get my ideas on paper, during the first twenty-five-minute sprint of a new task, I would look through all my materials and brainstorm how to complete it. From this, I would develop an initial to-do list of items for the ensuing sprints. Even though my to-do list almost always changed over time, this allowed me to get started. The most important caveat was to schedule that planning session for a time when I could handle such an open-ended task (something I discuss in more detail in Lesson #3).

I also addressed my tendency to procrastinate by creating my own stricter deadlines for when a project was due. Extreme procrastination (putting off starting or completing something until the last minute, when you must rush to finish in the final hours before the deadline) would destroy my productivity. Working in a mad rush would break the balance between work and rest, discussed in Lesson #1, that I needed to work productively. And when I had many tasks, rushing last minute on one project would prevent me from working ahead on future projects, causing me to fall behind on them and creating a vicious cycle of procrastination.

Thus, I would set my own deadline a week or two before a project’s actual deadline. For example, if I had four weeks to write an assignment, I would set my own three-week deadline for a presentable draft, and no matter what, I would meet it. I treated this like the real deadline and never missed it. The presentable draft might not be perfect or amazing yet, but it would be something I would feel comfortable turning in if pressed: a solid B or B- version, not the A or A+ awesomeness my perfectionist self prefers. It might need a round or two of proofreading to smooth out kinks, but all the basic components of the assignment would be done. That way, if I became too busy with other projects to do that proofreading, it was still good enough to turn in.

In the remaining week, I would work out those minor issues, combing through the draft a few more times to bring it to top quality; but if another, higher-priority project or issue arose during that final week and demanded more of my attention than I had anticipated, I still had something to turn in. By staying ahead with an adequate draft, I never had to rush to finish an assignment at the last minute, and being a week or so ahead provided a cushion to absorb unforeseeable issues without falling behind. Through this, I never missed a single deadline despite working multiple jobs and being a full-time student.

Lesson #3: Complete Tasks during the Optimal Time of Day

I have found that certain types of activities are easier for me during certain times of the day. Being a morning person, I do my best work first thing in the morning. Thus, I would perform my most open-ended, creative, and strategic tasks – brainstorming and breaking down a new project, solving an open-ended problem, writing an essay or report – then.

In the early afternoon, I would try to schedule meetings and interviews (when that worked with other people’s schedules, of course). In the late afternoon and evening, I would complete the more menial, plug-and-chug parts of a project that needed less intense thought and more rote implementation of what I came up with that morning, like writing the code for an algorithm I had mapped out or proofreading a paper I had already written.

This ensured that I was fresh and efficient when doing complex, open-ended tasks and not wasting time and energy forcing myself through them during the times of day when I am naturally tired, slower, and less efficient.

Lesson #4: Leverage Different Types of Tasks to Replenish Myself

As both a data scientist and an anthropologist, I have had to do a wide variety of tasks using many different skills, ranging from talking with and interviewing people to math proofs and programming to scholarly and non-fiction writing. This variety has been something I could use to replenish myself. Each of these activities is in and of itself stimulating to me, but doing any one of them exclusively for long periods would become draining after a while.

In agriculture, certain crops use up certain nutrients in the soil (corn, for example, depletes nitrogen especially heavily), so farmers often rotate crops to replenish the nutrients the previous crop consumed. Likewise, I found rotating between several different types of activities helpful for rejuvenating and replenishing my mind after the previous activity.

If I had done a series of very logical tasks like math or programming, I might replenish with a social task next, like interviewing or meeting with people; and if I had interviewed people for several hours, I would break from that by doing something solitary like programming or writing. I used these rotations strategically to rest from one activity while still practicing and developing other skill sets.

Conclusion

These are the lessons I learned for how to sustain myself while working 70-90-hour weeks. The first lesson was crucial: developing an effective rhythm between work and rest that enabled me to work productively, efficiently, and sustainably. The other three were my specific strategies for creating that rhythm. I developed and refined them during intense, busy periods of my life in order to keep producing high-quality work while maintaining my sanity. Hopefully, they are helpful food for thought for anyone else trying to develop his or her own time-management strategies.

Photo credit #1: Karim MANJRA at https://unsplash.com/photos/dtSCKE9-8cI

Photo credit #2: Jared Rice at https://unsplash.com/photos/NTyBbu66_SI

Photo credit #3: Carl Heyerdahl at https://unsplash.com/photos/KE0nC8-58MQ

Photo credit #4: Allie Smith at https://unsplash.com/photos/eXGSBBczTAY

Photo credit #5: NeONBRAND at https://unsplash.com/photos/KYxXMTpTzek

Photo credit #6: Alex Siale at https://unsplash.com/photos/qH36EgNjPJY

Photo credit #7: Jukan Tateisi at https://unsplash.com/photos/bJhT_8nbUA0

Photo credit #8: Ksenia Makagonova at https://unsplash.com/photos/Vq-EUXyIVY4

Photo credit #9: Dawid Zawila at https://unsplash.com/photos/-G3rw6Y02D0

Photo credit #10: Dennis Jarvis at https://www.flickr.com/photos/archer10/3555040506/

Methodological Complementarianism: Being the Mix in Mixed Methods

[Image: women at a meeting. Photo by RF._.studio on Pexels.com]

I wrote this essay as the midterm for a course on conducting program evaluation as an anthropologist, taught by Dr. Michael Duke in the University of Memphis Anthropology Master’s program. In it, I synthesize Donna Mertens’s discussion of employing mixed methods research for program evaluation work in her book Mixed Methods Design in Evaluation as a way to present the need for what I call methodological complementarianism.

Methodological complementarianism involves complementing the team one is working with by advocating for the complementary perspectives that the team needs. When conducting transdisciplinary work as applied anthropologists, instead of explicitly or implicitly seeking to maintain a “pure” anthropological approach, I think we should be more willing to produce something new in that environment, even if it no longer fits the boundaries of proper anthropology or ethnography but is rather some kind of hybrid emerging from the needs of the situation. Methodological complementarianism is one practical way to do that, and one I have been exploring.

The Stages of Learning a New Data or Programming Skill

Many people have admirably sought to learn data science, data analytics, a programming language, or some other data or programming skill in order to develop themselves professionally and/or pursue a new career path. Excitingly, learning such skills has become significantly easier to do online. But online learning can also foster unrealistic expectations of what learning one of these skills entails, since it removes prospective learners from the physical community of experts who would otherwise introduce them to the expectations of the field.

The goal of this article is to help rectify that by explaining the basic stages typically needed to develop mastery of a new data or programming skill. This will hopefully inform your high-level expectations of what learning the skill entails and also help you choose the right course or set of courses to ensure you develop through all three stages.

By data skill, I mean any data field like data science, data analytics, or data engineering, or any specific skill or practice within such a field that someone might seek to learn; by programming skill, I mean the skills necessary to learn and code in a programming language.

These are the three basic learning stages to master any of these topics:

Stage 1: Grasp the basic concepts of the topic
Stage 2: Complete a guided project
Stage 3: Complete a self-directed project

Stage 1: Grasping Basic Concepts

Grasping basic concepts entails learning the relevant vocabulary, syntax, and key approaches. Often programs teach each concept distinctly, one at a time. For example, when learning a new programming language, you might learn the major commands and syntax rules, and for data science, you might learn about each of the most prominent machine learning models one at a time.

This is different from applying the concepts widely, and at this stage, you may not be able to handle mixing all the concepts together in a complex problem yet (that’s Stage 2). Programs often teach the material at this point sequentially (even though that can be difficult for nonlinear learners).

For example, W3Schools provides grounded Stage 1 teaching for most programming languages and data science skills. They provide sequential exercises working through the basic syntax components of a new language, ever so slightly increasing in complexity along the way.
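
To give a flavor of a Stage 1 exercise, here is a generic Python snippet of my own invention (not taken from W3Schools), where each line drills one isolated concept:

  # One concept at a time: a list, a loop, and a string method.
  fruits = ["apple", "banana", "cherry"]
  for fruit in fruits:
      print(fruit.upper())  # prints APPLE, BANANA, CHERRY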

Completing the first stage alone does not amount to full mastery of the topic. After practicing each piece one at a time, you must also transition into Stage 2, where you start to learn how to combine them to complete a more complex problem.

Stage 2: Guided Project

Here you practice putting all the pieces together through one or more guided projects. A guided project is a model for how the components fit together in an actual project. I liken these to building a Lego kit: following step-by-step instructions to build a cool model (instead of building your own object from scratch, which is Stage 3). They hold your hand through completion to illustrate what putting all the isolated skills and concepts together in a complicated project entails.

Stage 3: Independent Project

In the third stage, you bring everything you have learned together to complete a project on your own. Unlike in Stage 2, where your hand was held, you now have the freedom to struggle, which is necessary for learning. You develop the skills involved in forming and carrying out a project on your own.

At the same time, you learn what it looks like to implement those skills “in the wild” of a real-life project. In the previous stages, instructors often coddle their students, providing the cleaned, perfectly ready-to-go example problems you might find in a textbook, which are necessary for learning the basic concepts. Like a Lego kit, the components of the project have been groomed to produce what you are making. In Stage 3, you start to experience the messiness common in real-world projects, where you have to find the pieces you need and/or figure out how to make do with the ones you have.

For example, among data science learners, this stage is when students first learn to deal with the complexities of finding the right data for their problem, determining the best questions for a given dataset, and/or cleaning inconsistent data. Beforehand, most examples probably came with already-cleaned data matched to the specific task they were built for.

A certain amount of trial by fire is often needed to learn how to develop your own project. Your instructor(s) might take more of a backseat role during this process: looking over what you have done, answering any questions you might have, and nudging you when necessary. In my experience, exploring strategies yourself is the best way to learn Stage 3. Hopefully, at the end of it all, you will produce a nifty project that you can show prospective employers or whoever else you might wish to impress.

Conclusion

These are the three most common stages for developing initial mastery of a new data or programming skill or field. They provide the learning generally necessary to pick up the new skill, but there are plenty of further levels of learning after you complete them. For example, grasping basic data science concepts, completing a guided project, and learning how to conduct your own self-directed data science project would be enough to make you a new inductee into the data science community, but you would still be a newbie data scientist. It is only the tip of the iceberg for what you can learn and how you will grow as a data scientist.

Now, despite calling them stages, not everyone learns them in sequential order, especially given the variety of circumstances and learning styles. For example, some might complete all three stages for a specific subset of skills in the field they are learning and then go back to Stage 1 for another subset. Most education programs, though, include all three stages, more or less in order.

Some education programs, however, might lack or provide insufficient resources for one or two stages. Assessing whether a program adequately includes all three can be an effective way to determine how good it is at teaching and whether it is worth your money and/or time. When choosing to learn a new skill, I would recommend a program or combination of programs that includes all three. If a program you want to take or are currently completing lacks one or two of these stages, you can try to find another (hopefully free) way to complete that stage yourself online. For example, online courses and tutorials very frequently fail to provide Stage 3 (and in some cases, Stage 2), so after you complete one, I would recommend finding a project to work on.

Finally, when you encounter a difficulty while learning, it might be because you need to go back to a previous stage. For example, when many learners move to Stage 2, they must periodically swing back into Stage 1 to review a few core concepts when they see those concepts applied in a new way. Similarly, when completing a project in Stage 3, there is nothing wrong with reviewing Stage 2 or even Stage 1 materials.

Now, be careful, because it is easy to misattribute your difficulties. Learning anything can be frustrating. Sometimes the difficulties you are having are not rooted in a need to review or relearn past material; you simply need to push through the new material until you start to get it. In those cases, some students revert to material where they feel safe and confident instead of challenging themselves. Even then, however, like rocking a car between reverse and drive to get over a bump, briefly going backwards can help launch you forward over the hurdle. What matters most is knowing yourself – your learning tendencies and how you typically respond – and checking in as much as you can with instructors and/or experts in the field who have been there and done that, to help you determine the best way to overcome whatever challenge you are facing.

Photo credit #1: Jukan Tateisi at https://unsplash.com/photos/bJhT_8nbUA0

Photo credit #2: qimono at https://pixabay.com/illustrations/cog-wheels-gear-wheel-machine-2125178/

Photo credit #3: Bonneval Sebastien at https://unsplash.com/photos/lG-6_ox_UXE

Photo credit #4: Holly Mandarich at https://unsplash.com/photos/UVyOfX3v0Ls

Photo credit #5: George Bakos at https://unsplash.com/photos/VDAzcZyjun8

What Is Ethnography: A Short Description for the Unsure

What is ethnography, and how has it been used in the professional world? This article is a quick and dirty crash course for someone who has never heard of (or knows little about) ethnography.

Anthropology at its most basic is the study of human cultures and societies. Cultural anthropologists generally seek to understand current cultures and societies by conducting ethnography.

In short, ethnography involves seeking to understand the lived experiences of a particular culture, setting, group, or other context by some combination of being with those in that context (called participant-observation), interviewing or talking with them, and analyzing what happens and what is produced in that context.

It is an umbrella term for a set of methods (including participant-observation, interviews, group interviews or focus groups, digital recording, etc.) employed with that goal, and most ethnographic projects use some subset of these methods given the needs of the specific project. In this sense, it is similar to other umbrella methodologies – like statistics – in that it encapsulates a wide array of different techniques depending on the context.

[Image: two women chatting]

One conducts ethnographic research to understand something about the lived experiences of a context. In the professional world, for example, ethnography is frequently useful in the following contexts:

  1. Market Research: When trying to understand customers and/or users in-depth
  2. Product Design: When trying to design or modify a product by seeing how people use it in action
  3. Organizational Communication and Development: When trying to understand a “people problem” within an organization

In this article, I expound in more detail on situations where ethnographic research is useful in professional settings.

Ethnographies are best understood through examples, so the table below lists excellent example ethnographies and ethnographic researchers in various industries/fields:

Project | Area
Computer Technology Development at Intel | Market Research
Vacuum Cleaner Market Research Examples | Market Research
Psychiatric Wards in Healthcare | Organizational Management
Self-Driving Cars at Nissan | Artificial Intelligence
Training of Ethnography in Business Schools | Education of Ethnography

These, of course, are only some of the situations where ethnography might be helpful. Ethnography is a powerful tool for developing a deep understanding of others’ experiences and for developing innovative and strategic insights.

Photo credit #1: Paolo Nicolello at https://unsplash.com/photos/hKVg7ldM5VU.

Photo credit #2: mentatdgt at https://www.pexels.com/photo/two-woman-chatting-1311518/.

What Is Data Science and Machine Learning? A Short Guide for the Unsure

What is data science, and what is machine learning? This is a short overview for someone who has never heard of either.

What Is Data Science?

In the abstract, data science is an interdisciplinary field that seeks to use algorithms to organize, process, and analyze data. It represents a shift toward using computer programming, specifically machine learning algorithms and other related computational tools, to process and analyze data.

By 2008, companies had started using the term data scientist to refer to a growing group of professionals utilizing advanced computing to organize and analyze large datasets,[i] so from the get-go, the practical needs of professional contexts have shaped the field. Data science combines strands from computer science, mathematics (particularly statistics and linear algebra), engineering, the social sciences, and several other fields to address specific real-world data problems.

On a practical level, I consider a data scientist someone who helps develop machine learning algorithms to analyze data. Machine learning algorithms form the central techniques/tools of data science. For me personally, if it does not involve machine learning, it is not data science.

What Is Machine Learning?

Machine learning is a complex term: what does it mean to say that a machine “learns”? Over time, data scientists have provided many intricate definitions of machine learning, but at its most basic, machine learning algorithms are algorithms that adapt/modify their approach to a task based on new data/information over time.

Herbert Simon provides a commonly used technical definition: “Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the task or tasks drawn from the same population more efficiently and more effectively the next time.”[ii] As this definition implies, machine learning algorithms adapt by iteratively testing their performance against the same or similar data. Data scientists (and others) have developed several types of machine learning algorithms, including decision tree modeling, neural networks, logistic regression, collaborative filtering, support vector machines, cluster analysis, and reinforcement learning, among others.

Data scientists generally split machine learning algorithms into two categories: supervised and unsupervised learning. Both involve training the algorithm to complete a given task; they differ in how they test the algorithm’s performance. In supervised learning, the developer(s) provide a clear set of answers as the basis for whether a prediction is correct, while in unsupervised learning, evaluating the algorithm’s performance is much more open-ended. I liken the difference to the exams teachers gave us in school: some tests, like multiple-choice exams, have clear right and wrong answers, while other exams, like essays, are open-ended with qualitative means of determining goodness. Just as the nature of the curriculum determines the best type of exam, which type of learning to perform depends on the project context and the nature of the data.
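
Here is a minimal sketch of that contrast in Python, assuming scikit-learn and its bundled iris dataset: the supervised model is graded against the provided labels, while the unsupervised one simply looks for structure.

  from sklearn.datasets import load_iris
  from sklearn.linear_model import LogisticRegression
  from sklearn.cluster import KMeans

  X, y = load_iris(return_X_y=True)

  # Supervised: the answer key y is provided, like a multiple-choice exam.
  clf = LogisticRegression(max_iter=1000).fit(X, y)
  print("supervised accuracy:", clf.score(X, y))

  # Unsupervised: no labels; the algorithm groups similar flowers on its own.
  clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)
  print("cluster sizes:", [list(clusters).count(c) for c in range(3)])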

Here are four instances where machine learning algorithms are useful:

  1. Autonomy: To teach computers to do a task without the direct aid/intervention of humans (e.g. autonomous vehicles)
  2. Fluctuation: Help machines adjust when the requirements or data change over time
  3. Intuitive Processing: Conduct (or assist in) tasks humans do naturally but are unable to explain how computationally/algorithmically (e.g. image recognition)
  4. Big Data: Breaking down data that is too large to handle otherwise

Machine learning algorithms have proven to be a very powerful set of tools. See this article for a more detailed discussion of when machine learning is useful.


[i] Berkeley School of Information. (2019). What is Data Science? Retrieved from https://datascience.berkeley.edu/about/what-is-data-science/.

[ii] Simon, quoted in Kononenko, I., & Kukar, M. (2007). Machine Learning and Data Mining. Philadelphia: Elsevier.

Photo credit #1: Frank V at https://unsplash.com/photos/zbLW0FG8XU8

Photo credit #2: Brett Jordan at https://unsplash.com/photos/HzOclMmYryc

Recently Published Article: “Anthropology by Data Science”

[Image: tea set and newspaper on a round table. Photo by Ekrulila on Pexels.com]

I am pleased to announce that the Annals of Anthropological Practice has accepted my article “Anthropology by Data Science”: https://anthrosource.onlinelibrary.wiley.com/doi/10.1111/napa.12169. In it, I reflect on the relationship anthropologists have cultivated with data science as a discipline and on the importance of integrating machine learning techniques into ethnographic practice.

Annals of Anthropological Practice is overseen by the National Association for the Practice of Anthropology (NAPA) within the American Anthropological Association. Thank you, NAPA, for publishing my article and thank you to all the unnamed editors and reviewers in the process.

Interdisciplinary Anthropology and Data Science Master’s Thesis: A Quick and Dirty Project Summary

This is a quick and dirty summary of my master’s practicum research project with Indicia Consulting over the summer of 2018. For anyone interested in more detail, here is a more detailed report, and here is the final report with Indicia. 

Background

My practicum was the sixth stage of a several year-long research project. The California Energy Commission commissioned this larger project to understand the potential relationship between individual energy consumption and technology usage. In stages one through five, we isolated certain clusters of behavior and attitudes around new technology adoption – which Indicia called cybersensitivity – and demonstrated that cybersensitivity tended to associate with a willingness to adopt energy-saving technology like smart meters.

This led to a key question: How can one identify cybersensitivity among a broader population such as a community, county, or state? Answering this question was the main goal of my practicum project.

In the past stages of the research project, the team used ethnographic research to establish criteria for whether someone was a cybersensitive based on several hours of interviews and observations about their technology usage. These interviews and observations certainly helped the research team analyze behavioral and attitudinal patterns, determine what patterns were significant, and develop those into the concept of cybersensitivity, but they are too time- and resource-intensive to perform with an entire population. One generally does not have the ability to interview everyone in a community, county, or state. I sought to address this directly in my project.

Task | Timeline | Task Name | Research Technique | Description
Task 1 | June 2015 – Sept 2018 | General Project Tasks | Administrative (N/A) | Developed project scope and timeline, adjusting as the project unfolded
Task 2 | July 2015 – July 2016 | Documenting and analyzing emerging attitudes, emotions, experiences, habits, and practices around technology adoption | Survey | Conducted survey research to observe patterns of attitudes and behaviors among cybersensitives/awares
Task 3 | Sept 2016 – Dec 2016 | Identifying the attributes, characteristics, and psychological drivers of cybersensitives | Interviews and Participant-Observation | Conducted in-depth interviews and observations, coding for psychological factors, energy consumption attitudes and behaviors, and technological device purchasing/usage
Task 4* | Sept 2016 – July 2017 | Assessing cybersensitives’ valence with technology | Statistical Analysis | Tested for statistically significant differences in demographics, behaviors, and beliefs/attitudes between cyber status groups
Task 5 | Aug 2017 – Dec 2018 | Developing critical insights for supporting residential engagement in energy-efficient behaviors | Statistical Analysis | Analyzed utility data patterns of study participants, comparing them with the general population
Task 6 | March 2018 – Aug 2018 | Recommending an alternative energy efficiency potential model | Decision Tree Modeling | Constructed decision tree models to classify an individual’s cyber status

Project Goal

The overall goal for the project was to produce a scalable method to assess whether someone exhibits cybersensitivity based on data measurable across an entire population. In doing this, the project also helped address the following research needs:

  1. Created a method to scale further across a larger population, assessing whether cybersensitives are more willing to adopt energy-saving technologies across a community, county, or state
  2. Provided the infrastructure to determine how much energy-saving campaigns targeting cybersensitives specifically would reduce energy consumption in California
  3. Helped the California Energy Commission determine the best means to reach cybersensitives for specific energy-saving campaigns

The Project

I used machine learning modeling to create a decision-making flow for isolating cybersensitives in a population. Random forests and decision trees produced the best models for Indicia’s needs: random forests for accuracy and robustness, and decision trees for human decipherability. Through them, I created a programmable yet human-comprehensible framework for determining whether an individual is cybersensitive, based on behaviors and other characteristics that an organization could easily assess across a whole population. Any energy organization could thus understand, replicate, and further develop the model, since it was both easy for humans to read and encodable computationally. This way, organizations could both use and refine it for their purposes.
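
To show what a human-readable decision tree looks like, here is a hedged sketch using scikit-learn; the data and feature names (device_count, age, tech_spend) are invented for illustration and are not Indicia’s actual variables.

  from sklearn.tree import DecisionTreeClassifier, export_text
  import numpy as np

  # Invented stand-in data: three observable characteristics and a binary
  # label playing the role of cyber status.
  rng = np.random.default_rng(7)
  X = rng.normal(size=(200, 3))
  y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

  tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
  # export_text prints the fitted rules as nested if/else branches, which is
  # what makes such a model both computationally encodable and easy to read.
  print(export_text(tree, feature_names=["device_count", "age", "tech_spend"]))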

Conclusion

This is a quick overview of my master’s practicum project. For more details on what modeling I did, how I did it, what results it produced, and how it fit within the wider needs of the multi-year research project, please see my full report.

I really appreciated the opportunity it provided to get my hands dirty integrating ethnography and data science to help address a real-world problem. This summary only scratches the surface of what Indicia did with the California Energy Commission to encourage sustainable energy usage across society. Hopefully, though, it will inspire you to integrate ethnography and data science to address whatever complex questions you face. It certainly did for me.

Thank you to Susan Mazur-Stommen and Haley Gilbert for your help in organizing and completing the project. I would like to thank my professorial committee at the University of Memphis – Dr. Keri Brondo, Dr. Ted Maclin, Dr. Deepak Venugopal, and Dr. Katherine Hicks – for their academic support as well.

Evaluating the Effectiveness of Part of Speech Augmentation in Next Word Predictors

The following is a project I completed for a graduate course in Artificial Intelligence at the University of Memphis in the spring of 2019. For the project, I analyzed whether part-of-speech evaluation could improve Markov chain-based next-word predictors. In particular, I developed and tested two different strategies for incorporating part-of-speech predictions, which I termed the excluder and the multiplier. The multiplier method performed better than the excluder and matched the performance of the control. Hopefully, this is a helpful exploration of ways to use lexical information to improve next-word predictors.
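
For readers who want a feel for the setup, below is a rough Python reconstruction of a bigram (Markov chain) next-word predictor with a multiplier-style adjustment, where candidate counts are scaled by a part-of-speech weight. The pos_of tags and pos_weight values are invented for illustration and do not reproduce the project’s actual implementation.

  from collections import Counter, defaultdict

  def train_bigrams(tokens):
      # Count how often each word follows each other word.
      model = defaultdict(Counter)
      for prev, nxt in zip(tokens, tokens[1:]):
          model[prev][nxt] += 1
      return model

  def predict(model, prev_word, pos_weight=None, pos_of=None):
      # Score each candidate by its bigram count, optionally multiplied by a
      # weight for its (assumed) part of speech.
      scored = {}
      for word, count in model[prev_word].items():
          use_pos = pos_weight is not None and pos_of is not None
          weight = pos_weight.get(pos_of.get(word), 1.0) if use_pos else 1.0
          scored[word] = count * weight
      return max(scored, key=scored.get) if scored else None

  tokens = "the cat sat on the mat and the cat slept".split()
  model = train_bigrams(tokens)
  pos_of = {"cat": "NOUN", "mat": "NOUN", "sat": "VERB", "slept": "VERB"}  # hypothetical tags
  pos_weight = {"NOUN": 2.0, "VERB": 0.5}  # hypothetical weights favoring nouns
  print(predict(model, "the", pos_weight, pos_of))  # -> "cat"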

Photo Credit: Brett Jordan from https://unsplash.com/photos/EvJ7uvqQb3E

The Anthropology of Machine Learning

In the spring of 2018, I researched how anthropologists and related social scholars have analyzed data science and machine learning, for my Master’s in Anthropology at the University of Memphis. For the project, I assessed the anthropological literature on data science and machine learning to date and explored potential connections between anthropology and data science, based on my perspective as both a data scientist and an anthropologist. Here is my final report.

Thank you, Dr. Ted Maclin, for your help overseeing and assisting this project.