How to Analyze Texts with Data Science

flat lay photography of an open book beside coffee mug

A friend and fellow professor, Dr. Eve Pinkser, asked me to give a guest lecture on quantitative text analysis techniques within data science for her Public Health Policy Research Methods class with the University of Illinois at Chicago on April 13th, 2020. Multiple people have asked me similar questions about how to use data science to analyze texts quantitatively, so I figured I would post my presentation for anyone interested in learning more.

It provides a basic introduction of the different approaches so that you can determine which to explore in more detail. I have found that many people who are new to data science feel paralyzed when trying to navigate through the vast array of data science techniques out there and unsure where to start.

Many of her students needed to conduct quantitative textual analysis as part of their doctoral work but struggled in determining what type of quantitative research to employ. She asked me to come in and explain the various data science and machine learning-based textual analysis techniques, since this was out of her area of expertise. The goal of the presentation was to help the PhD students in the class think through the types of data science quantitative text analysis techniques that would be helpful for their doctoral research projects.

Hopefully, it would likewise allow you to determine the type or types of text analysis you might need so that you can then look those up in more detail. Textual analysis, as well as the wider field of natural language processing within which it is a part of, is a quickly up-and-coming subfield within data science doing important and groundbreaking work.

Photo credit: fotografierende at https://www.pexels.com/photo/flat-lay-photography-of-an-open-book-beside-coffee-mug-3278768/

Evaluating the Effectiveness of Part of Speech Augmentation in Next Word Predictors

The following was a project I completed for a graduate course in Artificial Intelligence I took at the University of Memphis in the spring of 2019. For the project, I analyzed whether part of speech evaluation could modulate Markov Chain-based next word predictors. In particular, I developed and tested two different strategies for incorporating part of speech predictions, which I termed excluder and multiplier. The multiplier method performed better than the excluder and matched the performance of the control. Hopefully, this is a helpful exploration into ways to use lexical information to improve next word predictors.

Loader Loading…
EAD Logo Taking too long?

Reload Reload document
| Open Open in new tab

Download

Photo Credit: Brett Jordan from https://unsplash.com/photos/EvJ7uvqQb3E