For my first interview in the Interview Series, I interviewed Astrid Countee. She is a business anthropologist and technologist with a background in anthropology, software engineering, and data science. She currently works as a user researcher at the peer-to-peer distributed company Holo, as a research associate at The Plenary, an arts and education nonprofit, and as a co-founder of Missing Link Studios, which distributes the This Anthro Life podcast.
Aspiring data scientists frequently ask me for recommendations about the best way to learn data science. Should they try a bootcamp, enroll in an online data science course, or pursue any of the myriad other options out there?
In the last several years, we have seen the development of many different types of educational programs that teach data science, ranging from free online tutorials to bootcamps to advanced degrees at universities, and the pandemic seems to have fostered the establishment of even more programs to meet the increased demand for remote learning. Although this is probably a good thing overall, having more options makes it harder to decide which one to pursue and adds the potential noise of programs upselling their services.
This article is a high-level survey of the four basic types of data science education programs to help you think about which might work best for you. Without already knowing data science, it can be difficult to assess how effective a program is at teaching it. Hopefully, this article will help break that chicken-and-the-egg conundrum.
These are the four basic ways to learn data science:
Do-it-yourself learning
Online courses
Bootcamps
Master’s degree or other university degree in data science (or related field)
I will discuss them in order from the cheapest to the most expensive. I also include two hybrid strategies, which combine a few of these and are worth considering as well. This table provides a quick, high-level synopsis of each one:
Option 1: Do-It-Yourself Online
There are tons of free online data science resources that can either teach data science from scratch or explain just about any data science topic you could possibly want to know. These range from tutorials for those who learn by doing, like W3Schools, to videos on YouTube and other sites for auditory learners, like Andrew Ng’s YouTube series, to articles for visual learners who enjoy reading, like those on Towards Data Science. You could scour the internet and teach yourself. This approach has the advantages of being free and completely flexible around your schedule.
But as a former teacher, I have found independent learning is not for everyone. You must be entirely self-motivated and self-structured to teach yourself like this. So, know yourself: are you the type of person who could learn well completely independently like this?
Education programs tend to provide these resources that you might lack if you went it alone:
1) Curriculum Oversight: Data science experts in any education program generally establish some kind of data science curriculum for you that includes the necessary topics in the field. Many people who are new to data science do not yet know which data science concepts and skills are most important to learn. This can create a chicken-and-egg problem for self-learners, who must learn the field at least a little to know the most important items to learn in the first place. Data science programs help circumvent this by giving you an initial curriculum to get started with.
2) Guidance of the Norms of the Field: In addition to teaching the material, education programs implicitly introduce students to data science norms and ways of thinking. Even though there are times to deviate from established custom, these norms are important when first working on teams with fellow data scientists. Sometimes self-learners learn the literal material but do not pick up the implicit perspectives that enable their incorporation into the data science community.
3) External Social Accountability: Education programs provide a form of social accountability that subtly encourages you to get the work done. Self-learners must rely almost exclusively on their own self-motivation and self-accountability, which, in my experience, works for some people but not others.
4) Social Resources: Education programs (especially ones that meet either in-person or virtually) provide various people (teachers, students, and in some cases mentees/underlings) with whom you can talk through problems and who can help you discover your weaknesses and shortcomings and determine ways to address them. Minute programming details that beginners easily overlook but experts readily spot can cause your entire program to fail. To learn independently, you will have to either solve all of these yourself or find data science friends or family who are willing to help you.
5) Certification of Skills: Education programs bestow degrees, grades, and other certifications as external proof that you do, in fact, possess the skills required for a data science role. Learning on your own, you must prove to employers by yourself that you have these skills. Developing a portfolio of thought-provoking projects you have done is the best way to demonstrate this.
6) Guidance in Forming Projects: An impressive project works wonders for showcasing your data science skills. In my experience, beginners to data science often do not yet possess the skills to create, complete, and market a thought-provoking yet doable project, and one of the most important roles data science educators can have is helping students think through how to develop one. You must do this yourself when learning alone.
One can overcome each of these deficits. I have found that for people who learn well independently, the cost and flexibility advantages of self-teaching easily outweigh these cons. Thus, the crucial question is, Would this form of independent learning work for you? In my experience, it works for a comparatively small percentage of people, but for those it works for, it is a great option.
If you do decide to teach yourself, I would recommend considering the following:
1) Be conscientious about your learning style when choosing your material. For example, if you are a visual learner, then reading online resources would be best, but if you are more of an auditory learner, then I would recommend watching video tutorials/lectures on, say, YouTube.
2) If you have data science friends willing to help you, they can be a great asset, particularly in determining what data science materials to learn, troubleshooting any coding issues you might have, and/or developing good projects.
3) People in general learn data science best by doing data science. Avoid the common trap of only reading about data science without getting your hands dirty and experimenting yourself (preferably with unclean, annoying, real-world data, not already trimmed, “textbook perfect” data). Using pristine data to first learn the concepts is fine, but make sure you graduate yourself to practicing with real-life dirty data.
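As a small illustration of what practicing on "dirty" data can involve, the sketch below cleans a tiny made-up table using only Python's standard library. The sample records, the column names, and the `clean_rows` helper are all hypothetical, invented for this example. Real datasets are usually far messier, and in practice many people would reach for a library such as pandas instead.

```python
import csv
import io
from statistics import mean

# Hypothetical messy export: inconsistent casing, stray whitespace,
# a missing age, a non-numeric placeholder, and a duplicated person.
RAW = """name,city,age
 Ana ,new york,34
Ben,New York,
ana,NEW YORK,34
Cho,chicago,n/a
Dev, Chicago ,41
"""


def clean_rows(raw_text):
    """Normalize text fields, parse ages, and drop duplicate records."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        age_text = (row["age"] or "").strip()
        # Treat anything that is not a plain number as a missing value.
        age = int(age_text) if age_text.isdigit() else None
        key = (name, city, age)
        if key in seen:
            continue  # same record after normalization: skip it
        seen.add(key)
        cleaned.append({"name": name, "city": city, "age": age})
    return cleaned


rows = clean_rows(RAW)
known_ages = [r["age"] for r in rows if r["age"] is not None]
print(len(rows), "unique rows; mean of known ages:", mean(known_ages))
```

Even in this toy case, the decisions are judgment calls (is "ana" the same person as "Ana"? should a missing age be dropped or kept?), which is exactly the kind of thinking that pristine textbook data never forces you to practice.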
Option 2: Online Course
A variety of online courses exist. Most of them are relatively cheap (usually around $20-$50 a month or $100-$200 per course). For example, at the time of writing this, Udemy has an introductory data science course for a flat rate of $94.99, and Coursera a course for $19.99 a month (both with prices varying based on discounts and other special deals). Online courses are generally the cheapest of the courses you can enroll in, and because most are short, you will probably have to take several levels of courses (introductory to advanced) to learn the field.
Another advantage is that they are flexible: You can learn at your own pace, based on the needs of your schedule. This is really valuable for people who are also working a job and studying on the side, or who have family commitments and/or other obligations complicating their schedules. Keep in mind, though, that because you often pay per month, how many months you take often dictates the final cost. At the end of the day, spending an extra $100 or so to take a few more months to complete the course is still much cheaper than the other course options.
On the other hand, like doing it yourself, they tend to lack the social benefits of classroom learning: instructors who can answer questions and provide external social accountability, and fellow students to work alongside. In my experience, this makes online courses very challenging for some learners, while others are less affected by it.
In addition, many online courses provide more of a cursory summary of data science and lack the complex projects that are necessary both to learn data science and to market yourself to others. Even though there are exceptions, online courses are often better at introducing data science concepts than at exploring them in depth. Many focus on canned problems with already cleaned, ready-to-use data instead of letting you practice on the messy, complex, and often just plain silly data most data scientists actually have to use at their jobs. They also often lack the personnel for one-on-one coaching to mentor each student through portfolio-building projects with complex data.
Thus, online courses tend to provide good, cost-effective introductions to data science, helpful for seeing whether you like the field (see Hybrid #1 below), but do not generally provide the refined training necessary to become a data scientist. That said, some programs are evolving their courses. Especially as the pandemic increases demand for remote learning, online learning platforms are developing more robust online data science courses. If you choose to learn by taking online courses, I recommend supplementing them with your own projects, both to gain experience practicing data science work and to have something to showcase in job interviews.
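To make the pay-per-month arithmetic concrete, here is a minimal sketch using the two prices quoted above ($19.99 a month and a $94.99 flat rate). Those two prices belong to different courses on different platforms, so the comparison is purely illustrative; the function name and the month counts are my own assumptions.

```python
import math

MONTHLY_FEE = 19.99  # subscription price quoted above
FLAT_FEE = 94.99     # flat-rate price quoted above


def subscription_total(monthly_fee, months):
    """Total paid for a monthly subscription kept for `months` months."""
    return round(monthly_fee * months, 2)


# First whole month at which the subscription costs more than the flat fee.
break_even_month = math.ceil(FLAT_FEE / MONTHLY_FEE)

for months in (2, 4, 6, 8):  # hypothetical completion times
    total = subscription_total(MONTHLY_FEE, months)
    print(f"{months} months of subscription: ${total:.2f}")

print(f"The subscription passes the flat fee in month {break_even_month}")
```

The point is simply that with monthly pricing, your pace is part of the price: finishing in four months costs about half of what finishing in eight does.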
Hybrid #1: Use an Online Course to Introduce Data Science (or Programming)
If you are completely new to data science, an online course can provide a low-cost, structured space to get a sense of what the field entails and determine whether it is a good fit for you. I have seen many people enroll in bootcamps or university degree programs costing several thousand dollars only to learn there that they do not like doing data science work. An online course is a much cheaper space to discern that.
You could always explore data science yourself for free to decide whether you like it (see Option 1) instead of taking an online course, but I have found that many people who have never seen data science before do not know what to look up in the field to get started. An introductory online course is not that expensive, and the initial orientation into the major topic areas can be well worth the cost.
There are three basic versions of this approach:
1) If you do not already know a programming language, take an online programming course. I explained in this article why I would recommend Python as the language to learn (with Julia as a close second). If you find you do not like programming, then you have learned that data science is probably not for you. Even then, programming is such a valuable skill that having some training in it will only help your occupational prospects in most related fields.
2) If you do know a programming language, take an introductory data science course. These often provide a high-level overview of data science, especially helpful for people who need to work with data scientists and understand what they are talking about. If you need a math refresher, this is a great option as well.
3) I have seen prospective data scientists take online data analytics courses to prepare them for and determine their potential interest in data science. I would not recommend this, however. Even though data scientists will sometimes treat data analytics as a “diet” or “basic” version of data science, data analytics is a different field requiring different skills. For example, data analytics courses typically do not include rigorous programming. They generally focus on R and SQL if they teach programming at all, which are fine languages for data analytics and statistics but not enough for data science (for which you would want a language like Python). Data analytics and data science also generally emphasize different fields of math: data analytics tends to rely on statistics while data science relies more on linear algebra, for example. Thus, what you would learn in those courses would not apply to data science as much as you would think. Now, if you are unsure whether you would like to become a data scientist or a data analyst, then a data analytics course might help you understand and get a feel for data analytics, but I would not use one to assess whether data science is a good fit for you.
Once you complete the online course, if you still think you would enjoy doing data science work, then you can choose any of the options to learn the field in more depth. This may seem like just getting you back to square one, but by taking an introductory programming or data science course, you have levelled yourself up, so to speak, and are more ready to face the “boss battle” of becoming a data scientist.
Option 3: Data Science Bootcamps
Data science bootcamps have also become popular. They tend to be intensive training programs lasting several weeks to several months (in my experience often ranging from 2 to 6 months). The traditional pre-pandemic bootcamp was in-person and would often cost around $10,000 to $15,000. Metis’s bootcamp is a good example of what they often look like.
Their biggest pros are that they offer the advantages of classroom education far more cheaply and in much less time than getting a university degree. They are a significant step up in cost from the previous options (see Con 2 below), but they seek to provide a scope of knowledge comparable to a master’s degree in data science (though less academically advanced and in-depth) for a significantly lower price and in a fraction of the time. Even though it can often make their pace feel intense, the good bootcamps tend to mostly succeed at providing this. This makes them a great option for anyone who knows they want to become a data scientist. Finally, unlike the previous options, you get one or more teachers to answer your questions and motivate you, and a set of fellow students to struggle through concepts with. The best programs offer occupational coaching and build strong networks in data science communities to help their students find jobs afterwards.
They have some major cons, however:
1) They can feel fast-paced, unloading complex concepts in a short amount of time. Many of my friends who have done bootcamps have reported feeling cognitive whiplash. Expect those weeks/months to be mentally intense and to subsume your life. Data science bootcamps are often 9-5 full-time jobs during that time, and you will likely be too mentally exhausted to work on other things in the evenings or on weekends (plus in some cases you will have homework to complete then anyway). A few weeks or months is not terribly long for such an ordeal, but it makes them much less flexible than the previous options. For example, this forces many students to take time off from their current jobs to complete the bootcamp and to limit their social, familial, and other obligations as much as they can during it. This makes it difficult for anyone unable to take time off work, with busy social or familial lives, or otherwise with a lot going on.
2) At several thousand dollars, they are noticeably more expensive than the previous options (but still much cheaper than universities). Some offer scholarships and other services on a need basis, but even then, the opportunity cost of having to put a job on hold can still be expensive. Given the generally high salaries in the field, landing a data science job would likely make the money back, but it takes a hefty initial investment.
This makes it an especially poor option for anyone thinking about data science but not sure whether they want to do it. $10,000 is a lot to spend to simply learn you do not like the field, and there are many cheaper ways to initially explore the field (see especially Hybrid #1). The cost still might be worth it, however, for anyone who really wants to become a data scientist but does not yet possess key skills and knowledge.
3) At the time of writing this, the Covid-19 pandemic has forced most data science bootcamps to meet remotely anyway, making their services far more similar to the much cheaper online courses. Many have sought to simulate the classroom environment virtually and provide some type of social environment, but the in-person classroom was a major advantage that made their significantly higher cost over the previous options worthwhile.
4) They tend to exist in large cities (especially tech centers). For example, bootcamps in the United States tend to concentrate in New York City, Los Angeles, Chicago, San Francisco, etc. Prior to the pandemic, anyone not living in those places would have to travel and temporarily reside wherever their chosen bootcamp was, an additional expense.
5) They are often difficult for people who do not know programming and for those who do not know college-level mathematics like linear algebra, calculus, and statistics. If you do not know programming, I would recommend first learning a programming language like Python (for more, see this article I wrote explaining why to learn Python of all languages) through a cheap online course and/or online tutorials. Some data science bootcamps offer a preparatory introductory online course that teaches the prerequisite coding and math skills to those who lack them. These are worth considering as well, but keep in mind that an equivalent online course might be cheaper with roughly the same educational value.
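Cons 2 and 4 above mean that tuition is only part of what an in-person bootcamp costs. The sketch below adds up the pieces under stated assumptions: the tuition range is the $10,000 to $15,000 mentioned earlier, while the three-month length, the monthly income given up, and the monthly relocation cost are hypothetical placeholders that you should replace with your own numbers.

```python
def bootcamp_total_cost(tuition, months, monthly_income_forgone=0, monthly_relocation=0):
    """Tuition plus income given up and any extra living costs while enrolled."""
    return tuition + months * (monthly_income_forgone + monthly_relocation)


MONTHS = 3               # hypothetical length, within the 2 to 6 month range
INCOME_FORGONE = 4_000   # hypothetical take-home pay given up per month
RELOCATION = 1_500       # hypothetical extra rent/travel per month if you must move

for tuition in (10_000, 15_000):  # tuition range quoted in the article
    local = bootcamp_total_cost(tuition, MONTHS, INCOME_FORGONE)
    away = bootcamp_total_cost(tuition, MONTHS, INCOME_FORGONE, RELOCATION)
    print(f"Tuition ${tuition:,}: ${local:,} if local, ${away:,} if relocating")
```

With these placeholder figures, the income you give up is comparable in size to the tuition itself, which is why scholarships alone may not make a full-time bootcamp affordable.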
If you decide to do a bootcamp, these criteria are important when researching which bootcamp to choose:
1) Project Orientation: How well do they enable you to practice data science through portfolio-building projects, and how impressive are the projects their alumni did? The best data science bootcamps generally teach in a project-oriented fashion.
2) Job-Finding Resources and/or Job Guarantee: What resources or coaching do they give to help you find a job afterwards? Networking, presenting yourself, and interviewing, for example, are important skills for finding a job as a data scientist, and in addition to teaching you the technical curriculum, the best programs tend to find occupational coaches to help specifically with the job-finding process. Also, some programs give a job guarantee: if you do not find a data science job within a certain number of months after graduating, they refund your tuition. This generally shows they take job finding seriously enough to risk their own money on it (although do check the fine print on the guarantee to see the exact terms they are agreeing to).
3) Alumni Resources: A surprisingly important detail to consider is how many resources a bootcamp invests in cultivating alumni networks. I was surprised by how receptive alumni of the online bootcamp I did were to meeting and networking, and by how satisfied alumni tend to be with the bootcamp. The effort a bootcamp makes to work with and maintain relationships with its alumni impacts this significantly. Connectedness with alumni can be difficult to assess when researching programs from afar, but asking whether you can speak with alumni to learn about their experiences with the program, checking a bootcamp’s alumni activity on LinkedIn and other social media websites, and asking what kind of networking opportunities with alumni they facilitate can be great ways to assess how intentional a program is about cultivating relationships.
4) Scholarship Options: Some programs offer full or at least partial scholarships based on need. Clearly, ways to knock down the cost of the bootcamp would be great, especially if a bootcamp seems like an ideal option for you, but the cost seems too daunting.
Hybrid #2: Online Bootcamp
Online bootcamps tend to possess the schedule flexibility of online courses but offer more rigorous, personal (albeit remote) learning, allowing you to combine the best aspects of data science bootcamps and online programs. They are also generally cheaper than traditional bootcamps (yet more expensive than an online course). Finally, they tend to be a much better option for those who do not live in a major city that happens to have a local data science bootcamp program. The pandemic, if anything, has probably helped produce even more online bootcamp programs, since it has forced data science bootcamps to teach virtually.
I enrolled in Springboard’s online data science bootcamp in 2017, a great example of an online bootcamp. At the time, it cost roughly $1,000 a month (at the time of writing, their standard rate is $1,490 a month, and they state that the program generally takes six months). This is cheaper than traditional bootcamps but still several thousand dollars: at the current rate, six months comes to roughly $9,000. They had an online curriculum typical of online courses but also provided weekly virtual meetings with an instructor to discuss the material and any issues you were having. Now they seem to include virtual lessons online. This individualized training and remote classroom environment are the main value adds over an online course, and you must assess whether, for you, they would be worth the additional cost. The program is self-paced, providing much greater flexibility on when and how often you work than typical bootcamps. They also refunded your money if you did not find a job within six months of completion.
If you choose this option, be aware of the potential pitfalls of both online courses and traditional bootcamps. Just like with online programs, you will need to evaluate whether you are comfortable learning the curriculum by yourself (even though you can meet with a mentor about major issues once a week, you would be doing the bulk of the learning by yourself throughout the week). Like with traditional bootcamps, expect the learning to be mentally intense, and make sure they help you develop portfolio-building projects and provide job-finding resources and training.
Option 4: Master’s Degree or Other University Degree
The final option is to go back to school to get a degree in data science. This is the most expensive and time-consuming option: a master’s degree (a logical choice if you already have a bachelor’s) is generally the shortest, taking two years, but it can cost upwards of $100,000. Even if partial or full scholarships decrease that cost, the opportunity cost of spending several years of your life in school is still higher than for any of the other options. It can give a resume boost, however, if you know how to leverage it properly, which will likely increase your salary enough to make up for the initial cost. I would only recommend getting a master’s degree if you already know you love data science (say, because you have already been working in the field, and preferably have also figured out the specific area of data science you want to pursue) but want to take your skills, technique, and/or theoretical knowledge of how the models work to the next level.
The best way to refine your data science skills is by doing data science: finding or creating contexts to push you as you practice data science. Graduate schools are not the only potential environment to refine one’s data science skills (e.g., all the previous options could involve that if done well), and even though graduate schools can be great at providing rigor, these other options can be a lot cheaper and more flexible. Finally, at the time of writing this, at least, the demand for data scientists exceeds the number of actual people in the field, and so getting a data science job without an “official” university degree in data science is pretty realistic.
University data science degree programs are relatively new, generally only a few years old. Thus, not all universities have literal data science degrees or departments but instead require that you enroll in a related program like computer science, statistics, or engineering to learn data science. This does not always mean these other programs are bad or unhelpful, but it often means you will have to perform tasks that are extraneous or semi-extraneous to data science proper in order to complete your degree (in some cases with minimal help from faculty from other fields).
When considering a program, you should make sure it is proactive about teaching professional and not just academic data science skillsets. These are the specific questions I would research to assess how well a program might prepare you for non-academic data science jobs:
1) What proportion of their faculty currently work or at least have worked in the industry as a data scientist (or other similar job title)?
2) How well connected is the department with local organizations, and might they be able to leverage these relationships to help you work with these organizations through a work-study program or internship during the program and/or employment afterwards?
3) Will they help you build, or at least give you the flexibility to build, your thesis into an applied data science project that would boost your resume for future employers?
If your chosen program lacks these, I would strongly recommend building resume/portfolio-boosting projects and networking with local data scientists on the side while completing the program. This takes considerable time and energy, so ideally your department would actively help you in this work, instead of requiring that you do it on your own while also completing all their work.
Funding options are something else to consider. Are they willing to fund your degree fully or at least partly? Work-study programs where you work while getting your master’s can be a great way to graduate with no debt and gain resume-building work experience (although they can make you busy). I benefitted greatly from working as a data scientist while completing my master’s, both because I graduated with no debt and because it allowed me to practice and refine my skills.
Finally, most universities require that you live nearby and attend physically (at least before and likely after the pandemic). Thus, you might have to find a place near you or be willing to relocate for a few years if there is not a data science degree program nearby. If so, you should factor moving expenses into the cost of doing the program.
Conclusion
Learning data science can be an awesome yet daunting prospect, and finding the right strategy for you is complicated, particularly given all the pedagogical, logistical, and financial considerations. Hopefully, this article has helped you think through how to journey forward.