Part 12 of the series where I interview my heroes.

You can find me on Twitter @bhutanisanyam1

Over the past few interviews, I’ve had the chance to interact with Kaggle Grandmasters, technical leaders, practitioners, two distinguished researchers, and an OpenAI Fellow.

Today, I’m super excited to be interviewing one of my role models and gurus: Dr. Rachel Thomas.

Rachel is a co-founder of and researcher at fast.ai, and an Assistant Professor at the Data Institute at USF.

She holds a Ph.D. in Math from Duke University.

Dr. Rachel Thomas

About the Series:

I have recently started making some progress on my self-taught machine learning journey. But to be honest, it wouldn’t have been possible at all without the amazing online community and the great people who have helped me.

In this series of blog posts, I talk with people who have really inspired me and whom I look up to as role models.

The motivation behind doing this is that you might see some patterns and, hopefully, be able to learn from the amazing people I have had the chance of learning from.


Sanyam Bhutani: Hello Rachel, thank you so much for taking the time to do this interview.

Dr. Rachel Thomas: Thanks for having me! I have enjoyed reading the other interviews in the series.

Sanyam Bhutani: You’ve worked as a Data Scientist at Uber, you hold a Ph.D. in Math, and you’re currently a researcher at one of the most ‘uncool’ non-profit research labs, fast.ai.

Can you tell us when deep learning first came into the picture for you? What got you interested in deep learning in the first place?

Dr. Rachel Thomas: I first got interested in deep learning in 2013 when people were starting to use it to win Kaggle competitions. I was already working in machine learning, and I could see that there was a lot of potential for deep learning in solving practical problems. At the time, the field felt very exclusive and it was hard to find practical information. Practical Deep Learning for Coders is the resource that I wish had existed 5 years ago for my younger self.

Sanyam Bhutani: Could you tell us more about your role at fast.ai and what a day at fast.ai looks like?

Dr. Rachel Thomas: This varies a ton week to week depending on what events are going on. Things I spend time on include: the deep learning courses, preparing for speaking engagements, writing, random administrative tasks, or other teaching (such as my computational linear algebra course).

Sanyam Bhutani: fast.ai has really democratised deep learning for everyone globally. Could you share one, or maybe a few, stories of your students that you’re really proud of?

Dr. Rachel Thomas: There are so many fast.ai students and alumni doing awesome work! This list includes just a tiny fraction of them:

  • Alexandre Cadrin, a radiologist, took 1st place in a Kaggle competition recently!
  • Christine Payne, who formerly worked on supercomputers and as a classical pianist, was chosen as an OpenAI scholar after the fast.ai course, and she created a great neural network music generator.
  • A participant in the current course, Alena Hurley, has achieved the state of the art in classifying the primary site of origin for metastasized cancer.
  • Karthik Mahadevan, an industrial designer in Amsterdam, previously developed a smartphone-based device that identifies malaria in magnified images of blood smears as part of his work in rural Uganda. Since taking the fast.ai course, he has built and launched Envision, an app that helps the visually impaired read text in their native dialect, get detailed descriptions of scenes the camera captures, and recognise the faces of friends and family.
  • Reshama Shaikh created many helpful resources while taking the course, and is active in the data science community as an organizer for NYC WiMLDS and NYC PyLadies, a board member of WiMLDS, and a member of the NumFocus D&I in scientific computing committee.
  • It is also fantastic how many people have participated in the Language Model Zoo, including teams that achieved the state of the art for Thai, Polish, German, Indonesian, Hindi, and Malay.

Again, this is just a small subset of all the students and alumni that we are so proud of!

Sanyam Bhutani: I’m super excited about the new fast.ai DL MOOC, being a student of the fast.ai v3 live course myself. Could you tell the readers what’s next for fast.ai?
You’ve already made cutting-edge research very uncool. What’s next?

Dr. Rachel Thomas: Our goal is to keep making deep learning easier and easier to use, while simultaneously delivering better and better results. For instance, in version 3 of the course (going on now), we had people deploying web apps with their models after just a week or two of the course. This was certainly not the case the first time we taught the course, and is possible in part because the underlying technology, including the fastai library, has improved so much. Eventually we want to get to the point where even non-coders can effectively apply deep learning.
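To make that deployment step concrete, here is a minimal sketch of the kind of web app a course participant might build, assuming the fastai v1 API (load_learner, open_image, learn.predict) and a model previously saved with learn.export(); the file name, route, and form field are illustrative choices, not the course’s exact code:

```python
# A minimal sketch of serving an exported fastai image classifier as a web
# app. Assumes the fastai v1 API and an export.pkl created by learn.export();
# the /predict route and the 'file' field name are illustrative.
from io import BytesIO

from flask import Flask, request, jsonify
from fastai.vision import load_learner, open_image

app = Flask(__name__)
learn = load_learner('.')  # loads export.pkl from the given directory

@app.route('/predict', methods=['POST'])
def predict():
    # Read raw image bytes from a multipart form field named 'file'
    img = open_image(BytesIO(request.files['file'].read()))
    pred_class, pred_idx, probs = learn.predict(img)
    return jsonify({'prediction': str(pred_class),
                    'confidence': float(probs[pred_idx])})

if __name__ == '__main__':
    app.run()
```

A client could then POST an image (e.g. curl -F file=@cat.jpg localhost:5000/predict) and get back a JSON prediction.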

Most people associate fast.ai primarily with our free Practical Deep Learning for Coders course, but our research and software are also key components of our work.

Sanyam Bhutani: I have to confess: as much as I’m a fan of the top-down approach, I initially found it difficult to follow fast.ai. I would spend too much time reading theory that would later be taught by Jeremy in another lecture anyway.

Most of us have been taught in a bottom-up manner our entire student lives. How can we adapt better to the “top-down” approach?

Dr. Rachel Thomas: This is a good question! For those unfamiliar with the concept, math is traditionally taught in a “bottom up approach,” in which you have to learn each individual item you’ll be using before you can eventually combine them into something interesting, but many students lose motivation or drop out along the way. In contrast, areas like sports or music are often taught in a “top-down” way in which a child can enjoy playing baseball, even if they don’t know many of the formal rules. Children playing baseball have a general sense of the “whole game”, and learn the details later, over time. We use this top-down approach at fast.ai to get people using deep learning to solve problems right away, and then we teach about the underlying details later as time goes on. Our approach was inspired by Harvard professor David Perkins and mathematician Paul Lockhart.

I still find myself defaulting to a “bottom-up” approach sometimes, because it’s such a habit after two decades of traditional schooling. Using something when we don’t understand the underlying details can feel uncomfortable, and I think the key is to just accept that discomfort and do it anyway.

Sanyam Bhutani: You’re also very vocal about ethics and diversity in AI. Could you share a few things that we must focus on, and a few things that we must avoid, when building Software 2.0?

Dr. Rachel Thomas: This topic is so important to me, as we are seeing negative consequences of tech showing up in everything from Facebook’s role in the genocide in Myanmar, to how YouTube has disproportionately been used to radicalize white supremacists.

Briefly, a few things to consider are:

For those who are interested in learning more, here are a few of the talks and blog posts I’ve created on the topic:

Sanyam Bhutani: I also want to ask about your thoughts on AutoML: do you think we’ll become obsolete, with AutoML eventually automating part of a data scientist’s toolbox, or even the complete job?

Dr. Rachel Thomas: I think that we are already starting to automate parts of a data scientist’s toolbox, and that this can be a positive. Automated tools in other domains, such as spell check and SwiftKey, have been very useful!

As I wrote in my series on AutoML, I think that it is an incorrect focus to try to create products that completely automate data science (in part, because such attempts invariably miss important components), but that we should instead think of Augmented ML. Whereas AutoML is often focused on the goal of complete automation, the focus of augmented ML is on figuring out how a human and machine can best work together to take advantage of their different strengths. An example of augmented ML is Leslie Smith’s learning rate finder (paper here). The learning rate finder (a chart you look at to determine a good learning rate) is faster than AutoML approaches to the same problem, improves the data scientist’s understanding of the training process, and encourages more powerful multi-step approaches to training models.
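For readers who want to see what this human-machine division of labor looks like in code, here is a minimal sketch of the learning rate finder as exposed in the fastai v1 library (lr_find plus recorder.plot); the dataset, architecture, and final chosen rate are placeholders, not the specifics from Leslie Smith’s paper:

```python
# A minimal sketch of the learning rate finder workflow, assuming the
# fastai v1 API; the dataset, model, and final rate are illustrative.
from fastai.vision import (untar_data, URLs, ImageDataBunch, cnn_learner,
                           models, accuracy)

path = untar_data(URLs.MNIST_SAMPLE)     # small sample dataset
data = ImageDataBunch.from_folder(path)
learn = cnn_learner(data, models.resnet18, metrics=accuracy)

# The machine's part: a short mock training run with an exponentially
# increasing learning rate, recording the loss at every step.
learn.lr_find()
learn.recorder.plot()  # the human inspects the loss-vs-LR chart

# The human's part: pick a rate where the loss is dropping steeply,
# then train with it (1e-2 here is just an illustrative choice).
learn.fit_one_cycle(1, max_lr=1e-2)
```

The machine does the cheap sweep; the human makes the judgment call from the chart, which is the “augmented” part.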

I believe that in all industries, tools are being created to allow workers to be more efficient. This can be good when it entails automating work that humans find tedious or difficult. However, it is having, and will continue to have, an impact on the number of jobs, since greater efficiency often allows for a smaller number of workers. I believe that societal and policy solutions (such as re-introducing competition, enforcing antitrust laws, addressing negative externalities, protecting human rights, a negative income tax, and universal basic income) are needed to address this.

Sanyam Bhutani: Why do you think that even though math is the backbone of DL, it gets much less attention compared to “ML”? How can we make math “uncool”?

Dr. Rachel Thomas: Ha! Unfortunately I think math is already “uncool”, only in a bad way. There are so many problems with how math is taught in the USA and many other countries, as well as harmful and false cultural beliefs. For instance, math is often taught in a very vertical way, with each year building on the previous. If a student has a bad teacher or bad experience one year, often there’s no way for them to catch up in future years, and many people get turned off to math permanently.

There is a widespread myth that some people’s brains just aren’t wired the right way for math, or that someone may just be “not a math person”. All the scientific evidence is against this, yet it can become a self-fulfilling prophecy for people that believe it.

Also, there are a lot of fun and useful areas of math that aren’t taught until after most people have dropped out of the field — such as discrete math, combinatorics, linear algebra, and groups, rings, & fields. These areas all have a very different “flavor” from the calculus sequence, and I think it’s too bad that most schools require students to get through a few semesters of calculus first, as opposed to letting students dabble in a variety of interesting areas.

I highly recommend everyone read Paul Lockhart’s essay, “A Mathematician’s Lament”. He talks about a nightmare world where children are not allowed to sing or make music until graduate school, after having spent their childhoods transcribing sheet music by hand. This is what we do with math. Anything that adds more patterns, playfulness, & creativity back to math education is a good thing.

Sanyam Bhutani: How do you stay up to date with the cutting edge?

Dr. Rachel Thomas: There is an overwhelming amount of research happening in the field, so I think it is impossible to stay up to date on everything. The main way I keep up is via Twitter. If you are new to Twitter (or perhaps a bit skeptical about Twitter, like I used to be), I wrote some tips for getting started, and Radek Osmulski, a fast.ai alum and Kaggle winner, has some great advice here. I also subscribe to several newsletters, including Sebastian Ruder’s NLP newsletter, Jack Clark’s Import AI, Data & Society, and the Berkman Klein Center Buzz.

Sanyam Bhutani: What are your thoughts about the Machine Learning Hype?

Dr. Rachel Thomas: There really is a huge amount of potential for machine learning to have an impact, so in some aspects the hype is reasonable. Where hype becomes harmful is when companies make misleading or exaggerated promises of what their products are capable of. Not only is this bad for people that purchase “snake oil”, but it can cause people to write off the entire field of machine learning, which is bad for everyone. Major tech companies are often at least partly to blame for misleading hype in their marketing.

As an example, hype around IBM’s Watson was harmful for MD Anderson in spending millions on an unfruitful partnership, for patients who falsely believed that Watson would cure their illness, and for our field when people concluded AI is only hype (and IBM should bear most of the blame for making exaggerated promises in that case). I’ve also repeatedly been critical of Google’s marketing — there are enough exciting things going on at Google that they shouldn’t need to exaggerate or oversell their achievements.

Sanyam Bhutani: I’m a fan of your amazing blog posts.
Could you share a few tips for readers who want to become better (tech) writers?

Dr. Rachel Thomas: One piece of advice is to consider that your target audience is you-6-months-ago, not Geoffrey Hinton. What would have been helpful for your former self to hear? You are best positioned to help people one step behind you. Many experts have forgotten what it was like to be a beginner (or an intermediate) and have forgotten why the topic is hard to understand when you are first learning it. The context of your particular background, your particular style, and your knowledge level will give a different twist to what you’re writing about. I wrote a post on getting started with blogging.

I really appreciated that Andrew Trask, in his interview with you, spoke about the importance of high-quality blog posts and putting time into your writing. I am slightly embarrassed by how much time I put into many of my blog posts (I typically go through many iterations and re-writes), but it often pays off.

Sanyam Bhutani: The fast.ai philosophy is: anyone can do DL; you don’t need a PhD to contribute to the field.

As a math PhD yourself, could you share some of your thoughts on a “non-technical” student contributing to the field?

Dr. Rachel Thomas: I know that my math PhD has helped open doors for me as a credential, but in terms of the content I studied, I feel like I’ve used little of it (my goal at the time was to become a math professor, in which case I definitely would have needed the degree). I’ve written previously about some of the opportunity costs and downsides of doing a PhD.

I encourage everyone to learn math and technical topics on an “as-needed” basis. That is, start doing the work you are interested in doing, and if you come across some topic that you really need to be able to continue, learn it at that point. I don’t recommend trying to front-load all the math and technical topics that you think you may need, because in many cases you won’t need nearly as much as you think, and this can lead to students feeling bogged down or losing motivation.

Also, the fields of computer science and math are huge, so even someone with a “traditional, technical background” will only have studied some subset of the many, many computer science topics out there. For instance, my college education taught me how to prove whether an algorithm was NP-complete or Turing computable, but nothing about testing, version control, web apps, or how the internet works.

Sanyam Bhutani: Before we conclude, any advice for beginners who, even though they are excited about the field, feel too overwhelmed to even get started with deep learning?

Dr. Rachel Thomas: I actually still feel overwhelmed with deep learning, just because there is such a huge volume of interesting research & advances coming out all the time. It’s really important to be patient with yourself, and to try to remember how much more you know now than you did 6 months ago. The adage that people overestimate how much they can learn in 1 month and underestimate how much they can learn in 5 years is very true.

As you learn, helping others (through writing blog posts, answering questions online, tutoring, assisting with workshops for beginners, etc) is a good way to be reminded that you are learning something, as well as to cement your knowledge and to give back. One test of whether you truly understand something is whether you can teach it to someone else.

Sanyam Bhutani: Thank you so much for doing this interview.


You can find me on Twitter @bhutanisanyam1
Subscribe to my newsletter for updates on my new posts, interviews with my Machine Learning heroes, and Chai Time Data Science