Author: Chanin Nantasenamat
Chinese translation: 10 Things You Need to Know Before Getting Started in Data Science

I get asked quite often on my YouTube channel (Data Professor) the following questions about how to break into data science:

  • How to become a Data Scientist?
  • What is the roadmap to being a Data Scientist?
  • What courses should I take to learn Data Science?

So I thought that it would be a great idea to write an article about it, and here it is. Note that the 10 things I wish I knew about learning data science are based on my personal journey as a self-taught data scientist. If I could turn back time and advise my 22-year-old self about learning data science, these are some of the things that I would say.
I started my data science journey back in 2004. It was a time when the term data science was in its infancy and the more widely used term was data mining. It was not until 2012 that the term data science started to gain traction and entered the mainstream, helped by the Harvard Business Review article Data Scientist: The Sexiest Job of the 21st Century by Thomas Davenport and D.J. Patil.

What is Data Science?

In a nutshell, Data Science is a field that makes use of data to solve problems and bring impact, value and insights to companies and organisations. Data science has been applied to a wide range of disciplines and industries spanning education, finance, healthcare, geology, retail, travel and esports. The technical skill sets of data science involve data collection, data pre-processing, exploratory data analysis, data visualisation, statistical analysis, machine learning, programming and software engineering. Aside from the technical side, there are various soft skills that are desirable for a data scientist. A high-level overview of the essential skill sets of a Data Scientist is provided in the following infographic.

Infographic by Chanin Nantasenamat (AKA Data Professor)

1. Your Data Science Journey is Personal

Your Data Science Journey is personal. Don’t compare yourself to others. Remember that everyone is unique and that each of us is on a different journey. Why would we want to be on someone else’s journey? Focus on your own data science journey. It is okay to be delayed by setbacks, but don’t let these obstructions keep you from reaching your goal. It’s better to be late than never.
Embrace imposter syndrome and treat your insecurities as a guiding map that will help you in the grand scheme of your data science journey. In particular, this may lead you to the path of self-improvement. Craft your own list of things to learn and do. Identify data science concepts and skills that you don’t yet know and jot down what you would like to know. Then, from this bucket list of data science concepts and skills, focus on learning just 1 new thing a day. Over the course of 1 year, you’ll be amazed at the compound effect and at how many new concepts and skills you will have learned.

2. How to Learn Data Science?

Learning Styles

How do we learn? Learning style* is popularly classified into 3 major types:

  1. Visual (See)
  2. Auditory (Hear)
  3. Kinaesthetic (Do)

*Disclaimer: It should be noted that there is no scientific proof for learning styles, and thus the term ‘popularly’ is used here to reflect their mainstream popularity. Learning styles are used here only to illustrate the many forms and media of learning that exist. Advice presented herein is based solely on my own opinion and experience. Please refer to the published research on the learning style myth at: https://www.apa.org/news/press/releases/2019/05/learning-styles-myth
Knowledge is everywhere and the sources of learning come in many shapes and forms. For example, you can learn from books, blogs, videos, podcasts, audio books, lectures, teaching and, most important of all, by doing.

“The Best Way to Learn Data Science is By Doing Data Science.” — Chanin Nantasenamat (AKA Data Professor)

As you learn new concepts or skills (i.e. from visual and auditory sources), you can reinforce what you have learned by immediately applying that newfound knowledge to your data science project (i.e. kinaesthetic). By constantly doing data science, you will gradually reinforce and hone the new concepts and skills that you have just learned, and over time you will master them.
In addition, to further reinforce your understanding of these new concepts or skills, you can teach others (e.g. by writing a tutorial blog or making a video tutorial). By doing this, you can harness the above-mentioned 3 learning styles and thereby maximize your learning potential. It is also worth noting that teaching others helps you to put the new concepts or skills into your own words, and in doing so helps you to reorganize your thoughts and improve your understanding.

Learn How to Learn

This is just the tip of the iceberg of advice on how to learn. In fact, there is an online course on Coursera called Learning How to Learn by Dr. Barbara Oakley and Dr. Terrence Sejnowski, which is a great course that will teach you some of the learning techniques to help you learn more efficiently.
Another great read is a Medium article by Evernote entitled Learning From the Feynman Technique, which summarizes the learning technique devised by the Nobel laureate and physicist Richard Feynman. Additionally, a YouTube video on The 25 Best Scientific Study Tips provides actionable study tips that you can also use in learning data science.
Moreover, Scott Young has written an excellent book, Ultralearning, where he shares his self-education experience in learning MIT’s 4-year computer science curriculum in just 1 year. In addition, Josh Kaufman delivered a TED talk and described in his book The First 20 Hours that we can learn anything that we want in just 20 hours.
Mastering the art of learning will allow you to learn and study data science more effectively and in turn will make your learning experience much more enjoyable.

Strategies for Learning Data Science & Skill Sets Needed

Late last year, I released a YouTube video, Strategies for Learning Data Science in 2020, where I share some practical tips and tricks for getting started on your data science journey. You will also want to check out How to Become a Data Scientist (Learning Path and Skill Sets Needed), where I take a bird’s-eye view of the data science landscape and cover the 8 important skill sets that all Data Scientists should know about. Additional videos providing strategies and advice on learning data science can be found in the Data Science 101 playlist on the Data Professor YouTube channel.
Ken Jee has made an excellent Medium article and YouTube video on How to ULTRALEARN Data Science. He also shares his tips in his YouTube video [How I Would Learn Data Science (If I Had to Start Over)](https://www.youtube.com/watch?v=4OZip0cgOho).

4. Resources for Learning Data Science (Fee vs Free)

There is an abundance of resources out there for learning data science. In fact, there are so many that choosing among them may be overwhelming. I will break down the available learning resources into 2 major types: Fee vs Free.
In the following sections, I will be listing some of the resources for learning data science for fee and free.

Learning Resources for a Fee

  1. Set aside dedicated time every day (preferably 1–2 hours, or at least 45 minutes) that you can spend learning and doing data science.
  2. Avoid distractions (turn off your phone, avoid checking social media, etc.). If you cannot stop distractions from reaching you, it may be a better idea to move yourself out of the distracting environment. This means finding someplace quiet where you can give your undivided attention to the task.
  3. Don’t procrastinate, don’t overthink, and just do it (like Nike)! To help you overcome procrastination, try applying the 2-minute rule (read this Medium article on How to Stop Procrastinating by Using the ‘2-Minute Rule’) to keep yourself in motion.

Because at the end of the day, if you’re not making progress, you’re not learning and you’re not getting ahead to meet the goals and be where you want to be in your career.

7. Embrace Failure and Learn to Love Debugging

Embrace failure. You’ll have to learn to get comfortable with the uncomfortable. Because simply put, there’s No Free Lunch. No pain, No gain. So when you encounter failure, don’t dwell on it, just get back up and keep on trying.
It is perfectly okay to get stuck, it is okay not to understand algorithm X, and it is okay not to know how to debug your failed code. You can take a break to refresh your mind before getting back to tackling your challenge. Sometimes your mind gets clogged and sluggish, and taking a break may help to rejuvenate it.
When you are stuck on a coding error in your data science project and are not sure how to proceed, ask a friend who is knowledgeable in coding if you have one. If not, search Stack Overflow to see whether there is already an answer to your question. If there isn’t, ask!
Learn to love debugging and treat it as a learning opportunity from which you can gain valuable insights and lessons from failures and mistakes. Because if you don’t fail, you don’t learn. But when you do fail, don’t be too harsh on yourself; learn to get back up and start over. You want to be resilient to failure.

8. Don’t Worry About Trying to Learn Everything

A newcomer to the field may be stunned by all the fancy terminology, but try not to be intimidated. Remember that data science and machine learning are dynamic, growing and evolving fields, and therefore new technologies will always be introduced. Simply put, the only thing that will remain constant is change itself.
As mentioned above, don’t be intimidated: take the dive and start. It does not matter where you start. What matters most is that you actually start your data science journey.

Focus on the Basics

  1. Data wrangling (Python — pandas, R — dplyr)
  2. Read up on statistics so that you can apply it to your models, for example by using the proper statistical tests (parametric vs non-parametric) to compare models.
  3. Exploratory data analysis and descriptive statistics for gaining an overview of the data
  4. Start with building simple and interpretable machine learning models (linear regression, tree-based methods)
  5. Use machine learning approaches that you are confident in using (knowing the math behind it)
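
To make the basics above concrete, here is a minimal sketch in plain Python. The toy data (hours studied vs. exam score) and all variable names are hypothetical, and the standard library is used so that the snippet runs anywhere; in a real project you would more likely reach for pandas or dplyr as mentioned in item 1. It walks through a tiny wrangling step, a few descriptive statistics, and a simple linear regression fitted by ordinary least squares, where the math is short enough to write out by hand.

```python
import statistics

# Hypothetical toy data: hours studied vs. exam score (None marks a missing value).
raw_rows = [
    {"hours": 1.0, "score": 52.0},
    {"hours": 2.0, "score": 55.0},
    {"hours": 3.0, "score": None},  # incomplete record
    {"hours": 4.0, "score": 61.0},
    {"hours": 5.0, "score": 64.0},
]

# Data wrangling: drop records with missing values.
rows = [r for r in raw_rows if r["hours"] is not None and r["score"] is not None]
hours = [r["hours"] for r in rows]
scores = [r["score"] for r in rows]

# Exploratory analysis: descriptive statistics for a quick overview of the data.
summary = {
    "n": len(rows),
    "mean_score": statistics.mean(scores),
    "stdev_score": statistics.stdev(scores),
}

# Simple, interpretable model: linear regression by ordinary least squares.
# slope = covariance(x, y) / variance(x); intercept = mean(y) - slope * mean(x)
mean_x = statistics.mean(hours)
mean_y = statistics.mean(scores)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) / sum(
    (x - mean_x) ** 2 for x in hours
)
intercept = mean_y - slope * mean_x


def predict(h):
    """Predict a score for a given number of study hours."""
    return intercept + slope * h


print(summary)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
```

Because the slope and intercept can be read directly, a model like this is easy to explain, which is the point of starting simple.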

Focus on the Project and Not on the Technology

Don’t overthink. Overcome the “What language should I learn?” dilemma: choose one and move on.
Know that programming is a tool that should help you take your project’s idea forward to development and deployment.
The underlying concepts of programming are language agnostic, meaning that the core fundamentals apply across languages:
  • Defining variables, arrays, data frames, etc.
  • Flow control (e.g. for loops, if and else statements)
  • Specific tasks in Data science
    - Data wrangling / Data pre-processing
    - Data visualization
    - Model building
    - Model deployment
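
As a small illustration of the fundamentals listed above, the following sketch shows variables, an array-like structure, a for loop and an if/else statement in Python. The values and names are hypothetical; the same ideas carry over almost unchanged to R or any other language you pick.

```python
# Defining variables and an array-like structure (a Python list).
threshold = 0.5
probabilities = [0.91, 0.12, 0.67, 0.40]  # hypothetical model outputs

# Flow control: a for loop combined with an if/else statement.
labels = []
for p in probabilities:
    if p >= threshold:
        labels.append("positive")
    else:
        labels.append("negative")


def count_label(values, target):
    """Count how many items in values are equal to target."""
    return sum(1 for v in values if v == target)


n_positive = count_label(labels, "positive")
print(labels, n_positive)
```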

9. Make Your Projects Reproducible

Some of the benefits of making your data science projects reproducible are as follows:

Others can help you

  • When you are faced with a coding error, it is essential to make a minimal working example (MWE) as it will allow others to reproduce your errors so that they can help you.
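
A minimal working example might look like the sketch below. The function `average` and the empty input are hypothetical stand-ins for whatever is failing in your own project; the idea is to strip the problem down to the smallest self-contained script that still triggers the error, and to capture the full traceback so that others can see exactly what you see.

```python
import traceback


def average(values):
    """Hypothetical function under investigation."""
    return sum(values) / len(values)


def reproduce():
    """Trigger the error with the smallest input that still fails."""
    try:
        average([])  # an empty list is enough to reproduce the problem
    except ZeroDivisionError:
        return traceback.format_exc()
    return None


report = reproduce()
print(report)
```

Posting a script like this together with the printed traceback gives a helper everything needed to run it and see the same failure.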

Save time for your future self and others

  • Export your project as a Docker container or as a Python or Conda environment, because what works today may not work 6 months from now owing to the constantly changing versions of the libraries installed in your coding environment. It is thus essential to use virtual environments or Docker containers, or at least to export the library versions (shown below for pip and conda).

Exporting environment in pip:
pip freeze > requirements.txt
Exporting environment in conda:
conda env export > environment.yml
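
If you would also like to record the versions from inside a script or notebook, a rough sketch using Python's standard `importlib.metadata` module is shown below. It is not a replacement for the pip and conda commands above, and the function name `snapshot_environment` is hypothetical.

```python
import json
import platform
from importlib import metadata


def snapshot_environment():
    """Return the Python version and the installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "packages": dict(sorted(packages.items())),
    }


snapshot = snapshot_environment()
snapshot_json = json.dumps(snapshot, indent=2)
print(snapshot_json)
```

Saving this JSON next to your results is one way to keep a record of which library versions produced them.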

10. Learning Success Starts from Within

This section explores the idea that success in your data science journey starts from within. It is about preparing your mind for what is to come and for who you will become. These concepts include: Curiosity, Loving the Process, Growth Mindset and Grit.

Curiosity

Curiosity can be considered one of the core skills needed to become a data scientist because it keeps us motivated and persistent in the pursuit of creative ways of solving problems. Albert Einstein once compared curiosity and knowledge:

“Curiosity is more important than knowledge.” ―Albert Einstein

Eric Colson stressed the importance of curiosity in his Harvard Business Review article Curiosity-Driven Data Science.

“…think less about how data science will support and execute your plans and think more about how to create an environment to empower your data scientists to come up with things you never dreamed of.” ―Eric Colson

Loving the Process

Learning data science is neither an easy endeavor nor an impossible feat. It is definitely possible for someone from a non-technical background to break into data science, as I did and discussed in my previous Medium article How a Biologist Became a Data Scientist.
When talking about loving the process, three names come to mind: Michael Jordan, Gary Vaynerchuk and Clément Mihailescu. These three individuals can be considered to be the best at what they do, and their passion for what they do is relentless.
In signing his first professional basketball contract, Michael Jordan made sure that a special “Love of the Game” clause was included, which would allow him to play basketball whenever and wherever without restrictions.
Gary Vaynerchuk (Chairman of VaynerX, CEO of VaynerMedia, 5-Time NYT Bestselling Author) said the following in a YouTube video when asked if he could delegate most of his job to spend less time at work:

“I love the process of the work, I love the grind, I love the climb.…I would suffocate if I couldn’t put out the work that’s needed to accomplish the things that I want.” ―Gary Vaynerchuk

Clément Mihailescu (CEO of AlgoExpert, Ex-Facebook Software Engineer and Tech YouTuber) says in a YouTube video about how he doesn’t experience burnout:

“At the end of the day, you have to enjoy the process. Whatever it is that you’re doing, whatever endeavor you’re pursuing, you have to enjoy the day to day, you have to love the nitty gritty stuff. You have to live and breathe it.” ―Clément Mihailescu

Growth Mindset and Grit

Based on several years of research, Angela Duckworth (Founder and CEO of Character Lab and Professor of Psychology at the University of Pennsylvania) defines the term grit in her best-selling book [Grit: The Power of Passion and Perseverance](https://amzn.to/2TWEoHJ) (YouTube video) as the combination of passion and persistence. In particular, an excerpt of her definition of grit is:

“Grit is the tendency to sustain interest in and effort toward very long-term goals.” ―Angela Duckworth

Carol Dweck described in her book Mindset: Changing The Way You Think to Fulfil Your Potential findings from her research on the two main mindsets guiding our lives: (1) growth mindset and (2) fixed mindset. The former has been associated with success, while the latter will usually lead to self-doubt and an unfulfilled life. In her TED talk, Dweck proposes working outside your comfort zone as the key to improving your performance.
In data science, change is inevitable, as there will always be new and challenging concepts that may overwrite or redefine prior concepts altogether. We will always be bombarded with complex challenges. Coping with these changes and challenges starts from within, particularly with having the right mindset to help steer your path to success.

Bonus: 11. Taking Full Responsibility

It is often easy to come up with excuses and blame countless things for the misfortunes of life. When we do this, “we have zero accountability” as Gary Vaynerchuk would always say (an excellent YouTube video on Stop Blaming Others & Take Full Responsibility).
Learning data science is no different from any other endeavor in our lives. The question is: will we be accountable for the delays and obstacles that we encounter during our learning journey, or will we refuse to take full responsibility and put the blame elsewhere?
Consider the following quotes on taking full responsibility (watch these on YouTube for the first two quotes and the third quote):

“Take full responsibility for what happens to you, it is one of the highest forms of human maturity. Accepting full responsibility, it’s the day you know you have passed from childhood to adulthood.” ―Jim Rohn


“Until you accept responsibility for your life, someone else runs your life.” ―Orrin Woodward


“Everything on you, everything’s your fault. You want to really win in life? You want to get real happy? Do you know why I’m really happy? Because I think that everything is my fault. If I don’t like it, I can change.” ―Gary Vaynerchuk

Now, take a moment and reflect. Let’s start being accountable and taking full responsibility. You’ll be amazed at how much you can achieve in your data science journey. Only if we can be objective and take full responsibility for our actions and lack of progress will we be empowered to do something about it. I’ll leave you with this quote by Jim Rohn.

“Success is not something you pursue, success is something you become.” ―Jim Rohn

Concluding Remarks

And there you have it, the 10 things that I wish I knew about learning data science if I could go back in time and tell my 22-year-old self. I hope that these are useful in getting you started on your data science journey or, if you have already started, that you can find something useful in them. Until next time, the best way to learn data science is to do data science, and please enjoy the journey!


About Me

I work full-time as an Associate Professor of Bioinformatics and Head of Data Mining and Biomedical Informatics at a research university in Thailand. In my after-work hours, I’m a YouTuber (AKA the Data Professor) making online videos about data science. In all the tutorial videos that I make, I also share Jupyter notebooks on GitHub (Data Professor GitHub page).
