The one thing you need to master data science

By Sharp Sight

(This article was first published on r-bloggers – SHARP SIGHT LABS, and kindly contributed to R-bloggers)

When you ask people what makes someone an elite performer, they commonly say “talent.” Most people believe that top performers come into the world with an innate gift that makes them special.

You see something like this in data science too. People hear about elite data scientists and they assume that these people are just naturally gifted. “He must be a genius.” “She must have been born with a talent for math.” “That guy has a gift for programming.” From the outside, people look at top performing data scientists and say, “that could never be me … I don’t have that gift.”

In data science and more generally, people think that innate talent is what causes exceptional performance.

This may be one of the biggest misconceptions in history.

As it turns out, there has been extensive research on elite performers of all kinds: musicians, doctors, chess players, even mathematicians. Time after time, that research suggests that top performers are made, not born.

The myth of innate talent

The idea that talent creates greatness was detailed and largely debunked in the popular book, Talent is Overrated, by Geoff Colvin.

In one example from Talent is Overrated, Colvin describes a 1992 study of several hundred young music students. The researchers sorted the students into five groups by performance ability (top performers, middling performers, low performers, and so on).

Next, they collected data on these students. They interviewed the students and their parents to find out how many hours per day the students practiced. They also asked at what age the students first started playing their instrument.

When they analyzed the data, there were a few critical insights.

First, when they looked for early signs of innate talent among the highest performing students, they didn’t find any. The best performing music students had shown no unusual signs of talent when they were young.

Second, they found that there was a single factor that predicted the performance of the students: how much they practiced.

Practice was the secret. Practice was the thing that made the “great performers” great.

You can use this information as a data science student. The primary factor that enables you to master data science is practice.

Practice can make you a top-tier data scientist

To learn and master data science, you need to practice. But importantly, it’s a very particular kind of practice that leads to expertise.

To master data science, you need deliberate practice.

Deliberate practice is practice that is specifically designed to push you beyond your current skill level.

Unfortunately, when most people practice a skill, they actually don’t push themselves. According to Anders Ericsson, a famous researcher of human expertise and human performance, when most people practice they focus on things that they already know how to do.

For example, let’s say that you already know how to create a bar chart.

library(ggplot2)
# Bar chart: number of diamonds in each cut category
ggplot(data = diamonds, aes(x = cut)) +
  geom_bar()

If you’ve already mastered the syntax to create a bar chart, but you keep practicing it intensely without moving on to more advanced techniques, you are unlikely to improve as a data scientist.

Now don’t misunderstand: it is useful to occasionally practice techniques that you’ve already mastered. However, if you don’t push yourself forward by expanding your skills to more advanced techniques, then you won’t develop as a data scientist.

Another example is with musicians: if you’re a musician, and you learn only 5 simple songs, and you only play those 5 songs for 10 years, you won’t improve. To improve, you need to master the basics, but you need to consistently push yourself beyond your current skills.

To quote Anders Ericsson: “[Deliberate practice] entails considerable, specific, and sustained efforts to do something you can’t do well—or even at all.”

What this means for you, as a developing data scientist, is that you need to practice in a way that:

  1. Enables you to learn techniques
  2. Enables you to practice those techniques
  3. Pushes you beyond your present level so you continue to improve.

These principles will help you develop a good practice system. However, to give you a clearer understanding of how to optimize your practice, let’s dig a little deeper into the nature of deliberate practice.

What is deliberate practice?

In Talent is Overrated, Colvin described the most important features of deliberate practice.

Deliberate practice is:

  1. Designed for improvement
  2. Repeatable
  3. Provides feedback
  4. Mentally demanding
  5. Not much fun

Let’s examine each of these features one by one. I’ll also explain how you can apply them to your study of data science.

Deliberate practice is designed to improve performance

To train optimally, your practice should be specifically designed to push you beyond your current skill.

In Talent is Overrated, Colvin gives the example of Tiger Woods dropping a ball in a sand trap and deliberately stepping on it to press the ball further into the sand, making the shot very, very challenging.

Let me explain how you can apply this to studying data science.

You need a system that is deliberately designed to challenge you. You need to practice in a way that pushes you beyond your comfort zone.

So for example, if you’ve already learned the basics of ggplot2, like geom_line(), geom_bar(), and geom_point(), pushing yourself might mean learning the theme() function and the corresponding element functions. Or if you’ve already learned ggplot2::theme(), then you may need to move on to more advanced visualization techniques.
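To make that step concrete, here is a rough sketch of what practicing theme() and the element functions might look like. The plot and the particular settings are just illustrative choices of mine, not a recommended style; the point is that each element_*() call is something new to learn once the geoms feel easy.

```r
library(ggplot2)

# Start from a plot built with a geom you already know.
p <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +
  labs(title = "Diamond price by carat")

# theme() controls the non-data parts of the plot. Each argument takes an
# element_*() function that describes how that part should look.
p_themed <- p +
  theme(
    plot.title = element_text(size = 16, face = "bold"),  # title text
    axis.title = element_text(colour = "grey30"),         # axis label text
    panel.background = element_rect(fill = "white"),      # plot background
    panel.grid.minor = element_blank()                    # remove minor gridlines
  )

p_themed
```

Try changing one argument at a time and re-running the code to see exactly what each element controls.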

I want to emphasize, however, that the important word is ‘designed.’ You need a system that is designed to challenge you. As a data science student, this will be one of your biggest obstacles. You may not be able to design a practice routine yourself. As a beginning or intermediate student, it will be difficult for you to design a system yourself that teaches you the right skills, in the right order, in such a way that you’re continuously challenged by your practice. It’s hard to do without a coach or mentor.

Sadly, I also think that most data science courses lack well designed practice systems. I’ve heard many stories from students who say that they “took an online course” but they still can’t write code. In most cases, I suspect that this is because the course lacks a well designed practice system to teach you skills, but then push you beyond your skill level to higher and higher levels of mastery.

One way or another, I highly recommend that you create or invest in a practice system that is designed to improve your performance.

Deliberate practice is repeatable

To master a skill, you need to repeat your practice activities until they are second nature.

Sadly, people learning data science rarely repeat their practice activities. At best, most people learn a new technique and then practice it only a couple of times. Inevitably, if they don’t repeat their practice, they forget the technique. Again, you frequently hear people say “I took several online courses, but I still can’t write R code very well.” They talk about learning a technique and then forgetting it. The reason they forget is that they fail to practice over the long run.

If you want to be a top-performing data scientist, you need to repeat your data science practice until you can perform techniques unconsciously (i.e., without thinking about it).

To help you understand what I mean by “perform techniques unconsciously,” I’ll break down the phases of learning as follows:

  1. Unconscious incompetence
  2. Conscious incompetence
  3. Conscious competence
  4. Unconscious competence

Let me explain these.

Before learning a skill, you will be unconsciously incompetent. Essentially, this means that before you begin learning a new technique, you’re bad at it and you don’t even know it. For example, if you want to learn data science, but you don’t even know which packages to learn, you would be “unconsciously incompetent.” At this stage, you don’t even know what you don’t know.

Next, when you first learn a skill, you become consciously incompetent. At this stage, you are unskilled, but you are aware of your lack of skill. For example, the first time you learn how to create a bar chart with geom_bar(), you’ll essentially be consciously incompetent. If you only practice it a few times, you’ll probably struggle to remember how to execute the technique. You would lack skill, and you’d be acutely aware of your lack of skill. That’s conscious incompetence.

But if you repeat your practice for a few days and weeks, you’ll begin to be consciously competent. At this level, you can smoothly execute a technique, but still with some mental effort. For example, if you systematically and repeatedly practice the techniques from ggplot2 and dplyr for a couple of weeks, you’ll eventually reach a point where you can execute those techniques. Having said that, at this stage, it still requires effort. You’ll need to consciously think about the code. You’ll need to work to remember the syntax. But the important thing is that you will remember. You will be able to execute the techniques. This stage, when you can perform the activity (but only with mental effort), is called conscious competence.

Finally, if you stick with it and you systematically practice your data science techniques, you’ll reach unconscious competence. At the stage of unconscious competence, you can execute techniques without conscious thought. The techniques are so well practiced that you can do them without effort. For example, I have students who say that they can write R code “with their eyes closed.” That’s unconscious competence.

That should be your goal. Your goal should be unconscious competence in the techniques of the tidyverse, like ggplot2, dplyr, stringr, and tidyr functions. Your goal should be to be able to write the code effortlessly, fluidly, rapidly, and smoothly. Imagine writing R code effortlessly, with your eyes closed, as fast as you can type. That should be your goal. “Fluency” in R.

It sounds hard, but this is absolutely possible for you as a data scientist.

You can achieve this level of “fluency,” but it requires you to practice. It requires repeated practice.

You need to repeat your data science practice until writing R code is second nature.

Deliberate practice provides feedback

Moreover, for your practice to be effective, you need feedback. This is a major difference between “regular” practice and “deliberate practice.”

Ideally, this feedback should come from a skilled instructor. Having an expert analyze your performance to identify strengths and weaknesses is extremely helpful.

Having said that, not everyone can afford a coach or tutor for this sort of guidance.

On the other hand, as data scientists, we’re actually somewhat lucky. In many instances, we get feedback on our techniques directly from our programming environment. If you type some code into RStudio, it either runs without error, or it doesn’t. If your code contains an error, you’ll get an error message (however cryptic it may be) about what you did wrong. Alternatively, if your code runs without errors, you commonly get other types of feedback by examining the output. For example, if you intended to make a line chart, you can examine the output. Did the code produce the exact line chart that you envisioned? Did the code do what you thought it would? This is also feedback.

Ideally though, your data science practice system should provide more than just the output from RStudio. Ideally, you want feedback on whether your exact answer was correct.

If you know the right tools to use, this is absolutely possible. Moreover, once you start using a feedback-driven practice system to learn data science, your progress will accelerate. You’ll learn more, faster, and make fewer mistakes.
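As one small illustration of “exact answer” feedback, here is a sketch of a self-check you could write yourself. The exercise is hypothetical (mean mpg per cylinder count in the built-in mtcars data), and I am not claiming this is how any particular course or tool does it. The idea is simply to compare your answer against a reference computed a different way.

```r
library(dplyr)

# Hypothetical exercise: compute the mean mpg for each cylinder count.
my_answer <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg)) %>%
  arrange(cyl)

# Reference answer computed independently with base R.
expected <- tapply(mtcars$mpg, mtcars$cyl, mean)

# all.equal() tolerates tiny floating-point differences;
# isTRUE() turns its result into a plain TRUE or FALSE.
is_correct <- isTRUE(all.equal(my_answer$mean_mpg, as.numeric(expected)))

message(if (is_correct) "Correct!" else "Not quite, try again.")
```

A check like this tells you more than “the code ran”: it tells you whether the result is the one you were aiming for.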

Deliberate practice is mentally demanding

Another detail of deliberate practice is that it’s mentally challenging.

If you’re really pushing yourself to learn and practice skills that are out of your comfort zone, it’s going to hurt a little. It’s going to be mentally challenging.

For example, the first time you start learning the functions from the tidyr package (which I recommend if you’re a beginner), they might be a little difficult to understand. tidyr reshapes your data into new formats. These transformations that reshape your data can be difficult to understand.
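To show the kind of reshaping I mean, here is a minimal sketch using a tiny made-up table (scores_wide is invented purely for illustration). It assumes tidyr 1.0.0 or later, where pivot_longer() is available; older code often uses gather() for the same job.

```r
library(tidyr)

# A small "wide" table: one row per student, one column per exam.
scores_wide <- data.frame(
  student = c("a", "b"),
  exam_1  = c(90, 75),
  exam_2  = c(85, 80)
)

# Reshape to "long" format: one row per student-exam combination.
scores_long <- pivot_longer(
  scores_wide,
  cols      = c(exam_1, exam_2),  # columns to stack
  names_to  = "exam",             # new column holding the old column names
  values_to = "score"             # new column holding the values
)

scores_long
```

The data is the same before and after; only its shape changes, from 2 rows and 3 columns to 4 rows and 3 columns. That change of shape is usually the part that takes practice to picture.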

And it’s not just that some techniques are difficult to understand. Simply remembering R syntax can be challenging. If you’ve just learned a new R technique, it will be difficult to remember the syntax after a few days. Even if you learn a technique well the first time, you will very likely begin to forget it quickly.

Ultimately in data science, some techniques are hard to understand. Syntax is hard to remember. The process of learning new, advanced techniques (and pushing yourself to memorize the syntax) is hard. If you’re doing it right, learning data science is mentally demanding. If you’re going to master data science, it should be hard. Deliberate practice is supposed to be hard. If it’s not, then you’re not pushing yourself hard enough. If your practice always feels easy, you’re doing it wrong.

The challenge for you, as someone who’s learning data science, is that you need a practice system that continuously pushes you beyond your skill level towards techniques of increasing difficulty.

Deliberate practice is not fun

Finally, deliberate practice is not fun.

In talking about this, I still want to emphasize that you can rapidly master data science if you know how to practice. You can learn data science faster than you ever thought possible. I think you can learn data science 2x, 3x, even 5x faster than average, if you know how to practice.

Having said that, a good practice system is not a magic wand. It’s not going to make data science effortless. It’s going to be hard. It will be frustrating at times. It’s not always fun.

If you want to master data science, you need to embrace the hurt. You need to accept that there’s no success without struggle. You need to accept that mastering data science will be a little painful sometimes, and you need to power through.

If you can embrace the fact that deliberate practice – the type of practice that leads you to mastery – will be hard sometimes, then you can succeed.

A quick guide to deliberate practice for data science

“Deliberate practice requires that one identify certain sharply defined elements of performance that need to be improved, and then work intently on them.”

– Geoff Colvin

Now that you understand what deliberate practice is and how it can help you, here are a few recommendations.

Identify sharply defined techniques that you can practice

To engage in deliberate practice, you need to sharply define a set of techniques, and practice them intensely until you master them. After you master them, define additional techniques (more advanced techniques) and practice them as well.

As a data scientist, this means that you should sharply define individual techniques, practice those techniques repeatedly, and move on to harder techniques as you progress.

The tidyverse functions are small units that you can practice

As I wrote in a recent article, the modular nature of R’s tidyverse makes it somewhat easy to define “practicable techniques.” The functions of the tidyverse are highly modular. Almost all of the functions in the tidyverse do one thing. You should consider the functions of the tidyverse to be like sharply defined techniques that you can individually learn, practice, and master.

For example, I consider dplyr::mutate() to be one technique. dplyr::arrange() is another separate technique. tidyr::gather() is a technique. Within ggplot2, geom_line(), geom_bar(), and geom_point() should be considered separate techniques. These are individual techniques that you can practice and master. They are the small units that you need to practice.
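Here is a quick sketch of what practicing two of these as separate units might look like, using the built-in mtcars data. The kpl column name and the approximate mpg-to-km/l conversion factor are my own illustrative choices.

```r
library(dplyr)

# Technique 1: mutate() adds a new column computed from existing ones.
# (0.425 is an approximate miles-per-gallon to km-per-litre factor.)
cars_kpl <- mutate(mtcars, kpl = mpg * 0.425)

# Technique 2: arrange() reorders rows; desc() sorts highest first.
cars_sorted <- arrange(mtcars, desc(mpg))

# Once each technique is comfortable on its own, combining them
# is the natural next step up in difficulty.
cars_both <- mtcars %>%
  mutate(kpl = mpg * 0.425) %>%
  arrange(desc(kpl))

head(cars_both)
```

Practice each function in isolation first, then chain them together once the individual pieces feel easy.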

Establish a practice system

Ideally, you should set up a practice system. You need a system that will teach you the right skills in the right order. You need a system that enables you to practice your techniques repeatedly over time until you reach mastery. You need a system that gives you feedback.

If you don’t have these things, you are unlikely to reach your full potential. But with the right system in place, you can travel very rapidly on the path to data science mastery.

Commit to practice

As I’ve mentioned, if you want to become a top-tier data scientist, you need to practice. It’s not enough to learn a technique and practice it one time. You need to practice a technique repeatedly over time until it becomes second nature. When you reach that point, you need to move on to new skills that push you beyond your current skill level.

Practicing like this requires commitment. It’s hard. You need to be disciplined. You need to commit to showing up every day and doing the work. There are. no. shortcuts.

I promise you though, if you can commit to practice, and you practice data science the right way, then you can learn data science very, very quickly.

Our data science course is reopening soon

If you’re interested in rapidly mastering data science, then sign up for our list right now.

We will be re-opening our flagship course, Starting Data Science, within a few weeks.

Starting Data Science will teach you the essentials of R, including ggplot2, dplyr, tidyr, stringr, and lubridate. It will also give you a practice system that you can use to rapidly master these techniques.

If you sign up for our email list, you’ll get an exclusive invitation to join the course when it opens.


