If you are wondering why you should take a data science test, the answer is simple: skills matter. By taking the test, you will:

- know where you stand in the community of AI practitioners
- review your performance to learn about your strengths and weaknesses
- access a personalized study plan to prepare for interviews
- get fast-tracked to job opportunities within our network
- get a certificate

And it’s free. Your results are shared only with your permission, in order to refer you to a company. Let’s go over the data science test.

# I What is the data science test

The data science test is one of six standardized tests developed by a team of AI and assessment experts at Workera to evaluate the skills of people working as a Data Scientist (DS), Data Analyst (DA), Machine Learning Engineer (MLE), Deep Learning Engineer (DLE), Machine Learning Researcher (MLR), or Deep Learning Researcher (DLR). It consists of multiple-choice questions selected from a large database, so that different test takers get different questions, and it takes 15 minutes to complete.

You can learn more about these roles in our AI Career Pathways report and about the other tests in The Skills Boost.

# II What to expect in the data science test

Before taking a test, it is important to understand what it evaluates and how it is graded. The grading rubric for the data science test includes three categories:

- *Using probabilities*, which covers probability theory, including probability distributions (e.g., uniform, normal, Bernoulli, binomial, Poisson), operations on probabilities (e.g., sum of probabilities, probabilistic independence, conditional probabilities), and properties of random variables (such as probability density functions or Bayes' theorem).
- *Using statistics*, which encompasses foundational objects and methods from statistics, such as statistical parameters (e.g., mean, variance, covariance, mode), hypothesis testing (e.g., z-test, t-test, A/B test, p-value), and statistical properties (e.g., bias/variance tradeoff, central limit theorem).
- *Analyzing data*, which spans data analysis techniques to preprocess data (e.g., clean, transform, augment, or parse), visualize data (e.g., plot, rank, and compare), and use popular metrics (such as accuracy, R-squared, residuals, precision, and recall).

You will be evaluated and assigned to a skill level in each category: beginning, developing, or accomplished, depending on your mastery of the skill at hand. Your skill level in data science will be determined using a combination of your scores across all three categories.

You can learn about the categories and performance levels in the table below.

Category | Beginning | Developing | Accomplished |
---|---|---|---|
Using probabilities | Demonstrates limited understanding of basic objects and methods from probability theory. | Demonstrates ability to use objects and methods from probability theory with some effectiveness. This includes calculating the probability of an event, understanding various probability distributions, and working with random variables and their properties. | Demonstrates ability to apply objects and methods from probability theory to real-world problems and identify their use cases and shortcomings. |
Using statistics | Demonstrates limited understanding of basic objects and methods from statistics. | Demonstrates ability to use objects and methods from statistics with some effectiveness. This includes understanding hypothesis testing, computing statistical parameters, and working with statistical properties. | Demonstrates ability to apply objects and methods from statistics to real-world problems and identify their use cases and shortcomings. |
Analyzing data | Demonstrates limited understanding of working with data. | Demonstrates ability to extract information from data with some effectiveness. This includes preprocessing, visualizing, and evaluating metrics. | Demonstrates ability to extract valuable information from data to solve business problems. |

At the end of the test, you’ll see your overall skill level in data science.

You will also receive feedback for every skill evaluated (e.g., *Calculating the probability of an intersection of events* or *Calculating conditional probabilities using Bayes' rule*).

# III Data science practice questions

Nothing beats practice! Here are examples of questions you might encounter in the data science test. Think carefully before selecting your answer. Then, click submit to see the answer and get feedback.

### Question 1: Classification metrics

### Question 2: Residual plots for linear regression

### Question 3: Balls in Bags

### Question 4: Bernoulli distribution mean and variance

# IV Tips for the data science test

Now that you know what to expect in our data science test, it’s time to take it! You can take the test up to three times in a 90-day period (unless the test is being administered to you by a company for a job) and your results are only ever shared with your permission. The first test is simply meant to act as a baseline to show you where to start studying. So why wait? Sign up here to take the data science test.

You are designing a model that diagnoses whether a patient requires high-risk surgery or not. Assume that:

- the binary label for a patient that needs the surgery is 1 (Positive)
- the binary label for a patient that could use a less aggressive medical treatment is 0 (Negative)

The single most important factor to take into account is that the model should not suggest the surgery option to a patient that could receive alternative treatment.

Which of these accuracy metrics will be the most important to you? (TP means “True Positive”, FN means “False Negative”, FP means “False Positive”, TN means “True Negative”.)

TP/(TP+FP)

TN/(TN+FN)

TP/(TP+FN)
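One way to check your reasoning on the question above is to see which of the listed ratios reacts to false positives, the error the question says matters most. The following is a minimal sketch, assuming made-up confusion-matrix counts; the function name `candidate_metrics` and all numbers are illustrative and not part of the test.

```python
def candidate_metrics(tp, fp, tn, fn):
    """Return the three answer-choice ratios for a binary confusion matrix."""
    return {
        "TP/(TP+FP)": tp / (tp + fp),  # precision
        "TN/(TN+FN)": tn / (tn + fn),  # negative predictive value
        "TP/(TP+FN)": tp / (tp + fn),  # recall (sensitivity)
    }


# Hypothetical counts: TP, TN, and FN are held fixed; only FP changes.
few_fp = candidate_metrics(tp=40, fp=5, tn=50, fn=5)
many_fp = candidate_metrics(tp=40, fp=40, tn=50, fn=5)

for name in few_fp:
    print(f"{name}: {few_fp[name]:.3f} -> {many_fp[name]:.3f}")
```

In this sketch only TP/(TP+FP) drops when more patients are wrongly recommended for surgery; the other two ratios do not involve FP at all and stay unchanged.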

You have trained a linear regression model on a dataset (X, Y) and plotted the residuals $r$ against the predicted values $\hat{y}$. Here is the plot:

What can you conclude from this plot?

Two or more input variables (features) are closely related to one another; it might be beneficial to apply a dimensionality reduction method such as PCA before applying the linear regression.

There is likely a linear relationship between the input data and the output label. Thus, a linear regression seems appropriate.

There is likely a non-linear relationship between the input data and the output label. Thus, a linear regression doesn't seem appropriate.

None of the above.
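To build intuition for reading residual plots in general, it can help to compute residuals for data you control. The sketch below is an illustration under stated assumptions, not a reconstruction of the plot in the question: it fits a one-feature least-squares line to synthetic, noise-free data (one exactly linear target and one quadratic target) and prints the residuals. The helper names `fit_line` and `residuals` are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a * x + b (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def residuals(xs, ys):
    """Residuals r = y - y_hat of the fitted line, in the order of xs."""
    a, b = fit_line(xs, ys)
    return [y - (a * x + b) for x, y in zip(xs, ys)]


xs = list(range(11))
linear = residuals(xs, [2 * x + 1 for x in xs])  # synthetic linear target
curved = residuals(xs, [x ** 2 for x in xs])     # synthetic quadratic target

print([round(r, 6) for r in linear])
print([round(r, 6) for r in curved])
```

With the linear target the residuals are essentially zero everywhere. With the quadratic target they are positive at both ends and negative in the middle, the kind of systematic curve that suggests a straight line is missing structure in the data. Because a single-feature fit with positive slope orders $\hat{y}$ the same way as $x$, the same pattern appears when residuals are plotted against predicted values.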

Bag I contains 4 white and 6 black balls, while Bag II contains 4 white and 3 black balls. One ball is drawn at random from one of the bags (with 50% probability for each bag) and it is found to be black. Find the probability that it was drawn from Bag I.

8/12

7/12

4/12

5/12
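After you have picked an answer, you can check it by applying Bayes' rule with exact fractions. This is a minimal sketch using Python's standard `fractions` module; the function name `posterior` is an illustrative choice, and the inputs are the bag contents and the 50% bag probabilities stated in the question.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' rule: P(H_i | E) = P(H_i) * P(E | H_i) / sum_j P(H_j) * P(E | H_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # total probability of the observed event
    return [j / evidence for j in joint]


priors = [Fraction(1, 2), Fraction(1, 2)]   # each bag is equally likely
p_black = [Fraction(6, 10), Fraction(3, 7)]  # P(black | Bag I), P(black | Bag II)

bag_one, bag_two = posterior(priors, p_black)
print(bag_one, bag_two)  # 7/12 5/12
```

The two posteriors sum to 1, which is a quick sanity check that the evidence term was computed correctly.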

If $X$ is a random variable following a Bernoulli distribution, then $P(X = 1) = p = 1 - P(X = 0) = 1 - q$. Which of the following propositions is **true**?

$E[X] = 1-p$

$Var(X) = p(1-q)$

$E[X^2] = p$

None of the above
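Each proposition above can be checked directly from the probability mass function of a Bernoulli variable. The sketch below is one way to do that, assuming an arbitrary example value $p = 3/10$ (chosen away from $1/2$, where $p$ and $1 - p$ coincide); the function name `bernoulli_moments` is hypothetical.

```python
from fractions import Fraction


def bernoulli_moments(p):
    """Return E[X], E[X^2], and Var(X) for X in {0, 1} with P(X = 1) = p."""
    q = 1 - p
    pmf = {0: q, 1: p}
    mean = sum(x * w for x, w in pmf.items())
    second = sum(x ** 2 * w for x, w in pmf.items())
    var = second - mean ** 2  # Var(X) = E[X^2] - (E[X])^2
    return mean, second, var


p = Fraction(3, 10)  # arbitrary example value, not from the question
q = 1 - p
mean, second, var = bernoulli_moments(p)

print(mean == 1 - p)       # False
print(var == p * (1 - q))  # False
print(second == p)         # True
```

The last line holds for any $p$ because $X^2 = X$ when $X$ only takes the values 0 and 1, so $E[X^2] = E[X] = p$, while $Var(X) = p - p^2 = pq$.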