AI organizations divide their work into data engineering, modeling, deployment, business analysis, and AI infrastructure. The skills needed to carry out these tasks combine technical, behavioral, and decision-making skills. The data science case study interview focuses on technical and decision-making skills, and you’ll encounter it during an onsite round for a Data Scientist (DS), Data Analyst (DA), Machine Learning Engineer (MLE), or Machine Learning Researcher (MLR) role. You can learn more about these roles in our AI Career Pathways report and about other types of interviews in The Skills Boost.

I What to expect in the data science case study interview

The interviewer is evaluating your approach to a real-world data science problem. The interview revolves around a technical question that is often open-ended. There is usually no single correct answer; what the interviewer evaluates is your thought process. Here are some questions you might be asked:

How many cashiers should be at a Walmart store at a given time?
You notice a spike in the number of user-uploaded videos on your platform in June. What do you think is the cause, and how would you test it?
Your company is thinking of changing its logo. Is it a good idea? How would you test it?
Could you tell if a coin is biased?
In a given day, how many birthday posts occur on Facebook?
What are the different performance metrics for evaluating ride-sharing services?
How will you test if a chosen credit scoring model works or not? What dataset(s) do you need?
Given a user’s history of purchases, how do you predict their next purchase?
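For the biased-coin question, one standard answer is an exact binomial test. The minimal sketch below assumes the coin is flipped a fixed number of times and tests the null hypothesis that heads has probability 0.5; the function name and the experiment (61 heads in 100 flips) are hypothetical, chosen only for illustration.

```python
from math import comb

def binomial_two_sided_p_value(heads: int, flips: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed one under H0 (P(heads) = p)."""
    pmf = [comb(flips, k) * p**k * (1 - p) ** (flips - k) for k in range(flips + 1)]
    observed = pmf[heads]
    # Small relative tolerance guards against floating-point ties between outcomes.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))

# Hypothetical experiment: 100 flips, 61 heads.
p_value = binomial_two_sided_p_value(61, 100)
print(f"p-value = {p_value:.4f}")
print("reject fair-coin hypothesis at 5%" if p_value < 0.05 else "no evidence of bias at 5%")
```

In an interview, it is also worth saying that the number of flips should be fixed before the experiment, since stopping as soon as the result looks significant inflates the false-positive rate.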

II Recommended framework

All interviews are different, but the ASPER framework is applicable to a variety of case studies:

Ask. Ask questions to uncover details that the interviewer has kept hidden. Specifically, you want to answer the following questions: “What are the product requirements and evaluation metrics?”, “What data do I have access to?”, and “How much time and computational resources do I have to run experiments?”.
Suppose. Make justified assumptions to simplify the problem. Examples of assumptions are: “we are in a small-data regime”, “events are independent”, “the statistical significance level is 5%”, “the data distribution won’t change over time”, “we have three weeks”, etc.
Plan. Break down the problem into tasks. A common task sequence in the data science case study interview is: (i) data engineering, (ii) modeling, and (iii) business analysis.
Execute. Announce your plan, and tackle the tasks one by one. In this step, the interviewer might ask you to write code or explain the maths behind your proposed method.
Recap. At the end of the interview, summarize your answer and mention the tools and frameworks you would use to perform the work. It is also a good time to express your ideas on how the problem can be extended.

III Interview tips

Every interview is an opportunity to show your skills and motivation for the role. Thus, it is important to prepare in advance. Here are useful rules of thumb to follow:

Articulate your thoughts in a compelling narrative.

Data scientists often need to convert data into actionable business insights, create presentations, and convince business leaders. Their communication skills are therefore evaluated in interviews and can be the reason for a rejection. Your interviewer will judge the clarity of your thought process, your scientific rigor, and how comfortable you are using technical vocabulary.

Example 1: Your interviewer will notice if you say “correlation matrix” when you actually meant “covariance matrix”.
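The difference is concrete: a correlation matrix is a covariance matrix rescaled by the standard deviations, corr[i][j] = cov[i][j] / (sd_i * sd_j), so correlation is unitless while covariance depends on the units of each variable. A minimal pure-Python sketch, using hypothetical height and weight data:

```python
def covariance_matrix(columns):
    """Sample covariance matrix (divides by n - 1) for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return [
        [sum((a - mi) * (b - mj) for a, b in zip(ci, cj)) / (n - 1)
         for cj, mj in zip(columns, means)]
        for ci, mi in zip(columns, means)
    ]

def correlation_from_covariance(cov):
    """Rescale each entry: corr[i][j] = cov[i][j] / (sd_i * sd_j)."""
    sds = [cov[i][i] ** 0.5 for i in range(len(cov))]
    size = len(cov)
    return [[cov[i][j] / (sds[i] * sds[j]) for j in range(size)] for i in range(size)]

# Hypothetical data: height in cm and weight in kg.
height = [160.0, 170.0, 175.0, 180.0, 190.0]
weight = [55.0, 65.0, 70.0, 72.0, 85.0]
cov = covariance_matrix([height, weight])
corr = correlation_from_covariance(cov)
print(cov)   # entries depend on units (cm, kg)
print(corr)  # unitless; diagonal entries equal 1
```

Converting height from centimeters to meters divides the off-diagonal covariance by 100 but leaves the correlation unchanged.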

Example 2: Mispronouncing a widely used technical word or acronym such as Poisson, ICA, or AUC can affect your credibility. For instance, ICA is pronounced aɪ-siː-eɪ (i.e., “I see A”) rather than “Ika”.

Example 3: Show your ability to strategize by drawing the AI project development life cycle on the whiteboard.

Tie your task to the business logic.

Example 1: If you are asked to improve Instagram’s news feed, first identify the goal of the product. Is it to have users spend more time on the app, to get them to click on more ads, or to drive interactions between users?

Example 2: If you present graphs showing the number of salespeople needed in a retail store at a given time, it is a good idea to also discuss the savings your insight could lead to.

Alternatively, your interviewer might give you the business goal, such as improving retention or engagement, or reducing employee churn, but expect you to come up with a metric to optimize.

Example: If the goal is to improve user engagement, you might use daily active users (DAU) as a proxy and track it through users’ clicks (shares, likes, etc.).
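A minimal sketch of tracking this proxy, assuming a hypothetical event log of (user_id, day, event_type) tuples in which any click, share, or like counts as activity. The data and the function name are illustrative, not taken from a real product.

```python
from collections import defaultdict
from datetime import date

# Hypothetical click log: (user_id, day, event_type).
events = [
    ("u1", date(2024, 6, 1), "like"),
    ("u2", date(2024, 6, 1), "share"),
    ("u1", date(2024, 6, 1), "share"),   # same user, same day: counted once
    ("u1", date(2024, 6, 2), "click"),
    ("u3", date(2024, 6, 2), "like"),
    ("u4", date(2024, 6, 2), "comment"),
]

def daily_active_users(events):
    """Return {day: number of distinct users with at least one event that day}."""
    users_by_day = defaultdict(set)
    for user_id, day, _event_type in events:
        users_by_day[day].add(user_id)
    return {day: len(users) for day, users in sorted(users_by_day.items())}

for day, dau in daily_active_users(events).items():
    print(day.isoformat(), dau)
```

Counting distinct users rather than raw events keeps one very active user from inflating the metric; which event types count as "active" is itself a product decision worth raising with the interviewer.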

Brush up your data science foundations before the interview.

You will need to leverage concepts from probability and statistics such as correlation vs. causation or statistical significance. You should also be able to read a statistical table (for example, a z- or t-table).

Example: You’re a professor who currently evaluates students with a final exam but is considering switching to a project-based evaluation. A rumor says that the majority of your students oppose the switch. Before making the switch, what would you like to test? In this question, you should introduce notation to state your hypothesis and leverage tools such as confidence intervals, p-values, distributions, and tables. Your interviewer might then give you more information. For instance, you polled a random sample of 300 students in your class and observed that 60% of them were against the switch.
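With the poll numbers, one possible analysis is a one-sided test of H0: p = 0.5 against H1: p > 0.5, where p is the fraction of the whole class opposed to the switch. The sketch below uses the normal approximation to the binomial (reasonable since n = 300 is large) and a Wald 95% confidence interval. It assumes the sample is random and ignores any finite-population correction, because the class size is not given; the function names are hypothetical.

```python
from math import erf, sqrt

def one_sided_proportion_z_test(successes: int, n: int, p0: float = 0.5):
    """H0: p = p0 versus H1: p > p0, using the normal approximation to the binomial."""
    p_hat = successes / n
    se_null = sqrt(p0 * (1 - p0) / n)        # standard error under H0
    z = (p_hat - p0) / se_null
    p_value = 0.5 * (1 - erf(z / sqrt(2)))   # P(Z >= z) for a standard normal
    return p_hat, z, p_value

def wald_confidence_interval(successes: int, n: int, z_crit: float = 1.96):
    """Approximate 95% confidence interval for the true proportion."""
    p_hat = successes / n
    half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Numbers from the example: 300 polled students, 60% (180) against the switch.
p_hat, z, p_value = one_sided_proportion_z_test(180, 300)
low, high = wald_confidence_interval(180, 300)
print(f"p_hat={p_hat:.2f}  z={z:.2f}  p-value={p_value:.5f}")
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

Here z is about 3.46, the p-value is well below 0.05, and the whole interval lies above 0.5, so under these assumptions the poll supports the rumor that a majority opposes the switch.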

Avoid clear-cut statements.

Because case studies are often open-ended and can have multiple valid solutions, avoid making categorical statements such as “the correct approach is …”. You might offend the interviewer if the approach they use differs from the one you describe. It is also better to show flexibility and an understanding of the pros and cons of different approaches.

Study topics relevant to the company.

Data science case studies are often inspired by in-house projects. If the team is working on a domain-specific application, explore the literature.

Example 1: If the team is working on time series forecasting, you can expect questions about ARIMA, and follow-ups on how to test whether a coefficient of your model should be zero.
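In practice you would fit the ARIMA model with a library such as statsmodels and read each coefficient’s standard error and p-value from the fitted model’s summary. The pure-Python sketch below shows the underlying idea on the simplest case, an AR(1) model fit by ordinary least squares, with a t-test of H0: phi = 0 that uses 1.96 as an approximate 5% two-sided threshold. The helper names and the simulated data are hypothetical.

```python
import random
from math import sqrt

def fit_ar1(series):
    """Fit y_t = c + phi * y_{t-1} + e_t by ordinary least squares.
    Returns (c, phi, standard error of phi, t-statistic for H0: phi = 0)."""
    x = series[:-1]   # lagged values y_{t-1}
    y = series[1:]    # current values y_t
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    residuals = [yi - (c + phi * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)   # two estimated parameters
    se_phi = sqrt(sigma2 / sxx)
    return c, phi, se_phi, phi / se_phi

def simulate_ar1(phi, n, seed=0):
    """Simulate a zero-mean AR(1) process with standard normal noise."""
    rng = random.Random(seed)
    values = [0.0]
    for _ in range(n - 1):
        values.append(phi * values[-1] + rng.gauss(0.0, 1.0))
    return values

for true_phi in (0.6, 0.0):
    _, phi_hat, se, t = fit_ar1(simulate_ar1(true_phi, 500))
    verdict = "keep phi" if abs(t) > 1.96 else "phi may be zero"
    print(f"true phi={true_phi}: phi_hat={phi_hat:.3f}, se={se:.3f}, t={t:.2f} -> {verdict}")
```

Fitting AR terms by OLS is only justified asymptotically for a stationary series (|phi| < 1), which is one reason libraries estimate ARIMA models by maximum likelihood instead.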

Example 2: If the team is building a recommender system, you might want to read about the main types of recommender systems, such as collaborative filtering and content-based recommendation. You may also want to learn about evaluation metrics for recommender systems (Shani and Gunawardana, 2017).
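As one example of such metrics, precision@k and recall@k are common offline measures for top-k recommendation lists. The sketch below computes both for a single user; the item IDs are hypothetical, and a real evaluation would average the scores over many users on held-out data.

```python
def precision_recall_at_k(recommended, relevant, k):
    """precision@k: share of the top-k recommendations that are relevant.
    recall@k: share of all relevant items that appear in the top k."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranking for one user and the items they actually engaged with later.
recommended = ["movie_a", "movie_b", "movie_c", "movie_d", "movie_e"]
relevant = {"movie_b", "movie_e", "movie_z"}
print(precision_recall_at_k(recommended, relevant, k=3))
print(precision_recall_at_k(recommended, relevant, k=5))
```

Increasing k typically raises recall and can lower precision, so it helps to tie the choice of k to how many items the product actually shows.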

Listen to the hints given by your interviewer.

Example: The interviewer gives you a spreadsheet in which one of the columns has more than 20% missing values and asks what you would do about it. You say that you’d discard incomplete records. Your interviewer follows up with “Does the dataset size matter?” In this scenario, the interviewer expects you to request more information about the dataset and adapt your answer. For instance, if the dataset is small, you might want to replace the missing values with a good estimate (such as the mean of the variable).
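The two options in this exchange can be sketched in a few lines of plain Python. The rows, the `income` column, and the helper names are hypothetical. Dropping records shrinks an already small dataset, while mean imputation keeps every row at the cost of understating the variable’s variance.

```python
from statistics import mean

# Hypothetical spreadsheet rows; None marks a missing "income" value.
rows = [
    {"age": 25, "income": 40_000},
    {"age": 32, "income": None},
    {"age": 47, "income": 65_000},
    {"age": 51, "income": None},
    {"age": 38, "income": 52_000},
]

def drop_incomplete(rows, column):
    """Option 1: discard every record with a missing value in `column`."""
    return [r for r in rows if r[column] is not None]

def mean_impute(rows, column):
    """Option 2: replace missing values with the mean of the observed values.
    Returns new dicts so the original rows are left untouched."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

print(len(drop_incomplete(rows, "income")), "rows kept after dropping")
print([r["income"] for r in mean_impute(rows, "income")])
```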

Show your motivation.

In data science case study interviews, the interviewer will evaluate your excitement for the company’s product. Make sure to show your curiosity, creativity and enthusiasm.

When you are not sure of your answer, be honest and say so.

Interviewers value honesty and penalize bluffing far more than lack of knowledge.

When out of ideas or stuck, think out loud rather than staying silent.

Talking through your thought process will help the interviewer correct you and point you in the right direction.

IV Resources

You can build decision-making skills by reading data science war stories and exposing yourself to real projects. Here’s a list of useful resources to prepare for the data science case study interview.

In Your Client Engagement Program Isn’t Doing What You Think It Is, Stitch Fix scientists (Glynn and Prabhakar) argue that “optimal” client engagement tactics change over time and companies must be fluid and adaptable to accommodate ever-changing client needs and business strategies. They present a contextual bandit framework to personalize an engagement strategy for each individual client.
For many prospective Airbnb guests, planning a trip starts at a search engine. Search Engine Optimization (SEO) helps make Airbnb painless to find for past guests and easy to discover for new ones. In Experimentation & Measurement for Search Engine Optimization, Airbnb data scientist De Luna explains how you can measure the effectiveness of product changes in terms of search engine rankings.
Coordinating ad campaigns to acquire new users at scale is time-consuming, leading Lyft’s growth team to take on the challenge of automation. In Building Lyft’s Marketing Automation Platform, Sampat shares how Lyft uses algorithms to make thousands of marketing decisions each day such as choosing bids, budgets, creatives, incentives, and audiences; running tests; and more.
In this Flower Species Identification Case Study, Olson goes over a basic Python data analysis pipeline from start to finish to illustrate what a typical data science workflow looks like.
Before producing a movie, producers and executives are tasked with critical decisions such as: do we shoot in Georgia or in Gibraltar? Do we keep a 10-hour workday or a 12-hour workday? In Data Science and the Art of Producing Entertainment at Netflix, Netflix scientists and engineers (Kumar et al.) show how data science can help answer these questions and transform a century-old industry.

Data scientists carry out data engineering, modeling, and business analysis tasks. They demonstrate solid scientific foundations as well as business acumen (see Figure above). Communication skills are usually required, but the level depends on the team.

Data analysts carry out data engineering and business analysis tasks as shown in the figure above. Their skills complement those of people who train models, deploy them, and build software infrastructure. They demonstrate solid analytical skills as well as business acumen. They are accomplished in query languages such as SQL and commonly use spreadsheet software tools. However, they don’t need algorithmic coding skills. Communication skills are usually required, but the level depends on the team.

Machine learning engineers carry out data engineering, modeling, and deployment tasks. They demonstrate solid scientific and engineering skills (see Figure above). Communication skill requirements vary among teams.

Machine learning researchers carry out data engineering and modeling tasks. They demonstrate outstanding scientific skills (see Figure above). Communication skill requirements vary among teams.

The AI project development life cycle involves five distinct tasks: data engineering, modeling, deployment, business analysis, and AI infrastructure.

Data science case study interview