
AI organizations divide their work into data engineering, modeling, deployment, business analysis, and AI infrastructure. Carrying out these tasks requires a combination of technical, behavioral, and decision-making skills. The data science case study interview focuses on technical and decision-making skills, and you’ll encounter it during an onsite round for a Data Scientist (DS), Data Analyst (DA), Machine Learning Engineer (MLE), or Machine Learning Researcher (MLR) role. You can learn more about these roles in our AI Career Pathways report and about other types of interviews in The Skills Boost.

I   What to expect in the data science case study interview

The interviewer is evaluating your approach to a real-world data science problem. The interview revolves around a technical question which can be open-ended. There is no exact solution to the question; it’s your thought process that the interviewer is evaluating. Here’s a list of interview questions you might be asked:

II   Recommended framework

All interviews are different, but the ASPER framework is applicable to a variety of case studies:

  1. Ask. Ask questions to uncover details that the interviewer has not volunteered. Specifically, you want to answer the following questions: “What are the product requirements and evaluation metrics?”, “What data do I have access to?”, “How much time and computational resources do I have to run experiments?”.
  2. Suppose. Make justified assumptions to simplify the problem. Examples of assumptions are: “we are in a small-data regime”, “events are independent”, “the statistical significance level is 5%”, “the data distribution won’t change over time”, “we have three weeks”, etc.
  3. Plan. Break down the problem into tasks. A common task sequence in the data science case study interview is: (i) data engineering, (ii) modeling, and (iii) business analysis.
  4. Execute. Announce your plan, and tackle the tasks one by one. In this step, the interviewer might ask you to write code or explain the maths behind your proposed method.
  5. Recap. At the end of the interview, summarize your answer and mention the tools and frameworks you would use to perform the work. It is also a good time to express your ideas on how the problem can be extended.

III   Interview tips

Every interview is an opportunity to show your skills and motivation for the role. Thus, it is important to prepare in advance. Here are useful rules of thumb to follow:

Articulate your thoughts in a compelling narrative.

Data scientists often need to convert data into actionable business insights, create presentations, and convince business leaders. Thus, their communication skills are evaluated in interviews and can be the reason for a rejection. Your interviewer will judge the clarity of your thought process, your scientific rigor, and how comfortable you are using technical vocabulary.

Example 1: Your interviewer will notice if you say “correlation matrix” when you actually meant “covariance matrix”.

Example 2: Mispronouncing a widely used technical word or acronym such as Poisson, ICA, or AUC can affect your credibility. For instance, ICA is pronounced aɪ-siː-eɪ (i.e., “I see A”) rather than “Ika”.

Example 3: Show your ability to strategize by drawing the AI project development life cycle on the whiteboard.
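To make the distinction in Example 1 concrete, here is a minimal sketch that computes both matrices on the same data. The function names and the height and weight figures are made up for illustration. The point it tries to show is that covariance depends on the units of measurement, while correlation is rescaled to lie between -1 and 1.

```python
import math


def covariance_matrix(columns):
    """Sample covariance matrix (n - 1 denominator) for equal-length columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    size = len(columns)
    return [
        [
            sum(
                (a - means[i]) * (b - means[j])
                for a, b in zip(columns[i], columns[j])
            ) / (n - 1)
            for j in range(size)
        ]
        for i in range(size)
    ]


def correlation_matrix(columns):
    """Covariance matrix rescaled by the standard deviations of each column."""
    cov = covariance_matrix(columns)
    std = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (std[i] * std[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]


# Illustrative data only: the same heights expressed in two different units.
height_cm = [150.0, 160.0, 170.0, 180.0]
height_m = [h / 100 for h in height_cm]
weight_kg = [50.0, 58.0, 69.0, 77.0]

cov_cm = covariance_matrix([height_cm, weight_kg])
cov_m = covariance_matrix([height_m, weight_kg])
corr_cm = correlation_matrix([height_cm, weight_kg])
corr_m = correlation_matrix([height_m, weight_kg])

print("covariance (cm, kg):", cov_cm[0][1])
print("covariance (m, kg): ", cov_m[0][1])
print("correlation (cm, kg):", corr_cm[0][1])
print("correlation (m, kg): ", corr_m[0][1])
```

Switching from centimeters to meters divides the covariance by 100 but leaves the correlation unchanged, which is why mixing up the two terms changes the meaning of your statement.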

Tie your task to the business logic.

Example 1: If you are asked to improve Instagram’s news feed, identify the goal of the product. Is it to have users spend more time on the app, click on more ads, or interact more with each other?

Example 2: You present graphs to show the number of salespeople needed in a retail store at a given time. It is a good idea to also discuss the savings your insight can lead to.

Alternatively, your interviewer might give you the business goal, such as improving retention or engagement, or reducing employee churn, but expect you to come up with a metric to optimize.

Example: If the goal is to improve user engagement, you might use daily active users as a proxy and track it through users’ clicks (on shares, likes, etc.).
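As a rough illustration of turning that goal into a number, the sketch below counts daily active users from an event log. The log layout, the event names, and the choice of which events count as "active" are all assumptions made for this example. In an interview you would state and justify your own definition.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log: (user_id, day, event_type).
events = [
    ("u1", date(2024, 1, 1), "like"),
    ("u2", date(2024, 1, 1), "click"),
    ("u1", date(2024, 1, 1), "share"),      # same user, same day: counted once
    ("u3", date(2024, 1, 1), "page_view"),  # not an engagement event here
    ("u1", date(2024, 1, 2), "click"),
]


def daily_active_users(events, engagement_events=("click", "like", "share")):
    """Count distinct users with at least one engagement event per day."""
    active = defaultdict(set)
    for user_id, day, event_type in events:
        if event_type in engagement_events:
            active[day].add(user_id)
    return {day: len(users) for day, users in sorted(active.items())}


dau = daily_active_users(events)
for day, count in dau.items():
    print(day.isoformat(), count)
```

Changing `engagement_events` changes the metric, so it is worth saying out loud which actions you treat as engagement and why.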

Brush up on your data science foundations before the interview.

You will need to apply concepts from probability and statistics, such as correlation vs. causation or statistical significance. You should also be able to read a statistical table, such as a table of critical values for a test.

Example: You’re a professor currently evaluating students with a final exam, but considering switching to a project-based evaluation. A rumor says that the majority of your students are opposed to the switch. Before making the switch, what would you like to test? In this question, you should introduce notation to state your hypothesis and leverage tools such as confidence intervals, p-values, distributions, and tables. Your interviewer might then give you more information. For instance, you have polled a random sample of 300 students in your class and observed that 60% of them were against the switch.
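One possible way to work through the numbers in this example is a one-sided test on a proportion. The sketch below assumes the following, none of which is given in the question: p is the true proportion of students opposed to the switch, the null hypothesis is H0: p = 0.5 against the alternative H1: p > 0.5, the significance level is 5%, the normal approximation to the binomial is acceptable for n = 300, and the class is large enough to ignore any finite-population correction. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def one_sided_proportion_test(successes, n, p0=0.5, alpha=0.05):
    """z-test of H0: p = p0 against H1: p > p0 (normal approximation).

    Returns the z statistic, the one-sided p-value, and a two-sided
    (1 - alpha) confidence interval for p.
    """
    p_hat = successes / n
    # Under H0 the standard error uses p0, not the observed proportion.
    se_null = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se_null
    p_value = 1 - NormalDist().cdf(z)

    # The confidence interval uses the observed proportion.
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)
    return z, p_value, ci


# 60% of 300 polled students were against the switch.
z, p_value, ci = one_sided_proportion_test(successes=180, n=300)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.5f}")
print(f"95% confidence interval for p: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Under these assumptions the z statistic is about 3.46, and the 95% confidence interval lies entirely above 0.5. The sample is therefore consistent with the rumor that a majority opposes the switch. In the interview, you would still discuss how the sample was drawn and whether the assumptions hold.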

Avoid clear-cut statements.

Because case studies are often open-ended and can have multiple valid solutions, avoid making categorical statements such as “the correct approach is …”. You might offend the interviewer if the approach they are using is different from what you describe. It’s also better to show your flexibility and your understanding of the pros and cons of different approaches.

Study topics relevant to the company.

Data science case studies are often inspired by in-house projects. If the team is working on a domain-specific application, explore the literature.

Example 1: If the team is working on time series forecasting, you can expect questions about ARIMA, and follow-ups on how to test whether a coefficient of your model should be zero.

Example 2: If the team is building a recommender system, you might want to read about the types of recommender systems such as collaborative filtering or content-based recommendation. You may also learn about evaluation metrics for recommender systems (Shani and Gunawardana, 2017).
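As one illustration of an evaluation metric for the recommender system in Example 2, the sketch below computes precision and recall over the top k recommendations. These are common choices, but whether they suit a given product is an assumption. The item identifiers and function names are invented for the example.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k recommendations."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant)


# Hypothetical ranked recommendations (no duplicates) and the items
# the user actually engaged with.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"b", "d", "f"}

for k in (3, 5):
    p = precision_at_k(recommended, relevant, k)
    r = recall_at_k(recommended, relevant, k)
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
```

Raising k tends to increase recall while often lowering precision, which is a trade-off worth mentioning when you propose a metric.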

Listen to the hints given by your interviewer.

Example: The interviewer gives you a spreadsheet in which one of the columns has more than 20% missing values, and asks you what you would do about it. You say that you’d discard incomplete records. Your interviewer follows up with “Does the dataset size matter?”. In this scenario, the interviewer expects you to request more information about the dataset and adapt your answer. For instance, if the dataset is small, you might want to replace the missing values with a good estimate (such as the mean of the variable).
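The two options in this example can be compared side by side. The sketch below uses a made-up five-row table with a hypothetical `income` column. It shows how discarding incomplete records shrinks a small dataset, while mean imputation keeps every row. Mean imputation is only one possible estimate, and it assumes the values are missing more or less at random.

```python
from statistics import mean

# Illustrative data only: None marks a missing value.
rows = [
    {"age": 30, "income": 40.0},
    {"age": 41, "income": None},
    {"age": 25, "income": 50.0},
    {"age": 37, "income": 60.0},
    {"age": 52, "income": None},
]


def drop_incomplete(rows, column):
    """Keep only the rows where `column` is present."""
    return [row for row in rows if row[column] is not None]


def impute_mean(rows, column):
    """Return copies of the rows with missing `column` values set to the observed mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


dropped = drop_incomplete(rows, "income")
imputed = impute_mean(rows, "income")

print("rows after dropping:", len(dropped))
print("rows after imputing:", len(imputed))
print("imputed value:", imputed[1]["income"])
```

With a large dataset, losing the incomplete rows may be acceptable. With a small one, it can remove a large share of the data, which is the point behind the interviewer's hint.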

Show your motivation.

In data science case study interviews, the interviewer will evaluate your excitement for the company’s product. Make sure to show your curiosity, creativity and enthusiasm.

When you are not sure of your answer, be honest and say so.

Interviewers value honesty and penalize bluffing far more than lack of knowledge.

When out of ideas or stuck, think out loud rather than staying silent.

Talking through your thought process will help the interviewer correct you and point you in the right direction.

IV   Resources

You can build decision-making skills by reading data science war stories and exposing yourself to projects. Here’s a list of useful resources to prepare for the data science case study interview.

Data scientists carry out data engineering, modeling, and business analysis tasks. They demonstrate solid scientific foundations as well as business acumen (see Figure above). Communication skills are usually required, but the level depends on the team.

Data analysts carry out data engineering and business analysis tasks as shown in the figure above. Their skills complement those of people who train models, deploy them, and build software infrastructure. They demonstrate solid analytical skills as well as business acumen. They are accomplished in query languages such as SQL and commonly use spreadsheet software tools. However, they don’t need algorithmic coding skills. Communication skills are usually required, but the level depends on the team.

Machine learning engineers carry out data engineering, modeling, and deployment tasks. They demonstrate solid scientific and engineering skills (see Figure above). Communication skills requirements vary among teams.

Machine learning researchers carry out data engineering and modeling tasks. They demonstrate outstanding scientific skills (see Figure above). Communication skills requirements vary among teams.

The AI project development life cycle involves five distinct tasks: data engineering, modeling, deployment, business analysis, and AI infrastructure.

Author(s)

  1. Kian Katanforoosh - Founder at Workera, Lecturer at Stanford University - Department of Computer Science, Founding member at deeplearning.ai

Acknowledgment(s)

  1. The layout for this article was originally designed and implemented by Jingru Guo, Daniel Kunin, and Kian Katanforoosh for the deeplearning.ai AI Notes, and inspired by Distill.

Footnote(s)

  1. Job applicants are subject to anywhere from 3 to 8 interviews depending on the company, team, and role. You can learn more about the types of AI interviews in The Skills Boost. This includes the machine learning algorithms interview, the deep learning algorithms interview, the machine learning case study interview, the deep learning case study interview, the data science case study interview, and more coming soon.
  2. It takes time and effort to acquire acumen in a particular domain. You can develop your acumen by regularly reading research papers, articles, and tutorials. Twitter, Medium, and websites of data science and machine learning conferences (e.g., KDD, NeurIPS, ICML, and the like) are good places to read the latest releases. You can also find a list of hundreds of Stanford students' projects on the Stanford CS230 website.