The AI project development lifecycle involves five distinct tasks: data engineering, modeling, deployment, business analysis, and AI infrastructure. No single individual has enough skills (or time) to carry out all of them, so teams include individuals who each focus on part of the cycle. Here is a visual representation of six technical roles and how they relate to these tasks.
I What tasks does a data scientist carry out?
Data scientists carry out data engineering, modeling, and business analysis tasks as shown in Figure 1. This includes:
- data engineering subtasks such as defining data requirements, collecting, labeling, inspecting, cleaning, augmenting, and moving data.
- modeling subtasks such as training machine learning models, fitting probabilistic or statistical models, defining evaluation metrics, searching hyperparameters, and reading research papers.
- business analysis subtasks such as building data visualizations and business intelligence dashboards, presenting technical work to clients or colleagues, translating statistics into actionable business insights, running A/B tests, and analyzing datasets.
Their skills complement those of people who deploy models and build software infrastructure.
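As an illustration of the "running A/B tests" subtask listed above, here is a minimal sketch of a two-proportion z-test on conversion counts. The function name `two_proportion_z_test` and the sample numbers are assumptions made for this example; they do not come from the research behind this article, and a real analysis would also consider sample-size planning and multiple comparisons.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of variants A and B.

    Returns (z, p_value), where p_value is two-sided and relies on the
    normal approximation, which is only reasonable for large samples.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each variant needs at least one observation")

    rate_a = conv_a / n_a
    rate_b = conv_b / n_b

    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("pooled rate is 0 or 1, so the test is undefined")

    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical experiment: 1,000 users per variant.
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in conversion rate is unlikely to be due to chance alone. Translating that result into a business recommendation is the part of the task that draws on the communication skills discussed in the next section.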
II What skills does a data scientist need?
Data scientists demonstrate solid scientific foundations as well as business acumen (see Figure 2). Communication skills are usually required, because data scientists often interface with product managers, clients, or business leaders to provide insights for decision making. They understand business and product metrics such as conversions, click-through rates, and customer lifetime value.
They mostly write prototype code, as opposed to the production code written by engineers, and discard most of what they write.
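To make the business and product metrics mentioned above concrete, the following sketch computes a click-through rate, a conversion rate, and a deliberately simplified customer lifetime value. The `CampaignStats` class, the function names, and the lifetime value formula (average order value × orders per year × years retained) are assumptions chosen for illustration; companies define these metrics in different ways.

```python
from dataclasses import dataclass


@dataclass
class CampaignStats:
    """Hypothetical aggregate counts for one marketing campaign."""
    impressions: int
    clicks: int
    conversions: int


def click_through_rate(stats: CampaignStats) -> float:
    """Fraction of impressions that led to a click."""
    return stats.clicks / stats.impressions if stats.impressions else 0.0


def conversion_rate(stats: CampaignStats) -> float:
    """Fraction of clicks that led to a conversion (one common definition)."""
    return stats.conversions / stats.clicks if stats.clicks else 0.0


def simple_lifetime_value(avg_order_value: float,
                          orders_per_year: float,
                          years_retained: float) -> float:
    """Simplified lifetime value that ignores margin, churn, and discounting."""
    return avg_order_value * orders_per_year * years_retained


if __name__ == "__main__":
    stats = CampaignStats(impressions=10_000, clicks=250, conversions=20)
    print(f"CTR: {click_through_rate(stats):.2%}")
    print(f"Conversion rate: {conversion_rate(stats):.2%}")
    print(f"Lifetime value: {simple_lifetime_value(40.0, 3, 2):.2f}")
```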
If you’re interested in comparing your skills to other data scientists, we recommend taking the standardized machine learning, data science, mathematics, and algorithmic coding tests on Workera. If you’re a company hiring data scientists, you can administer computerized tests to AI job applicants for free using Workera Test and connect with AI practitioners using Workera Connect.
III What tools does a data scientist use?
Data scientists in different companies use different tools, but some tools stand out. The following tools, grouped by task, are the ones most frequently identified in our research.
- Modeling is primarily done in Python using packages such as numpy, scikit-learn, pandas, matplotlib, TensorFlow, and PyTorch.
- Data engineering happens in Python and/or SQL or other domain-specific query languages.
- Business analysis is performed in Python or R, in domain-specific tools such as Tableau or Excel, or in presentation software such as PowerPoint or Keynote.
- Collaboration and workflow are managed with a version control system such as Git, Subversion, or Mercurial, along with a command line interface (CLI) such as a Unix shell and an integrated development environment (IDE) or editor such as Jupyter Notebook or Sublime Text.
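As a small illustration of how Python and SQL combine in data engineering work, the sketch below loads a few records into an in-memory SQLite database, filters out unusable rows, normalizes a text column, and aggregates the result. The `purchases` table, its columns, and the cleaning rules are hypothetical and chosen only for this example.

```python
import sqlite3


def summarize_purchases(rows):
    """Clean and aggregate (user_id, country, amount) tuples.

    Rows with a missing or negative amount are dropped, country codes are
    trimmed and upper-cased, and the result is one
    (country, purchase_count, average_amount) tuple per country.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE purchases (user_id INTEGER, country TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT UPPER(TRIM(country)) AS country_code,
                   COUNT(*)             AS purchase_count,
                   AVG(amount)          AS average_amount
            FROM purchases
            WHERE amount IS NOT NULL AND amount >= 0
            GROUP BY UPPER(TRIM(country))
            ORDER BY 1
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    raw_rows = [
        (1, "us", 10.0),
        (2, " US ", 30.0),
        (3, "fr", None),   # missing amount, dropped
        (4, "FR", 5.0),
        (5, "fr", -1.0),   # negative amount, dropped
    ]
    for country, count, average in summarize_purchases(raw_rows):
        print(country, count, average)
```

Keeping the filtering and aggregation in SQL and the orchestration in Python mirrors the split described in the list above, although the right balance depends on where the data lives and how large it is.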
IV In what team structure does a data scientist fit?
Building an AI team requires bringing together complementary individuals who can progressively carry out the tasks of the AI project development lifecycle. AI teams focus on data engineering and modeling from the beginning, because they need to validate the feasibility of an AI project or idea. As the project becomes more mature, the team starts focusing on deployment, business analysis, and AI infrastructure.
Data scientists combine well with software engineers and with software engineers who specialize in machine learning. Data scientists prototype solutions to prove a concept, while engineers make the project available to users.
Conclusion
This article aims to clarify what a data scientist is, what tasks they carry out, and what skills they need. If you’re an AI practitioner, we hope it helps you choose a career track.
Companies may refer to this role as data scientist, data analyst, machine learning engineer, research scientist, statistician, quantitative analyst, full-stack data scientist, and other titles. If you’re a hiring manager, we hope that it helps you define your job requirements.
AI organizations are constantly evolving, so this article is a work in progress. We intend to revise it as our team learns more about new roles.