Managing Data Science Teams

Being a manager anywhere is hard, yet overseeing a data science team can be especially challenging. Roles are still in flux, turnover is high, and companies are ironing out the best ways for teams to function. And being a technical whiz doesn’t necessarily prepare you to manage others.

Whether you’re managing a data science team today, preparing to launch one, or hoping to do so in the future, this field guide will make you a better data science manager in the enterprise.

In this field guide:

How to Hire and Retain Data Scientists
Habits of Successful Data Science Managers
Knowledge Management
Collaboration
Scaling Data Science Teams

How to Hire and Retain Data Scientists

In a panel discussion at the recent Rev summit for data science leaders, three experts shared their tips for hiring, retaining, and nurturing data science talent.

Their tips are summarized below.

Tips from Data Science Managers

Recruit leaders first

If you start with a junior hire or someone fresh from academia, they’re likely to feel lost and frustrated without mentorship. Michelangelo D’Agostino, senior director of data science at ShopRunner, suggested installing a more experienced person first to give the team direction.

Choose managers carefully

Don’t just focus on technical talent and experience. The panelists agreed that humility, curiosity and an ability to listen and take feedback are crucial traits for a senior role. “Someone who’s going to be in charge has to know they don’t have all the good ideas or all the answers,” D’Agostino said. He suggested asking prospective hires to describe a situation in which they failed and how they would avoid repeating it to gauge capacity for self-reflection.

Rethink data challenges

Given the competitive hiring landscape, onerous take-home tests can screen out qualified candidates and create a tense, exam-like atmosphere. You don’t need these challenges to make strong hires, said Patrick Phelps, lead data scientist at Insight Data Science. “It’s really hard to scale…[and] it takes a huge amount of time to grade,” he said. “I’d rather just put a good data scientist on my team in a room with them for an hour.” If you do include a challenge, D’Agostino suggests having candidates complete a coding exercise in the office and talk through it as in an informal code review.

Data Scientist Hiring and Onboarding Plan

This hiring and onboarding plan template walks through key questions to help you find and train new data scientists. It covers attracting top talent, the hiring process, onboarding, retention, and more.

By taking a systematic approach, data science leaders will maximize the odds of finding and cultivating a team that is greater than the sum of its constituent parts.


Attracting Top Talent

What’s your differentiated value proposition for candidate data scientists? List three things that make the opportunity unique, that you think will resonate with your target candidate pool. Test your pitch on your group. Get feedback.
What are 1-3 risks that might make the opportunity less appealing than competing opportunities? How can you mitigate or get ahead of them?

Hiring Process

What are the three most important attributes for your candidates?
What is your assessment plan for each?

Onboarding

What activities and outcomes need to have been achieved in the first 30, 60, and 90 days?
What are the most important pieces of “tribal knowledge” your new hire needs to know, and how will she learn them? Examples include data sources, project methodologies, stakeholder dynamics, notable wins and losses, etc.

Retention and Management

What skills do you hope this candidate develops over the first year?
What metrics will determine success of this candidate after a year? Examples include certain business metrics, community contributions, number of insights produced, or project iteration velocity.

Retain Your Talent

Don’t oversell the role. Half of data scientists stay at their jobs for two years or less. To reduce turnover, be truthful about the position you’re hiring for, advised Conor Jensen, customer success manager at Domino. “Be very realistic upfront about what the role is, what the pain is going to be, where you think the impact is going to be, and what the timeline looks like,” he said. “A lot of times we get very excited about what we’re going to accomplish as data scientists, and we can get a little ahead of ourselves.”

Understand team members’ motivations. Jensen recommended taking time to discover each employee’s goals, interests and personal incentives. Then you can pair them with rewarding projects and recognize accomplishments in a meaningful way.

Offer support. “Data science can be a discipline of failures: Models fail, processes fail, data sources turn out to be terrible,” Phelps said. He suggested offering positive reinforcement and reminding team members that it can take years to see an impact. Jensen also suggested breaking problems into manageable chunks so employees aren’t intimidated by an overwhelming project.

Create learning opportunities. Data scientists often leave their jobs because they’re bored, observed D’Agostino. If core projects aren’t cutting-edge, he suggested creating opportunities for team members to learn new things, such as a weekly lunch to discuss the latest research or occasional hackathons to test a new software framework or computational technique.

Habits of Successful Data Science Managers

The following are the seven habits we have observed in many successful data science managers, in no particular order.

Build bridges to other stakeholders. Avoid friction and crossed wires by opening communication channels with other teams. Consider putting a data scientist and product manager in a room for an hour before each new project to ensure they’re on the same page. Making data scientists attend meetings without their laptops can force them to communicate with other stakeholders. Giving data scientists opportunities to explain their work to engineers, product managers, and others can also improve communication.
Track performance. Use a template to keep track of what you discussed, the objectives you set, and the feedback you gave during one-on-one meetings with your reports. Relying on memory won’t work.
Aim to take projects to production. Preparing teams to deploy their own API services and to productionalize code helps you move faster, and you don’t get blocked on engineering resources that might not be available.
Start an on-call rotation. As teams get bigger, set up a weekly rotation of data scientists on call to fix models that break. That encourages better documentation and gives those not on call time to focus on core projects.
Ask the dumb questions. Seemingly simple questions can open the door to identifying and solving fundamental problems.
Always be learning. Read prolifically to keep up with developments in this quickly evolving field. Consume not only technical material, but also insights about management and organizational psychology.
Get out of the way, but not forever. If you’re a new manager, consider stepping away from coding for three to six months. Otherwise, you risk never truly embracing the manager role, and might under-serve the team. After that, feel free to tackle non-critical projects or those nobody else wants to do.
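
The weekly on-call rotation mentioned in the list above can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple round-robin order with one person per week; the function name weekly_on_call, the team names, and the start date are hypothetical and not taken from any particular team's practice.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def weekly_on_call(team, start, weeks):
    """Return (week_start, person) pairs, one person per week, round-robin."""
    if not team:
        raise ValueError("team must contain at least one person")
    people = islice(cycle(team), weeks)
    return [(start + timedelta(weeks=i), person) for i, person in enumerate(people)]


# Hypothetical team names and start date, for illustration only.
schedule = weekly_on_call(["Ana", "Ben", "Chen"], date(2024, 1, 1), 5)
for week_start, person in schedule:
    print(week_start.isoformat(), person)
```

A real rotation would also need to handle vacations and swaps; the point here is only that a published, predictable schedule is cheap to generate.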

Knowledge Management

Data scientists often greet the topic of knowledge management with a sense of dread. Some see it as a time-sucking distraction from their “real” jobs; others don’t fully grasp what it means. Even many who see the concept’s value find the process painful.

But knowledge management capabilities will become a key source of competitive advantage for companies, according to Matthew Granade, chief market intelligence officer at Point72, and Mac Steele, director of product at Domino Data Lab. In the video below, the pair laid out why knowledge management matters and how businesses should make it a priority.

The key points about knowledge management for data science teams are below.

What is knowledge management?

The goal of knowledge management is to capture insight, which can be defined as “better understanding.” Insight is thus relative—it’s about constantly improving upon previous ideas. From Einstein to Freud, insight is often seen as the purview of the “lone genius.” In reality, most insight comes from collaborating with others and expanding on existing ideas.

Creating that kind of “compounding machine” requires a way to capture knowledge, a framework for users to follow and mechanisms to improve through feedback. Increasingly, companies’ futures will be determined by how well they do this. With more algorithms and infrastructure widely available, the pool of data science talent growing and requirements to share data expanding, the ability to capture and augment unique insights will become a key differentiator.

Why is knowledge management difficult?

Some knowledge management challenges plague every industry:

Organizing knowledge in advance is difficult. Classifications are often too rigid, since you don’t know what will matter in the future.
There are few incentives to participate. As one data scientist said, “I get paid for what I build this year, not maintaining what I built last year.”
It’s a classic collective action problem. No one wants to be the first to spend time on documentation. And even when knowledge is captured, it can be hard to know how to act on it.
Systems always lag behind reality. If knowledge management takes extra time and is done in a different system from the core work, its quality will suffer.

Other obstacles are unique to data science teams:

People use different tools. Knowledge management is tougher when some team members work in R and others in Python, and when some store code on GitHub and others in email. Training people to use the same systems is difficult because of high turnover.
The components of a single project are scattered. Artifacts and insights may be spread across a Docker store, a wiki, a PowerPoint presentation, etc.
Having code doesn’t mean you can rerun it. A meta-analysis of 600 computational research papers found that only 20 percent of the code could be rerun; of that share, many second attempts yielded slightly different results.

How to improve knowledge management

There are four steps that can help data science leaders improve knowledge management in their enterprise organizations:

1. Capture as much knowledge as possible in one place

The more things are in there, the more connections you have across them, and the value grows that way. You don’t want people operating on the fringes. A common platform that encompasses both the core work and knowledge management is key to ensuring that knowledge management gets done and to minimizing the burden. If you can’t capture everything, start with the most valuable model or knowledge, and build a system around that.

Test: Ask five data scientists in your company, separately, “How many projects do you think this team is doing right now?” They’ll probably have different answers.

2. Choose a knowledge management system that allows for discovery, provenance, reuse, and modularity

Discovery: Data scientists spend much of their time searching for information, cutting into productivity. Teams have to decide whether to curate knowledge (the Yahoo approach) or index it (the Google approach). Curation makes sense when the domain is relatively stable. Indexing and searching is best when the domain is fluid, and you can’t possibly know beforehand what the taxonomy should look like.

Test: Ask a new hire to work on a topic, and time how long it takes them to collect the right artifacts. If it’s weeks or months, that’s a red flag.
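
To make the indexing approach concrete, here is a minimal sketch of a keyword index over project artifacts. It assumes each artifact can be reduced to a name and some text; the functions build_index and search and the sample artifacts are hypothetical, and a real team would more likely rely on an existing search tool than on hand-written code like this.

```python
import re
from collections import defaultdict


def build_index(artifacts):
    """Map each lowercase word to the set of artifact names that contain it."""
    index = defaultdict(set)
    for name, text in artifacts.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the artifacts that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


# Hypothetical artifacts, for illustration only.
artifacts = {
    "churn_model_notes.md": "Gradient boosting churn model, trained on 2023 subscription data",
    "pricing_deck.pptx": "Pricing experiment results and churn impact",
    "etl_readme.txt": "Subscription data pipeline and refresh schedule",
}
index = build_index(artifacts)
print(sorted(search(index, "churn")))
print(sorted(search(index, "subscription data")))
```

No taxonomy is decided in advance: the index is derived from whatever the artifacts contain, which is why this approach suits a domain that keeps changing.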

Provenance: Let people focus on the aspects of knowledge management that matter. Use a platform that lets people synthesize their work instead of making them track details such as which software version they used.

Test: Write down beforehand what percentage of time you think your team members should spend on documentation. Then ask a few how long they actually spend. This could be eye-opening.

Reuse: If it won’t run, it won’t get reused. That requires access to not only code, but also historical versions of datasets.

Test: Ask a new hire to reproduce the work that another data scientist did six months ago, preferably one who has left the team or organization. Ask him or her to update it with the most recent data. If it takes a week or a month, that’s troubling.

Modularity: Ensure that people have the incentives and tools to create building blocks that can be reused and built upon.

Test: Ask two teams that have worked on similar projects to do a post mortem and identify overlapping work.

3. Identify the right unit of knowledge

Compounding systems rely on units of knowledge. In academia, those are books and papers; in software, it’s code. In data science, the model is the right thing to organize around, because it’s the thing data scientists make. The model includes the data, code, parameters and results.
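
One way to picture the model as a unit of knowledge is a single record that ties the four parts together. The sketch below is an assumption about how such a record might look, not a description of any specific platform: the ModelRecord class, its field names, the fingerprint helper, and all sample values are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelRecord:
    name: str
    data_sha256: str   # fingerprint of the exact dataset version used
    code_version: str  # e.g. a commit identifier
    parameters: dict
    results: dict

    def to_json(self):
        """Serialize the record so it can be stored and searched later."""
        return json.dumps(asdict(self), sort_keys=True)


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying one version of a dataset."""
    return hashlib.sha256(data).hexdigest()


# Placeholder values, for illustration only.
raw = b"customer_id,churned\n1,0\n2,1\n"
record = ModelRecord(
    name="churn-baseline",
    data_sha256=fingerprint(raw),
    code_version="abc1234",
    parameters={"max_depth": 3},
    results={"auc": 0.5},  # placeholder, not a real measurement
)
print(record.to_json())
```

Because the dataset fingerprint changes whenever the data changes, a record like this makes it clear which version of the data a given result depends on.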

4. Think beyond technology

Changes at the people and process levels are also important. Reframe how people see their jobs: They should spend less time doing and more time codifying and learning. Make collaboration a priority in hiring and compensation. Finally, while knowledge management should be seen as everyone’s job, some organizations create new roles for curating or facilitating knowledge.

Collaboration

The following three videos provide a range of lessons on fostering collaboration among data scientists and other stakeholders within the enterprise.

Internal Practices for Facilitating Collaboration

Fostering Collaborative Environments

How to Affect Company Change

Scaling Data Science Teams

What does it take to run a sophisticated data science organization? What are some of the things that need to be on your mind as you scale to a repeatable, high-throughput data science machine? The two videos below provide two perspectives.

Lessons from The Climate Corporation

Erik Andrejko, VP of Science at The Climate Corporation, has spent a number of years focused on this problem, building and growing multi-disciplinary data science teams.

In the video below, Erik discusses what it takes to continue building world-class data science teams. He also discusses the practice of data science, the scaling of organizations, and key components and best practices of a data science project.

Lessons from the Fortune 500

Through working with companies ranging from agile startups to the Fortune 500, we have been able to curate use cases and learnings from these organizations about the challenges and successes of growing data science teams.

In this video we share some of those learnings, including goals for data science programs, their challenges, performing a diagnosis, managing projects and systems, and leveraging a data science platform to scale.

Next Steps

This field guide covered the human components of managing data science teams in the enterprise: hiring and onboarding, nurturing teams to success, building the right habits, capturing and managing knowledge, and fostering collaboration. Now, those data scientists need something to do. Learn how to manage data science projects in the enterprise.