Why Data Science teams need generalists, not specialists


In The Wealth of Nations, Adam Smith shows how the division of labor became the chief source of gains in productivity. His example is the assembly line of a pin factory: "One worker draws out the wire, another straightens it, a third cuts it, a fourth points it, a fifth grinds the end to receive the head." Through specialization on narrow functions, each worker becomes a highly skilled expert in his own small task, which makes the process more efficient. Output per worker increases many times over, and the factory becomes far more efficient at producing pins.

This division of labor by function is so ingrained in our thinking, even today, that we have been quick to organize our teams accordingly. Data Science is no exception. Complex algorithmic business capabilities require multiple work functions, so companies typically create teams of specialists: research scientists, data engineers, machine learning engineers, causal inference scientists, and so on. A product manager coordinates the specialists' work through handoffs, in a manner that resembles the pin factory: "one person gets the data, another models it, a third implements it, a fourth measures it," and so on.

Alas, we should not be optimizing our Data Science teams for productivity gains. That is what you do when you already know what you are producing, pins or otherwise, and are merely seeking efficiency. The purpose of an assembly line is execution. We know exactly what we want: pins, in Smith's example, but it could be any product or service whose requirements fully describe every aspect of the product and its behavior. The role of the workers is then to fulfill those requirements as efficiently as possible.

But the goal of Data Science is not to execute tasks. Rather, the goal is to explore and develop strong new business capabilities. Algorithmic products and services such as recommendation systems, customer engagement, classification of style preferences, size matching, clothing design, logistics optimization, seasonal trend detection, and much more cannot be designed up front. They must be learned. There are no blueprints to replicate; these are new capabilities with inherent uncertainty. Coefficients, models, model types, hyperparameters: every necessary element must be discovered through experimentation, trial and error, and repetition. With pins, the learning and design happen before production. With Data Science, you learn as you go, not before.

In a pin factory, where the learning comes first, we neither expect nor want workers to improvise on any aspect of the product, other than to improve production efficiency. Specializing tasks makes sense because it yields process efficiency and production consistency (with no variation in the final product).

But when the product is still evolving and the goal is learning, specialization hinders us in the following ways:

1. It increases coordination costs.

That is, the costs that accrue in the time spent communicating, discussing, justifying, and prioritizing the work to be done. These costs scale super-linearly with the number of people involved. (As J. Richard Hackman taught us, the number of relationships r grows as a function of the number of members n according to r = (n^2 - n)/2, and each relationship carries some coordination cost.) When data scientists are organized by function, specialists from many functions are needed at every stage, for every change, for every handoff, which drives up coordination costs. For example, statistical modelers who want to experiment with new features must coordinate with the data engineers who extend the data sets every time they want to try something new. Likewise, every newly trained model means the modeler needs someone to coordinate with to put it into production. Coordination costs act as a tax on iteration, making it slower and more expensive and making it more likely that the exploration is abandoned altogether. That impedes learning.
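To make that growth concrete, here is a minimal sketch (my own illustration, not from the original article) of how quickly pairwise relationships multiply as a team grows:

```python
def pairwise_relationships(n: int) -> int:
    """Hackman's count of pairwise relationships in a team of n people: r = (n^2 - n) / 2."""
    return (n * n - n) // 2

# Coordination load grows super-linearly: doubling the team size
# roughly quadruples the number of relationships to maintain.
for n in (3, 5, 10, 20):
    print(f"team of {n:2d}: {pairwise_relationships(n):3d} relationships")
# team of  3:   3 relationships
# team of  5:  10 relationships
# team of 10:  45 relationships
# team of 20: 190 relationships
```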

2. It lengthens wait times.

Even more daunting than coordination costs is the time lost between handoffs. While coordination costs are usually measured in hours (the time it takes to hold meetings, discussions, and design reviews), wait times are usually measured in days, weeks, or even months. Functional specialists' schedules are hard to align because each specialist is spread across multiple projects. A one-hour meeting to discuss changes may take weeks to get onto everyone's calendar. And once the changes are agreed upon, the actual work must be scheduled around the many other projects competing for the specialists' time. Work such as code fixes or research that takes only a few hours or days to complete may sit far longer before resources become available. Until then, iteration and learning are on hold.

3. It narrows the context.

The division of labor can artificially limit learning by rewarding people for staying within their specialty. For example, a research scientist who must stay within the scope of her function will focus her energy on experimenting with different types of algorithms: regression, neural networks, random forests, and so on. Of course, good algorithm choices can yield incremental improvements, but there is usually far more to be gained from other activities, such as integrating new data sources. Likewise, she may build a model that wrings every last bit of explanatory power out of the data, when the real gains lie in changing the objective function or relaxing certain constraints. That is hard to see or do when her work is confined to one function. Because the research scientist specializes in optimizing algorithms, she is much less likely to do anything else, even when it would bring significant benefit.

Some of the signs that a data science team is operating like a pin factory show up in plain status updates: "waiting for data pipeline changes" and "waiting for ML Eng resources" are common blockers. However, I believe the more dangerous effect is what you do not notice, because you cannot regret what you do not know about. Flawless execution and the complacency that comes with process efficiency can mask an uncomfortable truth: organizations are unaware of the learning they are missing out on.

The solution, of course, is to get rid of the pin-factory approach. To encourage learning and iteration, data scientist roles should be generalized, with broad responsibilities that are independent of technical function: organize data scientists so that they are optimized for learning. This means hiring "full-stack data scientists," generalists who can perform a variety of functions, from concept to modeling, from implementation to measurement. It is important to note that I am not suggesting that hiring full-stack talent should reduce headcount. Rather, I am suggesting that when people are organized differently, their incentives are better aligned with learning and impact. For example, suppose you have a team of three people with three business capabilities. In a pin factory, each specialist devotes a third of his time to each capability, since no one else can do his job. In a full-stack team, each generalist is fully dedicated to an entire business capability, its growth, and the learning it produces.

With fewer people supporting each capability, coordination shrinks. The generalist moves fluidly between functions, extending the data pipeline to add more data, trying new features in models, deploying new versions to production for causal measurement, and repeating the cycle as quickly as new ideas arrive. Of course, the generalist performs the different functions sequentially rather than in parallel; after all, it is just one person. However, completing a task usually takes only a fraction of the time it would take to gain access to another specialized resource. So iteration time goes down.

Our generalist may not be as skilled as a specialist in any particular function, but we are not seeking functional excellence or small incremental improvements. Rather, we are seeking to learn about and discover ever new business capabilities with real impact. With the holistic context of a complete solution, he sees opportunities that a specialist would miss. He has more ideas and tries more possibilities. He fails more too. However, the cost of failure is low and the payoff from learning is high. This asymmetry favors rapid iteration and rewards learning.

It is important to note that the degree of autonomy and skill diversity afforded to full-stack data scientists depends largely on the robustness of the data platform they work on. A well-designed data platform abstracts data scientists away from the complexities of containerization, distributed processing, automatic failover, and other advanced computing concerns. Beyond abstraction, a robust data platform can provide seamless connectivity to experimentation infrastructure, automated monitoring and alerting, automatic scaling, and visualization and debugging of algorithmic results. These components are designed and built by data platform engineers; that is, work is not handed off from the data scientist to the data platform team. The data scientist owns all the code that runs on top of the platform.
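As a purely hypothetical sketch of what such an abstraction might feel like from the data scientist's side (the class and method names below are invented for illustration and do not refer to any specific platform), the idea is that shipping a model to production becomes a single high-level call:

```python
# Hypothetical illustration only: DataPlatform and its methods are invented
# stand-ins for whatever facade a real data platform exposes.
from dataclasses import dataclass


@dataclass
class Deployment:
    model_name: str
    version: str
    endpoint: str


class DataPlatform:
    """Facade that hides containerization, distributed processing,
    failover, scaling, and monitoring behind a few high-level calls."""

    def deploy(self, model: object, name: str, traffic_split: float = 0.1) -> Deployment:
        # Behind this call, the platform team's code would package the model,
        # provision compute, wire up monitoring and alerting, and route a
        # slice of traffic to the new version for causal (A/B) measurement.
        version = "v1"  # placeholder; a real platform would assign this
        return Deployment(name, version, f"https://models.internal/{name}/{version}")


# From the data scientist's side: they own the model code and the decision
# to ship; the platform owns the machinery underneath.
model = object()  # stand-in for a trained model artifact
deployment = DataPlatform().deploy(model, name="style-preference-classifier")
print(deployment.endpoint)
```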

I, too, was once drawn to the functional division of labor by the promise of process efficiency, but through trial and error (there is no better way to learn) I found that generalist roles do a better job of facilitating learning and innovation, and they deliver the right kind of results: discovering and building far more business capabilities than a specialized approach does. (A more efficient way to learn about this approach to organizing than the trial and error I went through is Amy Edmondson's book Teaming: How Organizations Learn, Innovate, and Compete in the Knowledge Economy.)

There are some important assumptions that may make this approach more or less tenable at some companies. Iteration presumes a low cost of trial and error. If the cost of a mistake is high, you may want to rein iteration in (this approach is not recommended for medical applications or manufacturing). Additionally, if you are dealing with petabytes or exabytes of data, specialization in data engineering may be warranted. Likewise, if keeping business capabilities online and available matters more than improving them, functional excellence may trump learning. Finally, the full-stack model depends on great people who can span the whole stack. They are not unicorns; you can find them or grow them yourself. But they are in high demand, and attracting and retaining them will require competitive compensation, strong corporate values, and challenging work. Make sure your company culture can support this.

Even with all that said, I believe the full-stack model offers the best starting point. Start there, and then move deliberately toward a functional division of labor only when it is truly necessary.

There are other drawbacks to functional specialization. It can lead to a loss of responsibility and to passivity among workers. Smith himself criticized the division of labor, suggesting that it dulls talent: workers become ignorant and insular as their roles are confined to a few repetitive tasks. While specialization may deliver process efficiency, it is unlikely to inspire workers.

Generalist roles, by contrast, provide all the things that drive job satisfaction: autonomy, mastery, and purpose. Autonomy comes from not depending on anyone else to succeed. Mastery comes from knowing a business capability end to end. And purpose comes from the opportunity to make an impact on the business they are building. If we can get people excited about their work and having a real impact on the company, everything else will fall into place.

Source: habr.com
