NASA estimated that it took 400,000 engineers, scientists, and technicians to send astronauts to the moon on the Apollo missions. That massive workforce comprised people from four major enterprise companies and a host of subcontractors who worked for them.
Like sending astronauts to the moon, building AI requires access to a large number of people to gather, process, and structure training data. Speed to market is a priority, as AI development teams contend with the challenges of innovating fast in an increasingly competitive marketplace. As more companies seek fast access to talent that is in short supply, crowdsourcing has emerged as an alternative to an in-house team.
Crowdsourcing uses the cloud to send data tasks to a large number of anonymous workers, who are paid based on the number of tasks they complete. While it looks like a cheap way to prepare training data for machine learning algorithms, it’s rarely as inexpensive as it seems.
Measuring Quality
In the first article of this three-part series, we explored the importance of quality. The better your data, the better your model will perform. And when the people who tag, label, or annotate your data provide low-quality work, your model struggles to learn.
There are three methods used in the workforce industry to measure work quality. At CloudFactory, we apply one or more of them on every project to measure the quality of our own vetted, managed workforce.
- Consensus – We assign several people to do the same task, and the correct answer is the one that comes back from the majority of workers. This is the crowdsourcing model (see the sketch after this list).
- Gold standard – There is a known correct answer for the task, and we measure quality by comparing each completed task against it.
- Sample review – We select a random sample of completed tasks, and a more experienced worker, such as a team lead or project manager, reviews the sample for accuracy.
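For teams that want to instrument these checks in their own pipelines, here is a minimal sketch of how the methods might be scored. It is written in Python with hypothetical task and label structures, and it illustrates the ideas above rather than any particular platform's implementation.

```python
import random
from collections import Counter

def consensus_label(worker_labels):
    """Majority vote across redundant judgments for one task.
    Returns the winning label and its share of the votes."""
    counts = Counter(worker_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(worker_labels)

def gold_standard_accuracy(submitted, gold):
    """Share of gold-standard tasks a worker answered correctly.
    submitted and gold map task IDs to labels."""
    scored = [task_id for task_id in gold if task_id in submitted]
    if not scored:
        return 0.0
    correct = sum(submitted[t] == gold[t] for t in scored)
    return correct / len(scored)

def sample_for_review(completed_task_ids, rate=0.05, seed=7):
    """Random sample of completed tasks for a team lead to review."""
    rng = random.Random(seed)
    k = max(1, int(len(completed_task_ids) * rate))
    return rng.sample(completed_task_ids, k)

# Example: three workers label the same image; the majority vote wins.
label, agreement = consensus_label(["cat", "cat", "dog"])
print(label, round(agreement, 2))   # cat 0.67

# Example: score one worker against known-correct answers.
print(gold_standard_accuracy({"t1": "cat", "t2": "dog"},
                             {"t1": "cat", "t2": "cat"}))  # 0.5
```

Low agreement scores from the consensus function can also flag ambiguous tasks that are good candidates for a sample review.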
3 Hidden Costs of the Crowd
Over a decade of processing critical business data for companies around the globe, we’ve learned that applying the crowdsourcing model to AI data work does get you access to a large number of workers, but it can also create issues that slow your speed to market. Here are some of the hidden costs of the crowd.
1. Poor data quality
Anonymity is a bug, not a feature, when it comes to crowdsourcing. Workers have little accountability for poor results. When task answers aren’t straightforward and objective, crowdsourcing requires control measures such as double-entry and consensus models. If you’re unsatisfied with the work, you often must send it through again and hope for a different result, which places more of the QA burden on your team. Each time a task is sent to the crowd, costs rack up.
“Re-working poorly labeled data is very expensive,” said Brian Rieger, COO of Labelbox, a California-based company that provides tools for labeling and managing training data.
At CloudFactory, we have a microtasking platform that can distribute a single task to multiple workers, using the consensus model to measure quality. Our client success team found consensus models cost at least 200% more per task than processes where quality standards can be met from the first pass. Managed teams are better suited to tasks requiring high quality because they can handle more nuanced tasks and get them right the first time.
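To see where a figure like 200% can come from, here is a back-of-envelope sketch. It assumes, for illustration only, that a consensus workflow routes each task to three workers at the same per-judgment rate while a managed worker meets the quality bar in a single pass; the rates are placeholders, not actual pricing.

```python
# Illustrative assumptions only; rates are placeholders, not real pricing.
rate_per_judgment = 0.05          # assumed cost of one worker judgment, in dollars

consensus_judgments_per_task = 3  # assumed redundancy needed for a majority vote
managed_passes_per_task = 1       # quality standard met on the first pass

consensus_cost = rate_per_judgment * consensus_judgments_per_task   # 0.15
managed_cost = rate_per_judgment * managed_passes_per_task          # 0.05

extra_cost_pct = (consensus_cost - managed_cost) / managed_cost * 100
print(f"Consensus costs {extra_cost_pct:.0f}% more per task")       # -> 200% more per task
```

Any rework of rejected tasks adds further judgments on top of that baseline.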
2. Lack of agility
In AI development, tasks change as you train your models, so your workforce must be able to adapt. That requires a tight communication and feedback loop with the people doing the work. Without continuity in that workforce, it’s harder to build the domain expertise and context needed to adapt quickly to changes in the workflow. As a result, your process becomes inefficient and your models struggle to learn.
“Labelers get better at annotation tasks over time, as they get familiar with the source imagery and the nuances of the interpretation desired. Labelers who are better at labeling lead to better training data and that leads to better model performance,” said Brian Rieger (COO, Labelbox).
Crowdsourcing limits your agility to modify and evolve your process, creating a barrier to worker specialization, the proficiency with your data and process that grows over time. Workers are constantly changing, few get past the learning curve, and you are likely to see less consistency in your data. Any change in your process can create bottlenecks.
Data workers on a managed team can increase their domain expertise - or understanding of your rules and edge cases - over time, so they can make informed subjective decisions that are more accurate and result in higher quality data.
3. Management burden
When you crowdsource your data production, you can expect worker churn. As new workers join the crowd, you’ll have to rely on the business rules you created and leave new workers to train themselves on how to do the work. If your own team brings each new worker up to speed instead, be sure to allocate time for that management responsibility.
With some crowdsourcing options, you are responsible for posting your projects, reviewing and selecting candidate submissions, and managing worker relationships. You’ll need to factor in your costs to attract, train, and manage a disconnected group of workers.
If you’re considering a crowd model, look into who owns your data as part of your agreement. In addition to platform and transaction fees, some crowdsourcing vendors claim ownership of the data that passes through their platforms, which means they’re allowed to use your data to train their own algorithms or serve their own customers.
While it can be difficult to determine the end-to-end cost of a crowdsourced project, you can plan for the crowd to cost more per task as you send low-quality data back to the crowd for reprocessing. Watch for hidden fees in technology, onboarding, and training.
The Bottom Line
While only a handful of astronauts returned to Earth on the Apollo missions, the 400,000 workers who made it all possible were there to cheer their return. Developing an AI product requires similar collaboration and strong communication among huge teams of people, many of whom are doing disparate work. Accuracy, consistency, and agility in your workforce are paramount to success because AI models require high accuracy and consistency, something that anonymous crowds can’t deliver.
This article is the second in a series of three about scaling your training data operations for AI. The next article will explore best practices for your AI data production line.