Data annotation tools for machine learning are advancing fast. In technical solution engineering at CloudFactory, we’re seeing new tools and new features nearly every month. One emerging feature is automation, also known as pre-annotation or auto labeling. This article will focus on some of its benefits and drawbacks.

What’s auto labeling?

Auto labeling is a feature found in data annotation tools that applies artificial intelligence (AI) to enrich, annotate, or label a dataset. Tools with this feature augment the work of humans in the loop to save time and money on data labeling for machine learning.

Most tools allow you to load pre-annotated data into the tool. More advanced tools, which are evolving into platforms (e.g., tool plus Software Development Kit or SDK), allow you to leverage AI or bring your own algorithm to the tool to improve the data enrichment process by auto labeling data.

Other tools offer prediction models that suggest annotations so workers can validate them. Some features leverage embedded neural networks that can learn from every annotation made. All of these features can save time and resources for machine learning teams and will have a profound effect on data annotation workflows.

Top benefits of auto labeling

In our work with organizations using tools to annotate images for machine learning, we find auto labeling can be helpful when it is applied in a data annotation workflow in two ways:

1) Pre-annotate some or all of your dataset. Workers follow behind the automation to review, correct, and complete the annotations. Automation cannot annotate everything; there will be exceptions and edge cases. It’s also far from perfect, so you must plan for people to review and correct its output as necessary.

2) Reduce the amount of work sent to people. An auto-labeling model can assign a confidence level based on the use case, task difficulty, and other factors. It enriches the dataset with annotations, and sends annotations with lower confidence scores to a person for review or correction.
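The second workflow can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool implements it: the `Prediction` class, the `route_predictions` function, and the 0.8 threshold are all hypothetical, and in practice the threshold would be tuned to the use case and task difficulty.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model score in [0, 1]; hypothetical field


def route_predictions(predictions, threshold=0.8):
    """Split predictions into an auto-accepted list and a human-review list.

    The threshold is an assumed value for illustration only.
    """
    auto_accepted, needs_review = [], []
    for p in predictions:
        if p.confidence >= threshold:
            auto_accepted.append(p)
        else:
            needs_review.append(p)
    return auto_accepted, needs_review


# Made-up predictions, not data from the experiments described in this article.
predictions = [
    Prediction("img_001", "vehicle", 0.97),
    Prediction("img_002", "person", 0.41),
    Prediction("img_003", "vehicle", 0.80),
]
accepted, review = route_predictions(predictions, threshold=0.8)
print([p.item_id for p in accepted])  # items kept as labeled
print([p.item_id for p in review])    # items sent to a person
```

Raising the threshold sends more work to people and lowers the risk of accepting a bad label; lowering it does the opposite.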

We’ve run timed experiments in which one team used tools with an automation feature while another team manually annotated the same data. In some cases, auto labeling produced low-quality results, which increased the time required per annotation task. Other times, it provided a helpful starting point and reduced task time.

In one image annotation experiment, auto labeling combined with human review and correction was 10% faster than a fully manual labeling process. As the automation learned over time, the time savings grew to between 40% and 50%.

The automation also had a margin of error of more than five pixels for vehicles, and it missed the objects that were farthest from the camera. As you can see in the image, an auto-labeling feature tagged a garbage bin as a person. It’s important to keep in mind that pre-annotation predictions are based on existing models, and any misses in the auto labeling reflect the accuracy of those models.

In this photograph from one of our experiments, an auto labeling feature tagged a garbage bin as a person.
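One way to put a number on a bounding box's margin of error is to compare the auto-generated box with the box a reviewer settles on. The sketch below is an assumption about how that could be measured, not the method used in our experiment: the function names, the `(x_min, y_min, x_max, y_max)` box format, and the sample coordinates are all hypothetical.

```python
def max_edge_error(predicted, corrected):
    """Largest per-edge difference, in pixels, between two boxes.

    Each box is (x_min, y_min, x_max, y_max); this format is an assumption.
    """
    return max(abs(p - c) for p, c in zip(predicted, corrected))


def exceeds_margin(predicted, corrected, margin_px=5):
    """True when any edge is off by more than margin_px pixels."""
    return max_edge_error(predicted, corrected) > margin_px


# Made-up coordinates for illustration only.
auto_box = (100, 50, 220, 140)
reviewed_box = (104, 52, 228, 141)

print(max_edge_error(auto_box, reviewed_box))
print(exceeds_margin(auto_box, reviewed_box))
```

Tracking a figure like this per object class can show where the automation needs the most correction.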

Some tasks are ripe for pre-annotation. In the example from our experiment, you could use pre-annotation to draw bounding boxes on the images, and a team of data labelers could then decide whether to resize or delete each box. The same reduction in labeling time can help a team that needs to annotate images with pixel-level segmentation.

Our takeaway from the experiments is that applying auto labeling requires creativity. We find that our clients who use it successfully are willing to experiment, fail, and pivot their process as necessary.

The bottom line on auto labeling

Auto labeling is a game-changer, but it’s not a slam dunk. Like most AI-powered solutions, it requires creativity and iteration to successfully generate time and resource savings. Using these features can save annotation time, but you’ll still have to perform quality control checks on the work that is done.

We expect auto labeling to continue to improve, so this is an area to keep an eye on as you prepare for your next machine learning project. To learn more about data annotation tools, check out Data Annotation Tools for Machine Learning (An Evolving Guide).
