If you’re on an AI project team with massive amounts of data to label for machine learning or deep learning, you’re in a race to usable data. Outsourcing seems like the easiest answer. But what happens when data labeling involves protected or private data? What are the security risks that come with outsourcing your data labeling? Here’s the short answer: you’ll need to take a close look at your data labeling service provider and ask some critical questions.
5 Risks with Outsourcing Data Labeling
Let’s start with the risks. There are a number of ways data security could be compromised by a data labeling service:
- Workers access your data on an insecure network or on a device that lacks malware protection.
- Workers can download or save portions of your data. For example, they can take screen captures and share them via social media or email.
- Workers label your data while seated in a public place.
- Workers lack training, context, or accountability for security protocols.
- Your data labeling service cannot demonstrate compliance with data security standards and regulations, such as SOC 2 or HIPAA.
Security and Your Data Labeling Service
With the right policies and processes, your data labeling service can offer data security that is equivalent to what you would have if you labeled your data in-house. The key is finding data labeling service providers that respect your data the same way you do.
Here’s what to look for in your data labeling service when you require extra security:
- Workforce - Each worker undergoes a background check and must sign a non-disclosure agreement (NDA) or similar document that communicates your expectations about data security. The workforce should be managed closely for compliance with security requirements.
- Devices - Workers must turn in any devices they bring to the workplace, including mobile phones and external drives. On the devices workers use, your service provider should disable any capability to download or otherwise store data.
- Workspace - Workers must complete the work in a place where their computer screens cannot be viewed by people who haven’t met the data-security requirements for your project.
Your data annotation tool choice also can affect security. The right data labeling service will be able to provide a tool or make recommendations for the best tools for your use case and requirements. See our guide to data annotation tools for more information.
A Secure Alternative: CloudFactory
For the last decade, we’ve outsmarted outsourcing. CloudFactory uses a tiered approach to security that allows us to meet the basic levels of security that all of our clients need and expanded levels of security that many of our clients require.
We offer three levels of security to meet your unique requirements:
- Essentials: Every CloudFactory client gets this level of data security. We vet each worker with a rigorous background screening and additional evaluative measures, including a personal interview and resume validation. All of our workers sign a non-disclosure agreement that extends to all CloudFactory client work, and they get training in how to do the work. Workers use computers that run up-to-date antivirus software. This level is acceptable for most data work.
- Shield: When work requires greater security, such as compliance with GDPR, we provide enhanced IT and network security. We designate team leaders who receive comprehensive training on the data security guidelines and requirements for your work. These team leaders physically monitor all worker activity during each shift. We provide a CloudFactory-managed facility with enhanced security, including 24x7 monitoring via closed-circuit television (CCTV). Building access is tightly restricted, with full-time security personnel and badged entry. All visitor access is logged and monitored.
- Shield Plus: If you are working with highly sensitive information, such as data that contains personally identifiable information (PII) or protected health information (PHI), we provide Shield Plus. This level includes everything at the Shield level, plus enhanced worker background screenings. Workers also must surrender external devices (e.g., cell phone, tablet) before entering the room where they will label your data. We train workers on industry-specific compliance standards (e.g., HIPAA), and work is done in a room with 24x7x365 CCTV monitoring and badged entry.
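If it helps to picture how the three levels stack, here is a minimal Python sketch. It is an illustration only, not a CloudFactory tool: the names `Tier`, `ADDED_CONTROLS`, `controls_for`, and `recommend_tier` are hypothetical, the control lists are shortened paraphrases of the descriptions above, and the sketch assumes each level builds on the one before it. The selection rule simply mirrors the examples given above (GDPR for Shield, PII or PHI for Shield Plus); a real decision should come from a conversation about your specific requirements.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical names for the three security levels described above."""
    ESSENTIALS = 1
    SHIELD = 2
    SHIELD_PLUS = 3


# Controls each level adds, paraphrased from the article (not an official list).
ADDED_CONTROLS = {
    Tier.ESSENTIALS: [
        "background screening, interview, resume validation",
        "signed NDA covering all client work",
        "antivirus on worker computers",
    ],
    Tier.SHIELD: [
        "enhanced IT and network security",
        "trained team leaders monitor each shift",
        "managed facility with CCTV and badged entry",
    ],
    Tier.SHIELD_PLUS: [
        "enhanced background screening",
        "external devices surrendered before entry",
        "industry-specific compliance training",
    ],
}


def controls_for(tier: Tier) -> list[str]:
    """Return the cumulative controls, assuming each level includes the ones below it."""
    return [c for t in Tier if t <= tier for c in ADDED_CONTROLS[t]]


def recommend_tier(contains_pii_or_phi: bool, needs_gdpr_compliance: bool) -> Tier:
    """Pick the lowest level whose example use case matches the stated needs."""
    if contains_pii_or_phi:
        return Tier.SHIELD_PLUS
    if needs_gdpr_compliance:
        return Tier.SHIELD
    return Tier.ESSENTIALS


if __name__ == "__main__":
    tier = recommend_tier(contains_pii_or_phi=False, needs_gdpr_compliance=True)
    print(tier.name)
    for control in controls_for(tier):
        print("-", control)
```

Modeling the levels as an ordered enum keeps the "everything below, plus more" relationship explicit, so a checklist for any level can be generated from one table.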
6 Critical Questions to Ask Your Data Labeling Service About Security
Ask these questions to find out if your data labeling service provider knows how to respect your data:
- How do you screen workers? Can all of your data labelers sign a non-disclosure agreement related to their work on our data?
- What measures do you take to keep data labelers from saving screenshots, downloading, or otherwise capturing my data for use somewhere else?
- Is the working environment secure? Is it locked? How do data labelers access the facility? Who else has access?
- Do you provide an option to bring workers to a secure location for sensitive data work?
- How do you protect data that comes with special regulatory requirements, such as HIPAA or GDPR?
- How do you measure quality? How do you ensure data labeling is accurate and consistent across all data labelers and datasets? (Be sure to ask about quality, because if your data is sensitive, quality is likely important too.)
Don’t let security concerns stop you from using a service that will accelerate the strategic part of machine learning: model training, tuning, and algorithm development. Ask the right questions to find a data labeling service provider that can meet your security requirements, and reclaim more time to focus on innovation and business growth.
To learn more about CloudFactory’s secure data labeling service, download our security specifications.