Synthetic Data for Computer Vision
Train computer vision models faster and more efficiently with Synthetic Data. This new AI technology uses 3D models instead of real data for training, making it a flexible and cost-effective solution.
Synthetic data is a new technology in AI that solves the problem of collecting and labeling training data for computer vision models. Instead of collecting real data, you generate (or render) it from 3D models, which makes it a faster and more flexible solution. The process is similar to how Hollywood creates animations: the 3D models describe the size, shape, and appearance of objects so that a computer can visualize them.
How Computer Vision Models Are Trained
Computer vision models are created to perform a certain task, for example, to detect or classify objects seen on camera footage. But before they can perform this task they have to be trained. This training happens pretty much in the same way as humans learn: we show the model a large number of example images. Those images can either be of something we are trying to find, or something we don’t want to find. By repeating this process for many hours on powerful computers, the AI model will end up learning its task, which means you can then show it images it has never seen and it will give you the correct answer based on the task.
For years, researchers and companies have been collecting data to train better and better AI models, because the more unique data you show a model, the better it will perform its task. Those collections of data are what we call “datasets”, and many of them are available publicly for free.
The Problem of Collecting Data
At the same time, the training process is also AI’s biggest weakness. Despite everyone’s best efforts, there simply isn’t enough training data for most use-cases. It’s easy enough to collect images of people to train a person detector. But let’s say you want to train an AI model to tell you if someone is wearing a backpack and what color shirt they are wearing. To train an AI model for this task you would need to collect photos of every possible type of backpack and shirt color. This quickly becomes an impossible task considering you would need 1000s of examples of each variation.
Not all data is useful either. Images that are very similar, say from the same location, camera, or object, won’t really help an AI learn. Images need to be diverse, with different and unique features, so the model builds a good understanding of what it needs to do. This means that in most cases only a few frames from a video recording are useful.
Sometimes collecting data is just plain dangerous. Think about an AI model trained to alert an operator when an accident has taken place, a fire has broken out, or a fight has started.
The Problem of Labelling Data
You also need to explain to your AI model what you expect from it. This is done through a process called labeling, or annotating. It’s similar to how you would tag faces of people on Facebook or add keywords to your holiday snaps. First, you draw a rectangle (or bounding box) around the object of interest and then you put a label on it. You’ll need to repeat these steps for each object in the image. And this has to be done for every single image you would like your model to train on.
While labeling a few images by yourself might be fun and certainly educational, a good-sized dataset can easily run into the hundreds of thousands of pictures, and each picture can take up to a minute to label.
Today, you can find dozens of annotation companies that specialize in labeling your pictures, often using workforces in low-income countries. Besides being expensive for large datasets, their work also very commonly contains labeling mistakes. These mistakes can create real problems when you include them in your AI training process.
And this is where synthetic data comes into the picture.
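To get a feel for these numbers, here is a back-of-the-envelope calculation in Python. The dataset size and the time per picture come from the paragraph above; the 8-hour working day and the function name are assumptions made purely for illustration.

```python
def labeling_effort(num_images, seconds_per_image=60, hours_per_day=8):
    """Return (total_hours, working_days) needed to label a dataset by hand."""
    total_hours = num_images * seconds_per_image / 3600
    working_days = total_hours / hours_per_day  # assumes one full-time labeler
    return total_hours, working_days


hours, days = labeling_effort(100_000)
print(f"{hours:,.0f} hours, or about {days:,.0f} working days for one person")
```

Under these assumptions, 100,000 pictures already add up to well over a thousand hours of manual work, before any mistakes are corrected.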
What Synthetic Data Actually Is
Synthetic data is a relatively new technology in the field of Artificial Intelligence, and at CVEDIA, we have been pioneering its use over the past number of years.
In short, synthetic data solves the need for data collection by generating, or rendering, the training images. While there are several types of synthetic data, we’ll focus on the 3D rendering approach here. This is by far the most powerful type and gives the greatest flexibility.
Rendering is a process very similar to how Hollywood creates animated movies. Instead of using real actors and objects, it draws them based on 3D models. A 3D model is a digital version of an object or person. It describes the object’s size, shape, and looks so that a computer can visualize it on screen.
Because rendering is very fast, you can create thousands of images per hour without leaving your desk. The rendering can take place on any regular desktop computer, often using the same technology that is used to create video games.
For this to work correctly you need a large collection of 3D models, or you’ll suffer from the same problems as real data collection. At CVEDIA, we have over 30,000 3D models for many types of objects, ranging from clothing and exotic animals to buildings and ships.
It’s this rich variety in objects that gives rendered synthetic data its power.
Data Generation Instead of Data Collection
Since the objects you render are virtual, you also have a lot of freedom. You could place any type of backpack on a 3D model of a person and give it hundreds of different colors. This generated picture directly replaces the need for an actual picture of a person with that backpack, which means you won’t actually have to go out in the world and collect pictures of it. This is a huge time saver, especially if what you’re trying to detect or classify is a rare or very specific object.
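The idea of combining virtual objects freely can be sketched in a few lines of Python. This is only an illustration: the asset lists, the field names, and the function `sample_person_specs` are hypothetical, and a real rendering pipeline would turn each specification into an image rather than stop at a dictionary.

```python
import random

# Hypothetical asset lists; a real 3D model library would be far larger.
BACKPACKS = ["hiking", "school", "laptop", "drawstring"]
COLORS = ["red", "green", "blue", "black", "white", "yellow"]


def sample_person_specs(count, seed=0):
    """Randomly combine backpack and color options into render specifications."""
    rng = random.Random(seed)  # a fixed seed makes the dataset reproducible
    specs = []
    for _ in range(count):
        backpack = rng.choice(BACKPACKS + [None])  # None means "no backpack"
        specs.append({
            "backpack": backpack,
            "backpack_color": rng.choice(COLORS) if backpack else None,
            "shirt_color": rng.choice(COLORS),
        })
    return specs


for spec in sample_person_specs(3):
    print(spec)
```

Because every picture starts from such a specification, the answers the model should learn (backpack or not, shirt color) are known before the picture is even rendered.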
Synthetic Data for Edge Cases
In some cases there might simply not be any data. Take for example our work with RESOLVE. They developed a tiny camera called WildEyes AI, which runs AI to detect poachers and protect endangered species. The species they are trying to protect are nearly extinct, which makes collecting images of them in the wild nearly impossible. Using synthetic data, we were able to recreate these images, for example of the spotted snow leopard, and successfully train an AI model to detect this animal.
An Example of Synthetic Data
In many ways it looks just like the real thing. Unless you’ve got a trained eye it might look real to you. But it’s not!
This video was rendered using only 3D models, and it allowed us to train an AI model for use in a smart city application.
Other Types of Synthetic Data
So far we’ve focused on 3D rendering to generate synthetic data. This is our method of choice at CVEDIA, but there are two other types:
Deep Learning Based Synthetic Data: Although it sounds like a chicken-and-egg problem, AI can be used to generate training data as well. This method does not give you the same level of control or flexibility as 3D rendering though. There are no 3D models or cameras, and you have no control over the annotation labels. How it works is that you train a neural network on either a public dataset or your own data, after which it can “synthesize” more images like the ones it has already seen.
Augmentations: Augmentation is the process of changing a photo to be different enough for a network to learn something new from it, but similar enough that the meaning remains the same. For example, flipping a photo horizontally (like a mirror) produces a completely different image for a neural network, but the meaning is still the same. There are hundreds of different augmentation types and people use them a lot to extract more value from a small dataset.
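The horizontal flip mentioned above is simple enough to sketch in plain Python. The example assumes an image stored as a list of pixel rows and a bounding box given as (x_min, y_min, x_max, y_max) with x_max exclusive; both conventions, and the function name, are assumptions chosen for this illustration.

```python
def flip_horizontal(image, box):
    """Mirror an image (list of pixel rows) and its bounding box left to right.

    box is (x_min, y_min, x_max, y_max) in pixels, with x_max exclusive.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    x_min, y_min, x_max, y_max = box
    # The label has to move with the pixels, otherwise it points at empty space.
    flipped_box = (width - x_max, y_min, width - x_min, y_max)
    return flipped_image, flipped_box


# A tiny 2x4 "image" with an object (the 1s) in the two leftmost columns.
image = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
flipped, new_box = flip_horizontal(image, (0, 0, 2, 2))
print(flipped)  # the object now sits in the two rightmost columns
print(new_box)
```

Note that the bounding box is flipped together with the pixels. Forgetting to update the labels is a common source of broken training data when augmenting.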
Is Synthetic Data Always Better Than Real Data?
That really depends on what you are trying to do. Synthetic datasets are great when you want your application to work anywhere, anytime, and in unknown situations. But there are times when an AI is being run from very similar camera viewpoints, for example, within the same type of environment, or when the camera position stays the same. Here it might be better just to collect real data and benefit from the biases it contains (yes, biases can be a good thing, too!). In a way, this is deliberate ‘overfitting’.
How Does Synthetic Data Help With AI Bias?
Another major problem that plagues many AI applications is “bias”. Because training data is often collected from specific regions in the world, it’s highly biased. Simply put, there are a lot more white people in most datasets than there are black or Asian people. This imbalance in data can cause all kinds of wrong behavior in a neural network.
Because the training data is rendered, there is a lot more control over what it looks like. For example, we can generate images of people with different ethnicities or body types in a fair and balanced way. By showing these balanced examples to a neural network, you greatly reduce the chance that it learns such a bias.
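As a minimal sketch of what “balanced” can mean in practice, the snippet below spreads a rendering budget evenly over a set of groups. The group names are placeholders and the function `balanced_plan` is hypothetical; it only shows the counting, not the rendering itself.

```python
from collections import Counter
from itertools import cycle, islice


def balanced_plan(groups, total):
    """Assign `total` renders to groups round-robin, so counts differ by at most one."""
    return Counter(islice(cycle(groups), total))


# Placeholder group names, used purely for illustration.
plan = balanced_plan(["group_a", "group_b", "group_c"], 10_000)
print(plan)
```

With collected data you would have to find and add real pictures to reach such a distribution. With rendered data it is a matter of choosing the counts up front.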
How Does Synthetic Data Help With Privacy?
Imagine a situation in which you are collecting training data but legally cannot use it. This could be something as simple as recording people on the streets in Germany. Without proper blurring of faces, bodies or license plates, you’re at legal risk. Unfortunately, blurring those objects creates images that simply won’t work for AI training.
With synthetic data we can recreate those environments and the people in them and generate data for any task you want to perform. This means you won’t have to collect any privacy-sensitive data and your application will still work fine.
Synthetic Data for Validation Cases
After creating an AI model with either real or synthetic data, you’ll usually want to measure its performance. You’ll want to make sure that in between updates nothing changed or broke. You can use real data for this, but it presents a very limited test at best.
Using synthetic data, you create a scene only once, for example, a ship on the ocean, and then change many settings like weather conditions, ocean conditions, or even time of day. From this single scene you’re now able to synthesize thousands of different variations.
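To illustrate how quickly the variations of a single scene multiply, here is a small Python sketch. The setting names and values are hypothetical, and camera height is an extra setting added here as an assumption; a real renderer would expose its own parameters.

```python
from itertools import product

# Hypothetical setting values for one base scene (a ship on the ocean).
WEATHER = ["clear", "overcast", "rain", "fog", "snow"]
SEA_STATE = ["calm", "choppy", "rough", "stormy"]
HOUR_OF_DAY = range(24)
CAMERA_HEIGHT_M = [2, 10, 50, 200]  # assumed extra setting, not from the text


def validation_variations():
    """Enumerate every combination of settings for the base scene."""
    keys = ("weather", "sea_state", "hour", "camera_height_m")
    for values in product(WEATHER, SEA_STATE, HOUR_OF_DAY, CAMERA_HEIGHT_M):
        yield dict(zip(keys, values))


variations = list(validation_variations())
print(len(variations))  # 5 * 4 * 24 * 4 combinations
```

Because the grid is enumerated in a fixed order rather than sampled at random, exactly the same test set can be rendered again after every model update, which makes results comparable between versions.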
Now that is powerful validation!
Summary of the Main Benefits of Synthetic Data
- No Data Collection: Synthetic data is created by rendering images from 3D models, and those images can be used to train neural networks. This means you will no longer have to collect data out in the field, saving you a lot of time and money.
- No Manual Labeling: The rendered images also include all the annotations you need to train an AI model for a specific task, so you no longer have to label millions of images or outsource the work to annotation companies.
- Reduced Bias: Datasets created through rendering contain fewer biases than you would find in real data. This helps a neural network get a much better understanding of a task.
- Flexible Conditions: Synthetic data is created using virtual cameras and can simulate different camera angles, light conditions, and weather conditions.
- Privacy Protected: No real humans or vehicles are used, which means there is no risk of GDPR or privacy violations.
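The “No Manual Labeling” point can be made concrete with a small sketch. Since a renderer knows where each 3D model sits relative to the virtual camera, a bounding box can be computed instead of drawn by hand. The code below assumes a simple pinhole camera model with all points in front of the camera; the function names, the camera parameters, and the cube are hypothetical and do not describe any particular rendering engine.

```python
def project_point(point, focal_px, cx, cy):
    """Project a 3D point (camera coordinates, z pointing forward) to pixels."""
    x, y, z = point  # assumes z > 0, i.e. the point is in front of the camera
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def bounding_box_from_corners(corners, focal_px, cx, cy):
    """Return (x_min, y_min, x_max, y_max) enclosing the projected corners."""
    pixels = [project_point(p, focal_px, cx, cy) for p in corners]
    xs = [u for u, _ in pixels]
    ys = [v for _, v in pixels]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical 1 m cube centered 5 m in front of a 640x480 virtual camera.
cube = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (4.5, 5.5)]
print(bounding_box_from_corners(cube, focal_px=500, cx=320, cy=240))
```

The resulting box is exact by construction, which is why rendered datasets do not suffer from the human labeling mistakes described earlier.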