Big Data Management Platform for A.I.

From Dataset to Training Set

Explore Public Datasets

CVEDIA is a free, cloud-based service simplifying image dataset collection, preparation, and processing.

We give you direct access to standardized versions of public image datasets such as Open Images, COCO, ImageNet, and YouTube-8M. Using our open-source CLI tool, you can easily export your filters and augmentations to your local server in the format of your preferred machine learning framework, including Torch, Caffe, TensorFlow, MXNet, Digits, Theano, and Deeplearning4J.

The art and science of training neural networks and applying them to specific problems across disciplines begins with data preparation and management.

We are intimately aware of the infrastructure problems that limit our ability to work with ever-growing datasets. CVEDIA has developed a series of productivity tools designed to simplify data collection and preparation. We provide you with 30 real-time image augmentations, metadata augmentations, advanced filters, and direct exports that allow you to customize all of the input parameters. Transform your raw datasets into augmented and preprocessed training sets with ease and efficiency.

The ability to train large models fast to push the boundaries of what is possible with computer vision begins with high quality datasets. CVEDIA lets you create training sets quicker than with any other dataset management platform.


We support many image sources, including the leading image repositories, geospatial imagery, video sources, telemetry data, multispectral, polygon, segmentation, and SIFT data, and biomedical imagery. Work from existing datasets or create your very own.

Real-Time Image Augmentations

Choose from dozens of standard image transformations and apply them in any order and quantity to a dataset or selection with ARC, our custom-engineered solution. Based on user-specified logic, ARC applies image modifications in real time.
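As a rough illustration of applying transformations in a user-chosen order, the sketch below chains plain Python functions over an image stored as a nested list. It is not ARC and does not use any CVEDIA API; `flip_horizontal`, `crop`, and `apply_pipeline` are hypothetical names invented for this example.

```python
from functools import reduce

def flip_horizontal(image):
    """Mirror each row of a row-major image (a list of rows)."""
    return [row[::-1] for row in image]

def crop(top, left, height, width):
    """Build a transform that cuts a height x width window from an image."""
    def _crop(image):
        return [row[left:left + width] for row in image[top:top + height]]
    return _crop

def apply_pipeline(image, transforms):
    """Apply transforms left to right; the order changes the result."""
    return reduce(lambda img, transform: transform(img), transforms, image)

# A tiny 2 x 3 single-channel image used only for illustration.
image = [
    [1, 2, 3],
    [4, 5, 6],
]
flip_then_crop = apply_pipeline(image, [flip_horizontal, crop(0, 0, 2, 2)])
crop_then_flip = apply_pipeline(image, [crop(0, 0, 2, 2), flip_horizontal])
```

The two pipelines use the same transformations in a different order and produce different images, which is why the order is treated as part of the specification.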

Metadata Augmentations

Filter and normalize metadata with custom code. Metadata augmentations include random crops, rescaling, custom zoom levels, and geographic coordinates. EXIF data can be selectively applied. Images with polygon augmentation are also supported.

Advanced Filters

CVEDIA’s Metadata Query Language provides unprecedented flexibility in preparing and transforming your datasets. Slice and filter datasets by any variable you can imagine. Search and filter directly on image EXIF data or on statistical values such as the global or per-channel mean.
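To make filtering on statistical values concrete, here is a minimal sketch in plain Python. It does not use the Metadata Query Language, whose syntax is not shown here; the record layout and the helper names `channel_means`, `global_mean`, and `filter_records` are assumptions made for this example, and each image is assumed to contain at least one pixel.

```python
def channel_means(pixels):
    """Return the (R, G, B) means of a non-empty list of (r, g, b) tuples."""
    count = len(pixels)
    return tuple(sum(pixel[c] for pixel in pixels) / count for c in range(3))

def global_mean(pixels):
    """Return the mean over all channels and all pixels."""
    return sum(channel_means(pixels)) / 3

def filter_records(records, predicate):
    """Keep only the records for which the predicate is true."""
    return [record for record in records if predicate(record)]

# Hypothetical records: a file name plus raw RGB pixel values.
records = [
    {"name": "dark.png", "pixels": [(10, 20, 30), (30, 20, 10)]},
    {"name": "bright.png", "pixels": [(200, 220, 240), (240, 220, 200)]},
]

# Keep images whose global mean is above mid-gray.
bright = filter_records(records, lambda r: global_mean(r["pixels"]) > 128)
```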

Direct Exports

The CVEDIA-CLI tool is an open-source project hosted on GitHub. It exports your selected filters and augmentations in batches directly to your local server, in the format of the machine learning application of your choice.


CVEDIA supports a range of annotation types, including textual labels and sentences, bounding boxes, closed polygons, open polygons, pixel-based maps and segmentations, and numbered and labeled landmarks.


Online Training Sets

Using a custom implementation for your neural network framework, we enable bi-directional communication with the CVEDIA API, which streams augmented images directly to the GPU. No more mass storage or excessive downloading of huge training sets.

Adaptive Training Sets

With online training, the training loss is automatically reported back, allowing CVEDIA to refine the training set in real time and skip redundant images for even faster convergence.
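How redundancy is judged is not described here. One simple way to picture the idea, purely as an assumption, is to drop samples whose most recently reported loss is already very low. The sketch below does that in plain Python; `select_informative` and the sample identifiers are hypothetical and do not come from the CVEDIA API.

```python
def select_informative(samples, losses, threshold):
    """Keep samples whose last reported loss is at or above the threshold.

    Samples with no reported loss yet are always kept, because nothing
    is known about them.
    """
    return [s for s in samples if losses.get(s, float("inf")) >= threshold]

# Hypothetical per-sample losses reported back after a training pass.
losses = {"img_001": 0.02, "img_002": 0.90, "img_003": 0.45}
samples = ["img_001", "img_002", "img_003", "img_004"]

# img_001 is treated as redundant; img_004 has not been seen yet.
next_pass = select_informative(samples, losses, threshold=0.10)
```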

Cloud Computing Integration

Seamlessly integrate all of your work with a cloud computing service such as Amazon AMI or Google Cloud Machine Learning Service.

Speed and Scalability

CVEDIA scales on demand, from a single CPU to multiple GPUs to multiple machines. CVEDIA reduces training times from weeks to a few hours. Our platform optimizes massive image datasets.


Work with the languages and networks you know, like C++ and Python. Your exported data augmentations and filters are compatible with every machine learning framework on the market.

Standardized Datasets

Datasets are composed of attributes with varying scales. Bypass the usual first step of data normalization with our standardized datasets. Images, annotations, and metadata are normalized for you, saving you time and boosting performance.

Integrity Checking

Integrity checking confirms that the data on disk has been downloaded. It acts as a safety feature during API changes or other system-based interruptions.
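One common way to implement this kind of check is to compare each file on disk against an expected checksum. The sketch below assumes a manifest that maps file names to SHA-256 digests. That manifest, and the helper names `sha256_of` and `find_bad_entries`, are assumptions for illustration and not a description of how CVEDIA-CLI works.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_bad_entries(root, manifest):
    """Return names that are missing on disk or whose checksum differs.

    `manifest` is a hypothetical mapping of file name to SHA-256 digest.
    """
    bad = []
    for name, expected in manifest.items():
        path = Path(root) / name
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(name)
    return bad
```

An empty result means every listed file is present and matches its checksum. Any names returned are candidates for re-download.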

Resume Downloading

Exports that have been interrupted can be resumed from the last downloaded entry, while still retaining a valid subset on disk for testing.
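The resume behavior can be pictured as skipping every entry that is already recorded as complete and fetching only the rest. The following sketch assumes the export is an ordered list of entry IDs and that completed IDs are tracked somewhere. `remaining_entries`, `resume_export`, and the `fetch` callback are hypothetical names, not part of CVEDIA-CLI.

```python
def remaining_entries(all_ids, completed_ids):
    """Return the entries still to fetch, preserving export order."""
    done = set(completed_ids)
    return [entry for entry in all_ids if entry not in done]

def resume_export(all_ids, completed_ids, fetch):
    """Fetch only the missing entries and return the updated completed list."""
    completed = list(completed_ids)
    for entry in remaining_entries(all_ids, completed):
        fetch(entry)             # hypothetical download callback
        completed.append(entry)  # record progress only after a successful fetch
    return completed

# Example: "a" and "b" were downloaded before the interruption.
fetched = []
result = resume_export(["a", "b", "c", "d"], ["a", "b"], fetched.append)
```

Because progress is recorded only after each successful fetch, the completed list always describes a valid subset on disk, even if the run is interrupted again.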


Customize. Transform. Cut. Slice. Adapt. Rescale. Fine tune. Your datasets. Easily manage your most important asset with CVEDIA.

CVEDIA is the first platform of its kind to tackle the gruelling process of collecting, cleaning, and managing datasets for computer vision from beginning to end. Regardless of the data source or the application, our tools allow you to create clean, non-corrupt data faster than any other platform. Work smarter, not harder.