Model Forge

CVEDIA's proprietary end-to-end model lifecycle and deployment system, built to ensure that AI models don’t just work — they work everywhere they are deployed.

Bridging Data Science and Deployment

Model Forge closes the gap between advanced data science and real-world hardware constraints while enabling continuous performance improvement, all without adding complexity for integrators or end users.

  • Automatic optimization for any target hardware
  • One canonical model per capability; improvements benefit everyone
  • Consent-based active learning for continuous improvement
  • Version-controlled, non-breaking updates

Modern AI doesn't fail in the lab; it fails in deployment. Model Forge is CVEDIA's answer to this reality.

Compilation Pipeline
model.onnx → Parse → Optimize → Quantize → Schedule → per-backend artifact:

  • TensorRT → .engine
  • OpenVINO → .xml / .bin
  • Hailo → .hef
  • MNN → .mnn
  • TFLite → .tflite
  • RKNN → .rknn
  • DEEPX → .dxnn
  • Blaize → .so

One Model Definition. Every Target Optimized.

Model Forge is a neural network compiler that transforms a single model architecture into hardware-specific optimized artifacts. It handles quantization, kernel fusion, memory layout optimization, and instruction scheduling, each automatically tuned to the target's constraints; a hypothetical end-to-end sketch follows the list below.

  • INT8: Automatic quantization with calibration; no manual tuning required
  • FUSE: Layer fusion and kernel optimization per backend
  • TILE: Memory tiling for cache efficiency on constrained devices
  • BENCH: Automated accuracy and latency validation per target
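
From an integrator's perspective, the pipeline boils down to one ONNX file in, one optimized artifact per backend out. The sketch below is purely hypothetical: Model Forge's actual interface is not shown on this page, so the forge namespace, Target enum, and compile() stub are invented names used only to make the flow concrete.

// Hypothetical sketch only. The forge namespace, Target enum, and compile()
// stub below are invented for illustration; they are not Model Forge's API.
#include <iostream>
#include <string>

namespace forge {

enum class Target { TensorRT, OpenVINO, Hailo, MNN, TFLite, RKNN, DEEPX, Blaize };

// Stand-in for the real compile step (parse -> optimize -> quantize -> schedule),
// which would emit one hardware-specific artifact per target.
std::string compile(const std::string& onnxPath, Target t) {
    switch (t) {
        case Target::TensorRT: return onnxPath + " -> model.engine";
        case Target::OpenVINO: return onnxPath + " -> model.xml + model.bin";
        case Target::Hailo:    return onnxPath + " -> model.hef";
        case Target::MNN:      return onnxPath + " -> model.mnn";
        case Target::TFLite:   return onnxPath + " -> model.tflite";
        case Target::RKNN:     return onnxPath + " -> model.rknn";
        case Target::DEEPX:    return onnxPath + " -> model.dxnn";
        case Target::Blaize:   return onnxPath + " -> model.so";
    }
    return {};
}

} // namespace forge

int main() {
    // One canonical model definition in, one optimized artifact per backend out.
    for (auto t : {forge::Target::TensorRT, forge::Target::OpenVINO, forge::Target::Hailo}) {
        std::cout << forge::compile("model.onnx", t) << "\n";
    }
}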

Supported Inference Targets

Auto-detect and deploy. The runtime selects the optimal backend automatically.

  • NVIDIA: CUDA / TensorRT
  • Intel: OpenVINO
  • Hailo: Hailo-8 / 8L
  • Rockchip: RKNN NPU
  • ARM: CPU / Mali GPU
  • Blaize: GSP
  • DeepX: dx-m1, dx-m1m, dx-h1 quattro

Technical Capabilities

Quantization

Post-training quantization with INT8/FP16, per-channel and per-tensor calibration, mixed-precision layer selection, accuracy validation against FP32 baseline, and automatic fallback for sensitive layers.
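
To make the calibration step concrete, here is a minimal self-contained sketch of the textbook symmetric per-tensor INT8 scheme; this is the generic technique, not Model Forge's internal implementation. The scale is derived from the calibration maximum, and the round-trip error illustrates the kind of degradation that validation against the FP32 baseline guards against.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

// Symmetric per-tensor INT8: pick the scale so the calibration max maps to 127.
float calibrate_scale(const std::vector<float>& calibration) {
    float absMax = 0.0f;
    for (float v : calibration) absMax = std::max(absMax, std::fabs(v));
    return absMax / 127.0f;
}

int8_t quantize(float v, float scale) {
    long q = std::lround(v / scale);
    return static_cast<int8_t>(std::clamp(q, -128L, 127L));
}

int main() {
    std::vector<float> activations = {0.02f, -1.7f, 3.4f, 0.9f, -2.8f};
    float scale = calibrate_scale(activations);  // 3.4 / 127 here
    for (float v : activations) {
        int8_t q = quantize(v, scale);
        float dq = q * scale;  // dequantize to compare against the FP32 value
        std::cout << v << " -> " << int(q)
                  << " (round-trip error " << std::fabs(v - dq) << ")\n";
    }
}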

Optimization

Conv-BN-ReLU fusion, attention kernel optimization, memory layout transformation (NCHW↔NHWC), constant folding and dead code elimination, and dynamic shape support where available.
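
Conv-BN fusion is the classic example: the batch-norm parameters can be folded into the convolution's weights and bias at compile time, so the fused layer produces the same output with one kernel instead of two. A minimal generic sketch of the folding arithmetic (the standard transformation, not Model Forge's code):

#include <cmath>
#include <vector>

// Fold BatchNorm(gamma, beta, mean, var) into the preceding convolution.
// Per output channel c:
//   s  = gamma[c] / sqrt(var[c] + eps)
//   w' = w * s
//   b' = (b - mean[c]) * s + beta[c]
struct ConvParams {
    std::vector<float> weights;  // outC * weightsPerChannel values, channel-major
    std::vector<float> bias;     // one per output channel
    int outC;
    int weightsPerChannel;       // inC * kH * kW
};

void fold_batchnorm(ConvParams& conv,
                    const std::vector<float>& gamma, const std::vector<float>& beta,
                    const std::vector<float>& mean, const std::vector<float>& var,
                    float eps = 1e-5f) {
    for (int c = 0; c < conv.outC; ++c) {
        float s = gamma[c] / std::sqrt(var[c] + eps);
        for (int i = 0; i < conv.weightsPerChannel; ++i)
            conv.weights[c * conv.weightsPerChannel + i] *= s;
        conv.bias[c] = (conv.bias[c] - mean[c]) * s + beta[c];
    }
}

int main() {
    ConvParams conv{{1.0f, 2.0f}, {0.0f}, /*outC=*/1, /*weightsPerChannel=*/2};
    fold_batchnorm(conv, /*gamma=*/{0.5f}, /*beta=*/{0.1f}, /*mean=*/{0.2f}, /*var=*/{1.0f});
}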

Validation

Automated benchmark suite per target, latency P50/P95/P99 profiling, mAP regression testing, memory usage tracking, and CI/CD integration support.
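
The latency percentiles are straightforward to compute from raw per-inference timings. Below is a minimal generic harness using the nearest-rank method; the timing loop and placeholder comment stand in for a real model.infer() call, and none of this is Model Forge's actual benchmark suite.

#include <algorithm>
#include <chrono>
#include <cmath>
#include <iostream>
#include <vector>

// Nearest-rank percentile over a sample of latencies.
double percentile(std::vector<double> samples, double p) {
    std::sort(samples.begin(), samples.end());
    size_t rank = static_cast<size_t>(std::ceil(p / 100.0 * samples.size()));
    return samples[std::min(rank > 0 ? rank - 1 : 0, samples.size() - 1)];
}

int main() {
    std::vector<double> latenciesMs;
    for (int i = 0; i < 1000; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        // model.infer(frame) would run here; this loop only times a placeholder.
        auto t1 = std::chrono::steady_clock::now();
        latenciesMs.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::cout << "P50 " << percentile(latenciesMs, 50) << " ms, "
              << "P95 " << percentile(latenciesMs, 95) << " ms, "
              << "P99 " << percentile(latenciesMs, 99) << " ms\n";
}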

Continuous Deployment Pipeline

One canonical model. Improvements ship to all targets automatically.

  • Train: Single model trained on synthetic + real data
  • Compile: Model Forge generates optimized artifacts
  • Validate: Automated testing on all target hardware
  • Deploy: Runtime auto-selects the correct variant
  • Learn: Feedback improves the next iteration

// Initialize runtime - hardware auto-detected
auto runtime = cvedia::Runtime();

// Load model - correct variant selected
auto model = runtime.loadModel("person_vehicle");

// Run inference - same API everywhere
auto detections = model.infer(frame);

// That's it. Works on GPU, CPU, NPU.

Works on any device. No extra configuration required.

Integrators write against a single API. Model Forge handles device detection, model selection, and memory management. Deploy to a datacenter GPU or a $50 edge device with the same code path.

  • No TensorRT / OpenVINO / Hailo SDK expertise required
  • Automatic device enumeration and fallback (sketched below)
  • Model updates ship without code changes
  • Same binary runs on any supported target
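
To illustrate what automatic device enumeration and fallback can look like, here is a deliberately simplified, hypothetical sketch: the priority order, backend names, and probe lambdas are placeholders, not CVEDIA's actual detection logic.

#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical sketch: probe backends in priority order and take the first
// one available. The probe lambdas are stand-ins; a real runtime would query
// the actual drivers and SDKs.
struct Backend {
    std::string name;
    std::function<bool()> available;
};

int main() {
    std::vector<Backend> priority = {
        {"TensorRT (NVIDIA GPU)", [] { return false; }},
        {"Hailo-8 NPU",           [] { return false; }},
        {"OpenVINO (Intel)",      [] { return false; }},
        {"CPU",                   [] { return true; }},  // always available
    };
    for (const auto& b : priority) {
        if (b.available()) {
            std::cout << "Selected backend: " << b.name << "\n";
            break;
        }
    }
}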

See Model Forge in Action

Run our models on your hardware. We'll show you the numbers.