Model Forge

CVEDIA's proprietary end-to-end model lifecycle and deployment system, built to ensure that AI models don’t just work — they work everywhere they are deployed.

Bridging Data Science and Deployment

Model Forge closes the gap between advanced data science and real-world hardware constraints while enabling continuous performance improvement, all without adding complexity for integrators or end users.

  • Automatic optimization for any target hardware
  • One canonical model per capability; improvements benefit everyone
  • Consent-based active learning for continuous improvement
  • Version-controlled, non-breaking updates

Modern AI doesn't fail in the lab; it fails in deployment. Model Forge is CVEDIA's answer to this reality.

Compilation Pipeline
model.onnx → Parse → Optimize → Quantize → Schedule → per-backend artifact:

  • TensorRT → .engine
  • OpenVINO → .xml / .bin
  • Hailo → .hef
  • MNN → .mnn
  • TFLite → .tflite
  • RKNN → .rknn
  • DEEPX → .dxnn
  • Blaize → .so

One Model Definition. Every Target Optimized.

Model Forge is a neural network compiler that transforms a single model architecture into hardware-specific optimized artifacts. It handles quantization, kernel fusion, memory layout optimization, and instruction scheduling, each automatically tuned to the target's constraints; a hypothetical end-to-end sketch follows the list below.

  • INT8: Automatic quantization with calibration; no manual tuning required
  • FUSE: Layer fusion and kernel optimization per backend
  • TILE: Memory tiling for cache efficiency on constrained devices
  • BENCH: Automated accuracy and latency validation per target
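
From an integrator's perspective, the pipeline boils down to one ONNX file in, one optimized artifact per backend out. The sketch below is purely hypothetical: Model Forge's actual interface is not shown on this page, so the forge namespace, Target enum, and compile() stub are invented names used only to make the flow concrete.

// Hypothetical sketch only. The forge namespace, Target enum, and compile()
// stub below are invented for illustration; they are not Model Forge's API.
#include <iostream>
#include <string>

namespace forge {

enum class Target { TensorRT, OpenVINO, Hailo, MNN, TFLite, RKNN, DEEPX, Blaize };

// Stand-in for the real compile step (parse -> optimize -> quantize -> schedule),
// which would emit one hardware-specific artifact per target.
std::string compile(const std::string& onnxPath, Target t) {
    switch (t) {
        case Target::TensorRT: return onnxPath + " -> model.engine";
        case Target::OpenVINO: return onnxPath + " -> model.xml + model.bin";
        case Target::Hailo:    return onnxPath + " -> model.hef";
        case Target::MNN:      return onnxPath + " -> model.mnn";
        case Target::TFLite:   return onnxPath + " -> model.tflite";
        case Target::RKNN:     return onnxPath + " -> model.rknn";
        case Target::DEEPX:    return onnxPath + " -> model.dxnn";
        case Target::Blaize:   return onnxPath + " -> model.so";
    }
    return {};
}

} // namespace forge

int main() {
    // One canonical model definition in, one optimized artifact per backend out.
    for (auto t : {forge::Target::TensorRT, forge::Target::OpenVINO, forge::Target::Hailo}) {
        std::cout << forge::compile("model.onnx", t) << "\n";
    }
}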

Supported Inference Targets

Auto-detect and deploy. The runtime selects the optimal backend automatically.

  • NVIDIA: CUDA / TensorRT
  • Intel: OpenVINO
  • Hailo: Hailo-8 / 8L
  • Rockchip: RKNN NPU
  • ARM: CPU / Mali GPU
  • Blaize: GSP
  • DeepX: dx-m1, dx-m1m, dx-h1 quattro

Technical Capabilities

Quantization

Post-training quantization with INT8/FP16, per-channel and per-tensor calibration, mixed-precision layer selection, accuracy validation against FP32 baseline, and automatic fallback for sensitive layers.
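
To make the calibration step concrete, here is a minimal self-contained sketch of the textbook symmetric per-tensor INT8 scheme; this is the generic technique, not Model Forge's internal implementation. The scale is derived from the calibration maximum, and the round-trip error illustrates the kind of degradation that validation against the FP32 baseline guards against.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

// Symmetric per-tensor INT8: pick the scale so the calibration max maps to 127.
float calibrate_scale(const std::vector<float>& calibration) {
    float absMax = 0.0f;
    for (float v : calibration) absMax = std::max(absMax, std::fabs(v));
    return absMax / 127.0f;
}

int8_t quantize(float v, float scale) {
    long q = std::lround(v / scale);
    return static_cast<int8_t>(std::clamp(q, -128L, 127L));
}

int main() {
    std::vector<float> activations = {0.02f, -1.7f, 3.4f, 0.9f, -2.8f};
    float scale = calibrate_scale(activations);  // 3.4 / 127 here
    for (float v : activations) {
        int8_t q = quantize(v, scale);
        float dq = q * scale;  // dequantize to compare against the FP32 value
        std::cout << v << " -> " << int(q)
                  << " (round-trip error " << std::fabs(v - dq) << ")\n";
    }
}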

Optimization

Conv-BN-ReLU fusion, attention kernel optimization, memory layout transformation (NCHW↔NHWC), constant folding and dead code elimination, and dynamic shape support where available.
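
Conv-BN fusion is the classic example: the batch-norm parameters can be folded into the convolution's weights and bias at compile time, so the fused layer produces the same output with one kernel instead of two. A minimal generic sketch of the folding arithmetic (the standard transformation, not Model Forge's code):

#include <cmath>
#include <vector>

// Fold BatchNorm(gamma, beta, mean, var) into the preceding convolution.
// Per output channel c:
//   s  = gamma[c] / sqrt(var[c] + eps)
//   w' = w * s
//   b' = (b - mean[c]) * s + beta[c]
struct ConvParams {
    std::vector<float> weights;  // outC * weightsPerChannel values, channel-major
    std::vector<float> bias;     // one per output channel
    int outC;
    int weightsPerChannel;       // inC * kH * kW
};

void fold_batchnorm(ConvParams& conv,
                    const std::vector<float>& gamma, const std::vector<float>& beta,
                    const std::vector<float>& mean, const std::vector<float>& var,
                    float eps = 1e-5f) {
    for (int c = 0; c < conv.outC; ++c) {
        float s = gamma[c] / std::sqrt(var[c] + eps);
        for (int i = 0; i < conv.weightsPerChannel; ++i)
            conv.weights[c * conv.weightsPerChannel + i] *= s;
        conv.bias[c] = (conv.bias[c] - mean[c]) * s + beta[c];
    }
}

int main() {
    ConvParams conv{{1.0f, 2.0f}, {0.0f}, /*outC=*/1, /*weightsPerChannel=*/2};
    fold_batchnorm(conv, /*gamma=*/{0.5f}, /*beta=*/{0.1f}, /*mean=*/{0.2f}, /*var=*/{1.0f});
}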

Validation

Automated benchmark suite per target, latency P50/P95/P99 profiling, mAP regression testing, memory usage tracking, and CI/CD integration support.
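
The latency percentiles are straightforward to compute from raw per-inference timings. Below is a minimal generic harness using the nearest-rank method; the timing loop and placeholder comment stand in for a real model.infer() call, and none of this is Model Forge's actual benchmark suite.

#include <algorithm>
#include <chrono>
#include <cmath>
#include <iostream>
#include <vector>

// Nearest-rank percentile over a sample of latencies.
double percentile(std::vector<double> samples, double p) {
    std::sort(samples.begin(), samples.end());
    size_t rank = static_cast<size_t>(std::ceil(p / 100.0 * samples.size()));
    return samples[std::min(rank > 0 ? rank - 1 : 0, samples.size() - 1)];
}

int main() {
    std::vector<double> latenciesMs;
    for (int i = 0; i < 1000; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        // model.infer(frame) would run here; this loop only times a placeholder.
        auto t1 = std::chrono::steady_clock::now();
        latenciesMs.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::cout << "P50 " << percentile(latenciesMs, 50) << " ms, "
              << "P95 " << percentile(latenciesMs, 95) << " ms, "
              << "P99 " << percentile(latenciesMs, 99) << " ms\n";
}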

Continuous Deployment Pipeline

One canonical model. Improvements ship to all targets automatically.

  • Train: Single model trained on synthetic + real data
  • Compile: Model Forge generates optimized artifacts
  • Validate: Automated testing on all target hardware
  • Deploy: Runtime auto-selects the correct variant
  • Learn: Feedback improves the next iteration

// Initialize runtime - hardware auto-detected
auto runtime = cvedia::Runtime();

// Load model - correct variant selected
auto model = runtime.loadModel("person_vehicle");

// Run inference - same API everywhere
auto detections = model.infer(frame);

// That's it. Works on GPU, CPU, NPU.

Works on any device. No extra configuration required.

Integrators write against a single API. Model Forge handles device detection, model selection, and memory management. Deploy to a datacenter GPU or a $50 edge device with the same code path.

  • No TensorRT / OpenVINO / Hailo SDK expertise required
  • Automatic device enumeration and fallback (sketched below)
  • Model updates ship without code changes
  • Same binary runs on any supported target
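
To illustrate what automatic device enumeration and fallback can look like, here is a deliberately simplified, hypothetical sketch: the priority order, backend names, and probe lambdas are placeholders, not CVEDIA's actual detection logic.

#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical sketch: probe backends in priority order and take the first
// one available. The probe lambdas are stand-ins; a real runtime would query
// the actual drivers and SDKs.
struct Backend {
    std::string name;
    std::function<bool()> available;
};

int main() {
    std::vector<Backend> priority = {
        {"TensorRT (NVIDIA GPU)", [] { return false; }},
        {"Hailo-8 NPU",           [] { return false; }},
        {"OpenVINO (Intel)",      [] { return false; }},
        {"CPU",                   [] { return true; }},  // always available
    };
    for (const auto& b : priority) {
        if (b.available()) {
            std::cout << "Selected backend: " << b.name << "\n";
            break;
        }
    }
}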

See Model Forge in Action

Run our models on your hardware. We'll show you the numbers.