Inference 1.0 is now available: a redesigned prediction engine for running computer vision models. This release focuses on faster model loading, better resource utilization, and a cleaner separation between the serving layer and the model runtime.
The new engine provides multi-backend support (PyTorch, ONNX, TensorRT), automatic model loading, and a composable dependency system, so you install only the components you need. The result is a modular architecture that fits local deployments, Docker workloads, edge devices, and production systems alike.
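A composable, multi-backend design like this is often built around a registry that maps each backend to the dependency it requires, then selects only backends whose dependency is actually installed. The sketch below illustrates that pattern; all names (`available_backends`, `select_backend`, the registry contents) are hypothetical and not the engine's real API.

```python
# Hypothetical sketch of composable backend selection: each backend maps to
# the module it needs, and the engine uses only backends whose dependency is
# actually installed. Names are illustrative, not the real Inference API.
import importlib.util
from typing import Dict, List, Optional


def available_backends(backends: Dict[str, str]) -> List[str]:
    """Return the backends whose runtime dependency can be imported."""
    return [name for name, module in backends.items()
            if importlib.util.find_spec(module) is not None]


def select_backend(backends: Dict[str, str],
                   preferred: Optional[str] = None) -> str:
    """Use the preferred backend if usable, else fall back to the first available."""
    usable = available_backends(backends)
    if not usable:
        raise RuntimeError("No inference backend installed")
    return preferred if preferred in usable else usable[0]


# Example: "onnx" points at an installed module, "tensorrt" at a missing one,
# so requests for TensorRT gracefully fall back to ONNX.
registry = {"onnx": "json", "tensorrt": "no_such_module_xyz"}
print(select_backend(registry))              # → onnx
print(select_backend(registry, "tensorrt"))  # → onnx (TensorRT unavailable)
```

Because unavailable backends are simply skipped rather than imported, a minimal install (say, ONNX only) never pays the cost of PyTorch or TensorRT dependencies it does not use.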