ONNX (Open Neural Network Exchange), launched by Microsoft and Facebook in 2017, is an open format that standardises how neural-network models move between frameworks. You can train a model in PyTorch, export it to ONNX, and serve it on ONNX Runtime, TensorRT or any other compatible backend — especially handy for mobile and edge deployments. In the LLM world, vLLM and framework-native stacks dominate, but ONNX remains the broadest common denominator for portability of smaller models, vision and audio systems. The ecosystem really lives in onnxruntime, onnxruntime-genai and the various hardware-specific Execution Providers built around them.
External Links