
Scikit-Learn Model Deployment with ONNX: Run Anywhere, Faster

Published: July 19, 2025


When deploying Scikit-Learn models, the common approach is to serialize them using formats like pickle or joblib. While this works well within Python environments, it comes with a major limitation: your model becomes tightly coupled to the specific versions of Python, Scikit-Learn, and NumPy used during training. Even a minor version mismatch can cause compatibility issues, breaking your application at runtime.
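For context, here is a minimal sketch of that conventional approach (the file name model.joblib is our own choice):

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train and serialize a model the conventional way
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(model, "model.joblib")

# Loading the file back requires compatible Python, Scikit-Learn,
# and NumPy versions; a mismatch can fail at load time.
restored = joblib.load("model.joblib")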

To overcome this, we can convert Scikit-Learn models to the ONNX (Open Neural Network Exchange) format. ONNX provides a standardized, portable representation of machine learning models that can run across platforms and environments—without relying on Python or specific library versions.


Why Use ONNX for Scikit-Learn Models? 🤔

ONNX is an open standard for machine learning model interoperability. Here are a few compelling reasons to use it:

- Portability: an ONNX model runs across platforms and runtimes without requiring Python or the exact library versions used during training.
- Version independence: the model is decoupled from the specific Scikit-Learn and NumPy releases it was trained with, avoiding breakage from version mismatches.
- Performance: ONNX Runtime provides optimized inference with CPU and GPU execution, which is often faster than native Scikit-Learn prediction.

Step-by-Step Guide

1. Install Required Packages

First, follow the installation guide at onnxruntime.ai to install onnxruntime; the instructions differ depending on whether you will run inference on CPU or GPU (the GPU build is published as the onnxruntime-gpu package). Then, install the other necessary packages:

pip install skl2onnx onnx numpy scikit-learn

🔧 Optional: To check ONNX Runtime’s available providers (e.g., to see if your GPU is recognized), run the following:

import onnxruntime as ort
print(ort.get_available_providers())
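On a CPU-only build this typically prints just ['CPUExecutionProvider']; if the GPU build is installed correctly, 'CUDAExecutionProvider' should appear in the list as well.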

2. Train a Scikit-Learn Model

For this example, we'll train a simple Logistic Regression model on the Iris dataset.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
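It can also help to keep Scikit-Learn's native predictions around as a reference, so we can verify later that the converted model produces the same labels (the variable name sk_preds is our own):

# Reference predictions from the native Scikit-Learn model
sk_preds = model.predict(X[:5])
print("Predictions from Scikit-Learn model:")
print(sk_preds)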

3. Convert to ONNX Format

Next, we convert the trained model to the ONNX format. skl2onnx needs to know the input's name, type, and shape; setting the batch dimension to None lets the exported model accept any number of rows at once.

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# For batch processing, the shape is [None, X.shape[1]], 
# where None is the batch size and X.shape[1] is the number of input features.
initial_type = [('float_input', FloatTensorType([None, X.shape[1]]))]
onnx_model = convert_sklearn(model, initial_types=initial_type)

# Save the model to a file
with open("logreg.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
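As an optional sanity check, the onnx package we installed earlier can verify that the saved graph is well-formed:

import onnx

# Load the serialized model and validate its structure
loaded_model = onnx.load("logreg.onnx")
onnx.checker.check_model(loaded_model)
print(loaded_model.graph.input[0])  # inspect the declared input name and shape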

4. Run Inference with ONNX Runtime

Finally, we can use onnxruntime to load the .onnx model and perform inference. Listing "CUDAExecutionProvider" first tells the runtime to run on the GPU when one is available (this requires the GPU build of ONNX Runtime), with "CPUExecutionProvider" as the fallback.

import onnxruntime as ort
import numpy as np

# 'CUDAExecutionProvider' ensures the model runs on GPU if available, falling back to CPU.
session = ort.InferenceSession("logreg.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

# Prepare a sample input and run inference
sample_input = X[:5].astype(np.float32)
predictions = session.run([output_name], {input_name: sample_input})[0]

print("Predictions from ONNX model:")
print(predictions)
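One detail worth knowing: skl2onnx converts classifiers with two outputs, the predicted labels and the class probabilities, and by default the probabilities are wrapped in a ZipMap (a list of dicts mapping class to probability). A short sketch of reading both outputs, under those default settings:

# Passing None for the output names returns all outputs:
# output 0 is the label tensor, output 1 the ZipMap probabilities.
labels, probabilities = session.run(None, {input_name: sample_input})
print(probabilities[0])  # e.g., a dict like {0: 0.98, 1: 0.02, 2: 0.0}

If you prefer a plain probability tensor instead, skl2onnx supports disabling ZipMap through the options argument of convert_sklearn (for example, options={type(model): {'zipmap': False}}).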

Conclusion

Adopting ONNX for your Scikit-Learn models decouples them from specific library versions and lets you leverage hardware acceleration. While pickle and joblib are convenient for development, ONNX provides the reliability and speed needed for production environments. To learn more about the different formats for saving Scikit-Learn models, you can check out the official documentation.