Choosing between KServe vs BentoML vs Seldon is not just a feature checklist exercise. These platforms solve overlapping model serving problems, but they assume different team skills, infrastructure maturity, deployment workflows, and production priorities.
If your organization is evaluating a model serving platform for commercial production use, the practical question is: do you need Kubernetes-native standardization, Python-first developer velocity, or flexible inference graphs and pipelines? This comparison breaks down the practical trade-offs, drawing only on the published comparisons it cites from Xebia, Spheron, and Reintech.
1. What KServe, BentoML, and Seldon Are Designed to Solve
At a high level, KServe, BentoML, and Seldon Core all help teams move trained machine learning models from experimentation into production serving. The difference is how each platform approaches that transition.
| Platform | Core Design Goal | Primary Abstraction | Best-Fit Operating Model |
|---|---|---|---|
| KServe | Standardized Kubernetes-native model serving | `InferenceService` CRD | Kubernetes-first ML platforms, Kubeflow/CNCF-aligned organizations |
| BentoML | Python-first model packaging and serving | Bento archive / Python service | Developer-friendly ML APIs, fast iteration, flexible deployment targets |
| Seldon Core | Flexible Kubernetes-native inference graphs and pipelines | `SeldonDeployment`, or `Model` + `Pipeline` in v2 | Complex inference workflows, A/B testing, multi-step pipelines, drift monitoring |
KServe: Kubernetes-native standardization
KServe, previously known as KFServing, is an open-source, Kubernetes-based model serving tool. It provides a Kubernetes custom resource definition (CRD), the InferenceService, to define model serving behavior.
Its main goal is to hide much of the underlying Kubernetes deployment complexity so users can focus on ML-specific configuration. According to the research, KServe supports advanced capabilities such as autoscaling, scale-to-zero, canary deployments, automatic request batching, and out-of-the-box support for many popular ML frameworks.
KServe is also described as tightly aligned with the cloud-native ecosystem. The Spheron research identifies it as a CNCF Incubating project and highlights its use of Knative for serverless scale-to-zero.
BentoML: Python-first packaging and APIs
BentoML takes a different route. It is a Python framework for wrapping ML models into deployable services. Instead of starting from Kubernetes YAML, teams define services in Python using a simple object-oriented interface.
A BentoML deployment packages the model weights, serving code, dependencies, and runtime configuration into a self-contained archive called a Bento. The same Bento can be deployed to plain Kubernetes clusters, KServe, Seldon Core, Knative, AWS Lambda, Azure Functions, Google Cloud Run, or managed BentoML services, according to the Xebia research.
This makes BentoML especially attractive when the team’s bottleneck is developer workflow rather than Kubernetes platform engineering.
Seldon Core: inference graphs and pipelines
Seldon Core is an open-source Kubernetes-native serving tool developed as part of the broader Seldon ecosystem. It provides high-level Kubernetes CRDs and supports canary deployments, A/B testing, and Multi-Armed-Bandit deployments.
The research distinguishes Seldon Core’s graph-based approach from KServe’s standardized serving abstraction. Seldon can define inference graphs with transformers, routers, combiners, and ensembles. In Seldon Core v2, the central abstractions are Model and Pipeline CRDs, with pipelines represented as directed acyclic graphs.
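To make the directed-acyclic-graph idea concrete, here is a minimal sketch in plain Python. It is not Seldon's API: the step names (`preprocess`, `model-a`, `model-b`, `combine`) are hypothetical, and the only point is that each step runs after the steps it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a step, each value is the set of steps it depends on.
pipeline = {
    "preprocess": set(),
    "model-a": {"preprocess"},
    "model-b": {"preprocess"},
    "combine": {"model-a", "model-b"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid run order for the steps; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

Because the graph is acyclic, a valid order always exists: `preprocess` comes first, the two models can run in either order (or in parallel), and `combine` comes last.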
Key insight: KServe standardizes model serving, BentoML simplifies packaging and developer iteration, and Seldon specializes in flexible inference graphs and pipelines.
2. Quick Comparison Table: Features, Strengths, and Trade-Offs
For teams comparing KServe vs BentoML vs Seldon, the fastest way to narrow the choice is to map platform strengths to your operating model.
| Category | KServe | BentoML | Seldon Core |
|---|---|---|---|
| Primary style | Kubernetes-native standardized serving | Python-first service packaging | Kubernetes-native graph/pipeline serving |
| Main abstraction | `InferenceService` | Bento / Python service | `SeldonDeployment`; in v2, `Model` + `Pipeline` |
| Kubernetes required upfront | Yes for production-style behavior | No for local development; yes for Kubernetes deployment | Yes for production-style behavior |
| Local development experience | Limited without Kubernetes | Strong; `bentoml serve` locally | More complex; often requires local Kubernetes |
| Standard framework support | Strong for Scikit-Learn, PyTorch, TensorFlow, XGBoost | Strong; built-in support for standard frameworks | Strong for Scikit-Learn, XGBoost, TensorFlow; PyTorch requires extra effort in the Xebia comparison |
| Custom model support | Any Docker image; Python SDK available | Any Python customization | Any Docker image; SDK or duck typing |
| Pre/post-processing | Transformer in `InferenceService` | Any Python code in service | Transformers, routers, combiners, inference graphs |
| Autoscaling | Autoscaling and scale-to-zero via Knative in serverless mode | Via standard orchestration; BentoCloud adds managed options | Kubernetes-native scaling; v2 scale-to-zero requires external configuration per Spheron |
| Traffic splitting | Canary deployments | Deployment-dependent | Canary, A/B testing, Multi-Armed-Bandit |
| Observability | Knative metrics; V2 protocol helps standard dashboards | Request/model/custom metrics; managed offering adds tracing/log aggregation | Prometheus, Jaeger, payload logging, drift-related analytics in research |
| GPU/multi-model fit | Strong per-pod isolation; one model per InferenceService | Strong per-Bento isolation; one model per Bento | MLServer can serve multiple models per process |
| Main trade-off | Kubernetes/Knative complexity | CI/CD changes for Bento packaging; Kubernetes operator maturity caveat for Yatai | Operational complexity; version/protocol-specific constraints |
3. Deployment Model: Kubernetes-Native vs Developer-Friendly Packaging
Deployment model is one of the biggest differences in the KServe vs BentoML vs Seldon decision.
KServe deployment model
KServe uses Kubernetes CRDs. A typical deployment is defined as an InferenceService pointing to a model runtime and storage location.
```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: fraud-detector
spec:
  predictor:
    sklearn:
      storageUri: gs://my-bucket/fraud-model
      resources:
        limits:
          cpu: "1"
          memory: 2Gi
        requests:
          cpu: "1"
          memory: 2Gi
```
KServe can run in two modes described by the Spheron research:
| KServe Mode | How It Works | Best Fit | Trade-Off |
|---|---|---|---|
| Serverless mode | Uses Knative Serving, including Activator request buffering | Bursty or unpredictable workloads | Adds Knative dependency and potential cold-start considerations |
| RawDeployment mode | Uses standard Kubernetes Deployments and Services | High-throughput endpoints needing predictable latency | No native Knative scale-to-zero |
KServe’s deployment model works well when your organization already treats Kubernetes manifests, Helm charts, or GitOps workflows as standard production paths. Xebia’s research notes that KServe can integrate with existing DevOps pipelines because deployment requires a relatively simple resource definition.
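For teams that generate manifests from code rather than hand-writing YAML, the sketch below builds the same kind of resource as a plain dictionary and switches it to RawDeployment mode. It assumes the `serving.kserve.io/deploymentMode` annotation is how the mode is selected; the accepted value may differ between KServe releases, so verify it against the version you run. The helper name `build_inference_service` is hypothetical.

```python
import json


def build_inference_service(name: str, storage_uri: str, raw_deployment: bool = False) -> dict:
    """Build an InferenceService manifest as a plain dict (no cluster access needed)."""
    metadata = {"name": name}
    if raw_deployment:
        # Assumption: this annotation selects RawDeployment instead of the Knative-based mode.
        metadata["annotations"] = {"serving.kserve.io/deploymentMode": "RawDeployment"}
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": metadata,
        "spec": {"predictor": {"sklearn": {"storageUri": storage_uri}}},
    }


manifest = build_inference_service(
    "fraud-detector", "gs://my-bucket/fraud-model", raw_deployment=True
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied like any other manifest, which keeps the resource inside an existing GitOps review flow.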
BentoML deployment model
BentoML starts with Python. A basic BentoML service from the research looks like this:
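```python
import bentoml
import numpy as np
from bentoml.io import JSON

# Runner-style BentoML 1.x API: load the saved model from the local model store.
model_runner = bentoml.sklearn.get("fraud_detection:latest").to_runner()
svc = bentoml.Service("fraud_detector", runners=[model_runner])

def preprocess(input_data: dict) -> np.ndarray:
    # Placeholder (assumption): the request body carries a flat "features" list.
    return np.array([input_data["features"]])

@svc.api(input=JSON(), output=JSON())
def predict(input_data: dict) -> dict:
    features = preprocess(input_data)
    result = model_runner.predict.run(features)
    return {"fraud_probability": float(result[0])}
```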
You can run it locally with:
```shell
bentoml serve service:svc --reload
```
The research emphasizes that BentoML does not require Kubernetes knowledge at the beginning. Teams can iterate locally, then containerize or deploy later.
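A quick way to exercise a locally served Bento is a small client script. The sketch below only builds the HTTP request; it assumes the server listens on BentoML's default port 3000 and exposes the endpoint at a path matching the API function name (`/predict`). The `features` payload shape is hypothetical and must match whatever your own preprocessing expects.

```python
import json
import urllib.request


def build_predict_request(
    payload: dict, base_url: str = "http://localhost:3000"
) -> urllib.request.Request:
    """Build a JSON POST request for the `predict` endpoint of a locally served Bento."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request({"features": [0.1, 250.0, 3]})

# With the server running, send the request and print the JSON response:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```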
However, Xebia also notes a workflow trade-off: BentoML packages the service class, serialized model, code, dependencies, and Dockerfile into a separate archive or directory. That means Kubernetes delivery may require CI/CD pipeline changes compared with teams already shipping plain Docker images and Kubernetes manifests.
Seldon deployment model
Seldon Core is also Kubernetes-native. In the research, Seldon deployments are defined through Kubernetes CRDs such as SeldonDeployment.
```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: fraud-detector
spec:
  predictors:
    - name: default
      graph:
        name: classifier
        implementation: SKLEARN_SERVER
        modelUri: gs://my-bucket/fraud-model
      componentSpecs:
        - spec:
            containers:
              - name: classifier
                resources:
                  requests:
                    memory: 1Gi
                    cpu: 1
```
The major differentiator is Seldon’s graph and pipeline architecture. You can define multi-step inference paths, including transformers, routers, and combiners. The graph fragment below, for example, splits traffic evenly between two model versions:
```yaml
graph:
  name: ab-test
  implementation: RANDOM_ABTEST
  parameters:
    - name: ratioA
      value: "0.5"
      type: FLOAT
  children:
    - name: model-a
      implementation: SKLEARN_SERVER
      modelUri: gs://bucket/model-v1
    - name: model-b
      implementation: SKLEARN_SERVER
      modelUri: gs://bucket/model-v2
```
Practical takeaway: If your team wants to write Python and ship quickly, BentoML has the shortest local feedback loop. If your team already operates Kubernetes ML infrastructure, KServe and Seldon align more directly with platform engineering workflows.
4. Model Framework Support and Runtime Flexibility
Framework support matters because production teams often serve models from multiple libraries: Scikit-Learn, PyTorch, TensorFlow, XGBoost, HuggingFace, or custom code.
| Framework / Runtime Area | KServe | BentoML | Seldon Core |
|---|---|---|---|
| Scikit-Learn | Supported out of the box in research | Built-in support | Easy to serve |
| XGBoost | Supported out of the box in research | Built-in support | Easy to serve |
| TensorFlow | Supported out of the box in research | Built-in support | Easy to serve |
| PyTorch | Supported out of the box in Xebia comparison | Built-in support | No built-in support in Xebia comparison; possible via Triton with extra effort |
| Custom models | Any Docker image; Python SDK | Any Python code | Any Docker image; SDK or duck typing |
| HuggingFace | Available through pluggable runtimes such as HuggingFace TGI, per Spheron | OpenLLM-style usage appears in the source examples | Supported through MLServer in Seldon Core v2 |
KServe framework support
Xebia’s comparison found that KServe made all tested standard frameworks fairly easy to serve. Standard frameworks are “first class” in KServe because it provides pre-built Docker images and direct InferenceService definitions.
KServe also allows any Docker image as part of the deployment, so custom frameworks and languages can be used to an extent. For Python-based custom logic, KServe provides an SDK with an abstract class that can be inherited.
BentoML framework support
BentoML supports standard frameworks and handles model serialization, deserialization, dependencies, and input/output handling. Xebia describes the implementation of BentoML’s service interface as usually fitting within a few lines of code.
Because BentoML services are Python code, custom preprocessing, model invocation, and post-processing can be implemented directly in the service.
Seldon framework support
Xebia’s research found that Seldon Core can easily serve Scikit-Learn, XGBoost, and TensorFlow models. For PyTorch, the same source found no built-in support; PyTorch models can be served through Triton Server, but that requires additional effort and use of Seldon’s v2 protocol.
Spheron’s research adds that Seldon Core v2 uses MLServer, which supports scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, and HuggingFace runtimes.
The important caveat is version and runtime selection: Seldon’s capabilities can vary depending on whether you are using older Seldon Core patterns, Seldon Core v2, MLServer, or Triton.
5. Autoscaling, Traffic Splitting, and Canary Deployment Capabilities
Production model serving usually requires controlled rollouts, elastic scaling, and rollback strategies. The source data shows all three platforms can participate in scalable deployments, but the mechanisms differ.
| Capability | KServe | BentoML | Seldon Core |
|---|---|---|---|
| Autoscaling | Supported; Knative-based in serverless mode | Through container orchestration; managed options add more | Kubernetes-native scaling patterns |
| Scale-to-zero | Native via Knative in KServe serverless mode | Not described as native in source data | Spheron says no native scale-to-zero in v2 without external config |
| Canary deployment | Supported | Deployment-dependent | Supported |
| A/B testing | Not emphasized as core differentiator in source data | Deployment-dependent | Supported |
| Multi-Armed-Bandit | Not emphasized as core differentiator in source data | Not identified in source data | Supported |
| Request batching | Automatic request batching mentioned for KServe | Adaptive batching mentioned for BentoML | Depends on runtime choices |
KServe autoscaling and scale-to-zero
KServe supports autoscaling and scale-to-zero. In serverless mode, this is provided via Knative. Spheron’s research notes that the Knative Activator buffers requests that arrive while a service is scaled to zero and forwards them once pods are ready.
For large model workloads, cold starts can matter. Spheron’s autoscaling table reports cold starts of 2–8 minutes for a 70B model on H100 GPUs. It also describes KServe’s ModelCar pattern, where model weights are packaged as a container image rather than fetched from remote storage at every pod startup.
For a 140 GB Llama 3 70B model, Spheron reports the difference as 4–6 minutes for remote NFS fetch at 400–600 MB/s versus 40 seconds from local NVMe at 3–4 GB/s.
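Those figures can be sanity-checked with simple arithmetic: load time is roughly model size divided by sustained throughput. The sketch below uses only the numbers quoted above and ignores everything else that contributes to a cold start (scheduling, container start, loading weights onto the GPU).

```python
def load_time_seconds(model_size_gb: float, throughput_gb_per_s: float) -> float:
    """Time to read model weights at a sustained throughput (ignores all other overhead)."""
    return model_size_gb / throughput_gb_per_s


MODEL_GB = 140  # Llama 3 70B size quoted above

for label, low, high in [("remote NFS", 0.4, 0.6), ("local NVMe", 3.0, 4.0)]:
    slowest = load_time_seconds(MODEL_GB, low)
    fastest = load_time_seconds(MODEL_GB, high)
    print(f"{label}: {fastest:.0f}-{slowest:.0f} s")
```

The remote case works out to roughly 233 to 350 seconds (about 4 to 6 minutes) and the local NVMe case to roughly 35 to 47 seconds, which is consistent with the quoted "4–6 minutes" and "40 seconds".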
BentoML autoscaling and batching
BentoML handles scaling through standard container orchestration when deployed as containers. The Reintech source notes that deploying to Kubernetes with a Horizontal Pod Autoscaler gives automatic scaling based on CPU or memory.
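For context, the Horizontal Pod Autoscaler's core scaling rule is a simple ratio. The sketch below shows only that rule; the real controller also applies a tolerance band, stabilization windows, and the configured minimum and maximum replica counts, and the example numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core HPA rule: ceil(currentReplicas * currentMetricValue / desiredMetricValue)."""
    return math.ceil(current_replicas * current_metric / target_metric)


# Hypothetical example: 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))  # 6
```

CPU or memory utilization is often a poor proxy for load on GPU-backed model servers, which is one reason to validate autoscaling behavior against real traffic.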
BentoML also supports adaptive batching out of the box, which the source says can improve throughput for high-traffic scenarios. Reintech cites a benchmark where BentoML handles 1000+ requests per second with p95 latency under 50ms for ResNet50 on modest hardware. As with any benchmark, teams should validate performance against their own model, hardware, and traffic patterns.
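The idea behind batching can be illustrated with a deliberately simplified micro-batcher. This is not BentoML's implementation: adaptive batching tunes its window dynamically, whereas this sketch uses a fixed batch size limit and a fixed wait window. The function and parameter names are hypothetical.

```python
def form_batches(
    arrival_times_ms: list[float], max_batch_size: int, max_wait_ms: float
) -> list[list[float]]:
    """Group requests by arrival time. A batch closes when it is full or when the
    next request arrives more than max_wait_ms after the batch's first request."""
    batches: list[list[float]] = []
    current: list[float] = []
    for t in arrival_times_ms:
        if current and (len(current) >= max_batch_size or t - current[0] > max_wait_ms):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


arrivals = [0, 1, 2, 3, 20, 21, 50]
print(form_batches(arrivals, max_batch_size=3, max_wait_ms=10))
# [[0, 1, 2], [3], [20, 21], [50]]
```

Dense bursts are grouped into full batches, which is where throughput improves, while isolated requests still go through after at most the wait window.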
Seldon traffic management
Seldon Core supports canary deployments, A/B testing, and Multi-Armed-Bandit deployments. Its graph abstraction allows routing logic to be built directly into the deployment.
Xebia specifically highlights Seldon’s ability to define custom ROUTER components and COMBINER components, enabling ensembles and Multi-Armed-Bandit-style deployments.
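As a rough illustration of what such components do, here is a sketch of a random A/B router and an averaging combiner in plain Python. The `route` and `aggregate` method names are modelled on the duck-typed convention of the Seldon Core v1 Python wrapper as an assumption; check the Seldon documentation before relying on them. The class names and inputs are hypothetical, and the code runs without Seldon installed.

```python
import random


class RandomABRouter:
    """Routes each request to child 0 with probability ratio_a, otherwise to child 1."""

    def __init__(self, ratio_a: float = 0.5, seed=None):
        self.ratio_a = ratio_a
        self._rng = random.Random(seed)

    def route(self, features, feature_names):
        # Returns the index of the child that should receive the request.
        return 0 if self._rng.random() < self.ratio_a else 1


class AverageCombiner:
    """Averages the element-wise outputs returned by several child models."""

    def aggregate(self, features_list, feature_names_list):
        n = len(features_list)
        return [sum(values) / n for values in zip(*features_list)]


router = RandomABRouter(ratio_a=0.5, seed=42)
choices = [router.route([1.0, 2.0], ["f1", "f2"]) for _ in range(1000)]
print(sum(choices) / len(choices))  # share of traffic sent to child 1

combiner = AverageCombiner()
combined = combiner.aggregate([[0.2, 0.8], [0.4, 0.6]], [["p0", "p1"], ["p0", "p1"]])
print(combined)
```

A Multi-Armed-Bandit router follows the same shape: only the logic inside `route` changes, from a fixed ratio to one that adapts based on observed rewards.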
Critical warning: Seldon’s richer routing and graph capabilities can increase operational complexity. The Reintech source explicitly notes that this flexibility requires solid Kubernetes knowledge and operational maturity.
6. Observability, Monitoring, and Production Debugging
Observability is where model serving platforms move beyond “container running” into production-grade ML operations.
| Observability Area | KServe | BentoML | Seldon Core |
|---|---|---|---|
| Prometheus metrics | Inherited through Knative stack in source data | Exports request/model/custom metrics | Metrics flow to Prometheus |
| Distributed tracing | Not detailed deeply in source data | Manual OpenTelemetry possible; managed offering adds tracing/log aggregation | Jaeger integration mentioned |
| Payload logging / drift | Not identified as core in source data | Not identified as core in source data | Payload logging and Alibi Detect integration mentioned |
| Unified API monitoring | V2 protocol helps standardize dashboards | Python-service oriented | Runtime/protocol dependent |
KServe observability
KServe inherits the Knative observability stack in serverless mode, including request metrics, revision metrics, and autoscaling metrics. The research also notes that KServe supports the V2 inference protocol, which can help standardize how clients communicate with different model types.
That standardization is valuable when a platform team wants consistent dashboards across TensorFlow, PyTorch, XGBoost, and custom models.
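To show what that standardization looks like on the wire, the sketch below builds a REST request in the V2 (Open Inference Protocol) shape: a list of named tensors, each with a shape, datatype, and data. The model name, the input name `input-0`, and the helper name are hypothetical; the input name in particular must match what the deployed model expects.

```python
import json


def build_v2_infer_request(
    model_name: str, rows: list[list[float]], input_name: str = "input-0"
) -> tuple[str, dict]:
    """Return the URL path and JSON body for a V2 inference protocol REST call."""
    n_rows, n_cols = len(rows), len(rows[0])
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": [n_rows, n_cols],
                "datatype": "FP32",
                # Tensor data flattened in row-major order.
                "data": [value for row in rows for value in row],
            }
        ]
    }
    return f"/v2/models/{model_name}/infer", body


path, body = build_v2_infer_request("fraud-detector", [[0.1, 250.0, 3.0]])
print(path)
print(json.dumps(body))
```

Because the request shape is the same regardless of framework, client code and dashboards keyed on model name and endpoint do not need per-framework variants.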
BentoML observability
BentoML includes request metrics, model metrics, and custom metrics APIs, according to the Reintech source. The same source says BentoCloud adds distributed tracing and log aggregation, while OpenTelemetry can be integrated manually.
This fits BentoML’s overall pattern: strong developer ergonomics, with production observability depending on how the deployment target is configured.
Seldon observability
Seldon Core has the deepest observability story in the provided research. Reintech describes Prometheus metrics, Jaeger tracing, log aggregation integration, and an analytics component for payload logging and drift detection.
Spheron also highlights Alibi Detect integration in Seldon Core v2. Alibi Detect supports outlier detection, adversarial detection, and concept drift monitoring. In Seldon Core v2, a drift detector can be added as a node in the pipeline graph and run inline with inference requests.
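Conceptually, a drift detector compares incoming feature values against a reference sample. The sketch below is a stand-in written in plain Python, not Alibi Detect's API: it computes a two-sample Kolmogorov-Smirnov statistic, one common way to measure how far two distributions have moved apart. The sample values are made up for illustration.

```python
from bisect import bisect_right


def ks_statistic(reference: list[float], current: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


reference = [0.1, 0.2, 0.3, 0.4, 0.5]
same = [0.1, 0.2, 0.3, 0.4, 0.5]
shifted = [1.1, 1.2, 1.3, 1.4, 1.5]
print(ks_statistic(reference, same))     # 0.0
print(ks_statistic(reference, shifted))  # 1.0
```

A production detector would turn the statistic into a p-value and alert past a chosen threshold; running it inline with inference, as described above, adds its cost to every request path it sits on.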
7. Security, Governance, and Enterprise Readiness
The provided source data does not include a detailed security control comparison such as RBAC matrices, audit logs, secrets handling, vulnerability management, or compliance certifications. At the time of writing, teams should evaluate those requirements directly against their Kubernetes platform, managed services, and vendor documentation.
That said, the research does provide several enterprise-readiness signals.
| Area | KServe | BentoML | Seldon Core |
|---|---|---|---|
| Open-source positioning | Open-source Kubernetes model serving | Open-source Python framework | Open-source core serving tool |
| Enterprise alignment signal | Used by Bloomberg, NVIDIA, Samsung SDS, Cisco in Xebia source | BentoCloud mentioned as maintained managed path | Seldon Deploy described as larger paid solution |
| Governance strengths from source data | Standardized serving APIs, CRDs, canary deployment, scale controls | Packaged reproducible Bentos containing code, model, dependencies, config | Pipeline graphs, payload logging, drift detection, advanced rollout patterns |
| Governance caveat | Requires Kubernetes/Knative maturity | Yatai self-hosting caveat in source data | Operational complexity and version/runtime caveats |
KServe enterprise fit
KServe’s enterprise readiness comes from Kubernetes-native control, standardized CRDs, and CNCF alignment. Xebia also notes use by companies including Bloomberg, NVIDIA, Samsung SDS, and Cisco.
For organizations building an internal ML platform, KServe’s standard InferenceService abstraction can help enforce consistent deployment patterns across teams.
BentoML enterprise fit
BentoML’s governance advantage is packaging reproducibility. A Bento includes model weights, serving code, Python dependencies, and runtime configuration. That can make it easier to move the same artifact from local development to Kubernetes.
However, Spheron raises a practical caveat around Yatai, the Kubernetes operator for BentoML deployments. The source describes Yatai as stable but not actively evolving, and says teams self-hosting BentoML on Kubernetes should factor in potential maintenance gaps. It also identifies BentoCloud as the current first-party deployment path for teams wanting a maintained managed experience.
Seldon enterprise fit
Seldon’s enterprise-readiness signals include graph-based control, advanced rollout strategies, payload logging, and drift detection integrations. It is also connected to a broader paid Seldon Deploy solution, according to Xebia.
The trade-off is complexity. Seldon can be powerful for platform teams that need custom inference graphs, but those same capabilities require Kubernetes and operational expertise.
8. Best Use Cases: When to Choose KServe, BentoML, or Seldon
Here is the practical decision guide for KServe vs BentoML vs Seldon based on the researched capabilities.
1. Choose KServe when you need standardized Kubernetes ML serving
KServe is a strong fit when:
- Kubernetes-first platform: Your organization already runs production workloads on Kubernetes.
- Standardized APIs: You want a consistent serving interface across frameworks.
- Kubeflow/CNCF alignment: You value cloud-native ecosystem integration.
- Scale-to-zero: You need native Knative scale-to-zero for bursty workloads.
- Model runtime flexibility: You want pluggable runtimes such as Triton, vLLM, or HuggingFace TGI as described in the Spheron source.
KServe is less ideal if your team does not yet have Kubernetes or Knative operational maturity.
2. Choose BentoML when developer velocity matters most
BentoML is a strong fit when:
- Python-first team: Your ML engineers want to define services in Python.
- Fast local iteration: You want to run and test locally with `bentoml serve`.
- Custom preprocessing: You need arbitrary Python code around model inference.
- Flexible deployment targets: You may deploy to Kubernetes, KServe, Seldon Core, Knative, AWS Lambda, Azure Functions, or Google Cloud Run.
- Packaged reproducibility: You want model, code, dependencies, and runtime config bundled together.
BentoML is less ideal if you need a purely Kubernetes-native CRD workflow from day one, or if self-hosting via Yatai raises maintenance concerns for your team.
3. Choose Seldon when inference workflows are complex
Seldon Core is a strong fit when:
- Inference graphs: You need preprocessors, routers, combiners, explainers, or ensembles.
- Advanced rollout patterns: You need canary, A/B, or Multi-Armed-Bandit deployments.
- Drift monitoring: You want Alibi Detect integration in Seldon Core v2 pipelines.
- Async inference: You need Kafka-based event-driven inference, as described in the Spheron source.
- Multi-model serving: You want MLServer to load multiple models in one process.
Seldon is less ideal for teams that want the lowest operational complexity or the simplest local development loop.
9. Common Migration Paths Between Model Serving Platforms
Model serving choices are rarely permanent. Teams often migrate as their ML platform matures.
| Migration Path | Why Teams Move | Practical Consideration |
|---|---|---|
| BentoML → KServe | Team starts Python-first, then standardizes on Kubernetes ML serving | BentoML-packaged models can be deployed to KServe according to Xebia |
| BentoML → Seldon Core | Team needs richer routing, inference graphs, or A/B testing | BentoML can deploy to Seldon Core per Xebia, but graph behavior must be modeled in Seldon |
| KServe → Seldon Core | Team needs complex pipelines, routers, combiners, or inline drift nodes | Requires adopting Seldon CRDs and operational patterns |
| Seldon Core → KServe | Team wants standardized serving APIs and simpler model endpoint abstraction | Advanced graph features may need redesign |
| Plain Kubernetes → Any of the three | Plain Deployments lack ML-aware readiness, rollout, observability, and scaling semantics | Platform choice depends on whether the team prioritizes Python workflow, standardization, or graph flexibility |
From BentoML to Kubernetes-native serving
A common path is starting with BentoML because it lets ML engineers ship quickly without learning Kubernetes deeply. As the platform grows, teams may move production serving into KServe or Seldon for stronger Kubernetes-native control.
The Xebia source explicitly states BentoML-packaged models can be deployed in KServe and Seldon Core, making it a possible stepping stone rather than a dead end.
From KServe to Seldon
Teams may move from KServe to Seldon when single-model endpoint standardization is no longer enough. For example, if you need a preprocessor, multiple candidate models, a custom router, and a combiner in one deployment, Seldon’s inference graph model is a better conceptual fit.
From Seldon to KServe
The reverse migration can also happen. If a team no longer needs complex graphs and wants a simpler standardized InferenceService model, KServe may reduce conceptual overhead.
Migration advice: Do not migrate only for feature parity. Migrate when your dominant operational problem changes: developer speed, Kubernetes standardization, or inference workflow complexity.
10. Final Recommendation Based on Team Size and Infrastructure Maturity
The best choice in KServe vs BentoML vs Seldon depends less on which platform has the longest feature list and more on what your team can operate reliably.
| Team / Infrastructure Profile | Recommended Starting Point | Why |
|---|---|---|
| Small ML team, limited Kubernetes experience | BentoML | Strong local development, Python-first APIs, minimal Kubernetes knowledge required initially |
| Growing team deploying multiple production models | KServe or BentoML | KServe if platform team owns Kubernetes; BentoML if ML engineers own service packaging |
| Mature Kubernetes platform team | KServe | Standardized CRDs, Knative option, consistent inference APIs |
| Team with complex multi-step inference | Seldon Core | Graphs, routers, combiners, pipelines, A/B testing, Multi-Armed-Bandit patterns |
| Team serving many smaller models on shared GPUs | Seldon Core v2 with MLServer | MLServer can load multiple models in one process, according to Spheron |
| Team serving isolated high-value model endpoints | KServe or BentoML | Per-pod/per-Bento isolation can reduce blast radius compared with shared MLServer processes |
| Enterprise platform requiring standardized model serving | KServe | CNCF alignment and InferenceService standardization |
| Team prioritizing managed BentoML experience | BentoCloud | Source data identifies it as BentoML’s maintained first-party managed path |
Decision shortcut
Use this simple rule:
- Choose BentoML if your biggest problem is packaging models into APIs quickly.
- Choose KServe if your biggest problem is standardizing model serving on Kubernetes.
- Choose Seldon Core if your biggest problem is orchestrating complex inference workflows.
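The same rule can be written as a small lookup, which some teams find handy in internal onboarding tooling. The problem keys below are hypothetical labels for the three cases in the list above, not terms from any of the platforms.

```python
# Hypothetical keys summarizing the three "biggest problem" cases.
RECOMMENDATIONS = {
    "packaging": "BentoML",         # packaging models into APIs quickly
    "standardization": "KServe",    # standardizing model serving on Kubernetes
    "workflows": "Seldon Core",     # orchestrating complex inference workflows
}


def recommend_platform(biggest_problem: str) -> str:
    """Map a team's dominant problem to the starting point suggested by the rule."""
    try:
        return RECOMMENDATIONS[biggest_problem]
    except KeyError:
        raise ValueError(
            f"Unknown problem {biggest_problem!r}; expected one of {sorted(RECOMMENDATIONS)}"
        ) from None


print(recommend_platform("standardization"))  # KServe
```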
Bottom Line
For most teams, the right model serving platform is determined by their operating model.
KServe is the strongest fit for Kubernetes-native organizations that want standardized model serving, Knative-based scale-to-zero, canary deployments, and framework-agnostic APIs. It is especially compelling when an internal platform team already manages Kubernetes and wants a consistent InferenceService abstraction.
BentoML is the best fit for Python-first teams that value fast local iteration, simple service definitions, and reproducible packaging. Its main trade-off is that production Kubernetes delivery can require CI/CD changes, and self-hosted Yatai should be evaluated carefully at the time of writing.
Seldon Core is the best fit for advanced inference workflows: graphs, pipelines, routers, combiners, A/B tests, Multi-Armed-Bandit deployments, Kafka-based async inference, and drift monitoring. The trade-off is operational complexity and the need to understand Seldon version/runtime differences.
FAQ: KServe vs BentoML vs Seldon
Which is easiest to start with: KServe, BentoML, or Seldon?
BentoML is the easiest to start with based on the source data. It lets developers define a service in Python, run it locally with bentoml serve, and iterate without needing Kubernetes upfront.
Which platform is most Kubernetes-native?
KServe and Seldon Core are both Kubernetes-native. KServe centers on the InferenceService CRD, while Seldon uses CRDs such as SeldonDeployment and, in v2, Model and Pipeline.
Which is best for canary deployments and A/B testing?
Seldon Core has the richest rollout patterns in the research, including canary deployments, A/B testing, and Multi-Armed-Bandit deployments. KServe also supports canary deployments.
Which platform supports scale-to-zero?
KServe supports native scale-to-zero in serverless mode through Knative. The Spheron source states that Seldon Core v2 does not provide native scale-to-zero without external configuration, and BentoML scale-to-zero is not described as a native feature in the provided sources.
Which platform is best for custom preprocessing and post-processing?
All three can handle custom preprocessing, but in different ways. BentoML allows arbitrary Python code inside the service. KServe uses transformers in the InferenceService. Seldon Core supports transformers and richer inference graph components such as routers and combiners.
Can BentoML work with KServe or Seldon?
Yes. The Xebia research states that BentoML-packaged models can be deployed to multiple runtimes, including plain Kubernetes clusters, Seldon Core, KServe, Knative, AWS Lambda, Azure Functions, and Google Cloud Run.