KServe vs BentoML vs Seldon: The Choice That Can Make or Break MLOps

Choosing between KServe vs BentoML vs Seldon is not just a feature checklist exercise. These platforms solve overlapping model serving problems, but they assume different team skills, infrastructure maturity, deployment workflows, and production priorities.

If your organization is evaluating a model serving platform for commercial production use, the practical question is: do you need Kubernetes-native standardization, Python-first developer velocity, or flexible inference graphs and pipelines? This comparison breaks down the real trade-offs, drawing only on the research sources cited throughout (Xebia, Spheron, and Reintech).

1. What KServe, BentoML, and Seldon Are Designed to Solve

At a high level, KServe, BentoML, and Seldon Core all help teams move trained machine learning models from experimentation into production serving. The difference is how each platform approaches that transition.

Platform	Core Design Goal	Primary Abstraction	Best-Fit Operating Model
KServe	Standardized Kubernetes-native model serving	`InferenceService` CRD	Kubernetes-first ML platforms, Kubeflow/CNCF-aligned organizations
BentoML	Python-first model packaging and serving	Bento archive / Python service	Developer-friendly ML APIs, fast iteration, flexible deployment targets
Seldon Core	Flexible Kubernetes-native inference graphs and pipelines	`SeldonDeployment`, or `Model` + `Pipeline` in v2	Complex inference workflows, A/B testing, multi-step pipelines, drift monitoring

KServe: Kubernetes-native standardization

KServe, previously known as KFServing, is an open-source, Kubernetes-based model serving tool. It provides a Kubernetes custom resource definition (CRD), `InferenceService`, for defining model serving behavior.

Its main goal is to hide much of the underlying Kubernetes deployment complexity so users can focus on ML-specific configuration. According to the research, KServe supports advanced capabilities such as autoscaling, scale-to-zero, canary deployments, automatic request batching, and out-of-the-box support for many popular ML frameworks.

KServe is also described as tightly aligned with the cloud-native ecosystem. The Spheron research identifies it as a CNCF Incubating project and highlights its use of Knative for serverless scale-to-zero.

BentoML: Python-first packaging and APIs

BentoML takes a different route. It is a Python framework for wrapping ML models into deployable services. Instead of starting from Kubernetes YAML, teams define services in Python using a simple object-oriented interface.

A BentoML deployment packages the model weights, serving code, dependencies, and runtime configuration into a self-contained archive called a Bento. The same Bento can be deployed to plain Kubernetes clusters, KServe, Seldon Core, Knative, AWS Lambda, Azure Functions, Google Cloud Run, or managed BentoML services, according to the Xebia research.

This makes BentoML especially attractive when the team’s bottleneck is developer workflow rather than Kubernetes platform engineering.

Seldon Core: inference graphs and pipelines

Seldon Core is an open-source Kubernetes-native serving tool developed as part of the broader Seldon ecosystem. It provides high-level Kubernetes CRDs and supports canary deployments, A/B testing, and Multi-Armed-Bandit deployments.

The research distinguishes Seldon Core’s graph-based approach from KServe’s standardized serving abstraction. Seldon can define inference graphs with transformers, routers, combiners, and ensembles. In Seldon Core v2, the central abstractions are Model and Pipeline CRDs, with pipelines represented as directed acyclic graphs.
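To make the "directed acyclic graph" idea concrete, here is a minimal, library-free Python sketch (Python 3.9+ for `graphlib`) of how a pipeline with a preprocessing step, two models, and a combiner can be executed in dependency order. The step names and toy functions are hypothetical stand-ins for deployed models; this is not Seldon’s Pipeline API, which is declared as a Kubernetes resource rather than Python code.

```python
from graphlib import TopologicalSorter
from statistics import mean
from typing import Callable, Dict, List, Tuple


# Hypothetical step functions standing in for deployed models.
def preprocess(x: List[float]) -> List[float]:
    return [v / 10.0 for v in x]


def model_a(x: List[float]) -> float:
    return sum(x)


def model_b(x: List[float]) -> float:
    return max(x)


def combiner(a: float, b: float) -> float:
    return mean([a, b])


# Each step lists its upstream inputs; "request" is the raw client payload.
PIPELINE: Dict[str, Tuple[List[str], Callable]] = {
    "preprocess": (["request"], preprocess),
    "model-a": (["preprocess"], model_a),
    "model-b": (["preprocess"], model_b),
    "combiner": (["model-a", "model-b"], combiner),
}


def run_pipeline(request: List[float]) -> float:
    # TopologicalSorter takes {node: predecessors} and yields a valid run order.
    graph = {name: set(deps) for name, (deps, _) in PIPELINE.items()}
    outputs = {"request": request}
    for name in TopologicalSorter(graph).static_order():
        if name == "request":
            continue  # the raw input is not a step to execute
        deps, step = PIPELINE[name]
        outputs[name] = step(*(outputs[d] for d in deps))
    return outputs["combiner"]


print(run_pipeline([10, 20, 30]))  # prints 4.5
```

Because each step runs only after all of its inputs are available, the two model steps are independent of each other, and a cycle in the definition would be rejected (`graphlib` raises `CycleError`), which is the property the "acyclic" in DAG guarantees.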

Key insight: KServe standardizes model serving, BentoML simplifies packaging and developer iteration, and Seldon specializes in flexible inference graphs and pipelines.

2. Quick Comparison Table: Features, Strengths, and Trade-Offs

For teams comparing KServe vs BentoML vs Seldon, the fastest way to narrow the choice is to map platform strengths to your operating model.

Category	KServe	BentoML	Seldon Core
Primary style	Kubernetes-native standardized serving	Python-first service packaging	Kubernetes-native graph/pipeline serving
Main abstraction	`InferenceService`	Bento / Python service	`SeldonDeployment`; in v2, `Model` + `Pipeline`
Kubernetes required upfront	Yes for production-style behavior	No for local development; yes for Kubernetes deployment	Yes for production-style behavior
Local development experience	Limited without Kubernetes	Strong; `bentoml serve` locally	More complex; often requires local Kubernetes
Standard framework support	Strong for Scikit-Learn, PyTorch, TensorFlow, XGBoost	Strong; built-in support for standard frameworks	Strong for Scikit-Learn, XGBoost, TensorFlow; PyTorch requires extra effort in the Xebia comparison
Custom model support	Any Docker image; Python SDK available	Any Python customization	Any Docker image; SDK or duck typing
Pre/post-processing	Transformer in `InferenceService`	Any Python code in service	Transformers, routers, combiners, inference graphs
Autoscaling	Autoscaling and scale-to-zero via Knative in serverless mode	Via standard orchestration; BentoCloud adds managed options	Kubernetes-native scaling; v2 scale-to-zero requires external configuration per Spheron
Traffic splitting	Canary deployments	Deployment-dependent	Canary, A/B testing, Multi-Armed-Bandit
Observability	Knative metrics; V2 protocol helps standard dashboards	Request/model/custom metrics; managed offering adds tracing/log aggregation	Prometheus, Jaeger, payload logging, drift-related analytics in research
GPU/multi-model fit	Strong per-pod isolation; one model per InferenceService	Strong per-Bento isolation; one model per Bento	MLServer can serve multiple models per process
Main trade-off	Kubernetes/Knative complexity	CI/CD changes for Bento packaging; Kubernetes operator maturity caveat for Yatai	Operational complexity; version/protocol-specific constraints

3. Deployment Model: Kubernetes-Native vs Developer-Friendly Packaging

Deployment model is one of the biggest differences in the KServe vs BentoML vs Seldon decision.

KServe deployment model

KServe uses Kubernetes CRDs. A typical deployment is defined as an InferenceService pointing to a model runtime and storage location.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: fraud-detector
spec:
  predictor:
    sklearn:
      storageUri: gs://my-bucket/fraud-model
      resources:
        limits:
          cpu: "1"
          memory: 2Gi
        requests:
          cpu: "1"
          memory: 2Gi
```

KServe can run in two modes described by the Spheron research:

KServe Mode	How It Works	Best Fit	Trade-Off
Serverless mode	Uses Knative Serving, including Activator request buffering	Bursty or unpredictable workloads	Adds Knative dependency and potential cold-start considerations
RawDeployment mode	Uses standard Kubernetes Deployments and Services	High-throughput endpoints needing predictable latency	No native Knative scale-to-zero

KServe’s deployment model works well when your organization already treats Kubernetes manifests, Helm charts, or GitOps workflows as standard production paths. Xebia’s research notes that KServe can integrate with existing DevOps pipelines because deployment requires a relatively simple resource definition.
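Because the resource definition is plain structured data, a CI job can template it instead of hand-editing YAML. The Python sketch below assembles an `InferenceService` manifest and serializes it as JSON, which `kubectl apply` also accepts. The `serving.kserve.io/deploymentMode: RawDeployment` annotation and the `minReplicas` field follow KServe’s v1beta1 documentation as we understand it, but field names can change between releases, so verify them against your installed version. The helper name and its scale-to-zero guard are hypothetical.

```python
import json
from typing import Any, Dict


def inference_service_manifest(
    name: str,
    storage_uri: str,
    raw_deployment: bool = False,
    min_replicas: int = 1,
) -> Dict[str, Any]:
    """Build a minimal KServe InferenceService manifest as a dict (hypothetical helper)."""
    if raw_deployment and min_replicas == 0:
        # In this sketch, scale-to-zero is tied to Knative (serverless mode),
        # which RawDeployment mode does not use.
        raise ValueError("scale-to-zero (min_replicas=0) needs serverless mode")
    metadata: Dict[str, Any] = {"name": name}
    if raw_deployment:
        metadata["annotations"] = {"serving.kserve.io/deploymentMode": "RawDeployment"}
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": metadata,
        "spec": {
            "predictor": {
                "minReplicas": min_replicas,
                "sklearn": {"storageUri": storage_uri},
            }
        },
    }


manifest = inference_service_manifest("fraud-detector", "gs://my-bucket/fraud-model", min_replicas=0)
print(json.dumps(manifest, indent=2))  # e.g. pipe into: kubectl apply -f -
```

Generating manifests this way keeps the serving mode an explicit, reviewable parameter in the pipeline rather than a hand-edited annotation.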

BentoML deployment model

BentoML starts with Python. A basic BentoML service from the research looks like this:

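```python
import bentoml
import numpy as np
from bentoml.io import JSON

# BentoML 1.0/1.1-style API (runners + bentoml.io); 1.2+ adds @bentoml.service.
model_runner = bentoml.sklearn.get("fraud_detection:latest").to_runner()

svc = bentoml.Service("fraud_detector", runners=[model_runner])


def preprocess(input_data: dict) -> np.ndarray:
    # Hypothetical helper (not defined in the source): expects {"features": [...]}
    # and returns one 2-D row, the input shape scikit-learn models expect.
    return np.asarray(input_data["features"], dtype=float).reshape(1, -1)


@svc.api(input=JSON(), output=JSON())
def predict(input_data):
    features = preprocess(input_data)
    # Note: sklearn's predict() returns labels; true probabilities need predict_proba.
    result = model_runner.predict.run(features)
    return {"fraud_probability": float(result[0])}
```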

You can run it locally with:

```shell
bentoml serve service:svc --reload
```

The research emphasizes that BentoML does not require Kubernetes knowledge at the beginning. Teams can iterate locally, then containerize or deploy later.

However, Xebia also notes a workflow trade-off: BentoML packages the service class, serialized model, code, dependencies, and Dockerfile into a separate archive or directory. For teams that already ship plain Docker images and Kubernetes manifests, Kubernetes delivery may therefore require CI/CD pipeline changes.

Seldon deployment model

Seldon Core is also Kubernetes-native. In the research, Seldon deployments are defined through Kubernetes CRDs such as SeldonDeployment.

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: fraud-detector
spec:
  predictors:
  - name: default
    graph:
      name: classifier
      implementation: SKLEARN_SERVER
      modelUri: gs://my-bucket/fraud-model
    componentSpecs:
    - spec:
        containers:
        - name: classifier
          resources:
            requests:
              memory: 1Gi
              cpu: 1
```

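The major differentiator is Seldon’s graph and pipeline architecture. You can define multi-step inference paths that include transformers, routers, and combiners. For example, this `graph` fragment uses Seldon’s built-in RANDOM_ABTEST router to split traffic between two model versions: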

```yaml
# Fragment: the graph section of a SeldonDeployment predictor
graph:
  name: ab-test
  implementation: RANDOM_ABTEST
  parameters:
  - name: ratioA
    value: "0.5"
    type: FLOAT
  children:
  - name: model-a
    implementation: SKLEARN_SERVER
    modelUri: gs://bucket/model-v1
  - name: model-b
    implementation: SKLEARN_SERVER
    modelUri: gs://bucket/model-v2
```
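Conceptually, a RANDOM_ABTEST node with `ratioA: 0.5` sends each request to the first child with probability `ratioA` and to the second child otherwise. The Python sketch below simulates that rule to show what the parameter controls; it is an illustration, not Seldon’s router code, and the child names simply mirror the YAML above.

```python
import random
from collections import Counter


def random_ab_route(ratio_a: float, rng: random.Random) -> str:
    """Pick a child: model-a with probability ratio_a, otherwise model-b."""
    if not 0.0 <= ratio_a <= 1.0:
        raise ValueError("ratio_a must be between 0 and 1")
    # rng.random() is uniform on [0, 1), so this is true with probability ratio_a.
    return "model-a" if rng.random() < ratio_a else "model-b"


rng = random.Random(42)  # fixed seed so the simulation is repeatable
counts = Counter(random_ab_route(0.5, rng) for _ in range(10_000))
print(counts)  # roughly 5,000 requests per child
```

The split is random per request, so small traffic volumes can deviate noticeably from the configured ratio; that is worth remembering before reading much into early A/B results.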

Practical takeaway: If your team wants to write Python and ship quickly, BentoML has the shortest local feedback loop. If your team already operates Kubernetes ML infrastructure, KServe and Seldon align more directly with platform engineering workflows.

4. Model Framework Support and Runtime Flexibility

Framework support matters because production teams often serve models from multiple libraries: Scikit-Learn, PyTorch, TensorFlow, XGBoost, HuggingFace, or custom code.

Framework / Runtime Area	KServe	BentoML	Seldon Core
Scikit-Learn	Supported out of the box in research	Built-in support	Easy to serve
XGBoost	Supported out of the box in research	Built-in support	Easy to serve
TensorFlow	Supported out of the box in research	Built-in support	Easy to serve
PyTorch	Supported out of the box in Xebia comparison	Built-in support	No built-in support in Xebia comparison; possible via Triton with extra effort
Custom models	Any Docker image; Python SDK	Any Python code	Any Docker image; SDK or duck typing
HuggingFace	Pluggable runtimes such as HuggingFace TGI, per Spheron	BentoML examples include OpenLLM-style usage in source data	Supported by MLServer in Seldon Core v2, per Spheron

KServe framework support

Xebia’s comparison found that KServe made all tested standard frameworks fairly easy to serve. Standard frameworks are “first class” in KServe because it provides pre-built Docker images and direct InferenceService definitions.

KServe also allows any Docker image as part of the deployment, so custom frameworks and languages can be used to an extent. For Python-based custom logic, KServe provides an SDK with an abstract class that can be inherited.

BentoML framework support

BentoML supports standard frameworks and handles model serialization, deserialization, dependencies, and input/output handling. Xebia describes the implementation of BentoML’s service interface as usually fitting within a few lines of code.

Because BentoML services are Python code, custom preprocessing, model invocation, and post-processing can be implemented directly in the service.

Seldon framework support

Xebia’s research found that Seldon Core can easily serve Scikit-Learn, XGBoost, and TensorFlow models. For PyTorch, the same comparison found no built-in support; serving PyTorch models is possible via Triton Inference Server but requires additional effort and use of Seldon’s v2 protocol.

Spheron’s research adds that Seldon Core v2 uses MLServer, which supports scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, and HuggingFace runtimes.

The important caveat is version and runtime selection: Seldon’s capabilities can vary depending on whether you are using older Seldon Core patterns, Seldon Core v2, MLServer, or Triton.
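The "SDK or duck typing" option in the comparison tables refers to how Seldon Core’s Python wrapper can serve a plain class that simply exposes the expected methods, without inheriting from a Seldon base class. A minimal sketch, assuming the v1-style `predict(self, X, features_names)` convention and an optional `load()` hook; the class name and toy linear weights are hypothetical, and Seldon Core v2 deployments typically rely on MLServer runtimes instead.

```python
from typing import Iterable, List, Optional, Sequence


class FraudModel:
    """Duck-typed model class: nothing here imports or subclasses Seldon."""

    def __init__(self) -> None:
        self.weights: Optional[List[float]] = None

    def load(self) -> None:
        # Hypothetical: a real class would deserialize the artifact at modelUri.
        self.weights = [0.2, 0.3, 0.5]

    def predict(self, X: Iterable[Sequence[float]], features_names=None) -> List[List[float]]:
        # Lazily load so the class also works if no load() hook is called.
        if self.weights is None:
            self.load()
        # One score per input row: a toy weighted sum of the features.
        return [[sum(w * x for w, x in zip(self.weights, row))] for row in X]


model = FraudModel()
print(model.predict([[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]]))
```

Packaged into an image, a class like this is what the wrapper’s entry point would be pointed at; check the Seldon Core documentation for the exact command and options for your version.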

5. Autoscaling, Traffic Splitting, and Canary Deployment Capabilities

Production model serving usually requires controlled rollouts, elastic scaling, and rollback strategies. The source data shows all three platforms can participate in scalable deployments, but the mechanisms differ.

Capability	KServe	BentoML	Seldon Core
Autoscaling	Supported; Knative-based in serverless mode	Through container orchestration; managed options add more	Kubernetes-native scaling patterns
Scale-to-zero	Native via Knative in KServe serverless mode	Not described as native in source data	Spheron says no native scale-to-zero in v2 without external config
Canary deployment	Supported	Deployment-dependent	Supported
A/B testing	Not emphasized as core differentiator in source data	Deployment-dependent	Supported
Multi-Armed-Bandit	Not emphasized as core differentiator in source data	Not identified in source data	Supported
Request batching	Automatic request batching mentioned for KServe	Adaptive batching mentioned for BentoML	Depends on runtime choices

KServe autoscaling and scale-to-zero

KServe supports autoscaling and scale-to-zero. In serverless mode, this is provided via Knative. Spheron’s research notes that the Knative Activator buffers requests during scale-to-zero and routes them to warm pods.

For large model workloads, cold starts can matter. Spheron’s autoscaling comparison reports cold starts of 2–8 minutes for a 70B model on an H100. It also describes KServe’s ModelCar pattern, where model weights are packaged as a container image rather than fetched from remote storage at every pod startup.

For a 140 GB Llama 3 70B model, Spheron reports the difference as 4–6 minutes for remote NFS fetch at 400–600 MB/s versus 40 seconds from local NVMe at 3–4 GB/s.
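Those figures are consistent with simple transfer arithmetic: load time ≈ model size ÷ read throughput. A quick Python check, assuming decimal units (1 GB = 1,000 MB) and ignoring deserialization and GPU upload time:

```python
def transfer_seconds(size_gb: float, throughput_mb_s: float) -> float:
    """Seconds to read size_gb at throughput_mb_s (decimal units: 1 GB = 1000 MB)."""
    return size_gb * 1000 / throughput_mb_s


MODEL_GB = 140  # Llama 3 70B weights, per the Spheron figure
for label, mb_s in [("NFS, 400 MB/s", 400), ("NFS, 600 MB/s", 600),
                    ("NVMe, 3 GB/s", 3000), ("NVMe, 4 GB/s", 4000)]:
    secs = transfer_seconds(MODEL_GB, mb_s)
    print(f"{label}: {secs:.0f} s ({secs / 60:.1f} min)")
```

This gives roughly 233–350 seconds (about 3.9–5.8 minutes) for the NFS case and 35–47 seconds for local NVMe, in line with the 4–6 minute and 40 second figures above.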

BentoML autoscaling and batching

BentoML handles scaling through standard container orchestration when deployed as containers. The Reintech source notes that deploying to Kubernetes with a Horizontal Pod Autoscaler gives automatic scaling based on CPU or memory.
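The Horizontal Pod Autoscaler’s core rule, as documented by Kubernetes, is desiredReplicas = ceil(currentReplicas × currentMetricValue ÷ desiredMetricValue), with no change when the ratio falls within a tolerance (0.1 by default). A simplified Python sketch of that rule; it omits stabilization windows, pod readiness handling, and multi-metric selection, and the min/max bounds are hypothetical example values.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Simplified HPA rule: ceil(current * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical defaults here).
    return max(min_replicas, min(max_replicas, desired))


# 3 pods averaging 90% CPU against a 60% target.
print(desired_replicas(3, current_metric=90, target_metric=60))  # prints 5
```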

BentoML also supports adaptive batching out of the box, which the source says can improve throughput for high-traffic scenarios. Reintech cites a benchmark in which BentoML handles 1,000+ requests per second with p95 latency under 50 ms for ResNet50 on modest hardware. As with any benchmark, teams should validate performance against their own model, hardware, and traffic patterns.
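Batching trades a little queueing delay for larger, more efficient model calls. The Python sketch below shows the basic idea with a fixed window: requests are grouped until a batch is full or until the wait since its first request exceeds a limit. BentoML describes its batching as adaptive, meaning the window is tuned at runtime rather than fixed, so treat this as an illustration of the trade-off, not BentoML’s algorithm; the function and parameter names are hypothetical.

```python
from typing import List


def form_batches(arrival_ms: List[float], max_batch_size: int, max_wait_ms: float) -> List[List[float]]:
    """Group request arrival times (ms) into batches using a fixed window.

    A batch is closed when it holds max_batch_size requests, or when the
    next request arrives more than max_wait_ms after the batch's first request.
    """
    batches: List[List[float]] = []
    current: List[float] = []
    for t in sorted(arrival_ms):
        if current and (len(current) >= max_batch_size or t - current[0] > max_wait_ms):
            batches.append(current)  # dispatch the open batch
            current = []
        current.append(t)
    if current:
        batches.append(current)  # flush whatever is still waiting
    return batches


print(form_batches([0, 1, 2, 3, 20, 21], max_batch_size=3, max_wait_ms=5))
# prints [[0, 1, 2], [3], [20, 21]]
```

In a live server the time limit would fire on a timer rather than on the next arrival; this offline version only shows how the size and wait limits shape batches.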

Seldon traffic management

Seldon Core supports canary deployments, A/B testing, and Multi-Armed-Bandit deployments. Its graph abstraction allows routing logic to be built directly into the deployment.

Xebia specifically highlights Seldon’s ability to define custom ROUTER components and COMBINER components, enabling ensembles and Multi-Armed-Bandit-style deployments.
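A Multi-Armed-Bandit router shifts traffic toward the better-performing model based on feedback, instead of holding a fixed split. One common strategy is epsilon-greedy: with probability ε, explore a random model; otherwise, exploit the model with the best average reward so far. The Python sketch below is a generic illustration of that strategy; the class and method names are hypothetical and do not reproduce Seldon’s ROUTER component interface, and the conversion rates are made up.

```python
import random
from typing import Dict, List


class EpsilonGreedyRouter:
    """Generic epsilon-greedy bandit over model names (illustrative only)."""

    def __init__(self, arms: List[str], epsilon: float, rng: random.Random) -> None:
        self.arms = arms
        self.epsilon = epsilon
        self.rng = rng
        self.pulls: Dict[str, int] = {arm: 0 for arm in arms}
        self.total_reward: Dict[str, float] = {arm: 0.0 for arm in arms}

    def mean_reward(self, arm: str) -> float:
        n = self.pulls[arm]
        return self.total_reward[arm] / n if n else 0.0

    def route(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore
        return max(self.arms, key=self.mean_reward)  # exploit (ties go to the first arm)

    def send_feedback(self, arm: str, reward: float) -> None:
        self.pulls[arm] += 1
        self.total_reward[arm] += reward


# Simulate: model-b succeeds 80% of the time, model-a 20% (made-up rates).
router = EpsilonGreedyRouter(["model-a", "model-b"], epsilon=0.1, rng=random.Random(7))
outcomes = random.Random(11)
true_rate = {"model-a": 0.2, "model-b": 0.8}
for _ in range(5_000):
    arm = router.route()
    router.send_feedback(arm, 1.0 if outcomes.random() < true_rate[arm] else 0.0)
print(router.pulls)  # most traffic should end up on model-b
```

The exploration rate ε is the key knob: higher values keep testing the weaker model, lower values converge faster but react more slowly if model quality changes.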

Critical warning: Seldon’s richer routing and graph capabilities can increase operational complexity. The Reintech source explicitly notes that this flexibility requires solid Kubernetes knowledge and operational maturity.

6. Observability, Monitoring, and Production Debugging

Observability is where model serving platforms move beyond “container running” into production-grade ML operations.

Observability Area	KServe	BentoML	Seldon Core
Prometheus metrics	Inherited through Knative stack in source data	Exports request/model/custom metrics	Metrics flow to Prometheus
Distributed tracing	Not detailed deeply in source data	Manual OpenTelemetry possible; managed offering adds tracing/log aggregation	Jaeger integration mentioned
Payload logging / drift	Not identified as core in source data	Not identified as core in source data	Payload logging and Alibi Detect integration mentioned
Unified API monitoring	V2 protocol helps standardize dashboards	Python-service oriented	Runtime/protocol dependent

KServe observability

KServe inherits the Knative observability stack in serverless mode, including request metrics, revision metrics, and autoscaling metrics. The research also notes that KServe supports the V2 inference protocol, which can help standardize how clients communicate with different model types.

That standardization is valuable when a platform team wants consistent dashboards across TensorFlow, PyTorch, XGBoost, and custom models.
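For reference, a V2 (Open Inference Protocol) REST request is a JSON body with an `inputs` list of named tensors, each carrying `name`, `shape`, `datatype`, and `data`, sent to `/v2/models/<model-name>/infer`. The Python sketch below only builds such a request; the host name and the tensor name `input-0` are hypothetical, and the tensor name a given model expects depends on its runtime.

```python
import json
from typing import List, Tuple


def v2_infer_request(base_url: str, model_name: str, rows: List[List[float]]) -> Tuple[str, str]:
    """Build the URL and JSON body for a V2 inference protocol REST call."""
    n_cols = len(rows[0]) if rows else 0
    if any(len(row) != n_cols for row in rows):
        raise ValueError("every row must have the same number of features")
    body = {
        "inputs": [
            {
                "name": "input-0",  # hypothetical tensor name
                "shape": [len(rows), n_cols],
                "datatype": "FP32",
                # Tensor contents flattened in row-major order.
                "data": [value for row in rows for value in row],
            }
        ]
    }
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    return url, json.dumps(body)


url, body = v2_infer_request("http://fraud-detector.example.com", "fraud-detector", [[0.1, 0.5, 3.2]])
print(url)  # prints http://fraud-detector.example.com/v2/models/fraud-detector/infer
print(body)
```

Sending it is an ordinary HTTP POST with a `Content-Type: application/json` header, for example via `urllib.request.Request(url, data=body.encode(), headers={"Content-Type": "application/json"}, method="POST")`. Because the body shape is the same for every model type, one client and one dashboard schema can cover many runtimes.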

BentoML observability

BentoML includes request metrics, model metrics, and custom metrics APIs, according to the Reintech source. The same source says BentoCloud adds distributed tracing and log aggregation, while OpenTelemetry can be integrated manually.

This fits BentoML’s overall pattern: strong developer ergonomics, with production observability depending on how the deployment target is configured.

Seldon observability

Seldon Core has the deepest observability story in the provided research. Reintech describes Prometheus metrics, Jaeger tracing, log aggregation integration, and an analytics component for payload logging and drift detection.

Spheron also highlights Alibi Detect integration in Seldon Core v2. Alibi Detect supports outlier detection, adversarial detection, and concept drift monitoring. In Seldon Core v2, a drift detector can be added as a node in the pipeline graph and run inline with inference requests.
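Drift detectors of this kind compare live inputs against a reference sample. As a rough illustration, the Python sketch below computes the two-sample Kolmogorov–Smirnov (KS) statistic for a single feature and flags drift above a fixed threshold. Alibi Detect’s `KSDrift` detector builds on the same statistic but works with p-values and a multiple-testing correction across features; the function names and the 0.3 threshold here are hypothetical, and this is not Alibi Detect’s API.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def drift_detected(reference: Sequence[float], current: Sequence[float], threshold: float = 0.3) -> bool:
    # Hypothetical fixed cutoff; production detectors usually test a p-value instead.
    return ks_statistic(reference, current) > threshold


baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
print(drift_detected(baseline, [0.15, 0.25, 0.35, 0.45, 0.55, 0.65]))  # prints False
print(drift_detected(baseline, [0.9, 1.0, 1.1, 1.2, 1.3, 1.4]))  # prints True
```

Running a check like this inline, as a pipeline node, means each batch of requests can be scored for drift without a separate offline job, at the cost of extra latency per request.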

7. Security, Governance, and Enterprise Readiness

The provided source data does not include a detailed security control comparison, such as RBAC matrices, audit logs, secrets handling, vulnerability management, or compliance certifications. Teams should evaluate those requirements directly against their Kubernetes platform, managed services, and vendor documentation.

That said, the research does provide several enterprise-readiness signals.

Area	KServe	BentoML	Seldon Core
Open-source positioning	Open-source Kubernetes model serving	Open-source Python framework	Open-source core serving tool
Enterprise alignment signal	Used by Bloomberg, NVIDIA, Samsung SDS, Cisco in Xebia source	BentoCloud mentioned as maintained managed path	Seldon Deploy described as larger paid solution
Governance strengths from source data	Standardized serving APIs, CRDs, canary deployment, scale controls	Packaged reproducible Bentos containing code, model, dependencies, config	Pipeline graphs, payload logging, drift detection, advanced rollout patterns
Governance caveat	Requires Kubernetes/Knative maturity	Yatai self-hosting caveat in source data	Operational complexity and version/runtime caveats

KServe enterprise fit

KServe’s enterprise readiness comes from Kubernetes-native control, standardized CRDs, and CNCF alignment. Xebia also notes use by companies including Bloomberg, NVIDIA, Samsung SDS, and Cisco.

For organizations building an internal ML platform, KServe’s standard InferenceService abstraction can help enforce consistent deployment patterns across teams.

BentoML enterprise fit

BentoML’s governance advantage is packaging reproducibility. A Bento includes model weights, serving code, Python dependencies, and runtime configuration. That can make it easier to move the same artifact from local development to Kubernetes.

However, Spheron raises a practical caveat around Yatai, the Kubernetes operator for BentoML deployments. The source describes Yatai as stable but not actively evolving, and says teams self-hosting BentoML on Kubernetes should factor in potential maintenance gaps. It also identifies BentoCloud as the current first-party deployment path for teams wanting a maintained managed experience.

Seldon enterprise fit

Seldon’s enterprise-readiness signals include graph-based control, advanced rollout strategies, payload logging, and drift detection integrations. It is also connected to a broader paid Seldon Deploy solution, according to Xebia.

The trade-off is complexity. Seldon can be powerful for platform teams that need custom inference graphs, but those same capabilities require Kubernetes and operational expertise.

8. Best Use Cases: When to Choose KServe, BentoML, or Seldon

Here is the practical decision guide for KServe vs BentoML vs Seldon based on the researched capabilities.

1. Choose KServe when you need standardized Kubernetes ML serving

KServe is a strong fit when:

Kubernetes-first platform: Your organization already runs production workloads on Kubernetes.
Standardized APIs: You want a consistent serving interface across frameworks.
Kubeflow/CNCF alignment: You value cloud-native ecosystem integration.
Scale-to-zero: You need native Knative scale-to-zero for bursty workloads.
Model runtime flexibility: You want pluggable runtimes such as Triton, vLLM, or HuggingFace TGI as described in the Spheron source.

KServe is less ideal if your team does not yet have Kubernetes or Knative operational maturity.

2. Choose BentoML when developer velocity matters most

BentoML is a strong fit when:

Python-first team: Your ML engineers want to define services in Python.
Fast local iteration: You want to run and test locally with bentoml serve.
Custom preprocessing: You need arbitrary Python code around model inference.
Flexible deployment targets: You may deploy to Kubernetes, KServe, Seldon Core, Knative, AWS Lambda, Azure Functions, or Google Cloud Run.
Packaged reproducibility: You want model, code, dependencies, and runtime config bundled together.

BentoML is less ideal if you need a purely Kubernetes-native CRD workflow from day one, or if self-hosting via Yatai raises maintenance concerns for your team.

3. Choose Seldon when inference workflows are complex

Seldon Core is a strong fit when:

Inference graphs: You need preprocessors, routers, combiners, explainers, or ensembles.
Advanced rollout patterns: You need canary, A/B, or Multi-Armed-Bandit deployments.
Drift monitoring: You want Alibi Detect integration in Seldon Core v2 pipelines.
Async inference: You need Kafka-based event-driven inference, as described in the Spheron source.
Multi-model serving: You want MLServer to load multiple models in one process.

Seldon is less ideal for teams that want the lowest operational complexity or the simplest local development loop.

9. Common Migration Paths Between Model Serving Platforms

Model serving choices are rarely permanent. Teams often migrate as their ML platform matures.

Migration Path	Why Teams Move	Practical Consideration
BentoML → KServe	Team starts Python-first, then standardizes on Kubernetes ML serving	BentoML-packaged models can be deployed to KServe according to Xebia
BentoML → Seldon Core	Team needs richer routing, inference graphs, or A/B testing	BentoML can deploy to Seldon Core per Xebia, but graph behavior must be modeled in Seldon
KServe → Seldon Core	Team needs complex pipelines, routers, combiners, or inline drift nodes	Requires adopting Seldon CRDs and operational patterns
Seldon Core → KServe	Team wants standardized serving APIs and simpler model endpoint abstraction	Advanced graph features may need redesign
Plain Kubernetes → Any of the three	Plain Deployments lack ML-aware readiness, rollout, observability, and scaling semantics	Platform choice depends on whether the team prioritizes Python workflow, standardization, or graph flexibility

From BentoML to Kubernetes-native serving

A common path is starting with BentoML because it lets ML engineers ship quickly without learning Kubernetes deeply. As the platform grows, teams may move production serving into KServe or Seldon for stronger Kubernetes-native control.

The Xebia source explicitly states BentoML-packaged models can be deployed in KServe and Seldon Core, making it a possible stepping stone rather than a dead end.

From KServe to Seldon

Teams may move from KServe to Seldon when single-model endpoint standardization is no longer enough. For example, if you need a preprocessor, multiple candidate models, a custom router, and a combiner in one deployment, Seldon’s inference graph model is a better conceptual fit.

From Seldon to KServe

The reverse migration can also happen. If a team no longer needs complex graphs and wants a simpler standardized InferenceService model, KServe may reduce conceptual overhead.

Migration advice: Do not migrate only for feature parity. Migrate when your dominant operational problem changes: developer speed, Kubernetes standardization, or inference workflow complexity.

10. Final Recommendation Based on Team Size and Infrastructure Maturity

The best choice in KServe vs BentoML vs Seldon depends less on which platform has the longest feature list and more on what your team can operate reliably.

Team / Infrastructure Profile	Recommended Starting Point	Why
Small ML team, limited Kubernetes experience	BentoML	Strong local development, Python-first APIs, minimal Kubernetes knowledge required initially
Growing team deploying multiple production models	KServe or BentoML	KServe if platform team owns Kubernetes; BentoML if ML engineers own service packaging
Mature Kubernetes platform team	KServe	Standardized CRDs, Knative option, consistent inference APIs
Team with complex multi-step inference	Seldon Core	Graphs, routers, combiners, pipelines, A/B testing, Multi-Armed-Bandit patterns
Team serving many smaller models on shared GPUs	Seldon Core v2 with MLServer	MLServer can load multiple models in one process, according to Spheron
Team serving isolated high-value model endpoints	KServe or BentoML	Per-pod/per-Bento isolation can reduce blast radius compared with shared MLServer processes
Enterprise platform requiring standardized model serving	KServe	CNCF alignment and `InferenceService` standardization
Team prioritizing managed BentoML experience	BentoCloud	Source data identifies it as BentoML’s maintained first-party managed path

Decision shortcut

Use this simple rule:

Choose BentoML if your biggest problem is packaging models into APIs quickly.
Choose KServe if your biggest problem is standardizing model serving on Kubernetes.
Choose Seldon Core if your biggest problem is orchestrating complex inference workflows.

Bottom Line

For most teams, the right model serving platform is determined by the team’s operating model.

KServe is the strongest fit for Kubernetes-native organizations that want standardized model serving, Knative-based scale-to-zero, canary deployments, and framework-agnostic APIs. It is especially compelling when an internal platform team already manages Kubernetes and wants a consistent InferenceService abstraction.

BentoML is the best fit for Python-first teams that value fast local iteration, simple service definitions, and reproducible packaging. Its main trade-offs are that production Kubernetes delivery can require CI/CD changes and that, at the time of writing, self-hosted Yatai deserves careful evaluation.

Seldon Core is the best fit for advanced inference workflows: graphs, pipelines, routers, combiners, A/B tests, Multi-Armed-Bandit deployments, Kafka-based async inference, and drift monitoring. The trade-off is operational complexity and the need to understand Seldon version/runtime differences.

FAQ: KServe vs BentoML vs Seldon

Which is easiest to start with: KServe, BentoML, or Seldon?

BentoML is the easiest to start with based on the source data. It lets developers define a service in Python, run it locally with bentoml serve, and iterate without needing Kubernetes upfront.

Which platform is most Kubernetes-native?

KServe and Seldon Core are both Kubernetes-native. KServe centers on the InferenceService CRD, while Seldon uses CRDs such as SeldonDeployment and, in v2, Model and Pipeline.

Which is best for canary deployments and A/B testing?

Seldon Core has the richest rollout patterns in the research, including canary deployments, A/B testing, and Multi-Armed-Bandit deployments. KServe also supports canary deployments.

Which platform supports scale-to-zero?

KServe supports native scale-to-zero in serverless mode through Knative. The Spheron source states that Seldon Core v2 does not provide native scale-to-zero without external configuration, and BentoML scale-to-zero is not described as a native feature in the provided sources.

Which platform is best for custom preprocessing and post-processing?

All three can handle custom preprocessing, but in different ways. BentoML allows arbitrary Python code inside the service. KServe uses transformers in the InferenceService. Seldon Core supports transformers and richer inference graph components such as routers and combiners.

Can BentoML work with KServe or Seldon?

Yes. The Xebia research states that BentoML-packaged models can be deployed to multiple runtimes, including plain Kubernetes clusters, Seldon Core, KServe, Knative, AWS Lambda, Azure Functions, and Google Cloud Run.