
Serverless AI
Run containerized AI workloads on GPUs without managing infrastructure.
Serverless AI provides two execution models:
- Jobs — run workloads to completion
- Endpoints — serve real-time requests
Provision compute on demand, execute workloads, and release resources automatically.
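The difference between the two execution models can be sketched in plain Python using only the standard library. This is an illustrative sketch, not the Serverless AI API: `run_job`, `PredictHandler`, `serve`, the port, and the request shape are all hypothetical names chosen for the example. A Job-style entry point does its work and returns, while an Endpoint-style entry point stays up and answers requests.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_job(records):
    """Job-style entry point (hypothetical): process the input, return, exit."""
    return {"processed": len(records), "total": sum(records)}


class PredictHandler(BaseHTTPRequestHandler):
    """Endpoint-style entry point (hypothetical): answer one request at a time."""

    def do_POST(self):
        # Read the JSON request body; treat an empty body as an empty object.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Reuse the same workload function to compute the response.
        body = json.dumps(run_job(payload.get("records", []))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block forever, serving requests until the process is stopped."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()


if __name__ == "__main__":
    # Job mode: run once to completion and print the result.
    print(run_job([1, 2, 3]))
```

In a container, the same image could plausibly expose either mode: calling `run_job` for a run-to-completion workload, or calling `serve()` to keep the process alive behind an HTTP port.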
Get to know Serverless
Understand how Jobs and Endpoints work, and how to move from running workloads to serving APIs.

Start building
The fastest way to get started is to run a Job or deploy an Endpoint.
Core patterns
LLM inference
Serve large language models as real-time APIs using containerized endpoints with GPU acceleration.
Training and fine-tuning
Run GPU workloads that produce model artifacts.
Advanced workflows
RAG pipelines
Build retrieval and generation workflows that combine indexing, embeddings, and inference.
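As a rough illustration of how indexing, embeddings, and inference fit together, here is a toy pipeline in plain Python. It is a sketch under loud assumptions: the bag-of-words `embed` function stands in for a real embedding model, `generate` is a placeholder for a call to an inference endpoint, and every function name is hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs):
    """Indexing step: store each document next to its embedding."""
    return [(doc, embed(doc)) for doc in docs]


def retrieve(index, query, k=1):
    """Retrieval step: return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def generate(query, context):
    """Generation step: placeholder for a request to a model endpoint."""
    return f"Question: {query}\nContext: {' | '.join(context)}"


if __name__ == "__main__":
    docs = [
        "jobs run workloads to completion",
        "endpoints serve real-time requests",
    ]
    index = build_index(docs)
    question = "which one serves requests"
    print(generate(question, retrieve(index, question)))
```

In a real deployment, the indexing step would more naturally run as a batch workload and the generation step as a request to a served model; the toy functions only show the data flow between the stages.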
Agentic workflows — OpenClaw
Run agent-style pipelines with tool use, retrieval, and multi-step reasoning using OpenClaw.
Batch and data processing
Run pipelines for embeddings, ETL, and dataset preparation.
Guides coming soon
Verticals and domain workflows
Explore real workloads built on Jobs and Endpoints.

Life sciences (OpenMM)
Run molecular simulations and generate datasets on GPU.

Voice and Media
Build voice and media pipelines that combine batch processing with inference endpoints.

Robotics and Physical AI
Run simulation, control, and perception workloads for robotics systems, from dataset generation to policy evaluation.
Guides coming soon
Get started and explore more
Try Serverless in the console
Join the community
Serverless cookbook
Jobs and Endpoints docs
Video walkthroughs