
Token Factory: inference API for open models
Start fast with an OpenAI-compatible API; add dedicated GPUs, post-training, and workload optimization as you scale.
Launch in minutes, then engineer for production when it matters.
Build. Customize. Optimize. Deploy.
Start building
Optimize for real workloads
Customize the model
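Because the API is OpenAI-compatible, any client that can send an OpenAI-style `/chat/completions` request should work. A minimal standard-library sketch is below; the base URL, API key, and model name are placeholders, not official values — substitute the endpoint and credentials from your Token Factory account.

```python
import json
import urllib.request

BASE_URL = "https://api.tokenfactory.example/v1"  # placeholder, not the real endpoint
API_KEY = "YOUR_API_KEY"                          # placeholder credential

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a compatible endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
# Uncomment with a real endpoint and key:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request shape works with the official OpenAI SDKs by pointing their `base_url` at your endpoint.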
Architecture and build walkthroughs

Official Token Factory playlist on YouTube
Learn about Nebius Token Factory from the makers

Build with Token Factory playlist on YouTube
Discover AI projects, tools, and success stories created with Token Factory.
In-depth technical resources
Production inference is not just serving a model. These guides break down the architecture behind real-world workloads.
Why large MoE models break latency budgets and what speculative decoding changes in production systems
The invisible architecture behind great chat apps
Routing in LLM inference is the difference between scaling and stalling
Ship faster with the community
Get help and connect with other builders.

Subscribe to our newsletter
Get builder updates: new releases, cookbooks, events, and more.