Serverless Inference API
Instant Access to thousands of ML Models for Fast Prototyping
Explore the most popular models for text, image, speech, and more — all with a simple API request. Build, test, and experiment without worrying about infrastructure or setup.
The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you’re prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:
Text Generation: Generate and experiment with high-quality responses from large language models, including tool-calling prompts.
Image Generation: Easily create customized images, including with LoRAs for your own styles.
Document Embeddings: Build search and retrieval systems with SOTA embeddings.
Classical AI Tasks: Ready-to-use models for text classification, image classification, speech recognition, and more.
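For a concrete feel of what "a simple API request" looks like, here is a minimal text-generation sketch over plain HTTP. The model id, the `hf_xxx` token placeholder, and the prompt are illustrative assumptions; substitute your own.

```python
import requests

# Illustrative model id and token placeholder -- replace with your own.
API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2"
headers = {"Authorization": "Bearer hf_xxx"}

def query(payload):
    """Send a JSON payload to the Serverless Inference API and return the parsed response."""
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()

# Text generation: send a prompt, get generated text back.
output = query({"inputs": "Explain serverless inference in one sentence."})
print(output)
```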
⚡ Fast and Free to Get Started: The Inference API is free to use, with higher rate limits for PRO users. For production needs, explore Inference Endpoints for dedicated resources, autoscaling, advanced security features, and more.
🚀 Instant Prototyping: Access powerful models without setup.
🎯 Diverse Use Cases: One API for text, image, and beyond.
🔧 Developer-Friendly: Simple requests, fast responses.
Leverage over 800,000 models from different open-source libraries (transformers, sentence-transformers, adapter-transformers, diffusers, timm, etc.).
Use models for a variety of tasks, including text generation, image generation, document embeddings, NER, summarization, image classification, and more.
Accelerate your prototyping by using GPU-powered models.
Run very large models that are challenging to deploy in production.
Production-grade platform without the hassle: built-in automatic scaling, load balancing and caching.
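As a sketch of how the same API spans several of these tasks, the snippet below uses the `huggingface_hub` InferenceClient for document embeddings and summarization. The model ids and token placeholder are illustrative assumptions, not recommendations.

```python
from huggingface_hub import InferenceClient

# Token placeholder and model ids below are examples only.
client = InferenceClient(token="hf_xxx")

# Document embeddings for search and retrieval.
embedding = client.feature_extraction(
    "Serverless inference removes the infrastructure setup step.",
    model="sentence-transformers/all-MiniLM-L6-v2",
)
print(embedding.shape)

# Summarization with a classic encoder-decoder model.
summary = client.summarization(
    "The Serverless Inference API offers a fast and free way to explore "
    "thousands of models for a variety of tasks without managing servers.",
    model="facebook/bart-large-cnn",
)
print(summary)
```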
The documentation is organized into two sections:
Getting Started: Learn the basics of how to use the Inference API.
API Reference: Dive into task-specific settings and parameters.
If you want to get started quickly with Chat Completion models, use the Inference Playground to test and compare models against your prompts.
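If you prefer to script the same comparison, a minimal chat-completion sketch with the `huggingface_hub` client looks like this; the model id and token placeholder are assumptions, and any chat model on the Hub can be swapped in.

```python
from huggingface_hub import InferenceClient

# Example model id and token placeholder -- swap in your own.
client = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct", token="hf_xxx")

response = client.chat_completion(
    messages=[{"role": "user", "content": "What is the Serverless Inference API good for?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```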