AI-Inferoxy Documentation
Welcome to the AI-Inferoxy documentation. This guide covers everything you need to know about using and configuring the multi-provider AI inference proxy server, which supports Hugging Face and the Vercel AI Gateway.
🚀 Quick Start
Quick Start Guide
Get up and running in under 5 minutes with basic token management and Hugging Face Hub integration.
Perfect for: First-time users, simple use cases, testing the integration
⚠️ Authentication Required
All client operations now require authentication with the AI-Inferoxy server. See the RBAC documentation for setup details.
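For example, one common pattern is to load the proxy API key from an environment variable instead of hard-coding it. This is a minimal sketch; the variable name AI_INFEROXY_API_KEY is an illustrative convention, not something the server requires:

import os

# Illustrative convention (not required by the server): keep the proxy API
# key issued by your admin in an environment variable rather than in code.
proxy_api_key = os.environ["AI_INFEROXY_API_KEY"]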
Provider Overview
Comprehensive guide to all supported providers, their capabilities, and selection criteria.
Perfect for: Choosing the right provider, understanding provider differences, planning multi-provider applications
📚 Core Documentation
Comprehensive Hugging Face Hub Integration
Complete guide with advanced patterns, error handling, and production-ready examples.
Perfect for: Production deployments, complex applications, advanced usage patterns
Main Project README
Server setup, configuration, and core proxy functionality documentation.
Perfect for: Setting up the AI-Inferoxy server, understanding core features
🌐 Supported Providers
AI-Inferoxy supports multiple AI inference providers through a unified interface:
Primary Providers
- Hugging Face Inference API - Comprehensive AI model hosting with extensive model support
- Vercel AI Gateway - Modern AI gateway with OpenAI-compatible API and advanced features
Hugging Face Provider Ecosystem
Within the Hugging Face route, AI-Inferoxy supports a wide range of inference providers through the same unified interface:
AI Model Providers
- Cerebras - High-performance AI models and vision-language capabilities
- Cohere - Advanced language models and vision AI
- Fal-AI - Generative media inference (images, video, and audio)
- Featherless AI - Efficient AI inference and text generation
- Fireworks AI - Fast AI model deployment and inference
- Groq - Ultra-fast AI inference and chat completion
- Hyperbolic - Vision-language models and AI capabilities
- Nebius - Cloud AI services and model hosting
- Novita - AI model inference and video generation
- NScale - Scalable AI inference and image generation
- Replicate - AI model deployment and inference
- SambaNova - Enterprise AI models and embeddings
- Together - Collaborative AI model hosting
Core Services
- HF-Inference - Hugging Face core inference API with comprehensive model support
- Vercel AI Gateway - OpenAI-compatible API gateway with advanced routing and caching
🔧 Key Features
- 🔐 RBAC Authentication: Secure role-based access control for all operations
- 🔄 Automatic Token Rotation: Seamlessly switch between API keys when errors occur
- 🛡️ Smart Error Handling: Intelligent detection of auth/credit errors with appropriate responses
- 📊 Usage Reporting: Track token usage and health for better key management
- 🚀 Zero Configuration: Works out of the box with sensible defaults
- 🔧 Flexible Integration: Simple functions that work with any Hugging Face workflow
- 🌐 Multi-Provider Support: Unified interface across Hugging Face and Vercel AI Gateway providers
- 🧩 Provider-Specific Features: Specialized support for each provider's unique capabilities
💻 Quick Reference
Essential Functions
# Get a managed token (requires authentication)
proxy_api_key = "your_proxy_api_key_here" # Get this from admin
# For Hugging Face providers
token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)
# For Vercel AI Gateway
token, token_id = get_proxy_token(provider="vc", api_key=proxy_api_key)
# Report token usage
report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key) # Success
report_token_status(token_id, "error", error_msg, provider="vc", api_key=proxy_api_key) # Error
Basic Usage Pattern
from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status
from vc_token_utils import get_proxy_token as get_vc_token, report_token_status as report_vc_status

# Get managed tokens (requires authentication)
proxy_api_key = "your_proxy_api_key_here"  # Get this from admin

# Option 1: Use with Hugging Face providers
token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)
client = InferenceClient(provider="cerebras", api_key=token)  # or "cohere", "groq", etc.

try:
    result = client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)
except Exception as e:
    report_token_status(token_id, "error", str(e), provider="hf", api_key=proxy_api_key)
    raise

# Option 2: Use with Vercel AI Gateway (OpenAI-compatible endpoint)
vc_token, vc_token_id = get_vc_token(provider="vc", api_key=proxy_api_key)
vc_client = InferenceClient(base_url="https://ai-gateway.vercel.sh/v1", api_key=vc_token)

try:
    vc_result = vc_client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    report_vc_status(vc_token_id, "success", provider="vc", api_key=proxy_api_key)
except Exception as e:
    report_vc_status(vc_token_id, "error", str(e), provider="vc", api_key=proxy_api_key)
    raise
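Because every request follows the same get-token / call / report cycle, it can be convenient to wrap that cycle once. The helper below is a sketch, not part of the shipped utilities; it assumes only the hf_token_utils functions shown above:

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

def run_with_managed_token(proxy_api_key, hf_provider, call):
    """Sketch: fetch a managed token, run call(client), report the outcome."""
    token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)
    client = InferenceClient(provider=hf_provider, api_key=token)
    try:
        result = call(client)
        report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)
        return result
    except Exception as e:
        report_token_status(token_id, "error", str(e), provider="hf", api_key=proxy_api_key)
        raise

# Example usage:
# reply = run_with_managed_token(
#     proxy_api_key, "groq",
#     lambda c: c.chat.completions.create(
#         model="openai/gpt-oss-120b",
#         messages=[{"role": "user", "content": "Hello!"}],
#     ),
# )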
📊 Provider Capabilities by Task (HF)
Task | Providers | Examples |
---|---|---|
Chat Completion (LLM) | All providers | Basic Example, Streaming |
Vision-Language (VLM) | Cerebras, Cohere, Featherless, Fireworks, Groq, Hyperbolic, Nebius, Novita, NScale, SambaNova, Together | VLM Examples |
Feature Extraction | HF-Inference, Nebius, SambaNova | Embedding Examples |
Text Generation | All providers | Generation Examples |
Image Generation | Fal-AI, HF-Inference, Nebius, NScale, Replicate, Together | Image Examples |
Video Generation | Fal-AI, Novita, Replicate | Video Examples |
Speech Recognition | Fal-AI, HF-Inference | Audio Examples |
Image Processing | Fal-AI, HF-Inference, Replicate | Processing Examples |
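The same token lifecycle applies to non-chat tasks. Here is a sketch of image generation through InferenceClient.text_to_image, assuming the same hf_token_utils helpers; the provider and model ID are illustrative choices from the table above:

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

proxy_api_key = "your_proxy_api_key_here"  # Get this from admin
token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)

client = InferenceClient(provider="fal-ai", api_key=token)  # provider is an example pick
try:
    # text_to_image returns a PIL.Image; the model ID is an example choice
    image = client.text_to_image(
        "An astronaut riding a horse on Mars",
        model="black-forest-labs/FLUX.1-schnell",
    )
    image.save("astronaut.png")
    report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)
except Exception as e:
    report_token_status(token_id, "error", str(e), provider="hf", api_key=proxy_api_key)
    raise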
📝 Examples & Code
Getting Started Examples
- Simple Chat Completion - Basic HF chat completion example
- Streaming Chat Completion - Streaming HF chat completion example
- HF Token Utilities - Helper functions for HF token management
- VC Token Utilities - Helper functions for Vercel AI Gateway token management
💡 Pro Tip: Use the download script to extract executable Python files from the documentation!
Provider-Specific Examples
- Hugging Face Providers: All HF provider examples are available in the hf-examples/provider/ directory with detailed configuration and usage patterns.
- Vercel AI Gateway: VC-specific examples and utilities are available in the vc-examples/ directory.
🎯 Navigation Guide
For New Users
- Start with Quick Start Guide
- Review Provider Overview
- Try the Simple Examples
For Developers
- Explore Provider Guides for your specific provider
- Check Token Utilities for helper functions
- Review Hugging Face Hub Integration
For Production
- Read the comprehensive guide
- Review Provider Configuration guides
- Check Examples for implementation patterns
🏗️ Architecture Overview
Your App → Token Utils → AI-Inferoxy Server → Multiple Providers
                         (token management)        ├── HF Providers
                                                   └── VC Gateway
Multi-Provider Flow:
1. Your app authenticates with AI-Inferoxy using your API key
2. Your app requests a token from AI-Inferoxy for a specific provider (HF or VC)
3. AI-Inferoxy provides a valid token from the appropriate provider pool
4. Your app uses the token with the chosen provider (Hugging Face or Vercel AI Gateway)
5. Your app reports success/error back to AI-Inferoxy (with authentication)
6. AI-Inferoxy manages token health and rotation automatically per provider (see the retry sketch below)
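One way a client can lean on that rotation is to request a fresh token and retry after reporting a failure. This is a sketch of the idea, not a prescribed API; the retry count, provider, and model are illustrative:

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

def chat_with_retry(proxy_api_key, messages, max_attempts=2):
    """Sketch: report a failing token, then retry with a fresh one from the pool."""
    last_error = None
    for _ in range(max_attempts):
        token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)
        client = InferenceClient(provider="groq", api_key=token)  # provider is illustrative
        try:
            result = client.chat.completions.create(
                model="openai/gpt-oss-120b",
                messages=messages,
            )
            report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)
            return result
        except Exception as e:
            # Reporting the error lets the proxy mark the token unhealthy
            # and hand out a different one on the next request.
            report_token_status(token_id, "error", str(e), provider="hf", api_key=proxy_api_key)
            last_error = e
    raise last_error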
Provider-Specific Features:
- Hugging Face: Extensive model ecosystem, specialized providers (Cerebras, Groq, etc.)
- Vercel AI Gateway: OpenAI-compatible API, advanced routing, caching, and analytics
Note: All client-server communication requires authentication via API keys as part of the RBAC system.
🔗 Related Resources
- Main README - Project overview and quick start
- RBAC Documentation - User management and authentication setup
- Server Documentation - Server configuration and API endpoints
- GitHub Repository - Source code and issues
- Hugging Face Hub - Model repository and inference endpoints
Getting Help
- Start with: Quick Start Guide for immediate setup
- Authentication setup: See RBAC documentation for user management
- Choose your provider: Browse provider examples for specific use cases
- Provider selection: See Provider Overview for capabilities and selection guide
- For production: Read the comprehensive guide
- Issues: Check troubleshooting sections in the guides above
- Server setup: See the main README