AI-Inferoxy Documentation
Welcome to the AI-Inferoxy documentation. This guide covers everything you need to know about using and configuring the multi-provider AI inference proxy server, which supports Hugging Face and the Vercel AI Gateway.
🚀 Quick Start
Quick Start Guide
Get up and running in under 5 minutes with basic token management and Hugging Face Hub integration.
Perfect for: First-time users, simple use cases, testing the integration
⚠️ Authentication Required
All client operations now require authentication with the AI-Inferoxy server. See the RBAC documentation for setup details.
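For example, one common pattern is to load the proxy API key from an environment variable instead of hard-coding it. This is a minimal sketch; the variable name AI_INFEROXY_API_KEY is an illustrative convention, not something the server requires:

import os

# Illustrative convention (not required by the server): keep the proxy API
# key issued by your admin in an environment variable rather than in code.
proxy_api_key = os.environ["AI_INFEROXY_API_KEY"]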
Provider Overview
Comprehensive guide to all supported providers, their capabilities, and selection criteria.
Perfect for: Choosing the right provider, understanding provider differences, planning multi-provider applications
📚 Core Documentation
Comprehensive Hugging Face Hub Integration
Complete guide with advanced patterns, error handling, and production-ready examples.
Perfect for: Production deployments, complex applications, advanced usage patterns
Main Project README
Server setup, configuration, and core proxy functionality documentation.
Perfect for: Setting up the AI-Inferoxy server, understanding core features
🌐 Supported Providers
AI-Inferoxy supports multiple AI inference providers through a unified interface:
Primary Providers
- Hugging Face Inference API - Comprehensive AI model hosting with extensive model support
- Vercel AI Gateway - Modern AI gateway with OpenAI-compatible API and advanced features
Hugging Face Provider Ecosystem
Within the Hugging Face route, AI-Inferoxy supports a wide range of inference providers through the same unified interface:
AI Model Providers
- Cerebras - High-performance AI models and vision-language capabilities
- Cohere - Advanced language models and vision AI
- Fal-AI - Generative media inference (images, video, and audio)
- Featherless AI - Efficient AI inference and text generation
- Fireworks AI - Fast AI model deployment and inference
- Groq - Ultra-fast AI inference and chat completion
- Hyperbolic - Vision-language models and AI capabilities
- Nebius - Cloud AI services and model hosting
- Novita - AI model inference and video generation
- NScale - Scalable AI inference and image generation
- Replicate - AI model deployment and inference
- SambaNova - Enterprise AI models and embeddings
- Together - Collaborative AI model hosting
Core Services
- HF-Inference - Hugging Face core inference API with comprehensive model support
- Vercel AI Gateway - OpenAI-compatible API gateway with advanced routing and caching
🔧 Key Features
- 🔐 RBAC Authentication: Secure role-based access control for all operations
- 🔄 Automatic Token Rotation: Seamlessly switch between API keys when errors occur
- 🛡️ Smart Error Handling: Intelligent detection of auth/credit errors with appropriate responses
- 📊 Usage Reporting: Track token usage and health for better key management
- 🚀 Zero Configuration: Works out of the box with sensible defaults
- 🔧 Flexible Integration: Simple functions that work with any Hugging Face workflow
- 🌐 Multi-Provider Support: Unified interface across Hugging Face and Vercel AI Gateway providers
- 🧩 Provider-Specific Features: Specialized support for each provider's unique capabilities
💻 Quick Reference
Essential Functions
# Get a managed token (requires authentication)
proxy_api_key = "your_proxy_api_key_here" # Get this from admin
# For Hugging Face providers
token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)
# For Vercel AI Gateway
token, token_id = get_proxy_token(provider="vc", api_key=proxy_api_key)
# Report token usage
report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key) # Success
report_token_status(token_id, "error", error_msg, provider="vc", api_key=proxy_api_key) # Error
Basic Usage Pattern
from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status
from vc_token_utils import get_proxy_token as get_vc_token, report_token_status as report_vc_status

# Get managed tokens (requires authentication)
proxy_api_key = "your_proxy_api_key_here"  # Get this from admin

# Option 1: Use with Hugging Face providers
token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)
client = InferenceClient(provider="cerebras", api_key=token)  # or "cohere", "groq", etc.

try:
    result = client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)
except Exception as e:
    report_token_status(token_id, "error", str(e), provider="hf", api_key=proxy_api_key)
    raise

# Option 2: Use with Vercel AI Gateway (OpenAI-compatible endpoint)
vc_token, vc_token_id = get_vc_token(provider="vc", api_key=proxy_api_key)
vc_client = InferenceClient(base_url="https://ai-gateway.vercel.sh/v1", api_key=vc_token)

try:
    vc_result = vc_client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    report_vc_status(vc_token_id, "success", provider="vc", api_key=proxy_api_key)
except Exception as e:
    report_vc_status(vc_token_id, "error", str(e), provider="vc", api_key=proxy_api_key)
    raise
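Because every request follows the same get-token / call / report cycle, it can be convenient to wrap that cycle once. The helper below is a sketch, not part of the shipped utilities; it assumes only the hf_token_utils functions shown above:

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

def run_with_managed_token(proxy_api_key, hf_provider, call):
    """Sketch: fetch a managed token, run call(client), report the outcome."""
    token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)
    client = InferenceClient(provider=hf_provider, api_key=token)
    try:
        result = call(client)
        report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)
        return result
    except Exception as e:
        report_token_status(token_id, "error", str(e), provider="hf", api_key=proxy_api_key)
        raise

# Example usage:
# reply = run_with_managed_token(
#     proxy_api_key, "groq",
#     lambda c: c.chat.completions.create(
#         model="openai/gpt-oss-120b",
#         messages=[{"role": "user", "content": "Hello!"}],
#     ),
# )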
📊 Provider Capabilities by Task (HF)
Task | Providers | Examples |
---|---|---|
Chat Completion (LLM) | All providers | Basic Example, Streaming |
Vision-Language (VLM) | Cerebras, Cohere, Featherless, Fireworks, Groq, Hyperbolic, Nebius, Novita, NScale, SambaNova, Together | VLM Examples |
Feature Extraction | HF-Inference, Nebius, SambaNova | Embedding Examples |
Text Generation | All providers | Generation Examples |
Image Generation | Fal-AI, HF-Inference, Nebius, NScale, Replicate, Together | Image Examples |
Video Generation | Fal-AI, Novita, Replicate | Video Examples |
Speech Recognition | Fal-AI, HF-Inference | Audio Examples |
Image Processing | Fal-AI, HF-Inference, Replicate | Processing Examples |
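The same token lifecycle applies to non-chat tasks. Here is a sketch of image generation through InferenceClient.text_to_image, assuming the same hf_token_utils helpers; the provider and model ID are illustrative choices from the table above:

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

proxy_api_key = "your_proxy_api_key_here"  # Get this from admin
token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)

client = InferenceClient(provider="fal-ai", api_key=token)  # provider is an example pick
try:
    # text_to_image returns a PIL.Image; the model ID is an example choice
    image = client.text_to_image(
        "An astronaut riding a horse on Mars",
        model="black-forest-labs/FLUX.1-schnell",
    )
    image.save("astronaut.png")
    report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)
except Exception as e:
    report_token_status(token_id, "error", str(e), provider="hf", api_key=proxy_api_key)
    raise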
📝 Examples & Code
Getting Started Examples
- Simple Chat Completion - Basic HF chat completion example
- Streaming Chat Completion - Streaming HF chat completion example
- HF Token Utilities - Helper functions for HF token management
- VC Token Utilities - Helper functions for Vercel AI Gateway token management
💡 Pro Tip: Use the download script to extract executable Python files from the documentation!
Provider-Specific Examples
- Hugging Face Providers: All HF provider examples are available in the hf-examples/provider/ directory with detailed configuration and usage patterns.
- Vercel AI Gateway: VC-specific examples and utilities are available in the vc-examples/ directory.
🎯 Navigation Guide
For New Users
- Start with Quick Start Guide
- Review Provider Overview
- Try the Simple Examples
For Developers
- Explore Provider Guides for your specific provider
- Check Token Utilities for helper functions
- Review Hugging Face Hub Integration
For Production
- Read the comprehensive guide
- Review Provider Configuration guides
- Check Examples for implementation patterns
🏗️ Architecture Overview
Your App → Token Utils → AI-Inferoxy Server → Multiple Providers
                         (token management)        ├── HF Providers
                                                   └── VC Gateway
Multi-Provider Flow:
1. Your app authenticates with AI-Inferoxy using your API key
2. Your app requests a token from AI-Inferoxy for a specific provider (HF or VC)
3. AI-Inferoxy provides a valid token from the appropriate provider pool
4. Your app uses the token with the chosen provider (Hugging Face or Vercel AI Gateway)
5. Your app reports success/error back to AI-Inferoxy (with authentication)
6. AI-Inferoxy manages token health and rotation automatically per provider (see the retry sketch below)
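One way a client can lean on that rotation is to request a fresh token and retry after reporting a failure. This is a sketch of the idea, not a prescribed API; the retry count, provider, and model are illustrative:

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

def chat_with_retry(proxy_api_key, messages, max_attempts=2):
    """Sketch: report a failing token, then retry with a fresh one from the pool."""
    last_error = None
    for _ in range(max_attempts):
        token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)
        client = InferenceClient(provider="groq", api_key=token)  # provider is illustrative
        try:
            result = client.chat.completions.create(
                model="openai/gpt-oss-120b",
                messages=messages,
            )
            report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)
            return result
        except Exception as e:
            # Reporting the error lets the proxy mark the token unhealthy
            # and hand out a different one on the next request.
            report_token_status(token_id, "error", str(e), provider="hf", api_key=proxy_api_key)
            last_error = e
    raise last_error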
Provider-Specific Features:
- Hugging Face: Extensive model ecosystem, specialized providers (Cerebras, Groq, etc.)
- Vercel AI Gateway: OpenAI-compatible API, advanced routing, caching, and analytics
Note: All client-server communication requires authentication via API keys as part of the RBAC system.
🔗 Related Resources
- Main README - Project overview and quick start
- RBAC Documentation - User management and authentication setup
- Server Documentation - Server configuration and API endpoints
- GitHub Repository - Source code and issues
- Hugging Face Hub - Model repository and inference endpoints
Getting Help
- Start with: Quick Start Guide for immediate setup
- Authentication setup: See RBAC documentation for user management
- Choose your provider: Browse provider examples for specific use cases
- Provider selection: See Provider Overview for capabilities and selection guide
- For production: Read the comprehensive guide
- Issues: Check troubleshooting sections in the guides above
- Server setup: See the main README