AI-Inferoxy Documentation

Welcome to the AI-Inferoxy documentation. This guide covers everything you need to use and configure the multi-provider AI inference proxy server, which supports Hugging Face and the Vercel AI Gateway.

🚀 Quick Start

Quick Start Guide

Get up and running in under 5 minutes with basic token management and HuggingFace Hub integration.

Perfect for: First-time users, simple use cases, testing the integration

⚠️ Authentication Required

All client operations now require authentication with the AI-Inferoxy server. See the RBAC documentation for setup details.

Provider Overview

Comprehensive guide to all supported providers, their capabilities, and selection criteria.

Perfect for: Choosing the right provider, understanding provider differences, planning multi-provider applications

📚 Core Documentation

Comprehensive HuggingFace Hub Integration

Complete guide with advanced patterns, error handling, and production-ready examples.

Perfect for: Production deployments, complex applications, advanced usage patterns

Main Project README

Server setup, configuration, and core proxy functionality documentation.

Perfect for: Setting up the AI-Inferoxy server, understanding core features

🌐 Supported Providers

AI-Inferoxy supports multiple AI inference providers through a unified interface:

Primary Providers

Hugging Face Provider Ecosystem

Within the Hugging Face ecosystem, AI-Inferoxy routes requests to a wide range of inference providers:

AI Model Providers

  • Cerebras - High-performance AI models and vision-language capabilities
  • Cohere - Advanced language models and vision AI
  • Featherless AI - Efficient AI inference and text generation
  • Fireworks AI - Fast AI model deployment and inference
  • Groq - Ultra-fast AI inference and chat completion
  • Hyperbolic - Vision-language models and AI capabilities
  • Nebius - Cloud AI services and model hosting
  • Novita - AI model inference and video generation
  • NScale - Scalable AI inference and image generation
  • Replicate - AI model deployment and inference
  • SambaNova - Enterprise AI models and embeddings
  • Together - Scalable inference and hosting for open-source AI models

Core Services

  • HF-Inference - Hugging Face core inference API with comprehensive model support
  • Vercel AI Gateway - OpenAI-compatible API gateway with advanced routing and caching
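
In huggingface_hub terms, the two core services are addressed differently. Below is a minimal sketch of the distinction; the tokens come from the helper functions shown in the Quick Reference later in this page, and the gateway URL is the one used throughout this guide:

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token
from vc_token_utils import get_proxy_token as get_vc_token

proxy_api_key = "your_proxy_api_key_here"  # Get this from admin

# HF-Inference is selected by provider name
hf_token, _ = get_proxy_token(provider="hf", api_key=proxy_api_key)
hf_client = InferenceClient(provider="hf-inference", api_key=hf_token)

# The Vercel AI Gateway is OpenAI-compatible, so it is selected by base URL
vc_token, _ = get_vc_token(provider="vc", api_key=proxy_api_key)
vc_client = InferenceClient(base_url="https://ai-gateway.vercel.sh", api_key=vc_token)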

🔧 Key Features

  • πŸ” RBAC Authentication: Secure role-based access control for all operations
  • πŸ”„ Automatic Token Rotation: Seamlessly switch between API keys when errors occur
  • πŸ›‘οΈ Smart Error Handling: Intelligent detection of auth/credit errors with appropriate responses
  • πŸ“Š Usage Reporting: Track token usage and health for better key management
  • πŸš€ Zero Configuration: Works out of the box with sensible defaults
  • πŸ”§ Flexible Integration: Simple functions that work with any HuggingFace workflow
  • 🌐 Multi-Provider Support: Unified interface across Hugging Face and Vercel AI Gateway providers
  • πŸ”§ Provider-Specific Features: Specialized support for each provider’s unique capabilities
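
Rotation and error reporting compose naturally into a retry loop. The following is an illustrative sketch, not part of the library: chat_with_rotation and max_attempts are hypothetical names, and only the documented get_proxy_token/report_token_status signatures are assumed.

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

def chat_with_rotation(messages, model, hf_provider="groq",
                       proxy_api_key=None, max_attempts=3):
    # On each failure, report the bad token so the proxy can rotate it out,
    # then request a fresh one on the next attempt.
    last_error = None
    for _ in range(max_attempts):
        token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)
        client = InferenceClient(provider=hf_provider, api_key=token)
        try:
            result = client.chat.completions.create(model=model, messages=messages)
            report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)
            return result
        except Exception as e:
            report_token_status(token_id, "error", str(e), provider="hf", api_key=proxy_api_key)
            last_error = e
    raise last_error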

💻 Quick Reference

Essential Functions

# Get a managed token (requires authentication)
proxy_api_key = "your_proxy_api_key_here"  # Get this from admin

# For Hugging Face providers
token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)

# For Vercel AI Gateway
token, token_id = get_proxy_token(provider="vc", api_key=proxy_api_key)

# Report token usage
report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)          # Success
report_token_status(token_id, "error", error_msg, provider="vc", api_key=proxy_api_key) # Error

Basic Usage Pattern

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status
from vc_token_utils import get_proxy_token as get_vc_token, report_token_status as report_vc_status

# Get managed token (requires authentication)
proxy_api_key = "your_proxy_api_key_here"  # Get this from admin

# Option 1: Use with Hugging Face providers
token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)
client = InferenceClient(provider="cerebras", api_key=token)  # or "cohere", "groq", etc.

# Option 2: Use with Vercel AI Gateway
vc_token, vc_token_id = get_vc_token(provider="vc", api_key=proxy_api_key)
vc_client = InferenceClient(base_url="https://ai-gateway.vercel.sh", api_key=vc_token)  # OpenAI-compatible endpoint

try:
    # Hugging Face example
    result = client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)
except Exception as e:
    report_token_status(token_id, "error", str(e), provider="hf", api_key=proxy_api_key)
    raise

try:
    # Vercel AI Gateway example
    vc_result = vc_client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    report_vc_status(vc_token_id, "success", provider="vc", api_key=proxy_api_key)
except Exception as e:
    report_vc_status(vc_token_id, "error", str(e), provider="vc", api_key=proxy_api_key)
    raise

📋 Provider Capabilities by Task (HF)

Task | Providers | Examples
Chat Completion (LLM) | All providers | Basic Example, Streaming
Vision-Language (VLM) | Cerebras, Cohere, Featherless, Fireworks, Groq, Hyperbolic, Nebius, Novita, NScale, SambaNova, Together | VLM Examples
Feature Extraction | HF-Inference, Nebius, SambaNova | Embedding Examples
Text Generation | All providers | Generation Examples
Image Generation | Fal-AI, HF-Inference, Nebius, NScale, Replicate, Together | Image Examples
Video Generation | Fal-AI, Novita, Replicate | Video Examples
Speech Recognition | Fal-AI, HF-Inference | Audio Examples
Image Processing | Fal-AI, HF-Inference, Replicate | Processing Examples
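
The token flow is the same for non-chat tasks in the table above. A sketch for feature extraction follows; the Nebius provider and the BAAI/bge-m3 model are illustrative picks, not requirements:

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

proxy_api_key = "your_proxy_api_key_here"  # Get this from admin
token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)
client = InferenceClient(provider="nebius", api_key=token)

try:
    # feature_extraction returns the embedding vector for the input text
    vector = client.feature_extraction("AI-Inferoxy rotates tokens for you.", model="BAAI/bge-m3")
    report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)
except Exception as e:
    report_token_status(token_id, "error", str(e), provider="hf", api_key=proxy_api_key)
    raise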

πŸ“ Examples & Code

Getting Started Examples

💡 Pro Tip: Use the download script to extract executable Python files from the documentation!

Provider-Specific Examples

  • Hugging Face Providers: All HF provider examples are available in the hf-examples/provider/ directory with detailed configuration and usage patterns.
  • Vercel AI Gateway: VC-specific examples and utilities are available in the vc-examples/ directory.

🎯 Navigation Guide

For New Users

  1. Start with Quick Start Guide
  2. Review Provider Overview
  3. Try the Simple Examples

For Developers

  1. Explore Provider Guides for your specific provider
  2. Check Token Utilities for helper functions
  3. Review Hugging Face Hub Integration

For Production

  1. Read the comprehensive guide
  2. Review Provider Configuration guides
  3. Check Examples for implementation patterns

πŸ—οΈ Architecture Overview

Your App → Token Utils → AI-Inferoxy Server → Multiple Providers
    ↑          ↓               ↓                     ↓
    └── Token Management ←─────┘               HF Providers
                                               VC Gateway

Multi-Provider Flow:

  1. Your app authenticates with AI-Inferoxy using your API key
  2. Your app requests a token from AI-Inferoxy for a specific provider (HF or VC)
  3. AI-Inferoxy provides a valid token from the appropriate provider pool
  4. Your app uses the token with the chosen provider (Hugging Face or Vercel AI Gateway)
  5. Your app reports success/error back to AI-Inferoxy (with authentication)
  6. AI-Inferoxy manages token health and rotation automatically per provider
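
The same flow expressed in code, with comments keyed to the steps above. This is a sketch reusing the helpers from the Quick Reference; the Groq provider choice is arbitrary:

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

proxy_api_key = "your_proxy_api_key_here"  # step 1: your AI-Inferoxy API key

# Steps 1-3: authenticate and receive a valid token from the provider pool
token, token_id = get_proxy_token(provider="hf", api_key=proxy_api_key)

# Step 4: use the token with the chosen provider
client = InferenceClient(provider="groq", api_key=token)
reply = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "ping"}]
)

# Steps 5-6: report back so the proxy can track health and rotate per provider
report_token_status(token_id, "success", provider="hf", api_key=proxy_api_key)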

Provider-Specific Features:

  • Hugging Face: Extensive model ecosystem, specialized providers (Cerebras, Groq, etc.)
  • Vercel AI Gateway: OpenAI-compatible API, advanced routing, caching, and analytics

Note: All client-server communication requires authentication via API keys as part of the RBAC system.

Getting Help

  1. Start with: Quick Start Guide for immediate setup
  2. Authentication setup: See RBAC documentation for user management
  3. Choose your provider: Browse provider examples for specific use cases
  4. Provider selection: See Provider Overview for capabilities and selection guide
  5. For production: Read the comprehensive guide
  6. Issues: Check troubleshooting sections in the guides above
  7. Server setup: See the main README