HF-Inferoxy Documentation

Welcome to the HF-Inferoxy documentation. This comprehensive guide covers everything you need to know about using and configuring the Hugging Face inference proxy server.

πŸš€ Quick Start

Quick Start Guide

Get up and running in under 5 minutes with basic token management and HuggingFace Hub integration.

Perfect for: First-time users, simple use cases, testing the integration

⚠️ Authentication Required

All client operations now require authentication with the HF-Inferoxy server. See the RBAC documentation for setup details.

Provider Overview

Comprehensive guide to all supported providers, their capabilities, and selection criteria.

Perfect for: Choosing the right provider, understanding provider differences, planning multi-provider applications

πŸ“š Core Documentation

Comprehensive HuggingFace Hub Integration

Complete guide with advanced patterns, error handling, and production-ready examples.

Perfect for: Production deployments, complex applications, advanced usage patterns

Main Project README

Server setup, configuration, and core proxy functionality documentation.

Perfect for: Setting up the HF-Inferoxy server, understanding core features

🌐 Supported Providers

HF-Inferoxy supports a wide range of HuggingFace providers through a unified interface:

AI Model Providers

  • Cerebras - High-performance AI models and vision-language capabilities
  • Cohere - Advanced language models and vision AI
  • Featherless AI - Efficient AI inference and text generation
  • Fireworks AI - Fast AI model deployment and inference
  • Groq - Ultra-fast AI inference and chat completion
  • Hyperbolic - Vision-language models and AI capabilities
  • Nebius - Cloud AI services and model hosting
  • Novita - AI model inference and video generation
  • NScale - Scalable AI inference and image generation
  • Replicate - AI model deployment and inference
  • SambaNova - Enterprise AI models and embeddings
  • Together - Collaborative AI model hosting

Core Services

  • HF-Inference - HuggingFace’s core inference API with comprehensive model support

πŸ”§ Key Features

  • πŸ” RBAC Authentication: Secure role-based access control for all operations
  • πŸ”„ Automatic Token Rotation: Seamlessly switches between API keys when errors occur (see the retry sketch after this list)
  • πŸ›‘οΈ Smart Error Handling: Intelligent detection of auth/credit errors with appropriate responses
  • πŸ“Š Usage Reporting: Track token usage and health for better key management
  • πŸš€ Zero Configuration: Works out of the box with sensible defaults
  • πŸ”§ Flexible Integration: Simple functions that work with any HuggingFace workflow
  • 🌐 Multi-Provider Support: Unified interface across all supported providers
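
The rotation and error-reporting features combine naturally into a retry loop. Below is a minimal sketch of that pattern; it uses the get_proxy_token and report_token_status helpers documented later on this page, and the retry policy (two attempts) and provider/model names are illustrative, not part of the library.

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

def chat_with_rotation(messages, proxy_api_key, max_attempts=2):
    """Retry a chat call, reporting failures so the proxy can rotate keys."""
    last_error = None
    for attempt in range(max_attempts):
        # Each attempt fetches a (possibly different) token from the pool
        token, token_id = get_proxy_token(api_key=proxy_api_key)
        client = InferenceClient(provider="groq", api_key=token)
        try:
            result = client.chat.completions.create(
                model="openai/gpt-oss-120b", messages=messages
            )
            report_token_status(token_id, "success", api_key=proxy_api_key)
            return result
        except Exception as e:
            # Reporting the error lets HF-Inferoxy mark the key unhealthy
            report_token_status(token_id, "error", str(e), api_key=proxy_api_key)
            last_error = e
    raise last_error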

πŸ’» Quick Reference

Essential Functions

from hf_token_utils import get_proxy_token, report_token_status

# Get a managed token (requires authentication)
proxy_api_key = "your_proxy_api_key_here"  # Get this from your admin
token, token_id = get_proxy_token(api_key=proxy_api_key)

# Report token usage back to HF-Inferoxy
report_token_status(token_id, "success", api_key=proxy_api_key)                  # after a successful call
report_token_status(token_id, "error", "error details", api_key=proxy_api_key)  # after a failure

Basic Usage Pattern

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

# Get managed token (requires authentication)
proxy_api_key = "your_proxy_api_key_here"  # Get this from admin
token, token_id = get_proxy_token(api_key=proxy_api_key)

# Use with any HuggingFace provider
client = InferenceClient(provider="cerebras", api_key=token)  # or "cohere", "groq", etc.

try:
    result = client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    report_token_status(token_id, "success", api_key=proxy_api_key)  # Report success
except Exception as e:
    report_token_status(token_id, "error", str(e), api_key=proxy_api_key)  # Report error
    raise
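
If you need incremental output, the same pattern supports streaming: pass stream=True and iterate over the chunks. A minimal sketch, reusing the client, token, and illustrative model from the block above:

stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental piece of the assistant's reply
    print(chunk.choices[0].delta.content or "", end="")
report_token_status(token_id, "success", api_key=proxy_api_key)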

πŸ“‹ Provider Capabilities by Task

| Task | Providers | Examples |
|------|-----------|----------|
| Chat Completion (LLM) | All providers | Basic Example, Streaming |
| Vision-Language (VLM) | Cerebras, Cohere, Featherless, Fireworks, Groq, Hyperbolic, Nebius, Novita, NScale, SambaNova, Together | VLM Examples |
| Feature Extraction | HF-Inference, Nebius, SambaNova | Embedding Examples |
| Text Generation | All providers | Generation Examples |
| Image Generation | Fal-AI, HF-Inference, Nebius, NScale, Replicate, Together | Image Examples |
| Video Generation | Fal-AI, Novita, Replicate | Video Examples |
| Speech Recognition | Fal-AI, HF-Inference | Audio Examples |
| Image Processing | Fal-AI, HF-Inference, Replicate | Processing Examples |
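
As one concrete instance from the table, feature extraction follows the same token-management pattern as chat. The sketch below assumes the HF-Inference provider and an illustrative embedding model:

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

proxy_api_key = "your_proxy_api_key_here"
token, token_id = get_proxy_token(api_key=proxy_api_key)

client = InferenceClient(provider="hf-inference", api_key=token)
try:
    # Returns a vector of floats representing the input text
    embedding = client.feature_extraction(
        "HF-Inferoxy rotates tokens automatically.",
        model="sentence-transformers/all-MiniLM-L6-v2",
    )
    report_token_status(token_id, "success", api_key=proxy_api_key)
except Exception as e:
    report_token_status(token_id, "error", str(e), api_key=proxy_api_key)
    raise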

πŸ“ Examples & Code

Getting Started Examples

πŸ’‘ Pro Tip: Use the download script to extract executable Python files from the documentation!

Provider-Specific Examples

All provider examples are available in the examples/provider/ directory with detailed configuration and usage patterns.

🎯 Navigation Guide

For New Users

  1. Start with Quick Start Guide
  2. Review Provider Overview
  3. Try the Simple Examples

For Developers

  1. Explore Provider Guides for your specific provider
  2. Check Token Utilities for helper functions
  3. Review Hugging Face Hub Integration

For Production

  1. Read the comprehensive guide
  2. Review Provider Configuration guides
  3. Check Examples for implementation patterns

πŸ—οΈ Architecture Overview

Your App β†’ hf_token_utils β†’ HF-Inferoxy Server β†’ HuggingFace Providers
    ↑                             β”‚
    └────── Token Management β”€β”€β”€β”€β”€β”˜

  1. Your app authenticates with HF-Inferoxy using your API key
  2. Your app requests a token from HF-Inferoxy (with authentication)
  3. HF-Inferoxy provides a valid token from its pool
  4. Your app uses the token with any HuggingFace provider
  5. Your app reports success/error back to HF-Inferoxy (with authentication)
  6. HF-Inferoxy manages token health and rotation automatically
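
Steps 2 through 5 compose naturally into a context manager. The helper below is a hypothetical convenience wrapper (it is not part of hf_token_utils) showing how the flow fits together:

from contextlib import contextmanager
from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

@contextmanager
def managed_token(proxy_api_key):
    # Steps 1-3: authenticate and fetch a healthy token from the pool
    token, token_id = get_proxy_token(api_key=proxy_api_key)
    try:
        yield token
        # Step 5: report success so the key stays in rotation
        report_token_status(token_id, "success", api_key=proxy_api_key)
    except Exception as e:
        # Step 5: report the failure so HF-Inferoxy can rotate the key
        report_token_status(token_id, "error", str(e), api_key=proxy_api_key)
        raise

# Step 4: use the token with any provider inside the block
with managed_token("your_proxy_api_key_here") as token:
    client = InferenceClient(provider="cerebras", api_key=token)
    result = client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[{"role": "user", "content": "Hello!"}],
    )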

Note: All client-server communication requires authentication via API keys as part of the RBAC system.

πŸ†˜ Getting Help

  1. Start with: Quick Start Guide for immediate setup
  2. Authentication setup: See RBAC documentation for user management
  3. Choose your provider: Browse provider examples for specific use cases
  4. Provider selection: See Provider Overview for capabilities and selection guide
  5. For production: Read the comprehensive guide
  6. Issues: Check troubleshooting sections in the guides above
  7. Server setup: See the main README