HF-Inferoxy Documentation

Welcome to the HF-Inferoxy documentation. This comprehensive guide covers everything you need to know about using and configuring the Hugging Face inference proxy server.

πŸš€ Quick Start

Quick Start Guide

Get up and running in under 5 minutes with basic token management and HuggingFace Hub integration.

Perfect for: First-time users, simple use cases, testing the integration

⚠️ Authentication Required

All client operations now require authentication with the HF-Inferoxy server. See the RBAC documentation for setup details.

Provider Overview

Comprehensive guide to all supported providers, their capabilities, and selection criteria.

Perfect for: Choosing the right provider, understanding provider differences, planning multi-provider applications

πŸ“š Core Documentation

Comprehensive HuggingFace Hub Integration

Complete guide with advanced patterns, error handling, and production-ready examples.

Perfect for: Production deployments, complex applications, advanced usage patterns

Main Project README

Server setup, configuration, and core proxy functionality documentation.

Perfect for: Setting up the HF-Inferoxy server, understanding core features

🌐 Supported Providers

HF-Inferoxy supports a wide range of HuggingFace providers through a unified interface:

AI Model Providers

  • Cerebras - High-performance AI models and vision-language capabilities
  • Cohere - Advanced language models and vision AI
  • Featherless AI - Efficient AI inference and text generation
  • Fireworks AI - Fast AI model deployment and inference
  • Groq - Ultra-fast AI inference and chat completion
  • Hyperbolic - Vision-language models and AI capabilities
  • Nebius - Cloud AI services and model hosting
  • Novita - AI model inference and video generation
  • NScale - Scalable AI inference and image generation
  • Replicate - AI model deployment and inference
  • SambaNova - Enterprise AI models and embeddings
  • Together - Collaborative AI model hosting

Core Services

  • HF-Inference - HuggingFace’s core inference API with comprehensive model support

πŸ”§ Key Features

  • πŸ” RBAC Authentication: Secure role-based access control for all operations
  • πŸ”„ Automatic Token Rotation: Seamlessly switches between API keys when errors occur (see the retry sketch after this list)
  • πŸ›‘οΈ Smart Error Handling: Intelligent detection of auth/credit errors with appropriate responses
  • πŸ“Š Usage Reporting: Track token usage and health for better key management
  • πŸš€ Zero Configuration: Works out of the box with sensible defaults
  • πŸ”§ Flexible Integration: Simple functions that work with any HuggingFace workflow
  • 🌐 Multi-Provider Support: Unified interface across all supported providers
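
The rotation and error-reporting features combine naturally into a retry loop. Below is a minimal sketch of that pattern; it uses the get_proxy_token and report_token_status helpers documented later on this page, and the retry policy (two attempts) and provider/model names are illustrative, not part of the library.

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

def chat_with_rotation(messages, proxy_api_key, max_attempts=2):
    """Retry a chat call, reporting failures so the proxy can rotate keys."""
    last_error = None
    for attempt in range(max_attempts):
        # Each attempt fetches a (possibly different) token from the pool
        token, token_id = get_proxy_token(api_key=proxy_api_key)
        client = InferenceClient(provider="groq", api_key=token)
        try:
            result = client.chat.completions.create(
                model="openai/gpt-oss-120b", messages=messages
            )
            report_token_status(token_id, "success", api_key=proxy_api_key)
            return result
        except Exception as e:
            # Reporting the error lets HF-Inferoxy mark the key unhealthy
            report_token_status(token_id, "error", str(e), api_key=proxy_api_key)
            last_error = e
    raise last_error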

πŸ’» Quick Reference

Essential Functions

from hf_token_utils import get_proxy_token, report_token_status

# Get a managed token (requires authentication)
proxy_api_key = "your_proxy_api_key_here"  # Get this from your admin
token, token_id = get_proxy_token(api_key=proxy_api_key)

# Report token usage back to HF-Inferoxy
report_token_status(token_id, "success", api_key=proxy_api_key)                  # after a successful call
report_token_status(token_id, "error", "error details", api_key=proxy_api_key)  # after a failure

Basic Usage Pattern

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

# Get managed token (requires authentication)
proxy_api_key = "your_proxy_api_key_here"  # Get this from admin
token, token_id = get_proxy_token(api_key=proxy_api_key)

# Use with any HuggingFace provider
client = InferenceClient(provider="cerebras", api_key=token)  # or "cohere", "groq", etc.

try:
    result = client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    report_token_status(token_id, "success", api_key=proxy_api_key)  # Report success
except Exception as e:
    report_token_status(token_id, "error", str(e), api_key=proxy_api_key)  # Report error
    raise
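
If you need incremental output, the same pattern supports streaming: pass stream=True and iterate over the chunks. A minimal sketch, reusing the client, token, and illustrative model from the block above:

stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental piece of the assistant's reply
    print(chunk.choices[0].delta.content or "", end="")
report_token_status(token_id, "success", api_key=proxy_api_key)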

πŸ“‹ Provider Capabilities by Task

| Task | Providers | Examples |
|------|-----------|----------|
| Chat Completion (LLM) | All providers | Basic Example, Streaming |
| Vision-Language (VLM) | Cerebras, Cohere, Featherless, Fireworks, Groq, Hyperbolic, Nebius, Novita, NScale, SambaNova, Together | VLM Examples |
| Feature Extraction | HF-Inference, Nebius, SambaNova | Embedding Examples |
| Text Generation | All providers | Generation Examples |
| Image Generation | Fal-AI, HF-Inference, Nebius, NScale, Replicate, Together | Image Examples |
| Video Generation | Fal-AI, Novita, Replicate | Video Examples |
| Speech Recognition | Fal-AI, HF-Inference | Audio Examples |
| Image Processing | Fal-AI, HF-Inference, Replicate | Processing Examples |
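
As one concrete instance from the table, feature extraction follows the same token-management pattern as chat. The sketch below assumes the HF-Inference provider and an illustrative embedding model:

from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

proxy_api_key = "your_proxy_api_key_here"
token, token_id = get_proxy_token(api_key=proxy_api_key)

client = InferenceClient(provider="hf-inference", api_key=token)
try:
    # Returns a vector of floats representing the input text
    embedding = client.feature_extraction(
        "HF-Inferoxy rotates tokens automatically.",
        model="sentence-transformers/all-MiniLM-L6-v2",
    )
    report_token_status(token_id, "success", api_key=proxy_api_key)
except Exception as e:
    report_token_status(token_id, "error", str(e), api_key=proxy_api_key)
    raise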

πŸ“ Examples & Code

Getting Started Examples

πŸ’‘ Pro Tip: Use the download script to extract executable Python files from the documentation!

Provider-Specific Examples

All provider examples are available in the examples/provider/ directory with detailed configuration and usage patterns.

🎯 Navigation Guide

For New Users

  1. Start with Quick Start Guide
  2. Review Provider Overview
  3. Try the Simple Examples

For Developers

  1. Explore Provider Guides for your specific provider
  2. Check Token Utilities for helper functions
  3. Review Hugging Face Hub Integration

For Production

  1. Read the comprehensive guide
  2. Review Provider Configuration guides
  3. Check Examples for implementation patterns

πŸ—οΈ Architecture Overview

Your App β†’ hf_token_utils β†’ HF-Inferoxy Server β†’ HuggingFace Providers
    ↑                             β”‚
    └────── Token Management β”€β”€β”€β”€β”€β”˜

  1. Your app authenticates with HF-Inferoxy using your API key
  2. Your app requests a token from HF-Inferoxy (with authentication)
  3. HF-Inferoxy provides a valid token from its pool
  4. Your app uses the token with any HuggingFace provider
  5. Your app reports success/error back to HF-Inferoxy (with authentication)
  6. HF-Inferoxy manages token health and rotation automatically
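
Steps 2 through 5 compose naturally into a context manager. The helper below is a hypothetical convenience wrapper (it is not part of hf_token_utils) showing how the flow fits together:

from contextlib import contextmanager
from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

@contextmanager
def managed_token(proxy_api_key):
    # Steps 1-3: authenticate and fetch a healthy token from the pool
    token, token_id = get_proxy_token(api_key=proxy_api_key)
    try:
        yield token
        # Step 5: report success so the key stays in rotation
        report_token_status(token_id, "success", api_key=proxy_api_key)
    except Exception as e:
        # Step 5: report the failure so HF-Inferoxy can rotate the key
        report_token_status(token_id, "error", str(e), api_key=proxy_api_key)
        raise

# Step 4: use the token with any provider inside the block
with managed_token("your_proxy_api_key_here") as token:
    client = InferenceClient(provider="cerebras", api_key=token)
    result = client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[{"role": "user", "content": "Hello!"}],
    )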

Note: All client-server communication requires authentication via API keys as part of the RBAC system.

πŸ†˜ Getting Help

  1. Start with: Quick Start Guide for immediate setup
  2. Authentication setup: See RBAC documentation for user management
  3. Choose your provider: Browse provider examples for specific use cases
  4. Provider selection: See Provider Overview for capabilities and selection guide
  5. For production: Read the comprehensive guide
  6. Issues: Check troubleshooting sections in the guides above
  7. Server setup: See the main README