# HF-Inferoxy Documentation

Welcome to the HF-Inferoxy documentation. This comprehensive guide covers everything you need to know about using and configuring the Hugging Face inference proxy server.
## Quick Start

### Quick Start Guide

Get up and running in under 5 minutes with basic token management and HuggingFace Hub integration.

**Perfect for:** First-time users, simple use cases, testing the integration

**Authentication Required:** All client operations now require authentication with the HF-Inferoxy server. See the RBAC documentation for setup details.
### Provider Overview

Comprehensive guide to all supported providers, their capabilities, and selection criteria.

**Perfect for:** Choosing the right provider, understanding provider differences, planning multi-provider applications
## Core Documentation

### Comprehensive HuggingFace Hub Integration

Complete guide with advanced patterns, error handling, and production-ready examples.

**Perfect for:** Production deployments, complex applications, advanced usage patterns

### Main Project README

Server setup, configuration, and core proxy functionality documentation.

**Perfect for:** Setting up the HF-Inferoxy server, understanding core features
## Supported Providers

HF-Inferoxy supports a wide range of HuggingFace providers through a unified interface:

### AI Model Providers
- Cerebras - High-performance AI models and vision-language capabilities
- Cohere - Advanced language models and vision AI
- Featherless AI - Efficient AI inference and text generation
- Fireworks AI - Fast AI model deployment and inference
- Groq - Ultra-fast AI inference and chat completion
- Hyperbolic - Vision-language models and AI capabilities
- Nebius - Cloud AI services and model hosting
- Novita - AI model inference and video generation
- NScale - Scalable AI inference and image generation
- Replicate - AI model deployment and inference
- SambaNova - Enterprise AI models and embeddings
- Together - Collaborative AI model hosting
### Core Services

- HF-Inference - HuggingFace's core inference API with comprehensive model support
## Key Features

- **RBAC Authentication**: Secure role-based access control for all operations
- **Automatic Token Rotation**: Seamlessly switch between API keys when errors occur
- **Smart Error Handling**: Intelligent detection of auth/credit errors with appropriate responses
- **Usage Reporting**: Track token usage and health for better key management
- **Zero Configuration**: Works out of the box with sensible defaults
- **Flexible Integration**: Simple functions that work with any HuggingFace workflow
- **Multi-Provider Support**: Unified interface across all supported providers
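Taken together, token rotation and usage reporting suggest a simple retry loop: fetch a managed token, attempt the call, report the outcome, and fetch a fresh token on failure. A minimal sketch, where the `with_managed_token` wrapper itself is hypothetical (not part of `hf_token_utils`) and the `get_token`/`report` callables stand in for `get_proxy_token`/`report_token_status`:

```python
# Hypothetical retry helper: fetch a managed token, run the call,
# report the outcome, and rotate to a fresh token on failure.
def with_managed_token(call, get_token, report, max_attempts=3):
    last_error = None
    for _ in range(max_attempts):
        token, token_id = get_token()       # e.g. get_proxy_token(api_key=...)
        try:
            result = call(token)            # your provider call, using the token
        except Exception as e:              # report the error, then retry
            report(token_id, "error", str(e))
            last_error = e
        else:
            report(token_id, "success", None)
            return result
    raise last_error
```

Because the server rotates to a healthy token after each error report, each retry attempt sees a different key from the pool.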
## Quick Reference

### Essential Functions

```python
from hf_token_utils import get_proxy_token, report_token_status

# Get a managed token (requires authentication)
proxy_api_key = "your_proxy_api_key_here"  # Get this from admin
token, token_id = get_proxy_token(api_key=proxy_api_key)

# Report token usage
report_token_status(token_id, "success", api_key=proxy_api_key)            # Success
report_token_status(token_id, "error", error_msg, api_key=proxy_api_key)   # Error (error_msg is your error message string)
```
### Basic Usage Pattern

```python
from huggingface_hub import InferenceClient
from hf_token_utils import get_proxy_token, report_token_status

# Get managed token (requires authentication)
proxy_api_key = "your_proxy_api_key_here"  # Get this from admin
token, token_id = get_proxy_token(api_key=proxy_api_key)

# Use with any HuggingFace provider
client = InferenceClient(provider="cerebras", api_key=token)  # or "cohere", "groq", etc.

try:
    result = client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    report_token_status(token_id, "success", api_key=proxy_api_key)  # Report success
except Exception as e:
    report_token_status(token_id, "error", str(e), api_key=proxy_api_key)  # Report error
    raise
```
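For streaming responses, the same pattern applies with `stream=True`; the client then yields incremental chunks rather than a single result. Below is a sketch of the chunk-assembly step, shown with stand-in chunk objects so it runs without a server (the chunk shape mirrors the chat-completion delta format):

```python
from types import SimpleNamespace

def collect_stream(chunks):
    # Assemble the full reply from streamed chat-completion chunks;
    # each chunk carries an incremental text delta.
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

# With a real client this would be:
#   stream = client.chat.completions.create(model=..., messages=..., stream=True)
#   text = collect_stream(stream)
fake = [SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=t))])
        for t in ["Hel", "lo", "!"]]
print(collect_stream(fake))  # -> Hello!
```

Remember to report success only after the stream is fully consumed, since provider errors can surface mid-stream.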
## Provider Capabilities by Task

| Task | Providers | Examples |
|---|---|---|
| Chat Completion (LLM) | All providers | Basic Example, Streaming |
| Vision-Language (VLM) | Cerebras, Cohere, Featherless, Fireworks, Groq, Hyperbolic, Nebius, Novita, NScale, SambaNova, Together | VLM Examples |
| Feature Extraction | HF-Inference, Nebius, SambaNova | Embedding Examples |
| Text Generation | All providers | Generation Examples |
| Image Generation | Fal-AI, HF-Inference, Nebius, NScale, Replicate, Together | Image Examples |
| Video Generation | Fal-AI, Novita, Replicate | Video Examples |
| Speech Recognition | Fal-AI, HF-Inference | Audio Examples |
| Image Processing | Fal-AI, HF-Inference, Replicate | Processing Examples |
## Examples & Code

### Getting Started Examples

- Simple Chat Completion - Basic chat completion example
- Streaming Chat Completion - Streaming chat completion example
- Token Utilities - Helper functions for token management

**Pro Tip:** Use the download script to extract executable Python files from the documentation!

### Provider-Specific Examples

All provider examples are available in the `examples/provider/` directory with detailed configuration and usage patterns.
## Navigation Guide

### For New Users

1. Start with the Quick Start Guide
2. Review the Provider Overview
3. Try the Simple Examples

### For Developers

1. Explore the Provider Guides for your specific provider
2. Check Token Utilities for helper functions
3. Review the Hugging Face Hub Integration guide

### For Production

1. Read the comprehensive guide
2. Review the Provider Configuration guides
3. Check the Examples for implementation patterns
## Architecture Overview

```
Your App ──> hf_token_utils ──> HF-Inferoxy Server ──> HuggingFace Providers
    ^                                    │
    └────────── Token Management ────────┘
```
1. Your app authenticates with HF-Inferoxy using your API key
2. Your app requests a token from HF-Inferoxy (with authentication)
3. HF-Inferoxy provides a valid token from its pool
4. Your app uses the token with any HuggingFace provider
5. Your app reports success/error back to HF-Inferoxy (with authentication)
6. HF-Inferoxy manages token health and rotation automatically

**Note:** All client-server communication requires authentication via API keys as part of the RBAC system.
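Under the hood, this flow is a pair of authenticated HTTP round trips between `hf_token_utils` and the proxy. A minimal sketch of how such a request could be built; the `/api/report` path, the `Bearer` scheme, and the payload fields are illustrative assumptions, not the documented HF-Inferoxy API:

```python
import json
import urllib.request

PROXY_URL = "http://localhost:8000"  # assumed default; point at your deployment

def build_request(path, api_key, payload):
    # Every request carries the client's API key, per the RBAC requirement.
    return urllib.request.Request(
        PROXY_URL + path,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# urllib.request.urlopen(build_request(...)) would then perform the round trip.
req = build_request("/api/report", "my_key", {"token_id": "t1", "status": "success"})
print(req.get_header("Authorization"))  # -> Bearer my_key
```

For the actual endpoints and payloads, see the Server Documentation below.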
## Related Resources
- Main README - Project overview and quick start
- RBAC Documentation - User management and authentication setup
- Server Documentation - Server configuration and API endpoints
- GitHub Repository - Source code and issues
- Hugging Face Hub - Model repository and inference endpoints
### Getting Help
- Start with: Quick Start Guide for immediate setup
- Authentication setup: See RBAC documentation for user management
- Choose your provider: Browse provider examples for specific use cases
- Provider selection: See Provider Overview for capabilities and selection guide
- For production: Read the comprehensive guide
- Issues: Check troubleshooting sections in the guides above
- Server setup: See the main README