# HF-Inferoxy Provider Overview

This document provides a comprehensive overview of all providers supported by HF-Inferoxy, their capabilities, and use cases.

## Available Providers

HF-Inferoxy supports a wide range of HuggingFace providers, each offering unique capabilities and specializations.

### 🤖 AI Model Providers
#### Cerebras

- Capabilities: LLM, VLM
- Strengths: High-performance inference, vision-language support
- Best For: Production AI applications, vision-language tasks
- Example File: `cerebras.md`
- Key Models: `openai/gpt-oss-120b`, `meta-llama/Llama-4-Scout-17B-16E-Instruct`
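Whichever provider you choose, the call shape stays the same. The sketch below uses the `huggingface_hub` `InferenceClient` with Cerebras as an example; the `HF_TOKEN` environment variable stands in for whatever token HF-Inferoxy provisions, which is an assumption since token delivery depends on your proxy setup:

```python
import os

def build_messages(prompt: str) -> list:
    """Assemble an OpenAI-style chat-completion message list."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]

if __name__ == "__main__":
    # Requires `huggingface_hub` to be installed; token management is
    # assumed to be handled by HF-Inferoxy behind the scenes.
    from huggingface_hub import InferenceClient

    client = InferenceClient(provider="cerebras", api_key=os.environ["HF_TOKEN"])
    response = client.chat_completion(
        messages=build_messages("Summarize HF-Inferoxy in one sentence."),
        model="openai/gpt-oss-120b",
        max_tokens=128,
    )
    print(response.choices[0].message.content)
```

Swapping providers means changing only the `provider` and `model` arguments; the message format and response handling are unchanged.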
#### Cohere

- Capabilities: LLM, VLM
- Strengths: Multilingual support, advanced language understanding
- Best For: International applications, complex language tasks
- Example File: `cohere.md`
- Key Models: `CohereLabs/c4ai-command-r-plus`, `CohereLabs/command-a-vision-07-2025`

#### Featherless AI

- Capabilities: LLM, VLM, Text Generation
- Strengths: Efficient inference, cost-effective
- Best For: Budget-conscious applications, high-volume processing
- Example File: `featherless-ai.md`
- Key Models: `moonshotai/Kimi-K2-Instruct`, `google/gemma-3-27b-it`

#### Fireworks AI

- Capabilities: LLM, VLM
- Strengths: High-speed inference, optimized performance
- Best For: Real-time applications, low-latency requirements
- Example File: `fireworks-ai.md`
- Key Models: `openai/gpt-oss-120b`, `meta-llama/Llama-4-Scout-17B-16E-Instruct`

#### Groq

- Capabilities: LLM, VLM
- Strengths: Extremely fast inference, low latency
- Best For: Real-time applications, interactive AI
- Example File: `groq.md`
- Key Models: `openai/gpt-oss-120b`, `meta-llama/Llama-4-Scout-17B-16E-Instruct`
#### Hyperbolic

- Capabilities: LLM, VLM
- Strengths: Vision-language models, image understanding
- Best For: Computer vision applications, image analysis
- Example File: `hyperbolic.md`
- Key Models: `openai/gpt-oss-120b`, `Qwen/Qwen2.5-VL-7B-Instruct`

#### Nebius

- Capabilities: LLM, VLM, Feature Extraction, Text Generation, Text-to-Image
- Strengths: Cloud-native services, enterprise features
- Best For: Enterprise applications, cloud deployments
- Example File: `nebius.md`
- Key Models: `Qwen/Qwen3-235B-A22B-Instruct-2507`, `google/gemma-3-27b-it`, `Qwen/Qwen3-Embedding-8B`

#### Novita

- Capabilities: LLM, VLM, Text-to-Video
- Strengths: Video generation, multimedia AI
- Best For: Video creation, multimedia applications
- Example File: `novita.md`
- Key Models: `openai/gpt-oss-120b`, `zai-org/GLM-4.5V`, `Wan-AI/Wan2.1-T2V-14B`

#### NScale

- Capabilities: LLM, VLM, Text-to-Image
- Strengths: Scalable inference, image generation
- Best For: High-scale applications, image creation
- Example File: `nscale.md`
- Key Models: `openai/gpt-oss-120b`, `meta-llama/Llama-4-Scout-17B-16E-Instruct`, `stabilityai/stable-diffusion-xl-base-1.0`

#### Replicate

- Capabilities: Text-to-Image, Image-to-Image, Text-to-Video
- Strengths: Easy model deployment, specialized AI tasks
- Best For: AI model deployment, specialized applications
- Example File: `replicate.md`
- Key Models: `black-forest-labs/FLUX.1-Kontext-dev`, `Qwen/Qwen-Image`, `Wan-AI/Wan2.2-TI2V-5B`

#### SambaNova

- Capabilities: LLM, VLM, Feature Extraction
- Strengths: Enterprise-grade AI, custom model support
- Best For: Enterprise applications, custom AI solutions
- Example File: `sambanova.md`
- Key Models: `meta-llama/Llama-3.1-8B-Instruct`, `meta-llama/Llama-4-Maverick-17B-128E-Instruct`, `intfloat/e5-mistral-7b-instruct`

#### Together

- Capabilities: LLM, VLM, Text Generation, Text-to-Image
- Strengths: Collaborative hosting, wide model support
- Best For: Research applications, collaborative projects
- Example File: `together.md`
- Key Models: `openai/gpt-oss-120b`, `meta-llama/Llama-4-Scout-17B-16E-Instruct`, `black-forest-labs/FLUX.1-dev`
### 🔧 Core Services

#### HF-Inference

- Capabilities: All tasks via the core API, including chat completion, text generation, text-to-image, feature extraction, fill mask, image classification, image segmentation, object detection, question answering, summarization, table question answering, text classification, token classification, translation, zero-shot classification, and automatic speech recognition
- Strengths: Comprehensive model support, core HuggingFace functionality
- Best For: General AI applications, model exploration
- Example File: `hf-inference.md`
- Key Models: All HuggingFace models, comprehensive task support
### 🎯 Specialized AI Providers

#### Fal-AI

- Capabilities: Automatic Speech Recognition, Text-to-Image, Image-to-Image, Text-to-Video
- Strengths: Multimedia AI, audio processing
- Best For: Audio applications, multimedia content creation
- Example File: `fal-ai.md`
- Key Models: `openai/whisper-large-v3`, `Qwen/Qwen-Image`, `Wan-AI/Wan2.2-T2V-A14B`
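Fal-AI is reached through the same client interface as the chat providers. A minimal transcription sketch, assuming `huggingface_hub` is installed and a proxy-managed token is available in `HF_TOKEN`; the `check_audio_path` helper and its format list are illustrative, not part of HF-Inferoxy:

```python
import os

# Illustrative allow-list; check the provider docs for actual format support.
SUPPORTED_AUDIO = {".wav", ".mp3", ".flac", ".m4a"}

def check_audio_path(path: str) -> str:
    """Reject unsupported audio formats before spending a provider call."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_AUDIO:
        raise ValueError(f"unsupported audio format: {ext}")
    return path

if __name__ == "__main__":
    from huggingface_hub import InferenceClient

    client = InferenceClient(provider="fal-ai", api_key=os.environ["HF_TOKEN"])
    result = client.automatic_speech_recognition(
        check_audio_path("meeting.wav"),  # hypothetical local file
        model="openai/whisper-large-v3",
    )
    print(result.text)
```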
## Provider Capabilities by Task

### Text Generation & Chat Completion

| Task | Providers | Best Options |
|---|---|---|
| Chat Completion (LLM) | Cerebras, Cohere, Featherless AI, Fireworks AI, Groq, Hyperbolic, Nebius, Novita, NScale, SambaNova, Together | Cerebras, Cohere, Groq, Together |
| Text Generation | Featherless AI, HF-Inference, Nebius, Together | HF-Inference, Featherless AI, Nebius |
| Advanced Language Models | Cohere, Together | Cohere for multilingual, Together for variety |
### Vision & Multimodal AI

| Task | Providers | Best Options |
|---|---|---|
| Vision-Language (VLM) | Cerebras, Cohere, Featherless AI, Fireworks AI, Groq, Hyperbolic, Nebius, Novita, NScale, SambaNova, Together | Cerebras, Cohere, Groq, Hyperbolic |
| Text-to-Image | Fal-AI, HF-Inference, Nebius, NScale, Replicate, Together | Fal-AI, Replicate, Together |
| Image-to-Image | Fal-AI, Replicate | Replicate for advanced processing |
| Text-to-Video | Fal-AI, Novita, Replicate | Novita for high quality, Replicate for variety |
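The text-to-image providers in the table above share one call shape. A hedged sketch using Together and `FLUX.1-dev`, assuming `huggingface_hub` and a proxy-managed `HF_TOKEN`; the `with_style` helper is purely illustrative:

```python
import os

def with_style(prompt: str, style: str = "") -> str:
    """Append an optional style tag to a generation prompt (illustrative helper)."""
    return f"{prompt}, {style}" if style else prompt

if __name__ == "__main__":
    from huggingface_hub import InferenceClient

    client = InferenceClient(provider="together", api_key=os.environ["HF_TOKEN"])
    # text_to_image returns a PIL.Image.Image
    image = client.text_to_image(
        with_style("a lighthouse at dawn", "watercolor"),
        model="black-forest-labs/FLUX.1-dev",
    )
    image.save("lighthouse.png")
```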
### AI Infrastructure & Processing

| Task | Providers | Best Options |
|---|---|---|
| Feature Extraction | HF-Inference, Nebius, SambaNova | HF-Inference for variety, SambaNova for enterprise |
| Automatic Speech Recognition | Fal-AI, HF-Inference | Fal-AI for specialized, HF-Inference for general |
| Fill Mask | HF-Inference | Comprehensive model support |
| Image Classification | HF-Inference | Wide range of models |
| Image Segmentation | HF-Inference | Comprehensive model support |
| Object Detection | HF-Inference | Comprehensive model support |
| Question Answering | HF-Inference | Wide range of models |
| Summarization | HF-Inference | Comprehensive model support |
| Table Question Answering | HF-Inference | Specialized models |
| Text Classification | HF-Inference | Wide range of models |
| Token Classification | HF-Inference | Named entity recognition |
| Translation | HF-Inference | Multilingual support |
| Zero-shot Classification | HF-Inference | Flexible classification |
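The "Best Options" columns in the tables above can be turned into a simple routing table in application code. This is an illustrative sketch, not part of HF-Inferoxy; the ordering of each list encodes the preferences listed in the tables:

```python
# Provider lists taken from the capability tables (order = preference).
TASK_PROVIDERS = {
    "chat-completion": ["cerebras", "cohere", "groq", "together"],
    "text-to-image": ["fal-ai", "replicate", "together"],
    "text-to-video": ["novita", "replicate", "fal-ai"],
    "automatic-speech-recognition": ["fal-ai", "hf-inference"],
    "feature-extraction": ["hf-inference", "sambanova"],
}

def pick_provider(task, exclude=frozenset()):
    """Return the first preferred provider for a task, skipping excluded ones."""
    for provider in TASK_PROVIDERS.get(task, []):
        if provider not in exclude:
            return provider
    raise LookupError(f"no provider available for task: {task}")
```

The `exclude` set makes it easy to fail over, e.g. skipping a provider that just returned a rate-limit error.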
## Provider Selection Guide

### For Production Applications

- High Performance: Cerebras, Groq
- Enterprise Features: Nebius, SambaNova
- Cost Optimization: Featherless AI, Together

### For Research & Development

- Model Variety: HF-Inference, Together
- Specialized Tasks: Fal-AI, Replicate
- Cutting-edge Models: Cohere, Hyperbolic

### For Multimedia Applications

- Text-to-Image: Fal-AI, Replicate, Together, HF-Inference, Nebius, NScale
- Image-to-Image: Fal-AI, Replicate
- Text-to-Video: Novita, Replicate, Fal-AI
- Audio Processing: Fal-AI, HF-Inference

### For International Applications

- Multilingual Support: Cohere, HF-Inference
- Global Infrastructure: Together, Nebius
## Performance Characteristics

### Speed & Latency

- Ultra-fast: Groq (sub-second responses)
- Fast: Fireworks AI, Cerebras
- Standard: HF-Inference, Together
- Optimized: Featherless AI, NScale

### Scalability

- High Scale: NScale, SambaNova
- Cloud-native: Nebius, Together
- Flexible: HF-Inference, Replicate

### Cost Efficiency

- Budget-friendly: Featherless AI, Together
- Performance-focused: Cerebras, Groq
- Enterprise: Nebius, SambaNova
## Getting Started with Providers

1. **Choose your provider** based on the use cases and capabilities described above.
2. **Check provider examples**: each provider has detailed examples in the `examples/provider/` directory.
3. **Test with simple examples**: use the quickstart guide to verify basic functionality.
4. **Implement advanced patterns**: see the comprehensive integration guide for production patterns.
## Provider-Specific Configuration

### Environment Variables

```bash
# Set the proxy URL (optional, defaults to localhost:8000)
export HF_PROXY_URL="http://your-proxy-server:8000"

# Provider-specific settings can be configured in your application
```
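Application code can mirror the same default when resolving the proxy URL. A small sketch; the `proxy_url` helper is illustrative, not part of HF-Inferoxy:

```python
import os

def proxy_url(env=None):
    """Resolve the HF-Inferoxy URL, falling back to the local default."""
    env = os.environ if env is None else env
    return env.get("HF_PROXY_URL", "http://localhost:8000")
```

Passing an explicit `env` mapping keeps the helper easy to unit-test without touching the real environment.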
### Error Handling

Each provider may surface its own error patterns. HF-Inferoxy automatically handles:

- Authentication errors (401)
- Credit exhaustion (402)
- Rate limiting (429)
- Server errors (5xx)
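If your application wants to mirror the proxy's classification, for example for logging or custom backoff, the status codes above map cleanly to error classes. The class names below are illustrative, not HF-Inferoxy's own:

```python
def classify_error(status: int) -> str:
    """Map HTTP status codes to the error classes HF-Inferoxy handles."""
    if status == 401:
        return "authentication"      # bad or revoked token
    if status == 402:
        return "credits-exhausted"   # provider credits used up
    if status == 429:
        return "rate-limited"        # retry after backoff
    if 500 <= status < 600:
        return "server-error"        # transient, usually safe to retry
    return "other"
```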
### Model Compatibility

- Check provider documentation for supported models
- Use provider-specific model names when required
- HF-Inferoxy handles token management regardless of provider
## Support and Resources

### Provider Documentation

- Each provider has comprehensive examples in `examples/provider/`
- Check provider-specific documentation for advanced features
- HF-Inferoxy provides unified token management across all providers

### Getting Help

- Start with: Quick Start Guide
- Provider examples: Provider Examples
- Advanced usage: Comprehensive Guide
- Server setup: Main README

### Community Resources

- HuggingFace provider documentation
- Provider-specific GitHub repositories
- HF-Inferoxy issue tracker for token management issues
## Conclusion

HF-Inferoxy provides unified token management across all supported HuggingFace providers, allowing you to:

- Choose the best provider for your specific use case
- Switch providers seamlessly without changing token-management code
- Leverage each provider's strengths while maintaining consistent error handling
- Scale across providers with unified monitoring and reporting

This flexibility lets you build robust AI applications that adapt to changing requirements and take advantage of the best capabilities of each provider.