# HF-Inferoxy Provider Overview

This document provides a comprehensive overview of all providers supported by HF-Inferoxy, their capabilities, and use cases.

## Available Providers

HF-Inferoxy supports a wide range of HuggingFace providers, each offering unique capabilities and specializations.

### 🤖 AI Model Providers
#### Cerebras

- Capabilities: LLM, VLM
- Strengths: High-performance inference, vision-language support
- Best For: Production AI applications, vision-language tasks
- Example File: `cerebras.md`
- Key Models: `openai/gpt-oss-120b`, `meta-llama/Llama-4-Scout-17B-16E-Instruct`
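Whichever provider you choose, the call shape stays the same. The sketch below uses the `huggingface_hub` `InferenceClient` with Cerebras as an example; the `HF_TOKEN` environment variable stands in for whatever token HF-Inferoxy provisions, which is an assumption since token delivery depends on your proxy setup:

```python
import os

def build_messages(prompt: str) -> list:
    """Assemble an OpenAI-style chat-completion message list."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]

if __name__ == "__main__":
    # Requires `huggingface_hub` to be installed; token management is
    # assumed to be handled by HF-Inferoxy behind the scenes.
    from huggingface_hub import InferenceClient

    client = InferenceClient(provider="cerebras", api_key=os.environ["HF_TOKEN"])
    response = client.chat_completion(
        messages=build_messages("Summarize HF-Inferoxy in one sentence."),
        model="openai/gpt-oss-120b",
        max_tokens=128,
    )
    print(response.choices[0].message.content)
```

Swapping providers means changing only the `provider` and `model` arguments; the message format and response handling are unchanged.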
#### Cohere

- Capabilities: LLM, VLM
- Strengths: Multilingual support, advanced language understanding
- Best For: International applications, complex language tasks
- Example File: `cohere.md`
- Key Models: `CohereLabs/c4ai-command-r-plus`, `CohereLabs/command-a-vision-07-2025`

#### Featherless AI

- Capabilities: LLM, VLM, Text Generation
- Strengths: Efficient inference, cost-effective
- Best For: Budget-conscious applications, high-volume processing
- Example File: `featherless-ai.md`
- Key Models: `moonshotai/Kimi-K2-Instruct`, `google/gemma-3-27b-it`

#### Fireworks AI

- Capabilities: LLM, VLM
- Strengths: High-speed inference, optimized performance
- Best For: Real-time applications, low-latency requirements
- Example File: `fireworks-ai.md`
- Key Models: `openai/gpt-oss-120b`, `meta-llama/Llama-4-Scout-17B-16E-Instruct`

#### Groq

- Capabilities: LLM, VLM
- Strengths: Extremely fast inference, low latency
- Best For: Real-time applications, interactive AI
- Example File: `groq.md`
- Key Models: `openai/gpt-oss-120b`, `meta-llama/Llama-4-Scout-17B-16E-Instruct`
#### Hyperbolic

- Capabilities: LLM, VLM
- Strengths: Vision-language models, image understanding
- Best For: Computer vision applications, image analysis
- Example File: `hyperbolic.md`
- Key Models: `openai/gpt-oss-120b`, `Qwen/Qwen2.5-VL-7B-Instruct`

#### Nebius

- Capabilities: LLM, VLM, Feature Extraction, Text Generation, Text-to-Image
- Strengths: Cloud-native services, enterprise features
- Best For: Enterprise applications, cloud deployments
- Example File: `nebius.md`
- Key Models: `Qwen/Qwen3-235B-A22B-Instruct-2507`, `google/gemma-3-27b-it`, `Qwen/Qwen3-Embedding-8B`

#### Novita

- Capabilities: LLM, VLM, Text-to-Video
- Strengths: Video generation, multimedia AI
- Best For: Video creation, multimedia applications
- Example File: `novita.md`
- Key Models: `openai/gpt-oss-120b`, `zai-org/GLM-4.5V`, `Wan-AI/Wan2.1-T2V-14B`

#### NScale

- Capabilities: LLM, VLM, Text-to-Image
- Strengths: Scalable inference, image generation
- Best For: High-scale applications, image creation
- Example File: `nscale.md`
- Key Models: `openai/gpt-oss-120b`, `meta-llama/Llama-4-Scout-17B-16E-Instruct`, `stabilityai/stable-diffusion-xl-base-1.0`

#### Replicate

- Capabilities: Text-to-Image, Image-to-Image, Text-to-Video
- Strengths: Easy model deployment, specialized AI tasks
- Best For: AI model deployment, specialized applications
- Example File: `replicate.md`
- Key Models: `black-forest-labs/FLUX.1-Kontext-dev`, `Qwen/Qwen-Image`, `Wan-AI/Wan2.2-TI2V-5B`

#### SambaNova

- Capabilities: LLM, VLM, Feature Extraction
- Strengths: Enterprise-grade AI, custom model support
- Best For: Enterprise applications, custom AI solutions
- Example File: `sambanova.md`
- Key Models: `meta-llama/Llama-3.1-8B-Instruct`, `meta-llama/Llama-4-Maverick-17B-128E-Instruct`, `intfloat/e5-mistral-7b-instruct`

#### Together

- Capabilities: LLM, VLM, Text Generation, Text-to-Image
- Strengths: Collaborative hosting, wide model support
- Best For: Research applications, collaborative projects
- Example File: `together.md`
- Key Models: `openai/gpt-oss-120b`, `meta-llama/Llama-4-Scout-17B-16E-Instruct`, `black-forest-labs/FLUX.1-dev`
### 🔧 Core Services

#### HF-Inference

- Capabilities: All tasks via the core API, including chat completion, text generation, text-to-image, feature extraction, fill mask, image classification, image segmentation, object detection, question answering, summarization, table question answering, text classification, token classification, translation, zero-shot classification, and automatic speech recognition
- Strengths: Comprehensive model support, core HuggingFace functionality
- Best For: General AI applications, model exploration
- Example File: `hf-inference.md`
- Key Models: All HuggingFace models, comprehensive task support
### 🎯 Specialized AI Providers

#### Fal-AI

- Capabilities: Automatic Speech Recognition, Text-to-Image, Image-to-Image, Text-to-Video
- Strengths: Multimedia AI, audio processing
- Best For: Audio applications, multimedia content creation
- Example File: `fal-ai.md`
- Key Models: `openai/whisper-large-v3`, `Qwen/Qwen-Image`, `Wan-AI/Wan2.2-T2V-A14B`
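Fal-AI is reached through the same client interface as the chat providers. A minimal transcription sketch, assuming `huggingface_hub` is installed and a proxy-managed token is available in `HF_TOKEN`; the `check_audio_path` helper and its format list are illustrative, not part of HF-Inferoxy:

```python
import os

# Illustrative allow-list; check the provider docs for actual format support.
SUPPORTED_AUDIO = {".wav", ".mp3", ".flac", ".m4a"}

def check_audio_path(path: str) -> str:
    """Reject unsupported audio formats before spending a provider call."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_AUDIO:
        raise ValueError(f"unsupported audio format: {ext}")
    return path

if __name__ == "__main__":
    from huggingface_hub import InferenceClient

    client = InferenceClient(provider="fal-ai", api_key=os.environ["HF_TOKEN"])
    result = client.automatic_speech_recognition(
        check_audio_path("meeting.wav"),  # hypothetical local file
        model="openai/whisper-large-v3",
    )
    print(result.text)
```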
## Provider Capabilities by Task

### Text Generation & Chat Completion

| Task | Providers | Best Options |
|---|---|---|
| Chat Completion (LLM) | Cerebras, Cohere, Featherless AI, Fireworks AI, Groq, Hyperbolic, Nebius, Novita, NScale, SambaNova, Together | Cerebras, Cohere, Groq, Together |
| Text Generation | Featherless AI, HF-Inference, Nebius, Together | HF-Inference, Featherless AI, Nebius |
| Advanced Language Models | Cohere, Together | Cohere for multilingual, Together for variety |
### Vision & Multimodal AI

| Task | Providers | Best Options |
|---|---|---|
| Vision-Language (VLM) | Cerebras, Cohere, Featherless AI, Fireworks AI, Groq, Hyperbolic, Nebius, Novita, NScale, SambaNova, Together | Cerebras, Cohere, Groq, Hyperbolic |
| Text-to-Image | Fal-AI, HF-Inference, Nebius, NScale, Replicate, Together | Fal-AI, Replicate, Together |
| Image-to-Image | Fal-AI, Replicate | Replicate for advanced processing |
| Text-to-Video | Fal-AI, Novita, Replicate | Novita for high quality, Replicate for variety |
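The text-to-image providers in the table above share one call shape. A hedged sketch using Together and `FLUX.1-dev`, assuming `huggingface_hub` and a proxy-managed `HF_TOKEN`; the `with_style` helper is purely illustrative:

```python
import os

def with_style(prompt: str, style: str = "") -> str:
    """Append an optional style tag to a generation prompt (illustrative helper)."""
    return f"{prompt}, {style}" if style else prompt

if __name__ == "__main__":
    from huggingface_hub import InferenceClient

    client = InferenceClient(provider="together", api_key=os.environ["HF_TOKEN"])
    # text_to_image returns a PIL.Image.Image
    image = client.text_to_image(
        with_style("a lighthouse at dawn", "watercolor"),
        model="black-forest-labs/FLUX.1-dev",
    )
    image.save("lighthouse.png")
```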
### AI Infrastructure & Processing

| Task | Providers | Best Options |
|---|---|---|
| Feature Extraction | HF-Inference, Nebius, SambaNova | HF-Inference for variety, SambaNova for enterprise |
| Automatic Speech Recognition | Fal-AI, HF-Inference | Fal-AI for specialized, HF-Inference for general |
| Fill Mask | HF-Inference | Comprehensive model support |
| Image Classification | HF-Inference | Wide range of models |
| Image Segmentation | HF-Inference | Comprehensive model support |
| Object Detection | HF-Inference | Comprehensive model support |
| Question Answering | HF-Inference | Wide range of models |
| Summarization | HF-Inference | Comprehensive model support |
| Table Question Answering | HF-Inference | Specialized models |
| Text Classification | HF-Inference | Wide range of models |
| Token Classification | HF-Inference | Named entity recognition |
| Translation | HF-Inference | Multilingual support |
| Zero-shot Classification | HF-Inference | Flexible classification |
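The "Best Options" columns in the tables above can be turned into a simple routing table in application code. This is an illustrative sketch, not part of HF-Inferoxy; the ordering of each list encodes the preferences listed in the tables:

```python
# Provider lists taken from the capability tables (order = preference).
TASK_PROVIDERS = {
    "chat-completion": ["cerebras", "cohere", "groq", "together"],
    "text-to-image": ["fal-ai", "replicate", "together"],
    "text-to-video": ["novita", "replicate", "fal-ai"],
    "automatic-speech-recognition": ["fal-ai", "hf-inference"],
    "feature-extraction": ["hf-inference", "sambanova"],
}

def pick_provider(task, exclude=frozenset()):
    """Return the first preferred provider for a task, skipping excluded ones."""
    for provider in TASK_PROVIDERS.get(task, []):
        if provider not in exclude:
            return provider
    raise LookupError(f"no provider available for task: {task}")
```

The `exclude` set makes it easy to fail over, e.g. skipping a provider that just returned a rate-limit error.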
## Provider Selection Guide

### For Production Applications

- High Performance: Cerebras, Groq
- Enterprise Features: Nebius, SambaNova
- Cost Optimization: Featherless AI, Together

### For Research & Development

- Model Variety: HF-Inference, Together
- Specialized Tasks: Fal-AI, Replicate
- Cutting-edge Models: Cohere, Hyperbolic

### For Multimedia Applications

- Text-to-Image: Fal-AI, Replicate, Together, HF-Inference, Nebius, NScale
- Image-to-Image: Fal-AI, Replicate
- Text-to-Video: Novita, Replicate, Fal-AI
- Audio Processing: Fal-AI, HF-Inference

### For International Applications

- Multilingual Support: Cohere, HF-Inference
- Global Infrastructure: Together, Nebius
## Performance Characteristics

### Speed & Latency

- Ultra-fast: Groq (sub-second responses)
- Fast: Fireworks AI, Cerebras
- Standard: HF-Inference, Together
- Optimized: Featherless AI, NScale

### Scalability

- High Scale: NScale, SambaNova
- Cloud-native: Nebius, Together
- Flexible: HF-Inference, Replicate

### Cost Efficiency

- Budget-friendly: Featherless AI, Together
- Performance-focused: Cerebras, Groq
- Enterprise: Nebius, SambaNova
## Getting Started with Providers

1. **Choose your provider** based on the use cases and capabilities described above.
2. **Check provider examples**: each provider has detailed examples in the `examples/provider/` directory.
3. **Test with simple examples**: use the quickstart guide to verify basic functionality.
4. **Implement advanced patterns**: see the comprehensive integration guide for production patterns.
## Provider-Specific Configuration

### Environment Variables

```bash
# Set the proxy URL (optional, defaults to localhost:8000)
export HF_PROXY_URL="http://your-proxy-server:8000"

# Provider-specific settings can be configured in your application
```
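Application code can mirror the same default when resolving the proxy URL. A small sketch; the `proxy_url` helper is illustrative, not part of HF-Inferoxy:

```python
import os

def proxy_url(env=None):
    """Resolve the HF-Inferoxy URL, falling back to the local default."""
    env = os.environ if env is None else env
    return env.get("HF_PROXY_URL", "http://localhost:8000")
```

Passing an explicit `env` mapping keeps the helper easy to unit-test without touching the real environment.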
### Error Handling

Each provider may surface its own error patterns. HF-Inferoxy automatically handles:

- Authentication errors (401)
- Credit exhaustion (402)
- Rate limiting (429)
- Server errors (5xx)
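If your application wants to mirror the proxy's classification, for example for logging or custom backoff, the status codes above map cleanly to error classes. The class names below are illustrative, not HF-Inferoxy's own:

```python
def classify_error(status: int) -> str:
    """Map HTTP status codes to the error classes HF-Inferoxy handles."""
    if status == 401:
        return "authentication"      # bad or revoked token
    if status == 402:
        return "credits-exhausted"   # provider credits used up
    if status == 429:
        return "rate-limited"        # retry after backoff
    if 500 <= status < 600:
        return "server-error"        # transient, usually safe to retry
    return "other"
```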
### Model Compatibility

- Check provider documentation for supported models
- Use provider-specific model names when required
- HF-Inferoxy handles token management regardless of provider
## Support and Resources

### Provider Documentation

- Each provider has comprehensive examples in `examples/provider/`
- Check provider-specific documentation for advanced features
- HF-Inferoxy provides unified token management across all providers

### Getting Help

- Start with: Quick Start Guide
- Provider examples: Provider Examples
- Advanced usage: Comprehensive Guide
- Server setup: Main README

### Community Resources

- HuggingFace provider documentation
- Provider-specific GitHub repositories
- HF-Inferoxy issue tracker for token management issues
## Conclusion

HF-Inferoxy provides unified token management across all supported HuggingFace providers, allowing you to:

- Choose the best provider for your specific use case
- Switch providers seamlessly without changing token-management code
- Leverage each provider's strengths while maintaining consistent error handling
- Scale across providers with unified monitoring and reporting

This flexibility lets you build robust AI applications that adapt to changing requirements and take advantage of the best capabilities of each provider.