HF-Inferoxy Provider Overview

This document provides a comprehensive overview of all providers supported by HF-Inferoxy, their capabilities, and use cases.

Available Providers

HF-Inferoxy supports a wide range of HuggingFace providers, each offering unique capabilities and specializations.

🤖 AI Model Providers

Cerebras

  • Capabilities: LLM, VLM
  • Strengths: High-performance inference, vision-language support
  • Best For: Production AI applications, vision-language tasks
  • Example File: cerebras.md
  • Key Models: openai/gpt-oss-120b, meta-llama/Llama-4-Scout-17B-16E-Instruct

Cohere

  • Capabilities: LLM, VLM
  • Strengths: Multilingual support, advanced language understanding
  • Best For: International applications, complex language tasks
  • Example File: cohere.md
  • Key Models: CohereLabs/c4ai-command-r-plus, CohereLabs/command-a-vision-07-2025

Featherless AI

  • Capabilities: LLM, VLM, Text Generation
  • Strengths: Efficient inference, cost-effective
  • Best For: Budget-conscious applications, high-volume processing
  • Example File: featherless-ai.md
  • Key Models: moonshotai/Kimi-K2-Instruct, google/gemma-3-27b-it

Fireworks AI

  • Capabilities: LLM, VLM
  • Strengths: High-speed inference, optimized performance
  • Best For: Real-time applications, low-latency requirements
  • Example File: fireworks-ai.md
  • Key Models: openai/gpt-oss-120b, meta-llama/Llama-4-Scout-17B-16E-Instruct

Groq

  • Capabilities: LLM, VLM
  • Strengths: Extremely fast inference, low latency
  • Best For: Real-time applications, interactive AI
  • Example File: groq.md
  • Key Models: openai/gpt-oss-120b, meta-llama/Llama-4-Scout-17B-16E-Instruct

Hyperbolic

  • Capabilities: LLM, VLM
  • Strengths: Vision-language models, image understanding
  • Best For: Computer vision applications, image analysis
  • Example File: hyperbolic.md
  • Key Models: openai/gpt-oss-120b, Qwen/Qwen2.5-VL-7B-Instruct

Nebius

  • Capabilities: LLM, VLM, Feature Extraction, Text Generation, Text-to-Image
  • Strengths: Cloud-native services, enterprise features
  • Best For: Enterprise applications, cloud deployments
  • Example File: nebius.md
  • Key Models: Qwen/Qwen3-235B-A22B-Instruct-2507, google/gemma-3-27b-it, Qwen/Qwen3-Embedding-8B

Novita

  • Capabilities: LLM, VLM, Text-to-Video
  • Strengths: Video generation, multimedia AI
  • Best For: Video creation, multimedia applications
  • Example File: novita.md
  • Key Models: openai/gpt-oss-120b, zai-org/GLM-4.5V, Wan-AI/Wan2.1-T2V-14B

NScale

  • Capabilities: LLM, VLM, Text-to-Image
  • Strengths: Scalable inference, image generation
  • Best For: High-scale applications, image creation
  • Example File: nscale.md
  • Key Models: openai/gpt-oss-120b, meta-llama/Llama-4-Scout-17B-16E-Instruct, stabilityai/stable-diffusion-xl-base-1.0

Replicate

  • Capabilities: Text-to-Image, Image-to-Image, Text-to-Video
  • Strengths: Easy model deployment, specialized AI tasks
  • Best For: AI model deployment, specialized applications
  • Example File: replicate.md
  • Key Models: black-forest-labs/FLUX.1-Kontext-dev, Qwen/Qwen-Image, Wan-AI/Wan2.2-TI2V-5B

SambaNova

  • Capabilities: LLM, VLM, Feature Extraction
  • Strengths: Enterprise-grade AI, custom model support
  • Best For: Enterprise applications, custom AI solutions
  • Example File: sambanova.md
  • Key Models: meta-llama/Llama-3.1-8B-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct, intfloat/e5-mistral-7b-instruct

Together

  • Capabilities: LLM, VLM, Text Generation, Text-to-Image
  • Strengths: Collaborative hosting, wide model support
  • Best For: Research applications, collaborative projects
  • Example File: together.md
  • Key Models: openai/gpt-oss-120b, meta-llama/Llama-4-Scout-17B-16E-Instruct, black-forest-labs/FLUX.1-dev

🔧 Core Services

HF-Inference

  • Capabilities: All core API tasks, including Chat Completion, Text Generation, Text-to-Image, Feature Extraction, Fill Mask, Image Classification, Image Segmentation, Object Detection, Question Answering, Summarization, Table Question Answering, Text Classification, Token Classification, Translation, Zero-shot Classification, and Automatic Speech Recognition
  • Strengths: Comprehensive model support, core HuggingFace functionality
  • Best For: General AI applications, model exploration
  • Example File: hf-inference.md
  • Key Models: All HuggingFace models, comprehensive task support

🎯 Specialized AI Providers

Fal-AI

  • Capabilities: Automatic Speech Recognition, Text-to-Image, Image-to-Image, Text-to-Video
  • Strengths: Multimedia AI, audio processing
  • Best For: Audio applications, multimedia content creation
  • Example File: fal-ai.md
  • Key Models: openai/whisper-large-v3, Qwen/Qwen-Image, Wan-AI/Wan2.2-T2V-A14B

Provider Capabilities by Task

Text Generation & Chat Completion

  • Chat Completion (LLM): Cerebras, Cohere, Featherless AI, Fireworks AI, Groq, Hyperbolic, Nebius, Novita, NScale, SambaNova, Together (best: Cerebras, Cohere, Groq, Together)
  • Text Generation: Featherless AI, HF-Inference, Nebius, Together (best: HF-Inference, Featherless AI, Nebius)
  • Advanced Language Models: Cohere, Together (best: Cohere for multilingual, Together for variety)
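The task-to-provider recommendations above can be captured as a small lookup table. The dictionary layout and helper below are an illustrative sketch, not part of HF-Inferoxy's actual API; the provider names come straight from the list above.

```python
# Recommended providers per task, taken from the recommendations above.
# BEST_PROVIDERS and pick_provider are hypothetical, not HF-Inferoxy's API.
BEST_PROVIDERS = {
    "chat-completion": ["Cerebras", "Cohere", "Groq", "Together"],
    "text-generation": ["HF-Inference", "Featherless AI", "Nebius"],
    "advanced-language": ["Cohere", "Together"],
}

def pick_provider(task: str) -> str:
    """Return the first recommended provider for a task."""
    try:
        return BEST_PROVIDERS[task][0]
    except KeyError:
        raise ValueError(f"no recommendation for task {task!r}") from None
```

A fallback strategy could walk the rest of each list when the first choice is unavailable.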

Vision & Multimodal AI

  • Vision-Language (VLM): Cerebras, Cohere, Featherless AI, Fireworks AI, Groq, Hyperbolic, Nebius, Novita, NScale, SambaNova, Together (best: Cerebras, Cohere, Groq, Hyperbolic)
  • Text-to-Image: Fal-AI, HF-Inference, Nebius, NScale, Replicate, Together (best: Fal-AI, Replicate, Together)
  • Image-to-Image: Fal-AI, Replicate (best: Replicate for advanced processing)
  • Text-to-Video: Fal-AI, Novita, Replicate (best: Novita for high quality, Replicate for variety)

AI Infrastructure & Processing

  • Feature Extraction: HF-Inference, Nebius, SambaNova (best: HF-Inference for variety, SambaNova for enterprise)
  • Automatic Speech Recognition: Fal-AI, HF-Inference (best: Fal-AI for specialized audio, HF-Inference for general use)
  • Fill Mask: HF-Inference (comprehensive model support)
  • Image Classification: HF-Inference (wide range of models)
  • Image Segmentation: HF-Inference (comprehensive model support)
  • Object Detection: HF-Inference (comprehensive model support)
  • Question Answering: HF-Inference (wide range of models)
  • Summarization: HF-Inference (comprehensive model support)
  • Table Question Answering: HF-Inference (specialized models)
  • Text Classification: HF-Inference (wide range of models)
  • Token Classification: HF-Inference (named entity recognition)
  • Translation: HF-Inference (multilingual support)
  • Zero-shot Classification: HF-Inference (flexible classification)

Provider Selection Guide

For Production Applications

  • High Performance: Cerebras, Groq
  • Enterprise Features: Nebius, SambaNova
  • Cost Optimization: Featherless AI, Together

For Research & Development

  • Model Variety: HF-Inference, Together
  • Specialized Tasks: Fal-AI, Replicate
  • Cutting-edge Models: Cohere, Hyperbolic

For Multimedia Applications

  • Text-to-Image: Fal-AI, Replicate, Together, HF-Inference, Nebius, NScale
  • Image-to-Image: Fal-AI, Replicate
  • Text-to-Video: Novita, Replicate, Fal-AI
  • Audio Processing: Fal-AI, HF-Inference

For International Applications

  • Multilingual Support: Cohere, HF-Inference
  • Global Infrastructure: Together, Nebius

Performance Characteristics

Speed & Latency

  • Ultra-fast: Groq (sub-second responses)
  • Fast: Fireworks AI, Cerebras
  • Standard: HF-Inference, Together
  • Efficiency-optimized: Featherless AI, NScale

Scalability

  • High Scale: NScale, SambaNova
  • Cloud-native: Nebius, Together
  • Flexible: HF-Inference, Replicate

Cost Efficiency

  • Budget-friendly: Featherless AI, Together
  • Performance-focused: Cerebras, Groq
  • Enterprise: Nebius, SambaNova

Getting Started with Providers

1. Choose Your Provider

Select a provider based on your use case and the capability tables above.

2. Check Provider Examples

Each provider has detailed examples in the examples/provider/ directory.

3. Test with Simple Examples

Use the quickstart guide to test basic functionality.

4. Implement Advanced Patterns

See the comprehensive integration guide for production patterns.
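Once a provider and model are chosen, a first request can be sketched with nothing but the standard library. The /v1/chat/completions route below is an assumption (an OpenAI-compatible proxy endpoint); check the quickstart guide for the actual path exposed by your HF-Inferoxy deployment.

```python
import json
import os
import urllib.request

def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request to the proxy.

    The /v1/chat/completions route is assumed, not confirmed by this
    document; adjust it to whatever the quickstart guide specifies.
    """
    base = os.environ.get("HF_PROXY_URL", "http://localhost:8000")
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "openai/gpt-oss-120b",
    [{"role": "user", "content": "Hello from HF-Inferoxy"}],
)
# Send it with urllib.request.urlopen(req) once the proxy is running.
```

Because the proxy manages tokens for you, no API key appears in the request itself.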

Provider-Specific Configuration

Environment Variables

# Set proxy URL (optional, defaults to localhost:8000)
export HF_PROXY_URL="http://your-proxy-server:8000"

# Provider-specific settings can be configured in your application
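In application code, the same localhost:8000 fallback can be applied when reading the variable; the helper name below is illustrative.

```python
import os

def proxy_url() -> str:
    """Resolve the proxy endpoint, falling back to the documented
    localhost:8000 default when HF_PROXY_URL is unset."""
    return os.environ.get("HF_PROXY_URL", "http://localhost:8000")
```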

Error Handling

Each provider may have specific error patterns. HF-Inferoxy automatically handles:

  • Authentication errors (401)
  • Credit exhaustion (402)
  • Rate limiting (429)
  • Server errors (5xx)
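The recovery behavior can be pictured as a status-code dispatch. The action names below are illustrative labels for this sketch, not identifiers from HF-Inferoxy's codebase; the status codes themselves are the ones listed above.

```python
def recovery_action(status: int) -> str:
    """Map an HTTP status to the kind of recovery the proxy applies.

    The returned labels are hypothetical names for this sketch only;
    the actual handling happens inside HF-Inferoxy.
    """
    if status in (401, 402):       # bad or credit-exhausted token
        return "rotate-token"
    if status == 429:              # rate limited
        return "backoff-and-retry"
    if 500 <= status <= 599:       # transient provider failure
        return "retry"
    return "raise"                 # everything else surfaces to the caller
```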

Model Compatibility

  • Check provider documentation for supported models
  • Use provider-specific model names when required
  • HF-Inferoxy handles token management regardless of provider

Support and Resources

Provider Documentation

  • Each provider has comprehensive examples in examples/provider/
  • Check provider-specific documentation for advanced features
  • HF-Inferoxy provides unified token management across all providers

Getting Help

  1. Start with: Quick Start Guide
  2. Provider examples: Provider Examples
  3. Advanced usage: Comprehensive Guide
  4. Server setup: Main README

Community Resources

  • HuggingFace provider documentation
  • Provider-specific GitHub repositories
  • HF-Inferoxy issue tracker for token management issues

Conclusion

HF-Inferoxy provides unified token management across all supported HuggingFace providers, allowing you to:

  • Choose the best provider for your specific use case
  • Switch providers seamlessly without changing token management code
  • Leverage provider strengths while maintaining consistent error handling
  • Scale across providers with unified monitoring and reporting

This flexibility enables you to build robust AI applications that can adapt to changing requirements and leverage the best capabilities of each provider.