AI Inference Proxy

A lightweight FastAPI server that acts as a universal proxy to multiple AI Inference APIs (Hugging Face, Vercel AI Gateway) with smart API key rotation and provider-specific management.

License: AGPL v3

Features

  • Multi-Provider Support: Universal proxy for Hugging Face Inference API and Vercel AI Gateway
  • Smart Key Rotation: Automatically rotates API keys to avoid rate limits
  • Silent Key Switching: Switches keys silently on invalid/monthly-credits-exceeded errors
  • Multi-Stage Quarantine System: Handles temporary 402 errors with progressive quarantine stages
  • Provider-Specific Key Management: Add, validate, and monitor API keys for each provider
  • Transparent Relay: Preserves all request/response data except auth headers
  • Role-Based Access Control (RBAC): Comprehensive user management and endpoint access control
  • API Key Authentication: Secure authentication using Bearer tokens
  • User Management: Create, update, delete, and manage users and roles
  • Audit Logging: All requests logged with user context when authenticated
  • Modular Architecture: Clean separation of provider-specific logic for easy extensibility

Quick Start

Prerequisites

  • Python 3.13+
  • uv package manager

Installation

  1. Clone the repository:
    git clone https://github.com/nazdridoy/ai-inferoxy.git
    cd ai-inferoxy
    
  2. Install dependencies:
    uv sync
    
  3. Configure (Optional):
    cp env.example .env
    # Edit .env with your preferred settings
    
  4. Run the server:
    uv run ai-inferoxy
    

The server will start on http://localhost:8000 and automatically create hf-keys.json and vc-keys.json if they don’t exist.

Running as a Systemd Service (Optional)

To run ai-inferoxy as a background service that starts automatically:

Option 1: User Service (for local development)

  1. Copy the service file to your user systemd directory:
    cp ai-inferoxy.service ~/.config/systemd/user/
    

Note: The service assumes your ai-inferoxy project is located at ~/ai-inferoxy. If your project is in a different location, edit the service file and change the WorkingDirectory path accordingly.

  2. Reload systemd and enable the service:
    systemctl --user daemon-reload
    systemctl --user enable ai-inferoxy
    
  3. Start the service:
    systemctl --user start ai-inferoxy
    

Useful commands:

  • Check status: systemctl --user status ai-inferoxy
  • View logs: journalctl --user -u ai-inferoxy -f
  • Stop service: systemctl --user stop ai-inferoxy
  • Disable service: systemctl --user disable ai-inferoxy

Option 2: System Service (for remote servers)

  1. Copy the system service file to the system directory:
    sudo cp ai-inferoxy-system.service /etc/systemd/system/ai-inferoxy.service
    
  2. Edit the service file to replace REPLACE_WITH_YOUR_USERNAME with your actual username:
    sudo nano /etc/systemd/system/ai-inferoxy.service
    
  3. Reload systemd and enable the service:
    sudo systemctl daemon-reload
    sudo systemctl enable ai-inferoxy
    
  4. Start the service:
    sudo systemctl start ai-inferoxy
    

Useful commands:

  • Check status: sudo systemctl status ai-inferoxy
  • View logs: sudo journalctl -u ai-inferoxy -f
  • Stop service: sudo systemctl stop ai-inferoxy
  • Disable service: sudo systemctl disable ai-inferoxy

Troubleshooting: If the service fails to start, check:

  1. Path issues: Ensure uv is in your PATH or edit the service file to use /usr/bin/python3 -m ai_inferoxy instead
  2. Working directory: Verify the WorkingDirectory path in the service file matches your actual project location
  3. Dependencies: Make sure you’ve run uv sync to install dependencies
  4. Logs: Check detailed logs with sudo journalctl -xeu ai-inferoxy.service (system service) or journalctl --user -xeu ai-inferoxy.service (user service)

Configuration

Environment Variables

You can configure the server using environment variables or a .env file.

Option 1: Environment Variables

export PROXY_HOST=0.0.0.0
export PROXY_PORT=8000
export HF_KEYS_FILE=./hf-keys.json
export HF_BASE_URL=https://router.huggingface.co
export HF_WHOAMI_WAIT=2
export HF_WHOAMI_RETRY=3
export VC_KEYS_FILE=./vc-keys.json
export VC_BASE_URL=https://ai-gateway.vercel.sh
export VC_VALIDATE_WAIT=2
export VC_VALIDATE_RETRY=3
export MIN_CHECK_FREQUENCY=3600
export LOG_LEVEL=INFO

Option 2: .env File (Recommended)

  1. Copy the example environment file:
    cp env.example .env
    
  2. Edit .env with your preferred settings:
    # Server Configuration
    PROXY_HOST=0.0.0.0
    PROXY_PORT=8000
       
    # Hugging Face Configuration
    HF_KEYS_FILE=./hf-keys.json
    HF_BASE_URL=https://router.huggingface.co
    HF_WHOAMI_WAIT=2
    HF_WHOAMI_RETRY=3
       
    # Vercel AI Gateway Configuration
    VC_KEYS_FILE=./vc-keys.json
    VC_BASE_URL=https://ai-gateway.vercel.sh
    VC_VALIDATE_WAIT=2
    VC_VALIDATE_RETRY=3
       
    # Global Configuration
    MIN_CHECK_FREQUENCY=3600
    LOG_LEVEL=INFO
    

Available Environment Variables

Server Configuration:

  • PROXY_HOST - Host to bind to (default: 0.0.0.0)
  • PROXY_PORT - Port to bind to (default: 8000)

Hugging Face Configuration:

  • HF_KEYS_FILE - Path to HF API keys JSON file (default: ./hf-keys.json)
  • HF_BASE_URL - HF Inference base URL (default: https://router.huggingface.co)
  • HF_WHOAMI_WAIT - Wait time in seconds after rate limit (default: 2)
  • HF_WHOAMI_RETRY - Number of retries for whoami endpoint (default: 3)

Vercel AI Gateway Configuration:

  • VC_KEYS_FILE - Path to VC API keys JSON file (default: ./vc-keys.json)
  • VC_BASE_URL - Vercel AI Gateway base URL (default: https://ai-gateway.vercel.sh)
  • VC_VALIDATE_WAIT - Wait time in seconds for validation requests (default: 2)
  • VC_VALIDATE_RETRY - Number of retries for validation requests (default: 3)

Global Configuration:

  • MIN_CHECK_FREQUENCY - Minimum time between validity checks in seconds (default: 3600)
  • LOG_LEVEL - Logging level (default: INFO)
  • HF_LOGS_DIR - Directory for log files (default: ./logs)
  • LOG_ROTATION_WHEN - Log rotation frequency: M=minute, H=hourly, D=daily (default: D)
  • LOG_ROTATION_INTERVAL - Interval for rotation (default: 1)
  • LOG_BACKUP_COUNT - Number of backup files to keep (default: 7)
  • ENABLE_RELOAD - Enable file watching and auto-reload for development (default: false)
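
As a rough sketch (not the server's actual code), the documented variables and their defaults can be read in one place like this:

```python
import os

# Sketch only: collect the documented environment variables with their
# documented defaults. The server's internal config handling may differ.
def load_config(env=os.environ):
    return {
        "host": env.get("PROXY_HOST", "0.0.0.0"),
        "port": int(env.get("PROXY_PORT", "8000")),
        "hf_keys_file": env.get("HF_KEYS_FILE", "./hf-keys.json"),
        "hf_base_url": env.get("HF_BASE_URL", "https://router.huggingface.co"),
        "vc_keys_file": env.get("VC_KEYS_FILE", "./vc-keys.json"),
        "vc_base_url": env.get("VC_BASE_URL", "https://ai-gateway.vercel.sh"),
        "min_check_frequency": int(env.get("MIN_CHECK_FREQUENCY", "3600")),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }

# Overrides win; anything unset falls back to the documented default.
print(load_config({"PROXY_PORT": "9000"})["port"])
```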

Logging Configuration

The server provides comprehensive logging with configurable rotation:

Log Rotation Options:

  • Daily Rotation (default): LOG_ROTATION_WHEN=D - Creates one log file per day
  • Hourly Rotation: LOG_ROTATION_WHEN=H - Creates one log file per hour
  • Minute Rotation: LOG_ROTATION_WHEN=M - Creates one log file per minute (for testing)
  • Monthly Rotation: LOG_ROTATION_WHEN=M with LOG_ROTATION_INTERVAL>1 - Creates one log file per month

Log Retention:

  • Daily: Keep 7 days of logs (default)
  • Hourly: Keep 168 hours (1 week) of logs
  • Minute: Keep 10 minutes of logs (for testing)

Log File Naming:

  • Daily: ai-inferoxy.log.2025-08-19, ai-inferoxy.log.2025-08-20
  • Hourly: ai-inferoxy.log.2025-08-19_14, ai-inferoxy.log.2025-08-19_15
  • Minute: ai-inferoxy.log.2025-08-19_14-30, ai-inferoxy.log.2025-08-19_14-31
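
These settings map naturally onto Python's standard TimedRotatingFileHandler; the sketch below shows the correspondence (the server's actual wiring may differ), using a throwaway directory in place of HF_LOGS_DIR:

```python
import logging
import logging.handlers
import os
import tempfile

log_dir = tempfile.mkdtemp()  # stand-in for HF_LOGS_DIR
handler = logging.handlers.TimedRotatingFileHandler(
    os.path.join(log_dir, "ai-inferoxy.log"),
    when="D",        # LOG_ROTATION_WHEN: D=daily, H=hourly, M=minute
    interval=1,      # LOG_ROTATION_INTERVAL
    backupCount=7,   # LOG_BACKUP_COUNT: rotated files kept before deletion
)
logger = logging.getLogger("ai-inferoxy-demo")
logger.addHandler(handler)
logger.warning("log rotation configured")  # written to ai-inferoxy.log
```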

Development Mode:

  • Set ENABLE_RELOAD=true for file watching and auto-reload
  • Note: When ENABLE_RELOAD=true, consider setting HF_LOGS_DIR to a directory outside the project to avoid file watching spam

API Keys File Structure

Hugging Face Keys (hf-keys.json):

{
  "keys": [
    {
      "key": "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "valid": true,
      "last_validity_check": "2025-01-15T10:30:00+00:00",
      "user_info": {
        "name": "username",
        "email": "user@example.com",
        "isPro": false,
        "canPay": false
      },
      "quarantine_stage": "none",
      "quarantine_start_date": null
    }
  ],
  "rotation_strategy": "round_robin",
  "check_interval_days": 30
}

Vercel AI Gateway Keys (vc-keys.json):

{
  "keys": [
    {
      "key": "vck_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "valid": true,
      "last_validity_check": "2025-01-15T10:30:00+00:00",
      "user_info": {
        "name": "user@example.com",
        "email": "user@example.com",
        "isPro": false,
        "canPay": false
      },
      "quarantine_stage": "none",
      "quarantine_start_date": null
    }
  ],
  "rotation_strategy": "round_robin",
  "check_interval_days": 7
}
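
Reading these files is straightforward; the sketch below (field names follow the examples above, but the server's own loader does more, e.g. rotation state and timestamps) selects the keys that are usable: valid and not quarantined:

```python
import json
import os
import tempfile

def load_usable_keys(path):
    # Keep only keys marked valid with no active quarantine stage.
    with open(path) as f:
        data = json.load(f)
    return [
        entry["key"]
        for entry in data.get("keys", [])
        if entry.get("valid") and entry.get("quarantine_stage", "none") == "none"
    ]

# Demo against a throwaway file shaped like hf-keys.json:
demo = os.path.join(tempfile.mkdtemp(), "hf-keys.json")
with open(demo, "w") as f:
    json.dump({"keys": [
        {"key": "hf_a", "valid": True,  "quarantine_stage": "none"},
        {"key": "hf_b", "valid": False, "quarantine_stage": "none"},
        {"key": "hf_c", "valid": True,  "quarantine_stage": "stage_1"},
    ], "rotation_strategy": "round_robin"}, f)
print(load_usable_keys(demo))  # only hf_a is usable
```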

API Endpoints

Health & Info

  • GET / - Server information
  • GET /health - Health check
  • GET /docs - Interactive API documentation

Key Management (Hugging Face)

  • GET /check-validity/hf/{key} - Check if a specific API key is valid and update its status in configuration if found
  • POST /add-key/hf/{key} - Add new API key with automatic validation (path parameter)
  • POST /add-key/hf - Add multiple API keys with automatic validation (JSON body)
  • GET /keys/status/hf - Get status of all API keys
  • POST /keys/validate/hf - Validate keys that need validation (respects minimum check frequency)
  • POST /keys/refresh/hf - Force refresh all keys validity and user info (ignores minimum check frequency, removes duplicates)
  • POST /keys/cleanup/hf - Remove duplicate keys and invalid keys from configuration
  • POST /keys/reload/hf - Force reload keys from file

Key Management (Vercel AI Gateway)

  • GET /check-validity/vc/{key} - Check if a specific API key is valid and update its status in configuration if found
  • POST /add-key/vc/{key} - Add new API key with automatic validation (path parameter, supports key:email format)
  • POST /add-key/vc - Add multiple API keys with automatic validation (JSON body, supports key:email format)
  • GET /keys/status/vc - Get status of all API keys
  • POST /keys/validate/vc - Validate keys that need validation (respects minimum check frequency)
  • POST /keys/refresh/vc - Force refresh all keys validity and user info (ignores minimum check frequency, removes duplicates)
  • POST /keys/cleanup/vc - Remove duplicate keys and invalid keys from configuration
  • POST /keys/reload/vc - Force reload keys from file

Client API (Hugging Face)

  • GET /keys/provision/hf - Provision a valid API key for client usage
  • POST /keys/report/hf - Report token usage status (success/error) from clients (optional client_name to track end user; defaults to username)

Client API (Vercel AI Gateway)

  • GET /keys/provision/vc - Provision a valid API key for client usage
  • POST /keys/report/vc - Report token usage status (success/error) from clients (optional client_name to track end user; defaults to username)

Quarantine Management (Hugging Face)

  • GET /keys/quarantine/hf - Get quarantine status for all keys
  • POST /keys/quarantine/clear/hf/{key} - Manually clear quarantine for a specific key

Quarantine Management (Vercel AI Gateway)

  • GET /keys/quarantine/vc - Get quarantine status for all keys
  • POST /keys/quarantine/clear/vc/{key} - Manually clear quarantine for a specific key

Universal Proxy (Hugging Face)

  • ANY /hf/{path:path} - Proxy to HF Inference API (role-based access)

Universal Proxy (Vercel AI Gateway)

  • ANY /vc/{path:path} - Proxy to Vercel AI Gateway (role-based access)

Admin Endpoints (Admin Role Required)

User Management

  • GET /admin/users - List all users
  • POST /admin/users - Create new user
  • PUT /admin/users/{username} - Update user
  • DELETE /admin/users/{username} - Delete user
  • POST /admin/users/{username}/generate-key - Regenerate API key

Role Management

  • GET /admin/roles - List all roles
  • POST /admin/roles - Create new role
  • PUT /admin/roles/{role_name} - Update role
  • DELETE /admin/roles/{role_name} - Delete role

Authentication & RBAC

The AI Inference Proxy includes a comprehensive Role-Based Access Control (RBAC) system. Most endpoints require authentication via API keys in the Authorization header.

📚 For detailed RBAC documentation, Pydantic models, testing examples, and troubleshooting, see RBAC_README.md.

Default Admin User

The system creates a default admin user on first run. You’ll need this to access admin endpoints:

{
  "username": "admin",
  "api_key": "ndr_YWRtaW4_abc123def456ghi789jkl012mno345pqr678stu901vwx234yz5678901234567890"
}

⚠️ IMPORTANT: Use this API key to access admin endpoints and create additional users. The actual API key will be different in your installation.

Authentication Header

All protected endpoints require the Authorization header:

curl -H "Authorization: Bearer YOUR_API_KEY" http://localhost:8000/admin/users

Usage Examples

Quick Reference

Category          Examples                          Auth Required
Public            Health, Docs, Server Info         No
Key Management    Add, Validate, Refresh, Cleanup   Yes (Manager/Admin)
Client API        Provision, Report                 Yes (User role)
Universal Proxy   HF & VC Inference APIs            Yes (User role or higher)
Admin             Users, Roles Management           Yes (Admin only)

Public Endpoints (No Authentication Required)

1. Check Server Health

curl http://localhost:8000/health

2. View API Documentation

curl http://localhost:8000/docs

3. Get Server Information

curl http://localhost:8000/

Key Management (Manager/Admin Role Required)

4. Add Single API Key (HF)

curl -X POST "http://localhost:8000/add-key/hf/hf_your_api_key_here" \
  -H "Authorization: Bearer $ADMIN_API_KEY"

5. Add Multiple API Keys (HF)

curl -X POST "http://localhost:8000/add-key/hf" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"keys": ["hf_key_1", "hf_key_2", "hf_key_3"]}'

Bulk Response Format:

{
  "success": true,
  "message": "Processed 3 keys: 2 successful, 1 failed",
  "results": [
    {
      "key": "hf_key_1...",
      "success": true,
      "message": "Key added successfully"
    },
    {
      "key": "hf_key_2...",
      "success": false,
      "message": "Key already exists"
    }
  ],
  "summary": {
    "total": 3,
    "successful": 2,
    "failed": 1
  }
}

6. Check Key Validity (HF)

curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/check-validity/hf/hf_your_api_key_here"

7. Get All Keys Status (HF)

curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/status/hf"

8. Validate Keys (HF) (Respects Rate Limits)

curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/validate/hf"

9. Force Refresh All Keys (HF)

curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/refresh/hf"

10. Clean Up Keys (HF) (Remove Duplicates and Invalid Keys)

curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/cleanup/hf"

11. Reload Keys from File (HF)

curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/reload/hf"

12. Check Quarantine Status (HF)

curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/quarantine/hf"

13. Clear Quarantine for a Key (HF)

curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/quarantine/clear/hf/hf_your_api_key_here"

Client API (User Role Required)

14. Get API Key for Client Usage (HF)

curl -H "Authorization: Bearer $USER_API_KEY" \
  "http://localhost:8000/keys/provision/hf"

Response Example:

{
  "token": "hf_your_api_key_here",
  "token_id": "hf_your_ap",
  "expires_in": 3600
}

15. Report Token Usage Status (HF)

# Report successful usage (optionally include client_name/end user identifier):
curl -X POST -H "Authorization: Bearer $USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"token_id": "hf_your_ap", "status": "success", "client_name": "acme_customer_42"}' \
  "http://localhost:8000/keys/report/hf"

# Report error:
curl -X POST -H "Authorization: Bearer $USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"token_id": "hf_your_ap", "status": "error", "error": "402 Client Error: Payment Required", "client_name": "acme_customer_42"}' \
  "http://localhost:8000/keys/report/hf"

Universal Proxy (Role-Based Access)

16. Use as Proxy for HF Inference API

# Instead of calling HF directly:
curl -X POST "https://router.huggingface.co/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "model": "zai-org/GLM-4.5:novita"}'

# Call through the proxy (requires user role or higher):
curl -X POST "http://localhost:8000/hf/v1/chat/completions" \
  -H "Authorization: Bearer $USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "model": "zai-org/GLM-4.5:novita"}'

17. Use as Proxy for Vercel AI Gateway

# Instead of calling Vercel AI Gateway directly:
curl -X POST "https://ai-gateway.vercel.sh/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "model": "openai/gpt-4o-mini"}'

# Call through the proxy (requires user role or higher):
curl -X POST "http://localhost:8000/vc/v1/chat/completions" \
  -H "Authorization: Bearer $USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "model": "openai/gpt-4o-mini"}'

18. Add Vercel AI Gateway Keys (with email support)

# Add single key with email:
curl -X POST "http://localhost:8000/add-key/vc/vck_ABC123:user@example.com" \
  -H "Authorization: Bearer $ADMIN_API_KEY"

# Add multiple keys with emails:
curl -X POST "http://localhost:8000/add-key/vc" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "keys": [
      "vck_ABC123:alice@example.com",
      "vck_DEF456:bob@example.com"
    ]
  }'
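
Splitting the key:email format is a one-liner; this sketch shows the intent (the email part is optional, and the server-side parsing may differ in detail):

```python
def parse_vc_key(entry):
    # Split "vck_...:email" on the first colon; email is optional.
    key, sep, email = entry.partition(":")
    return key, (email if sep else None)

print(parse_vc_key("vck_ABC123:alice@example.com"))
print(parse_vc_key("vck_ABC123"))  # no email attached
```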

Bulk Key Operations

19. Add Multiple Keys from File

If you have your keys in a text file (one key per line), you can easily add them all:

File format (keys.txt):

hf_key_1
hf_key_2
hf_key_3

Using jq (recommended):

curl -X POST "http://localhost:8000/add-key/hf" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"keys\": $(cat keys.txt | jq -R -s -c 'split("\n") | map(select(length > 0))')}"

Alternative method (without jq):

keys=$(cat keys.txt | tr '\n' ',' | sed 's/,$//' | sed 's/^/"/;s/,/","/g;s/$/"/')
curl -X POST "http://localhost:8000/add-key/hf" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"keys\": [$keys]}"

Security Note: Remember to add keys.txt to your .gitignore to prevent accidentally committing your API keys to version control.
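
If you prefer Python to shell one-liners, the same payload can be built like this (blank lines are skipped, matching the jq variant above):

```python
import json
import os
import tempfile

def keys_file_to_payload(path):
    # One key per line -> {"keys": [...]} body for POST /add-key/hf.
    with open(path) as f:
        keys = [line.strip() for line in f if line.strip()]
    return json.dumps({"keys": keys})

# Demo with a throwaway keys.txt:
demo = os.path.join(tempfile.mkdtemp(), "keys.txt")
with open(demo, "w") as f:
    f.write("hf_key_1\nhf_key_2\n\nhf_key_3\n")
print(keys_file_to_payload(demo))
```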

Admin Endpoints Examples

20. User Management (Admin Role Required)

List All Users

curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/admin/users"

Create New User

curl -X POST "http://localhost:8000/admin/users" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "newuser",
    "email": "user@example.com",
    "full_name": "New User",
    "role": "user"
  }'

Response Example:

{
  "success": true,
  "message": "User newuser created successfully",
  "username": "newuser",
  "api_key": "generated_api_key_here..."
}

Update User

curl -X PUT "http://localhost:8000/admin/users/newuser" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "updated@example.com",
    "full_name": "Updated User Name"
  }'

Delete User

curl -X DELETE "http://localhost:8000/admin/users/newuser" \
  -H "Authorization: Bearer $ADMIN_API_KEY"

Regenerate API Key

curl -X POST "http://localhost:8000/admin/users/newuser/generate-key" \
  -H "Authorization: Bearer $ADMIN_API_KEY"

21. Role Management (Admin Role Required)

List All Roles

curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/admin/roles"

Create New Role

curl -X POST "http://localhost:8000/admin/roles" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "customrole",
    "description": "Custom role for specific access",
    "endpoints": ["/health", "/docs", "/custom/*"],
    "is_default": false
  }'

Update Role

curl -X PUT "http://localhost:8000/admin/roles/customrole" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Updated role description",
    "endpoints": ["/health", "/docs", "/custom/*", "/new/*"]
  }'

Delete Role

curl -X DELETE "http://localhost:8000/admin/roles/customrole" \
  -H "Authorization: Bearer $ADMIN_API_KEY"

Roles and Permissions

The RBAC system defines four user roles with different access levels:

Admin Role

  • Access: All endpoints (*)
  • Description: Full system access
  • Use Case: System administrators

Manager Role

  • Access: Key management + Universal proxy
  • Endpoints:
    • /check-validity/*
    • /add-key/*
    • /keys/*
    • /hf/{path:path}, /vc/{path:path} (Universal proxy)
  • Use Case: Key managers and system operators

User Role

  • Access: Standard user access
  • Endpoints:
    • /health, /docs, /
    • /hf/{path:path}, /vc/{path:path} (Universal proxy)
    • /keys/provision/hf, /keys/report/hf
    • /keys/provision/vc, /keys/report/vc
  • Use Case: Regular users accessing AI Inference APIs

Guest Role (Default)

  • Access: Public endpoints only
  • Endpoints: /health, /docs, /
  • Use Case: Basic access with authentication tracking
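
A role's endpoint list can be matched against a request path along these lines. This is a sketch consistent with the patterns shown above ("*" for admin, wildcards, FastAPI-style {path:path} suffixes); the real RBAC manager may use different matching rules:

```python
import fnmatch

def is_allowed(role_endpoints, path):
    # Return True if any of the role's endpoint patterns covers the path.
    for pattern in role_endpoints:
        if pattern == "*" or pattern == path:
            return True
        # Treat FastAPI-style "{path:path}" suffixes like a wildcard.
        pattern = pattern.replace("{path:path}", "*")
        if fnmatch.fnmatch(path, pattern):
            return True
    return False

print(is_allowed(["/health", "/hf/{path:path}"], "/hf/v1/chat/completions"))
```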

Security Features

  • API Key Generation: 64+ character cryptographically secure random strings
  • Role-Based Access: Endpoint access controlled by user roles
  • Universal Proxy Protection: Universal proxy endpoint respects role permissions
  • Audit Logging: All requests logged with user context
  • No Password Storage: Only API keys, no password management
  • Input Validation: Comprehensive validation of all request data using Pydantic models
  • JSON-Based API: Modern REST API design with proper content-type validation

How It Works

  1. Server Startup: Loads keys from provider-specific files without validation (to avoid rate limits)
  2. Request Reception: Server receives requests to any AI Inference endpoint (HF or VC)
  3. Provider Detection: Routes requests to appropriate provider based on URL prefix (/hf/ or /vc/)
  4. Key Selection: Smart algorithm selects a valid, non-quarantined API key for the provider
  5. Request Forwarding: Forwards request to the appropriate upstream API with selected key
  6. Response Handling:
    • If successful: Returns response as-is
    • If key error: Silently switches to next valid key and retries
    • If 402 error: Handles with quarantine system and switches to next key
    • If no valid keys: Passes error transparently to user
  7. Transparent Relay: All responses (success, error, streaming) are relayed unchanged
  8. Rate Limiting: Respects provider rate limits with configurable wait times and retries
  9. Quarantine System: Handles temporary 402 errors with progressive quarantine stages
  10. Provider-Specific Logic: Each provider has its own validation, error handling, and key management
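
Steps 4-6 boil down to a failover loop; a minimal sketch (not the server's actual code) looks like this, where `send` stands in for the real upstream request, which also marks keys invalid or quarantines them as a side effect:

```python
def forward_with_failover(keys, send):
    # Try each usable key in turn; silently switch keys on auth (401)
    # or credit (402) errors, and relay any other response unchanged.
    last_status = None
    for key in keys:
        last_status = send(key)
        if last_status in (401, 402):
            continue                      # bad key: switch to the next one
        return key, last_status           # success or non-key error: relay
    return None, last_status              # no valid keys: pass error through

# Demo with a fake upstream where the first key is invalid:
statuses = {"hf_bad": 401, "hf_good": 200}
print(forward_with_failover(["hf_bad", "hf_good"], statuses.get))
```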

Error Handling

The proxy detects specific error responses and handles them automatically for each provider:

Hugging Face Error Handling

  • Invalid Key (401 Unauthorized):
    • Detects both HTTP 401 status codes and JSON error responses
    • Automatically marks the key as invalid in the configuration
    • Switches to the next valid key and retries
  • 402 Payment Required: {"error":"You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly included credits."}
    • Handles with quarantine system and switches to the next key

Vercel AI Gateway Error Handling

  • Invalid Key (401 Authentication Error):
    • Detects authentication errors from VC Gateway
    • Automatically marks the key as invalid in the configuration
    • Switches to the next valid key and retries
  • 402 Insufficient Funds: {"error":{"message":"Insufficient funds. Please add credits to your account to continue using AI services.","type":"insufficient_funds"}}
    • Handles with quarantine system and switches to the next key
  • 404 Model Not Found: Treated as valid key (expected for validation requests)

When these errors occur, the proxy handles them with the quarantine system and switches to the next valid key.
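
The decision table above reduces to a small mapping from upstream status code to the proxy's next action; this is a sketch of the documented behaviour, not the server's code:

```python
def next_action(status_code):
    # Map an upstream error status to the documented proxy reaction.
    if status_code == 401:
        return "mark_invalid_and_switch"   # invalid key (both providers)
    if status_code == 402:
        return "quarantine_and_switch"     # credits exceeded / insufficient funds
    return "relay"                         # everything else is relayed unchanged

print(next_action(402))
```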

Rate Limiting

The server handles rate limiting gracefully for each provider:

Hugging Face Rate Limiting

  • IP Rate Limiting: When your IP is rate limited (429), the system stops processing all HF keys
  • Retry Logic: Configurable retry attempts (HF_WHOAMI_RETRY, default: 3)
  • Wait Time: Configurable wait time between retries (HF_WHOAMI_WAIT, default: 2s)
  • Graceful Shutdown: When rate limited after max retries, stops processing all HF keys
  • Resume Capability: Uses last_validity_check timestamps to resume validation later

Vercel AI Gateway Rate Limiting

  • IP Rate Limiting: When your IP is rate limited (429), the system stops processing all VC keys
  • Retry Logic: Configurable retry attempts (VC_VALIDATE_RETRY, default: 3)
  • Wait Time: Configurable wait time between retries (VC_VALIDATE_WAIT, default: 2s)
  • Graceful Shutdown: When rate limited after max retries, stops processing all VC keys
  • Resume Capability: Uses last_validity_check timestamps to resume validation later

Note: Keys themselves don’t have rate limits - only your IP does. When rate limited, all keys for that provider are affected equally.

Quarantine System

The proxy implements a multi-stage quarantine system to handle temporary 402 errors gracefully for both providers. This addresses the issue where providers sometimes return false 402 errors that resolve themselves after a few minutes to hours.

Quarantine Stages

  1. Stage 1 (Soft Disable): 30 minutes
    • Triggered on first 402 error
    • Key is temporarily unavailable for 30 minutes
  2. Stage 2 (Hour Block): 1 hour
    • Triggered if Stage 1 expires and key still returns 402
    • Key is unavailable for 1 hour
  3. Stage 3 (Day Block): 1 day
    • Triggered if Stage 2 expires and key still returns 402
    • Key is unavailable for 1 day
  4. Stage 4 (Week Block): 1 week
    • Triggered if Stage 3 expires and key still returns 402
    • Key is unavailable for 1 week
  5. Stage 5 (Extended): 30 days
    • Triggered if Stage 4 expires and key still returns 402
    • Key is quarantined for 30 days, then becomes available again
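
The stage progression above can be expressed as a simple table plus an advancement rule (a sketch of the documented behaviour, including the Stage 5 reset, not the server's actual code):

```python
from datetime import timedelta

# Durations for the five documented quarantine stages.
STAGES = {
    "stage_1": timedelta(minutes=30),
    "stage_2": timedelta(hours=1),
    "stage_3": timedelta(days=1),
    "stage_4": timedelta(weeks=1),
    "stage_5": timedelta(days=30),
}
ORDER = ["none", "stage_1", "stage_2", "stage_3", "stage_4", "stage_5"]

def next_stage(current):
    # Stage entered when a 402 arrives after the current quarantine expired.
    # After stage_5 expires, the cycle restarts from stage_1.
    if current == "stage_5":
        return "stage_1"
    return ORDER[ORDER.index(current) + 1]

print(next_stage("none"), STAGES[next_stage("none")])
```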

Quarantine Logic

  • Progressive Advancement: Keys advance through stages when quarantine expires and a new 402 error occurs
  • Time-Based Stages: Each stage has a fixed duration, and advancement happens when that duration expires
  • Stage 5 Reset: After Stage 5 expires, if a 402 error occurs, the quarantine restarts from Stage 1 (prevents long waiting periods)
  • Automatic Clearing: When a key successfully processes a request, its quarantine is automatically cleared
  • Manual Clearing: Admins can manually clear quarantine using the API endpoint

Quarantine Status Response

{
  "quarantine": [
    {
      "key": "hf_key_123...",
      "stage": "stage_1",
      "start_date": "2025-01-15T10:30:00+00:00",
      "is_active": true,
      "remaining_seconds": 1200
    }
  ]
}

Benefits

  • Prevents False Positives: Keys aren’t permanently disabled on temporary 402 errors
  • Progressive Response: Longer quarantine periods for persistently problematic keys
  • Automatic Recovery: Keys automatically return to service when they work again
  • Transparent Operation: No impact on normal request processing
  • Provider Agnostic: Works consistently across HF and VC providers

Development

Running in Development Mode

uv run ai-inferoxy

Viewing Logs

The server provides comprehensive logging with both console and file output:

Console Logging:

  • Real-time logs displayed in the terminal
  • Perfect for development and debugging

File Logging:

  • Automatic log rotation based on configuration
  • Logs stored in ./logs/ directory (configurable via HF_LOGS_DIR)
  • Daily rotation by default (configurable to hourly, minute, or monthly)

Log Content Includes:

  • Incoming requests
  • Key selection and rotation
  • Proxy forwarding details
  • Response status codes
  • Error responses and retry attempts
  • Rate limiting events
  • Monthly credits exceeded events
  • Log rotation events

Project Structure

ai-inferoxy/
β”œβ”€β”€ ai_inferoxy/           # Main package directory
β”‚   β”œβ”€β”€ __init__.py        # Package initialization
β”‚   β”œβ”€β”€ server.py          # FastAPI application entry point
β”‚   β”œβ”€β”€ schemas.py         # Pydantic models and request/response schemas
β”‚   β”œβ”€β”€ app_state.py       # Application state management
β”‚   β”œβ”€β”€ user_manager.py    # User and role management (RBAC)
β”‚   β”œβ”€β”€ rbac_manager.py    # Role-based access control logic
β”‚   β”œβ”€β”€ auth_middleware.py # Authentication and authorization middleware
β”‚   β”œβ”€β”€ key_manager/       # Modular key management
β”‚   β”‚   β”œβ”€β”€ __init__.py    # Package initialization
β”‚   β”‚   β”œβ”€β”€ common.py      # Common key manager base class
β”‚   β”‚   β”œβ”€β”€ hf.py          # Hugging Face key manager
β”‚   β”‚   β”œβ”€β”€ vc.py          # Vercel AI Gateway key manager
β”‚   β”‚   └── cf.py          # Cloudflare key manager (stub)
β”‚   β”œβ”€β”€ proxy/             # Modular proxy implementation
β”‚   β”‚   β”œβ”€β”€ __init__.py    # Package initialization
β”‚   β”‚   β”œβ”€β”€ common.py      # Common proxy base class
β”‚   β”‚   β”œβ”€β”€ hf.py          # Hugging Face proxy
β”‚   β”‚   β”œβ”€β”€ vc.py          # Vercel AI Gateway proxy
β”‚   β”‚   └── cf.py          # Cloudflare proxy (stub)
β”‚   β”œβ”€β”€ routers/           # FastAPI route modules
β”‚   β”‚   β”œβ”€β”€ health.py      # Health and info endpoints
β”‚   β”‚   β”œβ”€β”€ keys_hf.py     # HF key management endpoints
β”‚   β”‚   β”œβ”€β”€ keys_vc.py     # VC key management endpoints
β”‚   β”‚   β”œβ”€β”€ proxy_hf.py    # HF proxy endpoints
β”‚   β”‚   β”œβ”€β”€ proxy_vc.py    # VC proxy endpoints
β”‚   β”‚   └── admin.py       # Admin endpoints
β”‚   └── __main__.py        # CLI entry point
β”œβ”€β”€ users.json             # User and role definitions (auto-created)
β”œβ”€β”€ hf-keys.json          # HF API keys (auto-created if missing)
β”œβ”€β”€ vc-keys.json          # VC API keys (auto-created if missing)
β”œβ”€β”€ env.example            # Example environment configuration
β”œβ”€β”€ .env                   # Environment configuration (create from env.example)
β”œβ”€β”€ pyproject.toml         # Python project configuration
β”œβ”€β”€ README.md             # Documentation
β”œβ”€β”€ LICENSE               # AGPL v3.0 license
β”œβ”€β”€ .gitignore            # Git ignore file
└── logs/                 # Log files directory (auto-created, gitignored)

Configuration

The server automatically creates key files with default structures if they don’t exist:

HF Keys (hf-keys.json):

{
  "keys": [],
  "rotation_strategy": "round_robin",
  "check_interval_days": 30
}

VC Keys (vc-keys.json):

{
  "keys": [],
  "rotation_strategy": "round_robin",
  "check_interval_days": 7
}

Note: The hf-keys.json, vc-keys.json, and .env files are excluded from version control for security reasons.

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) - see the LICENSE file for details.

The AGPL-3.0 license requires that if you run a modified version of this software on a server and provide services to users over a network, you must make the source code available to those users. This ensures that improvements to the software remain open source.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

  1. Fork the repository
  2. Clone your fork: git clone https://github.com/yourusername/ai-inferoxy.git
  3. Create a feature branch: git checkout -b feature/amazing-feature
  4. Install dependencies: uv sync
  5. Make your changes and test them
  6. Commit your changes: git commit -m 'Add some amazing feature'
  7. Push to the branch: git push origin feature/amazing-feature
  8. Open a Pull Request

Code Style

  • Follow PEP 8 for Python code style
  • Add type hints where appropriate
  • Include docstrings for functions and classes
  • Write tests for new features

Support

If you encounter any issues or have questions, please:

  1. Check the Issues page for existing solutions
  2. Create a new issue with detailed information about your problem
  3. Include your environment details and any error messages

Acknowledgments

  • Built with FastAPI for high-performance API development
  • Uses uv for fast Python package management