AI Inference Proxy

A lightweight FastAPI server that acts as a universal proxy to multiple AI Inference APIs (Hugging Face, Vercel AI Gateway) with smart API key rotation and provider-specific management.

License: AGPL v3

Features

  • Multi-Provider Support: Universal proxy for Hugging Face Inference API and Vercel AI Gateway
  • Smart Key Rotation: Automatically rotates API keys to avoid rate limits
  • Silent Key Switching: Switches keys silently on invalid/monthly-credits-exceeded errors
  • Multi-Stage Quarantine System: Handles temporary 402 errors with progressive quarantine stages
  • Provider-Specific Key Management: Add, validate, and monitor API keys for each provider
  • Transparent Relay: Preserves all request/response data except auth headers
  • Role-Based Access Control (RBAC): Comprehensive user management and endpoint access control
  • API Key Authentication: Secure authentication using Bearer tokens
  • User Management: Create, update, delete, and manage users and roles
  • Audit Logging: All requests logged with user context when authenticated
  • Modular Architecture: Clean separation of provider-specific logic for easy extensibility

Quick Start

Prerequisites

  • Python 3.13+
  • uv package manager

Installation

  1. Clone the repository:
    git clone https://github.com/nazdridoy/ai-inferoxy.git
    cd ai-inferoxy
    
  2. Install dependencies:
    uv sync
    
  3. Configure (Optional):
    cp env.example .env
    # Edit .env with your preferred settings
    
  4. Run the server:
    uv run ai-inferoxy
    

The server will start on http://localhost:8000 and automatically create hf-keys.json and vc-keys.json if they don’t exist.

Running as a Systemd Service (Optional)

To run ai-inferoxy as a background service that starts automatically:

Option 1: User Service (for local development)

  1. Copy the service file to your user systemd directory:
    cp ai-inferoxy.service ~/.config/systemd/user/
    

Note: The service assumes your ai-inferoxy project is located at ~/ai-inferoxy. If your project is in a different location, edit the service file and change the WorkingDirectory path accordingly.

  2. Reload systemd and enable the service:
    systemctl --user daemon-reload
    systemctl --user enable ai-inferoxy
    
  3. Start the service:
    systemctl --user start ai-inferoxy
    

Useful commands:

  • Check status: systemctl --user status ai-inferoxy
  • View logs: journalctl --user -u ai-inferoxy -f
  • Stop service: systemctl --user stop ai-inferoxy
  • Disable service: systemctl --user disable ai-inferoxy

Option 2: System Service (for remote servers)

  1. Copy the system service file to the system directory:
    sudo cp ai-inferoxy-system.service /etc/systemd/system/ai-inferoxy.service
    
  2. Edit the service file to replace REPLACE_WITH_YOUR_USERNAME with your actual username:
    sudo nano /etc/systemd/system/ai-inferoxy.service
    
  3. Reload systemd and enable the service:
    sudo systemctl daemon-reload
    sudo systemctl enable ai-inferoxy
    
  4. Start the service:
    sudo systemctl start ai-inferoxy
    

Useful commands:

  • Check status: sudo systemctl status ai-inferoxy
  • View logs: sudo journalctl -u ai-inferoxy -f
  • Stop service: sudo systemctl stop ai-inferoxy
  • Disable service: sudo systemctl disable ai-inferoxy

Troubleshooting: If the service fails to start, check:

  1. Path issues: Ensure uv is in your PATH or edit the service file to use /usr/bin/python3 -m ai_inferoxy instead
  2. Working directory: Verify the WorkingDirectory path in the service file matches your actual project location
  3. Dependencies: Make sure you’ve run uv sync to install dependencies
  4. Logs: Check detailed logs with sudo journalctl -xeu ai-inferoxy.service (system service) or journalctl --user -xeu ai-inferoxy.service (user service)

Configuration

Environment Variables

You can configure the server using environment variables or a .env file.

Option 1: Environment Variables

export PROXY_HOST=0.0.0.0
export PROXY_PORT=8000
export HF_KEYS_FILE=./hf-keys.json
export HF_BASE_URL=https://router.huggingface.co
export HF_WHOAMI_WAIT=2
export HF_WHOAMI_RETRY=3
export VC_KEYS_FILE=./vc-keys.json
export VC_BASE_URL=https://ai-gateway.vercel.sh
export VC_VALIDATE_WAIT=2
export VC_VALIDATE_RETRY=3
export MIN_CHECK_FREQUENCY=3600
export LOG_LEVEL=INFO

Option 2: .env File (Recommended)

  1. Copy the example environment file:
    cp env.example .env
    
  2. Edit .env with your preferred settings:
    # Server Configuration
    PROXY_HOST=0.0.0.0
    PROXY_PORT=8000
       
    # Hugging Face Configuration
    HF_KEYS_FILE=./hf-keys.json
    HF_BASE_URL=https://router.huggingface.co
    HF_WHOAMI_WAIT=2
    HF_WHOAMI_RETRY=3
       
    # Vercel AI Gateway Configuration
    VC_KEYS_FILE=./vc-keys.json
    VC_BASE_URL=https://ai-gateway.vercel.sh
    VC_VALIDATE_WAIT=2
    VC_VALIDATE_RETRY=3
       
    # Global Configuration
    MIN_CHECK_FREQUENCY=3600
    LOG_LEVEL=INFO
    

Available Environment Variables

Server Configuration:

  • PROXY_HOST - Host to bind to (default: 0.0.0.0)
  • PROXY_PORT - Port to bind to (default: 8000)

Hugging Face Configuration:

  • HF_KEYS_FILE - Path to HF API keys JSON file (default: ./hf-keys.json)
  • HF_BASE_URL - HF Inference base URL (default: https://router.huggingface.co)
  • HF_WHOAMI_WAIT - Wait time in seconds after rate limit (default: 2)
  • HF_WHOAMI_RETRY - Number of retries for whoami endpoint (default: 3)

Vercel AI Gateway Configuration:

  • VC_KEYS_FILE - Path to VC API keys JSON file (default: ./vc-keys.json)
  • VC_BASE_URL - Vercel AI Gateway base URL (default: https://ai-gateway.vercel.sh)
  • VC_VALIDATE_WAIT - Wait time in seconds for validation requests (default: 2)
  • VC_VALIDATE_RETRY - Number of retries for validation requests (default: 3)

Global Configuration:

  • MIN_CHECK_FREQUENCY - Minimum time between validity checks in seconds (default: 3600)
  • LOG_LEVEL - Logging level (default: INFO)
  • HF_LOGS_DIR - Directory for log files (default: ./logs)
  • LOG_ROTATION_WHEN - Log rotation frequency: M=minute, H=hourly, D=daily (default: D)
  • LOG_ROTATION_INTERVAL - Interval for rotation (default: 1)
  • LOG_BACKUP_COUNT - Number of backup files to keep (default: 7)
  • ENABLE_RELOAD - Enable file watching and auto-reload for development (default: false)
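
As a rough sketch (not the server's actual code), the documented variables and their defaults can be read in one place like this:

```python
import os

# Sketch only: collect the documented environment variables with their
# documented defaults. The server's internal config handling may differ.
def load_config(env=os.environ):
    return {
        "host": env.get("PROXY_HOST", "0.0.0.0"),
        "port": int(env.get("PROXY_PORT", "8000")),
        "hf_keys_file": env.get("HF_KEYS_FILE", "./hf-keys.json"),
        "hf_base_url": env.get("HF_BASE_URL", "https://router.huggingface.co"),
        "vc_keys_file": env.get("VC_KEYS_FILE", "./vc-keys.json"),
        "vc_base_url": env.get("VC_BASE_URL", "https://ai-gateway.vercel.sh"),
        "min_check_frequency": int(env.get("MIN_CHECK_FREQUENCY", "3600")),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }

# Overrides win; anything unset falls back to the documented default.
print(load_config({"PROXY_PORT": "9000"})["port"])
```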

Logging Configuration

The server provides comprehensive logging with configurable rotation:

Log Rotation Options:

  • Daily Rotation (default): LOG_ROTATION_WHEN=D - Creates one log file per day
  • Hourly Rotation: LOG_ROTATION_WHEN=H - Creates one log file per hour
  • Minute Rotation: LOG_ROTATION_WHEN=M - Creates one log file per minute (for testing)
  • Monthly Rotation: LOG_ROTATION_WHEN=M with LOG_ROTATION_INTERVAL>1 - Creates one log file per month

Log Retention:

  • Daily: Keep 7 days of logs (default)
  • Hourly: Keep 168 hours (1 week) of logs
  • Minute: Keep 10 minutes of logs (for testing)

Log File Naming:

  • Daily: ai-inferoxy.log.2025-08-19, ai-inferoxy.log.2025-08-20
  • Hourly: ai-inferoxy.log.2025-08-19_14, ai-inferoxy.log.2025-08-19_15
  • Minute: ai-inferoxy.log.2025-08-19_14-30, ai-inferoxy.log.2025-08-19_14-31
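
These settings map naturally onto Python's standard TimedRotatingFileHandler; the sketch below shows the correspondence (the server's actual wiring may differ), using a throwaway directory in place of HF_LOGS_DIR:

```python
import logging
import logging.handlers
import os
import tempfile

log_dir = tempfile.mkdtemp()  # stand-in for HF_LOGS_DIR
handler = logging.handlers.TimedRotatingFileHandler(
    os.path.join(log_dir, "ai-inferoxy.log"),
    when="D",        # LOG_ROTATION_WHEN: D=daily, H=hourly, M=minute
    interval=1,      # LOG_ROTATION_INTERVAL
    backupCount=7,   # LOG_BACKUP_COUNT: rotated files kept before deletion
)
logger = logging.getLogger("ai-inferoxy-demo")
logger.addHandler(handler)
logger.warning("log rotation configured")  # written to ai-inferoxy.log
```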

Development Mode:

  • Set ENABLE_RELOAD=true for file watching and auto-reload
  • Note: When ENABLE_RELOAD=true, consider setting HF_LOGS_DIR to a directory outside the project to avoid file watching spam

API Keys File Structure

Hugging Face Keys (hf-keys.json):

{
  "keys": [
    {
      "key": "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "valid": true,
      "last_validity_check": "2025-01-15T10:30:00+00:00",
      "user_info": {
        "name": "username",
        "email": "user@example.com",
        "isPro": false,
        "canPay": false
      },
      "quarantine_stage": "none",
      "quarantine_start_date": null
    }
  ],
  "rotation_strategy": "round_robin",
  "check_interval_days": 30
}

Vercel AI Gateway Keys (vc-keys.json):

{
  "keys": [
    {
      "key": "vck_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "valid": true,
      "last_validity_check": "2025-01-15T10:30:00+00:00",
      "user_info": {
        "name": "user@example.com",
        "email": "user@example.com",
        "isPro": false,
        "canPay": false
      },
      "quarantine_stage": "none",
      "quarantine_start_date": null
    }
  ],
  "rotation_strategy": "round_robin",
  "check_interval_days": 7
}
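
Reading these files is straightforward; the sketch below (field names follow the examples above, but the server's own loader does more, e.g. rotation state and timestamps) selects the keys that are usable: valid and not quarantined:

```python
import json
import os
import tempfile

def load_usable_keys(path):
    # Keep only keys marked valid with no active quarantine stage.
    with open(path) as f:
        data = json.load(f)
    return [
        entry["key"]
        for entry in data.get("keys", [])
        if entry.get("valid") and entry.get("quarantine_stage", "none") == "none"
    ]

# Demo against a throwaway file shaped like hf-keys.json:
demo = os.path.join(tempfile.mkdtemp(), "hf-keys.json")
with open(demo, "w") as f:
    json.dump({"keys": [
        {"key": "hf_a", "valid": True,  "quarantine_stage": "none"},
        {"key": "hf_b", "valid": False, "quarantine_stage": "none"},
        {"key": "hf_c", "valid": True,  "quarantine_stage": "stage_1"},
    ], "rotation_strategy": "round_robin"}, f)
print(load_usable_keys(demo))  # only hf_a is usable
```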

API Endpoints

Health & Info

  • GET / - Server information
  • GET /health - Health check
  • GET /docs - Interactive API documentation

Key Management (Hugging Face)

  • GET /check-validity/hf/{key} - Check if a specific API key is valid and update its status in configuration if found
  • POST /add-key/hf/{key} - Add new API key with automatic validation (path parameter)
  • POST /add-key/hf - Add multiple API keys with automatic validation (JSON body)
  • GET /keys/status/hf - Get status of all API keys
  • POST /keys/validate/hf - Validate keys that need validation (respects minimum check frequency)
  • POST /keys/refresh/hf - Force refresh all keys validity and user info (ignores minimum check frequency, removes duplicates)
  • POST /keys/cleanup/hf - Remove duplicate keys and invalid keys from configuration
  • POST /keys/reload/hf - Force reload keys from file

Key Management (Vercel AI Gateway)

  • GET /check-validity/vc/{key} - Check if a specific API key is valid and update its status in configuration if found
  • POST /add-key/vc/{key} - Add new API key with automatic validation (path parameter, supports key:email format)
  • POST /add-key/vc - Add multiple API keys with automatic validation (JSON body, supports key:email format)
  • GET /keys/status/vc - Get status of all API keys
  • POST /keys/validate/vc - Validate keys that need validation (respects minimum check frequency)
  • POST /keys/refresh/vc - Force refresh all keys validity and user info (ignores minimum check frequency, removes duplicates)
  • POST /keys/cleanup/vc - Remove duplicate keys and invalid keys from configuration
  • POST /keys/reload/vc - Force reload keys from file

Client API (Hugging Face)

  • GET /keys/provision/hf - Provision a valid API key for client usage
  • POST /keys/report/hf - Report token usage status (success/error) from clients (optional client_name to track end user; defaults to username)

Client API (Vercel AI Gateway)

  • GET /keys/provision/vc - Provision a valid API key for client usage
  • POST /keys/report/vc - Report token usage status (success/error) from clients (optional client_name to track end user; defaults to username)

Quarantine Management (Hugging Face)

  • GET /keys/quarantine/hf - Get quarantine status for all keys
  • POST /keys/quarantine/clear/hf/{key} - Manually clear quarantine for a specific key

Quarantine Management (Vercel AI Gateway)

  • GET /keys/quarantine/vc - Get quarantine status for all keys
  • POST /keys/quarantine/clear/vc/{key} - Manually clear quarantine for a specific key

Universal Proxy (Hugging Face)

  • ANY /hf/{path:path} - Proxy to HF Inference API (role-based access)

Universal Proxy (Vercel AI Gateway)

  • ANY /vc/{path:path} - Proxy to Vercel AI Gateway (role-based access)

Admin Endpoints (Admin Role Required)

User Management

  • GET /admin/users - List all users
  • POST /admin/users - Create new user
  • PUT /admin/users/{username} - Update user
  • DELETE /admin/users/{username} - Delete user
  • POST /admin/users/{username}/generate-key - Regenerate API key

Role Management

  • GET /admin/roles - List all roles
  • POST /admin/roles - Create new role
  • PUT /admin/roles/{role_name} - Update role
  • DELETE /admin/roles/{role_name} - Delete role

Authentication & RBAC

The AI Inference Proxy includes a comprehensive Role-Based Access Control (RBAC) system. Most endpoints require authentication via API keys in the Authorization header.

📚 For detailed RBAC documentation, Pydantic models, testing examples, and troubleshooting, see RBAC_README.md.

Default Admin User

The system creates a default admin user on first run. You’ll need this to access admin endpoints:

{
  "username": "admin",
  "api_key": "ndr_YWRtaW4_abc123def456ghi789jkl012mno345pqr678stu901vwx234yz5678901234567890"
}

⚠️ IMPORTANT: Use this API key to access admin endpoints and create additional users. The actual API key will be different in your installation.

Authentication Header

All protected endpoints require the Authorization header:

curl -H "Authorization: Bearer YOUR_API_KEY" http://localhost:8000/admin/users

Usage Examples

Quick Reference

Category          Examples                          Auth Required
Public            Health, Docs, Server Info         No
Key Management    Add, Validate, Refresh, Cleanup   Yes (Manager/Admin)
Client API        Provision, Report                 Yes (User role)
Universal Proxy   HF & VC Inference APIs            Yes (User role or higher)
Admin             Users, Roles Management           Yes (Admin only)

Public Endpoints (No Authentication Required)

1. Check Server Health

curl http://localhost:8000/health

2. View API Documentation

curl http://localhost:8000/docs

3. Get Server Information

curl http://localhost:8000/

Key Management (Manager/Admin Role Required)

4. Add Single API Key (HF)

curl -X POST "http://localhost:8000/add-key/hf/hf_your_api_key_here" \
  -H "Authorization: Bearer $ADMIN_API_KEY"

5. Add Multiple API Keys (HF)

curl -X POST "http://localhost:8000/add-key/hf" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"keys": ["hf_key_1", "hf_key_2", "hf_key_3"]}'

Bulk Response Format:

{
  "success": true,
  "message": "Processed 3 keys: 2 successful, 1 failed",
  "results": [
    {
      "key": "hf_key_1...",
      "success": true,
      "message": "Key added successfully"
    },
    {
      "key": "hf_key_2...",
      "success": false,
      "message": "Key already exists"
    }
  ],
  "summary": {
    "total": 3,
    "successful": 2,
    "failed": 1
  }
}

6. Check Key Validity (HF)

curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/check-validity/hf/hf_your_api_key_here"

7. Get All Keys Status (HF)

curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/status/hf"

8. Validate Keys (HF) (Respects Rate Limits)

curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/validate/hf"

9. Force Refresh All Keys (HF)

curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/refresh/hf"

10. Clean Up Keys (HF) (Remove Duplicates and Invalid Keys)

curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/cleanup/hf"

11. Reload Keys from File (HF)

curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/reload/hf"

12. Check Quarantine Status (HF)

curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/quarantine/hf"

13. Clear Quarantine for a Key (HF)

curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/quarantine/clear/hf/hf_your_api_key_here"

Client API (User Role Required)

14. Get API Key for Client Usage (HF)

curl -H "Authorization: Bearer $USER_API_KEY" \
  "http://localhost:8000/keys/provision/hf"

Response Example:

{
  "token": "hf_your_api_key_here",
  "token_id": "hf_your_ap",
  "expires_in": 3600
}

15. Report Token Usage Status (HF)

# Report successful usage (optionally include client_name/end user identifier):
curl -X POST -H "Authorization: Bearer $USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"token_id": "hf_your_ap", "status": "success", "client_name": "acme_customer_42"}' \
  "http://localhost:8000/keys/report/hf"

# Report error:
curl -X POST -H "Authorization: Bearer $USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"token_id": "hf_your_ap", "status": "error", "error": "402 Client Error: Payment Required", "client_name": "acme_customer_42"}' \
  "http://localhost:8000/keys/report/hf"

Universal Proxy (Role-Based Access)

16. Use as Proxy for HF Inference API

# Instead of calling HF directly:
curl -X POST "https://router.huggingface.co/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "model": "zai-org/GLM-4.5:novita"}'

# Call through the proxy (requires user role or higher):
curl -X POST "http://localhost:8000/hf/v1/chat/completions" \
  -H "Authorization: Bearer $USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "model": "zai-org/GLM-4.5:novita"}'

17. Use as Proxy for Vercel AI Gateway

# Instead of calling Vercel AI Gateway directly:
curl -X POST "https://ai-gateway.vercel.sh/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "model": "openai/gpt-4o-mini"}'

# Call through the proxy (requires user role or higher):
curl -X POST "http://localhost:8000/vc/v1/chat/completions" \
  -H "Authorization: Bearer $USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "model": "openai/gpt-4o-mini"}'

18. Add Vercel AI Gateway Keys (with email support)

# Add single key with email:
curl -X POST "http://localhost:8000/add-key/vc/vck_ABC123:user@example.com" \
  -H "Authorization: Bearer $ADMIN_API_KEY"

# Add multiple keys with emails:
curl -X POST "http://localhost:8000/add-key/vc" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "keys": [
      "vck_ABC123:alice@example.com",
      "vck_DEF456:bob@example.com"
    ]
  }'
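
Splitting the key:email format is a one-liner; this sketch shows the intent (the email part is optional, and the server-side parsing may differ in detail):

```python
def parse_vc_key(entry):
    # Split "vck_...:email" on the first colon; email is optional.
    key, sep, email = entry.partition(":")
    return key, (email if sep else None)

print(parse_vc_key("vck_ABC123:alice@example.com"))
print(parse_vc_key("vck_ABC123"))  # no email attached
```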

Bulk Key Operations

19. Add Multiple Keys from File

If you have your keys in a text file (one key per line), you can easily add them all:

File format (keys.txt):

hf_key_1
hf_key_2
hf_key_3

Using jq (recommended):

curl -X POST "http://localhost:8000/add-key/hf" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"keys\": $(cat keys.txt | jq -R -s -c 'split("\n") | map(select(length > 0))')}"

Alternative method (without jq):

keys=$(cat keys.txt | tr '\n' ',' | sed 's/,$//' | sed 's/^/"/;s/,/","/g;s/$/"/')
curl -X POST "http://localhost:8000/add-key/hf" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"keys\": [$keys]}"

Security Note: Remember to add keys.txt to your .gitignore to prevent accidentally committing your API keys to version control.
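
If you prefer Python to shell one-liners, the same payload can be built like this (blank lines are skipped, matching the jq variant above):

```python
import json
import os
import tempfile

def keys_file_to_payload(path):
    # One key per line -> {"keys": [...]} body for POST /add-key/hf.
    with open(path) as f:
        keys = [line.strip() for line in f if line.strip()]
    return json.dumps({"keys": keys})

# Demo with a throwaway keys.txt:
demo = os.path.join(tempfile.mkdtemp(), "keys.txt")
with open(demo, "w") as f:
    f.write("hf_key_1\nhf_key_2\n\nhf_key_3\n")
print(keys_file_to_payload(demo))
```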

Admin Endpoints Examples

20. User Management (Admin Role Required)

List All Users

curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/admin/users"

Create New User

curl -X POST "http://localhost:8000/admin/users" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "newuser",
    "email": "user@example.com",
    "full_name": "New User",
    "role": "user"
  }'

Response Example:

{
  "success": true,
  "message": "User newuser created successfully",
  "username": "newuser",
  "api_key": "generated_api_key_here..."
}

Update User

curl -X PUT "http://localhost:8000/admin/users/newuser" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "updated@example.com",
    "full_name": "Updated User Name"
  }'

Delete User

curl -X DELETE "http://localhost:8000/admin/users/newuser" \
  -H "Authorization: Bearer $ADMIN_API_KEY"

Regenerate API Key

curl -X POST "http://localhost:8000/admin/users/newuser/generate-key" \
  -H "Authorization: Bearer $ADMIN_API_KEY"

21. Role Management (Admin Role Required)

List All Roles

curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/admin/roles"

Create New Role

curl -X POST "http://localhost:8000/admin/roles" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "customrole",
    "description": "Custom role for specific access",
    "endpoints": ["/health", "/docs", "/custom/*"],
    "is_default": false
  }'

Update Role

curl -X PUT "http://localhost:8000/admin/roles/customrole" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Updated role description",
    "endpoints": ["/health", "/docs", "/custom/*", "/new/*"]
  }'

Delete Role

curl -X DELETE "http://localhost:8000/admin/roles/customrole" \
  -H "Authorization: Bearer $ADMIN_API_KEY"

Roles and Permissions

The RBAC system defines four user roles with different access levels:

Admin Role

  • Access: All endpoints (*)
  • Description: Full system access
  • Use Case: System administrators

Manager Role

  • Access: Key management + Universal proxy
  • Endpoints:
    • /check-validity/*
    • /add-key/*
    • /keys/*
    • /hf/{path:path}, /vc/{path:path} (Universal proxy)
  • Use Case: Key managers and system operators

User Role

  • Access: Standard user access
  • Endpoints:
    • /health, /docs, /
    • /hf/{path:path}, /vc/{path:path} (Universal proxy)
    • /keys/provision/hf, /keys/report/hf
    • /keys/provision/vc, /keys/report/vc
  • Use Case: Regular users accessing AI Inference APIs

Guest Role (Default)

  • Access: Public endpoints only
  • Endpoints: /health, /docs, /
  • Use Case: Basic access with authentication tracking
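
A role's endpoint list can be matched against a request path along these lines. This is a sketch consistent with the patterns shown above ("*" for admin, wildcards, FastAPI-style {path:path} suffixes); the real RBAC manager may use different matching rules:

```python
import fnmatch

def is_allowed(role_endpoints, path):
    # Return True if any of the role's endpoint patterns covers the path.
    for pattern in role_endpoints:
        if pattern == "*" or pattern == path:
            return True
        # Treat FastAPI-style "{path:path}" suffixes like a wildcard.
        pattern = pattern.replace("{path:path}", "*")
        if fnmatch.fnmatch(path, pattern):
            return True
    return False

print(is_allowed(["/health", "/hf/{path:path}"], "/hf/v1/chat/completions"))
```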

Security Features

  • API Key Generation: 64+ character cryptographically secure random strings
  • Role-Based Access: Endpoint access controlled by user roles
  • Universal Proxy Protection: Universal proxy endpoint respects role permissions
  • Audit Logging: All requests logged with user context
  • No Password Storage: Only API keys, no password management
  • Input Validation: Comprehensive validation of all request data using Pydantic models
  • JSON-Based API: Modern REST API design with proper content-type validation

How It Works

  1. Server Startup: Loads keys from provider-specific files without validation (to avoid rate limits)
  2. Request Reception: Server receives requests to any AI Inference endpoint (HF or VC)
  3. Provider Detection: Routes requests to appropriate provider based on URL prefix (/hf/ or /vc/)
  4. Key Selection: Smart algorithm selects a valid, non-quarantined API key for the provider
  5. Request Forwarding: Forwards request to the appropriate upstream API with selected key
  6. Response Handling:
    • If successful: Returns response as-is
    • If key error: Silently switches to next valid key and retries
    • If 402 error: Handles with quarantine system and switches to next key
    • If no valid keys: Passes error transparently to user
  7. Transparent Relay: All responses (success, error, streaming) are relayed unchanged
  8. Rate Limiting: Respects provider rate limits with configurable wait times and retries
  9. Quarantine System: Handles temporary 402 errors with progressive quarantine stages
  10. Provider-Specific Logic: Each provider has its own validation, error handling, and key management
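
Steps 4-6 boil down to a failover loop; a minimal sketch (not the server's actual code) looks like this, where `send` stands in for the real upstream request, which also marks keys invalid or quarantines them as a side effect:

```python
def forward_with_failover(keys, send):
    # Try each usable key in turn; silently switch keys on auth (401)
    # or credit (402) errors, and relay any other response unchanged.
    last_status = None
    for key in keys:
        last_status = send(key)
        if last_status in (401, 402):
            continue                      # bad key: switch to the next one
        return key, last_status           # success or non-key error: relay
    return None, last_status              # no valid keys: pass error through

# Demo with a fake upstream where the first key is invalid:
statuses = {"hf_bad": 401, "hf_good": 200}
print(forward_with_failover(["hf_bad", "hf_good"], statuses.get))
```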

Error Handling

The proxy detects specific error responses and handles them automatically for each provider:

Hugging Face Error Handling

  • Invalid Key (401 Unauthorized):
    • Detects both HTTP 401 status codes and JSON error responses
    • Automatically marks the key as invalid in the configuration
    • Switches to the next valid key and retries
  • 402 Payment Required: {"error":"You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly included credits."}
    • Handles with quarantine system and switches to the next key

Vercel AI Gateway Error Handling

  • Invalid Key (401 Authentication Error):
    • Detects authentication errors from VC Gateway
    • Automatically marks the key as invalid in the configuration
    • Switches to the next valid key and retries
  • 402 Insufficient Funds: {"error":{"message":"Insufficient funds. Please add credits to your account to continue using AI services.","type":"insufficient_funds"}}
    • Handles with quarantine system and switches to the next key
  • 404 Model Not Found: Treated as valid key (expected for validation requests)

When these errors occur, the proxy handles them with the quarantine system and switches to the next valid key.
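
The decision table above reduces to a small mapping from upstream status code to the proxy's next action; this is a sketch of the documented behaviour, not the server's code:

```python
def next_action(status_code):
    # Map an upstream error status to the documented proxy reaction.
    if status_code == 401:
        return "mark_invalid_and_switch"   # invalid key (both providers)
    if status_code == 402:
        return "quarantine_and_switch"     # credits exceeded / insufficient funds
    return "relay"                         # everything else is relayed unchanged

print(next_action(402))
```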

Rate Limiting

The server handles rate limiting gracefully for each provider:

Hugging Face Rate Limiting

  • IP Rate Limiting: When your IP is rate limited (429), the system stops processing all HF keys
  • Retry Logic: Configurable retry attempts (HF_WHOAMI_RETRY, default: 3)
  • Wait Time: Configurable wait time between retries (HF_WHOAMI_WAIT, default: 2s)
  • Graceful Shutdown: When rate limited after max retries, stops processing all HF keys
  • Resume Capability: Uses last_validity_check timestamps to resume validation later

Vercel AI Gateway Rate Limiting

  • IP Rate Limiting: When your IP is rate limited (429), the system stops processing all VC keys
  • Retry Logic: Configurable retry attempts (VC_VALIDATE_RETRY, default: 3)
  • Wait Time: Configurable wait time between retries (VC_VALIDATE_WAIT, default: 2s)
  • Graceful Shutdown: When rate limited after max retries, stops processing all VC keys
  • Resume Capability: Uses last_validity_check timestamps to resume validation later

Note: Keys themselves don’t have rate limits - only your IP does. When rate limited, all keys for that provider are affected equally.

Quarantine System

The proxy implements a multi-stage quarantine system to handle temporary 402 errors gracefully for both providers. This addresses the issue where providers sometimes return false 402 errors that resolve themselves after a few minutes to hours.

Quarantine Stages

  1. Stage 1 (Soft Disable): 30 minutes
    • Triggered on first 402 error
    • Key is temporarily unavailable for 30 minutes
  2. Stage 2 (Hour Block): 1 hour
    • Triggered if Stage 1 expires and key still returns 402
    • Key is unavailable for 1 hour
  3. Stage 3 (Day Block): 1 day
    • Triggered if Stage 2 expires and key still returns 402
    • Key is unavailable for 1 day
  4. Stage 4 (Week Block): 1 week
    • Triggered if Stage 3 expires and key still returns 402
    • Key is unavailable for 1 week
  5. Stage 5 (Extended): 30 days
    • Triggered if Stage 4 expires and key still returns 402
    • Key is quarantined for 30 days, then becomes available again
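
The stage progression above can be expressed as a simple table plus an advancement rule (a sketch of the documented behaviour, including the Stage 5 reset, not the server's actual code):

```python
from datetime import timedelta

# Durations for the five documented quarantine stages.
STAGES = {
    "stage_1": timedelta(minutes=30),
    "stage_2": timedelta(hours=1),
    "stage_3": timedelta(days=1),
    "stage_4": timedelta(weeks=1),
    "stage_5": timedelta(days=30),
}
ORDER = ["none", "stage_1", "stage_2", "stage_3", "stage_4", "stage_5"]

def next_stage(current):
    # Stage entered when a 402 arrives after the current quarantine expired.
    # After stage_5 expires, the cycle restarts from stage_1.
    if current == "stage_5":
        return "stage_1"
    return ORDER[ORDER.index(current) + 1]

print(next_stage("none"), STAGES[next_stage("none")])
```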

Quarantine Logic

  • Progressive Advancement: Keys advance through stages when quarantine expires and a new 402 error occurs
  • Time-Based Stages: Each stage has a fixed duration, and advancement happens when that duration expires
  • Stage 5 Reset: After Stage 5 expires, if a 402 error occurs, the quarantine restarts from Stage 1 (prevents long waiting periods)
  • Automatic Clearing: When a key successfully processes a request, its quarantine is automatically cleared
  • Manual Clearing: Admins can manually clear quarantine using the API endpoint

Quarantine Status Response

{
  "quarantine": [
    {
      "key": "hf_key_123...",
      "stage": "stage_1",
      "start_date": "2025-01-15T10:30:00+00:00",
      "is_active": true,
      "remaining_seconds": 1200
    }
  ]
}

Benefits

  • Prevents False Positives: Keys aren’t permanently disabled on temporary 402 errors
  • Progressive Response: Longer quarantine periods for persistently problematic keys
  • Automatic Recovery: Keys automatically return to service when they work again
  • Transparent Operation: No impact on normal request processing
  • Provider Agnostic: Works consistently across HF and VC providers

Development

Running in Development Mode

uv run ai-inferoxy

Viewing Logs

The server provides comprehensive logging with both console and file output:

Console Logging:

  • Real-time logs displayed in the terminal
  • Perfect for development and debugging

File Logging:

  • Automatic log rotation based on configuration
  • Logs stored in ./logs/ directory (configurable via HF_LOGS_DIR)
  • Daily rotation by default (configurable to hourly, minute, or monthly)

Log Content Includes:

  • Incoming requests
  • Key selection and rotation
  • Proxy forwarding details
  • Response status codes
  • Error responses and retry attempts
  • Rate limiting events
  • Monthly credits exceeded events
  • Log rotation events

Project Structure

ai-inferoxy/
β”œβ”€β”€ ai_inferoxy/           # Main package directory
β”‚   β”œβ”€β”€ __init__.py        # Package initialization
β”‚   β”œβ”€β”€ server.py          # FastAPI application entry point
β”‚   β”œβ”€β”€ schemas.py         # Pydantic models and request/response schemas
β”‚   β”œβ”€β”€ app_state.py       # Application state management
β”‚   β”œβ”€β”€ user_manager.py    # User and role management (RBAC)
β”‚   β”œβ”€β”€ rbac_manager.py    # Role-based access control logic
β”‚   β”œβ”€β”€ auth_middleware.py # Authentication and authorization middleware
β”‚   β”œβ”€β”€ key_manager/       # Modular key management
β”‚   β”‚   β”œβ”€β”€ __init__.py    # Package initialization
β”‚   β”‚   β”œβ”€β”€ common.py      # Common key manager base class
β”‚   β”‚   β”œβ”€β”€ hf.py          # Hugging Face key manager
β”‚   β”‚   β”œβ”€β”€ vc.py          # Vercel AI Gateway key manager
β”‚   β”‚   └── cf.py          # Cloudflare key manager (stub)
β”‚   β”œβ”€β”€ proxy/             # Modular proxy implementation
β”‚   β”‚   β”œβ”€β”€ __init__.py    # Package initialization
β”‚   β”‚   β”œβ”€β”€ common.py      # Common proxy base class
β”‚   β”‚   β”œβ”€β”€ hf.py          # Hugging Face proxy
β”‚   β”‚   β”œβ”€β”€ vc.py          # Vercel AI Gateway proxy
β”‚   β”‚   └── cf.py          # Cloudflare proxy (stub)
β”‚   β”œβ”€β”€ routers/           # FastAPI route modules
β”‚   β”‚   β”œβ”€β”€ health.py      # Health and info endpoints
β”‚   β”‚   β”œβ”€β”€ keys_hf.py     # HF key management endpoints
β”‚   β”‚   β”œβ”€β”€ keys_vc.py     # VC key management endpoints
β”‚   β”‚   β”œβ”€β”€ proxy_hf.py    # HF proxy endpoints
β”‚   β”‚   β”œβ”€β”€ proxy_vc.py    # VC proxy endpoints
β”‚   β”‚   └── admin.py       # Admin endpoints
β”‚   └── __main__.py        # CLI entry point
β”œβ”€β”€ users.json             # User and role definitions (auto-created)
β”œβ”€β”€ hf-keys.json          # HF API keys (auto-created if missing)
β”œβ”€β”€ vc-keys.json          # VC API keys (auto-created if missing)
β”œβ”€β”€ env.example            # Example environment configuration
β”œβ”€β”€ .env                   # Environment configuration (create from env.example)
β”œβ”€β”€ pyproject.toml         # Python project configuration
β”œβ”€β”€ README.md             # Documentation
β”œβ”€β”€ LICENSE               # AGPL v3.0 license
β”œβ”€β”€ .gitignore            # Git ignore file
└── logs/                 # Log files directory (auto-created, gitignored)

Configuration

The server automatically creates key files with default structures if they don’t exist:

HF Keys (hf-keys.json):

{
  "keys": [],
  "rotation_strategy": "round_robin",
  "check_interval_days": 30
}

VC Keys (vc-keys.json):

{
  "keys": [],
  "rotation_strategy": "round_robin",
  "check_interval_days": 7
}

Note: The hf-keys.json, vc-keys.json, and .env files are excluded from version control for security reasons.

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) - see the LICENSE file for details.

The AGPL-3.0 license requires that if you run a modified version of this software on a server and provide services to users over a network, you must make the source code available to those users. This ensures that improvements to the software remain open source.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

  1. Fork the repository
  2. Clone your fork: git clone https://github.com/yourusername/ai-inferoxy.git
  3. Create a feature branch: git checkout -b feature/amazing-feature
  4. Install dependencies: uv sync
  5. Make your changes and test them
  6. Commit your changes: git commit -m 'Add some amazing feature'
  7. Push to the branch: git push origin feature/amazing-feature
  8. Open a Pull Request

Code Style

  • Follow PEP 8 for Python code style
  • Add type hints where appropriate
  • Include docstrings for functions and classes
  • Write tests for new features

Support

If you encounter any issues or have questions, please:

  1. Check the Issues page for existing solutions
  2. Create a new issue with detailed information about your problem
  3. Include your environment details and any error messages

Acknowledgments

  • Built with FastAPI for high-performance API development
  • Uses uv for fast Python package management