# AI Inference Proxy

A lightweight FastAPI server that acts as a universal proxy to multiple AI inference APIs (Hugging Face, Vercel AI Gateway), with smart API key rotation and provider-specific key management.
## Features
- Multi-Provider Support: Universal proxy for Hugging Face Inference API and Vercel AI Gateway
- Smart Key Rotation: Automatically rotates API keys to avoid rate limits
- Silent Key Switching: Switches keys silently on invalid/monthly-credits-exceeded errors
- Multi-Stage Quarantine System: Handles temporary 402 errors with progressive quarantine stages
- Provider-Specific Key Management: Add, validate, and monitor API keys for each provider
- Transparent Relay: Preserves all request/response data except auth headers
- Role-Based Access Control (RBAC): Comprehensive user management and endpoint access control
- API Key Authentication: Secure authentication using Bearer tokens
- User Management: Create, update, delete, and manage users and roles
- Audit Logging: All requests logged with user context when authenticated
- Modular Architecture: Clean separation of provider-specific logic for easy extensibility
## Quick Start

### Prerequisites

- Python 3.13+
- `uv` package manager
### Installation

- Clone the repository:

  ```bash
  git clone https://github.com/nazdridoy/ai-inferoxy.git
  cd ai-inferoxy
  ```

- Install dependencies:

  ```bash
  uv sync
  ```

- Configure (optional):

  ```bash
  cp env.example .env
  # Edit .env with your preferred settings
  ```

- Run the server:

  ```bash
  uv run ai-inferoxy
  ```

The server will start on `http://localhost:8000` and automatically create `hf-keys.json` and `vc-keys.json` if they don't exist.
### Running as a Systemd Service (Optional)

To run ai-inferoxy as a background service that starts automatically:

#### Option 1: User Service (for local development)

- Copy the service file to your user systemd directory:

  ```bash
  cp ai-inferoxy.service ~/.config/systemd/user/
  ```

  Note: The service assumes your ai-inferoxy project is located at `~/ai-inferoxy`. If your project is in a different location, edit the service file and change the `WorkingDirectory` path accordingly.

- Reload systemd and enable the service:

  ```bash
  systemctl --user daemon-reload
  systemctl --user enable ai-inferoxy
  ```

- Start the service:

  ```bash
  systemctl --user start ai-inferoxy
  ```

Useful commands:

- Check status: `systemctl --user status ai-inferoxy`
- View logs: `journalctl --user -u ai-inferoxy -f`
- Stop service: `systemctl --user stop ai-inferoxy`
- Disable service: `systemctl --user disable ai-inferoxy`
#### Option 2: System Service (for remote servers)

- Copy the system service file to the system directory:

  ```bash
  sudo cp ai-inferoxy-system.service /etc/systemd/system/ai-inferoxy.service
  ```

- Edit the service file to replace `REPLACE_WITH_YOUR_USERNAME` with your actual username:

  ```bash
  sudo nano /etc/systemd/system/ai-inferoxy.service
  ```

- Reload systemd and enable the service:

  ```bash
  sudo systemctl daemon-reload
  sudo systemctl enable ai-inferoxy
  ```

- Start the service:

  ```bash
  sudo systemctl start ai-inferoxy
  ```

Useful commands:

- Check status: `sudo systemctl status ai-inferoxy`
- View logs: `sudo journalctl -u ai-inferoxy -f`
- Stop service: `sudo systemctl stop ai-inferoxy`
- Disable service: `sudo systemctl disable ai-inferoxy`

Troubleshooting: If the service fails to start, check:

- Path issues: Ensure `uv` is in your PATH, or edit the service file to use `/usr/bin/python3 -m ai_inferoxy` instead
- Working directory: Verify the `WorkingDirectory` path in the service file matches your actual project location
- Dependencies: Make sure you've run `uv sync` to install dependencies
- Logs: Check detailed logs with `sudo journalctl -xeu ai-inferoxy.service` (system service) or `journalctl --user -xeu ai-inferoxy.service` (user service)
## Configuration

### Environment Variables

You can configure the server using environment variables or a `.env` file.

#### Option 1: Environment Variables

```bash
export PROXY_HOST=0.0.0.0
export PROXY_PORT=8000
export HF_KEYS_FILE=./hf-keys.json
export HF_BASE_URL=https://router.huggingface.co
export HF_WHOAMI_WAIT=2
export HF_WHOAMI_RETRY=3
export VC_KEYS_FILE=./vc-keys.json
export VC_BASE_URL=https://ai-gateway.vercel.sh
export VC_VALIDATE_WAIT=2
export VC_VALIDATE_RETRY=3
export MIN_CHECK_FREQUENCY=3600
export LOG_LEVEL=INFO
```

#### Option 2: .env File (Recommended)

- Copy the example environment file:

  ```bash
  cp env.example .env
  ```

- Edit `.env` with your preferred settings:

  ```ini
  # Server Configuration
  PROXY_HOST=0.0.0.0
  PROXY_PORT=8000

  # Hugging Face Configuration
  HF_KEYS_FILE=./hf-keys.json
  HF_BASE_URL=https://router.huggingface.co
  HF_WHOAMI_WAIT=2
  HF_WHOAMI_RETRY=3

  # Vercel AI Gateway Configuration
  VC_KEYS_FILE=./vc-keys.json
  VC_BASE_URL=https://ai-gateway.vercel.sh
  VC_VALIDATE_WAIT=2
  VC_VALIDATE_RETRY=3

  # Global Configuration
  MIN_CHECK_FREQUENCY=3600
  LOG_LEVEL=INFO
  ```
### Available Environment Variables

Server Configuration:

- `PROXY_HOST` - Host to bind to (default: 0.0.0.0)
- `PROXY_PORT` - Port to bind to (default: 8000)

Hugging Face Configuration:

- `HF_KEYS_FILE` - Path to the HF API keys JSON file (default: ./hf-keys.json)
- `HF_BASE_URL` - HF Inference base URL (default: https://router.huggingface.co)
- `HF_WHOAMI_WAIT` - Wait time in seconds after a rate limit (default: 2)
- `HF_WHOAMI_RETRY` - Number of retries for the whoami endpoint (default: 3)

Vercel AI Gateway Configuration:

- `VC_KEYS_FILE` - Path to the VC API keys JSON file (default: ./vc-keys.json)
- `VC_BASE_URL` - Vercel AI Gateway base URL (default: https://ai-gateway.vercel.sh)
- `VC_VALIDATE_WAIT` - Wait time in seconds for validation requests (default: 2)
- `VC_VALIDATE_RETRY` - Number of retries for validation requests (default: 3)

Global Configuration:

- `MIN_CHECK_FREQUENCY` - Minimum time between validity checks in seconds (default: 3600)
- `LOG_LEVEL` - Logging level (default: INFO)
- `HF_LOGS_DIR` - Directory for log files (default: ./logs)
- `LOG_ROTATION_WHEN` - Log rotation frequency: M=minute, H=hourly, D=daily (default: D)
- `LOG_ROTATION_INTERVAL` - Interval for rotation (default: 1)
- `LOG_BACKUP_COUNT` - Number of backup files to keep (default: 7)
- `ENABLE_RELOAD` - Enable file watching and auto-reload for development (default: false)
### Logging Configuration

The server provides comprehensive logging with configurable rotation:

Log Rotation Options:

- Daily rotation (default): `LOG_ROTATION_WHEN=D` - creates one log file per day
- Hourly rotation: `LOG_ROTATION_WHEN=H` - creates one log file per hour
- Minute rotation: `LOG_ROTATION_WHEN=M` - creates one log file per minute (for testing)
- Monthly rotation: `LOG_ROTATION_WHEN=M` with `LOG_ROTATION_INTERVAL>1` - creates one log file per month

Log Retention:

- Daily: keep 7 days of logs (default)
- Hourly: keep 168 hours (1 week) of logs
- Minute: keep 10 minutes of logs (for testing)

Log File Naming:

- Daily: `ai-inferoxy.log.2025-08-19`, `ai-inferoxy.log.2025-08-20`
- Hourly: `ai-inferoxy.log.2025-08-19_14`, `ai-inferoxy.log.2025-08-19_15`
- Minute: `ai-inferoxy.log.2025-08-19_14-30`, `ai-inferoxy.log.2025-08-19_14-31`

Development Mode:

- Set `ENABLE_RELOAD=true` for file watching and auto-reload
- Note: when `ENABLE_RELOAD=true`, consider setting `HF_LOGS_DIR` to a directory outside the project to avoid file-watching noise
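The `LOG_ROTATION_*` variables map naturally onto Python's standard `TimedRotatingFileHandler`. A minimal sketch of how such a logger could be wired up (this is not ai-inferoxy's actual logging module; the function name is hypothetical):

```python
import logging
import os
import tempfile
from logging.handlers import TimedRotatingFileHandler

def build_rotating_logger(logs_dir: str) -> logging.Logger:
    """Build a rotating file logger from the documented env vars (sketch)."""
    os.makedirs(logs_dir, exist_ok=True)
    handler = TimedRotatingFileHandler(
        os.path.join(logs_dir, "ai-inferoxy.log"),
        when=os.getenv("LOG_ROTATION_WHEN", "D"),             # D=daily, H=hourly, M=minute
        interval=int(os.getenv("LOG_ROTATION_INTERVAL", "1")),
        backupCount=int(os.getenv("LOG_BACKUP_COUNT", "7")),  # rotated files to keep
        utc=True,
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("ai-inferoxy-sketch")
    logger.setLevel(os.getenv("LOG_LEVEL", "INFO"))
    logger.addHandler(handler)
    return logger

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = build_rotating_logger(d)
        log.info("rotation configured: when=%s", log.handlers[0].when)
```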
## API Keys File Structure

Hugging Face Keys (`hf-keys.json`):

```json
{
  "keys": [
    {
      "key": "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "valid": true,
      "last_validity_check": "2025-01-15T10:30:00+00:00",
      "user_info": {
        "name": "username",
        "email": "user@example.com",
        "isPro": false,
        "canPay": false
      },
      "quarantine_stage": "none",
      "quarantine_start_date": null
    }
  ],
  "rotation_strategy": "round_robin",
  "check_interval_days": 30
}
```

Vercel AI Gateway Keys (`vc-keys.json`):

```json
{
  "keys": [
    {
      "key": "vck_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "valid": true,
      "last_validity_check": "2025-01-15T10:30:00+00:00",
      "user_info": {
        "name": "user@example.com",
        "email": "user@example.com",
        "isPro": false,
        "canPay": false
      },
      "quarantine_stage": "none",
      "quarantine_start_date": null
    }
  ],
  "rotation_strategy": "round_robin",
  "check_interval_days": 7
}
```
## API Endpoints

### Health & Info

- `GET /` - Server information
- `GET /health` - Health check
- `GET /docs` - Interactive API documentation
### Key Management (Hugging Face)

- `GET /check-validity/hf/{key}` - Check whether a specific API key is valid and update its status in the configuration if found
- `POST /add-key/hf/{key}` - Add a new API key with automatic validation (path parameter)
- `POST /add-key/hf` - Add multiple API keys with automatic validation (JSON body)
- `GET /keys/status/hf` - Get the status of all API keys
- `POST /keys/validate/hf` - Validate keys that need validation (respects the minimum check frequency)
- `POST /keys/refresh/hf` - Force-refresh all keys' validity and user info (ignores the minimum check frequency, removes duplicates)
- `POST /keys/cleanup/hf` - Remove duplicate and invalid keys from the configuration
- `POST /keys/reload/hf` - Force reload keys from file
### Key Management (Vercel AI Gateway)

- `GET /check-validity/vc/{key}` - Check whether a specific API key is valid and update its status in the configuration if found
- `POST /add-key/vc/{key}` - Add a new API key with automatic validation (path parameter, supports `key:email` format)
- `POST /add-key/vc` - Add multiple API keys with automatic validation (JSON body, supports `key:email` format)
- `GET /keys/status/vc` - Get the status of all API keys
- `POST /keys/validate/vc` - Validate keys that need validation (respects the minimum check frequency)
- `POST /keys/refresh/vc` - Force-refresh all keys' validity and user info (ignores the minimum check frequency, removes duplicates)
- `POST /keys/cleanup/vc` - Remove duplicate and invalid keys from the configuration
- `POST /keys/reload/vc` - Force reload keys from file
### Client API (Hugging Face)

- `GET /keys/provision/hf` - Provision a valid API key for client usage
- `POST /keys/report/hf` - Report token usage status (success/error) from clients (optional `client_name` to track the end user; defaults to the username)

### Client API (Vercel AI Gateway)

- `GET /keys/provision/vc` - Provision a valid API key for client usage
- `POST /keys/report/vc` - Report token usage status (success/error) from clients (optional `client_name` to track the end user; defaults to the username)
### Quarantine Management (Hugging Face)

- `GET /keys/quarantine/hf` - Get the quarantine status of all keys
- `POST /keys/quarantine/clear/hf/{key}` - Manually clear the quarantine for a specific key

### Quarantine Management (Vercel AI Gateway)

- `GET /keys/quarantine/vc` - Get the quarantine status of all keys
- `POST /keys/quarantine/clear/vc/{key}` - Manually clear the quarantine for a specific key

### Universal Proxy (Hugging Face)

- `ANY /hf/{path:path}` - Proxy to the HF Inference API (role-based access)

### Universal Proxy (Vercel AI Gateway)

- `ANY /vc/{path:path}` - Proxy to the Vercel AI Gateway (role-based access)
### Admin Endpoints (Admin Role Required)

#### User Management

- `GET /admin/users` - List all users
- `POST /admin/users` - Create a new user
- `PUT /admin/users/{username}` - Update a user
- `DELETE /admin/users/{username}` - Delete a user
- `POST /admin/users/{username}/generate-key` - Regenerate an API key

#### Role Management

- `GET /admin/roles` - List all roles
- `POST /admin/roles` - Create a new role
- `PUT /admin/roles/{role_name}` - Update a role
- `DELETE /admin/roles/{role_name}` - Delete a role
## Authentication & RBAC

The AI Inference Proxy includes a comprehensive Role-Based Access Control (RBAC) system. Most endpoints require authentication via API keys in the `Authorization` header.

For detailed RBAC documentation, Pydantic models, testing examples, and troubleshooting, see RBAC_README.md.

### Default Admin User

The system creates a default admin user on first run. You'll need this to access admin endpoints:

```json
{
  "username": "admin",
  "api_key": "ndr_YWRtaW4_abc123def456ghi789jkl012mno345pqr678stu901vwx234yz5678901234567890"
}
```

⚠️ IMPORTANT: Use this API key to access admin endpoints and create additional users. The actual API key will be different in your installation.

### Authentication Header

All protected endpoints require the `Authorization` header:

```bash
curl -H "Authorization: Bearer YOUR_API_KEY" http://localhost:8000/admin/users
```
## Usage Examples

### Quick Reference

| Category | Examples | Auth Required |
|---|---|---|
| Public | Health, Docs, Server Info | ❌ No |
| Key Management | Add, Validate, Refresh, Cleanup | ✅ Manager/Admin |
| Client API | Provision, Report | ✅ User Role |
| Universal Proxy | HF Inference API | ✅ User Role+ |
| Admin | Users, Roles Management | ✅ Admin Only |
### Public Endpoints (No Authentication Required)

#### 1. Check Server Health

```bash
curl http://localhost:8000/health
```

#### 2. View API Documentation

```bash
curl http://localhost:8000/docs
```

#### 3. Get Server Information

```bash
curl http://localhost:8000/
```
### Key Management (Manager/Admin Role Required)

#### 4. Add Single API Key (HF)

```bash
curl -X POST "http://localhost:8000/add-key/hf/hf_your_api_key_here" \
  -H "Authorization: Bearer $ADMIN_API_KEY"
```

#### 5. Add Multiple API Keys (HF)

```bash
curl -X POST "http://localhost:8000/add-key/hf" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"keys": ["hf_key_1", "hf_key_2", "hf_key_3"]}'
```
Bulk Response Format:

```json
{
  "success": true,
  "message": "Processed 3 keys: 2 successful, 1 failed",
  "results": [
    {
      "key": "hf_key_1...",
      "success": true,
      "message": "Key added successfully"
    },
    {
      "key": "hf_key_2...",
      "success": false,
      "message": "Key already exists"
    }
  ],
  "summary": {
    "total": 3,
    "successful": 2,
    "failed": 1
  }
}
```
#### 6. Check Key Validity (HF)

```bash
curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/check-validity/hf/hf_your_api_key_here"
```

#### 7. Get All Keys Status (HF)

```bash
curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/status/hf"
```

#### 8. Validate Keys (HF, Respects Rate Limits)

```bash
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/validate/hf"
```

#### 9. Force Refresh All Keys (HF)

```bash
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/refresh/hf"
```

#### 10. Clean Up Keys (HF, Remove Duplicates and Invalid Keys)

```bash
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/cleanup/hf"
```

#### 11. Reload Keys from File (HF)

```bash
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/reload/hf"
```

#### 12. Check Quarantine Status (HF)

```bash
curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/quarantine/hf"
```

#### 13. Clear Quarantine for a Key (HF)

```bash
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/keys/quarantine/clear/hf/hf_your_api_key_here"
```
### Client API (User Role Required)

#### 14. Get API Key for Client Usage (HF)

```bash
curl -H "Authorization: Bearer $USER_API_KEY" \
  "http://localhost:8000/keys/provision/hf"
```

Response Example:

```json
{
  "token": "hf_your_api_key_here",
  "token_id": "hf_your_ap",
  "expires_in": 3600
}
```

#### 15. Report Token Usage Status (HF)

```bash
# Report successful usage (optionally include client_name/end-user identifier):
curl -X POST -H "Authorization: Bearer $USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"token_id": "hf_your_ap", "status": "success", "client_name": "acme_customer_42"}' \
  "http://localhost:8000/keys/report/hf"

# Report an error:
curl -X POST -H "Authorization: Bearer $USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"token_id": "hf_your_ap", "status": "error", "error": "402 Client Error: Payment Required", "client_name": "acme_customer_42"}' \
  "http://localhost:8000/keys/report/hf"
```
### Universal Proxy (Role-Based Access)

#### 16. Use as Proxy for HF Inference API

```bash
# Instead of calling HF directly:
curl -X POST "https://router.huggingface.co/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "model": "zai-org/GLM-4.5:novita"}'

# Call through the proxy (requires user role or higher):
curl -X POST "http://localhost:8000/hf/v1/chat/completions" \
  -H "Authorization: Bearer $USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "model": "zai-org/GLM-4.5:novita"}'
```

#### 17. Use as Proxy for Vercel AI Gateway

```bash
# Instead of calling Vercel AI Gateway directly:
curl -X POST "https://ai-gateway.vercel.sh/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "model": "openai/gpt-4o-mini"}'

# Call through the proxy (requires user role or higher):
curl -X POST "http://localhost:8000/vc/v1/chat/completions" \
  -H "Authorization: Bearer $USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "model": "openai/gpt-4o-mini"}'
```

#### 18. Add Vercel AI Gateway Keys (with Email Support)

```bash
# Add a single key with an email:
curl -X POST "http://localhost:8000/add-key/vc/vck_ABC123:user@example.com" \
  -H "Authorization: Bearer $ADMIN_API_KEY"

# Add multiple keys with emails:
curl -X POST "http://localhost:8000/add-key/vc" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "keys": [
      "vck_ABC123:alice@example.com",
      "vck_DEF456:bob@example.com"
    ]
  }'
```
### Bulk Key Operations

#### 19. Add Multiple Keys from File

If you have your keys in a text file (one key per line), you can add them all at once.

File format (`keys.txt`):

```
hf_key_1
hf_key_2
hf_key_3
```

Using jq (recommended):

```bash
curl -X POST "http://localhost:8000/add-key/hf" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"keys\": $(cat keys.txt | jq -R -s -c 'split("\n") | map(select(length > 0))')}"
```

Alternative method (without jq):

```bash
keys=$(cat keys.txt | tr '\n' ',' | sed 's/,$//' | sed 's/^/"/;s/,/","/g;s/$/"/')
curl -X POST "http://localhost:8000/add-key/hf" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"keys\": [$keys]}"
```

Security note: remember to add `keys.txt` to your `.gitignore` to prevent accidentally committing your API keys to version control.
### Admin Endpoints Examples

#### 20. User Management (Admin Role Required)

List All Users:

```bash
curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/admin/users"
```

Create New User:

```bash
curl -X POST "http://localhost:8000/admin/users" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "newuser",
    "email": "user@example.com",
    "full_name": "New User",
    "role": "user"
  }'
```

Response Example:

```json
{
  "success": true,
  "message": "User newuser created successfully",
  "username": "newuser",
  "api_key": "generated_api_key_here..."
}
```

Update User:

```bash
curl -X PUT "http://localhost:8000/admin/users/newuser" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "updated@example.com",
    "full_name": "Updated User Name"
  }'
```

Delete User:

```bash
curl -X DELETE "http://localhost:8000/admin/users/newuser" \
  -H "Authorization: Bearer $ADMIN_API_KEY"
```

Regenerate API Key:

```bash
curl -X POST "http://localhost:8000/admin/users/newuser/generate-key" \
  -H "Authorization: Bearer $ADMIN_API_KEY"
```

#### 21. Role Management (Admin Role Required)

List All Roles:

```bash
curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "http://localhost:8000/admin/roles"
```

Create New Role:

```bash
curl -X POST "http://localhost:8000/admin/roles" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "customrole",
    "description": "Custom role for specific access",
    "endpoints": ["/health", "/docs", "/custom/*"],
    "is_default": false
  }'
```

Update Role:

```bash
curl -X PUT "http://localhost:8000/admin/roles/customrole" \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Updated role description",
    "endpoints": ["/health", "/docs", "/custom/*", "/new/*"]
  }'
```

Delete Role:

```bash
curl -X DELETE "http://localhost:8000/admin/roles/customrole" \
  -H "Authorization: Bearer $ADMIN_API_KEY"
```
## Roles and Permissions

The RBAC system defines four user roles with different access levels:

### Admin Role

- Access: All endpoints (`*`)
- Description: Full system access
- Use Case: System administrators

### Manager Role

- Access: Key management + universal proxy
- Endpoints:
  - `/check-validity/*`
  - `/add-key/*`
  - `/keys/*`
  - `/hf/{path:path}`, `/vc/{path:path}` (universal proxy)
- Use Case: Key managers and system operators

### User Role

- Access: Standard user access
- Endpoints:
  - `/health`, `/docs`, `/`
  - `/hf/{path:path}`, `/vc/{path:path}` (universal proxy)
  - `/keys/provision/hf`, `/keys/report/hf`
  - `/keys/provision/vc`, `/keys/report/vc`
- Use Case: Regular users accessing AI inference APIs

### Guest Role (Default)

- Access: Public endpoints only
- Endpoints: `/health`, `/docs`, `/`
- Use Case: Basic access with authentication tracking
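Conceptually, checks like these reduce to matching a request path against a role's list of endpoint patterns. A minimal sketch using `fnmatch`, assuming the role lists above (this is not ai-inferoxy's actual `rbac_manager`, and the `{path:path}` routes are approximated with trailing wildcards):

```python
from fnmatch import fnmatch

# Role -> allowed endpoint patterns, following the lists above.
# "*" grants everything; this mapping is illustrative only.
ROLE_ENDPOINTS = {
    "admin": ["*"],
    "manager": ["/check-validity/*", "/add-key/*", "/keys/*", "/hf/*", "/vc/*"],
    "user": ["/health", "/docs", "/", "/hf/*", "/vc/*",
             "/keys/provision/*", "/keys/report/*"],
    "guest": ["/health", "/docs", "/"],
}

def is_allowed(role: str, path: str) -> bool:
    """Return True if any of the role's patterns matches the request path."""
    return any(fnmatch(path, pat) for pat in ROLE_ENDPOINTS.get(role, []))
```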
## Security Features
- API Key Generation: 64+ character cryptographically secure random strings
- Role-Based Access: Endpoint access controlled by user roles
- Universal Proxy Protection: Universal proxy endpoint respects role permissions
- Audit Logging: All requests logged with user context
- No Password Storage: Only API keys, no password management
- Input Validation: Comprehensive validation of all request data using Pydantic models
- JSON-Based API: Modern REST API design with proper content-type validation
## How It Works

- Server Startup: Loads keys from provider-specific files without validation (to avoid rate limits)
- Request Reception: The server receives requests to any AI inference endpoint (HF or VC)
- Provider Detection: Routes requests to the appropriate provider based on the URL prefix (`/hf/` or `/vc/`)
- Key Selection: A smart algorithm selects a valid, non-quarantined API key for the provider
- Request Forwarding: Forwards the request to the appropriate upstream API with the selected key
- Response Handling:
  - If successful: returns the response as-is
  - If key error: silently switches to the next valid key and retries
  - If 402 error: handles it with the quarantine system and switches to the next key
  - If no valid keys: passes the error transparently to the user
- Transparent Relay: All responses (success, error, streaming) are relayed unchanged
- Rate Limiting: Respects provider rate limits with configurable wait times and retries
- Quarantine System: Handles temporary 402 errors with progressive quarantine stages
- Provider-Specific Logic: Each provider has its own validation, error handling, and key management
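The response-handling steps above can be sketched as a small control-flow loop. The `send_upstream`, `invalidate`, and `quarantine` callables are hypothetical stand-ins for the proxy's HTTP client and key manager; only the switching logic mirrors the description:

```python
from typing import Callable

def forward_with_rotation(
    keys: list[str],
    send_upstream: Callable[[str], tuple[int, str]],
    invalidate: Callable[[str], None],
    quarantine: Callable[[str], None],
) -> tuple[int, str]:
    """Try each key in turn, switching silently on key errors (sketch)."""
    last = (503, "no valid keys available")
    for key in keys:
        status, body = send_upstream(key)
        if status == 401:          # invalid key: mark it and switch silently
            invalidate(key)
            last = (status, body)
        elif status == 402:        # credits exceeded: quarantine and switch
            quarantine(key)
            last = (status, body)
        else:                      # success or unrelated error: relay as-is
            return status, body
    return last                    # keys exhausted: pass the error through
```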
## Error Handling

The proxy detects specific error responses and handles them automatically for each provider:

### Hugging Face Error Handling

- Invalid Key (401 Unauthorized):
  - Detects both HTTP 401 status codes and JSON error responses
  - Automatically marks the key as invalid in the configuration
  - Switches to the next valid key and retries
- 402 Payment Required:

  ```json
  {"error":"You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly included credits."}
  ```

  Handled by the quarantine system; switches to the next key.

### Vercel AI Gateway Error Handling

- Invalid Key (401 Authentication Error):
  - Detects authentication errors from the VC gateway
  - Automatically marks the key as invalid in the configuration
  - Switches to the next valid key and retries
- 402 Insufficient Funds:

  ```json
  {"error":{"message":"Insufficient funds. Please add credits to your account to continue using AI services.","type":"insufficient_funds"}}
  ```

  Handled by the quarantine system; switches to the next key.
- 404 Model Not Found: Treated as a valid key (expected for validation requests)

When these errors occur, the proxy handles them with the quarantine system and switches to the next valid key.
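A sketch of how the two error bodies quoted above could be classified into actions. The function and the returned action/reason labels are illustrative; ai-inferoxy's actual handling lives in its provider-specific proxy modules:

```python
import json

def classify_upstream(provider: str, status: int, body: str) -> tuple[str, str]:
    """Map an upstream response to (action, reason), per the rules above."""
    err = None
    try:
        payload = json.loads(body)
        if isinstance(payload, dict):
            err = payload.get("error")  # str on HF, dict on VC
    except (json.JSONDecodeError, TypeError):
        pass

    if status == 401:
        return ("invalidate", "invalid key")
    if status == 402:
        if provider == "hf" and isinstance(err, str) and "monthly included credits" in err:
            return ("quarantine", "monthly credits exceeded")
        if provider == "vc" and isinstance(err, dict) and err.get("type") == "insufficient_funds":
            return ("quarantine", "insufficient funds")
        return ("quarantine", "payment required")
    # Everything else (including VC 404 "model not found") is relayed as-is.
    return ("relay", "pass through")
```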
## Rate Limiting

The server handles rate limiting gracefully for each provider:

### Hugging Face Rate Limiting

- IP Rate Limiting: When your IP is rate limited (429), the system stops processing all HF keys
- Retry Logic: Configurable retry attempts (`HF_WHOAMI_RETRY`, default: 3)
- Wait Time: Configurable wait time between retries (`HF_WHOAMI_WAIT`, default: 2s)
- Graceful Shutdown: When rate limited after max retries, stops processing all HF keys
- Resume Capability: Uses `last_validity_check` timestamps to resume validation later

### Vercel AI Gateway Rate Limiting

- IP Rate Limiting: When your IP is rate limited (429), the system stops processing all VC keys
- Retry Logic: Configurable retry attempts (`VC_VALIDATE_RETRY`, default: 3)
- Wait Time: Configurable wait time between retries (`VC_VALIDATE_WAIT`, default: 2s)
- Graceful Shutdown: When rate limited after max retries, stops processing all VC keys
- Resume Capability: Uses `last_validity_check` timestamps to resume validation later

Note: Keys themselves don't have rate limits - only your IP does. When rate limited, all keys for that provider are affected equally.
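The retry/wait behavior described for both providers boils down to a bounded retry loop. A minimal sketch (the function name is hypothetical; the `sleep` parameter is injected so the loop can be tested without actually waiting):

```python
import time
from typing import Callable

def validate_with_retries(
    check: Callable[[], int],
    retries: int = 3,          # maps to HF_WHOAMI_RETRY / VC_VALIDATE_RETRY
    wait_s: float = 2.0,       # maps to HF_WHOAMI_WAIT / VC_VALIDATE_WAIT
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a validation call while the IP is rate limited (HTTP 429).

    Returns True once the check stops returning 429, or False if still rate
    limited after `retries` attempts -- at which point the caller would stop
    processing all keys for that provider and resume later using the
    last_validity_check timestamps. Illustrative sketch only.
    """
    for attempt in range(retries):
        if check() != 429:
            return True
        if attempt < retries - 1:
            sleep(wait_s)
    return False
```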
## Quarantine System

The proxy implements a multi-stage quarantine system to handle temporary 402 errors gracefully for both providers. This addresses the issue where providers sometimes return false 402 errors that resolve themselves after a few minutes to hours.

### Quarantine Stages

- Stage 1 (Soft Disable): 30 minutes
  - Triggered on the first 402 error
  - The key is temporarily unavailable for 30 minutes
- Stage 2 (Hour Block): 1 hour
  - Triggered if Stage 1 expires and the key still returns 402
  - The key is unavailable for 1 hour
- Stage 3 (Day Block): 1 day
  - Triggered if Stage 2 expires and the key still returns 402
  - The key is unavailable for 1 day
- Stage 4 (Week Block): 1 week
  - Triggered if Stage 3 expires and the key still returns 402
  - The key is unavailable for 1 week
- Stage 5 (Extended): 30 days
  - Triggered if Stage 4 expires and the key still returns 402
  - The key is quarantined for 30 days, then becomes available again

### Quarantine Logic

- Progressive Advancement: Keys advance through stages when a quarantine expires and a new 402 error occurs
- Time-Based Stages: Each stage has a fixed duration, and advancement happens when that duration expires
- Stage 5 Reset: After Stage 5 expires, if a 402 error occurs, the quarantine restarts from Stage 1 (prevents long waiting periods)
- Automatic Clearing: When a key successfully processes a request, its quarantine is automatically cleared
- Manual Clearing: Admins can manually clear a quarantine via the API endpoint
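The stage progression above can be expressed as two small pure functions. This is an illustrative sketch of the documented rules, not the project's actual implementation (the stage labels follow the `stage_1` format shown in the status response below):

```python
from datetime import datetime, timedelta, timezone

# Fixed per-stage durations, matching the list above.
STAGE_DURATIONS = {
    "stage_1": timedelta(minutes=30),
    "stage_2": timedelta(hours=1),
    "stage_3": timedelta(days=1),
    "stage_4": timedelta(weeks=1),
    "stage_5": timedelta(days=30),
}
STAGE_ORDER = ["stage_1", "stage_2", "stage_3", "stage_4", "stage_5"]

def is_active(stage: str, start: datetime, now: datetime) -> bool:
    """A quarantine is active until its stage's duration has elapsed."""
    return stage in STAGE_DURATIONS and now - start < STAGE_DURATIONS[stage]

def next_stage_on_402(stage: str, start: datetime, now: datetime) -> str:
    """Compute the stage after a new 402 error, per the rules above."""
    if stage == "none":
        return "stage_1"                # first 402: soft disable
    if is_active(stage, start, now):
        return stage                    # still quarantined: no advancement
    if stage == "stage_5":
        return "stage_1"                # Stage 5 reset after expiry
    return STAGE_ORDER[STAGE_ORDER.index(stage) + 1]
```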
### Quarantine Status Response

```json
{
  "quarantine": [
    {
      "key": "hf_key_123...",
      "stage": "stage_1",
      "start_date": "2025-01-15T10:30:00+00:00",
      "is_active": true,
      "remaining_seconds": 1200
    }
  ]
}
```

### Benefits

- Prevents False Positives: Keys aren't permanently disabled on temporary 402 errors
- Progressive Response: Longer quarantine periods for persistently problematic keys, with automatic recovery
- Automatic Recovery: Keys automatically return to service when they work again
- Transparent Operation: No impact on normal request processing
- Provider Agnostic: Works consistently across HF and VC providers
## Development

### Running in Development Mode

```bash
uv run ai-inferoxy
```

### Viewing Logs

The server provides comprehensive logging with both console and file output:

Console Logging:

- Real-time logs displayed in the terminal
- Useful for development and debugging

File Logging:

- Automatic log rotation based on configuration
- Logs stored in the `./logs/` directory (configurable via `HF_LOGS_DIR`)
- Daily rotation by default (configurable to hourly, minute, or monthly)

Log Content Includes:

- Incoming requests
- Key selection and rotation
- Proxy forwarding details
- Response status codes
- Error responses and retry attempts
- Rate limiting events
- Monthly credits exceeded events
- Log rotation events
## Project Structure

```
ai-inferoxy/
├── ai_inferoxy/              # Main package directory
│   ├── __init__.py           # Package initialization
│   ├── server.py             # FastAPI application entry point
│   ├── schemas.py            # Pydantic models and request/response schemas
│   ├── app_state.py          # Application state management
│   ├── user_manager.py       # User and role management (RBAC)
│   ├── rbac_manager.py       # Role-based access control logic
│   ├── auth_middleware.py    # Authentication and authorization middleware
│   ├── key_manager/          # Modular key management
│   │   ├── __init__.py       # Package initialization
│   │   ├── common.py         # Common key manager base class
│   │   ├── hf.py             # Hugging Face key manager
│   │   ├── vc.py             # Vercel AI Gateway key manager
│   │   └── cf.py             # Cloudflare key manager (stub)
│   ├── proxy/                # Modular proxy implementation
│   │   ├── __init__.py       # Package initialization
│   │   ├── common.py         # Common proxy base class
│   │   ├── hf.py             # Hugging Face proxy
│   │   ├── vc.py             # Vercel AI Gateway proxy
│   │   └── cf.py             # Cloudflare proxy (stub)
│   ├── routers/              # FastAPI route modules
│   │   ├── health.py         # Health and info endpoints
│   │   ├── keys_hf.py        # HF key management endpoints
│   │   ├── keys_vc.py        # VC key management endpoints
│   │   ├── proxy_hf.py       # HF proxy endpoints
│   │   ├── proxy_vc.py       # VC proxy endpoints
│   │   └── admin.py          # Admin endpoints
│   └── __main__.py           # CLI entry point
├── users.json                # User and role definitions (auto-created)
├── hf-keys.json              # HF API keys (auto-created if missing)
├── vc-keys.json              # VC API keys (auto-created if missing)
├── env.example               # Example environment configuration
├── .env                      # Environment configuration (create from env.example)
├── pyproject.toml            # Python project configuration
├── README.md                 # Documentation
├── LICENSE                   # AGPL v3.0 license
├── .gitignore                # Git ignore file
└── logs/                     # Log files directory (auto-created, gitignored)
```
## Default Key Files

The server automatically creates key files with default structures if they don't exist:

HF Keys (`hf-keys.json`):

```json
{
  "keys": [],
  "rotation_strategy": "round_robin",
  "check_interval_days": 30
}
```

VC Keys (`vc-keys.json`):

```json
{
  "keys": [],
  "rotation_strategy": "round_robin",
  "check_interval_days": 7
}
```

Note: The `hf-keys.json`, `vc-keys.json`, and `.env` files are excluded from version control for security reasons.
## License
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) - see the LICENSE file for details.
The AGPL-3.0 license requires that if you run a modified version of this software on a server and provide services to users over a network, you must make the source code available to those users. This ensures that improvements to the software remain open source.
## Contributing

Contributions are welcome! Please feel free to submit a pull request. For major changes, please open an issue first to discuss what you would like to change.

### Development Setup

- Fork the repository
- Clone your fork: `git clone https://github.com/yourusername/ai-inferoxy.git`
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Install dependencies: `uv sync`
- Make your changes and test them
- Commit your changes: `git commit -m 'Add some amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a pull request

### Code Style

- Follow PEP 8 for Python code style
- Add type hints where appropriate
- Include docstrings for functions and classes
- Write tests for new features
## Support

If you encounter any issues or have questions, please:

- Check the Issues page for existing solutions
- Create a new issue with detailed information about your problem
- Include your environment details and any error messages