HF Inference Proxy
A lightweight FastAPI server that acts as a universal proxy to the Hugging Face Inference API, with smart API key rotation.
Features
- Universal Proxy: Transparently forwards all requests to HF Inference API endpoints
- Smart Key Rotation: Automatically rotates API keys to avoid rate limits
- Silent Key Switching: Switches keys silently on invalid/monthly-credits-exceeded errors
- Multi-Stage Quarantine System: Handles temporary 402 errors with progressive quarantine stages
- Key Management: Add, validate, and monitor API keys dynamically
- Transparent Relay: Preserves all request/response data except auth headers
- Role-Based Access Control (RBAC): Comprehensive user management and endpoint access control
- API Key Authentication: Secure authentication using Bearer tokens
- User Management: Create, update, delete, and manage users and roles
- Audit Logging: All requests logged with user context when authenticated
Quick Start
Prerequisites
- Python 3.13+
- uv package manager
Installation
- Clone the repository:
git clone https://github.com/nazdridoy/hf-inferoxy.git
cd hf-inferoxy
- Install dependencies:
uv sync
- Configure (Optional):
cp env.example .env # Edit .env with your preferred settings
- Run the server:
uv run hf-inferoxy
The server will start on http://localhost:8000 and automatically create hf-keys.json if it doesn't exist.
Running as a Systemd Service (Optional)
To run hf-inferoxy as a background service that starts automatically:
Option 1: User Service (for local development)
- Copy the service file to your user systemd directory:
cp hf-inferoxy.service ~/.config/systemd/user/
Note: The service assumes your hf-inferoxy project is located at ~/hf-inferoxy. If your project is in a different location, edit the service file and change the WorkingDirectory path accordingly.
- Reload systemd and enable the service:
systemctl --user daemon-reload
systemctl --user enable hf-inferoxy
- Start the service:
systemctl --user start hf-inferoxy
Useful commands:
- Check status:
systemctl --user status hf-inferoxy
- View logs:
journalctl --user -u hf-inferoxy -f
- Stop service:
systemctl --user stop hf-inferoxy
- Disable service:
systemctl --user disable hf-inferoxy
Option 2: System Service (for remote servers)
- Copy the system service file to the system directory:
sudo cp hf-inferoxy-system.service /etc/systemd/system/hf-inferoxy.service
- Edit the service file to replace REPLACE_WITH_YOUR_USERNAME with your actual username:
sudo nano /etc/systemd/system/hf-inferoxy.service
- Reload systemd and enable the service:
sudo systemctl daemon-reload
sudo systemctl enable hf-inferoxy
- Start the service:
sudo systemctl start hf-inferoxy
Useful commands:
- Check status:
sudo systemctl status hf-inferoxy
- View logs:
sudo journalctl -u hf-inferoxy -f
- Stop service:
sudo systemctl stop hf-inferoxy
- Disable service:
sudo systemctl disable hf-inferoxy
Troubleshooting: If the service fails to start, check:
- Path issues: Ensure uv is in your PATH, or edit the service file to use /usr/bin/python3 -m hf_inferoxy instead
- Working directory: Verify that the WorkingDirectory path in the service file matches your actual project location
- Dependencies: Make sure you've run uv sync to install dependencies
- Logs: Check detailed logs with sudo journalctl -xeu hf-inferoxy.service (system service) or journalctl --user -xeu hf-inferoxy.service (user service)
Configuration
Environment Variables
You can configure the server using environment variables or a .env file.
Option 1: Environment Variables
export HF_PROXY_HOST=0.0.0.0
export HF_PROXY_PORT=8000
export HF_KEYS_FILE=./hf-keys.json
export HF_BASE_URL=https://router.huggingface.co
export HF_WHOAMI_WAIT=2
export HF_WHOAMI_RETRY=3
export HF_MIN_CHECK_FREQUENCY=3600
export LOG_LEVEL=INFO
Option 2: .env File (Recommended)
- Copy the example environment file:
cp env.example .env
- Edit .env with your preferred settings:
# Server Configuration
HF_PROXY_HOST=0.0.0.0
HF_PROXY_PORT=8000
# Keys Configuration
HF_KEYS_FILE=./hf-keys.json
HF_BASE_URL=https://router.huggingface.co
# Rate Limiting Configuration
HF_WHOAMI_WAIT=2
HF_WHOAMI_RETRY=3
HF_MIN_CHECK_FREQUENCY=3600
# Logging
LOG_LEVEL=INFO
Available Environment Variables
- HF_PROXY_HOST - Host to bind to (default: 0.0.0.0)
- HF_PROXY_PORT - Port to bind to (default: 8000)
- HF_KEYS_FILE - Path to API keys JSON file (default: ./hf-keys.json)
- HF_BASE_URL - HF Inference base URL (default: https://router.huggingface.co)
- HF_WHOAMI_WAIT - Wait time in seconds after rate limit (default: 2)
- HF_WHOAMI_RETRY - Number of retries for whoami endpoint (default: 3)
- HF_MIN_CHECK_FREQUENCY - Minimum time between validity checks in seconds (default: 3600)
- LOG_LEVEL - Logging level (default: INFO)
- HF_LOGS_DIR - Directory for log files (default: ./logs)
- LOG_ROTATION_WHEN - Log rotation frequency: M=minute, H=hourly, D=daily, M=monthly (default: D)
- LOG_ROTATION_INTERVAL - Interval for rotation (default: 1)
- LOG_BACKUP_COUNT - Number of backup files to keep (default: 7)
- ENABLE_RELOAD - Enable file watching and auto-reload for development (default: false)
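As a rough illustration of how these variables and their documented defaults could be read in Python, here is a minimal sketch. The `ProxySettings` class and `load_settings` function are hypothetical names chosen for this example; the project's actual configuration code is not shown in this README and may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxySettings:
    """Hypothetical container for a subset of the documented variables."""
    host: str
    port: int
    keys_file: str
    base_url: str
    whoami_wait: float
    whoami_retry: int
    min_check_frequency: int
    log_level: str


def load_settings(env=None) -> ProxySettings:
    """Read settings from a mapping (os.environ by default), falling back
    to the defaults listed in the README."""
    env = os.environ if env is None else env
    return ProxySettings(
        host=env.get("HF_PROXY_HOST", "0.0.0.0"),
        port=int(env.get("HF_PROXY_PORT", "8000")),
        keys_file=env.get("HF_KEYS_FILE", "./hf-keys.json"),
        base_url=env.get("HF_BASE_URL", "https://router.huggingface.co"),
        whoami_wait=float(env.get("HF_WHOAMI_WAIT", "2")),
        whoami_retry=int(env.get("HF_WHOAMI_RETRY", "3")),
        min_check_frequency=int(env.get("HF_MIN_CHECK_FREQUENCY", "3600")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to test, for example `load_settings({"HF_PROXY_PORT": "9000"})`.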
Logging Configuration
The server provides comprehensive logging with configurable rotation:
Log Rotation Options:
- Daily Rotation (default): LOG_ROTATION_WHEN=D - Creates one log file per day
- Hourly Rotation: LOG_ROTATION_WHEN=H - Creates one log file per hour
- Minute Rotation: LOG_ROTATION_WHEN=M - Creates one log file per minute (for testing)
- Monthly Rotation: LOG_ROTATION_WHEN=M with LOG_ROTATION_INTERVAL>1 - Creates one log file per month
Log Retention:
- Daily: Keep 7 days of logs (default)
- Hourly: Keep 168 hours (1 week) of logs
- Minute: Keep 10 minutes of logs (for testing)
Log File Naming:
- Daily: hf-inferoxy.log.2025-08-19, hf-inferoxy.log.2025-08-20
- Hourly: hf-inferoxy.log.2025-08-19_14, hf-inferoxy.log.2025-08-19_15
- Minute: hf-inferoxy.log.2025-08-19_14-30, hf-inferoxy.log.2025-08-19_14-31
Development Mode:
- Set ENABLE_RELOAD=true for file watching and auto-reload
- Note: When ENABLE_RELOAD=true, consider setting HF_LOGS_DIR to a directory outside the project to avoid file watching spam
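The file names listed above match the suffixes produced by Python's standard `logging.handlers.TimedRotatingFileHandler`, so the rotation settings can be sketched with it. This is an assumption about the mechanism, not a confirmed description of the project's code, and `build_rotating_handler` is a hypothetical helper. Note that in the standard library handler `when="M"` means minutes and there is no built-in monthly unit, so how the project maps monthly rotation is not shown here.

```python
import logging
import os
import tempfile
from logging.handlers import TimedRotatingFileHandler


def build_rotating_handler(logs_dir, when="D", interval=1, backup_count=7):
    """Create a time-based rotating file handler (hypothetical helper).

    The arguments mirror LOG_ROTATION_WHEN, LOG_ROTATION_INTERVAL and
    LOG_BACKUP_COUNT; delay=True postpones opening the file until the
    first record is written.
    """
    os.makedirs(logs_dir, exist_ok=True)
    handler = TimedRotatingFileHandler(
        os.path.join(logs_dir, "hf-inferoxy.log"),
        when=when,
        interval=interval,
        backupCount=backup_count,
        encoding="utf-8",
        delay=True,
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    return handler


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # The suffix is the strftime pattern appended to rotated files.
        for unit in ("D", "H", "M"):
            print(unit, build_rotating_handler(tmp, when=unit).suffix)
```

The handler can be attached with `logging.getLogger().addHandler(...)`; old files beyond `backup_count` are removed by the handler itself on rollover.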
API Keys File Structure
{
"keys": [
{
"key": "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"valid": true,
"last_validity_check": "2025-01-15T10:30:00+00:00",
"user_info": {
"name": "username",
"email": "user@example.com",
"isPro": false,
"canPay": false
},
"quarantine_stage": "none",
"quarantine_start_date": null
}
],
"rotation_strategy": "round_robin",
"check_interval_days": 30
}
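To make the structure concrete, here is a small sketch that reads a configuration of this shape and cycles through usable keys in round-robin order. The function names are hypothetical, and treating any key whose `quarantine_stage` is not `"none"` as unusable is a simplifying assumption (the real server presumably also checks whether the quarantine has expired).

```python
import itertools
import json


def load_config(path):
    """Load the keys file from disk."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def usable_keys(config):
    """Return key strings that are marked valid and not quarantined.

    Assumption: a quarantine_stage other than "none" excludes the key.
    """
    return [
        entry["key"]
        for entry in config.get("keys", [])
        if entry.get("valid") and entry.get("quarantine_stage", "none") == "none"
    ]


def round_robin(keys):
    """Yield keys in order, wrapping around forever."""
    return itertools.cycle(keys)


# Illustrative data in the documented shape (placeholder key values).
SAMPLE = {
    "keys": [
        {"key": "hf_a", "valid": True, "quarantine_stage": "none"},
        {"key": "hf_b", "valid": False, "quarantine_stage": "none"},
        {"key": "hf_c", "valid": True, "quarantine_stage": "none"},
        {"key": "hf_d", "valid": True, "quarantine_stage": "stage_1"},
    ],
    "rotation_strategy": "round_robin",
    "check_interval_days": 30,
}
```

With the sample above, only `hf_a` and `hf_c` are eligible, and successive `next()` calls on `round_robin(usable_keys(SAMPLE))` alternate between them.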
API Endpoints
Health & Info
- GET / - Server information
- GET /health - Health check
- GET /docs - Interactive API documentation
Key Management
- GET /check-validity/{key} - Check whether a specific API key is valid and, if the key is found, update its status in the configuration
- POST /add-key/{key} - Add new API key with automatic validation (path parameter)
- POST /add-key - Add multiple API keys with automatic validation (JSON body)
- GET /keys/status - Get status of all API keys
- POST /keys/validate - Validate keys that need validation (respects minimum check frequency)
- POST /keys/refresh - Force refresh all keys validity and user info (ignores minimum check frequency, removes duplicates)
- POST /keys/cleanup - Remove duplicate keys and invalid keys from configuration
- POST /keys/reload - Force reload keys from file
Client API
- GET /keys/provision - Provision a valid API key for client usage
- POST /keys/report - Report token usage status (success/error) from clients (optional client_name to track end user; defaults to username)
Quarantine Management
- GET /keys/quarantine - Get quarantine status for all keys
- POST /keys/quarantine/clear/{key} - Manually clear quarantine for a specific key
Universal Proxy
- ANY /{path:path} - Proxy to HF Inference API (role-based access)
Admin Endpoints (Admin Role Required)
User Management
- GET /admin/users - List all users
- POST /admin/users - Create new user
- PUT /admin/users/{username} - Update user
- DELETE /admin/users/{username} - Delete user
- POST /admin/users/{username}/generate-key - Regenerate API key
Role Management
- GET /admin/roles - List all roles
- POST /admin/roles - Create new role
- PUT /admin/roles/{role_name} - Update role
- DELETE /admin/roles/{role_name} - Delete role
Authentication & RBAC
The HF Inference Proxy now includes a comprehensive Role-Based Access Control (RBAC) system. Most endpoints require authentication via API keys in the Authorization header.
For detailed RBAC documentation, Pydantic models, testing examples, and troubleshooting, see RBAC_README.md.
Default Admin User
The system creates a default admin user on first run. You'll need this to access admin endpoints:
{
"username": "admin",
"api_key": "ndr_YWRtaW4_abc123def456ghi789jkl012mno345pqr678stu901vwx234yz5678901234567890"
}
IMPORTANT: Use this API key to access admin endpoints and create additional users. The actual API key will be different in your installation.
Authentication Header
All protected endpoints require the Authorization header:
curl -H "Authorization: Bearer YOUR_API_KEY" http://localhost:8000/admin/users
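The same header can be set from Python using only the standard library. The helper below is a hypothetical sketch that builds the request without sending it; the base URL and key are placeholders.

```python
import urllib.request


def authed_request(base_url, path, api_key, method="GET"):
    """Build a request carrying the Bearer token (hypothetical helper).

    Nothing is sent until the request is passed to urllib.request.urlopen.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

To actually call the server, pass the result to `urllib.request.urlopen(...)` and read the response body, for instance `urllib.request.urlopen(authed_request("http://localhost:8000", "/admin/users", key)).read()`.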
Usage Examples
Quick Reference
Category | Examples | Auth Required |
---|---|---|
Public | Health, Docs, Server Info | No |
Key Management | Add, Validate, Refresh, Cleanup | Yes (Manager/Admin) |
Client API | Provision, Report | Yes (User role) |
Universal Proxy | HF Inference API | Yes (User role or higher) |
Admin | Users, Roles Management | Yes (Admin only) |
Public Endpoints (No Authentication Required)
1. Check Server Health
curl http://localhost:8000/health
2. View API Documentation
curl http://localhost:8000/docs
3. Get Server Information
curl http://localhost:8000/
Key Management (Manager/Admin Role Required)
4. Add Single API Key
curl -X POST "http://localhost:8000/add-key/hf_your_api_key_here" \
-H "Authorization: Bearer $ADMIN_API_KEY"
5. Add Multiple API Keys
curl -X POST "http://localhost:8000/add-key" \
-H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"keys": ["hf_key_1", "hf_key_2", "hf_key_3"]}'
Bulk Response Format:
{
"success": true,
"message": "Processed 3 keys: 2 successful, 1 failed",
"results": [
{
"key": "hf_key_1...",
"success": true,
"message": "Key added successfully"
},
{
"key": "hf_key_2...",
"success": false,
"message": "Key already exists"
}
],
"summary": {
"total": 3,
"successful": 2,
"failed": 1
}
}
6. Check Key Validity
curl -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/check-validity/hf_your_api_key_here"
7. Get All Keys Status
curl -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/keys/status"
8. Validate Keys (Respects Rate Limits)
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/keys/validate"
9. Force Refresh All Keys
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/keys/refresh"
10. Clean Up Keys (Remove Duplicates and Invalid Keys)
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/keys/cleanup"
11. Reload Keys from File
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/keys/reload"
12. Check Quarantine Status
curl -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/keys/quarantine"
13. Clear Quarantine for a Key
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/keys/quarantine/clear/hf_your_api_key_here"
Client API (User Role Required)
14. Get API Key for Client Usage
curl -H "Authorization: Bearer $USER_API_KEY" \
"http://localhost:8000/keys/provision"
Response Example:
{
"token": "hf_your_api_key_here",
"token_id": "hf_your_ap",
"expires_in": 3600
}
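A client might unpack this response and later build the JSON body for POST /keys/report (listed under Client API) from the same `token_id`. The sketch below shows only the data handling; both function names are hypothetical, and the set of accepted `status` values is taken from the "success/error" wording in the endpoint description.

```python
import json


def parse_provision(body):
    """Extract (token, token_id, expires_in) from a provision response body."""
    data = json.loads(body)
    return data["token"], data["token_id"], int(data["expires_in"])


def report_payload(token_id, status, error=None, client_name=None):
    """Build the dict to send as JSON to /keys/report.

    Optional fields are left out entirely when not provided.
    """
    if status not in ("success", "error"):
        raise ValueError("status must be 'success' or 'error'")
    payload = {"token_id": token_id, "status": status}
    if error is not None:
        payload["error"] = error
    if client_name is not None:
        payload["client_name"] = client_name
    return payload
```

Serialize the payload with `json.dumps(...)` and send it with the user's Bearer token and a `Content-Type: application/json` header.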
15. Report Token Usage Status
# Report successful usage (optionally include client_name/end user identifier):
curl -X POST -H "Authorization: Bearer $USER_API_KEY" \
-H "Content-Type: application/json" \
-d '{"token_id": "hf_your_ap", "status": "success", "client_name": "acme_customer_42"}' \
"http://localhost:8000/keys/report"
# Report error:
curl -X POST -H "Authorization: Bearer $USER_API_KEY" \
-H "Content-Type: application/json" \
-d '{"token_id": "hf_your_ap", "status": "error", "error": "402 Client Error: Payment Required", "client_name": "acme_customer_42"}' \
"http://localhost:8000/keys/report"
Universal Proxy (Role-Based Access)
16. Use as Proxy for HF Inference API
# Instead of calling HF directly:
curl -X POST "https://router.huggingface.co/v1/chat/completions" \
-H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Hello"}], "model": "zai-org/GLM-4.5:novita"}'
# Call through the proxy (requires user role or higher):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Authorization: Bearer $USER_API_KEY" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Hello"}], "model": "zai-org/GLM-4.5:novita"}'
Bulk Key Operations
17. Add Multiple Keys from File
If you have your keys in a text file (one key per line), you can easily add them all:
File format (keys.txt):
hf_key_1
hf_key_2
hf_key_3
Using jq (recommended):
curl -X POST "http://localhost:8000/add-key" \
-H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d "{\"keys\": $(cat keys.txt | jq -R -s -c 'split("\n") | map(select(length > 0))')}"
Alternative method (without jq):
keys=$(cat keys.txt | tr '\n' ',' | sed 's/,$//' | sed 's/^/"/;s/,/","/g;s/$/"/')
curl -X POST "http://localhost:8000/add-key" \
-H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d "{\"keys\": [$keys]}"
Security Note: Remember to add keys.txt to your .gitignore to prevent accidentally committing your API keys to version control.
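If neither jq nor the sed pipeline is convenient, the same request body can be produced in Python. This sketch mirrors the jq version (one key per line, blank lines dropped); `keys_payload` is a hypothetical helper name.

```python
import json


def keys_payload(text):
    """Turn newline-separated keys into the JSON body for POST /add-key.

    Surrounding whitespace is stripped and empty lines are skipped.
    """
    keys = [line.strip() for line in text.splitlines() if line.strip()]
    return json.dumps({"keys": keys})
```

For example, `keys_payload(open("keys.txt", encoding="utf-8").read())` yields a string that can be passed to `curl -d` or sent with any HTTP client.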
Admin Endpoints Examples
18. User Management (Admin Role Required)
List All Users
curl -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/admin/users"
Create New User
curl -X POST "http://localhost:8000/admin/users" \
-H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"username": "newuser",
"email": "user@example.com",
"full_name": "New User",
"role": "user"
}'
Response Example:
{
"success": true,
"message": "User newuser created successfully",
"username": "newuser",
"api_key": "generated_api_key_here..."
}
Update User
curl -X PUT "http://localhost:8000/admin/users/newuser" \
-H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"email": "updated@example.com",
"full_name": "Updated User Name"
}'
Delete User
curl -X DELETE "http://localhost:8000/admin/users/newuser" \
-H "Authorization: Bearer $ADMIN_API_KEY"
Regenerate API Key
curl -X POST "http://localhost:8000/admin/users/newuser/generate-key" \
-H "Authorization: Bearer $ADMIN_API_KEY"
19. Role Management (Admin Role Required)
List All Roles
curl -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/admin/roles"
Create New Role
curl -X POST "http://localhost:8000/admin/roles" \
-H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "customrole",
"description": "Custom role for specific access",
"endpoints": ["/health", "/docs", "/custom/*"],
"is_default": false
}'
Update Role
curl -X PUT "http://localhost:8000/admin/roles/customrole" \
-H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"description": "Updated role description",
"endpoints": ["/health", "/docs", "/custom/*", "/new/*"]
}'
Delete Role
curl -X DELETE "http://localhost:8000/admin/roles/customrole" \
-H "Authorization: Bearer $ADMIN_API_KEY"
Roles and Permissions
The RBAC system defines four user roles with different access levels:
Admin Role
- Access: All endpoints (*)
- Description: Full system access
- Use Case: System administrators
Manager Role
- Access: Key management + Universal proxy
- Endpoints:
  - /check-validity/*
  - /add-key/*
  - /keys/*
  - /{path:path} (Universal proxy)
- Use Case: Key managers and system operators
User Role
- Access: Standard user access
- Endpoints:
  - /health, /docs, /
  - /{path:path} (Universal proxy)
  - /keys/provision, /keys/report
- Use Case: Regular users accessing HF Inference API
Guest Role (Default)
- Access: Public endpoints only
- Endpoints: /health, /docs, /
- Use Case: Basic access with authentication tracking
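One way to picture how wildcard endpoint lists like these could be evaluated is shell-style pattern matching. The sketch below uses `fnmatch.fnmatchcase` from the standard library; this is an assumption for illustration, not the project's confirmed matching logic (see RBAC_README.md for that). The universal proxy pattern `/{path:path}` is deliberately left out, because how it is distinguished from the named endpoints is not described here.

```python
from fnmatch import fnmatchcase

# Endpoint patterns per role, copied from the lists above
# (universal proxy entry omitted; see the note in the text).
ROLE_ENDPOINTS = {
    "admin": ["*"],
    "manager": ["/check-validity/*", "/add-key/*", "/keys/*"],
    "user": ["/health", "/docs", "/", "/keys/provision", "/keys/report"],
    "guest": ["/health", "/docs", "/"],
}


def is_allowed(role, path):
    """Return True if any of the role's patterns matches the request path.

    Unknown roles have no patterns and are therefore denied.
    """
    return any(fnmatchcase(path, pattern) for pattern in ROLE_ENDPOINTS.get(role, []))
```

Under this scheme a manager can reach `/keys/status` through `/keys/*`, while a user is limited to the two `/keys/...` paths named explicitly.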
Security Features
- API Key Generation: 64+ character cryptographically secure random strings
- Role-Based Access: Endpoint access controlled by user roles
- Universal Proxy Protection: Universal proxy endpoint respects role permissions
- Audit Logging: All requests logged with user context
- No Password Storage: Only API keys, no password management
- Input Validation: Comprehensive validation of all request data using Pydantic models
- JSON-Based API: Modern REST API design with proper content-type validation
How It Works
- Server Startup: Loads keys from file without validation (to avoid rate limits)
- Request Reception: Server receives requests to any HF Inference endpoint
- Key Selection: Smart algorithm selects a valid, non-exceeded API key
- Request Forwarding: Forwards request to HF with selected key
- Response Handling:
  - If successful: Returns response as-is
  - If key error: Silently switches to next valid key and retries
  - If 402 error: Handles with quarantine system and switches to next key
  - If no valid keys: Passes error transparently to user
- Transparent Relay: All responses (success, error, streaming) are relayed unchanged
- Rate Limiting: Respects HF rate limits with configurable wait times and retries
- Quarantine System: Handles temporary 402 errors with progressive quarantine stages
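The response-handling steps above can be sketched as a small retry loop. This is a simplified model under stated assumptions: `send` is a hypothetical callable standing in for the real HTTP forwarding, only status codes 401 and 402 trigger a key switch, and the function merely records which keys would be marked invalid or quarantined rather than updating any configuration.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Response = Tuple[int, str]  # (status code, body)


def forward_with_rotation(
    keys: Iterable[str],
    send: Callable[[str], Response],
) -> Tuple[Optional[Response], List[str], List[str]]:
    """Try keys in order until one yields a response that is not 401/402.

    Returns (response, invalid_keys, quarantined_keys). If every key
    fails, the last error response is returned so it can be relayed
    to the caller unchanged.
    """
    last: Optional[Response] = None
    invalid: List[str] = []
    quarantined: List[str] = []
    for key in keys:
        response = send(key)
        status = response[0]
        if status == 401:
            invalid.append(key)       # key revoked or otherwise invalid
        elif status == 402:
            quarantined.append(key)   # credits exceeded: quarantine candidate
        else:
            return response, invalid, quarantined
        last = response
    return last, invalid, quarantined
```

A fake `send` that looks up canned responses per key is enough to exercise the loop without any network access.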
Error Handling
The proxy detects specific error responses and handles them automatically:
- Invalid Key (401 Unauthorized):
  - Detects both HTTP 401 status codes and JSON error responses
  - Automatically marks the key as invalid in the configuration
  - Switches to the next valid key and retries
  - This handles cases where keys have been revoked or become invalid
- 402 Payment Required:
{"error":"You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly included credits."}
  - Handles with quarantine system and switches to the next key
In both cases the proxy switches to the next valid key and retries: a 401 marks the key as invalid, while a 402 places it in quarantine.
Rate Limiting
The server handles rate limiting gracefully:
- IP Rate Limiting: When your IP is rate limited (429), the system stops processing all keys
- Retry Logic: Configurable retry attempts (HF_WHOAMI_RETRY, default: 3)
- Wait Time: Configurable wait time between retries (HF_WHOAMI_WAIT, default: 2s)
- Graceful Shutdown: When rate limited after max retries, stops processing all keys
- Resume Capability: Uses last_validity_check timestamps to resume validation later
Note: Keys themselves don't have rate limits - only your IP does. When rate limited, all keys are affected equally.
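The retry behaviour described above can be modelled as follows. This is a sketch, not the server's code: `fn` is a hypothetical callable returning an HTTP status code, and interpreting the retry count as "additional attempts after the first call" is an assumption.

```python
import time


def call_with_retry(fn, retries=3, wait_seconds=2.0, sleep=time.sleep):
    """Call fn() and retry while it reports HTTP 429.

    retries and wait_seconds correspond to HF_WHOAMI_RETRY and
    HF_WHOAMI_WAIT. The sleep function is injectable so tests do not
    have to wait. Returns the final status code; a returned 429 means
    the caller should stop processing the remaining keys.
    """
    status = fn()
    attempts = 0
    while status == 429 and attempts < retries:
        sleep(wait_seconds)
        attempts += 1
        status = fn()
    return status
```

Because the limit applies to the IP rather than to a key, a caller that still sees 429 after the last retry would stop validating altogether and pick up later from the stored `last_validity_check` timestamps.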
Quarantine System
The proxy implements a multi-stage quarantine system to handle HF's temporary 402 errors gracefully. This addresses the issue where HF sometimes returns false 402 errors that resolve themselves after a few minutes to hours.
Quarantine Stages
- Stage 1 (Soft Disable): 30 minutes
- Triggered on first 402 error
- Key is temporarily unavailable for 30 minutes
- Stage 2 (Hour Block): 1 hour
- Triggered if Stage 1 expires and key still returns 402
- Key is unavailable for 1 hour
- Stage 3 (Day Block): 1 day
- Triggered if Stage 2 expires and key still returns 402
- Key is unavailable for 1 day
- Stage 4 (Week Block): 1 week
- Triggered if Stage 3 expires and key still returns 402
- Key is unavailable for 1 week
- Stage 5 (Extended): 30 days
- Triggered if Stage 4 expires and key still returns 402
- Key is quarantined for 30 days, then becomes available again
Quarantine Logic
- Progressive Advancement: Keys advance through stages when quarantine expires and a new 402 error occurs
- Time-Based Stages: Each stage has a fixed duration, and advancement happens when that duration expires
- Stage 5 Reset: After Stage 5 expires, if a 402 error occurs, the quarantine restarts from Stage 1 (prevents long waiting periods)
- Automatic Clearing: When a key successfully processes a request, its quarantine is automatically cleared
- Manual Clearing: Admins can manually clear quarantine using the API endpoint
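The stage rules above amount to a small state machine, sketched below. The durations come from the stage list; the identifiers `stage_2` through `stage_5` are assumed by analogy with the `none` and `stage_1` values that appear in this README, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Stage durations as documented; names beyond "stage_1" are assumed.
STAGE_DURATIONS = {
    "stage_1": timedelta(minutes=30),
    "stage_2": timedelta(hours=1),
    "stage_3": timedelta(days=1),
    "stage_4": timedelta(weeks=1),
    "stage_5": timedelta(days=30),
}
STAGE_ORDER = list(STAGE_DURATIONS)


def on_402(stage, start, now):
    """Return (stage, start_date) after a 402 observed at time `now`."""
    if stage == "none":
        return "stage_1", now              # first 402: enter Stage 1
    if now < start + STAGE_DURATIONS[stage]:
        return stage, start                # quarantine still running: no change
    index = STAGE_ORDER.index(stage)
    if index + 1 < len(STAGE_ORDER):
        return STAGE_ORDER[index + 1], now  # expired and still failing: advance
    return "stage_1", now                  # after Stage 5: restart from Stage 1


def on_success():
    """A successful request clears the quarantine."""
    return "none", None


if __name__ == "__main__":
    t0 = datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc)
    print(on_402("none", None, t0))
```

Keeping the transition logic as a pure function of `(stage, start, now)` makes it straightforward to test each stage boundary with fixed timestamps.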
Quarantine Status Response
{
"quarantine": [
{
"key": "hf_key_123...",
"stage": "stage_1",
"start_date": "2025-01-15T10:30:00+00:00",
"is_active": true,
"remaining_seconds": 1200
}
]
}
Benefits
- Prevents False Positives: Keys aren't permanently disabled on temporary 402 errors
- Progressive Response: Longer quarantine periods for persistently problematic keys, with automatic recovery
- Automatic Recovery: Keys automatically return to service when they work again
- Transparent Operation: No impact on normal request processing
Development
Running in Development Mode
uv run hf-inferoxy
Viewing Logs
The server provides comprehensive logging with both console and file output:
Console Logging:
- Real-time logs displayed in the terminal
- Perfect for development and debugging
File Logging:
- Automatic log rotation based on configuration
- Logs stored in ./logs/ directory (configurable via HF_LOGS_DIR)
- Daily rotation by default (configurable to hourly, minute, or monthly)
Log Content Includes:
- Incoming requests
- Key selection and rotation
- Proxy forwarding details
- Response status codes
- Error responses and retry attempts
- Rate limiting events
- Monthly credits exceeded events
- Log rotation events
Project Structure
hf-inferoxy/
├── hf_inferoxy/             # Main package directory
│   ├── __init__.py          # Package initialization
│   ├── server.py            # FastAPI application entry point
│   ├── proxy.py             # Core proxy logic
│   ├── key_manager.py       # API key rotation and management
│   ├── user_manager.py      # User and role management (RBAC)
│   ├── rbac_manager.py      # Role-based access control logic
│   ├── auth_middleware.py   # Authentication and authorization middleware
│   └── __main__.py          # CLI entry point
├── users.json               # User and role definitions (auto-created)
├── hf-keys.json             # JSON file with API keys (auto-created if missing)
├── env.example              # Example environment configuration
├── .env                     # Environment configuration (create from env.example)
├── pyproject.toml           # Python project configuration
├── README.md                # Documentation
├── RBAC_README.md           # Detailed RBAC documentation
├── LICENSE                  # AGPL v3.0 license
├── .gitignore               # Git ignore file
└── logs/                    # Log files directory (auto-created, gitignored)
Configuration
The server automatically creates hf-keys.json with a default structure if it doesn't exist:
{
"keys": [],
"rotation_strategy": "round_robin",
"check_interval_days": 30
}
Note: The hf-keys.json and .env files are excluded from version control for security reasons.
License
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) - see the LICENSE file for details.
The AGPL-3.0 license requires that if you run a modified version of this software on a server and provide services to users over a network, you must make the source code available to those users. This ensures that improvements to the software remain open source.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Development Setup
- Fork the repository
- Clone your fork:
git clone https://github.com/yourusername/hf-inferoxy.git
- Create a feature branch:
git checkout -b feature/amazing-feature
- Install dependencies:
uv sync
- Make your changes and test them
- Commit your changes:
git commit -m 'Add some amazing feature'
- Push to the branch:
git push origin feature/amazing-feature
- Open a Pull Request
Code Style
- Follow PEP 8 for Python code style
- Add type hints where appropriate
- Include docstrings for functions and classes
- Write tests for new features
Support
If you encounter any issues or have questions, please:
- Check the Issues page for existing solutions
- Create a new issue with detailed information about your problem
- Include your environment details and any error messages