One command to go from bare hardware to a fully working local AI API and management dashboard. No cloud required. No API keys. No data leaving your network.
Three steps from bare hardware to a working AI API.
Run the one-line installer. WarpHost detects your hardware — NVIDIA GPUs, Apple Silicon, or CPU-only machines — and sets everything up automatically.
WarpHost scans your hardware — GPU model, VRAM, CPU, RAM — and recommends the best models for your setup.
Pull a model and start serving. You get an OpenAI-compatible API and a management dashboard instantly.
Drop-in replacement for OpenAI's API. Point any client at localhost:8811/v1 and it just works.
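Because the endpoint speaks the OpenAI wire format, any OpenAI client works once you swap the base URL. A minimal sketch using only the Python standard library — the model name `qwen3:8b` is a placeholder for illustration, not a confirmed WarpHost identifier:

```python
import json
import urllib.request

# WarpHost's local OpenAI-compatible endpoint (no API key required).
BASE_URL = "http://localhost:8811/v1"

def build_chat_request(prompt: str, model: str = "qwen3:8b") -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, model: str = "qwen3:8b") -> str:
    """POST the payload to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Usage (requires a running WarpHost server):
#   print(chat("Say hello in one sentence."))
```

The official OpenAI SDKs work the same way: point `base_url` at `http://localhost:8811/v1` and pass any placeholder API key, since nothing is checked locally.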
Automatically detects NVIDIA GPUs, Apple Silicon, and system specs. Recommends the best models for your hardware.
Clean web UI to monitor your system, manage models, and test with a built-in chat playground.
Browse a curated catalog, pull models with one click, switch between them instantly.
Runs in Docker with NVIDIA GPU passthrough, or natively on macOS. Clean, isolated, easy to update.
No data leaves your network. No API keys. No cloud dependency. Your hardware, your models, your data.
19 curated models from 3B to 70B. From laptop-friendly to datacenter-grade.
Alibaba's latest small model. Thinking/non-thinking modes. Strong coding and multilingual.
Google's Gemma 3 with 128K context. Multilingual (140+ languages).
Best all-rounder at 8B. Thinking/non-thinking modes. Coding, math, multilingual.
Meta's proven workhorse. Excellent tool use and 128K context.
Excellent step up from 8B. Strong coding, reasoning, multilingual. Apache 2.0.
DeepSeek R1 reasoning in 14B. Exceptional for math, science, and logic.
Meta's best open model. Performance that rivals Llama 3.1 405B. Top-tier quality.
Best reasoning model available locally. DeepSeek R1 distilled into Llama 70B.
WarpHost is free, open source, and ready to run on your hardware today.