Open Source · AGPLv3

Run open-source AI on
your own hardware.

One command to go from bare hardware to a fully working local AI API and management dashboard. No cloud required. No API keys. No data leaving your network.

curl -fsSL https://warphost.io/install | bash

How it works

Three steps from bare hardware to a working AI API.

1

Install

Run the one-line installer. WarpHost sets up everything automatically, whether you're on NVIDIA GPUs, Apple Silicon, or CPU only.

2

Detect & Recommend

WarpHost scans your hardware — GPU model, VRAM, CPU, RAM — and recommends the best models for your setup.

3

Run

Pull a model and start serving. You get an OpenAI-compatible API and a management dashboard instantly.

Everything you need to run AI locally

OpenAI-Compatible API

Drop-in replacement for OpenAI's API. Point any client at localhost:8811/v1 and it just works.
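Because the API mirrors OpenAI's, a plain HTTP request against the local endpoint is all it takes. A minimal sketch using only the Python standard library; the model name "qwen3-8b" is an assumption for illustration, not a confirmed WarpHost model ID:

```python
import json
import urllib.request

# WarpHost's OpenAI-compatible endpoint (no API key required).
API_URL = "http://localhost:8811/v1/chat/completions"

def chat(prompt: str, model: str = "qwen3-8b") -> str:
    """Send a chat completion request to the local WarpHost API."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI response shape: first choice's message content.
    return body["choices"][0]["message"]["content"]
```

Existing OpenAI SDK clients work the same way: point base_url at http://localhost:8811/v1 and pass any placeholder string as the API key.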

Hardware Auto-Detection

Automatically detects NVIDIA GPUs, Apple Silicon, and system specs. Recommends the best models for your hardware.

Management Dashboard

Clean web UI to monitor your system, manage models, and test with a built-in chat playground.

One-Click Model Management

Browse a curated catalog, pull models with one click, switch between them instantly.
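Model management can also be scripted. Assuming WarpHost mirrors the standard OpenAI /v1/models listing endpoint (the page only promises OpenAI compatibility, so treat this as an assumption), enumerating the models you've pulled might look like:

```python
import json
import urllib.request

# Standard OpenAI-style model listing endpoint (assumed to be mirrored).
MODELS_URL = "http://localhost:8811/v1/models"

def list_models() -> list[str]:
    """Return the IDs of models currently available on the local server."""
    with urllib.request.urlopen(MODELS_URL) as resp:
        body = json.load(resp)
    # OpenAI-style listing: {"object": "list", "data": [{"id": ...}, ...]}
    return [m["id"] for m in body["data"]]
```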

Docker Native

Runs in Docker with NVIDIA GPU passthrough, or natively on macOS. Clean, isolated, easy to update.

100% Local & Private

No data leaves your network. No API keys. No cloud dependency. Your hardware, your models, your data.

Supported Models

19 curated models from 3B to 70B. From laptop-friendly to datacenter-grade.

Qwen3-4B

4 GB

Alibaba's latest small model. Thinking/non-thinking modes. Strong coding and multilingual.

Gemma-3-4B-IT

4 GB

Google's Gemma 3 with 128K context. Multilingual (140+ languages).

Qwen3-8B

8 GB

Best all-rounder at 8B. Thinking/non-thinking modes. Coding, math, multilingual.

Llama-3.1-8B-Instruct

8 GB

Meta's proven workhorse. Excellent tool use and 128K context.

Qwen3-14B

12 GB

Excellent step up from 8B. Strong coding, reasoning, multilingual. Apache 2.0.

DeepSeek-R1-Distill-Qwen-14B

12 GB

DeepSeek R1 reasoning in 14B. Exceptional for math, science, and logic.

Llama-3.3-70B-Instruct

48 GB

Meta's best open model. Performance rivaling Llama 3.1 405B. Top-tier quality.

DeepSeek-R1-Distill-Llama-70B

48 GB

Best reasoning model available locally. DeepSeek R1 distilled into Llama 70B.

View all 19 models →

Ready to get started?

WarpHost is free, open source, and ready to run on your hardware today.