open source · multi-provider · token-efficient

A token-efficient coding agent for your terminal, on any model.

ClawCodex is a token-efficient Python rebuild of Claude Code — 230K lines of pure Python. A genuine tool-calling loop, a streaming REPL, skills, and one runtime in front of Anthropic, OpenAI, Z.ai GLM, MiniMax, OpenRouter, and DeepSeek, where prefix-cache reuse makes long DeepSeek sessions over 200× cheaper to run.

Mission for this build Keep the real Claude Code architecture, but make it Python-native and open to every provider. Stream replies, call tools, persist sessions, and let you choose the most flexible, cost-effective model stack for agentic coding. Still alpha; shipping weekly.

Install GitHub Docs

Maintained by ClawCodex Team · v0.5.0 · 6 providers

one-line install

bash

# clone + create a venv (Python 3.10+)
$ git clone https://github.com/agentforce314/clawcodex.git
$ cd clawcodex && python3 -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt
# configure a provider, then launch the REPL
$ python -m src.cli login
$ clawcodex

needs Python 3.10+ other ways →

30+ tools files, bash, web, agents, MCP

6 providers one runtime, swap any time

58.2% SWE-bench Verified on Gemini 2.5 Pro

200× cost saving token-efficient on DeepSeek

ClawCodex running in a terminal: a streaming REPL with visible tool activity and a status row. — The actual inline REPL: streaming replies, visible tool calls, and a tool-aware status row.

Section 01 · why

A model writes text. An agent leaves consequences.

ClawCodex puts a real runtime around the model: a working loop, every provider, and code you can read and extend.

real runtime

A working agent, not a source dump

A genuine tool-calling loop, a streaming REPL, session history, and multi-turn execution. Ported from the real Claude Code TypeScript architecture and shipped as a CLI you actually run.

any provider

One runtime in front of every model

Claude Code targets Claude models only. ClawCodex puts Anthropic, OpenAI, Z.ai GLM, MiniMax, OpenRouter, and DeepSeek behind the same loop — so you swap vendor, region, and price tier without giving up tools or skills.

built to hack on

Readable Python you can extend

Idiomatic Python with full type hints, real test suites, and markdown-driven SKILL.md extensibility. Fork it, add a tool or a skill, and make it yours. MIT licensed.

Section 02 · what's inside

The full surface, in your terminal.

Streaming, skills, permissions, sessions, MCP, and a scriptable headless mode — the same loop whichever model you point it at.

Token-efficient on DeepSeek

ClawCodex keeps the request prefix byte-stable so DeepSeek's prompt cache covers your whole system + tools + history span. Cache-hit input bills at about $0.0435 per 1M tokens, over 200× cheaper than Claude Fable 5, so long agentic sessions cost pennies. The longer you code, the more you save.

Streaming agent experience

True API streaming for direct replies plus richer streaming during tool-driven loops. Toggle live output with /stream and re-render clean Markdown with /render-last.

Programmable skill runtime

Markdown-based SKILL.md slash commands with named arguments and per-skill tool limits. Project skills and user skills, loaded from .clawcodex/skills.

REPL by default, TUI on demand

An inline prompt_toolkit + Rich REPL with history, tab completion, and multiline input. Launch the Textual TUI any time with clawcodex --tui or /tui.

Permission modes

Plan (read-only), acceptEdits, dontAsk, and an explicit bypass for sandboxes. The REPL, TUI, and headless -p mode all honor the same gates.

Sessions you can resume

Save and reload conversations locally. auto_save writes each session; max_history caps retained turns. Everything stays on your machine.

MCP + hooks

Model Context Protocol tools and resources, plus pre/post tool-use lifecycle hooks for shell, prompt, agent, and HTTP automation.

Scriptable headless mode

Run -p for one-shot prompts, with --output-format json or stream-json for pipes, CI, and agent-to-agent workflows.

Image handling

A TS-parity Read pipeline sniffs magic bytes and resizes to API limits. @-mention an image to inline it, with Anthropic-to-OpenAI block translation for vision-capable backends.

Section 03 · any provider

One runtime in front of every model.

Swap vendor, region, and price tier without giving up tools or skills. Six providers ship today.

DeepSeek default · prefix cache

Anthropic Claude models

OpenAI GPT + compatible

Z.ai GLM GLM-5.1 / 5.2

MiniMax MiniMax-M2.7

OpenRouter unified API

Plus any OpenAI-compatible endpoint via a custom base_url — self-hosted vLLM, SGLang, or Ollama included.

swap models

# point a single run at a different backend
$ clawcodex --provider deepseek --model deepseek-v4-pro
$ clawcodex --provider zai --model glm-5.2 -p "refactor utils.py"

Section 04 · how it works

A reviewable line from prompt to patch.

Read code, call tools behind permission gates, leave real evidence, and resume across sessions.

Prompt

You type in the REPL or pipe a prompt with -p. Plan mode is read-only by default.

Stream

The selected provider streams the reply; tool calls overlap with the stream.

Tools

Reads, edits, bash, and web run behind permission gates, leaving real evidence.

Resume

Results feed the next turn. Sessions save locally so long work survives.

under the hood · six abstractions

Query loop

The heartbeat that orchestrates model calls and tool execution.

Tool system

30+ tools: read files, run bash, search the web, drive sub-agents and MCP.

Tasks

Background workflows and agent orchestration, journaled and resumable.

Two-tier state

A bootstrap layer and an app layer, kept independent by design.

Memory

Relevance scanning over project, user, and team CLAUDE.md files.

Hooks

Lifecycle events with shell, prompt, agent, and HTTP executors.

Read ARCHITECTURE.md →

Section 05 · swe-bench verified

clawcodex beats openclaude on the same model.

Full SWE-bench Verified split (499 instances), both agents driven by Gemini 2.5 Pro under one standardized harness.

clawcodex

58.2% 291/499

openclaude

53.0% 265/499

241 Both solved 50 Only clawcodex 24 Only openclaude 184 Neither

Reproduce locally — see eval/README.md for the full workflow.

Section 06 · proof

Every demo was built by ClawCodex itself.

Same CLI you just installed, same agent loop, same tools. No hand-edits.

React 18 + Vite + Vitest

CRM app

A mini CRM with contacts, deals, a dashboard, and a full test suite.

demos/crm-app

React 18 + Vite + Router

LinkedIn-style feed

Profile, network, jobs, and messaging in a familiar feed layout.

demos/linkedin-app

React + three.js

Minecraft sandbox

A browser voxel sandbox with terrain, mining, a HUD, and player controls.

demos/minecraft-app

Static HTML/CSS/JS

World Cup 2026 intro

Animated hero, live countdown, host nations, and 16 stadiums. Built with GLM-5.2.

demos/wc26-intro

Section 07 · new in v0.5.0

Shipping weekly.

DeepSeek is now the default provider, with a prefix-cache optimization that makes long agentic coding sessions dramatically token-cheaper. Recent highlights from the repository.

cost#363

DeepSeek prefix-cache exploitation

ClawCodex keeps its request prefix byte-stable across turns, so DeepSeek's prompt cache covers the whole system + tools + history span. Cache-hit input bills at about $0.0435 per 1M tokens, over 200× cheaper than Claude Fable 5. Gated to DeepSeek; every other provider is byte-for-byte unchanged.

runtime#262–#271

Dynamic workflow engine + /deep-research

A Python workflow engine — agent(), parallel(), pipeline(), phase() — with journaling and resume, wired end-to-end with per-agent retry and worktree isolation, plus a bundled /deep-research harness.

providers#364–#367

DeepSeek is the default provider

DeepSeek-V4-Pro is wired in as the default, with its 1M-token context window registered and a per-model prompt-cache hit-rate plus cost surfaced in /cost. Interrupted streams now recover truncated tool-call argument JSON instead of dropping it.

cost#181–#193

/advisor token-efficient mode

Pair a cheap worker model with an expensive reviewer consulted only at decision points — roughly 6x cheaper than running the expensive model alone, with live token and USD cost in the status bar.

Full changelog → All activity →

join in

A small project. Your patch matters.

No CLA. No sponsor lockouts. Issues triaged in the open, releases cut from main, and the maintainer reads everything. Bring a real test and prose that tells the reviewer what you were thinking.

Star on GitHub Contribute