# hyperkb

> Persistent memory for AI agents. Flat markdown files, SQLite indexing, ripgrep speed. MCP-native.

## What It Is

hyperkb is a knowledge base where markdown files are the source of truth and SQLite is a rebuildable index. Entries are append-only with epoch timestamps. Files use dotted namespaces (`domain.topic.subtopic.md`) instead of folders. All knowledge operations run through 10 MCP tools; the CLI handles only admin (`hkb init`, `hkb config`, `hkb sync`).

Storage lives at `~/.hkb/` by default. Delete the SQLite index and rebuild it in seconds with `hkb_health(action="reindex")`.

## Why hyperkb

- Human-readable storage: files named `infra.postgres.md`, not hashed blobs. Read without the tool, in any editor, forever.
- Search that ranks: multi-stage scoring with recency decay, staleness penalties, type/status boosts, session anchoring.
- No database lock-in: SQLite is a rebuildable index. Delete and rebuild in seconds.
- Context-aware retrieval: token-budgeted packing, session briefings, topic anchoring.
- Real multi-machine sync: git-backed three-way merge over S3 with encrypted credentials and conflict resolution.

## Key Files

- [README.md](https://github.com/calvincs/hyperkb/blob/main/README.md) — Full documentation
- [SKILL.md](https://github.com/calvincs/hyperkb/blob/main/SKILL.md) — MCP tool decision guide (when/how to use each tool)
- [LICENSE](https://github.com/calvincs/hyperkb/blob/main/LICENSE) — Custom license (MIT-based with restrictions)
- [CONTRIBUTING.md](https://github.com/calvincs/hyperkb/blob/main/CONTRIBUTING.md) — CLA and contribution process
- [CLAUDE.md](https://github.com/calvincs/hyperkb/blob/main/CLAUDE.md) — Development instructions
- [pyproject.toml](https://github.com/calvincs/hyperkb/blob/main/pyproject.toml) — Package config and dependencies
- [.claude/skills/rem/SKILL.md](https://github.com/calvincs/hyperkb/blob/main/.claude/skills/rem/SKILL.md) — `/rem` natural language save/recall skill
- [integrations/opencode/](https://github.com/calvincs/hyperkb/tree/main/integrations/opencode) — OpenCode integration files

## Requirements

- Python 3.10+
- [ripgrep](https://github.com/BurntSushi/ripgrep#installation) (`rg`) on PATH

## Install

```bash
# macOS
brew install ripgrep
# Linux
sudo apt install ripgrep

git clone https://github.com/calvincs/hyperkb ~/src/hyperkb
python3 -m venv ~/.hkb/venv
# Use $HOME here: the shell does not expand ~ inside double quotes
~/.hkb/venv/bin/pip install -e "$HOME/src/hyperkb[all]"
export PATH="$HOME/.hkb/venv/bin:$PATH"  # add to shell rc
hkb init
```

### Optional extras

```bash
pip install -e ".[all]"      # crypto + mcp + sync
pip install -e ".[crypto]"   # Fernet key encryption
pip install -e ".[mcp]"      # MCP server
pip install -e ".[sync]"     # S3 sync + watchdog
pip install -e ".[dev]"      # pytest + moto
```

## Configure MCP Server

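Register the server in `.mcp.json` (project-level, or `~/.claude/.mcp.json`). If your MCP client does not expand `~`, replace it in `command` with the absolute path to your home directory: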

```json
{
  "mcpServers": {
    "hyperkb": {
      "command": "~/.hkb/venv/bin/hkb-mcp"
    }
  }
}
```

Auto-approve tools in `.claude/settings.local.json`:

```json
{
  "permissions": {
    "allow": [
      "mcp__hyperkb__hkb_search", "mcp__hyperkb__hkb_show",
      "mcp__hyperkb__hkb_add", "mcp__hyperkb__hkb_update",
      "mcp__hyperkb__hkb_task", "mcp__hyperkb__hkb_sync",
      "mcp__hyperkb__hkb_session", "mcp__hyperkb__hkb_context",
      "mcp__hyperkb__hkb_view", "mcp__hyperkb__hkb_health"
    ]
  }
}
```

Restart the MCP client and approve the server when prompted.

## Architecture

```
.md files (source of truth)
    |
    +-- ripgrep ----------> exact/regex matches (fastest)
    |
    +-- SQLite index
         +-- FTS5 --------> BM25 keyword ranking
```

Data flow: `mcp_server.py` -> `store.py` -> `db.py` + `search.py` + `format.py`.
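
To illustrate the "rebuildable index" idea, here is a minimal sketch of an FTS5 table queried with BM25 ranking through Python's standard `sqlite3` module. The table layout and the names `build_index` and `search` are assumptions made for illustration, not hyperkb's actual schema or API, and the sketch requires an SQLite build with FTS5 enabled.

```python
import sqlite3


def build_index(entries):
    """Create an in-memory FTS5 index from (file, epoch, body) tuples."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: file and epoch are stored but not tokenized;
    # only body is searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE entries "
        "USING fts5(file UNINDEXED, epoch UNINDEXED, body)"
    )
    conn.executemany(
        "INSERT INTO entries (file, epoch, body) VALUES (?, ?, ?)", entries
    )
    conn.commit()
    return conn


def search(conn, query, limit=5):
    """Return (file, epoch) rows for a keyword query, best match first."""
    # bm25() yields lower values for better matches, so ascending order
    # puts the strongest hit first.
    rows = conn.execute(
        "SELECT file, epoch FROM entries WHERE entries MATCH ? "
        "ORDER BY bm25(entries) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Because the index is derived entirely from the entries passed in, dropping it and calling `build_index` again over the parsed markdown files restores the same search results.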

## File Format

Files use dotted namespace names: `domain.topic[.subtopic[.focus]].md` (2-4 segments, lowercase, hyphens within segments). YAML frontmatter describes the file; timestamped entries are delimited by `>>> epoch` / `<<<` markers.

```markdown
---
name: infra.postgres
description: PostgreSQL configuration, performance findings, connection pooling.
keywords: [postgres, connection-pool, database]
links: [infra.redis]
created: 2026-02-21T10:00:00Z
compacted: ""
---

>>> 1740130800
@type: finding
@weight: high
Connection pool exhaustion under 50+ concurrent requests.
Root cause: default pool size of 10 with no timeout on checkout.
<<<
```
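
The naming rule and entry delimiters described above can be sketched in Python as follows. This is an illustrative reading of the format, not hyperkb's own parser: `is_valid_name` and `split_entries` are hypothetical helpers, and allowing digits inside name segments is an assumption.

```python
from __future__ import annotations

import re

# 2-4 dot-separated segments, lowercase, hyphens only inside a segment.
# Assumption: digits are permitted alongside lowercase letters.
_SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"
NAME_RE = re.compile(rf"{_SEGMENT}(?:\.{_SEGMENT}){{1,3}}")

# An entry opens with ">>> <epoch>" on its own line and closes with "<<<".
ENTRY_RE = re.compile(r"^>>> (\d+)\n(.*?)\n<<<$", re.MULTILINE | re.DOTALL)


def is_valid_name(name: str) -> bool:
    """Check a file name (without the .md extension) against the naming rule."""
    return NAME_RE.fullmatch(name) is not None


def split_entries(text: str) -> list[tuple[int, str]]:
    """Return (epoch, content) pairs in file order; frontmatter is ignored."""
    return [(int(m.group(1)), m.group(2)) for m in ENTRY_RE.finditer(text)]
```

For example, `is_valid_name("infra.postgres")` is true while `is_valid_name("infra")` is false, because a single segment falls below the two-segment minimum.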

## Entry Metadata

`@key: value` lines at the start of entry content are parsed into DB columns.

- `@type`: note (default), finding, decision, task, milestone, skill
- `@status`: active (default), pending, in_progress, blocked, completed, superseded, resolved, cancelled, archived
- `@weight`: high (stays prominent), normal (default), low (temporary)
- `@tags`: comma-separated labels
- `@author` / `@hostname`: auto-populated, never set manually
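
As a rough sketch of how leading `@key: value` lines might be separated from the entry body, assuming parsing stops at the first line that is not metadata: the function names and the exact key pattern below are assumptions for illustration, while the defaults come from the list above.

```python
from __future__ import annotations

import re

META_RE = re.compile(r"^@([a-z_]+):\s*(.*)$")

# Defaults taken from the metadata list above.
DEFAULTS = {"type": "note", "status": "active", "weight": "normal"}


def parse_metadata(content: str) -> tuple[dict[str, str], str]:
    """Split leading @key: value lines from the rest of the entry."""
    meta = dict(DEFAULTS)
    lines = content.split("\n")
    i = 0
    while i < len(lines):
        m = META_RE.match(lines[i])
        if m is None:
            break  # first non-metadata line starts the body
        meta[m.group(1)] = m.group(2).strip()
        i += 1
    return meta, "\n".join(lines[i:])


def split_tags(meta: dict[str, str]) -> list[str]:
    """Turn the comma-separated @tags value into a list of labels."""
    return [t.strip() for t in meta.get("tags", "").split(",") if t.strip()]
```

Only lines at the very start are treated as metadata, so an `@`-prefixed line later in the body is left untouched.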

## 10 MCP Tools

| Tool | Purpose |
|------|---------|
| `hkb_search` | Find entries. Modes: hybrid, rg, bm25, recent, check |
| `hkb_show` | Read files, list files, link graph |
| `hkb_add` | Add entry or create file |
| `hkb_update` | Amend, archive, batch update entries |
| `hkb_task` | Task lifecycle: create, show, update, list |
| `hkb_sync` | S3 sync: push, pull, both, status, config, conflicts |
| `hkb_session` | Briefing, review, anchor (session management) |
| `hkb_context` | Token-budgeted retrieval: packed, suggest, narrative |
| `hkb_view` | Named file groupings: set, list |
| `hkb_health` | Maintenance: check, reindex, compact |

### Tool Selection Guide

**Find something:**
- Know the file -> `hkb_show(name="file.name")`
- Don't know the file -> `hkb_search(query="keywords")`
- Exact string/regex -> `hkb_search(mode="rg", query="...")`
- List all files -> `hkb_show()`
- Recent timeline -> `hkb_search(mode="recent")`
- Token-budgeted context -> `hkb_context(topic="...")`
- File suggestions -> `hkb_context(mode="suggest", topic="...")`

**Save something:**
- Know the file -> `hkb_add(content="...", to="file.name")`
- Unsure where -> `hkb_search(mode="check", query="...")` then add with `to=`
- New file needed -> `hkb_add(create_file=True, to="name", description="...", keywords=[...])`

**Update something:**
- Amend -> `hkb_update(file="f", epoch=E, new_content="...")`
- Archive -> `hkb_update(action="archive", file="f", epoch=E)`
- Supersede -> add new entry, then `hkb_update(file="f", epoch=OLD, set_status="superseded")`

**Orient yourself:**
- Session start -> `hkb_session(action="briefing")`
- Focused briefing -> `hkb_session(action="briefing", focus="topic")`
- Set search bias -> `hkb_session(action="anchor", topics="auth, security")` (1.5x boost)

### hkb_add Response Handling

| Response | Meaning | Action |
|----------|---------|--------|
| `"ok"` | Saved | Done |
| `"no_match"` | No file matched | Create file first, then retry with `to=` |
| `"low_confidence"` | Weak matches listed | Pick one and retry with `to=`, or create new file |

After a `no_match` or `low_confidence` response, never re-call `hkb_add` without `to=`; auto-routing will return the same result.
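
The table above amounts to a small decision procedure. The sketch below expresses it as a pure Python function; the real response shape is not specified here, so the `status` string, the `candidates` list, and the returned action labels are all hypothetical, as is the choice to take the first candidate.

```python
def next_action(status, candidates=None):
    """Map an hkb_add response status to the follow-up step from the table."""
    if status == "ok":
        return "done"
    if status == "no_match":
        return "create_file_then_retry_with_to"
    if status == "low_confidence":
        # Assumption: candidates are ordered best first, so take the first.
        if candidates:
            return f"retry_with_to={candidates[0]}"
        return "create_file_then_retry_with_to"
    raise ValueError(f"unexpected status: {status!r}")
```

Every non-`ok` branch ends in a retry that names an explicit target, which is the point of the rule about always passing `to=`.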

## Search Scoring Pipeline

1. Source weighting: rg (0.5) + BM25 (0.5)
2. Deduplication across sources
3. Recency boost: 80% relevance + 20% recency (180-day half-life)
4. Staleness penalty: active entries older than 360 days are dampened (floor 0.7). Decisions and high-weight entries are exempt.
5. Type boost: decision 1.1x, skill 1.08x, finding 1.05x, milestone 1.05x
6. Status boost: pending 1.08x, in_progress 1.05x, completed 0.88x, cancelled 0.65x
7. Weight boost: high 1.15x, normal 1.0x, low 0.8x
8. Anchor boost: 1.5x for files matching session anchors
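
A simplified sketch of how these stages might combine, using the constants listed above. It assumes relevance scores normalized to [0, 1], a weighted sum for source weighting, exponential decay for recency, and multiplicative boosts; deduplication and the staleness penalty are omitted because their exact form is not given here. The function names are illustrative, not hyperkb's API.

```python
TYPE_BOOST = {"decision": 1.1, "skill": 1.08, "finding": 1.05, "milestone": 1.05}
STATUS_BOOST = {"pending": 1.08, "in_progress": 1.05, "completed": 0.88, "cancelled": 0.65}
WEIGHT_BOOST = {"high": 1.15, "normal": 1.0, "low": 0.8}
ANCHOR_BOOST = 1.5


def hybrid_relevance(rg_score, bm25_score, rg_weight=0.5, bm25_weight=0.5):
    """Stage 1 (assumed form): weighted sum of the two source scores."""
    return rg_weight * rg_score + bm25_weight * bm25_score


def recency(age_days, half_life_days=180.0):
    """Exponential decay: 1.0 for a new entry, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def score(relevance, age_days, type_="note", status="active",
          weight="normal", anchored=False):
    """Blend relevance with recency, then apply the multiplicative boosts."""
    s = 0.8 * relevance + 0.2 * recency(age_days)  # stage 3
    s *= TYPE_BOOST.get(type_, 1.0)                # stage 5
    s *= STATUS_BOOST.get(status, 1.0)             # stage 6
    s *= WEIGHT_BOOST.get(weight, 1.0)             # stage 7
    if anchored:
        s *= ANCHOR_BOOST                          # stage 8
    return s
```

Under these assumptions, a brand-new, fully relevant, high-weight decision in an anchored file scores 1.0 × 1.1 × 1.15 × 1.5 = 1.8975, while the same entry marked `cancelled` would be scaled down by 0.65.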

## Multi-Machine Sync

S3-compatible sync with git-backed three-way merge. Works with AWS S3, MinIO, Backblaze B2.

```bash
hkb sync setup                # interactive wizard (CLI)
# MCP tool calls, issued by the agent rather than typed in a shell:
# hkb_sync(action="both")     # push + pull
# hkb_sync(action="status")   # check sync state
```

Credentials are encrypted at rest with a machine-tied Fernet key. Background sync polls every 60 seconds (configurable via `sync_interval`), and a watchdog triggers a sync when local files change.

## Config Keys

| Key | Default | Purpose |
|-----|---------|---------|
| `rg_weight` | 0.5 | Ripgrep weight in hybrid search |
| `bm25_weight` | 0.5 | BM25 weight in hybrid search |
| `recency_half_life_days` | 180 | Recency decay half-life |
| `route_confidence_threshold` | 0.6 | Auto-routing minimum score |
| `max_entry_size` | 1048576 | Max entry size (bytes) |
| `rg_timeout` | 10.0 | Ripgrep timeout (seconds) |
| `sync_enabled` | false | Enable S3 sync |
| `sync_interval` | 60 | Background sync interval (seconds) |

## Admin CLI

```bash
hkb init [--path DIR]          # initialize KB
hkb config KEY [VALUE] [--set] # view/set config
hkb sync setup                 # interactive S3 setup
hkb sync status                # quick sync check
```

## License

Custom license (MIT-based with anti-exploitation protections). Free for individuals, students, hobbyists, small teams, nonprofits, universities. Restricted entities (AI companies, California-based companies, competitors, companies with recent mass layoffs, large commercial entities) must obtain written permission. AI training on this code requires permission.

Contact: Open an issue on GitHub, or email contact -at- hyperkb -dot- com.