# The Safety Architecture

> What the safety hook does, how it works, what it catches, and what it cannot catch.
> Read the code: `src/hooks/safety-check.py` (~80 lines). Run the tests: `python3 tests/test-safety-check.py`.

---

## What This Is (and What It Is Not)

The safety hook is a regex-based command filter. It intercepts Bash commands before execution and blocks commands that match known destructive patterns (like `rm -rf`, `git push --force`, `DROP TABLE`).

**It is a seatbelt, not a security boundary.** It catches the most common accidental destructive commands that LLM agents generate. It does not provide sandbox isolation, privilege separation, or protection against a determined adversary. Those are handled by other layers (OS permissions, Claude Code's sandbox, CI/CD gates).

**Why regex works for this specific problem:** LLM agents generate commands from natural language. They produce predictable, well-formed shell commands — not obfuscated or adversarial inputs. A regex filter is effective against `rm -rf /` generated by a coding assistant. It is not effective against a human attacker who encodes commands in base64.

**Why not something more sophisticated?** Parsing shell commands into an AST would be more robust, but also heavier and harder to configure. The regex approach is transparent (you can read every pattern), fast (microseconds per check), and editable by anyone who knows basic regex. The tradeoff is accepted: simpler mechanism, documented bypasses, other layers compensate.

---

## How It Works

The `safety-check.py` hook runs as a Claude Code/OpenCode `PreToolUse` hook. When the AI agent requests a shell command, the host fires a `PreToolUse` event before execution.

```
Agent requests a Bash command
  -> Host fires PreToolUse event
    -> safety-check.py receives JSON on stdin (contains the command string)
      -> Evaluates the command against patterns from safety-patterns.yaml
        -> MATCH: return {"decision": "deny"} — command never reaches the shell
        -> NO MATCH: exit 0 — command executes normally
```

Properties:

- **Synchronous.** The hook blocks until the check finishes. The command cannot race past it.
- **Stateless.** Each command is evaluated independently. No session state, no history.
- **Fail-open.** If the config file is missing or unreadable, all commands pass. This prioritizes availability — a broken config does not lock you out of your terminal.
- **Zero dependencies.** The hook includes a minimal YAML parser (~20 lines). No PyYAML, no pip install, no virtual environment needed.
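The zero-dependency claim rests on parsing only a restricted YAML subset: a flat list of `key: value` mappings. A sketch of such a parser, under that assumption (the real ~20-line parser in `safety-check.py` may handle more):

```python
def parse_patterns(text: str) -> list[dict]:
    """Parse a flat YAML list of mappings: '- key: value' items with indented fields."""
    patterns: list[dict] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                      # skip blank lines and comments
        if stripped.startswith("- "):
            patterns.append({})           # a '- ' prefix starts a new pattern entry
            stripped = stripped[2:]
        key, _, value = stripped.partition(":")
        patterns[-1][key.strip()] = value.strip().strip("'\"")
    return patterns
```

Restricting the grammar this way is what makes "no PyYAML" feasible: nested mappings, anchors, and multi-line scalars are simply not part of the config format.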

---

## How Detection Works

### Regex-Based Matching

Most patterns use Python regexes with word boundaries (`\b`), which prevent false matches on substrings. A pattern can opt into case-insensitive matching with a `flags: i` field.

Notable techniques:

- **Negative lookahead** for `git push --force`. This allows `--force-with-lease` (the safe version) while blocking `--force`.
- **Two-condition patterns** for Terraform. Both `terraform apply` AND a production keyword must appear to trigger a block.
- **Case-insensitive SQL matching.** This catches `DROP TABLE`, `drop table`, and `Drop Table`.
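The three techniques above can be sketched with illustrative regexes (the exact patterns in `safety-patterns.yaml` may differ):

```python
import re

# Negative lookahead: --force is blocked, but --force-with-lease passes through.
force_push = re.compile(r"git\s+push\b.*--force(?!-with-lease)\b")

# Two-condition pattern: both regexes must match before Terraform is blocked.
tf_apply = re.compile(r"\bterraform\s+apply\b")
tf_prod  = re.compile(r"\b(prod|production)\b")

def terraform_blocked(cmd: str) -> bool:
    return bool(tf_apply.search(cmd) and tf_prod.search(cmd))

# Case-insensitive SQL match: catches DROP TABLE, drop table, Drop Table.
drop_table = re.compile(r"\bdrop\s+table\b", re.IGNORECASE)
```

The lookahead `(?!-with-lease)` fails exactly when `--force` is immediately followed by `-with-lease`, which is how the safe variant is carved out without a separate allowlist.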

### Compound Flag Detection

The `rm -rf` pattern uses a dedicated function instead of a single regex. The function:

1. Splits the command at shell operators (`;`, `&&`, `|`).
2. For each `rm` call, checks separately for recursive and force flags.
3. Catches all orderings: `-rf`, `-fr`, `-r -f`, `--recursive --force`, `-Rf`, `-rfv`.

A single regex cannot handle the combinatorial explosion of flag orderings. The function can.
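The idea can be sketched as follows. The function name and exact flag handling are assumptions for illustration; the real function in `safety-check.py` may differ.

```python
import re

def is_recursive_force_rm(command: str) -> bool:
    """Detect rm with both recursive and force flags, in any order or grouping."""
    # Split at shell operators so each simple command is checked on its own.
    for segment in re.split(r"(?:;|&&|\|\||\|)", command):
        tokens = segment.split()
        if not tokens or tokens[0] != "rm":
            continue
        recursive = force = False
        for tok in tokens[1:]:
            if tok == "--recursive":
                recursive = True
            elif tok == "--force":
                force = True
            elif tok.startswith("-") and not tok.startswith("--"):
                # Grouped short flags: -rf, -fr, -Rf, -rfv all decompose here.
                recursive |= bool(set("rR") & set(tok[1:]))
                force |= "f" in tok[1:]
        if recursive and force:
            return True
    return False
```

Note how `rm -r x` alone does not trigger: both conditions must hold within the same `rm` invocation.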

---

## What Cannot Be Caught

These bypasses are inherent to pattern-based detection. The test suite documents them, and they are accepted as out of scope for this layer.

| Bypass | Why It Cannot Be Caught |
|---|---|
| Alternative tools (`find -delete`, `dd`) | The set of destructive commands is open-ended |
| Scripting (`python3 -c`, `perl -e`) | Any language can encode any operation |
| Redirection (`cat /dev/null > file`) | Looks the same as safe redirection |
| Encoding (`base64 -d`, variable expansion) | Commands hidden inside encoded text |
| Git colon syntax (`git push origin :branch`) | Legitimate syntax for branch deletion |

Other layers handle these threats: OS permissions (non-root user), sandbox (filesystem restrictions), branch protection rules, and CI/CD gates.
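The limitation is easy to demonstrate with a sketch (illustrative pattern; the real one in `safety-patterns.yaml` may differ): the same regex that catches the direct command passes an equivalent `find -delete`, which is exactly why the layers below exist.

```python
import re

# Illustrative rm pattern for demonstration purposes only.
rm_pattern = re.compile(r"\brm\s+-rf\b")

blocked  = bool(rm_pattern.search("rm -rf ./build"))        # caught: direct destructive command
bypassed = bool(rm_pattern.search("find ./build -delete"))  # missed: same effect, different tool
```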

---

## Scope Boundary

The safety hook intercepts **Bash tool calls only**. It fires as a `PreToolUse` hook for the Bash tool. Other tools available to the AI agent are not intercepted.

| Tool | Intercepted by safety hook? | What governs it instead |
|---|---|---|
| **Bash** | Yes | `safety-check.py` evaluates the command string |
| **Write** (create/overwrite files) | No | Host sandbox, OS file permissions |
| **Edit** (modify files in place) | No | Host sandbox, OS file permissions |
| **Read**, **Glob**, **Grep** | No | Read-only — no destructive potential |
| **WebFetch**, **WebSearch** | No | Network-only — no local side effects |
| **Agent** (subagent dispatch) | No | Subagent inherits the same hook for its own Bash calls |

This means the safety hook cannot prevent an agent from overwriting a file via the Write tool or inserting harmful content via the Edit tool. Those operations never pass through a shell — they are direct file-system writes handled by the host application.

This is by design. The hook is Layer 2 in the defense stack, not a comprehensive security boundary. File-write safety is handled by other layers:

- **Layer 3 (host sandbox):** Claude Code's bubblewrap/seatbelt sandbox restricts which paths the Write and Edit tools can reach. OpenCode's deny-glob rules serve the same function.
- **Layer 4 (OS permissions):** The process runs as a non-root user. Files outside the working tree are protected by standard Unix permissions.
- **Layer 5 (Git hooks):** A pre-commit or pre-push hook can reject changes to protected files before they leave the local machine.
- **Layer 6 (CI/CD gates):** Even if a file is modified locally, CI pipelines validate content before it reaches production.

The safety hook focuses on the highest-risk surface — shell commands — where a single `rm -rf` or `DROP TABLE` causes immediate, irreversible damage. File writes are lower-velocity and recoverable (git revert, editor undo, backup restore), making them appropriate for softer controls.

---

## Defense-in-Depth Position

Defense-in-depth means multiple layers of protection. Each layer catches what the others miss. The safety hook is one layer in a larger stack:

```
Layer 1: LLM built-in safety (the model refuses harmful requests)
Layer 2: rune safety-check hook  <-- THIS LAYER
Layer 3: Claude Code sandbox (filesystem + network isolation)
Layer 4: OS permissions (non-root user, file permissions)
Layer 5: Git hooks (GPG signature enforcement)
Layer 6: CI/CD gates (tests, linting, security scanning)
Layer 7: IAM / RBAC (cloud service accounts, least privilege)
Layer 8: Branch protection (required reviews, status checks)
```

The safety hook is Layer 2. It catches common accidental destructive commands from LLM agents — commands like `rm -rf` or `git push --force` that are generated from natural language, not adversarially crafted. A determined human can bypass regex. An AI coding assistant that accidentally generates a destructive command typically cannot, because it produces well-formed, predictable shell syntax.

---

## Configuration

All patterns live in `src/hooks/safety-patterns.yaml`. To add a pattern:

```yaml
- name: kubectl-delete
  match: '\bkubectl\s+delete\b'
  message: "Kubernetes resource deletion blocked"
  severity: block
```

To remove a pattern, delete or comment out its entry. No code changes needed.

Pattern fields:

| Field | Required | Description |
|---|---|---|
| `name` | Yes | An identifier for logs and messages |
| `match` | Yes (or `builtin`) | A Python regex matched against the command |
| `flags` | No | `i` for case-insensitive matching |
| `requires` | No | A second regex; both must match to trigger a block |
| `message` | Yes | The message shown when the command is blocked |
| `severity` | Yes | `block` (prevent execution) |
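A hedged sketch of how these fields could drive evaluation, assuming a pattern entry is handed around as a plain dict (the real hook's field handling may differ):

```python
import re

def pattern_matches(pattern: dict, command: str) -> bool:
    """Apply one safety-patterns.yaml entry to a command string."""
    flags = re.IGNORECASE if "i" in pattern.get("flags", "") else 0
    if not re.search(pattern["match"], command, flags):
        return False
    # A 'requires' regex means both expressions must match to trigger a block.
    requires = pattern.get("requires")
    return requires is None or bool(re.search(requires, command, flags))
```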

---

## Test Suite

Run `python3 tests/test-safety-check.py` to validate all patterns:

- **Positive tests.** Every pattern is exercised with representative destructive commands.
- **Negative tests.** Safe variants are confirmed unblocked (zero false positives).
- **Evasion tests.** Whitespace injection, flag splitting, and command chaining are tested.
- **Known bypasses.** Documented and not counted as failures.

The test suite is self-contained. It runs without dependencies. It exits with code 1 on any failure.
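The positive/negative structure can be sketched as table-driven checks. The pattern and cases here are illustrative stand-ins; the real suite exercises every entry in `safety-patterns.yaml`.

```python
import re

# Illustrative pattern under test.
RM_RF = re.compile(r"\brm\s+-(?:rf|fr)\b")

POSITIVE = ["rm -rf /tmp/x", "rm -fr build"]  # must be blocked
NEGATIVE = ["rm file.txt", "rm -r dir"]       # must pass (no false positives)

failures = [c for c in POSITIVE if not RM_RF.search(c)]
failures += [c for c in NEGATIVE if RM_RF.search(c)]
```

An empty `failures` list means the pattern blocks what it should and nothing more; anything left in the list maps directly to an exit-code-1 failure.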

---

## Industry Context

The [OWASP Top 10 for Agentic Applications (2026)](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/) identifies **Tool Misuse & Exploitation (ASI02)** and **Cascading Failures (ASI08)** as top risks for AI coding agents. rune's safety hooks address ASI02 directly: they block destructive tool calls before execution and require human approval for irreversible actions.

Several AI coding tools offer safety mechanisms at different layers:

| Tool | Approach | Config Format |
|---|---|---|
| Claude Code | OS-level sandbox (bubblewrap/seatbelt) + programmable PreToolUse hooks | JSON + shell scripts |
| Codex CLI | Kernel-level sandbox (Landlock/Seatbelt) + exec policy rules | TOML + Starlark |
| Cursor | Enterprise policy hooks with allow/deny/step-up verdicts | JSON |
| OpenCode | Glob-based deny rules + per-agent permission overrides | JSON + YAML frontmatter |
| Goose | Per-tool permission levels + YAML allowlists | YAML |

rune's contribution is a single declarative YAML file (`safety-patterns.yaml`) with regex-based command blocking, no scripting required. It is not the only tool with configurable safety — but it aims to make the configuration simple enough that teams adopt it without friction.

---

## Sources

- OWASP Top 10 for Agentic Applications 2026
- OWASP AI Agent Security Cheat Sheet
- OpenSSF Security-Focused Guide for AI Code Assistant Instructions
- Anthropic: Claude Code Sandboxing
- Test suite: `tests/test-safety-check.py`

---

*Block the accident. Document the bypass. Trust the layers above and below.*

See also: [CONTRIBUTING.md](../CONTRIBUTING.md) for how to add custom safety patterns. [The Knowledge Toolkit](the-knowledge-toolkit.md) covers how profile selection affects what agents can access.
