Changelog¶
[v1.38.0] - 2026-07-04¶
Security & correctness audit pass, plus an explicit audit-log recovery command.
Added¶
- broker-ctl audit repair — recover a signer whose audit log had its final record torn by a crash. A truncated, unparseable trailing line makes the signer refuse to boot (by design: on a tamper-evident, hash-chained log a truncated tail is indistinguishable from a truncation attack). The command is the explicit operator recovery path: dry-run by default; --apply quarantines the corrupt tail to <log>.corrupt-<timestamp> and truncates the log to the last well-formed record so the signer can boot, preserving the hash chain; --key verifies the kept prefix's signatures first. It refuses mid-file corruption (not the startup-brick case). The signer stays fail-closed — recovery is never automatic. Runbook in docs/OPERATIONS.md.
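The tail-only rule can be illustrated with a minimal sketch. It assumes one JSON object per newline-terminated line, which is an assumption of this example and not a statement about the real log format; keepLen and errMidFile are hypothetical names, and signature or hash-chain verification is left out entirely.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// errMidFile marks damage that is not confined to the final record.
var errMidFile = errors.New("corruption before the final record; refusing to repair")

// keepLen returns how many leading bytes of log hold well-formed records.
// Only a damaged final line is tolerated; anything earlier is an error.
func keepLen(log []byte) (int, error) {
	offset := 0
	for offset < len(log) {
		next := len(log) // end of the current line, excluding its newline
		if i := bytes.IndexByte(log[offset:], '\n'); i >= 0 {
			next = offset + i
		}
		isFinal := next >= len(log)-1 // nothing follows except maybe the newline
		if !json.Valid(log[offset:next]) {
			if isFinal {
				return offset, nil // torn tail: keep everything before it
			}
			return 0, errMidFile
		}
		offset = next + 1
	}
	if offset > len(log) {
		offset = len(log) // valid final record without a trailing newline
	}
	return offset, nil
}

func main() {
	log := []byte("{\"seq\":1}\n{\"seq\":2}\n{\"seq\":")
	n, err := keepLen(log)
	if err != nil {
		fmt.Println("refused:", err)
		return
	}
	fmt.Printf("keep %q, quarantine %q\n", log[:n], log[n:])
}
```

A dry run would only report the split point; applying it would move the tail bytes aside before truncating.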
Changed¶
- release.yml now publishes server.json to the MCP Registry automatically after every tagged release: an mcp-registry job (authenticated with GitHub Actions OIDC, no PAT) runs once the release job has pushed the ghcr image, so the registry can validate OCI ownership. Manual mcp-publisher publish is no longer part of the release runbook.
Fixed¶
- k8s MCP input validation: k8s_list label/field selectors and k8s_logs container are now length- and null-byte-validated like every other tool field (previously they reached the API-server query string without the input gate). No injection was possible — query values are percent-encoded — but the stdio frontend had no length bound; this closes the unbounded / null-byte gap.
- Policy recommender double-count: a single human approval, written to the audit log twice (the approval-decision and the consumption), was counted as support 2 in broker-ctl policy recommend, halving the effective --min-count for approved commands. Occurrences are now deduplicated by approval id.
- Global max_ttl_seconds load check: a global cap above the 900s certificate limit is now rejected at load (mirroring the per-host cap), instead of failing every issuance at request time for a host with no per-host cap.
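The double-count fix comes down to counting each approval id once. A minimal sketch, assuming a reduced, hypothetical event type (the real audit entry has more fields and the real recommender may differ):

```go
package main

import "fmt"

// event is a hypothetical, reduced view of an audit entry.
type event struct {
	Command    string
	ApprovalID string // empty when the entry is not tied to an approval
}

// support counts occurrences per command, counting each approval id once
// even when it appears in both a decision entry and a consumption entry.
func support(events []event) map[string]int {
	counts := make(map[string]int)
	seen := make(map[string]bool)
	for _, e := range events {
		if e.ApprovalID != "" {
			if seen[e.ApprovalID] {
				continue // second record of the same approval
			}
			seen[e.ApprovalID] = true
		}
		counts[e.Command]++
	}
	return counts
}

func main() {
	events := []event{
		{"systemctl restart nginx", "ap-1"}, // approval decision
		{"systemctl restart nginx", "ap-1"}, // consumption of the same approval
		{"systemctl restart nginx", "ap-2"}, // a second, independent approval
	}
	fmt.Println(support(events)["systemctl restart nginx"]) // prints 2
}
```

Without the seen set the same input would yield 3, which is how one approval used to satisfy half of a --min-count of 2.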
Internal¶
- Removed dead test-only session accessors and an orphaned elevation-label helper (coverage ported to the production ownership-gated paths).
- .gitignore now excludes control-plane.json, so a real, secret-bearing control-plane config cannot be accidentally committed.
[v1.37.0] - 2026-07-04¶
Distribution release: prebuilt binaries, an official OCI image and a containerized demo. Installing no longer requires a Go toolchain.
Added¶
- Prebuilt release binaries (goreleaser): one archive per platform (linux/darwin × amd64/arm64) with all six binaries and the example configs, plus checksums.txt. The installer tarball for the systemd path (infrabroker-v<ver>.tar.gz, consumed by deploy/install.sh) is unchanged and still attached to every release.
- Official OCI image ghcr.io/luisgf/infrabroker (multi-arch linux/amd64+arm64, distroless/static, nonroot): the six binaries with mcp-broker as entrypoint. Carries the io.modelcontextprotocol.server.name label the MCP Registry uses to validate package ownership (server.json bumped to reference ghcr.io/luisgf/infrabroker:1.37.0).
- Compose demo (examples/compose/, make demo): signer + toy sshd target + broker with auto-provisioned PKI — run a policied command through the full remote-signing topology in 5 minutes, then down -v and it is gone. Works with docker compose and podman compose (rootless).
- docs/CONTAINERS.md: image usage (stdio MCP in a container, other binaries via --entrypoint), the demo walkthrough, podman notes, and the explicit demo≠production and k8s-target≠k8s-runtime boundaries.
- README Install section: release binaries, go install, container and one-click claude mcp add lines.
Changed¶
- release.yml: goreleaser now owns the GitHub release (archives, checksums, image push to ghcr and multi-arch manifests); make dist still builds the installer tarball, attached via release.extra_files. Workflow gains packages: write.
[v1.36.0] - 2026-07-04¶
The project is renamed ssh-broker → infrabroker: the broker outgrew SSH (Kubernetes tools shipped in v1.2x, databases are under study) and the old name hid half the surface. No functional changes.
Changed¶
- Rename to infrabroker everywhere it is not history or an on-disk format: Go module path (github.com/luisgf/infrabroker), Makefile/dist artifact (infrabroker-<version>.tar.gz), CI workflows, docs site (luisgf.github.io/infrabroker), systemd units (infrabroker-*.service), system users/group (infrabroker-<svc>, infrabroker), and install paths (/etc/infrabroker, /var/lib/infrabroker). The GitHub repository is renamed; old URLs and go get paths redirect.
- Recording header extension renamed ssh_broker → infrabroker. Recordings written by older versions keep the old key — ASCIIcast players ignore unknown header fields either way, but jq review of old .cast files must still query ssh_broker.
- Suggested host-side CA filename in OPERATIONS.md is now /etc/ssh/infrabroker_ca.pub; existing hosts keep whatever TrustedUserCAKeys path they already use.
- Not renamed: binary names (mcp-broker, broker, broker-ctl, signer, control-plane) — they never carried the project name — and historical changelog entries, which describe what those releases actually shipped.
- Existing deployments must migrate manually — the installer does not rename users/paths/units on top of a pre-rename install. See "Upgrading from ssh-broker (pre-rename, ≤ v1.35)" in deploy/README.md.
Added¶
- MCP registry manifest (server.json). Repo repositioning shipped alongside: GitHub topics, README badges and the new tagline ("Infrastructure access broker for AI agents — SSH & Kubernetes").
[v1.35.0] - 2026-07-04¶
Deployment privilege separation, a Kubernetes authorization fix, and a hardened
docs anti-drift gate. Each service now runs as its own system user, a
Kubernetes deny rule now actually blocks (it was silently ignored), and the
reference-doc drift gate is enforced on a protected main.
Security¶
- Kubernetes deny rules are enforced again. deny-effect ActionPolicy rules were compiled into an allowlist policy's deny slice, which the evaluator only consults for denylist-mode members — so a deny overlapping a broad allow (e.g. allow get * + deny get secrets) was silently ignored and the allow won. Deny rules are now compiled into a dedicated denylist member so "deny wins" holds, and CommandPolicy.Validate rejects a deny pattern on an allowlist policy (or an allow on a denylist) at config load so the class cannot recur. Only affects clusters whose rules carve a deny out of a broader allow; default-deny clusters were never exposed.
- One system user per service (ssh-broker-signer, ssh-broker-control-plane, ssh-broker-mcp-http) in the systemd units and the installer, so a compromised broker frontend can no longer read the signer's CA key, policy, grant state, audit seed, or mTLS key — nor impersonate another service or the admin CLI. The shared ssh-broker group remains only for traversing /etc/ssh-broker and reading the shared mTLS CA certificate. The legacy single ssh-broker user is no longer created or used.
- Per-service PKI subdirectories: each private key lives in /etc/ssh-broker/pki/<svc>/ (0750 root:ssh-broker-<svc>), readable by that service alone; only the shared CA cert stays at the pki/ root. Admin CLI material moves to pki/admin/ (root-only) so no service can impersonate the admin (broker-ctl.example.json updated accordingly).
- Per-service config groups: /etc/ssh-broker/{control-plane,config}.json are readable only by their own service (they can carry secrets — OIDC client, webhook tokens).
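The "deny wins" property from the first item above can be sketched as an evaluator in which any matching deny overrides every matching allow and no match means default-deny. This is an illustration only: the rule type and the use of path.Match globs are assumptions of the example, not the project's PolicySet implementation.

```go
package main

import (
	"fmt"
	"path"
)

// rule is a hypothetical compiled rule: a glob over the action string.
type rule struct {
	Pattern string
	Deny    bool
}

// allowed implements deny-wins with default-deny: a matching deny rule
// returns false immediately, regardless of rule order or matching allows.
func allowed(rules []rule, action string) bool {
	ok := false
	for _, r := range rules {
		matched, err := path.Match(r.Pattern, action)
		if err != nil {
			return false // malformed pattern: fail closed
		}
		if !matched {
			continue
		}
		if r.Deny {
			return false
		}
		ok = true
	}
	return ok
}

func main() {
	rules := []rule{
		{Pattern: "get *", Deny: false},
		{Pattern: "get secrets", Deny: true},
	}
	for _, a := range []string{"get pods", "get secrets", "delete pods"} {
		fmt.Printf("%-12s allowed=%v\n", a, allowed(rules, a))
	}
}
```

The bug described above corresponds to an evaluator that never looked at the deny entries for this kind of policy, so "get secrets" would have followed the broad allow.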
Fixed¶
- Secret redaction no longer masks the bare shell PWD= working-directory variable (MYSQL_PWD= / DB_PWD= still masked); it was degrading the forensic value of every recording and env dump.
- deploy/install.sh's stray-key migration warning is NUL-safe (key filenames with spaces stay intact) and no longer swallows a find/grep failure as a silently-absent security warning.
- tools/docgen prunes orphaned reference pages, so a removed or renamed generator surfaces as a drift-gate failure instead of shipping a stale page.
Changed¶
- deploy/install.sh creates the per-service users/groups, converges ownership of state directories and configs on upgrade (idempotent), and warns about private keys still flat under pki/ that must be moved into the per-service subdirectories. Migration steps from the ≤ v1.34 single-user layout are in deploy/README.md §Upgrades.
- Docs anti-drift gate hardened and enforced. The gate now uses git status --porcelain (catching untracked and deleted generated pages, not just tracked edits); main is a protected branch with build, govulncheck and check as required status checks; and a new make verify target runs the full local pre-push gate.
- THREAT_MODEL.md documents the colocated-host process-isolation posture, OPERATIONS.md documents the per-service deployment layout and the least-privilege k8s minter-token permissions, and the deploy skill preflight checks the per-service key placement.
[v1.34.0] - 2026-07-03¶
Kubernetes target (credential-broker): the signer can now broker access to Kubernetes clusters, reusing the whole control plane (identity, RBAC, approval, grants, signed audit) with a structured action grammar instead of the shell one. The agent never holds a cluster credential — the signer mints a short-lived bound ServiceAccount token per authorised action and the cluster's native RBAC enforces it.
Added¶
- Six curated MCP tools, registered only when the broker sees a cluster (an SSH-only deployment does not offer them): k8s_list_clusters, k8s_get, k8s_list (label/field selectors, limit), k8s_logs (container, tail, since) — read-only — plus k8s_apply (server-side apply) and k8s_delete, both policy- and approval-gated. No pod-exec, port-forward, watch, or sessions in this phase.
- Per-cluster ActionPolicy, default-deny. Structured rules ({verbs, resources, namespaces, names, effect}; effect allow|deny|require_approval) compile at load into the same PolicySet machinery as command_policy, over a canonical action string <verb> <resource[.group]> <namespace>/<name> built from charset-validated fields (never parsed → injection-free). So deny-wins composition, runtime grants, approve-and-learn waivers, and policy recommend all apply to k8s actions unchanged. The broker sends the structured fields and the canonical string; the signer recomputes it and rejects a mismatch, so the approver and the audit log see exactly what runs.
- Bound ServiceAccount tokens. The signer holds one minimal-privilege minter credential per cluster (token_file; RBAC = create on serviceaccounts/token for the bound SAs) and calls the TokenRequest API to mint a bound token (TTL 600–900s) for the SA selected by the end user's groups (sa_bindings). The token travels back over the existing mTLS channel like an SSH certificate.
- Dependency-free k8s client (internal/k8s): the five verbs plus TokenRequest as plain REST over net/http (no client-go), pinned to the cluster CA, with a curated core resource table plus per-cluster extra_resources (no API discovery).
- New config kubernetes.clusters.<name> in signer.json (parallel to hosts; cluster names must be disjoint from host names — grants and audit are indexed by that shared name). New signer endpoint GET /v1/clusters (caller-scoped connectivity, forwarded by the control plane) and broker-ctl cluster list --remote.
- audit.Entry gains target_type and body_sha256: a k8s_apply manifest is never logged verbatim (it can carry a Secret) — only its sha256, mirroring file transfers.
- New e2e lab lab/run_k8s_lab.sh (mock API server, no cluster needed): mints a bound token for an allowed action, gates a delete behind approval and issues the token after approval, and enforces default-deny and deny-wins.
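The canonical action string from the ActionPolicy item can be sketched as follows. The string layout is the one given above; the field charset, the action type, and the function names are assumptions of this example, and namespaced resources with non-empty fields are assumed throughout.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// field is an assumed charset: lowercase alphanumerics with inner '.' or '-'.
// No spaces or slashes can appear, so the joined string is unambiguous.
var field = regexp.MustCompile(`^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$`)

// action is a hypothetical structured request; Group is optional.
type action struct{ Verb, Resource, Group, Namespace, Name string }

// canonical builds "<verb> <resource[.group]> <namespace>/<name>" from
// validated fields; the string is only ever built, never parsed back.
func canonical(a action) (string, error) {
	for _, f := range []string{a.Verb, a.Resource, a.Namespace, a.Name} {
		if !field.MatchString(f) {
			return "", errors.New("invalid field: " + f)
		}
	}
	res := a.Resource
	if a.Group != "" {
		if !field.MatchString(a.Group) {
			return "", errors.New("invalid group: " + a.Group)
		}
		res += "." + a.Group
	}
	return a.Verb + " " + res + " " + a.Namespace + "/" + a.Name, nil
}

// verify recomputes the string on the receiving side and rejects a mismatch.
func verify(a action, claimed string) error {
	got, err := canonical(a)
	if err != nil {
		return err
	}
	if got != claimed {
		return errors.New("canonical action mismatch")
	}
	return nil
}

func main() {
	a := action{Verb: "get", Resource: "deployments", Group: "apps", Namespace: "prod", Name: "web"}
	s, _ := canonical(a)
	fmt.Println(s)                               // get deployments.apps prod/web
	fmt.Println(verify(a, "get pods prod/web")) // canonical action mismatch
}
```

Because the receiver recomputes the string from the structured fields, a sender cannot show one action to the approver while requesting another.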
Security¶
- The Kubernetes credential-broker's structural trade is documented as threat-model gap #10: a bound token grants the ServiceAccount's whole RBAC for its TTL, not a single call (no force-command equivalent). Scope agent ServiceAccounts to least privilege; the minter credential is deliberately minimal (only token-minting for the bound SAs).
[v1.33.0] - 2026-07-02¶
Dynamic state persists across restarts: an opt-in SQLite state_db (pure-Go
driver, no CGO, no system dependency) backs the signer's runtime grants and
the control plane's approval registry. Closes the deploy caveat "restarting
the control plane clears pending approvals".
Added¶
- New internal/statedb package: opener + user_version migration runner (WAL, busy_timeout, single connection). A database written by a newer binary is refused; if state_db is set and cannot be opened or migrated, the service refuses to start (fail-closed). statedb_errors_total counts best-effort write failures (in-memory state diverged from disk until the next restart) — alert on any increase.
- Signer grants/waivers persist (state_db in signer.json). Write-through with the in-memory map still the only state consulted on the decision path (zero I/O on /v1/sign): Add is insert-first (a grant that cannot be persisted fails the API call), expiry/supersede sweeps are best-effort (an expired row is filtered out on load), and revocation is deliberately hard — the row is deleted before the in-memory grant, so a revoked grant can never resurrect on restart after the operator saw a success. Live rows are reloaded at startup with their waiver patterns recompiled; approve-and-learn waivers keep their caller/end-user/elevation binding across restarts.
- Approval registry persists (state_db in control-plane.json), including the original wire request (public material only — the broker's ephemeral public key), so a pending or approved-but-uncollected request survives a restart and the polling broker still collects its certificate. Create is insert-first; Decide and consume transitions are written through; terminal entries inside the purge window are restored too, so a poller sees denied/approved instead of a 404. The issuing flag is an intra-process concurrency gate and is intentionally not persisted: after a restart an approved-but-unconsumed request is consumable again — exactly once. Behaviour baselines stay in-memory by design (they re-learn).
- New e2e lab lab/run_state_lab.sh (no sshd needed): grant survives a signer restart and its revocation is durable; a pending approval survives a control-plane restart, is approved afterwards, the poller gets the certificate, and the consumed approval stays consumed across another restart.
Documentation¶
- THREAT_MODEL gap #5 updated: restart-survival vs multi-instance, and the consume crash window (a crash between certificate issuance and the consumed write re-exposes the approval once, bounded by the approval and certificate TTLs). OPERATIONS "what survives a restart"; deploy checklist and example configs gain state_db (with the WAL -wal/-shm backup note).
Internal¶
- GrantStore.Revoke now returns (bool, error); the grant-revoke API answers 500 and keeps the grant when the durable delete fails.
- New dependency modernc.org/sqlite, confined to internal/statedb — the driver links only into signer and control-plane, not the broker frontends.
[v1.32.0] - 2026-07-02¶
Secret redaction (threat-model gap #8): an opt-in redact config block on the
three services masks secrets embedded in commands at every persistent or
outbound sink, replacing them with [REDACTED:<rule>].
Added¶
- New internal/redact package: named RE2 rules, built-in defaults (password/token flags, the attached mysql -p<pass> form, VAR=secret assignments with _-delimited keyword matching, URI user:pass@, Authorization headers, JWTs, AWS/GitHub/GitLab/Slack tokens, private-key blocks) plus operator-defined patterns (a (?P<secret>...) group masks only the secret and keeps the rest of the match as forensic context; disable_defaults keeps only the operator rules). An invalid pattern is a startup error (fail-closed), and overlapping rules never re-mask another rule's marker.
- Redaction choke-points, so no call site can be missed:
  - audit.Log: the free-text fields (command, err, warning, anomaly) are masked before the entry is signed — the Ed25519 signature and hash chain cover the redacted content, broker-ctl audit verify is unaffected, and the original text is never persisted (irrecoverable by design).
  - recording.Recorder: every ASCIIcast event is masked. Input events carry one full command line per event (reliable); output arrives in arbitrary chunks, so a split secret can escape a pattern (documented best-effort).
  - Control-plane notifier: the approval notification payload (log/webhook/Teams) is masked. The approval registry keeps the original command — the mTLS approval UI (/ui/approvals) and GET /v1/approvals show the approver exactly what will run, and the approved request forwarded to the signer is untouched.
- redact config key in the broker (config.json), signer (signer.json) and control plane (control-plane.json). Present — even empty {} — enables the built-in defaults; absent = disabled (backward compatible). Redaction never touches the decision path: the signer and the certificate force-command always see the original command.
Documentation¶
- New "Redaction is best-effort" section in SECURITY.md (limits: regex ≠ DLP, chunked output, decision path untouched by design, false-positive escape hatch); THREAT_MODEL gap #8 updated from "no redaction" to "opt-in, best-effort"; redact blocks in the three example configs; config reference regenerated.
[v1.31.1] - 2026-07-02¶
Fixed¶
- Deployment: the signer config moves to the service-owned state directory /var/lib/ssh-broker/signer/signer.json. v1.31.0's ReadWritePaths fix (#38) removed the systemd barrier to the durable policy-mutation API (broker-ctl policy add/remove), but the POSIX permission barrier remained — /etc/ssh-broker is root:ssh-broker 0750, so the ssh-broker service user could not create signer.json.tmp for the temp-file+rename and every durable mutation still failed EACCES. Placing the signer config where the service owns it (and reverting the now-unnecessary ReadWritePaths, keeping /etc read-only for the service) lets durable mutations persist while the PKI and the other services' configs stay root-owned in /etc. Binaries are unchanged from v1.31.0; this is a deploy-artifact + docs fix. The installer seeds the signer config to the new location; the systemd unit points -config there.
[v1.31.0] - 2026-07-02¶
Security & correctness audit pass. Fourteen findings across the signer, control plane, broker-ctl, deployment artifacts and docs.
Security¶
- Signer rejects an empty one-shot command. An empty command baked no force-command into the certificate (an unrestricted host credential) and, on a denylist or approval-only host, slipped past the command firewall and the human-approval gate. Rejected at the authoritative layer (#37).
- The control plane no longer trusts an unauthenticated, broker-supplied end_user for the approver's display, the notifier, or the forward to the signer unless the broker CN is a trusted forwarder — a malicious broker could otherwise label a command as coming from a trusted admin to bias the human decision (#40).
- broker-ctl no longer searches the current working directory for broker-ctl.json, and a relative default cert/key/ca resolves against the loaded config file's directory rather than the CWD, so a planted file cannot redirect the CLI's mTLS endpoint or CA trust anchor (#39, #42).
- The approval webhook/Teams notifier requires an https URL (http only for a loopback relay), preventing cleartext leakage of approval details; the legacy Teams MessageCard no longer enables markdown, which could inject links into the approver's notification (#44, #43).
Fixed¶
- The signer systemd unit adds ReadWritePaths=/etc/ssh-broker so the durable policy-mutation API can persist to signer.json; under ProtectSystem=strict it was failing EROFS while in-memory grants and SIGHUP reload masked it (#38).
- The release workflow builds via make dist, so the published artifact ships the control-plane binary (the shipped unit had nothing to exec), deploy/ and the example configs, with the version injected (#41).
- A per-host max_ttl_seconds above the 900s certificate cap is rejected at config load instead of failing every issuance at request time (#45).
- The signer rate-limiter bucket map stays strictly bounded — least-recently-used eviction when pruning frees nothing (#46).
Documentation¶
- ssh_list_servers return table documents allow_file_transfer; the README stdio bullet lists the file-transfer tools; the OPERATIONS reference-config table includes broker-ctl.example.json; config.example.json gains a file_transfer_max_bytes example (#47–#50).
[v1.30.0] - 2026-07-02¶
Added¶
- GET /v1/policy/hosts on the signer: full host-policy read (the current in-memory table, same schema as the signer.json hosts object, including the fields GET /v1/hosts withholds from brokers — principal, TTLs, allowed_callers, command_policy). Auth is the reload_callers tier, like the policy mutation APIs; every read attempt is audited (policy-read / policy-read-denied).
- broker-ctl host list --remote: renders the live policy from a running signer over mTLS with the same columns as the local view — the recommended post-deploy end-to-end check. A non-200 is a hard failure (no silent fallback to the reduced /v1/hosts view).
- broker-ctl client parameters file: the remote commands (reload, policy add/remove/grant/grants/revoke, approval list/allow/deny, host list --remote) resolve --url/--cert/--key/--ca with per-parameter precedence flag > env > file > default. File search order: --client-config, $BROKER_CTL_CONFIG, ./broker-ctl.json, ~/.config/broker-ctl/config.json, /etc/ssh-broker/broker-ctl.json (seeded by deploy/install.sh). Env vars: BROKER_CTL_SIGNER_{URL,CERT,KEY,CA}, BROKER_CTL_CP_{URL,CERT,KEY,CA}. New broker-ctl.example.json, validated against the struct in CI.
- The signer-facing remote commands accept --url, so none of them need a local signer.json anymore (its listen field remains the last-resort URL fallback).
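The per-parameter precedence (flag > env > file > default) can be sketched as a small resolver applied to each parameter independently. The resolve function, the injected getenv lookup, and every URL value are hypothetical; only the environment variable name comes from the entry above.

```go
package main

import (
	"fmt"
	"os"
)

// resolve returns the first non-empty value in the order
// flag > environment > file > default. getenv is injected so the
// lookup can be exercised without touching the real environment.
func resolve(flagVal, envName, fileVal, def string, getenv func(string) string) string {
	if flagVal != "" {
		return flagVal
	}
	if v := getenv(envName); v != "" {
		return v
	}
	if fileVal != "" {
		return fileVal
	}
	return def
}

func main() {
	// Hypothetical inputs: no --url flag, a URL in the parameters file.
	url := resolve("", "BROKER_CTL_SIGNER_URL",
		"https://signer.example.internal:8443", // value from the file
		"https://127.0.0.1:8443",               // assumed default
		os.Getenv)
	fmt.Println(url)
}
```

Because each parameter is resolved on its own, a --cert flag can override the file while the URL still comes from the environment.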
[v1.29.0] - 2026-07-02¶
Added¶
- Production deployment artifacts under deploy/: hardened systemd units for the three daemons (ssh-broker-signer, ssh-broker-control-plane, ssh-broker-mcp-http) with full sandboxing (ProtectSystem=strict, empty capability bounding set, syscall filtering), StateDirectory-managed audit log directories under /var/lib/ssh-broker/<svc>/, systemctl reload (SIGHUP) wired to the signer's hot-reload, and an optional EnvironmentFile=/etc/ssh-broker/<svc>.env for AZURE_* credentials when CA custody is Azure Key Vault.
- deploy/install.sh: idempotent root installer — creates the ssh-broker system user, the /etc/ssh-broker + /etc/ssh-broker/pki layout, installs binaries and units, and seeds configs from the examples without ever overwriting an existing real config (safe for upgrades).
- make dist: release tarball (dist/ssh-broker-<version>.tar.gz) bundling the binaries, deploy/, and the example configs the installer seeds from.
- deploy/README.md: production checklist presenting CA custody as an explicit operator choice — akv (Azure Key Vault; the private key never leaves the vault; RSA/EC only) vs pem (local file; lab/dev) — plus default-deny callers, rate limiting, monitor binding, and upgrade caveats (in-memory approvals/sessions).
- Vendor-agnostic agent skill .agents/skills/deploy/SKILL.md (symlinked from .claude/skills): the judgment layer over the deterministic tooling — preflight policy checks, the custody question, reload-vs-restart decision, and post-deploy health verification.
- New § 8 "Production deployment" in OPERATIONS.md.
[v1.28.0] - 2026-07-02¶
Added¶
- Built-in approval UI on the control plane's mTLS listener: GET /ui/approvals (pending-first list, auto-refresh) and GET /ui/approvals/{id} (request context with Approve / Deny and an optional approve-and-learn TTL). Server-rendered html/template, no new dependency, no external assets. Decisions are same-origin JavaScript POSTs to the existing /v1/approvals/{id} API, so the audit trail, broker/approver role separation, and the four-eyes self-approval guard apply unchanged. Auth is the browser's mTLS client certificate (CN in approval.callers).
- approval_url_template can now point notification links at https://<control-plane>/ui/approvals/{id}.
Security¶
- POST /v1/approvals/{id} requires Content-Type: application/json (415 otherwise): CSRF hardening for the browser UI — mTLS client certificates are ambient credentials and an HTML form with enctype=text/plain can smuggle a JSON-shaped body cross-site; the media-type requirement stops forms, and a cross-origin fetch carrying it is stopped by the CORS preflight (the server sends no CORS headers). broker-ctl already sent the header.
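A media-type gate of this kind might look like the following net/http middleware. It is a sketch under assumptions: requireJSON and statusFor are hypothetical names, the request body is invented for the demonstration, and the real handler may perform the check differently.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireJSON rejects POSTs whose media type is not application/json.
// HTML forms can only send urlencoded, multipart, or text/plain bodies,
// so none of them pass this check.
func requireJSON(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodPost {
			mt, _, err := mime.ParseMediaType(r.Header.Get("Content-Type"))
			if err != nil || mt != "application/json" {
				http.Error(w, "unsupported media type", http.StatusUnsupportedMediaType)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends a POST with the given Content-Type through the gate and
// returns the response status. The JSON body is a made-up placeholder.
func statusFor(contentType string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodPost, "/v1/approvals/abc",
		strings.NewReader(`{"decision":"approve"}`))
	if contentType != "" {
		req.Header.Set("Content-Type", contentType)
	}
	rec := httptest.NewRecorder()
	requireJSON(ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("text/plain"))       // 415
	fmt.Println(statusFor("application/json")) // 200
}
```

Parsing with mime.ParseMediaType rather than comparing raw strings lets a header such as application/json; charset=utf-8 through while still rejecting everything a form can produce.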
[v1.27.0] - 2026-07-02¶
Added¶
- Two new MCP tools, ssh_put_file and ssh_get_file, built on the one-shot certificate machinery (no SFTP subsystem, no new dependency): the transfer is a force-command one-shot (cat > path / bounded head -c read) with content streamed over stdin/stdout; binary data via base64. A file larger than the cap is an error, not a truncation. The content's sha256, size, and path are recorded in dedicated file_put / file_get audit entries correlated with the executed entry by serial.
- New per-host gate allow_file_transfer (default false, secure by default) in the signer HostPolicy and broker local-mode HostConfig, enforced at signing time via the new file_transfer intent/wire flag, exposed in GET /v1/hosts and ssh_list_servers, and manageable with broker-ctl host add --file-transfer. The generated transfer command remains subject to the host's command_policy.
- Broker config file_transfer_max_bytes caps transfer size (default 512 KiB; the HTTP MCP frontend's 1 MiB body bound must fit base64-encoded content).
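The relationship between the 512 KiB default and the 1 MiB body bound is simple arithmetic: base64 expands data by a factor of 4/3. A short check using the standard library (the constant and function names are hypothetical, and JSON framing overhead is ignored):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

const (
	defaultMaxFileBytes = 512 * 1024 // default cap named in the entry above
	httpBodyLimit       = 1 << 20    // 1 MiB body bound of the HTTP frontend
)

// encodedSize reports the padded base64 length of n raw bytes.
func encodedSize(n int) int {
	return base64.StdEncoding.EncodedLen(n)
}

func main() {
	enc := encodedSize(defaultMaxFileBytes)
	fmt.Println(enc, enc < httpBodyLimit) // 699052 true
}
```

Raising the cap much beyond roughly 768 KiB would push the encoded content past 1 MiB, which is why the two settings have to be considered together.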
[v1.26.0] - 2026-07-02¶
Added¶
- Every service accepts an optional monitor_listen config key that starts a separate plain-HTTP listener with /healthz (liveness) and /metrics (Prometheus text exposition format, no new dependencies). The broker key covers all three broker frontends.
- Initial metric inventory, fed from the existing audit funnels: signer_sign_requests_total{outcome} (including the un-audited rate-limited outcome), controlplane_events_total{outcome}, controlplane_approvals_pending, broker_events_total{outcome}, broker_sessions_active, and audit_append_failures_total — the machine-readable signal for threat-model gap #9 (audit is fail-open); alert on any increase.
[v1.25.0] - 2026-07-02¶
Security¶
- The signer enforces an optional per-CN rate limit on POST /v1/sign (sign_rate_limit_per_min, hot-reloadable), closing threat-model gap #4 on opt-in: a token bucket keyed on the authenticated mTLS peer CN — not on_behalf_of — checked before body parsing. Excess requests get 429 with a Retry-After hint; rejections are deliberately not audited so the tamper-evident log cannot become the flooding amplifier. 0/absent = disabled (backward compatible).
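A per-CN token bucket can be sketched as below. This is illustrative only: the type and function names are hypothetical, the burst size is assumed equal to the per-minute limit, and the sketch omits locking and the bounded-map eviction a real server would need.

```go
package main

import (
	"fmt"
	"time"
)

// bucket holds the remaining tokens for one peer CN.
type bucket struct {
	tokens float64
	last   time.Time
}

// limiter is a token bucket per CN, refilled at perMin tokens per minute.
type limiter struct {
	perMin  float64
	buckets map[string]*bucket
}

func newLimiter(perMin int) *limiter {
	return &limiter{perMin: float64(perMin), buckets: make(map[string]*bucket)}
}

// allow reports whether cn may proceed at time now. When it may not, the
// returned duration is a hint for a Retry-After header.
func (l *limiter) allow(cn string, now time.Time) (bool, time.Duration) {
	b, ok := l.buckets[cn]
	if !ok {
		b = &bucket{tokens: l.perMin, last: now}
		l.buckets[cn] = b
	}
	rate := l.perMin / 60 // tokens per second
	b.tokens += now.Sub(b.last).Seconds() * rate
	if b.tokens > l.perMin {
		b.tokens = l.perMin // cap the burst
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true, 0
	}
	wait := time.Duration((1 - b.tokens) / rate * float64(time.Second))
	return false, wait
}

// after returns a fixed reference time plus sec seconds (demo helper).
func after(sec int) time.Time {
	return time.Unix(0, 0).Add(time.Duration(sec) * time.Second)
}

func main() {
	l := newLimiter(2)
	for i := 0; i < 3; i++ {
		ok, wait := l.allow("broker-a", after(0))
		fmt.Println(ok, wait)
	}
}
```

Keying on the authenticated peer CN means a caller cannot escape its bucket by varying a self-declared field in the request body.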
[v1.24.0] - 2026-07-02¶
Security¶
- The callers RBAC table supports a reserved "_default" entry that unlisted broker CNs inherit, closing threat-model gap #6 on opt-in: "_default": {"allowed_groups": []} makes the table default-deny, so forgetting to list a new CN fails closed instead of open. Explicit entries always win over _default.
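The lookup order (explicit entry first, then _default) can be sketched as follows. The _default key and allowed_groups field come from the entry above; the callerEntry type, the function names, and the CN and group values are hypothetical, and what happens when neither entry exists is left to the caller.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// callerEntry is a reduced, hypothetical view of one callers-table entry.
type callerEntry struct {
	AllowedGroups []string `json:"allowed_groups"`
}

// parseCallers decodes a callers table keyed by broker CN.
func parseCallers(raw string) (map[string]callerEntry, error) {
	var m map[string]callerEntry
	err := json.Unmarshal([]byte(raw), &m)
	return m, err
}

// resolveCaller returns the explicit entry for cn when present, otherwise
// the reserved "_default" entry. found is false when neither exists.
func resolveCaller(callers map[string]callerEntry, cn string) (entry callerEntry, found bool) {
	if e, ok := callers[cn]; ok {
		return e, true
	}
	e, ok := callers["_default"]
	return e, ok
}

func main() {
	callers, err := parseCallers(`{
		"_default": {"allowed_groups": []},
		"broker-a": {"allowed_groups": ["ops"]}
	}`)
	if err != nil {
		panic(err)
	}
	for _, cn := range []string{"broker-a", "broker-new"} {
		e, found := resolveCaller(callers, cn)
		fmt.Println(cn, found, e.AllowedGroups)
	}
}
```

With the empty-list default in place, a newly deployed broker CN resolves to an entry that allows no groups until someone lists it explicitly.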
Added¶
- broker-ctl callers add accepts an explicitly-empty --groups "" to write a deny-all allowed_groups: [] entry (required to create the _default default-deny entry from the CLI; an omitted --groups is still a usage error).
Documentation¶
- Audited the whole doc set against the code and fixed the drift (#18): the generated config reference now covers the broker/MCP config (docgen recurses into nested structs and resolves const-named routes), API status codes match the handlers, and stale binary/package inventories were completed.
- Refreshed handoff, architecture, and changelog notes for post-v1.23.5 audit hardening and the scoped approve-and-learn waiver behavior.
- Corrected runtime-grant list examples so allow-grants and approval-waivers show their distinct fields and match the broker-ctl policy grants output.
Fixed¶
- Approve-and-learn approval waivers are now scoped to the effective broker caller and OIDC end user that were approved, instead of clearing re-approval for every subject that can reach the same host/command.
- Persistent shell/PTY session readers now cap a single unterminated stdout line before buffering it, so a remote command cannot bypass maxOutputBytes by emitting a huge line without a newline.
- broker-ctl reload now matches the local process basename exactly before sending SIGHUP, avoiding accidental signals to unrelated commands whose name merely contains signer.
[v1.23.5] - 2026-06-30¶
Documentation¶
- Corrected post-release documentation drift for v1.23.4 and clarified that session preflight revalidates authorization, elevation, PTY, and command policy.
- Corrected the context-propagation notes: SessionExec now uses caller context, while AKV signing is bounded by the signer's own timeout because crypto.Signer has no context parameter.
- Removed fixed test-count numbers from the handoff document and kept only the stable coverage areas.
Fixed¶
- Behavior guardrails in enforce mode no longer learn a novel host/command before approval is granted. Repeating the same unapproved anomaly keeps returning 202 instead of silently entering the subject baseline.
- ssh_session_exec now revalidates every bastion hop as role=bastion before the target command preflight, so signer reloads that revoke jump-host access also stop already-open sessions on their next command.
- New ssh_execute and ssh_session_open calls refresh /v1/hosts immediately before building SSH hops and fail closed on refresh errors, avoiding stale addr/host_key/jump data for new connections.
- The control-plane config loader now rejects unknown behavior.mode, approval.notifier, and approval.teams_format values at startup instead of silently disabling guardrails or falling back to the log notifier.
- Made broker shutdown idempotent, including repeated Engine.Close() calls.
- Canonicalized approve-and-learn waiver elevation so sudo_user="" and sudo_user="root" match the same effective sudo target.
- Propagated preflight from the signer HTTP request into the internal signing intent.
- Hardened persistent shell session markers against printf() function redefinition, so shell/pty sessions cannot spoof the reported exit code by shadowing the marker emitter.
- Rejected ssh_session_exec on an already-open session when the current signer host route (addr/user/host_key/jump) no longer matches the route used to open that session.
[v1.23.4] - 2026-06-30¶
Fixed¶
- Session preflight now carries PTY state. ssh_session_exec preflight sends the live session's PTY bit to the signer, so a policy reload that disables allow_pty also stops already-open mode=pty sessions on their next command.
Documentation¶
- control-plane.example.json, API, operations, architecture, and handoff docs now describe the approved-but-uncollected approval TTL and the current v1.23.x session-preflight behavior.
[v1.23.3] - 2026-06-30¶
Fixed¶
- Approved requests now expire if they are not collected. Once a human approves an operation, the broker must redeem it within the approval TTL; stale approved-but-unconsumed requests can no longer issue a certificate later.
- Session command preflight now follows current signer policy. Every ssh_session_exec is rechecked with dry_run=true + preflight=true, so signer reloads affect already-open sessions. mode=exec commands enforce the new policy on the next call, and existing shell/pty sessions are blocked once a command policy becomes active.
Documentation¶
- Clarified that session command filtering is broker-preflighted but not host-enforced, fixed the persistent-session serial examples, and updated the session/preflight API wording.
[v1.23.2] - 2026-06-30¶
Fixed¶
- Approved requests survive transient signer failures. The control plane now burns an approval only after the signer returns a certificate or preflight decision, while still preventing concurrent double issuance.
- Broker HTTP responses preserve audit-mode warnings. /v1/ssh_run now includes optional warnings so clients can see command-policy audit findings.
Documentation¶
- API/MCP return-field documentation now lists warnings, and the security scope no longer describes session command firewalling as entirely absent.
[v1.23.1] - 2026-06-30¶
Fixed¶
- Session exec preflight is now scoped to command-policy hosts. mode=exec sessions on unrestricted hosts no longer call the signer before every ssh_session_exec; hosts with command_policy still preflight each command.
- Executable preflights now pass through control-plane behavior guardrails. Pure dry-runs still bypass guardrails, but dry_run=true + preflight=true is treated as an imminent execution and can be rate-limited or escalated.
Documentation¶
- Main example configs now use enforcement: "enforce" by default and document audit as a baseline-collection mode.
- API and architecture documentation updated for executable preflight.
[v1.23.0] - 2026-06-30¶
Added¶
- Command-policy audit mode. command_policy.enforcement now accepts "audit" (default remains "enforce"). Audit mode lets commands run while returning and auditing warnings such as would_deny and would_require_approval, so operators can collect a baseline before enforcing allow/deny/approval rules. In composed policies, any enforcing policy wins; a host is audit-only only when every restricting policy is audit.
- Command firewall for ssh_session_exec in mode=exec. Hosts with command_policy now allow ssh_session_open mode=exec; the broker preflights each ssh_session_exec with the signer before opening the SSH exec channel. Denied or approval-gated commands are blocked in enforce mode and returned as warnings in audit mode. shell and pty sessions remain rejected on command-policy hosts.
Changed¶
- broker-ctl host add supports --policy-enforcement enforce|audit and preserves that field during partial --force command-policy updates.
- MCP execution outputs now include optional warnings, and audit entries can carry a warning field for audit-mode policy observations.
[v1.22.1] - 2026-06-30¶
Patch release correcting the incomplete client-cancellation fix from v1.22.0 (it
only covered PTY executions) and a strict-config-validation blind spot for
_-prefixed map entries.
Fixed¶
- Client cancellation now aborts non-PTY SSH executions too. The v1.22.0 cancellation fix only reached the PTY branch of ExecOnce; the common non-PTY path (ssh_execute and exec-mode sessions) still ran to the 10-minute timeout after a client disconnect. Both branches now share a single waitResult core (unit-tested), so the cancellation and timeout handling cannot diverge again.
- Cancelling a shell/PTY session command now tears down the SSH channel, so the remote command actually stops instead of lingering until the session is closed or reaped.
- Strict config validation no longer has a blind spot for _-prefixed map entries. The strict pass stripped every _-prefixed key, so a typo nested inside an entry whose identifier starts with _ (e.g. a host "_x" with a misspelled field) went undetected, and on default-open fields could widen access. Stripping now distinguishes comments (the _*_comment / _*_example convention, or a _-prefixed key with a scalar value such as an inline _note) from real data (a _-prefixed object/array entry, or _default), so data entries reach validation while inline comments are still ignored.
Documentation¶
- THREAT_MODEL.md no longer lists the certificate TTL as a session mitigation: for an
established session the bound is
session_idle_seconds/session_max_seconds, not the cert TTL.
[v1.22.0] - 2026-06-30¶
Config and session hardening: fixes a v1.21.0 regression that could drop real
_-prefixed config keys, two session-management defects (unauthorized close
refreshing the idle timer; client cancellation not aborting commands), and corrects
the session-lifetime documentation.
Fixed¶
- Strict config no longer drops real _-prefixed map keys. confcheck.Strict (the runtime loader path added in v1.21.0) stripped every _-prefixed key before decoding, which would silently delete a legitimate map entry whose key begins with _ (e.g. a broker CN _ci in callers, whose removal makes that CN fall back to default-open). The loader now loads the real value with a lenient pass and uses the strip+strict pass only to detect unknown struct fields (typos), so _-prefixed map data is preserved while a misspelled control is still rejected.
- Unauthorized CloseSession no longer refreshes a session's idle timer. It went through get(), which updated lastUsed before the ownership check, so a caller holding a leaked session_id could keep another caller's session alive against the idle reaper. Ownership is now checked and the session removed atomically, without touching lastUsed (C1).
- Client cancellation now aborts in-flight SSH commands. SessionExec ignored its context and ExecOnce had none, so a disconnected MCP/HTTP client left the remote command running until the 10-minute execution timeout. The request context is now threaded through ExecOnce and the shell/PTY Exec; on cancellation the command is signalled and the channel closed.
Documentation¶
- Corrected the session-lifetime docs (USAGE.md and the ssh_session_open / ssh_session_close MCP descriptions): an established session is closed by session_idle_seconds / session_max_seconds, not by the certificate TTL (OpenSSH validates the certificate only at authentication). Set session_max_seconds to the maximum exposure window you accept.
[v1.21.0] - 2026-06-30¶
Config-safety hardening: the runtime loaders now reject unknown/misspelled keys so a typo cannot silently leave a security control open. No change to the wire protocol or the broker's runtime behaviour.
Security¶
- Config is strictly decoded at load (fail-closed on unknown keys). The runtime loaders (signer, control plane, broker: startup, reload, and the policy-mutation path) now reject a config with an unrecognised or misspelled key instead of silently ignoring it, so a typo in a security control (sign_callers, allowed_callers, callers, …) can no longer quietly leave a default-open setting. Comment keys (_*) and the reserved _default group are still accepted (internal/confcheck.Strict).
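The fail-closed decoding described above can be illustrated with Go's standard encoding/json decoder. This is a minimal sketch only: the hostConfig struct and decodeStrict helper are hypothetical, and the changelog does not state how internal/confcheck.Strict is implemented.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// hostConfig is a hypothetical config struct used only for this illustration.
type hostConfig struct {
	AllowedCallers []string `json:"allowed_callers"`
}

// decodeStrict fails closed: a key that maps to no struct field is an error
// instead of being silently ignored.
func decodeStrict(raw []byte, out any) error {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	return dec.Decode(out)
}

func main() {
	var good, typo hostConfig
	fmt.Println("valid key:", decodeStrict([]byte(`{"allowed_callers":["ci"]}`), &good))
	// A lenient decode would accept this and leave AllowedCallers empty,
	// which is the "typo leaves a control open" case.
	fmt.Println("misspelled key:", decodeStrict([]byte(`{"alowed_callers":["ci"]}`), &typo))
}
```

The sketch does not cover the comment-key (_*) and _default handling mentioned in the entry.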
Documentation¶
- OPERATIONS.md documents sign_callers / the broker–approver role separation and the strict config validation.
Internal¶
- Documented why crypto/rand.Read errors are intentionally discarded in the id/marker helpers (on Go 1.24+ it never returns an error and instead crashes on RNG failure, so the discarded return is not a fail-open path).
[v1.20.0] - 2026-06-30¶
Security hardening of the control plane and the mTLS caller identity — no change to the broker's runtime behaviour or the wire protocol.
Security¶
- Control-plane broker/approver role separation. The signing path (/v1/sign, /v1/hosts, /v1/sign/result) is now restricted to brokers: a new sign_callers allowlist pins which CNs may sign, and with no list a CN in approval.callers is denied the sign path (an approver is not a broker; secure by default). This closes a role-confusion gap where an approver certificate, signed by the same client_ca, could originate signing requests.
- mTLS rejects an empty or malformed CN. auth.CallerCN now fails closed on an empty common name or one containing control characters, instead of treating it as an unlisted (default-open) identity.
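As an illustration of the fail-closed CN check, the following sketch rejects an empty common name or one containing control characters. validateCN and errBadCN are hypothetical names; the actual auth.CallerCN logic is not shown in this changelog.

```go
package main

import (
	"errors"
	"fmt"
	"unicode"
)

// errBadCN is a hypothetical sentinel error for this sketch.
var errBadCN = errors.New("empty or malformed common name")

// validateCN returns an error for an empty CN or one containing any control
// character, so the caller can deny the request instead of treating the
// identity as "unlisted".
func validateCN(cn string) error {
	if cn == "" {
		return errBadCN
	}
	for _, r := range cn {
		if unicode.IsControl(r) {
			return errBadCN
		}
	}
	return nil
}

func main() {
	for _, cn := range []string{"broker-01", "", "broker\nroot"} {
		fmt.Printf("%q -> %v\n", cn, validateCN(cn))
	}
}
```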
Fixed¶
- Control plane forwards host groups on GET /v1/hosts. The group labels were dropped when re-serialising the host list, so an OIDC user with groups saw zero hosts in ssh_list_servers behind the control plane. Restores the documented /v1/hosts contract.
[v1.19.0] - 2026-06-30¶
Relicensing and documentation infrastructure: the project is now GPL-3.0, and the docs are published to GitHub Pages with a CI pipeline that keeps them from drifting from the code. No change to the broker's runtime behaviour, API, config, or tools.
Changed¶
- Relicensed from proprietary to GPL-3.0. LICENSE is now the GNU General Public License v3.0; README updated accordingly.
- Wiki mirror enabled. The one-way docs→Wiki CI job is on (ENABLE_WIKI_MIRROR); it pushes with GITHUB_TOKEN (falls back to a WIKI_TOKEN PAT if one is set).
- Documentation moved to docs/ and published to GitHub Pages, built from the repo's Markdown by mkdocs-material (single source of truth, reviewed in the same PR as the code). A one-way CI job optionally mirrors the docs to the read-only GitHub Wiki.
Added¶
- Anti-drift documentation pipeline. tools/docgen regenerates docs/reference/{endpoints,mcp-tools,config,cli}.md from the actual HTTP routes, MCP tool schemas (enumerated from the live server), config structs, and the broker-ctl CLI; CI fails if the committed reference differs. The example configs are validated against their Go structs (internal/confcheck), and mkdocs build --strict fails on a broken link or anchor. New make docs-gen|docs-check|docs-serve.
[v1.18.0] - 2026-06-19¶
Dynamic command policy: a runtime overlay composed on top of the file baseline so the
firewall can be loosened temporarily without editing signer.json — widen an allowlist
for a TTL (grants), or skip re-approval for a vouched-for command (approve-and-learn).
Both are widen-only and self-expiring; the file stays the source of truth.
Added¶
- Approve-and-learn: TTL'd approval waivers. When a reviewer approves a require_approval command with --learn, the same command runs without re-approval for a TTL. Because require_approval is orthogonal to allow/deny, this is a new waive_approval grant dimension (suppress the approval gate for an already-allowed command), applied in resolveCommandPolicy after the allow check, so it only un-gates an allowed command and never widens allow/deny (no inversion risk; works on any host, incl. default-allow ones carrying a require_approval rule). The waiver is minted signer-internally: the control plane carries the learn intent on the approved sign and the signer mints a waiver scoped to the approved broker caller and OIDC end user, honoured only from a trusted_forwarder (like approved). There is no new auth tier: a broker can neither self-approve nor self-learn. A waiver is bound to the exact command, elevation (sudo/sudo_user), caller, and end user that were approved: approving a non-sudo command never waives its root variant, and another subject still needs its own approval. Waivers appear in policy grants and are revoked like any grant; the TTL is clamped to max_grant_ttl_seconds; re-learning refreshes the single waiver (no duplicate accumulation) and expired ones are purged periodically; every mint is audited (approval-waiver-created) and linked to its approval id.
broker-ctl approval allow <id> --learn --ttl 2h # approve once, skip re-approval for 2h
broker-ctl policy grants # shows waive-approval[^cmd$]
broker-ctl policy revoke <grant-id> # end it early
- Runtime command-policy grants (dynamic widening overlay). A grant temporarily widens an allowlist host without editing signer.json: a set of allow patterns that expire on their own after a TTL. Grants are the in-memory dynamic overlay on top of the durable file baseline, composed at decision time (internal/signer/grants.go GrantStore / GrantProvider, injected in resolveCommandPolicy). They survive config reloads and are dropped on a signer restart (TTL'd; fail-safe). New signer API (auth reload_callers, audited):
  - POST /v1/policy/hosts/{host}/grants: create { "allow":[...], "ttl_seconds":N, "caller":"", "end_user":"" } → 201 { id, host, expires_at }.
  - GET /v1/policy/grants: list active grants.
  - DELETE /v1/policy/grants/{id}: revoke.
  - broker-ctl policy grant|grants|revoke clients; optional max_grant_ttl_seconds config cap.
broker-ctl policy grant --host web01 --allow '^systemctl restart nginx$' --ttl 2h
broker-ctl policy grants
broker-ctl policy revoke <id>
Widen-only, enforced. A grant carries only allow (never deny /
require_approval) and is applied only on a host that is already allowlist-active
— on a default-allow/denylist host it is refused (409), since injecting an allowlist
there would invert the host to default-deny. deny still wins; creation is
operator-only; the broker/agent can never create one.
[v1.17.0] - 2026-06-19¶
Dynamic command-policy operations (Phase 0): manage the firewall without abandoning the file as the source of truth — recommend changes from the audit, apply them with a validated mutation API, and pick up edits automatically.
Added¶
- Validated policy mutation API on the signer. POST/DELETE /v1/policy/hosts/{host}/allow add/remove a single command-policy allow regex for a host over mTLS, authorised by the existing reload_callers allowlist. Unlike a hand edit, the change is validated by building the new state (CompileHostPolicies + CA load) before it is persisted or applied: a bad regex, an unknown host, or a config that would not compile is rejected and nothing changes. On success the file is written atomically (temp+rename, preserving permissions, top-level keys and other hosts verbatim) and the in-memory policy is swapped, so disk and the running policy stay consistent; every attempt (changed / denied / failed) is recorded in the signed audit log. New broker-ctl policy add|remove --host <h> --allow <regex> client. This is the apply-side of policy recommend and the foundation for runtime grants.
- Signer auto-reload (opt-in). New auto_reload_seconds in signer.json: when > 0, the signer polls the config file's mtime and hot-reloads on change via the same validated, atomic, previous-state-preserving path as SIGHUP / POST /v1/reload (a half-written file mid-save is rejected and re-applied on the next tick). Dependency-free (mtime poll, no fsnotify). 0/absent = disabled. Removes the manual broker-ctl reload after a hand edit or a GitOps write.
- broker-ctl policy recommend: mines an audit log and prints advisory command-policy suggestions: promote (commands run or human-approved despite the current policy denying them; candidates for the allowlist), dead-rule (allow/deny patterns that never matched in the window; least-privilege cleanup), and friction (commands repeatedly denied). Read-only and advisory: it never changes policy. Attribution is by re-evaluation against the current compiled policy (signer.PolicySet.Decide), so it does not depend on the audit recording which rule matched. New internal/policyrec; --audit <log>, --host, --since, --min-count, --json.
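The temp+rename write mentioned for the mutation API is a common pattern; a minimal sketch follows. writeFileAtomic and demo are hypothetical names, the 0600 fallback mode is an assumption, and the validation step (building the new state before persisting) is deliberately left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic replaces path with data by writing a temp file in the same
// directory and renaming it over the target, keeping the existing permission
// bits. Readers see either the old file or the new one, never a partial write.
func writeFileAtomic(path string, data []byte) error {
	mode := os.FileMode(0o600) // assumed fallback when the file does not exist yet
	if fi, err := os.Stat(path); err == nil {
		mode = fi.Mode().Perm()
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(mode); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo writes a scratch file twice and returns its final content.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "signer.json")
	for _, body := range []string{`{"v":1}`, `{"v":2}`} {
		if err := writeFileAtomic(path, []byte(body)); err != nil {
			return "", err
		}
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

Creating the temp file in the target's own directory keeps the rename on one filesystem, which is what makes it atomic on POSIX systems.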
[v1.16.0] - 2026-06-19¶
Performance and maintainability pass (read-only audit of the hot-path packages, then targeted fixes). No behaviour change to issuance, policy decisions, or the wire protocol.
Security¶
- BehaviorTracker memory is now bounded (resource-exhaustion fix). The control-plane anomaly/rate tracker kept per-subject state in maps that were only ever added to. A trusted forwarder rotating end_user values (subject = <brokerCN>:<endUser>), or any subject touching many distinct hosts/commands, could grow them without limit. The subject table is now capped (max_subjects, default 4096) with least-recently-seen + idle-TTL eviction (subject_ttl_minutes, default 1440), and each subject's host/command history is capped (max_distinct_per_subject, default 1024); once full, novelty detection for that dimension degrades to "seen" instead of growing or emitting unbounded approval escalations. New optional behavior config fields, all with sane defaults (internal/control/behavior.go).
Changed¶
- CommandPolicy.Decide / decideOne removed (single evaluator). The request path has always evaluated through PolicySet; the parallel single-policy evaluator was test-only and had drifted (Spanish error strings vs the English PolicySet ones). It is deleted and its tests now run against PolicySet{cp}, leaving one source of truth for the AI-action firewall rule logic. The command_policy source (internal/signer/cmdpolicy.go) is also fully normalised to English.
Performance¶
- Parsed host keys are cached (content-addressed by the authorized_keys line) instead of re-parsed per hop per request (internal/broker/engine.go).
- shellQuoteSession rewritten from O(n²) string concatenation to a single strings.Builder pass (internal/broker/session.go).
- POSIX-shell parser pooled (sync.Pool) and the AST printer hoisted out of the per-CallExpr loop in extractCommands; buildConstraints builds the cert KeyID with one strings.Builder instead of a slice + Sprintf + Join (byte-identical output, guarded by a test) (internal/signer).
Fixed¶
- Host-refresh goroutine lifecycle. The remote-mode host-refresh goroutine had no stop channel and was not terminated by Engine.Close() (a leak in tests and repeated construction). It now exits on Close (internal/broker/engine.go).
[v1.15.0] - 2026-06-19¶
Added¶
- --version on every binary. All six commands (signer, broker, control-plane, mcp-broker, mcp-broker-http, broker-ctl) now print their build version and exit. Short, script-friendly form by default (broker-ctl --version → v1.15.0); detailed form with --version --verbose (Go toolchain, target os/arch, VCS revision and commit time). broker-ctl also gains the twin subcommand broker-ctl version [--verbose]. The infrastructure already existed (internal/version injected from the git tag by the Makefile); this wires it to the CLI. New version.Print and version.Detailed helpers.
Changed¶
- BREAKING: broker-ctl --config is now a global flag and must precede the subcommand. Use broker-ctl --config <f> host list instead of broker-ctl host list --config <f>. The per-subcommand --config was removed from all subcommands (host, ca-keys, callers, reload, policy explain), so --config after the subcommand is now rejected. This aligns broker-ctl with the other five binaries, which already take --config at the top level. Scripts that passed --config after the subcommand must move it before it.
[v1.14.0] - 2026-06-18¶
Added¶
- Composable command policies by group. The AI-action firewall is no longer per-host only: a named policy library (command_policies) attaches N policies to a group (group_command_policies: group → [names]), and a host (with N groups) gets the composition of all its groups' policies plus its own inline command_policy. Composition is additive: deny wins (any denylist match blocks), allow is a union (if any contributing policy is an allowlist, the command must match the union of all of them), require_approval is a union, and shell_parse is OR. The reserved group _default applies to every host (global guardrail, mirroring ca_keys_default). New internal/signer/policyset.go (PolicySet) and CompileHostPolicies resolve + validate the composition at config load (a one-element set reproduces CommandPolicy.Decide exactly, so single-policy hosts are unchanged). Works in both the remote signer (signer.json) and the local single-binary broker (config.json). New broker-ctl policy explain --config <f> --host <h> [--command <c>] prints a host's composed policy and evaluates a command offline (no signing, no network). See ARCHITECTURE.md § AI-action firewall and the example configs.
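The additive composition rules (deny wins, allow is a union, require_approval is a union) can be sketched as follows. The policy and decision types, decide, and demoSet are hypothetical simplifications, not the project's PolicySet; shell_parse is omitted.

```go
package main

import (
	"fmt"
	"regexp"
)

// policy is a hypothetical, simplified stand-in for one command policy.
type policy struct {
	allow, deny, requireApproval []*regexp.Regexp
}

type decision struct {
	Allowed       bool
	NeedsApproval bool
}

func matchAny(res []*regexp.Regexp, cmd string) bool {
	for _, re := range res {
		if re.MatchString(cmd) {
			return true
		}
	}
	return false
}

// decide composes policies additively: any deny match blocks, the allow lists
// form a union that applies only when at least one policy is an allowlist,
// and approval is required if any policy requires it.
func decide(set []policy, cmd string) decision {
	allowlistActive, allowed, approval := false, false, false
	for _, p := range set {
		if matchAny(p.deny, cmd) {
			return decision{} // deny wins
		}
		if len(p.allow) > 0 {
			allowlistActive = true
			allowed = allowed || matchAny(p.allow, cmd)
		}
		approval = approval || matchAny(p.requireApproval, cmd)
	}
	if allowlistActive && !allowed {
		return decision{}
	}
	return decision{Allowed: true, NeedsApproval: approval}
}

// demoSet builds three example policies: two group allowlists and a denylist guardrail.
func demoSet() []policy {
	re := regexp.MustCompile
	return []policy{
		{allow: []*regexp.Regexp{re(`^uptime$`)}},
		{
			allow:           []*regexp.Regexp{re(`^systemctl restart nginx$`)},
			requireApproval: []*regexp.Regexp{re(`^systemctl restart`)},
		},
		{deny: []*regexp.Regexp{re(`rm -rf`)}},
	}
}

func main() {
	set := demoSet()
	for _, cmd := range []string{"uptime", "systemctl restart nginx", "rm -rf /", "ls"} {
		fmt.Printf("%-28q %+v\n", cmd, decide(set, cmd))
	}
}
```

With a single-element set this reduces to evaluating that one policy, which matches the entry's note that single-policy hosts are unchanged.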
[v1.13.0] - 2026-06-16¶
Security hardening from an adversarial (red-team) review of authentication,
RBAC, privilege escalation, the command firewall, and audit integrity. Two
high-severity bypasses (command firewall via role=bastion; deny-all RBAC
collapsing to unrestricted on the wire) plus several medium/low fixes.
Security¶
- Command firewall could be bypassed by requesting role=bastion. The AI-action firewall (command_policy) and the cert force-command were applied only for role=target, while role arrives unverified from the wire. A compromised broker could request role=bastion on a host that had both a command_policy and allow_as_bastion, obtaining a certificate with the host's real principal, no force-command, and permit-port-forwarding, i.e. an unrestricted credential that evades the allow/deny rules entirely, defeating the "one-shot policy survives a fully compromised broker" guarantee. PolicyTable.Resolve now rejects any non-target role on a host whose command_policy restricts, and PolicyTable.Validate rejects a host that sets both allow_as_bastion and a command_policy (the two are mutually exclusive: a bastion certificate carries no force-command). Defends both the remote signer and the broker's local mode.
- Empty OIDC groups (deny-all) collapsed to unrestricted on the wire. The OIDC verifier computes a non-nil empty []string{} for an authenticated user with zero groups, so the signer denies every host (deny-all). But the broker→signer wire field used json:",omitempty", which drops a length-0 slice entirely, so the request arrived with end_user_groups == nil, read by the signer as unrestricted (no per-user filter), the exact inverse of the intended decision. WireRequest.EndUserGroups no longer uses omitempty: nil (no end-user identity) round-trips to nil; [] (deny-all) round-trips to a non-nil empty slice.
- GET /v1/hosts ignored per-host allowed_callers. The host list applied only the group RBAC filter, so a broker CN excluded from a host via allowed_callers (but not group-restricted; the callers table is default-open) still received that host's addr/user/host_key/jump. The handler now also drops hosts whose allowed_callers excludes the caller, matching the /v1/sign authorization.
- Approval requests hid sudo elevation from the human approver. The pending request stores sudo/sudo_user and the issued certificate bakes the sudo prefix into its force-command, but broker-ctl approval list and the default log notifier did not display the elevation, so an approver could authorize a benign-looking command unaware it would run as root. Both now show elevation=sudo:<user> (the webhook already serialized the full request and the Teams card already rendered it).
- Certificate KeyID accepted control characters in broker-supplied identity. end_user and the resolved caller (on_behalf_of from a trusted forwarder) flowed verbatim into the cert KeyID, which sshd writes to its auth log; a newline let a compromised forwarder forge/splice lines in the host's auth.log. The signer now rejects control characters in caller/end_user.
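The omitempty behaviour behind the empty-groups bug is reproducible with encoding/json alone. In this sketch the withOmit and withoutOmit structs are hypothetical stand-ins for the wire request; only the JSON tag behaviour is the point.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withOmit mirrors the old tag: a length-0 slice is dropped from the JSON.
type withOmit struct {
	Groups []string `json:"end_user_groups,omitempty"`
}

// withoutOmit mirrors the fixed tag: [] and null stay distinguishable.
type withoutOmit struct {
	Groups []string `json:"end_user_groups"`
}

// viaOmit round-trips groups through the omitempty-tagged field.
func viaOmit(groups []string) []string {
	b, _ := json.Marshal(withOmit{Groups: groups})
	var out withOmit
	_ = json.Unmarshal(b, &out)
	return out.Groups
}

// viaPlain round-trips groups through the field without omitempty.
func viaPlain(groups []string) []string {
	b, _ := json.Marshal(withoutOmit{Groups: groups})
	var out withoutOmit
	_ = json.Unmarshal(b, &out)
	return out.Groups
}

func main() {
	empty := []string{} // authenticated user with zero groups: deny-all

	fmt.Println(viaOmit(empty) == nil)  // true: deny-all arrives as nil (unrestricted)
	fmt.Println(viaPlain(empty) == nil) // false: deny-all survives as a non-nil empty slice
	fmt.Println(viaPlain(nil) == nil)   // true: "no end-user identity" still round-trips to nil
}
```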
Fixed¶
- Audit rotation is now verifiable end-to-end. broker-ctl audit verify gains an --all flag that discovers the rotated segments (<log>.<timestamp>) plus the active file, verifies each, and checks the cross-file linkage (segment N's first prev_hash == SHA-256 of segment N-1's last line, earliest segment starts at genesis). Single-file verification accepted the first prev_hash as an unchecked seed, so dropping a whole rotated segment, or truncating the active file and restarting (which re-anchors to genesis), was undetectable, contradicting THREAT_MODEL's rotation guarantee. --all detects both.
- broker-ctl host add --force no longer wipes the whole command_policy. A partial update that passed any one policy sub-flag rebuilt the entire policy object from flag defaults, so omitting --policy-mode silently downgraded the host to mode:off (firewall disabled, sessions re-enabled) with no warning. The policy is now merged field-by-field like every other host field: only the sub-fields whose flags were explicitly passed are overridden.
- Session shell/pty per-command audit now records the elevation. For an elevated shell/pty session the per-command session_exec entries recorded a blank elevation (the prefix lives in the shell process), understating privilege. The session now retains an elevLabel for all modes and emits it on every command.
- ssh_session_exec checks ownership before mutating session state. It marked a command in flight (busy/lastUsed) before the C1 ownership check, so a non-owner could refresh another caller's lastUsed and hold busy>0 to keep the session from being reaped. Ownership is now verified under the lock before any mutation (new sessionManager.checkoutOwned).
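The cross-file linkage rule for rotated audit segments can be sketched as below. The segment type, the hex encoding of the hash, and the empty-string genesis marker are assumptions for illustration; the changelog states only that each segment's first prev_hash must equal the SHA-256 of the previous segment's last line and that the earliest segment starts at genesis.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// segment is a hypothetical in-memory view of one audit file.
type segment struct {
	FirstPrevHash string   // prev_hash carried by the segment's first entry
	Lines         []string // raw lines, oldest first
}

// genesis is an assumed marker for "no previous segment".
const genesis = ""

func lineHash(line string) string {
	sum := sha256.Sum256([]byte(line))
	return hex.EncodeToString(sum[:])
}

// verifyLinkage checks only the links between segments, oldest first.
// Per-entry chain and signature checks are out of scope for this sketch.
func verifyLinkage(segs []segment) error {
	want := genesis
	for i, s := range segs {
		if s.FirstPrevHash != want {
			return fmt.Errorf("segment %d: first prev_hash does not match the previous segment", i)
		}
		if len(s.Lines) == 0 {
			return fmt.Errorf("segment %d: no entries", i)
		}
		want = lineHash(s.Lines[len(s.Lines)-1])
	}
	return nil
}

// demoSegments builds three correctly linked segments.
func demoSegments() []segment {
	a := segment{FirstPrevHash: genesis, Lines: []string{"a1", "a2"}}
	b := segment{FirstPrevHash: lineHash("a2"), Lines: []string{"b1"}}
	c := segment{FirstPrevHash: lineHash("b1"), Lines: []string{"c1", "c2"}}
	return []segment{a, b, c}
}

func main() {
	segs := demoSegments()
	fmt.Println("intact:", verifyLinkage(segs))
	fmt.Println("middle segment dropped:", verifyLinkage([]segment{segs[0], segs[2]}))
	fmt.Println("oldest segment dropped:", verifyLinkage(segs[1:]))
}
```

Starting the expected value at genesis is what makes a missing oldest segment detectable, which single-file verification could not do when it accepted the first prev_hash as a seed.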
Changed¶
- Local single-binary mode no longer marks every host as a bastion. policyFromHosts hardcoded allow_as_bastion=true for every host, granting permit-port-forwarding on every cert and contradicting the documented default-deny bastion gate. A new allow_as_bastion field on the local HostConfig (default false) plus automatic enablement for hosts referenced as another host's jump target preserves existing jump chains while honoring the gate for leaf hosts.
[v1.12.7] - 2026-06-13¶
Final batch from the logic-flaw review: the remaining low-severity findings, plus a build-time version that can no longer go stale.
Added¶
- Build version is derived from the git tag. New internal/version package whose value is injected via -ldflags from git describe --tags, with a fallback to the Go build info (module version or VCS revision) so a plain go build never reports an empty or hard-coded string. A new Makefile (make build / make install) wires the injection for every binary. The MCP server now announces this version to clients instead of the hard-coded 1.4.1 constant (removed).
- OIDC clock-skew tolerance (oauth.clock_skew_seconds, default 60s) for the HTTP frontend.
Fixed¶
- OIDC: nbf (not-before) is now enforced. go-oidc validates exp but not nbf, so a token marked valid only from a future instant was accepted. The verifier now rejects not-yet-valid tokens and also rejects a token whose iat is in the future (which would read as a negative age and slip under the max-age bound). Both apply the configurable clock skew, avoiding spurious 401s from minor IdP/host clock drift.
- Shell sessions no longer drop a final unterminated output line. When a command's last line lacked a trailing newline (e.g. printf hello), the shell wrote the end-of-output marker on the same line and Exec discarded the text before it, returning empty output. That text is now captured. A marker line with a non-numeric exit code now marks the session broken instead of silently reporting exit 0.
- HTTP broker no longer maps every failure to 403. cmd/broker now returns 400 for a malformed request, 404 for an unknown host, 502 for an infrastructure failure (SSH dial/exec, or the signing service unreachable/5xx), and 403 only for an actual policy/authorization denial. Upstream (502) responses carry a generic message so internal addresses from dial errors are not leaked to the client (the full error is still audited). New error categories broker.ErrBadRequest / ErrUnknownHost / ErrUpstream and signer.ErrSignerUnavailable back the classification.
- broker-ctl reload verifies the PID is the signer before SIGHUP. A bare liveness check could SIGHUP a recycled PID belonging to an unrelated process; it now confirms the process command line looks like the signer and otherwise falls back to the authenticated HTTP reload (which targets the signer by URL).
- Session recordings are size-capped (recording.DefaultMaxBytes, 100 MiB, mirroring the audit-log rotation size). A long or abusive session can no longer fill the disk; the recording stops with a truncation note once the cap is reached.
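A minimal sketch of the nbf/iat checks with clock-skew tolerance follows. checkTimes and its error values are hypothetical; times are Unix seconds as in JWT NumericDate claims, and 0 is treated here as "claim absent", which is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errNotYetValid    = errors.New("token not yet valid (nbf)")
	errIssuedInFuture = errors.New("token issued in the future (iat)")
)

// checkTimes rejects a token whose nbf or iat lies further in the future than
// the allowed skew. All arguments are in seconds; 0 means the claim is absent.
func checkTimes(now, nbf, iat, skew int64) error {
	if nbf != 0 && now+skew < nbf {
		return errNotYetValid
	}
	if iat != 0 && now+skew < iat {
		return errIssuedInFuture
	}
	return nil
}

func main() {
	const now, skew = int64(1_700_000_000), int64(60)
	fmt.Println(checkTimes(now, now+30, now, skew))  // within skew: accepted
	fmt.Println(checkTimes(now, now+300, now, skew)) // nbf too far ahead
	fmt.Println(checkTimes(now, 0, now+300, skew))   // iat in the future
}
```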
[v1.12.6] - 2026-06-13¶
Second batch from the logic-flaw review (v1.12.5 shipped the two signer
firewall bypasses). Fixes across sessions, the SSH layer, the control plane,
the audit chain, and broker-ctl/CA.
Security¶
- Behaviour guardrails key on the authenticated broker CN, not the client-supplied end_user. The control plane keyed its rate limit and anomaly baselines on end_user, an unauthenticated JSON field, so a client could rotate it to get a fresh window / first-seen baseline on every request. New trusted_forwarders config (control-plane): only for CNs in that list does end_user qualify the subject (<broker CN>:<end_user>); every other CN is keyed on the broker CN alone. See THREAT_MODEL.md non-goal #2.
- No self-approval. The originator of an approval request can no longer approve or deny it (four-eyes), even if its CN is in approval.callers; the attempt is audited as self-approval-rejected.
- Audit hash chain is continuous across rotation. maybeRotate reset the chain to zero, so deleting or truncating a file at a rotation boundary was undetectable. The first entry of each rotated-to file now carries prev_hash = hash of the previous file's last line. broker-ctl audit verify treats a first-line prev_hash as the chain seed.
Fixed¶
- broker-ctl audit verify --key no longer reports false signature failures. The CLI re-implemented the signed entry struct and was missing five signed fields (policy_rule, dry_run, approval_id, approved_by, anomaly), so any entry with one populated (denials, approvals, anomalies) verified as invalid. It now uses internal/audit.Entry directly; show/tail also render those fields.
- broker-ctl ca-keys add/remove preserves all fields. The command mirrored only 4 of the 7 ca_keys fields and re-serialised the whole map, silently dropping key_version, tenant_id, client_id, and client_secret_env from every entry (breaking AKV service-principal auth). It now edits only the touched entry as raw JSON.
- Session reaper no longer kills a session with a command in flight. The idle TTL (5 min) could fire under a longer exec (10 min cap), closing the connection mid-command. A busy counter protects in-flight sessions; the idle clock now counts from command completion.
- Shell sessions fail fast after a desync. After an exec timeout or output overflow, the per-session end-of-output marker was left in flight and the next exec returned the previous command's output and exit code (also corrupting the audit trail). Such a session is now marked broken and every later exec errors asking the caller to reopen it.
- Output over the 10 MiB cap is truncated, not failed. limitedWriter/syncBuf returned a short write at the cap, which aborted the SSH io.Copy with ErrShortWrite, erroring the command or stalling it until the 10-min timeout. They now consume all bytes, discard the overflow, and return the truncated output with a marker.
- AKV signer pins the key version at startup. With key_version empty it resolved "latest" on every Sign call; after a Key Vault rotation, certs were signed by the new version while the cached public key (and the cert's SignatureKey) was the old one, so sshd rejected them all. The version is now pinned from the KID returned at startup (rotation requires a signer reload/restart).
- Intermediate ProxyJump hop dials are time-bounded. Only the first hop had a dial timeout; a dead bastion could hang a connect indefinitely. Every hop's dial is now bounded by the request context plus a per-hop timeout, and the context is cancellable.
- broker-ctl host add --scan honours the port in --addr (passes ssh-keyscan -p) and handles IPv6 literals; it previously keyscanned port 22 regardless, risking a wrong/hostile host key at onboarding.
- broker-ctl host add --force preserves unspecified fields. A --force update reset every field not given as a flag (sudo, groups, callers, TTL) to defaults; it now starts from the existing entry and overrides only the flags explicitly set.
- Per-session goroutine leak removed. shellReader blocked forever on its channel after a session closed; it now exits on a done signal.
- Reaper close/audit moved outside the session-manager lock, so a slow disk or a hung connection close no longer stalls all other session operations.
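The short-write fix for capped output can be illustrated with a writer that keeps at most a fixed number of bytes but always reports the full length as written. capWriter and copyCapped are hypothetical names, and the truncation marker the entry mentions is reduced to a boolean here.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// capWriter is a hypothetical size-capped buffer.
type capWriter struct {
	buf       bytes.Buffer
	limit     int
	truncated bool
}

// Write stores at most limit bytes in total and discards the rest, but always
// returns len(p) so io.Copy never fails with io.ErrShortWrite.
func (w *capWriter) Write(p []byte) (int, error) {
	room := w.limit - w.buf.Len()
	if room < len(p) {
		w.truncated = true
	}
	if room > 0 {
		if room > len(p) {
			room = len(p)
		}
		w.buf.Write(p[:room])
	}
	return len(p), nil
}

// copyCapped copies src into a capWriter and reports what was kept.
func copyCapped(src string, limit int) (string, bool, error) {
	w := &capWriter{limit: limit}
	_, err := io.Copy(w, strings.NewReader(src))
	return w.buf.String(), w.truncated, err
}

func main() {
	out, truncated, err := copyCapped("hello world", 5)
	fmt.Printf("%q truncated=%v err=%v\n", out, truncated, err)
}
```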
Added¶
- Control-plane config field trusted_forwarders (list of broker CNs); documented in control-plane.example.json.
[v1.12.5] - 2026-06-13¶
Security¶
- Signer: two command-firewall bypasses closed (found in a logic-flaw review). Both are in PolicyTable.Resolve / CommandPolicy.Decide, the authoritative AI-action firewall:
  - Unknown role/purpose no longer skip the firewall. Command-policy evaluation is gated on role == target and the force-command is baked only for purpose == oneshot; both values arrive from the wire and were never validated. A caller authorised for a host with a command_policy could send role: "x" (or purpose: "") and receive a certificate for the target with no force-command and no policy check, i.e. a full interactive shell. Resolve now rejects any role/purpose outside the known set (default-deny).
  - require_approval is no longer dropped on chained commands. With shell_parse, Decide overwrote needsApproval on each command of a chain instead of accumulating it, so systemctl restart nginx && systemctl status nginx issued the cert without the human approval the first command required. It now OR-accumulates approval across the chain and keeps the matched rule for the audit trail.
Regression tests added for both.
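The chained-command bug shape (overwrite versus OR-accumulate) is easy to reproduce in isolation. In this sketch needsApproval is a hypothetical per-command rule standing in for a require_approval match; it is not the project's Decide.

```go
package main

import (
	"fmt"
	"strings"
)

// needsApproval is a hypothetical rule: restarts require approval.
func needsApproval(cmd string) bool {
	return strings.HasPrefix(cmd, "systemctl restart")
}

// chainOverwrite reproduces the bug shape: each command overwrites the flag,
// so only the last command in the chain decides.
func chainOverwrite(cmds []string) bool {
	approval := false
	for _, c := range cmds {
		approval = needsApproval(c)
	}
	return approval
}

// chainAccumulate ORs the flag across the chain, so one gated command is enough.
func chainAccumulate(cmds []string) bool {
	approval := false
	for _, c := range cmds {
		approval = approval || needsApproval(c)
	}
	return approval
}

func main() {
	chain := []string{"systemctl restart nginx", "systemctl status nginx"}
	fmt.Println(chainOverwrite(chain))   // false: approval lost
	fmt.Println(chainAccumulate(chain))  // true: approval kept
}
```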
[v1.12.4] - 2026-06-10¶
Changed¶
- README trimmed to a landing page (862 → 203 lines). After the v1.12.1 documentation split, the README duplicated ARCHITECTURE.md, OPERATIONS.md, and THREAT_MODEL.md almost in full (sudo/sudoers, broker-ctl flag table, hot reload, auth diagrams, the AI-action firewall, approval/behaviour, etc.), which was already drifting. The README is now an orientation page: pitch, frontends, documentation index, "why", a one-screen "how it works", a feature overview table linking to the canonical docs, the competitive comparison (kept in full), a quickstart, the API summary, and security/testing/license pointers. Removed the stale Security (v1.4.1) table and the duplicate Production roadmap (single-sourced in THREAT_MODEL.md / HANDOFF.md).
- Repointed the two inbound links that referenced now-removed README sections (USAGE.md → OPERATIONS.md §4 and ARCHITECTURE.md § AI-action firewall).
Documentation only; no code changes.
[v1.12.3] - 2026-06-10¶
Security¶
- Dependency & toolchain CVE fixes (found by the new govulncheck CI job). Bumped `golang.org/x/net` v0.54.0 → v0.55.0 (3 vulnerabilities, incl. an idna issue reached via `signer.Remote.FetchHosts`) and the Go directive 1.26.3 → 1.26.4 (two standard-library vulnerabilities in `net/textproto` and `crypto/x509`). `govulncheck ./...` now reports no vulnerabilities.
- Signer validates `signer.json` on load and reload. New `PolicyTable.Validate()`/`CommandPolicy.Validate()` compile every command-policy regex, reject unknown modes, and check that every `jump` target is a defined host. An invalid config is now rejected up front (preserving the previous good state) instead of silently breaking a host on its next request.
Added¶
- CI quality gates (`.github/workflows/go.yml`): `gofmt -l` check, `go vet`, `go test -race`, and a `govulncheck` job, mirroring the CODING_STYLE / CONTRIBUTING pre-commit checklist that was previously manual-only. Pinned to Go 1.26.4.
- Graceful shutdown (`internal/httpserve.RunTLS`): the signer, control-plane, broker, and HTTP MCP frontend now drain in-flight requests on SIGINT/SIGTERM via `http.Server.Shutdown`, so the deferred audit-log close/flush actually runs (it did not when exiting through `log.Fatal` on a raw `ListenAndServeTLS`).
- `LICENSE`: proprietary, all-rights-reserved notice.
Changed¶
- Docs: THREAT_MODEL.md gains two explicit non-goals — secrets logged
verbatim in audit logs/recordings (no redaction) and audit-write fail-open.
OPERATIONS.md gains a key/certificate rotation runbook (SSH CA via
`TrustedUserCAKeys` two-CA transition; mTLS CA/leaf rotation).
[v1.12.2] - 2026-06-10¶
Changed¶
- `make_presentation.py` brought up to date (v1.12.1 content). Cover and roadmap version refreshed (`v1.11.0` → `v1.12.1`); portable output path (writes next to the script via `__file__` instead of a hard-coded `/home/luislgf/...`); slide-header comments renumbered sequentially (1–34, dropping stale `(was N)` / `(NEW — X)` annotations and duplicate numbers); dead numeric argument removed from every `slide_number()` call.
- Added two slides: Hardening — fail-closed by default (v1.11.2 / v1.12.0:
fail-closed OIDC groups/iat, signer-level newline rejection, host list scoped
to the user's OIDC groups, bounded approval state, uniform DoS limits) and
Security limits — what we don't claim (the threat model's explicit
non-goals: sessions without a command firewall, behaviour as detection not
containment, no KRL, default-open
callers).
[v1.12.1] - 2026-06-10¶
Changed¶
- Documentation split: `HANDOFF.md` broken up by topic/reader. The 1,100-line HANDOFF (architecture + design decisions + runbook + PKI + pending + versioning + test plan, with the design decisions numbered out of order) is now:
  - `ARCHITECTURE.md` (new, EN): diagram, request flow, design decisions renumbered and regrouped by theme, sudo elevation mechanism.
  - `OPERATIONS.md` (new, EN): runbook covering startup, adding hosts, hot-reload, `broker-ctl`, PKI inventory, reference configs.
  - `CONTRIBUTING.md` (new, EN): branches, `X.Y.Z` versioning, the mandatory pre-commit living-docs checklist, language rule.
  - `HANDOFF.md` (reduced to ~145 lines, ES): current state, file tree, pending work, test-plan snapshot, resume notes, and a documentation index.
- `CODING_STYLE.md`: language table corrected (`CHANGELOG.md` is English since v1.9.3; new `*.md` docs are English; `HANDOFF.md` stays Spanish); the checklist now points to `CONTRIBUTING.md` for the workflow.
- `README.md`: added a Documentation index linking the new files.
Added¶
- `THREAT_MODEL.md` (new, EN): assets, actors/trust levels, trust boundaries and guarantees, and an explicit non-goals/gaps section (sessions without a command firewall, broker-asserted behavior subject, no KRL, no signer rate limit, in-memory single-instance state, default-open `callers`, CA custody).
- `SECURITY.md` (new, EN): supported versions, private vulnerability reporting, scope (links to the threat model), secret-handling notes.

Documentation only; no code changes.
[v1.12.0] - 2026-06-09¶
Added¶
- `ssh_list_servers` filtered by the end user's OIDC groups. The host list was served from the broker's cache (fetched with its own CN), so a group-restricted user saw every host even though the signer would deny signing on most of them. `GET /v1/hosts` now includes each host's RBAC `groups` (labels, not secrets), and `Engine.ServerInfos(caller)` filters by group intersection when the caller carries groups. Nil groups (stdio/mTLS) = full list (compatible); empty groups = no hosts.
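The nil-versus-empty distinction is easy to get wrong, so here is a minimal sketch of the filtering rule. The `host` type and `visibleHosts` function are hypothetical names for illustration and do not mirror `Engine.ServerInfos`.

```go
package main

import "fmt"

// host is a hypothetical, reduced view of a host entry with its RBAC groups.
type host struct {
	Name   string
	Groups []string
}

// visibleHosts applies the rule from the entry above:
// nil caller groups = full list; empty (non-nil) groups = no hosts;
// otherwise a host is visible when it shares at least one group with the caller.
func visibleHosts(hosts []host, callerGroups []string) []host {
	if callerGroups == nil {
		return hosts
	}
	allowed := make(map[string]bool, len(callerGroups))
	for _, g := range callerGroups {
		allowed[g] = true
	}
	var out []host
	for _, h := range hosts {
		for _, g := range h.Groups {
			if allowed[g] {
				out = append(out, h)
				break
			}
		}
	}
	return out
}

func main() {
	hosts := []host{{"web01", []string{"web"}}, {"db01", []string{"databases"}}}
	fmt.Println(len(visibleHosts(hosts, nil)))             // caller without groups
	fmt.Println(len(visibleHosts(hosts, []string{})))      // explicitly empty groups
	fmt.Println(len(visibleHosts(hosts, []string{"web"}))) // group-restricted caller
}
```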
Fixed¶
- `cmd/broker` hardening (A1/A2). The HTTP+mTLS one-shot frontend was missed by the v1.4.1 pass: `http.Server` now sets `ReadTimeout`/`IdleTimeout` (no `WriteTimeout`, because the response waits for the remote command) and `/v1/ssh_run` limits the request body to 64 KiB.
- Approval registry memory growth. `control.Registry` never deleted entries; expired/denied/consumed approvals accumulated for the lifetime of the control plane. Entries are now purged 2×TTL after creation (opportunistically on `Create`/`List`); a purged id answers 404 on later polls instead of 408/410.
- gofmt drift in `cmd/broker-ctl` and `internal/ssh/shell.go` (no behavior change).
[v1.11.2] - 2026-06-09¶
Security¶
- OIDC per-user RBAC is now fail-closed. With `groups_claim` configured, a token without the claim is rejected (401) instead of being accepted with no group restriction. Previously a claim-name typo, or an IdP that stopped emitting the claim, silently disabled per-user RBAC for every user (`EndUserGroups` nil = unrestricted in the signer). An explicitly empty groups list is still propagated as-is (denies every host). (`internal/oauth/verifier.go`)
- `iat` claim required when `max_token_age_seconds > 0`. A token without a numeric `iat` was previously exempted from the max-age check (fail-open); it is now rejected, since its age cannot be established. (`internal/oauth/verifier.go`)
- Newlines rejected in one-shot commands at the signer. A command containing `\n`/`\r` could smuggle extra command lines past regex command policies without `shell_parse` (an allowlist `^ps` also matches `"ps\nrm -rf /"`, and the remote shell executes both lines of the force-command). `PolicyTable.Resolve` now rejects such commands authoritatively on every host (local and remote mode); compose with `;` or `&&` instead. This also makes the long-documented API.md constraint real. (`internal/signer/signer.go`)
Fixed¶
- Documentation coherence pass (API.md, USAGE.md, HANDOFF.md). `ssh_list_servers` no longer documents `addr`/`user` fields it never returned; `ssh_session_open` returns `serial` (not `elevation_prefix`); `ssh_execute` documents `dry_run`; the 403 cause "TTL cap exceeded" removed (TTL is clamped, not rejected); session newline restriction scoped to `shell`/`pty`; USAGE examples updated to the English tool output (v1.9.3) and a multi-line heredoc example that the broker itself would reject replaced; HANDOFF duplicated architecture diagram block and stale "signer requires restart to reload" note fixed.
[v1.11.1] - 2026-06-09¶
Fixed¶
- `cmd/broker-ctl`: critical `command_policy` silent erasure bug. `host add --force` and `host remove` silently deleted `command_policy` from existing hosts because `hostEntry` lacked the field. Fixed by adding ``CommandPolicy json.RawMessage `json:"command_policy,omitempty"` `` to `hostEntry`, which preserves the raw JSON verbatim through any round-trip without broker-ctl needing to understand the internal policy structure. When `--force` is used without any policy flag, the existing `CommandPolicy` is copied to the updated entry. When policy flags are explicitly set, a new `CommandPolicy` is built from them (replacing the old one).
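The `json.RawMessage` technique can be seen in a small round-trip sketch. Only the `CommandPolicy` field comes from the entry above; the `Addr` field and the `roundTrip` helper are assumptions added for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hostEntry is a reduced, hypothetical version of the CLI's struct.
// CommandPolicy is kept as raw bytes, so the CLI never has to model it.
type hostEntry struct {
	Addr          string          `json:"addr"`
	CommandPolicy json.RawMessage `json:"command_policy,omitempty"`
}

// roundTrip decodes and re-encodes an entry, as a config rewrite would.
func roundTrip(in []byte) ([]byte, error) {
	var h hostEntry
	if err := json.Unmarshal(in, &h); err != nil {
		return nil, err
	}
	return json.Marshal(h)
}

func main() {
	in := []byte(`{"addr":"10.0.0.5:22","command_policy":{"mode":"allowlist","allow":["^ps"]}}`)
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Without the field, `json.Unmarshal` silently drops the unknown key and the rewrite loses the policy, which is the erasure described above.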
Added¶
- `broker-ctl host add`: command policy flags. New flags: `--policy-mode` (`allowlist`|`denylist`|`off`), `--allow`, `--deny`, `--require-approval`, `--shell-parse`. Internally uses `buildCommandPolicyJSON` and `commandPolicyLabel` helpers.
- `broker-ctl host list`: additional columns. The table now shows `JUMP`, `SRC_ADDR`, `SUDO_USERS`, `CALLERS`, and `POLICY` (a short label such as `allowlist(2)` or `denylist(1)` derived from `command_policy`). The `—` placeholder is used for empty/absent fields.
- `broker-ctl ca-keys add/list/remove`: new subcommand group to manage the `ca_keys` map in `signer.json`.
  - `add --name <n> --type pem --path <f>`: adds a PEM-backed entry.
  - `add --name <n> --type akv --vault-url <u> --key-name <k>`: adds an AKV entry.
  - `list`: tabular view (NAME / TYPE / DETAIL).
  - `remove <name>`: removes an entry.

  All operations preserve all other fields in `signer.json` (atomic write via `.tmp` rename).
- `broker-ctl callers add/list/remove`: new subcommand group to manage the top-level `callers` RBAC table in `signer.json`.
  - `add --name <cn> --groups <g1,g2>`: adds or updates a caller entry.
  - `list`: tabular view (NAME / ALLOWED_GROUPS).
  - `remove <cn>`: removes a caller entry.
- `writeRaw` internal helper: shared atomic JSON write used by `writeHosts`, `writeCAKeys`, and `writeCallers`.
Changed¶
- `cmd/broker-ctl`: `action` variable logic corrected. The "added" vs "updated" detection in `host add` now checks existence before the map assignment instead of after (the previous code always reported "updated" when `--force` was used).
Tests¶
- `cmd/broker-ctl`: 29 cases (up from 13). New tests added with `t.Parallel()`: `TestCommandPolicyLabel`, `TestBuildCommandPolicyJSON*`, `TestExtractCAKeys*`, `TestCAKeysRoundTrip`, `TestCAKeysRemoveRoundTrip`, `TestExtractCallers*`, `TestCallersRoundTrip`, `TestCallersEmptyGroupsSerialisedAsArray`, `TestCommandPolicyPreservedOnForce`, `TestCommandPolicyErasedWhenPolicyFlagsSet`, `TestCommandPolicyNilWhenHostHasNone`.
- Total test count: 185 (up from 170).
[v1.11.0] - 2026-06-09¶
Added¶
- Multi-CA support + Azure Key Vault (AKV) backend for CA keys. The CA signing key is no longer limited to a local PEM file; any group of hosts can now use a dedicated CA key, and each key can be stored in Azure Key Vault (private key never leaves AKV).
#### New config field ca_keys (signer.json and config.json)
"ca_keys": {
"_default": { "type": "akv", "vault_url": "https://vault.azure.net", "key_name": "ssh-ca" },
"prod-web": { "type": "akv", "vault_url": "https://vault.azure.net", "key_name": "ssh-ca-web" },
"databases": { "type": "pem", "path": "pki/db_ca" }
}
"_default" overrides the legacy ca_key string when present.
- All other keys map group names to their CA. The first group in a host's
groups field that has an entry in ca_keys wins; other hosts fall back
to the default CA. Backward compatible: existing ca_key string configs
require no changes.
- Supported types: "pem" (local PEM file; emits a warning) and "akv"
(Azure Key Vault — RSA 2048/3072/4096 and EC P-256/P-384/P-521; Ed25519
is not supported by AKV).
- A 30-second startup timeout covers all AKV GetKey calls.
- ca_keys participates in hot-reload (SIGHUP / POST /v1/reload).
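The first-match rule for picking a CA can be sketched as follows. `caNameFor` is a hypothetical helper that only returns the name of the selected `ca_keys` entry; the real selection is described above as `caKeyFor(hp)` and is not reproduced here.

```go
package main

import "fmt"

// caNameFor returns the ca_keys entry a host would use: the first of the
// host's groups that has an entry wins, otherwise the default CA applies.
// The map values are irrelevant for the selection and are left as empty structs.
func caNameFor(hostGroups []string, caKeys map[string]struct{}) string {
	for _, g := range hostGroups {
		if _, ok := caKeys[g]; ok {
			return g
		}
	}
	return "_default"
}

func main() {
	caKeys := map[string]struct{}{"_default": {}, "prod-web": {}, "databases": {}}

	// Order of the host's groups decides, not the order of ca_keys.
	fmt.Println(caNameFor([]string{"ops", "databases", "prod-web"}, caKeys))
	fmt.Println(caNameFor([]string{"ops"}, caKeys))
}
```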
#### New packages / files
- internal/ca/loader.go — CAKeyConfig struct, LoadCA(ctx, cfg),
LoadGroupCAs(ctx, caKey, caKeys) (shared helper used by both
cmd/signer and internal/broker).
- internal/ca/akv.go — akvSigner (crypto.Signer backed by AKV);
akvKeyOps interface (enables mock-based unit tests without a real vault);
rawECSignatureToDER converter (AKV returns raw R‖S, SSH needs DER);
parseAKVPublicKey (JWK → Go crypto key).
- internal/ca/loader_test.go — unit tests for LoadCA / LoadGroupCAs.
- internal/ca/akv_test.go — full mock-based unit tests: EC P-256,
RSA-2048, DER conversion, algorithm selection, end-to-end BuildAndSign.
#### Modified files
- internal/ca/sign.go — BuildAndSign accepts ctx context.Context
as first parameter; context cancellation is checked before signing.
- internal/signer/signer.go — Local gains groupCAs map and
caKeyFor(hp) (first-match group selection). NewLocalWithGroupCAs
constructor added. SignIntent selects the correct CA per host.
- cmd/signer/main.go — Config.CAKeys; buildState uses
ca.LoadGroupCAs and signer.NewLocalWithGroupCAs.
- internal/broker/engine.go — Config.CAKeys; HostConfig.Groups
(propagated to signer.HostPolicy); buildSigner uses ca.LoadGroupCAs
and signer.NewLocalWithGroupCAs.
- signer.example.json / config.example.json — documented
ca_keys block with PEM and AKV examples; groups field added to
example hosts in config.example.json.
#### Azure Key Vault notes
- Authentication: DefaultAzureCredential by default (managed identity,
workload identity, AZURE_* env vars, Azure CLI). Override with
tenant_id + client_id + client_secret_env in CAKeyConfig.
- AKV EC signatures arrive as raw R‖S bytes; akvSigner converts to DER
before returning from crypto.Signer.Sign.
- Recommended key type: EC P-256 (256-bit curve, roughly 128-bit security). RSA
3072 is also supported for compliance environments.
- Ed25519 is NOT supported by AKV; use "pem" for Ed25519 CA keys.
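The raw R‖S to DER conversion mentioned in these notes can be sketched with the standard library. `rawToDER` is a hypothetical name and a simplified stand-in for the project's converter; it assumes the two halves of the input are equal-length big-endian integers.

```go
package main

import (
	"encoding/asn1"
	"errors"
	"fmt"
	"math/big"
)

// rawToDER converts a fixed-width R‖S ECDSA signature into the ASN.1 DER
// SEQUENCE { INTEGER r, INTEGER s } form.
func rawToDER(raw []byte) ([]byte, error) {
	if len(raw) == 0 || len(raw)%2 != 0 {
		return nil, errors.New("raw signature must have an even, non-zero length")
	}
	half := len(raw) / 2
	sig := struct{ R, S *big.Int }{
		R: new(big.Int).SetBytes(raw[:half]),
		S: new(big.Int).SetBytes(raw[half:]),
	}
	return asn1.Marshal(sig)
}

func main() {
	// A P-256 signature is 64 raw bytes; here R=1 and S=2 for readability.
	raw := make([]byte, 64)
	raw[31], raw[63] = 1, 2
	der, err := rawToDER(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", der)
}
```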
[v1.10.0] - 2026-06-09¶
Added¶
- Session recording in ASCIIcast v2 format. When `session_recording_dir` is set in `config.json`, `shell` and `pty` sessions are recorded to `.cast` files in that directory. One file per session: `<session_id>.cast`.
- `internal/recording/recorder.go`: new `Recorder` type (thread-safe). Writes ASCIIcast v2 JSONL: a header with session metadata (`session_id`, `caller`, `host`, `serial`, `started_at`) plus event lines `[delta, type, data]` where type is `"i"` (stdin), `"o"` (stdout/PTY), or `"e"` (stderr). Deltas are in seconds from session start.
- Stdin captured (`"i"` events): the command typed by the agent is recorded before being written to the shell's stdin channel.
- Stdout/PTY captured (`"o"` events): each output line is teed to the recorder inside `ShellSession.Exec()`.
- Stderr captured (`"e"` events, non-PTY mode only): the `syncBuf` stderr drain tees bytes to the recorder as they arrive.
- File naming correlates directly with the broker audit log: the `session_id` field in `session_open`/`session_exec`/`session_close` audit entries matches the `.cast` filename, making the audit log the search index.
- Files are written with `0o600` permissions (owner read/write only).
- `internal/recording/recorder_test.go`: 8 test cases: header fields, event types, delta monotonicity, concurrent writes, empty-data skipping, idempotent close, write-after-close no-op, default dimensions.
- `session_recording_dir` config field in `broker.Config` (JSON: `session_recording_dir`). Empty or absent = recording disabled.
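An event line in this format is a three-element JSON array. The sketch below shows one way to produce it with `encoding/json`; `eventLine` is a hypothetical helper, not the `Recorder` API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// eventLine encodes one recording event as [delta, type, data], where delta
// is the number of seconds since the session started.
func eventLine(deltaSeconds float64, kind, data string) ([]byte, error) {
	return json.Marshal([]any{deltaSeconds, kind, data})
}

func main() {
	start := time.Now()
	// ... the session produces some output ...
	line, err := eventLine(time.Since(start).Seconds(), "o", "hello\r\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(line))
}
```

Because each event is a self-contained JSON value on its own line, a recording can be appended to incrementally and still be read if the session ends abruptly.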
Changed¶
- `internal/ssh/shell.go`: `ShellSession` and `syncBuf` gain an optional `recorder *recording.Recorder` field; new `SetRecorder()` method propagates it to both stdout and stderr tee points.
- `internal/broker/session.go`: `liveSession` gains `recorder` field; recorder is opened in `OpenSession` (when configured), closed in `CloseSession` and the session reaper.
- `config.example.json`: documents `session_recording_dir`.
- `USAGE.md`: new §8 "Session recording" with setup, file format, playback, and storage management.
- `API.md`: session recording note added to the persistent sessions section.
- `HANDOFF.md`: recording marked as implemented; design decision #18 added.
[v1.9.3] - 2026-06-09¶
Changed¶
- All Go source comments, error messages, flag descriptions, and user-visible strings translated from Spanish to English across all packages and binaries (internal/, cmd/, lab/). No behaviour change.
- `signer.sh` echo strings and inline comments translated to English.
- `_comment` fields in all example JSON config files translated to English.
- `CODING_STYLE.md` section 10 updated: English is now required for all Go comments (including legacy code); the previous "do not change on refactors" exception is removed.
[v1.9.2] - 2026-06-09¶
Added¶
- `shell_parse` field in `CommandPolicy`: when `shell_parse: true`, the command is parsed as POSIX sh via `mvdan.cc/sh/v3/syntax` before regex evaluation. Each simple command in a pipeline or sequence (`&&`, `||`, `;`, `|`) is evaluated independently against the policy, preventing bypasses such as `ps aux && kill -9 1000` from passing an allowlist that only covers `ps`.

  Dangerous AST nodes are rejected unconditionally regardless of configured rules: `CmdSubst` (`$(...)`), `ProcSubst` (`<(...)`), `ArithmCmd` (`$((...))`) and file redirects (`>`, `>>`, `<`). fd-to-fd redirections (`2>&1`) are allowed.

  Backward compatible: `shell_parse` defaults to `false`, preserving existing behavior for all operators that do not explicitly enable it.
- `mvdan.cc/sh/v3 v3.13.1` added as a direct dependency.
Changed¶
- `API.md`: `command_policy` field description updated to document `shell_parse`.
- `signer.example.json`: `web02` example updated with `shell_parse: true`.
- `HANDOFF.md`: design decision #17 updated with implementation details and reference configuration patterns.
[v1.8.0] - 2026-06-08¶
Added¶
- Microsoft Teams notifier (`notifier: "teams"`). The control plane can now send approval-required notifications to a Microsoft Teams channel via an Incoming Webhook (Power Automate Workflow) or a legacy M365 Connector, formatted as a rich card instead of raw JSON.
- `internal/control/teams.go`: `TeamsNotifier` implementing the existing `Notifier` interface. Two payload formats supported and configurable:
  - `"workflow"` / `"adaptivecard"` (default, recommended): Adaptive Card v1.4 wrapped in the Power Automate Workflow message envelope (`{"type":"message","attachments":[...]}`). Compatible with the "When a Teams webhook request is received" trigger.
  - `"messagecard"` (legacy): MessageCard format for tenants still using the M365 Connectors classic mechanism (Microsoft is retiring this format).
- Both formats include a `FactSet`/`facts` section with: approval ID, status, created timestamp, host, command, caller (broker CN), end user (if present), elevation target (if sudo), and policy rule (if matched).
- `approval_url_template`: new optional config field. When set (e.g. `"https://approvals.example.com/requests/{id}"`), a "View request" button (`Action.OpenUrl`/`OpenUri`) is added to the card. Designed as a forward-compatible hook for the Phase 2 approval bridge (`cmd/approval-bridge`, not yet implemented). Leave empty until the bridge is deployed.
- The card never contains the ephemeral public key or any `WireRequest` internal field (the `req` field of `Approval` is unexported and excluded from serialization by design).
- `NewTeamsNotifier(url, format, approvalURLTemplate string)`: constructor; empty `format` defaults to `"workflow"`.
- Config fields in `approval` block (`control-plane.json`):
  - `"notifier": "teams"`: selects the Teams notifier (reuses `webhook_url` as target).
  - `"teams_format": "workflow"`: card format (`"workflow"` default, `"messagecard"` legacy).
  - `"approval_url_template": ""`: optional URL with `{id}` placeholder.
- `internal/control/teams_test.go`: 18 test cases covering both card formats, fact presence (host/command/caller/end-user/elevation/rule), approval URL template (substitution, presence/absence of action buttons per format), security (no pubkey leak), HTTP error handling (4xx/5xx), and minimal approval (no optional fields).
- Design document for the Phase 2 approval bridge (HANDOFF.md, design decision #15): records the architecture for future bidirectional Teams approval (bot + bridge pattern), multi-notifier config (`notifiers: [...]`), multi-channel approval (`approval_channels: [...]`), and trade-off analysis (Options A/B/C).
Changed¶
- `cmd/control-plane/main.go`: `Config.Approval` struct gains `TeamsFormat` and `ApprovalURLTemplate` fields; notifier selection is now a `switch` statement (extensible) instead of a single `if`.
- `control-plane.example.json`: `_approval_comment` updated to document `"teams"`, `teams_format`, and `approval_url_template`; new fields added with empty defaults.
- `API.md`: new section "Outbound Notifications — Notifier contracts" documenting the payload format for all notifiers, the Adaptive Card and MessageCard schemas, the fact table, the `approval_url_template` field, and the security guarantee.
Added (test coverage)¶
- Test suite — high-priority coverage (3 new test files, 47 new test cases).
internal/audit/log_test.go — first direct tests for the cryptographic audit chain (previously untested despite being the most security-critical component):
- Append: sequential Seq increment, correct PrevHash chaining (SHA-256 of previous raw line), valid Ed25519 signature per entry, signature invalidation after field tampering.
- restoreChain: new path (seq=0, prevHash=""), empty file, chain continuity across process restart (3 entries → close → reopen → 2 more entries → intact 5-entry chain), error on malformed last line.
- maybeRotate: rotation fires when maxFileSize=1, rotated file exists, new log restarts at Seq=1/PrevHash=""; rotation disabled when maxFileSize=0.
- Close: no error on normal close.
internal/broker/session_test.go — first direct tests for session management and the two security fixes applied in v1.4.1:
- sessionManager: add/get/remove happy paths, get updates lastUsed, missing-ID behaviour, global limit (maxSessionsGlobal=200), per-caller limit (maxSessionsPerCaller=20), reaper evicts idle sessions and fires onReap.
- C1 (ownership enforcement): SessionExec and CloseSession reject callers that do not own the session; session is not deleted on unauthorized close.
- M5 (newline injection): SessionExec rejects commands containing \n/\r in shell and pty modes; exec mode is unaffected.
- Internal helpers: buildElevatedExecCommand, shellQuoteSession (including single-quote escaping), elevationLabelFromPrefix, newSessionID uniqueness.
cmd/broker-ctl/main_test.go — first tests for the CLI verification logic and utility helpers:
- verifyLog: intact chain passes without --key; intact chain + correct signatures pass with --key; wrong key detects invalid signatures; seq gap detected; wrong prev_hash detected; tampered field (Caller altered post-signing) detected; empty log passes cleanly.
- lastNLines: ring buffer returns last N lines; requests larger than total return all; missing file errors correctly.
- parseAuditTime: RFC3339 and YYYY-MM-DD accepted; invalid formats rejected.
- splitComma, boolStr, auditDetail: all branches covered.
[v1.7.0] - 2026-06-08¶
Added¶
- Behavioral guardrails + rate limiting (Phase C). The control plane now tracks each agent's behavior and flags deviations — statistical/rule-based, no ML:
- Anomalies: request-rate spike, a host the agent has never used before, and a command outside its history (fingerprint = first token). The first request for a subject establishes the baseline (not flagged).
- Subject: the end-user OIDC identity when present, otherwise the broker CN.
- Modes (`behavior.mode` in `control-plane.json`): `off` (default) · `observe` (audits anomalies, never blocks) · `enforce` (anomalies escalate to human approval, reusing Phase B; rate-limit excess is denied with `429`).
- Rate limiting per subject (`behavior.rate_limit_per_min`) falls out of the same tracker.
- Implemented in `internal/control/behavior.go` (`BehaviorTracker`); wired into the control plane `/v1/sign` handler before forwarding.
- Audit (control plane): new `anomaly` field; new outcomes `anomaly` (observe) and `rate-limited` (enforce). Behavior escalations are audited as `approval-required` with `policy_rule="behavior"` and the anomaly list.
- `control-plane.example.json` documents the `behavior` block.
Changed¶
- Control plane `/v1/sign`: dry-run requests now bypass the behavior gate and rate limit (they execute nothing). `requireApproval` is now a shared helper used by both command-policy and behavior escalations.
[v1.6.0] - 2026-06-06¶
Added¶
- Control plane (`cmd/control-plane`): human-in-the-loop approval (Phase B). A new service sits between the broker and the signer (`broker → control-plane → signer`), enforcing approval of commands the command policy marks `require_approval`, without holding the CA key (zero-trust PEP/PDP split). Flow is asynchronous (no held connections):
  - Broker `POST /v1/sign` → control plane forwards to the signer.
  - If the command needs approval, the signer returns no certificate; the control plane creates a request, notifies out-of-band, and responds `202 {approval_id}`.
  - Broker polls `GET /v1/sign/result/{id}`.
  - A human approves via `broker-ctl approval allow <id>` → `POST /v1/approvals/{id}`.
  - The next poll re-signs with `approved=true` and returns the certificate. One approval mints exactly one certificate.
- Approval is unavoidable. The signer enforces the gate: a `require_approval` command is not issued unless `approved=true`, and `approved` (like `on_behalf_of`) is honoured only from `trusted_forwarders` (the control plane's CN). A broker going direct to the signer cannot self-approve.
- Identity propagation + CN pinning. `signer.json` gains `trusted_forwarders`. The control plane forwards the broker's identity via `on_behalf_of` (body, `/v1/sign`) and `X-On-Behalf-Of` (header, `/v1/hosts`); the signer honours it only from trusted forwarders, preserving per-broker RBAC through the proxy.
- Notifiers: `log` (default; pair with `broker-ctl approval list`) and `webhook` (POST JSON, Slack-compatible).
- `broker-ctl approval list|allow|deny` subcommands (mTLS to the control plane).
- Broker config `signer.approval_wait_seconds`: how long the broker waits on a `202` before giving up.
- Audit (control plane, own chained log): outcomes `forwarded`, `approval-required`, `approval-granted`, `approval-denied`, `approval-timeout`, `approval-decision-allow`; new entry fields `approval_id`, `approved_by`.
- New `control-plane.example.json`; `signer.example.json` documents `trusted_forwarders`; `config.example.json` shows pointing the broker at the control plane with `approval_wait_seconds`.
Changed¶
- `signer.Remote` now handles a `202` response by polling the approval result; `Remote.FetchHosts` takes an `onBehalfOf` argument (broker passes `""`).
- `WireRequest` gains `dry_run` (Phase A), `on_behalf_of`, and `approved` fields; `Issued`/`WireResponse` may carry no certificate when approval is pending.
[v1.5.0] - 2026-06-06¶
Added¶
- AI-action firewall: command-level policy (Phase A). Hosts may now declare a `command_policy` (in `signer.json` for external mode, or in the broker's `config.json` for local mode) that restricts which commands a one-shot `ssh_execute` may run:
  - `mode: "allowlist"`: the command must match at least one `allow` regex.
  - `mode: "denylist"`: the command must not match any `deny` regex.
  - `require_approval: [...]`: regexes marking commands that will require human approval (orchestrated by the control plane in Phase B; the signer surfaces the flag).
  - Enforcement is authoritative for one-shot (the signer bakes the command into the cert's `force-command`; a compromised broker cannot evade it). Rules are RE2 regexes (linear time, no catastrophic backtracking).
  - Hosts with any `command_policy` rule reject persistent sessions (the command is not verifiable at signing time).
  - Implemented in new `internal/signer/cmdpolicy.go` (shared library) + `HostPolicy.CommandPolicy`; `PolicyTable.Resolve` now returns a richer `Decision` struct.
- Dry-run / simulation mode. New `dry_run` parameter on `ssh_execute`: resolves host policy (allow/deny + whether approval would be required) and returns the decision without connecting or executing. Lets the model preview an action before committing. Threaded through `Intent.DryRun` → `WireRequest.dry_run` → `WireResponse.decision`; the broker short-circuits before dialing.
- Audit: new `policy_rule` and `dry_run` fields on audit entries; new outcomes `dry_run_allowed` / `dry_run_denied`.
- `signer.example.json`: `web02` now demonstrates a `command_policy` (allowlist + `require_approval`).
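The allowlist/denylist semantics can be sketched with Go's RE2-based `regexp` package. The `policy` type, `decide` method, and `compile` helper are hypothetical and much simpler than `internal/signer/cmdpolicy.go`; treating any other mode as deny is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// policy is a hypothetical, simplified shape of a command_policy block.
type policy struct {
	Mode            string
	Allow           []*regexp.Regexp
	Deny            []*regexp.Regexp
	RequireApproval []*regexp.Regexp
}

func matchAny(res []*regexp.Regexp, cmd string) bool {
	for _, re := range res {
		if re.MatchString(cmd) {
			return true
		}
	}
	return false
}

// decide returns whether the command is permitted and, if so, whether it
// additionally needs human approval. Unknown modes deny (sketch assumption).
func (p policy) decide(cmd string) (allowed, needsApproval bool) {
	switch p.Mode {
	case "allowlist":
		allowed = matchAny(p.Allow, cmd)
	case "denylist":
		allowed = !matchAny(p.Deny, cmd)
	}
	return allowed, allowed && matchAny(p.RequireApproval, cmd)
}

func compile(patterns ...string) []*regexp.Regexp {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		out = append(out, regexp.MustCompile(p))
	}
	return out
}

func main() {
	p := policy{
		Mode:            "allowlist",
		Allow:           compile(`^ps\b`, `^systemctl (status|restart) nginx$`),
		RequireApproval: compile(`^systemctl restart\b`),
	}
	for _, cmd := range []string{"ps aux", "systemctl restart nginx", "rm -rf /"} {
		ok, approval := p.decide(cmd)
		fmt.Printf("%q allowed=%v approval=%v\n", cmd, ok, approval)
	}
}
```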
Changed¶
- `PolicyTable.Resolve` signature changed from `(ca.Constraints, string, error)` to `(Decision, error)` (internal API; all call sites and tests updated).
[v1.4.6] - 2026-06-05¶
Added¶
- `cmd/broker-ctl`: `audit` subcommand with three sub-subcommands:
  - `audit tail --log <path> [-n N]`: streams new audit log entries in real time (polls every 500 ms, handles log rotation by size decrease); shows last N lines before following.
  - `audit show --log <path> [--host] [--caller] [--outcome] [--serial] [--since] [--limit] [--json]`: searches and filters audit entries; `--json` emits raw JSON lines compatible with `jq`.
  - `audit verify --log <path> [--key seed]`: verifies SHA-256 hash chain integrity; optionally verifies Ed25519 signatures when `--key` is provided. Exits 1 and prints affected sequence numbers on failure.
- `USAGE.md` §7 "Reviewing audit logs": live tail usage, filter examples, `jq` pipelines for correlation by `serial`, `verify` examples with and without `--key`, and full audit entry field reference table.
- `HANDOFF.md`: broker-ctl section expanded with all `audit` subcommand examples (tail, show, show --json, verify with/without key).
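The kind of hash-chain check that `audit verify` performs can be sketched as below. This is an illustration under stated assumptions rather than the CLI's code: the `entry` type and `verifyChain` function are hypothetical, the chain hash is assumed to be the hex-encoded SHA-256 of the previous raw line, and the first record is assumed to carry `seq` 1 with an empty `prev_hash`. Signature verification is left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// entry holds only the two chain fields of an audit record (hypothetical shape).
type entry struct {
	Seq      uint64 `json:"seq"`
	PrevHash string `json:"prev_hash"`
}

// hashLine returns the hex SHA-256 of a raw log line.
func hashLine(line string) string {
	sum := sha256.Sum256([]byte(line))
	return hex.EncodeToString(sum[:])
}

// verifyChain returns the index of the first line that breaks the chain
// (unparseable, seq gap, or wrong prev_hash), or -1 when the chain is intact.
func verifyChain(lines []string) int {
	prev := ""
	for i, line := range lines {
		var e entry
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			return i
		}
		if e.Seq != uint64(i+1) || e.PrevHash != prev {
			return i
		}
		prev = hashLine(line)
	}
	return -1
}

func main() {
	l1 := `{"seq":1,"prev_hash":""}`
	l2 := `{"seq":2,"prev_hash":"` + hashLine(l1) + `"}`
	fmt.Println(verifyChain([]string{l1, l2}))                           // intact chain
	fmt.Println(verifyChain([]string{l1, `{"seq":2,"prev_hash":"00"}`})) // tampered link
}
```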
[v1.4.5] - 2026-06-05¶
Added¶
- `USAGE.md`: practical usage guide for all five MCP tools (`ssh_list_servers`, `ssh_execute`, `ssh_session_open`, `ssh_session_exec`, `ssh_session_close`). Covers one-shot commands, persistent sessions (exec/shell/pty modes), sudo escalation, PTY usage, common operational patterns, error handling, and a quick-reference table.
- `HANDOFF.md`: added mandatory `USAGE.md` update rule (step 4 in "Mandatory pre-commit checklist"); the guide must be updated when a tool is added, removed, renamed, or its parameters/behaviour change.
[v1.4.4] - 2026-06-05¶
Added¶
- `API.md`: new dedicated API reference document covering all HTTP endpoints across all three services: signer (`POST /v1/sign`, `GET /v1/hosts`, `POST /v1/reload`), broker HTTP (`POST /v1/ssh_run`), and MCP HTTP (`GET /.well-known/oauth-protected-resource` + Streamable HTTP tools). Each endpoint documents auth requirements, request/response schemas, error codes, and audit outcomes. Includes audit log field reference, outcome value table, `jq` correlation examples, and Ed25519 chain integrity description.
- `README.md`: `## API Reference` section replaced with a summary table + link to `API.md`.
- `HANDOFF.md`: added mandatory `API.md` update rule (step 3 in pre-commit checklist) and English language rule for all new commit messages, documentation files, and code comments.
Changed¶
- `README.md`: full rewrite in English. All sections translated; reorganized to match current feature set (v1.4.3).
[v1.4.3] - 2026-06-05¶
Added¶
- `README.md`: section `## Client-to-broker authentication` with a comparison table of the three frontends, step-by-step OAuth2/OIDC flow, and identity propagation diagram to the signer.
- `README.md`: section `## Broker-to-SSH-server authentication` with diagrams of ephemeral key-pair generation, certificate signing by the signer (cert fields: principal, TTL, source-address, force-command, permit-pty), SSH handshake with pinned host key verification, sshd checks, and ProxyJump flow with independent cert per hop.
Fixed¶
- `README.md`: added missing `## Testing` header above the lab bash block.
[v1.4.2] - 2026-06-05¶
Added¶
- `README.md`: section "Registering the MCP in OpenCode" with the correct config for `~/.config/opencode/opencode.json` (`type: "local"`, `command` as array).
[v1.4.1] - 2026-06-05¶
Security¶
- C1 (critical) `internal/broker/session.go`: `SessionExec` and `CloseSession` verify that the caller owns the session before operating; `CloseSession` performs get-before-delete to avoid removing sessions owned by other callers.
- A1 (high) `cmd/signer/main.go`, `cmd/mcp-broker-http/main.go`: `ReadTimeout`, `WriteTimeout` (signer only), `IdleTimeout` on `http.Server`.
- A2 (high) `cmd/signer/main.go`, `internal/signer/remote.go`: `http.MaxBytesReader(64 KiB)` on `/v1/sign`; `io.LimitReader(1 MiB)` on both `io.ReadAll` calls in `remote.go`.
- A3 (high) `internal/ssh/run.go`, `internal/ssh/shell.go`: `defaultExecTimeout=10 min`; `maxOutputBytes=10 MiB`; `limitedWriter`; `session.Signal(SIGTERM)` on timeout; shell/pty silently discards excess bytes.
- A4 (high) `internal/audit/log.go`: `restoreChain()` with `bufio.Scanner` (256 KiB buffer) restores `seq` + `prevHash` from the last record on restart; without this fix the broker broke the audit chain on every restart.
- M1 (medium) `internal/broker/engine.go`, `cmd/signer/main.go`: `auditLog.Append` errors are no longer silenced with `_ =`; logged via `log.Printf`.
- M2 (medium) `internal/broker/session.go`: `maxSessionsGlobal=200`, `maxSessionsPerCaller=20`; `sessionManager.add()` returns `error`.
- M3 (medium) `internal/oauth/verifier.go`, `internal/broker/engine.go`, `cmd/mcp-broker-http/main.go`: `MaxTokenAge` field in `Config`/`Verifier`; validates the `iat` claim when `maxTokenAge > 0`; `OAuthConfig.MaxTokenAgeSeconds` (recommended: 3600).
- M5 (medium) `internal/broker/session.go`: `SessionExec` rejects commands containing `\n` or `\r`.
- L1 (low) `internal/ca/sign.go`: `LoadCAFromPEM` emits a `[WARN]` at runtime indicating lab-only use.
- L2 (low) `internal/audit/log.go`: `maybeRotate()` rotates the audit file when it exceeds 100 MiB, renaming to `<path>.20060102T150405Z`.
- L4 (low) `internal/mcpserver/tools.go`: `validateInput()` limits all input fields to 64 KiB and rejects null bytes; called in all 4 tool handlers before reaching the engine.
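The input gate from L4 can be approximated with a small per-field check. `validateField` and `maxFieldBytes` are hypothetical names; the real `validateInput()` signature is not shown in this changelog.

```go
package main

import (
	"fmt"
	"strings"
)

// maxFieldBytes mirrors the 64 KiB per-field limit stated above.
const maxFieldBytes = 64 << 10

// validateField rejects oversized values and values containing a null byte.
func validateField(name, value string) error {
	if len(value) > maxFieldBytes {
		return fmt.Errorf("%s: %d bytes exceeds the %d-byte limit", name, len(value), maxFieldBytes)
	}
	if strings.IndexByte(value, 0) >= 0 {
		return fmt.Errorf("%s: contains a null byte", name)
	}
	return nil
}

func main() {
	fmt.Println(validateField("command", "uptime"))
	fmt.Println(validateField("command", "uptime\x00; id"))
	fmt.Println(validateField("command", strings.Repeat("a", maxFieldBytes+1)))
}
```

Running the check in every tool handler before the engine is reached keeps the limit uniform, which is the property the v1.38.0 fix at the top of this changelog restores for the k8s tools.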
[v1.4.0] - 2026-06-04
Added
- Remote MCP frontend `cmd/mcp-broker-http`: Streamable HTTP + OAuth2/OIDC (RFC 9728 + Authorization Code + PKCE).
- Local OIDC bearer token validation against the issuer's JWKS (`go-oidc`): no round-trip per request, no `client_secret`.
- OIDC identity (`user_claim`, e.g. `preferred_username`) as `Caller.ID` in the broker's audit log.
- Per-end-user RBAC: when the token carries `groups_claim`, the groups are propagated to the signer as `EndUserGroups`; the signer requires `hp.Groups ∩ EndUserGroups ≠ ∅` (in addition to mTLS CN RBAC).
- `/.well-known/oauth-protected-resource` (RFC 9728) for Authorization Server discovery by the MCP client.
- `internal/mcpserver`: tools extracted to a shared package; both frontends (stdio and HTTP) use the same `Register(eng, callerFn)`.
- `internal/oauth/verifier.go`: `NewVerifier` + `Verify` with `UserID`, `Scopes`, and groups extraction; tests with fake OIDC IdP (`httptest` + `go-jose` RSA).
- `internal/auth/mtls.go`: `ServerTLSConfigNoClientAuth` for the HTTP+OAuth frontend (TLS without mTLS).
- `OAuthConfig` and `ResourceURL` in `broker.Config`; injectable `CallerFunc` in `mcpserver.New`.
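
The per-end-user RBAC condition `hp.Groups ∩ EndUserGroups ≠ ∅` amounts to a non-empty set intersection. A minimal sketch, assuming both sides are plain string slices; the function name `groupsIntersect` is hypothetical, and the changelog does not say how the signer behaves when no groups claim is configured:

```go
package main

import "fmt"

// groupsIntersect reports whether the host's groups and the end user's groups
// share at least one entry, i.e. whether their intersection is non-empty.
// The name and signature are assumptions made for this sketch.
func groupsIntersect(hostGroups, endUserGroups []string) bool {
	allowed := make(map[string]struct{}, len(hostGroups))
	for _, g := range hostGroups {
		allowed[g] = struct{}{}
	}
	for _, g := range endUserGroups {
		if _, ok := allowed[g]; ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(groupsIntersect([]string{"ops", "dba"}, []string{"dev", "dba"})) // true
	fmt.Println(groupsIntersect([]string{"ops"}, nil))                           // false
}
```

In this sketch an empty group list on either side fails the check, so it denies by default.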
Changed
- MCP tool descriptions improved to reduce model errors:
  - `ssh_execute` and `ssh_session_open`: explicit guidance not to retry when `allow_sudo`/`allow_pty` is false.
  - `executeOutput`: documented `exit_code` (command failure ≠ tool error), `stderr` (empty with pty), and `serial` (audit only).
  - `ttl_seconds`: clarified as optional; the host policy maximum is used when omitted.
  - Cross-reference `ssh_execute` vs `ssh_session_open`: when to prefer each.
  - `ssh_session_open`/`ssh_session_close`: warning to always close the session.
  - `ssh_session_exec`: documents state persistence by mode.
  - `ssh_list_servers`: explains what `allow_sudo`/`allow_pty` false implies.
  - `sessionOpenInput.mode`: describes the three modes with concrete use cases.
- MCP server `Implementation` version synchronised: `0.2.0` → `1.2.0`.
[v1.2.0] - 2026-06-04
Added
- `ssh_list_servers` now returns per-host capabilities: `allow_sudo`, `allow_pty`, and `jump`, so the model can choose the correct execution strategy without attempting and failing.
- `GET /v1/hosts` from the signer includes `allow_sudo` and `allow_pty` in the response (`WireHostInfo`).
- `HostInfo` and `ServerInfo` (broker internal) propagate `AllowSudo`/`AllowPTY` from both modes (local and remote).
- `ssh_execute` and `ssh_session_open` descriptions updated to instruct the model to check capabilities before using `sudo`/`pty`.
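
As an illustration of the capability flags on the wire, the sketch below serialises a host entry and picks a session mode from it. Only the JSON keys `allow_sudo` and `allow_pty` come from the entries above; the `name` key, the struct layout, and the `pickMode` helper are assumptions and may not match the real `WireHostInfo`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wireHostInfo is a hypothetical stand-in for the signer's WireHostInfo.
// Only allow_sudo and allow_pty are named in the changelog; name is assumed.
type wireHostInfo struct {
	Name      string `json:"name"`
	AllowSudo bool   `json:"allow_sudo"`
	AllowPTY  bool   `json:"allow_pty"`
}

// encodeHosts renders the host list as it might appear in a GET /v1/hosts body.
func encodeHosts(hosts []wireHostInfo) (string, error) {
	b, err := json.Marshal(hosts)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// pickMode shows the client-side use of the flags: request a pty session only
// when the host allows it, otherwise fall back to exec. Hypothetical helper.
func pickMode(h wireHostInfo, wantInteractive bool) string {
	if wantInteractive && h.AllowPTY {
		return "pty"
	}
	return "exec"
}

func main() {
	hosts := []wireHostInfo{{Name: "db1", AllowSudo: true, AllowPTY: false}}
	out, err := encodeHosts(hosts)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	fmt.Println(pickMode(hosts[0], true))
}
```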
[v1.1.1] - 2026-06-04
Fixed
- Signer audit: the `host` field now records the real FQDN/addr (`hp.Addr`) instead of the short logical name.
- Signer audit: the `user` and `principal` fields are now correctly populated in `issued` and `denied` events.
[v1.1.0] - 2026-06-04
Added
- CLI `broker-ctl` (`cmd/broker-ctl`) for managing `signer.json` without editing JSON by hand:
  - `host add`: adds or updates a host with all its parameters; `--scan` runs `ssh-keyscan` automatically.
  - `host list`: formatted table of hosts with addr, user, principal, TTL, sudo, PTY, groups.
  - `host remove`: removes a host from the configuration.
  - `reload`: SIGHUP when the signer runs locally (detects `signer.pid`), POST `/v1/reload` over mTLS as fallback.
- Preserves `_comment` fields and JSON annotations when writing (atomic write via rename).
[v1.0.0] - 2026-06-03
Added
- SSH broker with in-memory Ed25519 ephemeral key generation (keys never touch disk).
- External signing service (`cmd/signer`) with exclusive SSH CA key custody via HTTPS+mTLS.
- MCP stdio interface (`cmd/mcp-broker`): tools `ssh_execute`, `ssh_session_open`, `ssh_session_exec`, `ssh_session_close`, `ssh_list_servers`.
- ProxyJump support (multi-hop chains through a bastion).
- Policy-gated `sudo NOPASSWD` elevation in the signer: `allow_sudo`, `allowed_sudo_users`, anti-injection sanitisation.
- Persistent sessions with three modes: `exec`, `shell` (no PTY), and `pty` (with PTY).
- PTY support in one-shot and sessions: `allow_pty` per host, `permit-pty` in the certificate.
- Hot-reload of `signer.json` without restart: `SIGHUP` and `POST /v1/reload` (mTLS, gated by `reload_callers`).
- Triple signed and hash-chained audit by `serial` (Ed25519 + SHA-256): signer, broker, and sshd correlated.
- Group-based RBAC: `groups` field per host and `callers` section in `signer.json`; `GET /v1/hosts` filters by caller groups, `POST /v1/sign` rejects out-of-group hosts before `Resolve()`.
- Alternative HTTP+mTLS frontend (`cmd/broker`) for one-shot use without MCP.
- Local PKI generated: Ed25519 SSH CA, mTLS CA, server/client certs, audit seeds.
- End-to-end lab scripts: `lab/run_mcp_lab.sh`, `lab/run_signer_lab.sh`.