routerctl doctor — runtime health diagnostics

Diagram showing routerctl doctor read-only diagnostics combining state, status socket, and optional host probes into area checks, stable JSON or YAML output, and fail-only nonzero exit behavior

routerctl doctor runs a battery of read-only checks and reports whether this router is currently functioning as a home gateway. It does not change host state. It is designed to be used by operators, CI, monitoring agents, and downstream tools (a Prometheus exporter, the Web Console, or an LLM-assisted diagnostic).

Usage

# Run every area (default)
routerctl doctor

# Run a single area
routerctl doctor dns

# Skip host commands (resource-status checks only)
routerctl doctor --no-host

# Machine-readable output
routerctl doctor -o json
routerctl doctor -o yaml

Per-call options reuse the diagnose flag set: --config, --state-file, --no-host / --host, -o / --output, --timeout.
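
For tools that shell out to doctor, the command line can be assembled from the flags listed above. The following is a minimal Python sketch, not part of routerctl. The helper names build_doctor_argv and run_doctor are hypothetical. It assumes routerctl is on PATH, that flags may follow the area argument (as in routerctl doctor runtime -o json), and that the JSON report is still written to stdout when the exit code is non-zero.

```python
import json
import subprocess
from typing import List, Optional


def build_doctor_argv(area: Optional[str] = None, no_host: bool = False) -> List[str]:
    """Assemble a `routerctl doctor` command line that requests JSON output."""
    argv = ["routerctl", "doctor"]
    if area is not None:
        argv.append(area)         # single-area run, e.g. "dns"
    if no_host:
        argv.append("--no-host")  # resource-status checks only
    argv += ["-o", "json"]
    return argv


def run_doctor(area: Optional[str] = None, no_host: bool = False) -> dict:
    """Run doctor and parse its JSON report (hypothetical wrapper).

    A non-zero exit means at least one check failed, so it is not
    treated as an execution error here (check=False).
    """
    proc = subprocess.run(
        build_doctor_argv(area, no_host),
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(proc.stdout)
```

For example, build_doctor_argv("dns", no_host=True) yields the argument list for routerctl doctor dns --no-host -o json.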

Areas

| Area | Checks |
| --- | --- |
| wan | EgressRoutePolicy and HealthCheck resource status; IPv4 / IPv6 default route presence (ip -4/-6 route show default). |
| dns | DNSResolver resource status; an A-record probe via dig @127.0.0.1. |
| dslite | DSLiteTunnel resource status; AFTR FQDN AAAA probe; tunnel device existence (ip link show). |
| dhcpv6-pd | DHCPv6PrefixDelegation status (Bound, delegated prefix). PD pending is WARN by design (do not advertise stale IPv6 on the LAN). |
| nat | NAT44Rule resource status; nft list table ip routerd_nat exists. |
| firewall | FirewallZone / FirewallPolicy resource status; nft list table inet routerd_filter exists with policy drop on the input chain (otherwise the router is permissive); Linux host check for marked routerd-owned nft tables that are present but not expected by the current config-rendered ruleset. |
| rollback | At least one stored generation exists, so routerctl rollback --to is usable. |
| disk | /var/lib/routerd and /run/routerd capacity; WARN at 90% or <256 MiB, FAIL at 98% or <64 MiB. On Linux, also fails when temporary directory invariants are broken: /tmp and /var/tmp must be root:root sticky 1777 directories. |
| mgmt | Management interface presence (best-effort from ManagementAccess or FirewallZone role=mgmt); WebConsole binding (FAIL/WARN on 0.0.0.0 / ::). |
| reconcile | Per-controller reconcile error history from the read-only status socket. --since <duration> bounds the window. WARN at ≥1 error in the window, FAIL at ≥10; up to 5 sample entries are shown in the detail. |
| runtime | routerd's own heap / goroutine / fd footprint from the read-only status socket: heapAlloc, heapObjects, numGoroutine, numGC, openFds/maxFds. WARN when numGoroutine exceeds 10000 or open fds reach ≥80% of RLIMIT_NOFILE. Observational: never FAILs. |
| dynamic | DynamicConfigPart freshness, masks, and override policies, including stale or masking parts that can make the effective config diverge from the intended generated state. |
| routes | Installed IPv4Route status compared with the Linux host FIB (ip -4 route show <destination>), including destination, type, gateway, device, preferred source when applicable, and metric. Use this as drift evidence; it does not replace dataplane probes. |
| plugin | Trusted local plugin executable presence, permissions, and recent run freshness from plugin status where available. |
| hybrid | HybridRoute / OverlayPeer references, Selective Address Mobility config references, default-route safety, MTU estimate, optional HealthCheck status, read-only route-table observation (ip -4 route show <prefix>), and Linux SAM checks for /32 delivery routes, provider local-address absence, proxy-neighbor capture, proxy_arp, ip_forward, route lookup, warning-only rp_filter, and default-drop FORWARD policy heuristics. When the FORWARD policy table cannot be inspected, the detail distinguishes nft unavailable, permission denied, routerd_filter table absent, and other nft list table failures. |
| sam | CloudEdge SAM ownership/capture diagnostics, including provider ownership freshness, OS capture state, delivery route lookup, forwarding prerequisites, blocking reasons, and owner-table/FIB drift. When host checks are enabled, doctor sam compares endpoint-owned local rows from ownershipResolverOwnerTable with the Linux main FIB: those /32 rows must resolve to a local/cloud route instead of a SAM overlay route. Provider-secondary capture-holder rows for BGP remote owners are not subject to that local-route requirement; they may legitimately stay no-local, with delivery/forwarding checks and dataplane probes proving the overlay path. Unexpected /32 route residue inside the MobilityPool prefix is reported unless it is known provider DHCP/link state or an observed BGP return route. This is a diagnostic view only; real dataplane checks still decide CloudEdge acceptance. |

Each check returns one of pass, warn, fail, or skip (the resource or signal is not present on this router).
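
The numeric thresholds listed for the disk, reconcile, and runtime areas can be read as simple classification rules. The sketch below restates them in Python to make the boundaries explicit. It illustrates the documented thresholds and is not routerd's implementation. The function names are hypothetical, and it assumes the disk percentages refer to used capacity and that boundaries are inclusive ("at 90%" is read as 90% or more).

```python
def classify_disk(used_percent: float, free_mib: float) -> str:
    """disk: FAIL at 98% used or <64 MiB free, WARN at 90% used or <256 MiB free."""
    if used_percent >= 98 or free_mib < 64:
        return "fail"
    if used_percent >= 90 or free_mib < 256:
        return "warn"
    return "pass"


def classify_reconcile(errors_in_window: int) -> str:
    """reconcile: WARN at >=1 error in the window, FAIL at >=10."""
    if errors_in_window >= 10:
        return "fail"
    if errors_in_window >= 1:
        return "warn"
    return "pass"


def classify_runtime(num_goroutine: int, open_fds: int, max_fds: int) -> str:
    """runtime: observational, so the worst possible result is WARN."""
    # Integer arithmetic for the 80% rule avoids floating-point edge cases.
    fds_high = open_fds * 5 >= max_fds * 4
    if num_goroutine > 10000 or fds_high:
        return "warn"
    return "pass"
```

With the values from a healthy process (187 goroutines, 23 of 1024 file descriptors), classify_runtime returns "pass".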

JSON output contract

routerctl doctor -o json is a stable machine-readable interface. The shape is:

{
  "summary": {
    "overall": "warn", // "pass" | "warn" | "fail" | "skip"
    "pass": 7,
    "warn": 1,
    "fail": 0,
    "skip": 2
  },
  "checks": [
    {
      "area": "dns", // see Areas table above
      "name": "DNSResolver/lan-resolver", // human-readable subject
      "status": "warn", // "pass" | "warn" | "fail" | "skip"
      "detail": "phase=Degraded,waiting=...", // optional
      "remedy": "wait for or repair dependency wan-pd" // optional
    }
    // ...
  ]
}

Field guarantees:

  • summary.overall is the worst status found in checks[].status, ranked fail > warn > skip > pass.
  • summary.pass/warn/fail/skip are integer counts and sum to len(checks).
  • checks[].status is one of pass, warn, fail, skip — no other values.
  • checks[].area is one of the identifiers in the Areas table above; the set is stable.
  • checks[].name is human-readable; do not pattern-match on its exact form.
  • detail and remedy are optional, free-form text intended for operators.
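
A downstream consumer can check these guarantees before trusting a report. The following Python sketch is one way to do that, assuming the report has already been parsed into a dict. validate_report and SAMPLE are hypothetical names, and the severity ranking encodes the worst-of ordering with skip ranked just above pass.

```python
import json

STATUSES = ("pass", "warn", "fail", "skip")
# Worst-of ranking: fail > warn > skip > pass.
SEVERITY = {"pass": 0, "skip": 1, "warn": 2, "fail": 3}


def validate_report(report: dict) -> list:
    """Return a list of contract violations (empty when the report conforms)."""
    problems = []
    summary = report["summary"]
    checks = report["checks"]

    # checks[].status must be one of the four documented values.
    for i, check in enumerate(checks):
        if check["status"] not in STATUSES:
            problems.append(f"checks[{i}].status is {check['status']!r}")

    # summary counts must match the checks and sum to len(checks).
    for status in STATUSES:
        counted = sum(1 for c in checks if c["status"] == status)
        if summary[status] != counted:
            problems.append(f"summary.{status} is {summary[status]}, counted {counted}")
    if sum(summary[s] for s in STATUSES) != len(checks):
        problems.append("summary counts do not sum to len(checks)")

    # summary.overall must be the worst status present.
    valid = [c["status"] for c in checks if c["status"] in SEVERITY]
    if valid:
        worst = max(valid, key=SEVERITY.__getitem__)
        if summary["overall"] != worst:
            problems.append(f"summary.overall is {summary['overall']!r}, expected {worst!r}")
    return problems


SAMPLE = json.loads("""
{
  "summary": {"overall": "warn", "pass": 1, "warn": 1, "fail": 0, "skip": 0},
  "checks": [
    {"area": "wan", "name": "default-route", "status": "pass"},
    {"area": "dns", "name": "DNSResolver/lan-resolver", "status": "warn"}
  ]
}
""")
```

Calling validate_report(SAMPLE) returns an empty list. The check names in SAMPLE are illustrative only; per the guarantees above, consumers should not pattern-match on them.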

For example, routerctl doctor runtime -o json surfaces routerd's own process footprint from the read-only status socket:

{
  "summary": { "overall": "pass", "pass": 1, "warn": 0, "fail": 0, "skip": 0 },
  "checks": [
    {
      "area": "runtime",
      "name": "process",
      "status": "pass",
      "detail": "heapAlloc=11.0MiB heapObjects=84213 numGoroutine=187 numGC=14 openFds=23/1024"
    }
  ]
}

Exit code

  • 0 — no fail checks (pass, warn, and skip are all considered non-failure for exit purposes).
  • non-zero — at least one fail check. Scriptable as routerctl doctor || alert.

warn does not affect the exit code (for example, DHCPv6-PD not yet Bound on a fresh boot is informational). To scope a gate to a single area, select that area explicitly: routerctl doctor wan exits non-zero only if a wan check fails.
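
A caller that wants warn to gate as well can derive its own verdict from the JSON summary instead of relying on the process exit code. The sketch below is a hypothetical helper, not a routerctl feature. The specific non-zero value doctor uses is not stated here, so returning 1 is an assumption.

```python
def gate_exit_code(report: dict, strict: bool = False) -> int:
    """Return 0 or 1 from a parsed doctor report.

    strict=False mirrors the documented rule: only fail checks gate.
    strict=True is a caller-side policy where warn gates as well.
    """
    summary = report["summary"]
    if summary["fail"] > 0:
        return 1
    if strict and summary["warn"] > 0:
        return 1
    return 0
```

A CI job could pass the result to sys.exit so that a report with warnings blocks a deploy only when strict gating is requested.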

Stability

The JSON shape, area identifiers, and status enum are part of the v1alpha1 operator contract. Future versions may add new areas and optional fields; existing area names and status values will not be renamed or repurposed in v1alpha1 minor builds.
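
Because new areas and optional fields may appear, consumers are safest when they read only the fields they need and do not reject unfamiliar areas. The following Python sketch shows one such tolerant consumer, of the kind an exporter might use to publish one status per area. worst_status_by_area is a hypothetical name and the severity ranking is the worst-of ordering with skip just above pass.

```python
SEVERITY = {"pass": 0, "skip": 1, "warn": 2, "fail": 3}


def worst_status_by_area(report: dict) -> dict:
    """Map each area to its worst check status.

    Only "area" and "status" are read, and areas are not compared with
    a fixed list, so added areas and added optional fields pass through
    without breaking the consumer.
    """
    worst = {}
    for check in report.get("checks", []):
        area, status = check["area"], check["status"]
        if area not in worst or SEVERITY[status] > SEVERITY[worst[area]]:
            worst[area] = status
    return worst
```

An area that this consumer has never seen simply shows up as a new key in the result.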

See also