(Setup, for anyone who hasn’t seen this stack before: Claude Code is Anthropic’s terminal CLI for Claude, and one of its hook events — PreToolUse — is a script Claude spawns and waits on before running a tool like Bash or Edit. The script’s stdout decides “allow” / “deny” / “ask”. Stream Deck is Elgato’s USB grid of programmable LCD keys. The plumbing I’m describing here lives in a daemon — a background process at 127.0.0.1:9127 — that the hook script POSTs to and that the Stream Deck plugin connects to over WebSocket. For the hooks docs themselves and the four other gotchas in that layer, see the hooks-reality post.)
daemon/src/server.ts no longer holds the response open. See the PTY-wrap follow-up post for why the migration happened.
This post is about one feature in ClaudeDeck: when Claude Code wants to run a tool, my Stream Deck lights up YES / NO / ALWAYS keys. I press one. Claude proceeds. It works, but the design took several iterations and there are still rough edges I’m chewing on.

The YES / NO / ALL keys in the middle-left of that grid are the ones this post is about — the hardware buttons whose physical press has to make it back through a held-open HTTP response in time for Claude Code to receive a decision.
The shape of the gate #
Claude Code’s PreToolUse hook is a synchronous gate. Claude pauses, runs your hook command, and reads its stdout (or exit code) to decide whether the tool runs. The relevant stdout shape:
{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "allow"
  }
}
permissionDecision is one of allow, deny, or ask (the last falls through to Claude’s built-in terminal prompt). The hook’s per-call timeout is 600 seconds, so Claude waits up to ten minutes for your hook to print JSON and exit. I’ll come back to that number; it’s the entire reason this design works.
Without a ceiling that generous, there’d be no point pausing for human input. The hook would have to return immediately and the gate would degenerate into “auto-allow everything”.
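If the hook were a Node script instead of a shell one-liner, producing that stdout shape could look roughly like this. It is a minimal sketch: hookOutput is a hypothetical helper name, and the only fields used are the ones shown in the JSON above.

```typescript
// Hypothetical helper: serialize a PreToolUse decision in the stdout shape
// described above. Not taken from ClaudeDeck's source.
type PermissionDecision = "allow" | "deny" | "ask";

function hookOutput(decision: PermissionDecision): string {
  return JSON.stringify({
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: decision,
    },
  });
}

// A hook would print this on stdout and then exit.
console.log(hookOutput("ask"));
```

Restricting the argument to a three-value union means a typo such as "allowed" fails at compile time rather than silently producing JSON that Claude does not recognize.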
The daemon: hold the HTTP response #
The hook command is a small shell script that POSTs the hook payload to my daemon and waits for the response body:
#!/bin/bash
curl -sf -X POST "http://127.0.0.1:9127/hooks/PreToolUse" \
  -H "Content-Type: application/json" \
  --max-time 590 \
  -d @- <<< "$(cat)"
590s, not 600s, so we have a 10s safety margin under Claude’s own ceiling. (More on why that margin matters when we get to the race conditions.)
On the daemon side, the POST /hooks/PreToolUse handler does this:
- Parse the hook payload. Extract the session ID.
- Check: is at least one Stream Deck plugin connected via WebSocket? (sockets.size > 0)
- If no plugin is connected, return {"hookSpecificOutput":{"permissionDecision":"ask"}} immediately. Claude falls back to its terminal prompt.
- If a plugin is connected, register a pending permission keyed by session ID, broadcast permission:pending over WebSocket to all subscribed plugins, then hold the HTTP response open waiting for the resolution.
- When permission:respond arrives from the plugin (the user pressed YES/NO/ALWAYS), look up the pending permission, write the decision into the still-open HTTP response, close.
- If the 590s timer fires first, write an empty body. Claude sees no JSON, falls through to whatever its default behavior is for the active permission rules.
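The first three steps above amount to a single early-out decision made before anything is held. A minimal sketch of that decision as a pure function follows. The Route type and the routeHook name are illustrative, and treating a malformed payload the same as "no plugin" is my assumption, not something the handler is documented to do.

```typescript
// Sketch of the early-out: answer "ask" right away, or hold for a key press.
// Route and routeHook are hypothetical names, not ClaudeDeck's actual code.
type Route =
  | { kind: "ask-now"; body: string }
  | { kind: "hold"; sessionId: string };

const ASK_BODY = '{"hookSpecificOutput":{"permissionDecision":"ask"}}';

function routeHook(rawBody: string, connectedPlugins: number): Route {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    // Assumption: an unparseable payload falls back to Claude's own prompt.
    return { kind: "ask-now", body: ASK_BODY };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { kind: "ask-now", body: ASK_BODY };
  }
  const sessionId = (parsed as { session_id?: unknown }).session_id;
  if (typeof sessionId !== "string" || sessionId.length === 0) {
    return { kind: "ask-now", body: ASK_BODY };
  }
  // Nobody is listening on the WebSocket: never block on a UI that isn't there.
  if (connectedPlugins === 0) {
    return { kind: "ask-now", body: ASK_BODY };
  }
  return { kind: "hold", sessionId };
}
```

In the real handler the second argument would be sockets.size, and a "hold" result would lead into the pending-permission registration described in the remaining steps.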
The hold-open is implemented as a Promise (JavaScript’s deferred-value primitive — a placeholder you can await on, that some other code path resolves with the answer when it’s ready) that resolves when the WebSocket handler delivers a verdict:
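// Assumed shapes: the post does not show these declarations.
type PermissionDecision = "allow" | "deny" | "ask";

interface PendingPermission {
  resolve: (decision: PermissionDecision | "timeout") => void;
  timer: ReturnType<typeof setTimeout>;
}

// One pending permission per session, keyed by session ID.
const pendingPermissions = new Map<string, PendingPermission>();

async function holdForPermission(
  sessionId: string,
  capMs: number,
): Promise<PermissionDecision | "timeout"> {
  return new Promise((resolve) => {
    // If nobody answers within capMs, drop the entry and report a timeout.
    const timer = setTimeout(() => {
      pendingPermissions.delete(sessionId);
      resolve("timeout");
    }, capMs);
    pendingPermissions.set(sessionId, { resolve, timer });
  });
}

// Resolved by the WebSocket handler when the plugin sends permission:respond.
function resolvePending(sessionId: string, decision: PermissionDecision) {
  const pending = pendingPermissions.get(sessionId);
  if (!pending) return;
  clearTimeout(pending.timer);
  pendingPermissions.delete(sessionId);
  pending.resolve(decision);
}
In the HTTP handler: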
const decision = await holdForPermission(sessionId, 590_000);
if (decision === "timeout") {
  return new Response('{"hookSpecificOutput":{"permissionDecision":"ask"}}', { status: 200 });
}
return new Response(JSON.stringify({
  hookSpecificOutput: { hookEventName: "PreToolUse", permissionDecision: decision }
}), { status: 200 });
No queues, no message brokers, no Redis (the in-memory key-value store people typically reach for when they want fast cross-process messaging). The hold-open Promise is the queue. The HTTP response is the channel.
Picking the timeout #
I shipped 5 seconds first. Placeholder while I built the plumbing. The number broke even before I tested with another human: I’d be sitting in front of the Stream Deck, Claude would want to run Edit, I’d be reading the diff Claude printed beforehand, and 5 seconds in, the hook had already timed out and Claude had fallen through to the terminal prompt.
Then 20s. Better. Still too short for any tool call where I wanted to read the input first (anything that touches a file, anything that runs a command I haven’t reviewed). Missed presses constantly.
Then 60s. Comfortable for routine tool calls. Still too short when Slack pulled me out of the loop mid-turn and I came back to find the gate gone.
Then 300s. Comfortable for everything except long context switches. But the long context switches (pulled into a meeting, walked to coffee, came back 8 minutes later) are the cases that hurt most, because the outcome is then “Claude waited 5 minutes for me, gave up, and I have no idea what state we’re in”.
Finally 590s. Claude’s own 600s ceiling minus a 10s safety margin. It’s the maximum hold I can offer without risking Claude timing out before the daemon does and ending in a weird “decision arrived but Claude already gave up” race. I haven’t found a case where 590s feels too short.
The lesson here ended up being smaller than I expected and also more general: size for the maximum reasonable human latency, not the median. The median is forgiving. Two seconds, five seconds, twenty seconds — they all “work” for the median. What breaks is the max. Optimizing for the max means the median works fine and you have headroom. Optimizing for the median means the max case is permanently broken.
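The arithmetic behind the final number is small enough to write down. A minimal sketch, assuming both timers are derived from one ceiling rather than typed in by hand; holdBudget and its field names are hypothetical.

```typescript
// Sketch: derive both hold timers from Claude's ceiling so they cannot drift
// apart. holdBudget is an illustrative name, not part of ClaudeDeck.
function holdBudget(
  ceilingSeconds: number,
  marginSeconds: number,
): { curlMaxTimeSeconds: number; daemonCapMs: number } {
  if (marginSeconds <= 0 || marginSeconds >= ceilingSeconds) {
    throw new RangeError("margin must be positive and smaller than the ceiling");
  }
  const curlMaxTimeSeconds = ceilingSeconds - marginSeconds; // curl --max-time
  const daemonCapMs = curlMaxTimeSeconds * 1000; // capMs for holdForPermission
  return { curlMaxTimeSeconds, daemonCapMs };
}

console.log(holdBudget(600, 10));
```

With the numbers from this post, holdBudget(600, 10) yields 590 seconds for curl and 590000 ms for the daemon, matching the values hardcoded in the script and the handler.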
The gating-everything problem #
A design tension I haven’t fully resolved.
Claude Code has multiple permission modes, including default, acceptEdits, and bypassPermissions. In default, Claude prompts before running tools. In acceptEdits, Claude auto-allows edit-related tools but still prompts for shell commands. In bypassPermissions, Claude auto-allows everything.
My PreToolUse hook fires regardless of permission mode. The hook payload doesn’t include the current mode. So the daemon can’t see whether Claude would have auto-allowed this tool call. The user is stuck pressing YES on every single tool, including ones Claude was going to allow anyway.
In bypassPermissions mode, pressing YES 200 times per session is friction the user shouldn’t have to put up with. But I can’t conditionalize the hook on the mode, because the mode isn’t in the payload.
I have four candidate fixes. They’re worth comparing side-by-side rather than wrapping in prose, because the right answer is “some combination”:
- Option A: Parse permissions.allow rules from settings. Daemon reads ~/.claude/settings.local.json (and the project-local copy), parses the same patterns Claude uses, matches the tool call against them. If the rule would allow, return immediately, no hold. Most accurate. Most code to write. Has to fsWatch the file so rule changes take effect without a daemon restart.
- Option B: A Stream Deck “mode” key that mirrors Claude’s permission mode. User toggles it manually between GATE / AUTO / SMART. Daemon branches on the toggle. Cheap to build. Requires user discipline: if the Stream Deck mode and Claude’s actual mode drift, the user gets surprised.
- Option C: A hardcoded allow-list of “always safe” tools (Read, Glob, LS, Task*). Pragmatic, ~30 minutes of work, ~80% of the noise gone. Misses edge cases: short read-only Bash calls, MCP tools the user trusts.
- Option D: All of the above. Hardcoded list for the common case. Allowlist parsing for full coverage. Mode toggle for explicit override.
I’m leaning toward C as the starting point and graduating to D over time. The full design notes live in docs/2026-04-22-gating-behavior.md in the repo.
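Option C is small enough to sketch in full. The list below mirrors the examples named in Option C. Treating a trailing * as a prefix wildcard is my assumption about what Task* is meant to cover, and isAlwaysSafe is a hypothetical name.

```typescript
// Sketch of Option C: skip the hold for tools on a hardcoded safe list.
// A trailing "*" is treated as a prefix wildcard (an assumption).
const ALWAYS_SAFE: readonly string[] = ["Read", "Glob", "LS", "Task*"];

function isAlwaysSafe(
  toolName: string,
  patterns: readonly string[] = ALWAYS_SAFE,
): boolean {
  return patterns.some((pattern) =>
    pattern.endsWith("*")
      ? toolName.startsWith(pattern.slice(0, -1))
      : toolName === pattern,
  );
}
```

The handler would call isAlwaysSafe on the payload's tool name before registering a pending permission and return allow immediately on a match. Exact matching for non-wildcard entries is deliberate: a prefix match on Read would also wave through any future tool whose name merely starts with those letters.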
The “plugin connected but device unplugged” gotcha #
This one I caught the first time my Stream Deck cable got snagged on my chair.
If the physical Stream Deck device is unplugged but the Stream Deck app is still running, the daemon’s sockets.size > 0 check still returns true. The plugin’s WebSocket is still alive — it just can’t reach the device anymore. The daemon dutifully holds the hook for 590 seconds waiting for a key press that physically cannot arrive.
Two ways out:
- User workaround. Quit the Stream Deck app or run streamdeck stop com.nickboy.claudedeck to kill the WebSocket subscriber. Daemon goes to fire-and-forget; Claude proceeds via its own allow-list.
- Plugin-side fix. Listen for onDeviceDidDisconnect from the SDK and notify the daemon. Then the daemon’s “plugin connected” check becomes “plugin connected AND device present”. Not shipped yet.
This is the kind of edge case that’s invisible until it bites you. Worth saying once: if you’re holding open an HTTP request waiting on a hardware action, you need a story for “the hardware physically isn’t there anymore”. A connected WebSocket is not the same thing as a reachable hand.
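One way the daemon side of the plugin-side fix could look is sketched below, assuming the plugin forwards device connect and disconnect events to the daemon. The PluginRegistry class and every method name on it are hypothetical; none of this is shipped code.

```typescript
// Sketch: track attached devices per plugin connection so the gate check is
// "plugin connected AND device present". All names here are illustrative.
class PluginRegistry {
  // pluginId -> IDs of the devices that plugin currently reports as attached
  private devices = new Map<string, Set<string>>();

  pluginConnected(pluginId: string): void {
    if (!this.devices.has(pluginId)) this.devices.set(pluginId, new Set());
  }

  pluginDisconnected(pluginId: string): void {
    this.devices.delete(pluginId);
  }

  deviceConnected(pluginId: string, deviceId: string): void {
    this.pluginConnected(pluginId);
    this.devices.get(pluginId)!.add(deviceId);
  }

  deviceDisconnected(pluginId: string, deviceId: string): void {
    this.devices.get(pluginId)?.delete(deviceId);
  }

  // Replaces the bare sockets.size > 0 check.
  canRenderGate(): boolean {
    return Array.from(this.devices.values()).some((set) => set.size > 0);
  }
}
```

With this in place, a snagged cable turns canRenderGate() false while the WebSocket stays up, and the handler can return ask immediately instead of holding for 590 seconds.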
Diagnosing a press that didn’t make it #
When a Stream Deck press doesn’t seem to reach Claude, the chain has five steps and three logs to cross-reference. The decision tree lives in the repo’s docs/2026-04-21-yes-button-bug.md. The three logs:
1. claude --debug hooks --debug-file /tmp/claude-hooks.log
2. ~/.claudedeck/daemon.log
3. ~/.claudedeck/plugin.log
Plugin log on press | Daemon log on press  | Suspect
--------------------|----------------------|-------------------------------------------------------------------
target=none         | (no new ws-cmd line) | Wrong session focused
target=abc...       | (no new ws-cmd line) | Plugin WS disconnected
target=abc...       | hasPending=false     | Pending already cleared
target=abc...       | hasPending=true      | Chain is correct; check settings.local.json's permissions.ask rule
Each row corresponds to a different broken link (focus / WS / state / settings). The triangulation tells you where to look without making you guess. Single-log debugging on a four-process pipeline is mostly vibes; three-log debugging actually narrows it down.
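The table above can also be read as a short chain of checks. A minimal sketch follows, where the PressEvidence fields are illustrative stand-ins for what you read off the plugin and daemon logs for a single press.

```typescript
// Sketch: the diagnostic table as a lookup. Field names are hypothetical.
interface PressEvidence {
  // Session ID from the plugin log's target=..., or null for target=none.
  pluginTarget: string | null;
  // Did a new ws-cmd line show up in the daemon log for this press?
  daemonSawCommand: boolean;
  // The hasPending value on that line (ignored if no line appeared).
  hasPending: boolean;
}

function suspect(evidence: PressEvidence): string {
  if (evidence.pluginTarget === null) return "wrong session focused";
  if (!evidence.daemonSawCommand) return "plugin WebSocket disconnected";
  if (!evidence.hasPending) return "pending already cleared";
  return "chain is correct; check permissions.ask in settings.local.json";
}
```

The order of the checks follows the direction of the press through the pipeline, so the first failing check names the earliest broken link.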
The last row in particular caught me twice. Claude Code auto-writes ask rules into settings.local.json on first denial, and ask rules silently override hook "allow" decisions. The Stream Deck press lands, the daemon happily writes permissionDecision: "allow" into the held response, Claude reads it… and then prompts in the terminal anyway because the ask rule fires before the hook decision is consulted. From the user’s seat: “I pressed YES, the daemon log says it worked, why is Claude still asking?”. The answer is in the settings file.
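A quick check for that last case can be scripted. This is a minimal sketch: it assumes ask rules are strings shaped like Bash or Bash(some pattern) under permissions.ask, and it compares only the tool name. Claude Code's real rule matching is more detailed, and askRulesFor is a hypothetical helper.

```typescript
// Sketch: list permissions.ask rules that mention a given tool. This matches
// on the tool name only, not on the pattern inside the parentheses.
interface SettingsShape {
  permissions?: { ask?: unknown };
}

function askRulesFor(settingsJson: string, toolName: string): string[] {
  const settings = JSON.parse(settingsJson) as SettingsShape | null;
  const ask = settings?.permissions?.ask;
  if (!Array.isArray(ask)) return [];
  return ask.filter(
    (rule): rule is string =>
      typeof rule === "string" &&
      (rule === toolName || rule.startsWith(toolName + "(")),
  );
}
```

Feeding it the contents of settings.local.json and the tool name from the hook payload gives a list to log next to the decision, which makes "pressed YES but Claude still asked" much faster to explain.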
Lessons #
- Holding HTTP requests is a viable IPC pattern when one side is single-purpose and the other has a generous timeout. No queues, no brokers — the Promise is the queue, the HTTP response is the channel. The pattern lives or dies on the timeout headroom.
- Size the hold for the maximum tolerable human latency, not the median. Median latency is forgiving; max latency is what breaks the experience. Anything else is optimizing for the case that already works.
- Always have a graceful fall-through when the gate can’t render. No plugin connected? Return ask immediately so Claude can use its own prompt. Never silently block on a UI that isn’t there.
- The hook payload doesn’t include enough state to be context-aware. Permission mode, allow-list rules, current focus: none of it’s in the payload. Anything mode-aware has to be reconstructed daemon-side, which is fragile and probably wrong half the time.
- Three logs beat one when the pipeline crosses process boundaries. Single-log debugging is guesswork; cross-referenced logs tell you which link in the chain broke.
The yes-button-bug post-mortem and the full design notes are in docs/2026-04-21-yes-button-bug.md and docs/2026-04-22-gating-behavior.md respectively, both in the ClaudeDeck repo, if you want the raw debugging notes. They include the architecture I considered (and rejected) of writing the keystroke into Claude’s PTY instead of returning JSON through the hook.
For the companion post on how the hook payload actually reaches the daemon in the first place (stdin, not env var; the docs are misleading), see How Claude Code hooks actually work.
References #
- Claude Code hooks reference (PreToolUse event, hookSpecificOutput schema, permissionDecision values): https://docs.claude.com/en/docs/claude-code/hooks
- The 600-second default per-hook timeout is referenced in the same docs page. ClaudeDeck’s 590s --max-time is a 10s safety margin under that ceiling, set after measuring that the daemon’s own timer needed to fire strictly before Claude’s to avoid a “decision arrived but Claude already gave up” race.
- AgentDeck, the upstream agent-runtime project that pioneered the HTTP-hold-for-permission-input pattern that ClaudeDeck adopted: https://github.com/puritysb/AgentDeck/tree/master/bridge
- ClaudeDeck’s permission hold-open implementation: daemon/src/server.ts (POST /hooks/PreToolUse handler, holdForPermission, and the responseToKeystroke mapping if you want the keystroke-injection variant).
- Debugging decision tree for “press didn’t reach Claude”: docs/2026-04-21-yes-button-bug.md
- Four candidate fixes for the gate-everything problem: docs/2026-04-22-gating-behavior.md
- Companion post: How Claude Code hooks actually work