Holding HTTP open for 590 seconds so a Stream Deck key can approve a tool call

Claude Code wants to run a shell command. I want to press a physical Stream Deck key — the YES key, two inches to the left of my keyboard — to approve it. The hook gets exactly one HTTP response to decide allow vs deny. The key press might land in 200 milliseconds; it might land seven minutes later, after I've been pulled into a meeting and come back. The trick is that Claude Code's hook timeout is 600 seconds, which turns out to be just enough headroom to hold the HTTP response open the whole time and let a hardware button write the answer.

(Setup, for anyone who hasn’t seen this stack before: Claude Code is Anthropic’s terminal CLI for Claude. One of its hook events, PreToolUse, runs a script that Claude spawns and waits on before running a tool like Bash or Edit. The script’s stdout decides “allow” / “deny” / “ask”. Stream Deck is Elgato’s USB grid of programmable LCD keys. The plumbing I’m describing here lives in a daemon, a background process at 127.0.0.1:9127, that the hook script POSTs to and that the Stream Deck plugin connects to over WebSocket. For the hooks docs themselves and the four other gotchas in that layer, see the hooks-reality post.)

Update (2026-05-10): The architecture in this post was the v1 design. The current ClaudeDeck daemon has shifted to a fire-and-forget hook plus PTY keystroke injection. (A PTY, or pseudo-terminal, is the kernel object behind every interactive shell: a pair of file descriptors that lets one process pretend to be a keyboard typing at another. The TTY is the controlling terminal Claude is reading stdin from.) PTY ring-buffer interactions made hold-open unreliable in long sessions, and routing the approval through the same TTY removed a class of bugs. The lessons below still apply if you’re designing a sync→async permission gate from scratch, but the live daemon/src/server.ts no longer holds the response open. See the PTY-wrap follow-up post for why the migration happened.

This post is about one feature in ClaudeDeck: when Claude Code wants to run a tool, my Stream Deck lights up YES / NO / ALWAYS keys. I press one. Claude proceeds. It works, but the design took several iterations and there are still rough edges I’m chewing on.

ClaudeDeck Stream Deck layout: middle row shows YES, NO, and ALL approval keys (left to right) — these are the hardware buttons that fulfil the permission-gate HTTP response

The YES / NO / ALL keys in the middle-left of that grid are the ones this post is about — the hardware buttons whose physical press has to make it back through a held-open HTTP response in time for Claude Code to receive a decision.

The shape of the gate

Claude Code’s PreToolUse hook is a synchronous gate. Claude pauses, runs your hook command, and reads its stdout (or exit code) to decide whether the tool runs. The relevant stdout shape:

{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "allow"
  }
}

permissionDecision is one of allow, deny, or ask (the last falls through to Claude’s built-in terminal prompt). The hook’s per-call timeout is 600 seconds — Claude waits up to ten minutes for your hook to print JSON and exit. I’ll come back to that number; it’s the entire reason this design works.
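As a minimal sketch of that shape in TypeScript: the JSON field names come from the snippet above, while the type and function names (PreToolUseOutput, buildHookOutput, isPermissionDecision) are hypothetical and exist only for this illustration.

```typescript
// Hypothetical helper names; only the JSON field names come from the shape above.
type PermissionDecision = "allow" | "deny" | "ask";

interface PreToolUseOutput {
  hookSpecificOutput: {
    hookEventName: "PreToolUse";
    permissionDecision: PermissionDecision;
  };
}

// Build the stdout body a PreToolUse hook prints before exiting.
function buildHookOutput(decision: PermissionDecision): string {
  const output: PreToolUseOutput = {
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: decision,
    },
  };
  return JSON.stringify(output);
}

// Narrow an untrusted value (for example, one read off a socket) to a decision.
function isPermissionDecision(value: unknown): value is PermissionDecision {
  return value === "allow" || value === "deny" || value === "ask";
}

console.log(buildHookOutput("allow"));
```

The type guard matters because anything outside those three strings should never reach Claude as a decision.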

Without a ceiling that generous, there’d be no point pausing for human input. The hook would have to return almost immediately and the gate would degenerate into “auto-allow everything”.

The daemon: hold the HTTP response

The hook command is a small shell script that POSTs the hook payload to my daemon and waits for the response body:

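#!/bin/bash
# Forward the hook payload (JSON on stdin) to the daemon and print its reply.
# curl reads the request body straight from stdin via "@-".
# --max-time 590 stays 10s under Claude Code's 600s hook timeout.
curl -sf -X POST "http://127.0.0.1:9127/hooks/PreToolUse" \
  -H "Content-Type: application/json" \
  --max-time 590 \
  -d @-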

590s, not 600s, so we have a 10s safety margin under Claude’s own ceiling. (More on why that margin matters when we get to the race conditions.)

On the daemon side, the POST /hooks/PreToolUse handler does this:

Parse the hook payload. Extract the session ID.
Check: is at least one Stream Deck plugin connected via WebSocket? (sockets.size > 0)
If no plugin is connected, return {"hookSpecificOutput":{"permissionDecision":"ask"}} immediately. Claude falls back to its terminal prompt.
If a plugin is connected, register a pending permission keyed by session ID, broadcast permission:pending over WebSocket to all subscribed plugins, then hold the HTTP response open waiting for the resolution.
When permission:respond arrives from the plugin (the user pressed YES/NO/ALWAYS), look up the pending permission, write the decision into the still-open HTTP response, close.
If the 590s timer fires first, respond with ask, the same body as the no-plugin case. Claude falls back to its terminal prompt and whatever permission rules are active.
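The first branches of that handler can be sketched as a pure function that only decides what to do, which keeps it testable without a server. The session_id and tool_name fields follow Claude Code’s hook input; the Plan type, planResponse, and the fields inside the broadcast message are assumptions of this sketch, since only the event name permission:pending is given above.

```typescript
// session_id and tool_name follow Claude Code's hook input.
// Plan, planResponse, and the broadcast fields are hypothetical.
interface HookPayload {
  session_id: string;
  tool_name: string;
}

interface PendingBroadcast {
  type: "permission:pending";
  sessionId: string;
  toolName: string;
}

type Plan =
  | { kind: "respond"; body: string }
  | { kind: "hold"; sessionId: string; broadcast: PendingBroadcast };

const ASK_BODY = '{"hookSpecificOutput":{"permissionDecision":"ask"}}';

function planResponse(rawBody: string, connectedPlugins: number): Plan {
  const payload = JSON.parse(rawBody) as HookPayload;

  // No plugin connected: answer right away so Claude uses its own prompt.
  if (connectedPlugins === 0) {
    return { kind: "respond", body: ASK_BODY };
  }

  // Otherwise register a pending permission and tell the plugins about it.
  return {
    kind: "hold",
    sessionId: payload.session_id,
    broadcast: {
      type: "permission:pending",
      sessionId: payload.session_id,
      toolName: payload.tool_name,
    },
  };
}

const sample = JSON.stringify({ session_id: "abc", tool_name: "Bash" });
console.log(planResponse(sample, 0).kind);
console.log(planResponse(sample, 1).kind);
```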

The hold-open is implemented as a Promise (JavaScript’s deferred-value primitive — a placeholder you can await on, that some other code path resolves with the answer when it’s ready) that resolves when the WebSocket handler delivers a verdict:

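// Supporting declarations, assumed here; the original snippet omits them.
type PermissionDecision = "allow" | "deny" | "ask";
type Pending = {
  resolve: (outcome: PermissionDecision | "timeout") => void;
  timer: ReturnType<typeof setTimeout>;
};
// One in-flight permission per session, keyed by session ID.
const pendingPermissions = new Map<string, Pending>();

// Returns a Promise that stays pending until a key press resolves it or the cap fires.
async function holdForPermission(
  sessionId: string,
  capMs: number,
): Promise<PermissionDecision | "timeout"> {
  return new Promise((resolve) => {
    const pending = { resolve, timer: setTimeout(() => {
      pendingPermissions.delete(sessionId);
      resolve("timeout");
    }, capMs) };
    pendingPermissions.set(sessionId, pending);
  });
}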

// Resolved by the WebSocket handler when the plugin sends permission:respond.
function resolvePending(sessionId: string, decision: PermissionDecision) {
  const pending = pendingPermissions.get(sessionId);
  if (!pending) return;
  clearTimeout(pending.timer);
  pendingPermissions.delete(sessionId);
  pending.resolve(decision);
}

In the HTTP handler:

const decision = await holdForPermission(sessionId, 590_000);
if (decision === "timeout") {
  return new Response('{"hookSpecificOutput":{"permissionDecision":"ask"}}', { status: 200 });
}
return new Response(JSON.stringify({
  hookSpecificOutput: { hookEventName: "PreToolUse", permissionDecision: decision }
}), { status: 200 });

No queues, no message brokers, no Redis (the in-memory key-value store people typically reach for when they want fast cross-process messaging). The hold-open Promise is the queue. The HTTP response is the channel.
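One case the snippets above leave open is a second hook arriving for a session that already has a pending entry. Keyed by session ID alone, the second registration would replace the first resolver, and the first HTTP response would sit there until its timer fires. The following is a self-contained sketch of one way to guard against that, assuming a newer request should supersede the older one and push it back to Claude’s own prompt. That policy is an assumption of this sketch, not something the daemon is known to do.

```typescript
type PermissionDecision = "allow" | "deny" | "ask";
type Outcome = PermissionDecision | "timeout";

interface Pending {
  resolve: (outcome: Outcome) => void;
  timer: ReturnType<typeof setTimeout>;
}

const pendingPermissions = new Map<string, Pending>();

function holdForPermission(sessionId: string, capMs: number): Promise<Outcome> {
  // Assumed policy: a newer request supersedes an older one for the same
  // session, and the older one falls back to "ask".
  const previous = pendingPermissions.get(sessionId);
  if (previous) {
    clearTimeout(previous.timer);
    previous.resolve("ask");
  }
  return new Promise<Outcome>((resolve) => {
    const timer = setTimeout(() => {
      // Only remove the entry if it still belongs to this request.
      if (pendingPermissions.get(sessionId)?.timer === timer) {
        pendingPermissions.delete(sessionId);
      }
      resolve("timeout");
    }, capMs);
    pendingPermissions.set(sessionId, { resolve, timer });
  });
}

// Returns false when there was nothing left to resolve.
function resolvePending(sessionId: string, decision: PermissionDecision): boolean {
  const pending = pendingPermissions.get(sessionId);
  if (!pending) return false;
  clearTimeout(pending.timer);
  pendingPermissions.delete(sessionId);
  pending.resolve(decision);
  return true;
}

// Usage: the first hold is superseded, the second is answered by a "key press".
const first = holdForPermission("session-1", 1_000);
const second = holdForPermission("session-1", 1_000);
resolvePending("session-1", "allow");
Promise.all([first, second]).then(([a, b]) => console.log(a, b)); // prints "ask allow"
```

Returning a boolean from resolvePending also gives the WebSocket handler something to log when a press arrives after the pending entry is gone.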

Picking the timeout

I shipped 5 seconds first. Placeholder while I built the plumbing. Even before I tested with another human, the number broke — I’d be sitting in front of the Stream Deck, Claude would want to run Edit, I’d be reading the diff Claude printed beforehand, and 5 seconds in the hook had already timed out and Claude had fallen through to the terminal prompt.

Then 20s. Better. Still too short for any tool call where I wanted to read the input first (anything that touches a file, anything that runs a command I haven’t reviewed). Missed presses constantly.

Then 60s. Comfortable for routine tool calls. Still too short when Slack pulled me out of the loop mid-turn and I came back to find the gate gone.

Then 300s. Comfortable for everything except long context switches. But the long context switches — pulled into a meeting, walked to coffee, came back 8 minutes later — are the cases that hurt most, because the response is then “Claude waited 5 minutes for me, gave up, and I have no idea what state we’re in”.

Finally 590s. Claude’s own 600s ceiling minus a 10s safety margin. It’s the maximum hold I can offer without risking Claude timing out before the daemon does and ending in a weird “decision arrived but Claude already gave up” race. I haven’t found a case where 590s feels too short.

The lesson here ended up being smaller than I expected and also more general: size for the maximum reasonable human latency, not the median. The median is forgiving. Two seconds, five seconds, twenty seconds — they all “work” for the median. What breaks is the max. Optimizing for the max means the median works fine and you have headroom. Optimizing for the median means the max case is permanently broken.
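The 590s figure is plain arithmetic, but the ordering of the timers is what prevents the race, so it is worth writing down as a check. A small sketch, assuming hypothetical names (Timeouts, deriveTimeouts) and an optional stagger between the daemon timer and curl’s --max-time. The stagger is an assumption of this sketch; the script and handler shown earlier both use 590.

```typescript
interface Timeouts {
  hookCeilingSec: number; // Claude Code's per-hook timeout
  curlMaxTimeSec: number; // --max-time in the hook script
  daemonCapMs: number;    // cap passed to the hold-open timer
}

// Derive the two shorter timeouts from Claude's ceiling.
// staggerSec (hypothetical) lets the daemon answer before curl gives up.
function deriveTimeouts(
  hookCeilingSec: number,
  marginSec: number,
  staggerSec = 0,
): Timeouts {
  if (marginSec <= 0 || staggerSec < 0 || marginSec + staggerSec >= hookCeilingSec) {
    throw new RangeError("margin and stagger must fit under the hook ceiling");
  }
  const curlMaxTimeSec = hookCeilingSec - marginSec;
  const daemonCapMs = (curlMaxTimeSec - staggerSec) * 1000;
  return { hookCeilingSec, curlMaxTimeSec, daemonCapMs };
}

console.log(deriveTimeouts(600, 10));    // curlMaxTimeSec: 590, daemonCapMs: 590000
console.log(deriveTimeouts(600, 10, 5)); // curlMaxTimeSec: 590, daemonCapMs: 585000
```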

The gating-everything problem

A design tension I haven’t fully resolved.

Claude Code has multiple permission modes, including default, acceptEdits, and bypassPermissions. In default, Claude prompts before running tools. In acceptEdits, Claude auto-allows edit-related tools but still prompts for shells. In bypassPermissions, Claude auto-allows everything.

My PreToolUse hook fires regardless of permission mode. The hook payload doesn’t include the current mode. So the daemon can’t see whether Claude would have auto-allowed this tool call. The user is stuck pressing YES on every single tool, including ones Claude was going to allow anyway.

In bypassPermissions mode, pressing YES 200 times per session is friction the user shouldn’t have to put up with. But I can’t conditionalize the hook on the mode, because the mode isn’t in the payload.

I have four candidate fixes. They’re worth comparing side-by-side rather than wrapping in prose, because the right answer is “some combination”:

Option A: parse permissions.allow rules from settings. The daemon reads ~/.claude/settings.local.json (and the project-local copy), parses the same patterns Claude uses, and matches the tool call against them. If a rule would allow the call, return immediately with no hold. Most accurate. Most code to write. It also has to watch the file (for example with fs.watch) so rule changes take effect without a daemon restart.
Option B: a Stream Deck “mode” key that mirrors Claude’s permission mode. The user toggles it manually between GATE / AUTO / SMART and the daemon branches on the toggle. Cheap to build. Requires user discipline: if the Stream Deck mode and Claude’s actual mode drift, the user gets surprised.
Option C: a hardcoded allow-list of “always safe” tools (Read, Glob, LS, Task*). Pragmatic, ~30 minutes of work, ~80% of the noise gone. Misses edge cases such as short read-only Bash calls and MCP tools the user trusts.
Option D: all of the above. Hardcoded list for the common case. Allow-list parsing for full coverage. Mode toggle for explicit override.

I’m leaning toward C as the starting point and graduating to D over time. The full design notes live in docs/2026-04-22-gating-behavior.md in the repo.
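Option C is small enough to sketch in full. The tool names are the ones listed above, with Task* read as a prefix match; the function names and the "allow" / "hold" return values are hypothetical.

```typescript
// Tool names from Option C. "Task*" is treated as a prefix match.
const ALWAYS_SAFE_EXACT = new Set(["Read", "Glob", "LS"]);
const ALWAYS_SAFE_PREFIXES = ["Task"];

function isAlwaysSafe(toolName: string): boolean {
  return (
    ALWAYS_SAFE_EXACT.has(toolName) ||
    ALWAYS_SAFE_PREFIXES.some((prefix) => toolName.startsWith(prefix))
  );
}

// "allow" returns immediately; "hold" goes through the Stream Deck gate.
function gateDecision(toolName: string): "allow" | "hold" {
  return isAlwaysSafe(toolName) ? "allow" : "hold";
}

for (const tool of ["Read", "Bash", "Edit", "Task"]) {
  console.log(tool, gateDecision(tool));
}
```

Matching is case-sensitive on purpose: a tool the list does not recognize exactly should fall through to the gate rather than slip past it.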

The “plugin connected but device unplugged” gotcha

This one I caught the first time my Stream Deck cable got snagged on my chair.

If the physical Stream Deck device is unplugged but the Stream Deck app is still running, the daemon’s sockets.size > 0 check still returns true. The plugin’s WebSocket is still alive — it just can’t reach the device anymore. The daemon dutifully holds the hook for 590 seconds waiting for a key press that physically cannot arrive.

Two ways out:

User workaround. Quit the Stream Deck app or run streamdeck stop com.nickboy.claudedeck to kill the WebSocket subscriber. With no plugin connected, the daemon stops holding and returns ask immediately; Claude proceeds via its own prompt and allow-list.
Plugin-side fix. Listen for onDeviceDidDisconnect from the SDK and notify the daemon. Then the daemon’s “plugin connected” check becomes “plugin connected AND device present”. Not shipped yet.

This is the kind of edge case that’s invisible until it bites you. Worth saying once: if you’re holding open an HTTP request waiting on a hardware action, you need a story for “the hardware physically isn’t there anymore”. A connected WebSocket is not the same thing as a reachable hand.
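The “plugin connected AND device present” check could look something like the sketch below on the daemon side. Everything here is hypothetical: the class name, its methods, and the idea that the plugin forwards device connect and disconnect events to the daemon under some message of its own.

```typescript
// Hypothetical daemon-side state, fed by plugin socket events and by
// device events the plugin would forward to the daemon.
class GateAvailability {
  private pluginSockets = 0;
  private devices = new Set<string>();

  pluginConnected(): void {
    this.pluginSockets += 1;
  }

  pluginDisconnected(): void {
    this.pluginSockets = Math.max(0, this.pluginSockets - 1);
    // Device reports came from the plugin, so they are stale without one.
    if (this.pluginSockets === 0) this.devices.clear();
  }

  deviceConnected(deviceId: string): void {
    this.devices.add(deviceId);
  }

  deviceDisconnected(deviceId: string): void {
    this.devices.delete(deviceId);
  }

  // Hold the hook only when a key press can physically arrive.
  canRenderGate(): boolean {
    return this.pluginSockets > 0 && this.devices.size > 0;
  }
}

const gate = new GateAvailability();
gate.pluginConnected();
gate.deviceConnected("deck-1");
console.log(gate.canRenderGate()); // true
gate.deviceDisconnected("deck-1");
console.log(gate.canRenderGate()); // false
```

A request that is already being held when the device disappears still needs handling; resolving it with ask at that moment would be one option.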

Diagnosing a press that didn’t make it

When a Stream Deck press doesn’t seem to reach Claude, the chain has five steps and three logs to cross-reference. The three logs are listed first, followed by the decision tree, which lives in the repo’s docs/2026-04-21-yes-button-bug.md:

1. claude --debug hooks --debug-file /tmp/claude-hooks.log
2. ~/.claudedeck/daemon.log
3. ~/.claudedeck/plugin.log

Plugin log on press        | Daemon log on press       | Suspect
---------------------------|---------------------------|---------------------------
target=none                | (no new ws-cmd line)      | Wrong session focused
target=abc...              | (no new ws-cmd line)      | Plugin WS disconnected
target=abc...              | hasPending=false          | Pending already cleared
target=abc...              | hasPending=true           | Chain is correct; check
                           |                           | settings.local.json's
                           |                           | permissions.ask rule

Each row corresponds to a different broken link (focus / WS / state / settings). The triangulation tells you where to look without making you guess. Single-log debugging on a four-process pipeline is mostly vibes; three-log debugging actually narrows it down.
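The table translates directly into a lookup. In this sketch the observation values mirror the table’s cells, while the type and function names are hypothetical and combinations the table does not list are reported as such rather than guessed at.

```typescript
// What the daemon log shows for the press, mirroring the table's middle column.
type DaemonObservation = "no-ws-cmd" | "hasPending=false" | "hasPending=true";

// pluginTarget is the value after "target=" in the plugin log ("none" or a session ID).
function diagnose(pluginTarget: string, daemon: DaemonObservation): string {
  if (pluginTarget === "none") {
    return daemon === "no-ws-cmd"
      ? "Wrong session focused"
      : "Combination not covered by the table";
  }
  switch (daemon) {
    case "no-ws-cmd":
      return "Plugin WS disconnected";
    case "hasPending=false":
      return "Pending already cleared";
    case "hasPending=true":
      return "Chain is correct; check settings.local.json's permissions.ask rule";
  }
}

console.log(diagnose("none", "no-ws-cmd"));
console.log(diagnose("abc", "hasPending=false"));
```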

The last row in particular caught me twice. Claude Code auto-writes ask rules into settings.local.json on first denial, and ask rules silently override hook "allow" decisions. The Stream Deck press lands, the daemon happily writes permissionDecision: "allow" into the held response, Claude reads it… and then prompts in the terminal anyway because the ask rule fires before the hook decision is consulted. From the user’s seat: “I pressed YES, the daemon log says it worked, why is Claude still asking?”. The answer is in the settings file.
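A quick way to spot this is to list the ask rules that name the tool in question. The sketch below assumes the settings file has a permissions.ask array of strings, and that each rule is either a bare tool name or a tool name followed by a parenthesized specifier. It only matches on the tool name and does not evaluate the specifier, so treat the result as candidates to inspect. The function name is hypothetical.

```typescript
interface Settings {
  permissions?: { ask?: string[] };
}

// Return the ask rules whose tool-name part equals toolName.
// Assumed rule forms: "Bash" or "Bash(some specifier)".
function askRulesFor(settingsJson: string, toolName: string): string[] {
  const settings = JSON.parse(settingsJson) as Settings;
  const rules = settings.permissions?.ask ?? [];
  return rules.filter((rule) => {
    const paren = rule.indexOf("(");
    const ruleTool = paren === -1 ? rule : rule.slice(0, paren);
    return ruleTool === toolName;
  });
}

// Illustrative settings content, not taken from a real file.
const exampleSettings = JSON.stringify({
  permissions: { ask: ["Bash(git push:*)", "Edit"] },
});

console.log(askRulesFor(exampleSettings, "Bash"));
console.log(askRulesFor(exampleSettings, "Read"));
```

In practice the JSON string would come from reading settings.local.json; a non-empty result for the tool you just approved is the cue to go look at that file.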

Lessons

Holding HTTP requests is a viable IPC pattern when one side is single-purpose and the other has a generous timeout. No queues, no brokers — the Promise is the queue, the HTTP response is the channel. The pattern lives or dies on the timeout headroom.
Size the hold for the maximum tolerable human latency, not the median. Median latency is forgiving; max latency is what breaks the experience. Anything else is optimizing for the case that already works.
Always have a graceful fall-through when the gate can’t render. No plugin connected? Return ask immediately so Claude can use its own prompt. Never silently block on a UI that isn’t there.
The hook payload doesn’t include enough state to be context-aware. Permission mode, allow-list rules, current focus — none of it’s in the payload. Anything mode-aware has to be reconstructed daemon-side, which is fragile and probably wrong half the time.
Three logs beat one when the pipeline crosses process boundaries. Single-log debugging is guesswork; cross-referenced logs tell you which link in the chain broke.

The yes-button-bug post-mortem and the full design notes are in docs/2026-04-21-yes-button-bug.md and docs/2026-04-22-gating-behavior.md, respectively, in the ClaudeDeck repo if you want the raw debugging notes. They include the architecture I considered (and rejected for v1) of writing the keystroke into Claude’s PTY instead of returning JSON through the hook.

For the companion post on how the hook payload actually reaches the daemon in the first place (stdin, not env var; the docs are misleading), see How Claude Code hooks actually work.

References

Claude Code hooks reference (PreToolUse event, hookSpecificOutput schema, permissionDecision values): https://docs.claude.com/en/docs/claude-code/hooks
The 600-second default per-hook timeout is referenced in the same docs page. ClaudeDeck’s 590s --max-time is a 10s safety margin under that ceiling, set after measuring that the daemon’s own timer needed to fire strictly before Claude’s to avoid a “decision arrived but Claude already gave up” race.
AgentDeck — the upstream agent-runtime project that pioneered the HTTP-hold-for-permission-input pattern that ClaudeDeck adopted: https://github.com/puritysb/AgentDeck/tree/master/bridge
ClaudeDeck’s permission hold-open implementation: daemon/src/server.ts (POST /hooks/PreToolUse handler, holdForPermission, and the responseToKeystroke mapping if you want the keystroke-injection variant).
Debugging decision tree for “press didn’t reach Claude”: docs/2026-04-21-yes-button-bug.md
Four candidate fixes for the gate-everything problem: docs/2026-04-22-gating-behavior.md
Companion post: How Claude Code hooks actually work
