diff --git a/docs/prompts/2026-03-17-ai-copilot-integration.md b/docs/prompts/2026-03-17-ai-copilot-integration.md new file mode 100644 index 0000000..2674a9f --- /dev/null +++ b/docs/prompts/2026-03-17-ai-copilot-integration.md @@ -0,0 +1,160 @@ +# Mission Brief: Wraith AI Copilot Integration + +Full boot sequence first — CLAUDE.md, AGENTS.md, Memory MCP. Read the spec at `docs/superpowers/specs/2026-03-17-wraith-desktop-design.md` and both phase plans before you start. + +--- + +## The Mission + +Design and build a first-class AI copilot integration into Wraith. Not a chatbot sidebar. Not a prompt window. A co-pilot seat where any XO (Claude instance) can: + +1. **See what the Commander sees** — in any RDP session, receive the screen as a live visual feed (FreeRDP3 bitmap frames → vision input). No Playwright needed. The RDP session IS the browser. + +2. **Type what the Commander types** — in any SSH/terminal session, read stdout in real-time and write to stdin. Full bidirectional terminal I/O. The XO can run commands, read output, navigate filesystems, edit files, run builds — everything a human can do in a terminal. + +3. **Click what the Commander clicks** — in any RDP session, emulate mouse movements, clicks, scrolls, and keyboard input via FreeRDP3's input channel. The XO can navigate a Windows desktop, open applications, click buttons, fill forms, interact with any GUI application. + +4. **Do development work** — an XO can open an SSH session to a dev machine, cd to a repo, run a build, open an RDP session to the same machine, navigate to `localhost:3000` in a browser, and visually verify the output — all without Playwright, all through Wraith's native protocol channels. + +5. **Collaborate in real-time** — the Commander and the XO see the same sessions. The Commander can watch the XO work, take over at any time, or let the XO drive. Shared context, shared view, shared control. 
+ +--- + +## Design Requirements + +### SSH/Terminal Integration + +The XO needs these capabilities on any active SSH session: + +- **Read terminal output** — subscribe to the `ssh:data:{sessionId}` event stream. Receive raw terminal output as it happens. +- **Write terminal input** — call `SSHService.Write(sessionId, data)` to type commands. +- **Read CWD** — use the OSC 7 CWD tracker (already built in Phase 2) to know the current directory. +- **Resize terminal** — call `SSHService.Resize(sessionId, cols, rows)` if needed. +- **SFTP operations** — use `SFTPService` methods to read/write files, upload/download, navigate the remote filesystem. + +This means the XO can: ssh into a Linux box, `cd /var/log`, `tail -f syslog`, read the output, identify an issue, `vim /etc/nginx/nginx.conf`, make an edit via stdin keystrokes, save, `systemctl restart nginx`, verify the fix — all autonomously. + +### RDP Vision Integration + +The XO needs to see the remote desktop: + +- **Frame capture** — FreeRDP3 already decodes RDP bitmap updates. Capture the current screen state as an image (JPEG/PNG) at a configurable interval or on-demand. +- **Frame → AI vision** — send the captured frame to the Claude API as an image input. The XO receives it as visual context — it can read text on screen, identify UI elements, understand application state. +- **Configurable capture rate** — the Commander controls how often frames are sent (e.g., on-demand, every 5 seconds, or continuous for active work). Token cost matters — don't stream 30fps to the API. +- **Region-of-interest** — optionally crop to a specific region of the screen for focused analysis (e.g., "watch this log window"). + +### RDP Input Emulation + +The XO needs to interact with the remote desktop: + +- **Mouse** — move to coordinates, left/right click, double-click, scroll, drag. FreeRDP3 has input channels for all of these. +- **Keyboard** — send keystrokes, key combinations (Ctrl+C, Alt+Tab, Win+R), and text strings. 
Support both individual key events and bulk text entry. +- **Coordinate mapping** — the XO specifies actions in terms of what it sees in the frame ("click the OK button at approximately x=450, y=320"). The integration layer maps pixel coordinates to RDP input coordinates. + +This means the XO can: connect to a Windows server via RDP, see the desktop, open a browser (Win+R → "chrome" → Enter), navigate to a URL (click address bar → type URL → Enter), read the page content via vision, interact with web applications — all without Playwright or any browser automation tool. + +### The AI Service Layer + +Build a Go service (`internal/ai/`) that: + +``` +AIService + ├── Connect to Claude API (Anthropic SDK or raw HTTP) + ├── Manage conversation context (system prompt + message history) + ├── Tool definitions for SSH, SFTP, RDP input, RDP vision + ├── Process tool calls → dispatch to Wraith services + ├── Stream responses to the frontend (chat panel) + └── Handle multi-session awareness (which sessions exist, which is active) +``` + +**Tool definitions the AI should have access to:** + +``` +Terminal Tools: + - terminal_write(sessionId, text) — type into a terminal + - terminal_read(sessionId) — get recent terminal output + - terminal_cwd(sessionId) — get current working directory + +File Tools: + - sftp_list(sessionId, path) — list directory + - sftp_read(sessionId, path) — read file content + - sftp_write(sessionId, path, content) — write file + - sftp_upload(sessionId, local, remote) + - sftp_download(sessionId, remote) + +RDP Tools: + - rdp_screenshot(sessionId) — capture current screen + - rdp_click(sessionId, x, y, button) — mouse click + - rdp_doubleclick(sessionId, x, y) + - rdp_type(sessionId, text) — type text string + - rdp_keypress(sessionId, key) — single key or combo (ctrl+c, alt+tab) + - rdp_scroll(sessionId, x, y, delta) — scroll wheel + - rdp_move(sessionId, x, y) — move mouse + +Session Tools: + - list_sessions() — what's currently open + - 
connect_ssh(connectionId) — open a new SSH session + - connect_rdp(connectionId) — open a new RDP session + - disconnect(sessionId) — close a session +``` + +### Frontend: The Copilot Panel + +A collapsible panel (right side or bottom) that shows the AI interaction: + +- **Chat messages** — the conversation between Commander and XO +- **Tool call visualization** — when the XO executes a tool, show what it did (e.g., "Typed `ls -la` in Terminal 1", "Clicked at (450, 320) in RDP 2", "Read /etc/nginx/nginx.conf") +- **Screen capture preview** — when the XO takes an RDP screenshot, show a thumbnail in the chat +- **Session awareness indicator** — show which session the XO is currently focused on +- **Take control / Release control** — the Commander can let the XO drive a session or take it back +- **Quick commands** — "Watch this session", "Fix this error", "Deploy this", "What's on screen?" + +### Interaction Model + +The Commander and XO interact through natural language in the chat panel. The XO has access to all tools and uses them autonomously based on the conversation: + +``` +Commander: "SSH into asgard and check if the nginx service is running" +XO: [calls connect_ssh(asgardConnectionId)] + [calls terminal_write(sessionId, "systemctl status nginx")] + [calls terminal_read(sessionId)] + "Nginx is active (running) since March 15. PID 1234, 3 worker processes. + Memory usage is 45MB. No errors in the last 50 journal lines." + +Commander: "Open RDP to dc01 and check the Event Viewer for any critical errors" +XO: [calls connect_rdp(dc01ConnectionId)] + [calls rdp_screenshot(sessionId)] + "I can see the Windows Server desktop. Opening Event Viewer..." + [calls rdp_keypress(sessionId, "win+r")] + [calls rdp_type(sessionId, "eventvwr.msc")] + [calls rdp_keypress(sessionId, "enter")] + [waits 2 seconds] + [calls rdp_screenshot(sessionId)] + "Event Viewer is open. I can see 3 critical errors in the System log from today. + Let me click into the first one..." 
+ [calls rdp_click(sessionId, 320, 280, "left")] + [calls rdp_screenshot(sessionId)] + "Critical error: The Kerberos client received a KRB_AP_ERR_MODIFIED error + from the server dc02$. This usually indicates a DNS or SPN misconfiguration..." +``` + +--- + +## Architecture Constraints + +- **Claude API key** stored in the encrypted vault (same Argon2id + AES-256-GCM as credentials) +- **Token budget awareness** — track token usage per conversation, warn at thresholds +- **Conversation persistence** — save conversations to SQLite, resume across sessions +- **No external dependencies** — the AI service is a Go package using the Claude API directly (HTTP + SSE streaming), not a Python sidecar +- **Model selection** — configurable in settings (claude-sonnet-4-5-20250514, claude-opus-4-5-20250414, etc.) +- **Streaming responses** — SSE from Claude API → Wails events → Vue frontend, token by token + +--- + +## What to Build + +Design this system fully (spec it out), then implement it. Phase it if needed — terminal integration first (lower complexity, immediate value), then RDP vision, then RDP input. But design the whole thing upfront so the architecture supports all three from day one. + +The end state: a single Wraith window where a human and an AI work side by side on remote systems, sharing vision, sharing control, sharing context. The AI sees what you see. The AI types what you'd type. And you can take the wheel whenever you want. + +Build it. 
diff --git a/docs/superpowers/specs/2026-03-17-wraith-ai-copilot-design.md b/docs/superpowers/specs/2026-03-17-wraith-ai-copilot-design.md new file mode 100644 index 0000000..ff48f81 --- /dev/null +++ b/docs/superpowers/specs/2026-03-17-wraith-ai-copilot-design.md @@ -0,0 +1,439 @@ +# Wraith AI Copilot — Design Spec + +> **Date:** 2026-03-17 +> **Purpose:** First-class AI copilot integration — Claude as an XO (Executive Officer) with full terminal, filesystem, and RDP desktop access through Wraith's native protocol channels +> **Depends on:** Wraith Desktop v0.1.0 (all 4 phases complete) +> **License:** MIT (same as Wraith) + +--- + +## 1. What This Is + +An AI co-pilot that shares the Commander's view and control of remote systems. The XO (Claude) can: + +- **See** RDP desktops via FreeRDP3 bitmap frames → Claude Vision API +- **Type** in SSH terminals via bidirectional stdin/stdout pipes +- **Click** in RDP sessions via FreeRDP3 mouse/keyboard input channels +- **Read/write files** via SFTP — the same connection the terminal uses +- **Open/close sessions** — autonomously connect to hosts from the connection manager + +This is NOT a chatbot sidebar. It's a second operator with the same access as the human, working through the same protocol channels Wraith already provides. + +**Why this is unique:** No other tool does this. Existing AI coding assistants work on local files. Wraith's XO works on remote servers — SSH terminals, Windows desktops, remote filesystems — all through native protocols. No Playwright, no browser automation, no screen recording. The RDP session IS the viewport. The SSH session IS the shell. + +--- + +## 2. 
Architecture + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ Wraith Application │ +│ │ +│ ┌─ AI Service (internal/ai/) ─────────────────────────────────┐ │ +│ │ │ │ +│ │ ┌──────────────┐ ┌───────────────┐ ┌─────────────────┐ │ │ +│ │ │ Claude API │ │ Tool Dispatch │ │ Conversation │ │ │ +│ │ │ Client │ │ Router │ │ Manager │ │ │ +│ │ │ (HTTP + SSE) │ │ │ │ (SQLite) │ │ │ +│ │ └──────┬───────┘ └───────┬───────┘ └─────────────────┘ │ │ +│ │ │ │ │ │ +│ │ │ ┌─────────────▼──────────────┐ │ │ +│ │ │ │ Tool Definitions │ │ │ +│ │ │ │ │ │ │ +│ │ │ │ Terminal: write, read, cwd │ │ │ +│ │ │ │ SFTP: list, read, write │ │ │ +│ │ │ │ RDP: screenshot, click, │ │ │ +│ │ │ │ type, keypress, move │ │ │ +│ │ │ │ Session: list, connect, │ │ │ +│ │ │ │ disconnect │ │ │ +│ │ │ └─────────────┬──────────────┘ │ │ +│ │ │ │ │ │ +│ └─────────┼──────────────────┼─────────────────────────────────┘ │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌─────────────────┐ ┌──────────────────────────────────────┐ │ +│ │ Claude API │ │ Existing Wraith Services │ │ +│ │ (Anthropic) │ │ │ │ +│ │ │ │ SSHService.Write/Read │ │ +│ │ Messages API │ │ SFTPService.List/Read/Write │ │ +│ │ + Tool Use │ │ RDPService.SendMouse/SendKey │ │ +│ │ + Vision │ │ RDPService.GetFrame → JPEG encode │ │ +│ │ + Streaming │ │ SessionManager.Create/List │ │ +│ └─────────────────┘ └──────────────────────────────────────┘ │ +│ │ +│ ┌─ Frontend ─────────────────────────────────────────────────┐ │ +│ │ CopilotPanel.vue — right-side collapsible panel │ │ +│ │ ├── Chat messages (streaming, markdown rendered) │ │ +│ │ ├── Tool call visualization (what the XO did) │ │ +│ │ ├── RDP screenshot thumbnails inline │ │ +│ │ ├── Session awareness (which session XO is focused on) │ │ +│ │ ├── Control toggle (XO driving / Commander driving) │ │ +│ │ └── Quick commands bar │ │ +│ └────────────────────────────────────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────────┘ +``` + +--- + +## 3. 
AI Service Layer (`internal/ai/`) + +### 3.1 Claude API Client + +Direct HTTP client — no Python sidecar, no external SDK. Pure Go. + +```go +type ClaudeClient struct { + apiKey string // decrypted from vault on demand + model string // configurable: claude-sonnet-4-5-20250514, etc. + httpClient *http.Client + baseURL string // https://api.anthropic.com +} + +// SendMessage sends a messages API request with tool use + vision support. +// Returns a streaming response channel for token-by-token delivery. +func (c *ClaudeClient) SendMessage(req *MessageRequest) (<-chan StreamEvent, error) +``` + +**Message format:** Anthropic Messages API v1 (`/v1/messages`). + +**Streaming:** SSE (`stream: true`). Parse `event: message_start`, `event: content_block_start`, `event: content_block_delta`, `event: content_block_stop`, `event: message_delta`, and `event: message_stop`. Tool calls are not a separate event type: they arrive as a `content_block_start` whose content block has `type: "tool_use"`, followed by `input_json_delta` deltas that carry the tool input as partial JSON. Emit to frontend via Wails events. + +**Vision:** RDP screenshots sent as base64-encoded JPEG in the `image` content block type. Resolution capped at 1280x720 for token efficiency (downscale from native resolution before encoding). + +**Token tracking:** Parse `usage` from the API response (when streaming, it arrives on `message_start` and is updated on `message_delta`). Track `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens` per conversation. Store totals in SQLite. 
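The streaming step above can be sketched in plain Go. This is a minimal illustration, not the Wraith implementation: `sseEvent`, `parseSSE`, `deltaPayload`, `collectText`, and `sampleStream` are hypothetical names, and the sample payloads only model the `text_delta` subset of `content_block_delta` events. A real client would read from the HTTP response body and forward each delta as a Wails event instead of concatenating them.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// sseEvent is one decoded server-sent event (hypothetical helper type).
type sseEvent struct {
	Name string // value of the "event:" field
	Data string // "data:" lines joined with newlines
}

// parseSSE reads SSE frames from r and calls fn once per complete event.
// Frames are terminated by a blank line; lines starting with ":" are comments.
func parseSSE(r io.Reader, fn func(sseEvent)) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // allow long data lines
	var name string
	var data []string
	flush := func() {
		if name != "" || len(data) > 0 {
			fn(sseEvent{Name: name, Data: strings.Join(data, "\n")})
		}
		name, data = "", nil
	}
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			flush()
		case strings.HasPrefix(line, ":"):
			// comment or keep-alive line, ignored
		case strings.HasPrefix(line, "event:"):
			name = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
		case strings.HasPrefix(line, "data:"):
			data = append(data, strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " "))
		}
	}
	flush() // emit a trailing event that was not followed by a blank line
	return sc.Err()
}

// deltaPayload models only the fields of a content_block_delta used here.
type deltaPayload struct {
	Delta struct {
		Type string `json:"type"`
		Text string `json:"text"`
	} `json:"delta"`
}

// collectText concatenates every text_delta fragment found in the stream.
func collectText(r io.Reader) (string, error) {
	var sb strings.Builder
	var decodeErr error
	err := parseSSE(r, func(ev sseEvent) {
		if ev.Name != "content_block_delta" || decodeErr != nil {
			return
		}
		var p deltaPayload
		if e := json.Unmarshal([]byte(ev.Data), &p); e != nil {
			decodeErr = e
			return
		}
		if p.Delta.Type == "text_delta" {
			sb.WriteString(p.Delta.Text)
		}
	})
	if err != nil {
		return "", err
	}
	return sb.String(), decodeErr
}

// collectTextFromString is a convenience wrapper for in-memory streams.
func collectTextFromString(s string) (string, error) {
	return collectText(strings.NewReader(s))
}

// sampleStream is a hand-written stand-in for an API response body.
const sampleStream = "event: message_start\n" +
	"data: {\"type\":\"message_start\"}\n\n" +
	"event: content_block_delta\n" +
	"data: {\"type\":\"content_block_delta\",\"index\":0,\"delta\":{\"type\":\"text_delta\",\"text\":\"Hel\"}}\n\n" +
	"event: content_block_delta\n" +
	"data: {\"type\":\"content_block_delta\",\"index\":0,\"delta\":{\"type\":\"text_delta\",\"text\":\"lo\"}}\n\n" +
	"event: message_stop\n" +
	"data: {\"type\":\"message_stop\"}\n\n"

func main() {
	text, err := collectTextFromString(sampleStream)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

Keeping the frame parser separate from the payload decoder means the same `parseSSE` loop could feed a tool-call accumulator later without changing how frames are read.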
+ +### 3.2 Tool Definitions + +```go +var CopilotTools = []Tool{ + // Terminal + {Name: "terminal_write", Description: "Type text into an active SSH terminal session", + InputSchema: {sessionId: string, text: string}}, + {Name: "terminal_read", Description: "Get recent terminal output from an SSH session (last N lines)", + InputSchema: {sessionId: string, lines: int (default 50)}}, + {Name: "terminal_cwd", Description: "Get the current working directory of an SSH session", + InputSchema: {sessionId: string}}, + + // SFTP + {Name: "sftp_list", Description: "List files and directories at a remote path", + InputSchema: {sessionId: string, path: string}}, + {Name: "sftp_read", Description: "Read the contents of a remote file (max 5MB)", + InputSchema: {sessionId: string, path: string}}, + {Name: "sftp_write", Description: "Write content to a remote file", + InputSchema: {sessionId: string, path: string, content: string}}, + + // RDP + {Name: "rdp_screenshot", Description: "Capture the current RDP desktop screen as an image", + InputSchema: {sessionId: string}}, + {Name: "rdp_click", Description: "Click at screen coordinates in an RDP session", + InputSchema: {sessionId: string, x: int, y: int, button: string (default "left")}}, + {Name: "rdp_doubleclick", Description: "Double-click at coordinates", + InputSchema: {sessionId: string, x: int, y: int}}, + {Name: "rdp_type", Description: "Type a text string into the RDP session", + InputSchema: {sessionId: string, text: string}}, + {Name: "rdp_keypress", Description: "Press a key or key combination (e.g. 
'enter', 'ctrl+c', 'alt+tab', 'win+r')", + InputSchema: {sessionId: string, key: string}}, + {Name: "rdp_scroll", Description: "Scroll the mouse wheel at coordinates", + InputSchema: {sessionId: string, x: int, y: int, delta: int}}, + {Name: "rdp_move", Description: "Move the mouse cursor to coordinates", + InputSchema: {sessionId: string, x: int, y: int}}, + + // Session Management + {Name: "list_sessions", Description: "List all active SSH and RDP sessions", + InputSchema: {}}, + {Name: "connect_ssh", Description: "Open a new SSH session to a saved connection", + InputSchema: {connectionId: int}}, + {Name: "connect_rdp", Description: "Open a new RDP session to a saved connection", + InputSchema: {connectionId: int}}, + {Name: "disconnect", Description: "Close an active session", + InputSchema: {sessionId: string}}, +} +``` + +### 3.3 Tool Dispatch Router + +```go +type ToolRouter struct { + ssh *ssh.SSHService + sftp *sftp.SFTPService + rdp *rdp.RDPService + sessions *session.Manager + connections *connections.ConnectionService +} + +// Dispatch executes a tool call and returns the result +func (r *ToolRouter) Dispatch(toolName string, input json.RawMessage) (interface{}, error) +``` + +The router maps tool names to existing Wraith service methods. No new protocol code — everything routes through the services we already built. + +**Terminal output buffering:** The `terminal_read` tool needs recent output. Add an output ring buffer to SSHService that stores the last N lines (configurable, default 200) of each session's stdout. The buffer is written to by the existing read goroutine and read by the tool dispatcher. + +**RDP screenshot encoding:** The `rdp_screenshot` tool calls `RDPService.GetFrame()` to get the raw RGBA pixel buffer, downscales to 1280x720 if larger, encodes as JPEG (quality 85), and returns as base64. This is the image that gets sent to Claude's Vision API. 
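The screenshot encoding step could look roughly like the sketch below. It assumes the frame arrives as a tightly packed RGBA byte slice with known width and height; the exact return shape of `RDPService.GetFrame()` is not given in this spec, so `encodeScreenshot`, `fitWithin`, and `downscaleNearest` are hypothetical helpers. It also swaps in a hand-written nearest-neighbor resize so that only the standard library is needed; a production version would likely want a better filter for text legibility.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"image"
	"image/jpeg"
)

// fitWithin returns dimensions that fit inside maxW x maxH while preserving
// the aspect ratio. Frames that already fit are returned unchanged.
func fitWithin(w, h, maxW, maxH int) (int, int) {
	if w <= maxW && h <= maxH {
		return w, h
	}
	if w*maxH >= h*maxW { // width is the limiting dimension
		nh := h * maxW / w
		if nh < 1 {
			nh = 1
		}
		return maxW, nh
	}
	nw := w * maxH / h
	if nw < 1 {
		nw = 1
	}
	return nw, maxH
}

// downscaleNearest resamples src to dstW x dstH with nearest-neighbor sampling.
func downscaleNearest(src *image.RGBA, dstW, dstH int) *image.RGBA {
	sb := src.Bounds()
	dst := image.NewRGBA(image.Rect(0, 0, dstW, dstH))
	for y := 0; y < dstH; y++ {
		sy := sb.Min.Y + y*sb.Dy()/dstH
		for x := 0; x < dstW; x++ {
			sx := sb.Min.X + x*sb.Dx()/dstW
			dst.SetRGBA(x, y, src.RGBAAt(sx, sy))
		}
	}
	return dst
}

// encodeScreenshot converts a raw RGBA frame into a base64-encoded JPEG
// (quality 85) capped at 1280x720, and reports the encoded dimensions.
func encodeScreenshot(rgba []byte, w, h int) (string, int, int, error) {
	if w <= 0 || h <= 0 || len(rgba) != w*h*4 {
		return "", 0, 0, fmt.Errorf("frame size mismatch: %d bytes for %dx%d", len(rgba), w, h)
	}
	src := &image.RGBA{Pix: rgba, Stride: w * 4, Rect: image.Rect(0, 0, w, h)}
	dw, dh := fitWithin(w, h, 1280, 720)
	var img image.Image = src
	if dw != w || dh != h {
		img = downscaleNearest(src, dw, dh)
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: 85}); err != nil {
		return "", 0, 0, err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), dw, dh, nil
}

func main() {
	// An all-zero frame stands in for a real capture.
	frame := make([]byte, 1920*1080*4)
	b64, w, h, err := encodeScreenshot(frame, 1920, 1080)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%dx%d, %d base64 chars\n", w, h, len(b64))
}
```

Returning the encoded dimensions alongside the image matters for input emulation: if the XO picks coordinates on a downscaled frame, the dispatcher needs the scale factor to map them back to native RDP coordinates.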
+ +### 3.4 Conversation Manager + +```go +type Conversation struct { + ID string + Messages []Message + Model string + CreatedAt time.Time + TokensIn int + TokensOut int +} + +type ConversationManager struct { + db *sql.DB + active *Conversation +} + +// Create starts a new conversation +// Load resumes a saved conversation +// AddMessage appends a message and persists to SQLite +// GetHistory returns the full message list for API calls +// GetTokenUsage returns cumulative token counts +``` + +Conversations are persisted to a `conversations` SQLite table with messages stored as JSON. This allows resuming a conversation across app restarts. + +### 3.5 System Prompt + +``` +You are the XO (Executive Officer) aboard the Wraith command station. The Commander +(human operator) works alongside you managing remote servers and workstations. + +You have direct access to all active sessions through your tools: +- SSH terminals: read output, type commands, navigate filesystems +- SFTP: read and write remote files +- RDP desktops: see the screen, click, type, interact with any GUI application +- Session management: open new connections, close sessions + +When given a task: +1. Assess what sessions and access you need +2. Execute efficiently — don't ask for permission to use tools, just use them +3. Report what you found or did, with relevant details +4. If something fails, diagnose and try an alternative approach + +You are not an assistant answering questions. You are an operator executing missions. +Act decisively. Use your tools. Report results. +``` + +--- + +## 4. 
Data Model Additions + +```sql +-- AI conversations +CREATE TABLE IF NOT EXISTS conversations ( + id TEXT PRIMARY KEY, + title TEXT, + model TEXT NOT NULL, + messages TEXT NOT NULL DEFAULT '[]', -- JSON array of messages + tokens_in INTEGER DEFAULT 0, + tokens_out INTEGER DEFAULT 0, + created_at DATETIME DEFAULT CURRENT_TIMESTAMP, + updated_at DATETIME DEFAULT CURRENT_TIMESTAMP +); + +-- AI settings (stored in existing settings table) +-- ai_api_key_encrypted — Claude API key (vault-encrypted) +-- ai_model — default model +-- ai_max_tokens — max response tokens (default 4096) +-- ai_rdp_capture_rate — screenshot interval in seconds (default: on-demand) +-- ai_token_budget — monthly token budget warning threshold +``` + +Add migration `002_ai_copilot.sql` for the conversations table. + +--- + +## 5. Frontend: Copilot Panel + +### Layout + +``` +┌──────────────────────────────────────────┬──────────────┐ +│ │ │ +│ Terminal / RDP │ COPILOT │ +│ (existing) │ PANEL │ +│ │ (320px) │ +│ │ │ +│ │ [Messages] │ +│ │ [Tool viz] │ +│ │ [Thumbs] │ +│ │ │ +│ │ [Input] │ +├──────────────────────────────────────────┴──────────────┤ +│ Status bar │ +└──────────────────────────────────────────────────────────┘ +``` + +The copilot panel is a **right-side collapsible panel** (320px default width, resizable). Toggle via toolbar button or Ctrl+Shift+K. + +### Components + +**`CopilotPanel.vue`** — main container: +- Header: "XO" label, model selector dropdown, token counter, close button +- Message list: scrollable, auto-scroll on new messages +- Tool call cards: collapsible, show tool name + input + result +- RDP screenshots: inline thumbnails (click to expand) +- Input area: textarea with send button, Shift+Enter for newlines, Enter to send + +**`CopilotMessage.vue`** — single message: +- Commander messages: right-aligned, blue accent +- XO messages: left-aligned, markdown rendered (code blocks, lists, etc.) 
+- Tool use blocks: collapsible card showing tool name, input params, result + +**`CopilotToolViz.vue`** — tool call visualization: +- Icon per tool type (terminal icon, folder icon, monitor icon, etc.) +- Summary line: "Typed `ls -la` in Asgard (SSH)", "Screenshot from DC01 (RDP)" +- Expandable detail showing raw input/output + +**`CopilotSettings.vue`** — configuration modal: +- API key input (stored encrypted in vault) +- Model selector +- Token budget threshold +- RDP capture settings +- Conversation history management + +### Streaming + +Claude API responses stream token-by-token: + +``` +Go: Claude API (SSE) → parse events → Wails events +Frontend: listen for Wails events → append to message → re-render +``` + +Events: +- `ai:text:{conversationId}` — text delta (append to current message) +- `ai:tool_use:{conversationId}` — tool call started (show pending card) +- `ai:tool_result:{conversationId}` — tool call completed (update card) +- `ai:done:{conversationId}` — response complete +- `ai:error:{conversationId}` — error occurred + +--- + +## 6. Go Backend Structure + +``` +internal/ + ai/ + service.go # AIService — orchestrates everything + service_test.go + client.go # ClaudeClient — HTTP + SSE to Anthropic API + client_test.go + tools.go # Tool definitions (JSON schema) + router.go # ToolRouter — dispatches tool calls to Wraith services + router_test.go + conversation.go # ConversationManager — persistence + history + conversation_test.go + types.go # Message, Tool, StreamEvent types + screenshot.go # RDP frame → JPEG encode + downscale + screenshot_test.go + terminal_buffer.go # Ring buffer for terminal output history + terminal_buffer_test.go + db/ + migrations/ + 002_ai_copilot.sql # conversations table +``` + +--- + +## 7. 
Frontend Structure + +``` +frontend/src/ + components/ + copilot/ + CopilotPanel.vue # Main panel container + CopilotMessage.vue # Single message (commander or XO) + CopilotToolViz.vue # Tool call visualization card + CopilotSettings.vue # API key, model, budget configuration + composables/ + useCopilot.ts # AI service wrappers, streaming, state + stores/ + copilot.store.ts # Conversation state, messages, streaming +``` + +--- + +## 8. Implementation Phases + +| Phase | Deliverables | +|---|---| +| **A: Core** | AI service, Claude API client (HTTP + SSE streaming), tool definitions, tool router, conversation manager, SQLite migration, terminal output buffer | +| **B: Terminal Tools** | Wire terminal_write/read/cwd + sftp_* tools to existing services, test with real SSH sessions | +| **C: RDP Vision** | Screenshot capture (RGBA → JPEG → base64), rdp_screenshot tool, vision in API calls | +| **D: RDP Input** | rdp_click/type/keypress/move/scroll tools, coordinate mapping, key combo parsing | +| **E: Frontend** | CopilotPanel, message streaming, tool visualization, settings, conversation persistence | + +--- + +## 9. Key Implementation Details + +### Terminal Output Buffer + +```go +type TerminalBuffer struct { + lines []string + mu sync.RWMutex + max int // default 200 lines +} + +func (b *TerminalBuffer) Write(data []byte) // append, split on newlines +func (b *TerminalBuffer) ReadLast(n int) []string // return last N lines +func (b *TerminalBuffer) ReadAll() []string +``` + +Added to SSHService — the existing read goroutine writes to both the Wails event (for xterm.js) AND the buffer (for AI reads). 
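One possible shape for the buffer methods declared above, as a sketch only. It keeps a bounded slice rather than a true circular array, carries an unterminated tail between writes so that a chunk boundary in the middle of a line does not split it, and includes that tail in reads so a shell prompt without a trailing newline is still visible. The `partial` field and `NewTerminalBuffer` constructor are assumptions, and the sketch does not strip ANSI escape sequences, which a real `terminal_read` would probably want to do.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// TerminalBuffer keeps the most recent lines of terminal output.
type TerminalBuffer struct {
	mu      sync.RWMutex
	lines   []string // completed lines, oldest first
	partial string   // trailing output not yet terminated by a newline
	max     int      // maximum number of completed lines retained
}

// NewTerminalBuffer creates a buffer; non-positive max falls back to 200.
func NewTerminalBuffer(max int) *TerminalBuffer {
	if max <= 0 {
		max = 200
	}
	return &TerminalBuffer{max: max}
}

// Write appends raw output, splitting on newlines and trimming CR from CRLF.
func (b *TerminalBuffer) Write(data []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	parts := strings.Split(b.partial+string(data), "\n")
	b.partial = parts[len(parts)-1] // empty if data ended with a newline
	for _, line := range parts[:len(parts)-1] {
		b.lines = append(b.lines, strings.TrimSuffix(line, "\r"))
	}
	if over := len(b.lines) - b.max; over > 0 {
		// Copy so the old backing array can be garbage collected.
		b.lines = append([]string(nil), b.lines[over:]...)
	}
}

// ReadLast returns up to n of the most recent lines, including the
// unterminated tail (for example a prompt) when one exists.
func (b *TerminalBuffer) ReadLast(n int) []string {
	b.mu.RLock()
	defer b.mu.RUnlock()
	all := append([]string(nil), b.lines...)
	if b.partial != "" {
		all = append(all, b.partial)
	}
	if n < 0 {
		n = 0
	}
	if n < len(all) {
		all = all[len(all)-n:]
	}
	return all
}

// ReadAll returns every retained line plus the unterminated tail.
func (b *TerminalBuffer) ReadAll() []string {
	b.mu.RLock()
	defer b.mu.RUnlock()
	all := append([]string(nil), b.lines...)
	if b.partial != "" {
		all = append(all, b.partial)
	}
	return all
}

func main() {
	buf := NewTerminalBuffer(3)
	buf.Write([]byte("a\nb\r\nc")) // "c" is held as a partial line
	buf.Write([]byte("d\ne\nf\n")) // completes "cd", then "e" and "f"
	buf.Write([]byte("$ "))        // prompt without newline
	fmt.Printf("%q\n", buf.ReadAll())
}
```

Reads return copies, so the tool dispatcher can hold the result without blocking the SSH read goroutine that is writing to the buffer.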
+ +### RDP Screenshot Pipeline + +``` +RDPService.GetFrame() → raw RGBA []byte (1920×1080×4 = ~8MB) + ↓ +image.NewRGBA() + copy → Go image.Image + ↓ +imaging.Resize(1280, 720) → downscaled for token efficiency + ↓ +jpeg.Encode(quality=85) → JPEG []byte (~100-200KB) + ↓ +base64.StdEncoding.Encode() → base64 string (~135-270KB) + ↓ +Claude API image content block → Vision input +``` + +One 1280×720 screenshot costs roughly 1,200 tokens (Anthropic's documented estimate is width × height / 750, which gives about 1,230 at this size). At on-demand capture (not continuous), this is manageable. + +### Key Combo Parsing + +For `rdp_keypress`, parse key combo strings into FreeRDP input sequences: + +``` +"enter" → scancode 0x1C down, 0x1C up +"ctrl+c" → Ctrl down, C down, C up, Ctrl up +"alt+tab" → Alt down, Tab down, Tab up, Alt up +"win+r" → Win down, R down, R up, Win up +"ctrl+alt+delete" → special handling (Ctrl+Alt+Del) +``` + +Map key names to scancodes using the existing `input.go` scancode table. + +### Token Budget + +Track cumulative token usage per day/month. When approaching the configured budget threshold: +- Show warning in the copilot panel header +- Log a warning +- Don't hard-block — the Commander decides whether to continue + +--- + +## 10. Security Considerations + +- **API key** stored in vault (same AES-256-GCM encryption as SSH keys) +- **API key never logged** — mask in all log output +- **Conversation content** may contain sensitive data (terminal output, file contents, screenshots of desktops). It lives in the same SQLite database as the vault-encrypted data, but the `messages` JSON column in the section 4 schema is plain text. Consider encrypting the messages JSON blob with the vault key. +- **Tool access is unrestricted** — the XO has the same access as the Commander. This is by design. The human is always watching and can take control. +- **No autonomous session creation without Commander context** — the XO can open sessions, but the connections (with credentials) were set up by the Commander