wraith/docs/superpowers/specs/2026-03-17-wraith-ai-copilot-design.md

24 KiB
Raw Blame History

Wraith AI Copilot — Design Spec

Date: 2026-03-17 Purpose: First-class AI copilot integration — Claude as an XO (Executive Officer) with full terminal, filesystem, and RDP desktop access through Wraith's native protocol channels Depends on: Wraith Desktop v0.1.0 (all 4 phases complete) License: MIT (same as Wraith)


1. What This Is

An AI co-pilot that shares the Commander's view and control of remote systems. The XO (Claude) can:

  • See RDP desktops via FreeRDP3 bitmap frames → Claude Vision API
  • Type in SSH terminals via bidirectional stdin/stdout pipes
  • Click in RDP sessions via FreeRDP3 mouse/keyboard input channels
  • Read/write files via SFTP — the same connection the terminal uses
  • Open/close sessions — autonomously connect to hosts from the connection manager

This is NOT a chatbot sidebar. It's a second operator with the same access as the human, working through the same protocol channels Wraith already provides.

Why this is unique: No other tool does this. Existing AI coding assistants work on local files. Wraith's XO works on remote servers — SSH terminals, Windows desktops, remote filesystems — all through native protocols. No Playwright, no browser automation, no screen recording. The RDP session IS the viewport. The SSH session IS the shell.


2. Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Wraith Application                                              │
│                                                                  │
│  ┌─ AI Service (internal/ai/) ─────────────────────────────────┐ │
│  │                                                             │ │
│  │  ┌──────────────┐  ┌───────────────┐  ┌─────────────────┐  │ │
│  │  │ Claude API    │  │ Tool Dispatch  │  │ Conversation    │  │ │
│  │  │ Client        │  │ Router         │  │ Manager         │  │ │
│  │  │ (HTTP + SSE)  │  │                │  │ (SQLite)        │  │ │
│  │  └──────┬───────┘  └───────┬───────┘  └─────────────────┘  │ │
│  │         │                  │                                 │ │
│  │         │    ┌─────────────▼──────────────┐                 │ │
│  │         │    │      Tool Definitions       │                 │ │
│  │         │    │                             │                 │ │
│  │         │    │  Terminal: write, read, cwd │                 │ │
│  │         │    │  SFTP: list, read, write    │                 │ │
│  │         │    │  RDP: screenshot, click,    │                 │ │
│  │         │    │       type, keypress, move  │                 │ │
│  │         │    │  Session: list, connect,    │                 │ │
│  │         │    │           disconnect        │                 │ │
│  │         │    └─────────────┬──────────────┘                 │ │
│  │         │                  │                                 │ │
│  └─────────┼──────────────────┼─────────────────────────────────┘ │
│            │                  │                                    │
│            ▼                  ▼                                    │
│  ┌─────────────────┐  ┌──────────────────────────────────────┐   │
│  │  Claude API      │  │  Existing Wraith Services            │   │
│  │  (Anthropic)     │  │                                      │   │
│  │                  │  │  SSHService.Write/Read                │   │
│  │  Messages API    │  │  SFTPService.List/Read/Write         │   │
│  │  + Tool Use      │  │  RDPService.SendMouse/SendKey        │   │
│  │  + Vision        │  │  RDPService.GetFrame → JPEG encode   │   │
│  │  + Streaming     │  │  SessionManager.Create/List          │   │
│  └─────────────────┘  └──────────────────────────────────────┘   │
│                                                                   │
│  ┌─ Frontend ─────────────────────────────────────────────────┐  │
│  │  CopilotPanel.vue — right-side collapsible panel            │  │
│  │  ├── Chat messages (streaming, markdown rendered)           │  │
│  │  ├── Tool call visualization (what the XO did)              │  │
│  │  ├── RDP screenshot thumbnails inline                       │  │
│  │  ├── Session awareness (which session XO is focused on)     │  │
│  │  ├── Control toggle (XO driving / Commander driving)        │  │
│  │  └── Quick commands bar                                     │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘

3. AI Service Layer (internal/ai/)

3.1 Authentication — OAuth PKCE (Max Subscription)

Wraith authenticates against the user's Claude Max subscription via OAuth Authorization Code Flow with PKCE. No API key needed. No per-token billing. Same auth path as Claude Code, but with Wraith's own independent token set (no shared credential file, no race conditions).

OAuth Parameters:

Parameter Value
Authorize URL https://claude.ai/oauth/authorize
Token URL https://platform.claude.com/v1/oauth/token
Client ID 9d1c250a-e61b-44d9-88ed-5944d1962f5e
PKCE Method S256
Code Verifier 32 random bytes, base64url (no padding)
Code Challenge SHA-256(verifier), base64url (no padding)
Redirect URI http://localhost:{dynamic_port}/callback
Scopes user:inference user:profile
State 32 random bytes, base64url

Auth Flow:

1. User clicks "Connect to Claude" in Wraith copilot settings
2. Wraith generates PKCE code_verifier + code_challenge
3. Wraith starts a local HTTP server on a random port
4. Wraith opens browser to:
     https://claude.ai/oauth/authorize
       ?code=true
       &client_id=9d1c250a-e61b-44d9-88ed-5944d1962f5e
       &response_type=code
       &redirect_uri=http://localhost:{port}/callback
       &scope=user:inference user:profile
       &code_challenge={challenge}
       &code_challenge_method=S256
       &state={state}
5. User logs in with their Anthropic/Claude account
6. Browser redirects to http://localhost:{port}/callback?code={auth_code}&state={state}
7. Wraith validates state, exchanges code for tokens:
     POST https://platform.claude.com/v1/oauth/token
     {
       "grant_type": "authorization_code",
       "code": "{auth_code}",
       "redirect_uri": "http://localhost:{port}/callback",
       "client_id": "9d1c250a-e61b-44d9-88ed-5944d1962f5e",
       "code_verifier": "{verifier}",
       "state": "{state}"
     }
8. Response: { access_token, refresh_token, expires_in, scope }
9. Wraith encrypts tokens with vault and stores in SQLite settings:
     - ai_access_token (vault-encrypted)
     - ai_refresh_token (vault-encrypted)
     - ai_token_expires_at (unix timestamp)
10. Done — copilot is authenticated

Token Refresh (automatic, silent):

When access_token is expired (checked before each API call):
  POST https://platform.claude.com/v1/oauth/token
  {
    "grant_type": "refresh_token",
    "refresh_token": "{decrypted_refresh_token}",
    "client_id": "9d1c250a-e61b-44d9-88ed-5944d1962f5e",
    "scope": "user:inference user:profile"
  }
  → New access_token + refresh_token stored in vault

Implementation: internal/ai/oauth.go — Go HTTP server for callback, PKCE helpers, token exchange, token refresh. Uses pkg/browser to open the authorize URL.

Fallback: For users without a Max subscription, allow raw API key input (stored in vault). The client checks which auth method is configured and uses the appropriate header.

3.2 Claude API Client

Direct HTTP client — no Python sidecar, no external SDK. Pure Go.

type ClaudeClient struct {
    auth       *OAuthManager    // handles token refresh + auth header
    model      string           // configurable: claude-sonnet-4-5-20250514, etc.
    httpClient *http.Client
    baseURL    string           // https://api.anthropic.com
}

// SendMessage sends a messages API request with tool use + vision support.
// Returns a streaming response channel for token-by-token delivery.
func (c *ClaudeClient) SendMessage(messages []Message, tools []Tool, systemPrompt string) (<-chan StreamEvent, error)

Auth header: Authorization: Bearer {access_token} (from OAuth). Falls back to x-api-key: {api_key} if using raw API key auth.

Message format: Anthropic Messages API v1 (/v1/messages).

Streaming: SSE (stream: true). Parse event: content_block_delta, event: content_block_stop, event: message_delta, event: tool_use events. Emit to frontend via Wails events.

Vision: RDP screenshots sent as base64-encoded JPEG in the image content block type. Resolution capped at 1280x720 for token efficiency (downscale from native resolution before encoding).

Token tracking: Parse usage from the API response. Track input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens per conversation. Store totals in SQLite.

3.2 Tool Definitions

var CopilotTools = []Tool{
    // Terminal
    {Name: "terminal_write", Description: "Type text into an active SSH terminal session",
     InputSchema: {sessionId: string, text: string}},
    {Name: "terminal_read", Description: "Get recent terminal output from an SSH session (last N lines)",
     InputSchema: {sessionId: string, lines: int (default 50)}},
    {Name: "terminal_cwd", Description: "Get the current working directory of an SSH session",
     InputSchema: {sessionId: string}},

    // SFTP
    {Name: "sftp_list", Description: "List files and directories at a remote path",
     InputSchema: {sessionId: string, path: string}},
    {Name: "sftp_read", Description: "Read the contents of a remote file (max 5MB)",
     InputSchema: {sessionId: string, path: string}},
    {Name: "sftp_write", Description: "Write content to a remote file",
     InputSchema: {sessionId: string, path: string, content: string}},

    // RDP
    {Name: "rdp_screenshot", Description: "Capture the current RDP desktop screen as an image",
     InputSchema: {sessionId: string}},
    {Name: "rdp_click", Description: "Click at screen coordinates in an RDP session",
     InputSchema: {sessionId: string, x: int, y: int, button: string (default "left")}},
    {Name: "rdp_doubleclick", Description: "Double-click at coordinates",
     InputSchema: {sessionId: string, x: int, y: int}},
    {Name: "rdp_type", Description: "Type a text string into the RDP session",
     InputSchema: {sessionId: string, text: string}},
    {Name: "rdp_keypress", Description: "Press a key or key combination (e.g. 'enter', 'ctrl+c', 'alt+tab', 'win+r')",
     InputSchema: {sessionId: string, key: string}},
    {Name: "rdp_scroll", Description: "Scroll the mouse wheel at coordinates",
     InputSchema: {sessionId: string, x: int, y: int, delta: int}},
    {Name: "rdp_move", Description: "Move the mouse cursor to coordinates",
     InputSchema: {sessionId: string, x: int, y: int}},

    // Session Management
    {Name: "list_sessions", Description: "List all active SSH and RDP sessions",
     InputSchema: {}},
    {Name: "connect_ssh", Description: "Open a new SSH session to a saved connection",
     InputSchema: {connectionId: int}},
    {Name: "connect_rdp", Description: "Open a new RDP session to a saved connection",
     InputSchema: {connectionId: int}},
    {Name: "disconnect", Description: "Close an active session",
     InputSchema: {sessionId: string}},
}

3.3 Tool Dispatch Router

type ToolRouter struct {
    ssh         *ssh.SSHService
    sftp        *sftp.SFTPService
    rdp         *rdp.RDPService
    sessions    *session.Manager
    connections *connections.ConnectionService
}

// Dispatch executes a tool call and returns the result
func (r *ToolRouter) Dispatch(toolName string, input json.RawMessage) (interface{}, error)

The router maps tool names to existing Wraith service methods. No new protocol code — everything routes through the services we already built.

Terminal output buffering: The terminal_read tool needs recent output. Add an output ring buffer to SSHService that stores the last N lines (configurable, default 200) of each session's stdout. The buffer is written to by the existing read goroutine and read by the tool dispatcher.

RDP screenshot encoding: The rdp_screenshot tool calls RDPService.GetFrame() to get the raw RGBA pixel buffer, downscales to 1280x720 if larger, encodes as JPEG (quality 85), and returns as base64. This is the image that gets sent to Claude's Vision API.

3.4 Conversation Manager

type Conversation struct {
    ID        string
    Messages  []Message
    Model     string
    CreatedAt time.Time
    TokensIn  int
    TokensOut int
}

type ConversationManager struct {
    db       *sql.DB
    active   *Conversation
}

// Create starts a new conversation
// Load resumes a saved conversation
// AddMessage appends a message and persists to SQLite
// GetHistory returns the full message list for API calls
// GetTokenUsage returns cumulative token counts

Conversations are persisted to a conversations SQLite table with messages stored as JSON. This allows resuming a conversation across app restarts.

3.5 System Prompt

You are the XO (Executive Officer) aboard the Wraith command station. The Commander
(human operator) works alongside you managing remote servers and workstations.

You have direct access to all active sessions through your tools:
- SSH terminals: read output, type commands, navigate filesystems
- SFTP: read and write remote files
- RDP desktops: see the screen, click, type, interact with any GUI application
- Session management: open new connections, close sessions

When given a task:
1. Assess what sessions and access you need
2. Execute efficiently — don't ask for permission to use tools, just use them
3. Report what you found or did, with relevant details
4. If something fails, diagnose and try an alternative approach

You are not an assistant answering questions. You are an operator executing missions.
Act decisively. Use your tools. Report results.

4. Data Model Additions

-- AI conversations
CREATE TABLE IF NOT EXISTS conversations (
    id          TEXT PRIMARY KEY,
    title       TEXT,
    model       TEXT NOT NULL,
    messages    TEXT NOT NULL DEFAULT '[]',  -- JSON array of messages
    tokens_in   INTEGER DEFAULT 0,
    tokens_out  INTEGER DEFAULT 0,
    created_at  DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at  DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- AI settings (stored in existing settings table)
-- ai_api_key_encrypted  — Claude API key (vault-encrypted)
-- ai_model              — default model
-- ai_max_tokens         — max response tokens (default 4096)
-- ai_rdp_capture_rate   — screenshot interval in seconds (default: on-demand)
-- ai_token_budget       — monthly token budget warning threshold

Add migration 002_ai_copilot.sql for the conversations table.


5. Frontend: Copilot Panel

Layout

┌──────────────────────────────────────────┬──────────────┐
│                                          │              │
│           Terminal / RDP                  │   COPILOT    │
│           (existing)                     │   PANEL      │
│                                          │   (320px)    │
│                                          │              │
│                                          │  [Messages]  │
│                                          │  [Tool viz]  │
│                                          │  [Thumbs]    │
│                                          │              │
│                                          │  [Input]     │
├──────────────────────────────────────────┴──────────────┤
│  Status bar                                              │
└──────────────────────────────────────────────────────────┘

The copilot panel is a right-side collapsible panel (320px default width, resizable). Toggle via toolbar button or Ctrl+Shift+K.

Components

CopilotPanel.vue — main container:

  • Header: "XO" label, model selector dropdown, token counter, close button
  • Message list: scrollable, auto-scroll on new messages
  • Tool call cards: collapsible, show tool name + input + result
  • RDP screenshots: inline thumbnails (click to expand)
  • Input area: textarea with send button, Shift+Enter for newlines, Enter to send

CopilotMessage.vue — single message:

  • Commander messages: right-aligned, blue accent
  • XO messages: left-aligned, markdown rendered (code blocks, lists, etc.)
  • Tool use blocks: collapsible card showing tool name, input params, result

CopilotToolViz.vue — tool call visualization:

  • Icon per tool type (terminal icon, folder icon, monitor icon, etc.)
  • Summary line: "Typed ls -la in Asgard (SSH)", "Screenshot from DC01 (RDP)"
  • Expandable detail showing raw input/output

CopilotSettings.vue — configuration modal:

  • API key input (stored encrypted in vault)
  • Model selector
  • Token budget threshold
  • RDP capture settings
  • Conversation history management

Streaming

Claude API responses stream token-by-token:

Go: Claude API (SSE) → parse events → Wails events
Frontend: listen for Wails events → append to message → re-render

Events:

  • ai:text:{conversationId} — text delta (append to current message)
  • ai:tool_use:{conversationId} — tool call started (show pending card)
  • ai:tool_result:{conversationId} — tool call completed (update card)
  • ai:done:{conversationId} — response complete
  • ai:error:{conversationId} — error occurred

6. Go Backend Structure

internal/
  ai/
    service.go              # AIService — orchestrates everything
    service_test.go
    oauth.go                # OAuth PKCE flow — authorize, callback, token exchange, refresh
    oauth_test.go
    client.go               # ClaudeClient — HTTP + SSE to Anthropic API
    client_test.go
    tools.go                # Tool definitions (JSON schema)
    router.go               # ToolRouter — dispatches tool calls to Wraith services
    router_test.go
    conversation.go         # ConversationManager — persistence + history
    conversation_test.go
    types.go                # Message, Tool, StreamEvent types
    screenshot.go           # RDP frame → JPEG encode + downscale
    screenshot_test.go
    terminal_buffer.go      # Ring buffer for terminal output history
    terminal_buffer_test.go
  db/
    migrations/
      002_ai_copilot.sql    # conversations table

7. Frontend Structure

frontend/src/
  components/
    copilot/
      CopilotPanel.vue       # Main panel container
      CopilotMessage.vue      # Single message (commander or XO)
      CopilotToolViz.vue      # Tool call visualization card
      CopilotSettings.vue     # API key, model, budget configuration
  composables/
    useCopilot.ts             # AI service wrappers, streaming, state
  stores/
    copilot.store.ts          # Conversation state, messages, streaming

8. Implementation Phases

Phase Deliverables
A: Core AI service, Claude API client (HTTP + SSE streaming), tool definitions, tool router, conversation manager, SQLite migration, terminal output buffer
B: Terminal Tools Wire terminal_write/read/cwd + sftp_* tools to existing services, test with real SSH sessions
C: RDP Vision Screenshot capture (RGBA → JPEG → base64), rdp_screenshot tool, vision in API calls
D: RDP Input rdp_click/type/keypress/move/scroll tools, coordinate mapping, key combo parsing
E: Frontend CopilotPanel, message streaming, tool visualization, settings, conversation persistence

9. Key Implementation Details

Terminal Output Buffer

type TerminalBuffer struct {
    lines []string
    mu    sync.RWMutex
    max   int // default 200 lines
}

func (b *TerminalBuffer) Write(data []byte)      // append, split on newlines
func (b *TerminalBuffer) ReadLast(n int) []string // return last N lines
func (b *TerminalBuffer) ReadAll() []string

Added to SSHService — the existing read goroutine writes to both the Wails event (for xterm.js) AND the buffer (for AI reads).

RDP Screenshot Pipeline

RDPService.GetFrame()           → raw RGBA []byte (1920×1080×4 = ~8MB)
    ↓
image.NewRGBA() + copy          → Go image.Image
    ↓
imaging.Resize(1280, 720)       → downscaled for token efficiency
    ↓
jpeg.Encode(quality=85)         → JPEG []byte (~100-200KB)
    ↓
base64.StdEncoding.Encode()     → base64 string (~150-270KB)
    ↓
Claude API image content block  → Vision input

One screenshot ≈ ~1,500 tokens. At on-demand capture (not continuous), this is manageable.

Key Combo Parsing

For rdp_keypress, parse key combo strings into FreeRDP input sequences:

"enter"     → scancode 0x1C down, 0x1C up
"ctrl+c"    → Ctrl down, C down, C up, Ctrl up
"alt+tab"   → Alt down, Tab down, Tab up, Alt up
"win+r"     → Win down, R down, R up, Win up
"ctrl+alt+delete" → special handling (Ctrl+Alt+Del)

Map key names to scancodes using the existing input.go scancode table.

Token Budget

Track cumulative token usage per day/month. When approaching the configured budget threshold:

  • Show warning in the copilot panel header
  • Log a warning
  • Don't hard-block — the Commander decides whether to continue

10. Security Considerations

  • OAuth tokens stored in vault (same AES-256-GCM encryption as SSH keys). Access token + refresh token both encrypted at rest.
  • Tokens never logged — mask in all log output. Only log token expiry times and auth status.
  • Token refresh is automatic and silent — no user interaction needed after initial login. Refresh token rotation handled properly (new refresh token replaces old).
  • Independent from Claude Code — Wraith has its own OAuth session. No shared credential files, no race conditions with other Anthropic apps.
  • Fallback API key also stored in vault if used instead of OAuth.
  • Conversation content may contain sensitive data (terminal output, file contents, screenshots of desktops). Stored in SQLite alongside other encrypted data. Consider encrypting the messages JSON blob with the vault key.
  • Tool access is unrestricted — the XO has the same access as the Commander. This is by design. The human is always watching and can take control.
  • No autonomous session creation without Commander context — the XO can open sessions, but the connections (with credentials) were set up by the Commander
  • PKCE prevents token interception — authorization code flow with S256 challenge ensures the code can only be exchanged by the app that initiated the flow