~/posts/2026-03-02_building-gigaboy-autonomous-software-engineering-agent.md
$

cat 2026-03-02_building-gigaboy-autonomous-software-engineering-agent.md

📅

2026-03-02

Your issue tracker has 40 tickets in the backlog. Twelve of them are well-defined, self-contained, and have been sitting there for three sprints because nobody has the bandwidth to pick them up. You know exactly what needs to happen — you've even written the acceptance criteria — but the work never starts.

Gigaboy is an attempt to close that gap: an orchestrator that watches your Linear workspace, picks up tickets when they move to "Todo", and drives them to a merged PR with no human in the loop unless the agent gets stuck.


The Problem

Copilots and chat-based coding assistants are useful for in-editor autocomplete and short Q&A loops, but they still require a human to drive every step: open the file, paste context, review the output, commit, push, open the PR. The feedback cycle is faster, but the human bottleneck is the same.

What we actually want is an agent that can own a task end-to-end:

  1. Read the ticket and understand what needs to change
  2. Explore the repository, form a plan
  3. Make the code changes on a branch
  4. Open a pull request and report back
  5. Respond to review comments and iterate
  6. Merge when approved

And do all of that without requiring a developer to babysit the process — while still letting humans stay in the loop when the agent genuinely doesn't know what to do.


Prerequisites

  • Familiarity with Go (the codebase is Go 1.22 throughout)
  • Basic understanding of Linear, GitHub, and Telegram APIs
  • Redis and PostgreSQL familiarity (the event bus and store)
  • Some exposure to LLM tool-use / function-calling APIs

Technical Decisions

Linear as the command interface, not a custom dashboard

The most consequential design choice was using Linear as the primary UI.

The alternative was a separate web dashboard with agent controls. That would have taken weeks to build and introduced yet another tool for engineering teams to adopt. Linear is already where engineering teams manage their work. Issues have descriptions, labels, comments, and a defined state machine. All the inputs and outputs an agent needs are already there.

The tradeoff: the agent's behavior is constrained to what Linear's model expresses. You can't do rich interactive UIs. You can't show diffs inline. But for the vast majority of asynchronous delegation workflows, Linear's comment thread is exactly the right surface.

Concretely: when an issue moves to "Todo", the orchestrator picks it up. When the agent needs clarification, it posts a comment and marks the ticket "Blocked". When a human replies, the agent resumes. When the PR is ready, a /merge comment in Linear triggers the merge. No dashboard. No custom CLI.

Redis Streams for webhook fan-out

Webhooks arrive from three sources (Linear, GitHub, and Telegram) and need to be routed to the orchestrator for processing. The naive approach is to call the orchestrator handler directly from the webhook HTTP handler. This works until you need at-least-once delivery, retry on failure, or the ability to replay events after a crash.

Redis Streams give us all three. The gateway publishes events to three named streams (gigaboy:stream:linear, gigaboy:stream:github, gigaboy:stream:telegram). The orchestrator consumes them via a consumer group. If processing fails, the message stays in the pending list and gets retried. On restart, pending messages from prior runs are replayed before new ones are consumed.

// gateway.go — publish an event and return immediately
raw, _ := json.Marshal(payload)
evt := events.Event{
    Type:    fmt.Sprintf("linear.%s.%s", payload.Type, payload.Action),
    Payload: raw,
}
g.bus.Publish(c.Request.Context(), events.StreamLinear, evt)
c.JSON(http.StatusOK, gin.H{"ok": true})

The webhook handler returns 200 in milliseconds regardless of how long processing takes. Linear doesn't time out, doesn't retry unnecessarily, and the orchestrator works through events at its own pace.

asynq for the agent work queue

Spawning a long-running LLM tool-use loop inline in the orchestrator's event handler would block the consumer goroutine for minutes. We need to hand off agent execution to a separate worker.

asynq (backed by Redis) handles this. The orchestrator enqueues a typed task payload; the worker picks it up in a separate goroutine pool and runs the agent. asynq also gives us deduplication, retry scheduling, and a task inspector without building any of that ourselves.

The ResumeKind field on the task payload lets the worker distinguish between a fresh start, a clarification reply, a change-request resume, a merge request, and a crash recovery:

type AgentTaskPayload struct {
    SessionID   string
    WorkspaceID string
    ProjectID   string
    IssueID     string
    ResumeKind  ResumeKind // New | Clarification | Merge | Recover
}

Claude tool-use API, not a subprocess

Early versions considered shelling out to the claude CLI as a subprocess. The appeal: you get streaming output, process isolation, and the full tool-execution environment that the Claude CLI provides.

The problem: subprocess invocation makes it hard to inject per-session context, capture structured results, enforce token limits, and intercept tool calls. The tool-use API gives the agent the same capabilities but with the orchestrator fully in control of the message loop.

The agent loop is a straightforward for i < maxToolIterations cycle: send messages to Claude, execute any tool calls, append results, repeat until end_turn or a terminal tool result (CLARIFICATION_REQUESTED: or LEARNING_SUBMITTED:).

// executor.go — the core loop
for i := 0; i < maxToolIterations; i++ {
    resp, err := a.deps.Anthropic.Messages.New(ctx, anthropic.MessageNewParams{
        Model:     anthropic.F(claudeModel),     // claude-opus-4-6
        MaxTokens: anthropic.F(int64(8192)),
        System:    anthropic.F([]anthropic.TextBlockParam{...}),
        Tools:     anthropic.F(a.tools.AnthropicTools()),
        Messages:  anthropic.F(messages),
    })
    // ... execute tool calls, append results ...
}

A cap of 50 iterations prevents runaway loops. The agent terminates cleanly when it hits end_turn or calls one of the terminal tools.

pgvector for cross-session learning

Each agent session ends with a finish_learning tool call. The agent summarises what it learned about the codebase — conventions, patterns, known pitfalls — and submits them as structured entries with categories (e.g., "testing", "database", "api").

These entries are embedded with OpenAI's text-embedding-ada-002 and stored in PostgreSQL via pgvector. The next time an agent works in the same project, a similarity search over its issue title and description retrieves the top-10 most relevant chunks and injects them into the system prompt.

// context/manager.go — retrieve relevant context
func (m *Manager) GetContext(ctx context.Context, projectID uuid.UUID, query string) ([]*db.ContextChunkResult, error) {
    embedding, err := m.embed(ctx, query)
    // ...
    return m.queries.SearchContextChunks(ctx, projectID, 10, pgvector.NewVector(embedding))
}

Over time the agent accumulates a project-specific memory: which files are important, what test patterns the team uses, what caused past bugs. This is not a RAG system over the full codebase — the codebase itself is accessed via the GitHub API at read time. The vector store holds only synthesised lessons from prior runs.

AES-256-GCM credential storage

Every workspace stores API keys (Linear, GitHub, Telegram bot token) encrypted at rest. The encryption key is a 32-byte AES-256-GCM key supplied as a 64-character hex env var (ENCRYPTION_KEY). Credentials are encrypted on write (at onboarding) and decrypted on demand before each API call.

This is a deliberate trade-off. Storing credentials in Postgres (vs. AWS Secrets Manager or HashiCorp Vault) keeps the deployment simple and self-contained. The operative risk model is: if the database is compromised without the encryption key, the credentials are opaque ciphertext. If both are compromised, nothing protects them — but that's a property of any key-material-in-env-var approach.


Implementation

Phase 1: Webhook gateway and event fan-out

The Gin HTTP server exposes five routes:

GET  /health
POST /onboarding
POST /webhooks/linear
POST /webhooks/github
POST /webhooks/telegram

Linear and GitHub webhooks are validated with HMAC-SHA256 signatures before the body is parsed. The middleware reads and buffers the body (since io.ReadAll consumes it), validates the MAC, and replaces c.Request.Body with a re-readable buffer:

body, _ := io.ReadAll(c.Request.Body)
c.Request.Body = io.NopCloser(bytes.NewReader(body)) // restore a re-readable body

sig := c.GetHeader("Linear-Signature")
if !validateHMAC(body, sig, g.cfg.LinearWebhookSecret) {
    c.AbortWithStatusJSON(http.StatusUnauthorized, ...)
}

Validated payloads are serialised to JSON and published to the appropriate Redis Stream. The webhook handler returns immediately — event processing is fully decoupled.

Phase 2: Orchestrator FSM

The orchestrator consumes events from all three streams in separate goroutines. Linear events drive the main agent lifecycle:

| Linear event                            | Action                                                  |
| --------------------------------------- | ------------------------------------------------------- |
| Issue.update → state type unstarted     | Spawn agent session                                     |
| Issue.update → state type cancelled     | Mark session FAILED                                     |
| Comment.create                          | Resume session (if awaiting clarification or PR review) |
| Project.create/update                   | Auto-register project + seed GitHub repo                |

The session state machine is:

INITIALIZING → CODING → PR_OPEN → AWAITING_MERGE → LEARNING → COMPLETED
                              ↓
                   NEEDS_CLARIFICATION (parked, waiting for comment)
                              ↓
                           FAILED

One critical invariant: one active session per issue. If a ticket is moved to Todo while a session is already running, the new event is skipped. If the ticket is moved to Todo after a previous session completed, the old session is reset to INITIALIZING and re-used (preserving branch name and PR metadata). This prevents the branch proliferation that happens when agents keep creating new branches for the same issue.

// Check for an active session first
if existing, err := sm.queries.GetActiveSessionByIssue(ctx, issue.ID); err == nil {
    log.Printf("[orchestrator] active session %s already exists — skipping", existing.ID)
    return nil
}
// Reuse completed session or create new one
session, err := sm.queries.GetLatestSessionByIssue(ctx, issue.ID)
if err == nil {
    session, err = sm.queries.ResetSessionForRetry(ctx, session.ID)
} else {
    session, err = sm.queries.CreateAgentSession(ctx, ...)
}

Phase 3: Agent tool-use loop

Each session runs a GeneralAgent that:

  1. Fetches the issue and comments from Linear (filtering out its own progress comments)
  2. Runs a pgvector similarity search for relevant prior context
  3. Fetches the Linear project description to use as project-specific agent rules
  4. Builds a system prompt combining issue, context, and rules
  5. Enters the tool-use loop

The 11 tools available to the agent:

| Tool                | Purpose                                               |
| ------------------- | ----------------------------------------------------- |
| read_file           | Read a file from GitHub at a given ref                |
| write_file          | Create or update a file via GitHub Contents API       |
| delete_file         | Delete a file                                         |
| list_files          | List directory contents                               |
| create_branch       | Branch from default branch                            |
| create_pr           | Open a pull request                                   |
| get_ci_status       | Check GitHub Actions results for a ref                |
| post_comment        | Post a comment on the Linear issue                    |
| ask_clarification   | Park the session and post a question to Linear        |
| finish_learning     | Submit structured learnings and end the session       |
| create_subtask      | Create a child issue in Linear (used by PlannerAgent) |

The ask_clarification and finish_learning tools work via a sentinel return value: when the executor sees CLARIFICATION_REQUESTED: or LEARNING_SUBMITTED: in the tool result, it breaks out of the loop and handles the termination condition accordingly.

// tools.go — ask_clarification returns a sentinel, not a real result
func (t *AskClarificationTool) Execute(ctx context.Context, input json.RawMessage) (string, error) {
    // ...
    return fmt.Sprintf("CLARIFICATION_REQUESTED:%s", p.Question), nil
}
// executor.go — executor intercepts the sentinel
if strings.HasPrefix(result, "CLARIFICATION_REQUESTED:") || strings.HasPrefix(result, "LEARNING_SUBMITTED:") {
    return result, nil
}

Phase 4: Agent identities

Workspaces can assign different personas to different issues via Linear labels. A label of agent_backend routes the issue to the "backend" identity, which overrides the generic system prompt with a backend-specialist persona.

Three built-in identities are seeded on every workspace registration: backend, frontend, and infrastructure. Each has a tailored system prompt focused on the relevant concerns (e.g., the backend identity emphasises security, auth, and observability; the infrastructure identity emphasises Terraform, least-privilege, and cost).

Phase 5: Telegram control plane

Telegram serves as an out-of-band control and notification channel. The agent can notify the user when blocked on clarification, and the user can query agent status via bot commands:

  • /status — list all active sessions and their states
  • /repo <name> — show active sessions for a repo
  • /issue <identifier> — show the session for a specific Linear issue (e.g. /issue ENG-42)

Incoming Telegram messages arrive via webhook (auto-registered at onboarding), are published to gigaboy:stream:telegram, consumed by the orchestrator, and dispatched via tgControl.dispatch().


How It All Fits Together

Linear           GitHub            Telegram
   │                │                  │
   └──── HMAC ──────┴─── HMAC ─────────┘
              │
         Gin Gateway (port 8082)
              │
    Redis Streams (fan-out)
    ┌──────────────────────────┐
    │ gigaboy:stream:linear    │
    │ gigaboy:stream:github    │
    │ gigaboy:stream:telegram  │
    └──────────────────────────┘
              │
       Orchestrator FSM
              │
    ┌─────────┴──────────┐
    │  asynq task queue  │
    └─────────┬──────────┘
              │
         Worker pool
              │
       GeneralAgent
        ┌─────┴──────────────────────────────────┐
        │  Claude claude-opus-4-6 tool-use loop  │
        │  11 tools (GitHub API + Linear API)    │
        │  pgvector context retrieval            │
        └────────────────────────────────────────┘
              │
         PostgreSQL (sessions, issues, learning chunks)
         pgvector (context embeddings)

A ticket enters the system as a Linear webhook. It leaves as a merged PR with a session summary stored in the vector database for the next agent to benefit from.


Lessons Learned

The one-session-per-issue invariant is load-bearing. Early versions created a new session every time a ticket moved to Todo. This produced duplicate PRs, duplicate branches, and confusing Linear comment threads. The fix — reset and reuse — was simple but required adding GetLatestSessionByIssue and ResetSessionForRetry to the query layer.

System comments must be explicitly filtered. The agent posts progress updates to Linear. Without filtering, those comments were being picked up by the orchestrator as human instructions, causing the agent to resume itself in infinite loops. The solution is an isSystemComment() check on both the orchestrator side (before re-enqueuing) and the context builder side (before injecting comments into the system prompt).

HMAC validation should be middleware, not inline. The original implementation read the body inside the handler, validated the signature, then called ShouldBindJSON which also reads the body — and got EOF because the body was already consumed. Buffering the body in middleware and replacing c.Request.Body with a re-readable wrapper is the right pattern.

The pgvector context retrieval is non-fatal. Embedding API calls fail. The context retrieval is wrapped in a non-fatal path: if it errors, the agent runs without prior context. A session should not fail because a vector search timed out.

Stale session recovery is necessary. Workers can crash mid-execution. Sessions that are stuck in CODING or INITIALIZING with a stale heartbeat need to be re-enqueued. A startup goroutine (recoverStaleSessions) handles this. The one exception: PR_OPEN sessions are intentionally not recovered — they are waiting for a human /merge comment, not for the agent to do more work.


What's Next

The GitHub CI integration is partially wired: the orchestrator can receive check_run events but doesn't yet resolve which session owns a given check run. Closing that loop would let the agent self-correct when CI fails without any human intervention.

The PlannerAgent (which decomposes issues into Linear subtasks) exists but is not yet triggered by the orchestrator. Connecting it for large or ambiguous issues is a natural next step.

Conflict resolution on merge currently escalates to the user with instructions to resolve manually. A local git merge context (checked-out worktree) would let the agent handle conflicts programmatically.

