Building your own multi-agent coding harness with GitHub actions

By Alexander Lontke

The Problem

As a developer these days, you often find yourself in a situation where you have a rough idea of what you need to implement and how: enough content for a typical GitHub issue, with a rough idea of what the solution is supposed to look like. The first step then always looks the same: you write down your ideas, paste them into your favorite AI tool, plan, and hit "Ready to implement".

I often found myself iterating on the solution afterwards, but these first steps were always the same: "define", "plan", and "implement first version", with little iteration on the initial plan. So I thought: why can't I skip these steps and start with the first version directly?

In the following sections I will show you a basic GitHub Actions workflow that automates the process of going from a GitHub issue to a PR using an agent. The section after that explains how to build an actual "Harness" on top of this workflow that can manage multiple agents, their history, and their plans and decisions.

Initial Setup

For the initial setup, we need two things:

  1. Instructions for the agent
  2. An automated way to trigger the agent

Agent Instructions

The agent instructions can be captured in the form of a system prompt file. It is important that you invest proper time into these instructions, as they can make or break the quality of your implementation. So watch the performance of your agent and, if necessary, iterate heavily on the system prompt.

# .github/prompts/automation-agent-system-prompt.md

You are an autonomous coding agent following these 5 steps to implement complex tasks on your own:
1. Plan
2. Criticize
3. ...
5. Create a PR

## 1. Plan
...

The same is true for supplementary files that typically aid your agents in understanding your project, like AGENTS.md, CLAUDE.md, or README.md.

Triggering your Agents

Now these instructions also need to be executed. For this I created a GitHub Actions workflow that is triggered when a certain label is added to an issue in the repository.

# .github/workflows/auto-code-issue.yml

on:
  issues:
    types: [labeled]

jobs:
  work-on-issues:
    # Run only when the "automate" label is applied
    # (expression string literals must use single quotes)
    if: github.event.label.name == 'automate'
    runs-on: ubuntu-latest
    timeout-minutes: 120
    permissions:
      contents: write
      pull-requests: write
      issues: write
    steps:
      - name: checkout
        uses: actions/checkout@v5

      - name: Create feature branch
        id: branch
        run: |
          BRANCH="${{ github.event.issue.number }}-automated"
          git checkout -b "$BRANCH"
          # Commit identity for the commits the agent creates
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          echo "branch=$BRANCH" >> "$GITHUB_OUTPUT"

      - name: Automate using Claude Code
        env:
          MODEL_API_KEY: ${{ secrets.MODEL_API_KEY }}
          GH_TOKEN: ${{ secrets.PROJECT_TOKEN }}
          # Pass untrusted issue text through env vars instead of ${{ }}
          # inside the script, so it cannot inject shell commands
          ISSUE_NUMBER: ${{ github.event.issue.number }}
          ISSUE_TITLE: ${{ github.event.issue.title }}
          ISSUE_BODY: ${{ github.event.issue.body }}
        run: |
          <coding-agent-cli-of-choice> \
            --model claude-opus-4-6 \
            --allowedTools "Bash,Read,Edit,Write,Grep,Glob,WebSearch,WebFetch" \
            --append-system-prompt-file .github/prompts/automation-agent-system-prompt.md \
            -p "You are a coding agent working on a GitHub issue (#${ISSUE_NUMBER}).

            Title: ${ISSUE_TITLE}

            Body:
            ${ISSUE_BODY}"

This could of course be extended with additional steps that move the current issue into "In Progress" on the project board, or similar enhancements. Additionally, you can try different agent setups, like "Ralph Wiggum" or "Generator vs. Evaluator".

While this setup is very simple, it is also very effective. It gives you a history through issues, commits, and PRs, a way to run multiple agents in parallel, and the option to orchestrate them with a planner agent that creates GitHub issues for you (with a human in the loop, HITL, if you set the trigger label yourself).
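A minimal sketch of that human-in-the-loop handoff, assuming the `gh` CLI is installed and authenticated through `GH_TOKEN`: the planner files issues without the `automate` label by default, so no agent starts until a human has reviewed the issue and added the label. The function name `queue_issue` and the `REQUIRE_HUMAN_APPROVAL` and `GH` variables are hypothetical; `GH` only exists so the call can be swapped for `echo` in a dry run.

```shell
# Hypothetical helper a planner agent could call to file a new task.
# By default (REQUIRE_HUMAN_APPROVAL=1) the issue is created WITHOUT the
# "automate" trigger label, so the workflow only starts once a human adds it.
# GH defaults to the real gh CLI; set GH=echo for a dry run.
queue_issue() {
  local title="$1" body="$2"
  if [ "${REQUIRE_HUMAN_APPROVAL:-1}" = 1 ]; then
    "${GH:-gh}" issue create --title "$title" --body "$body"
  else
    # Fully autonomous mode: label immediately, which triggers the workflow
    "${GH:-gh}" issue create --title "$title" --body "$body" --label automate
  fi
}
```

After reviewing such an issue, a human can trigger the agent from the web UI or with `gh issue edit <number> --add-label automate`.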

V2: GitHub Projects - The perfect Harness

As coding agents mature, "Harness Engineering" is gaining popularity. A harness, in essence, is a way for your agent to structure its workflow. In a sense, it's a way for us developers to push "Claude Code" beyond its capabilities and do Anthropic's job for them. ;)

In the last section we saw how to create a simple autonomous coding agent in a GitHub workflow. This section shows how to get real value out of it.

In particular we want to augment the agent with the following abilities:

  1. Triage, Design, Spec, Plan, and Evaluate Implementations
  2. Repository History & Tracing
  3. Creating Human Visibility: Issues, PRs, and Commits

We will dive deeper into how to implement these in the next subsections, but the general idea is to use GitHub Projects as a way to structure the workflow of our agents. The project board manages the different steps of the workflow, issues and PRs create visibility for humans, and repository files hold the history of the agents' work.

Triage, Design, Spec, Plan, and Evaluate Implementations

Initially, we need to define the overall behavior of our agents. For this we pre-define the steps that our agents go through when working on an issue. These steps could all be performed by a single agent, but it can be helpful to split them into multiple agents that are each specialized in one of these steps. This avoids problems like "context anxiety" and ideally leads to more consistent behavior.

In our case we will define multiple sub-agents to work on each issue. The following sub-agents are a good starting point for a typical software development workflow, but feel free to adapt them to your needs.

The Triage Agent

One problem with issues is that they vary in complexity. For simple issues like fixing a typo, we don't want to go through all these steps, but for more complex issues like "Implement a new feature" we want to make sure that we have a proper plan in place before we start implementing. So ideally we would have a way to decide which steps are necessary for which issue. This is the job of the triage agent: it decides which steps are necessary for an issue and assigns the issue to the corresponding agents.

Specifically we will have two triage agents:

  1. The "Simple Triage Agent" that decides if an issue is simple enough to skip creating a spec and plan and assigns it to the "Implementation Agent" directly.
  2. The "Complex Triage Agent" that decides if an issue is complex and assigns it to the "Design Agent" before following the spec, plan, and implement steps.

This means that working on an issue can take one of three paths:

  1. Tier 1: Simple Path: Implementation -> Evaluate -> Fix/Iterate -> Create PR
  2. Tier 2: Standard Path: Spec -> Plan -> Implementation -> Evaluate -> Fix/Iterate -> Create PR
  3. Tier 3: Complex Path: Design -> Spec -> Plan -> Implementation -> Evaluate -> Fix/Iterate -> Create PR
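The three paths above can be captured as a plain lookup from tier to the ordered list of agent steps. This is a sketch under assumptions: the step names and the `<step>-agent-system-prompt.md` naming (modeled on the triage prompt path used in the workflow) are hypothetical, and the fix step would only run when the evaluation asks for changes.

```shell
# Map a triage tier to the ordered agent steps it runs (hypothetical names).
# "fix" is only executed if the evaluation report requests changes.
steps_for_tier() {
  case "$1" in
    tier-1) echo "implement evaluate fix create-pr" ;;
    tier-2) echo "spec plan implement evaluate fix create-pr" ;;
    tier-3) echo "design spec plan implement evaluate fix create-pr" ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Assumed convention: one system prompt file per specialized agent.
prompt_file_for_step() {
  echo ".github/prompts/$1-agent-system-prompt.md"
}
```

A workflow step could iterate over `$(steps_for_tier "$TIER")` and invoke the coding-agent CLI once per step with `--append-system-prompt-file "$(prompt_file_for_step "$step")"`, which starts each specialized agent in a fresh context.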

This means both triage agents need to run at the beginning of the workflow. Based on their outcome, a later step in your GitHub Actions pipeline assigns the issue to the corresponding agent.

# .github/workflows/auto-code-issue.yml

- name: Triage
  id: triage
  env:
    MODEL_API_KEY: ${{ secrets.MODEL_API_KEY }}
    GH_TOKEN: ${{ secrets.PROJECT_TOKEN }}
  run: |
    RESULT=$(npx -y <coding-agent-cli-of-choice> \
        --append-system-prompt-file .github/prompts/triage-agent-system-prompt.md \
        -p "You are working on..."
    )

    # Keep the last "tier": "tier-N" value printed by the triage agent
    TIER=$(echo "$RESULT" | grep -o '"tier":\s*"tier-[0-9]"' | grep -o 'tier-[0-9]' | tail -1)

    echo "TIER=$TIER" >> "$GITHUB_OUTPUT"

# [...]

- name: Design
  if: steps.triage.outputs.TIER == 'tier-3'
  env:

Since the triage step is simple, you can pick a small model for it, like a Claude Haiku model or similar. Nonetheless, this step is crucial and you should invest proper time into its system prompt.
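The `grep` pipeline in the triage step depends on an output contract that the triage system prompt has to enforce, for example ending the answer with a line such as `{"tier": "tier-2"}`. A slightly more defensive sketch of the same extraction, assuming that contract, only accepts the three defined tiers and falls back to `tier-3` when nothing matches, so an unparseable answer sends the issue down the most thorough path instead of skipping design and planning. The function name `extract_tier` and the fallback choice are assumptions.

```shell
# Extract the last "tier": "tier-N" value (N = 1..3) from the triage output.
# Assumed fallback: if nothing matches, use tier-3 (the full path).
extract_tier() {
  local tier
  tier=$(printf '%s\n' "$1" \
    | grep -o '"tier":[[:space:]]*"tier-[1-3]"' \
    | grep -o 'tier-[1-3]' \
    | tail -n 1) || true   # no match must not abort a "set -e -o pipefail" step
  echo "${tier:-tier-3}"
}
```

Inside the triage step, the inline pipeline would then become `TIER=$(extract_tier "$RESULT")`.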

Hint: the following agents are best implemented using the "skills" pattern; this lets you integrate the same capabilities into your own human workflows as well.

The Designer Agent

This agent is responsible for creating high-level design documents, specifically "blueprints" and "architecture decision records" (ADRs). These do not yet include code specifics but rather outline the application's overall components (blueprints) as well as the conventions that need to be followed (ADRs).

The Spec Agent

The spec agent is used as an intermediate step between issue description and planning, since human issue descriptions are often vague and not structured in a way that is ideal for the planning agent. The spec agent creates a more structured and detailed specification based on the issue description that removes ambiguities. The spec is committed to the repository and can be used by the planning agent as a source of truth for the implementation. Additionally, the evaluation agent can use it to evaluate the implementation later on.

The Planning Agent

The planning agent then creates the actual implementation plan, exploring the code base, historical commits, and historical repository artifacts (more on this in the next section) to produce a plan that the implementation can follow. The plan is also committed to the repository.

The Evaluation Agent

The evaluation agent is our quality gate. It evaluates the implementation created by the implementation agent against the original issue description, the spec, and the plan. This can be done over multiple iterations until a certain quality threshold is reached. It creates a detailed evaluation report that the fix/iterate agent can use to improve the implementation. This step is inspired by the "Generator vs. Evaluator" pattern from this Anthropic blog post.

The Fix/Iterate Agent

This agent is triggered if the evaluation report indicates that changes are needed. It then fixes the implementation based on the evaluation agent's report; if necessary, this repeats over multiple iterations until the evaluation passes.
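The evaluate and fix/iterate steps form a loop that needs an explicit exit condition, otherwise a stubborn evaluator can burn through the job's 120-minute timeout. A minimal sketch, assuming a hypothetical contract in which `run_evaluator` writes its report to a file and prints a final `VERDICT: PASS` or `VERDICT: FAIL` line; `run_evaluator`, `run_fixer`, and the default cap of three evaluations are placeholders for your own agent invocations and threshold.

```shell
# Sketch of the evaluate -> fix/iterate loop with an iteration cap.
# run_evaluator and run_fixer are hypothetical hooks that would call your
# coding-agent CLI with the evaluation and fix/iterate system prompts.
# Assumed contract: run_evaluator writes its report to the file passed as $1
# and prints a line "VERDICT: PASS" or "VERDICT: FAIL".
evaluate_and_fix() {
  local max_iterations="${1:-3}" report="${2:-evaluation-report.md}"
  local i=1 verdict
  while :; do
    verdict=$(run_evaluator "$report" | grep -o 'VERDICT: [A-Z]*' | tail -n 1) || true
    if [ "$verdict" = "VERDICT: PASS" ]; then
      echo "passed after $i evaluation(s)"
      return 0
    fi
    # Stop before fixing again once the cap is reached
    if [ "$i" -ge "$max_iterations" ]; then
      break
    fi
    # Hand the report to the fix agent, then evaluate again
    run_fixer "$report"
    i=$((i + 1))
  done
  echo "quality threshold not reached after $max_iterations evaluation(s)" >&2
  return 1
}
```

If the loop gives up, the workflow could still open a draft PR or comment on the issue so a human can take over.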

Finally, we trigger one more agent that is responsible for creating the PR at the end of the workflow.

Following this pattern, we can create a GitHub Actions workflow that manages the entire lifecycle of an issue: creating a feature branch, then triage, design, spec, planning, implementation, evaluation, and iteration. As shown above, this can be triggered by assigning a certain label to an issue in your project board.

Repository History & Tracing

So far the agents have three types of context:

  1. Repository files discovered through exploration, like the code itself, README, AGENTS.md, or similar
  2. System prompts
  3. The issue description itself

This section explains how to set up additional repository files to maximize the context gained from point 1. This is especially important for the planning and evaluation agents, since they need a good understanding of the code base.

Specifically, we want to set up the following folder structure in our repository:

artifacts/
  issues/
    # contains the original issue description as a markdown file
    # e.g. 123-issue-title.md for issue #123
  specs/
    # contains the spec created by the spec agent for each issue
    # e.g. 123-spec-title.md for issue #123
  designs/
    # contains the design documents created by the design agent for each issue
    # e.g. 123-design-title.md for issue #123
  ADR/
    # contains the architecture decision records
    # these are the only files that are not related to a specific issue
    # e.g. 001-adr-title.md for ADR #1
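A small set of helpers can keep the naming in this layout consistent across agents. This is a sketch, not a fixed convention: the function names are hypothetical, titles are turned into lowercase hyphenated slugs, and ADR numbers are derived from the highest three-digit prefix already present in `artifacts/ADR/`.

```shell
# Hypothetical helpers for the artifacts/ layout.

# Turn a title into a file-name slug: "Add OAuth login!" -> "add-oauth-login"
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# artifact_path <issues|specs|designs> <issue-number> <title>
artifact_path() {
  echo "artifacts/$1/$2-$(slugify "$3").md"
}

# Next zero-padded ADR number, based on files like 001-adr-title.md
next_adr_number() {
  local dir="${1:-artifacts/ADR}" max=0 f n
  for f in "$dir"/[0-9][0-9][0-9]-*.md; do
    [ -e "$f" ] || continue   # the glob matched no file
    # "009" -> "9", so the number is not read as octal
    n=$(basename "$f" | cut -c1-3 | sed 's/^0*//')
    if [ "${n:-0}" -gt "$max" ]; then max=${n:-0}; fi
  done
  printf '%03d\n' $((max + 1))
}
```

For example, `artifact_path specs 123 "Add OAuth login!"` yields `artifacts/specs/123-add-oauth-login.md`.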

This allows the agents to have a structured way to access the history of their work and the work of other agents. For example, the planning agent can look at the spec created by the spec agent to create a more informed plan, while the evaluation agent can look at both the original issue description and the spec to create a more informed evaluation report.
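Because every issue-related file starts with the issue number followed by a hyphen, the workflow can gather everything recorded about one issue with a glob; the hyphen keeps `12-*.md` from matching `123-add-login.md`. The helper below is hypothetical and optional, since the agents can also explore these folders themselves; it simply bundles the issue, spec, and design files plus all ADRs into one text blob that could be appended to a prompt.

```shell
# Hypothetical helper: print every artifact for one issue, plus all ADRs,
# each preceded by a "--- <path> ---" header.
# Relies on the "<issue-number>-<title>.md" naming of the artifacts/ folders.
collect_issue_context() {
  local issue="$1" root="${2:-artifacts}" f
  for f in "$root"/issues/"$issue"-*.md "$root"/specs/"$issue"-*.md \
           "$root"/designs/"$issue"-*.md "$root"/ADR/*.md; do
    [ -e "$f" ] || continue   # skip patterns that matched no file
    printf '\n--- %s ---\n' "$f"
    cat "$f"
  done
}
```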

Creating Human Visibility: Issues, PRs, and Commits

Now the agents outlined above do a great job of creating a structured implementation, but we also need to make sure that we developers have visibility into what the agents are doing. To create this mental alignment, we luckily already have the perfect tools in GitHub: Issues, PRs, and Commits.

Through PRs we can evaluate the implementation created by the agents. It is therefore very important to create a detailed PR description that outlines the changes made by the agent and the reasoning behind them. The agent can write this description based on the original issue description, the spec, the plan, and the evaluation report. This gives us a clear mental model of why certain changes were made and how they relate to the original issue.
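A PR description that draws on the committed artifacts can be assembled mechanically before the agent refines it. A minimal sketch with hypothetical function names: it starts with `Closes #<issue>`, which makes GitHub close the issue when the PR is merged into the default branch, and appends the spec, plan, and evaluation report as sections, skipping any file that does not exist.

```shell
# Print "## <heading>" followed by the file's content, if the file exists.
pr_section() {
  [ -f "$2" ] || return 0
  printf '\n## %s\n\n' "$1"
  cat "$2"
}

# build_pr_body <issue-number> <spec-file> <plan-file> <report-file>
build_pr_body() {
  printf 'Closes #%s\n' "$1"
  pr_section "Spec" "$2"
  pr_section "Plan" "$3"
  pr_section "Evaluation report" "$4"
}
```

The result could be written to `pr-body.md` and passed to `gh pr create --title "<title>" --body-file pr-body.md`.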

Additionally, a follow-up task may be needed, either because it has to be completed at a later stage or because it requires human help, e.g., setting up a GitHub Actions variable. In this case the agent can create a follow-up issue linked to the original issue to create visibility for the human developers.
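Filing such a follow-up can be a single command for the agent. A sketch assuming the `gh` CLI: mentioning `#<parent>` in the body makes GitHub show a cross-reference on the original issue, so the link is visible from both sides. The `create_followup_issue` name, the `GH` dry-run override, and the `needs-human` label are hypothetical, and the label has to exist in the repository for `gh` to apply it.

```shell
# Hypothetical helper: file a follow-up issue that references the original.
# GH defaults to the gh CLI; set GH=echo for a dry run.
create_followup_issue() {
  local parent="$1" title="$2" details="$3"
  "${GH:-gh}" issue create \
    --title "$title" \
    --body "Follow-up to #$parent. $details" \
    --label needs-human
}
```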

Conclusion

In this blog post we have seen how to create a simple autonomous coding agent using GitHub Actions. We have also seen how to create a more structured workflow for our agents using GitHub Projects as a harness. This allows us to manage the entire lifecycle of an issue, from triage to design, spec, planning, implementation, evaluation, and iteration. Additionally, we have seen how to create visibility for human developers through issues, PRs, and commits.

By using GitHub as a harness for our agents, we can create a seamless integration between human and agent work, allowing us to leverage the strengths of both to further enhance our development process and productivity.