Autonomous CI Fix Agent Guide

Automatically Fix Common CI/CD Failures Without Human Intervention

Table of Contents

  1. Overview
  2. What It Does
  3. Supported Auto-Fixes
  4. Setup Instructions
  5. How It Works
  6. Enhancing with AI
  7. Customization
  8. Example: Complete AI-Powered Version
  9. Monitoring & Alerts
  10. Best Practices
  11. Troubleshooting
  12. Cost Considerations

Overview

An autonomous agent that monitors your GitHub repository, detects CI/CD failures, analyzes errors, and automatically fixes common issues without human intervention.

What It Does

Core Capabilities

  • Monitors: Watches GitHub Actions workflows for failures
  • Analyzes: Uses pattern matching and AI to understand errors
  • Fixes: Automatically applies fixes for common issues
  • Reports: Creates summaries and issues for complex problems

Key Benefits

  • No manual intervention needed for common errors
  • Faster CI/CD pipeline recovery
  • Reduced developer context switching
  • Consistent fix quality
  • 24/7 monitoring and fixing

Supported Auto-Fixes

1. NPM Lock File Sync Issues

Error: npm ci can only install packages when your package.json and package-lock.json are in sync

Common Cause: Someone updated package.json but forgot to commit the updated package-lock.json

Auto-Fix: Runs npm install to update lock file and commits the change

# The agent automatically runs:
npm install
git add package-lock.json
git commit -m "🤖 Auto-fix: Update package-lock.json to sync with package.json"
git push

2. Missing Dependencies

Error: Missing: [package] from lock file

Auto-Fix: Installs missing dependencies and updates lock file

# The agent automatically runs:
npm install [missing-package]
git add package-lock.json package.json
git commit -m "🤖 Auto-fix: Install missing dependencies"
git push

3. Other Errors

Error: Unknown or complex errors

Action: Creates a GitHub issue with error details for manual review

Smart Behavior: The agent only auto-fixes errors it's confident about. Complex or unknown errors are flagged for human review.

Setup Instructions

Step 1: Enable the Workflow

The workflow file is already created at .github/workflows/autonomous-ci-fix-agent.yml

Verify the File Exists

# Check if the workflow file exists
ls .github/workflows/autonomous-ci-fix-agent.yml

Step 2: Configure Permissions

The workflow needs these permissions (already configured):

  • contents: write - To commit fixes
  • pull-requests: write - To create PRs (if needed)
  • issues: write - To create issues for complex errors

Note: These permissions are already set in the workflow file. No action needed unless you want to modify them.

Step 3: Test the Agent

Option A: Manual Trigger

  1. Go to your GitHub repository
  2. Navigate to Actions tab
  3. Select Autonomous CI Fix Agent
  4. Click Run workflow

Option B: Automatic Trigger

The agent will trigger automatically when CI workflows fail. To test:

  1. Intentionally break the build in a way the agent recognizes (e.g., change a dependency version in package.json without updating package-lock.json)
  2. Push the change
  3. Wait for the workflow to fail
  4. The agent will automatically analyze and attempt to fix

How It Works

Workflow Triggers

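on:
  workflow_run:
    workflows: ["CI", "Tests", "Build"]
    types:
      - completed
  workflow_dispatch:

The agent runs when:

  • Any workflow named "CI", "Tests", or "Build" completes (the workflow_run trigger)
  • The completed run failed; a job-level condition on the run's conclusion skips successful runs
  • Someone triggers it manually from the Actions tab (the workflow_dispatch trigger)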

Customization: You can modify the workflows list to monitor different workflow names.

Error Analysis

The agent uses pattern matching to identify common errors:

# NPM lock file sync (grep needs the downloaded log file as input)
if grep -q "npm ci.*can only install packages" workflow_logs.txt; then
  fix_action="run_npm_install"
fi

# Missing dependencies
if grep -q "Missing:.*from lock file" workflow_logs.txt; then
  fix_action="run_npm_install"
fi

How It Works:

  1. Downloads the failed workflow's logs
  2. Searches for known error patterns
  3. Identifies the error type
  4. Determines the appropriate fix action
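
The four steps above can be sketched as a single shell function. This is a minimal sketch, not the agent's actual code: the function name classify_error and the create_issue fallback value are assumptions made for illustration, while the two grep patterns and the run_npm_install value come from the snippet above.

```shell
#!/usr/bin/env bash
# Sketch: map a downloaded CI log file to a fix action.
# classify_error and "create_issue" are hypothetical names.

classify_error() {
  local log_file="$1"

  # -q: no output, only an exit status; -s: stay silent if the file is missing.
  if grep -qs "npm ci.*can only install packages" "$log_file"; then
    echo "run_npm_install"
  elif grep -qs "Missing:.*from lock file" "$log_file"; then
    echo "run_npm_install"
  else
    # Unknown, complex, or unreadable logs go to a human instead of being auto-fixed.
    echo "create_issue"
  fi
}
```

A caller could store the result with fix_action=$(classify_error workflow_logs.txt) and branch on it in later steps.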

Auto-Fix Process

  1. Detect Error: Analyze workflow logs
  2. Identify Type: Match error patterns
  3. Apply Fix: Run appropriate fix command
  4. Commit: Automatically commit the fix
  5. Report: Create summary or issue

Example Fix Flow

  1. CI workflow fails with "npm ci can only install packages..."
  2. Agent detects the error pattern
  3. Agent runs npm install
  4. Agent checks if package-lock.json changed
  5. If changed, agent commits and pushes the fix
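  6. CI workflow runs again only if the push is able to trigger workflows: pushes made with the default GITHUB_TOKEN do not start new runs, so push with a personal access token or GitHub App token if you want an automatic re-run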
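
Steps 4 and 5 of this flow hinge on committing only when the lock file really changed. The sketch below shows one way to do that, assuming it runs inside the checked-out repository with a git identity already configured; the function name commit_lockfile_if_changed is hypothetical, and the commit message is the one shown earlier in this guide.

```shell
#!/usr/bin/env bash
# Sketch: commit package-lock.json only if it differs from the last commit.
# commit_lockfile_if_changed is a hypothetical helper name.

commit_lockfile_if_changed() {
  # git diff --quiet exits 0 when the tracked file has no uncommitted changes.
  if git diff --quiet -- package-lock.json; then
    echo "package-lock.json unchanged; nothing to commit"
    return 1
  fi

  git add package-lock.json
  git commit -q -m "🤖 Auto-fix: Update package-lock.json to sync with package.json"
}
```

In the workflow this would follow the install step, for example: npm install && commit_lockfile_if_changed && git push. Returning a non-zero status when nothing changed keeps the push from running needlessly.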

Enhancing with AI

Option 1: Use OpenAI API (More Intelligent)

Add this step to analyze errors with GPT:

- name: Analyze error with OpenAI
  id: ai_analysis
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  run: |
    # Build the request body with jq so quotes and newlines in the log
    # are escaped correctly, then send it to the API on stdin.
    RESPONSE=$(jq -Rs '{
        model: "gpt-4",
        messages: [
          {role: "system", content: "You are a CI/CD error analyzer. Analyze the error and suggest a fix."},
          {role: "user", content: ("Error: " + .)}
        ]
      }' workflow_logs.txt |
      curl -s https://api.openai.com/v1/chat/completions \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -H "Content-Type: application/json" \
        -d @-)

    # The response spans multiple lines, so use the delimiter form of GITHUB_OUTPUT
    {
      echo "analysis<<ANALYSIS_EOF"
      echo "$RESPONSE"
      echo "ANALYSIS_EOF"
    } >> "$GITHUB_OUTPUT"

Benefits: More intelligent error analysis, can handle complex errors, suggests better fixes

Cost: ~$0.01-0.10 per analysis

Option 2: Use Ollama (Free, Local)

If you have a self-hosted runner with Ollama:

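- name: Analyze with Ollama
  id: ollama_analysis
  run: |
    ERROR_LOG=$(cat workflow_logs.txt)

    ANALYSIS=$(ollama run llama3.2:3b "Analyze this CI error and suggest a fix: $ERROR_LOG")

    # The analysis spans multiple lines, so use the delimiter form of GITHUB_OUTPUT
    {
      echo "analysis<<ANALYSIS_EOF"
      echo "$ANALYSIS"
      echo "ANALYSIS_EOF"
    } >> "$GITHUB_OUTPUT"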

Benefits: Completely free, runs locally, no API costs, private

Requirement: Self-hosted GitHub Actions runner with Ollama installed

Option 3: Use GitHub Copilot API

- name: Analyze with GitHub Copilot
  uses: actions/github-script@v7
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    script: |
      // fs is not a global in github-script; load it explicitly
      const fs = require('fs');
      const errorLog = fs.readFileSync('workflow_logs.txt', 'utf8');
      // Use GitHub API to analyze
      // Implementation depends on Copilot API availability

Note: GitHub Copilot API integration may require additional setup. Check GitHub's documentation for current availability.

Customization

Add More Error Patterns

Edit the workflow to add more patterns:

- name: Analyze error with AI
  id: analyze
  run: |
    # Load the downloaded log so the patterns below have input
    ERROR_LOG=$(cat workflow_logs.txt)

    # Add your custom patterns
    if echo "$ERROR_LOG" | grep -q "Your custom error pattern"; then
      echo "error_type=custom_error" >> "$GITHUB_OUTPUT"
      echo "fix_action=custom_fix" >> "$GITHUB_OUTPUT"
    fi

Example: Add Python Dependency Error

# Detect: "ERROR: Could not find a version that satisfies the requirement"
if echo "$ERROR_LOG" | grep -q "Could not find a version"; then
  echo "error_type=python_dependency" >> $GITHUB_OUTPUT
  echo "fix_action=update_requirements" >> $GITHUB_OUTPUT
fi

Add More Auto-Fixes

Add new fix steps:

- name: Auto-fix custom error
  if: steps.analyze.outputs.error_type == 'custom_error'
  run: |
    # Your fix commands here
    npm run fix-custom-issue
    git add .
    git commit -m "🤖 Auto-fix: Custom error"
    git push

Example: Auto-fix Python Dependencies

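- name: Auto-fix Python dependencies
  if: steps.analyze.outputs.error_type == 'python_dependency'
  run: |
    pip install --upgrade pip
    # pip install never modifies requirements.txt, so the actual fix
    # (for example, relaxing the failing version pin) must edit the file here.
    pip install -r requirements.txt
    git add requirements.txt
    # Commit only if the file changed; committing with nothing staged fails the step
    git diff --cached --quiet || {
      git commit -m "🤖 Auto-fix: Update Python dependencies"
      git push
    }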

Monitor Different Workflows

Change which workflows trigger the agent:

on:
  workflow_run:
    workflows: ["Your-Workflow-Name", "Another-Workflow"]
    types:
      - completed

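Tip: To monitor more workflows, add their names to the workflows list. Before relying on a catch-all such as workflows: ["*"], verify against GitHub's documentation that workflow_run accepts it. If it does, the agent starts after every workflow completion (the failure check happens inside the job), so make sure it cannot react to its own runs.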

Example: Complete AI-Powered Version

Here's a more advanced version using OpenAI:

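name: AI-Powered CI Fix Agent

on:
  workflow_run:
    workflows: ["CI"]
    types:
      - completed

jobs:
  ai-fix:
    if: github.event.workflow_run.conclusion == 'failure'
    runs-on: ubuntu-latest
    permissions:
      actions: read          # needed to download the failed run's logs
      contents: write
      pull-requests: write

    steps:
      - uses: actions/checkout@v4
        with:
          # Check out the branch whose run failed, not the default branch
          ref: ${{ github.event.workflow_run.head_branch }}

      - name: Get error logs
        env:
          GH_TOKEN: ${{ github.token }}   # the gh CLI reads its token from GH_TOKEN
        run: |
          # Keep the log outside the workspace so "git add ." never commits it
          gh run view ${{ github.event.workflow_run.id }} --log-failed > "$RUNNER_TEMP/error.log"

      - name: AI Analysis
        id: ai
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          # jq builds the request body so quotes and newlines in the log are escaped
          jq -Rs '{
            model: "gpt-4",
            messages: [
              {role: "system", content: "Analyze CI errors and return JSON: {\"error_type\": \"...\", \"fix_commands\": [\"...\"], \"confidence\": 0.9}"},
              {role: "user", content: .}
            ]
          }' "$RUNNER_TEMP/error.log" |
            curl -s https://api.openai.com/v1/chat/completions \
              -H "Authorization: Bearer $OPENAI_API_KEY" \
              -H "Content-Type: application/json" \
              -d @- |
            jq -r '.choices[0].message.content' > "$RUNNER_TEMP/analysis.json"

          # Expose only the single-line error type as a step output;
          # it stays empty if the model did not return the requested JSON
          echo "error_type=$(jq -r '.error_type // empty' "$RUNNER_TEMP/analysis.json")" >> "$GITHUB_OUTPUT"

      - name: Apply AI-suggested fix
        if: steps.ai.outputs.error_type != ''
        env:
          # Passed through the environment to avoid injecting model output into the script
          ERROR_TYPE: ${{ steps.ai.outputs.error_type }}
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"

          # Read one command per line so commands containing spaces stay intact
          mapfile -t FIX_COMMANDS < <(jq -r '.fix_commands[]' "$RUNNER_TEMP/analysis.json")
          for cmd in "${FIX_COMMANDS[@]}"; do
            eval "$cmd"
          done

          git add .
          # Commit only when the fix changed something
          git diff --cached --quiet || {
            git commit -m "🤖 AI Auto-fix: $ERROR_TYPE"
            git push
          }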

Security Note: Be very careful when executing AI-suggested commands. Always review the commands before execution, or add a safety check to only execute commands from a whitelist.
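
The safety check mentioned in the note can be as simple as an exact-match comparison against a fixed list. The following is a minimal sketch under that assumption; ALLOWED_COMMANDS, is_allowed_command, and run_if_allowed are hypothetical names, and the listed commands are only examples to adapt to your project.

```shell
#!/usr/bin/env bash
# Sketch: run an AI-suggested command only if it exactly matches a trusted entry.
# All names and list entries below are illustrative assumptions.

ALLOWED_COMMANDS=(
  "npm install"
  "npm install --package-lock-only"
  "npm dedupe"
)

is_allowed_command() {
  local candidate="$1" allowed
  for allowed in "${ALLOWED_COMMANDS[@]}"; do
    # Exact string comparison: prefixes, extra arguments, and chained commands are rejected.
    if [ "$candidate" = "$allowed" ]; then
      return 0
    fi
  done
  return 1
}

run_if_allowed() {
  local cmd="$1"
  if is_allowed_command "$cmd"; then
    # Word splitting is acceptable here because the string equals a trusted literal.
    $cmd
  else
    echo "Skipping command not on the allowlist: $cmd" >&2
    return 1
  fi
}
```

In the fix step, calling run_if_allowed "$cmd" in place of eval "$cmd" would mean a suggestion such as "npm install; curl ... | sh" is skipped rather than executed.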

Monitoring & Alerts

Get Notifications

Add Slack/Discord notifications:

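- name: Notify on fix
  if: steps.analyze.outputs.error_type != 'no_logs'
  uses: slackapi/slack-github-action@v1
  with:
    payload: |
      {
        "text": "🤖 Auto-fixed CI error: ${{ steps.analyze.outputs.error_type }}"
      }
  env:
    # v1 of this action reads the webhook from the environment, not from an input
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
    SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK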

Email Notifications

- name: Send email notification
  uses: dawidd6/action-send-mail@v3
  with:
    server_address: smtp.gmail.com
    server_port: 465
    username: ${{ secrets.EMAIL_USERNAME }}
    password: ${{ secrets.EMAIL_PASSWORD }}
    subject: "CI Auto-Fix: ${{ steps.analyze.outputs.error_type }}"
    body: "The agent fixed: ${{ steps.analyze.outputs.error_type }}"
    to: your-email@example.com
    from: CI Fix Agent   # sender name; the action requires a from value

Best Practices

  1. Start Simple: Begin with pattern matching, add AI later
  2. Test Thoroughly: Test on non-critical branches first
  3. Monitor Results: Review auto-fixes to improve patterns
  4. Set Boundaries: Only auto-fix safe, common errors
  5. Document: Keep track of what the agent fixes

Safety Recommendations

  • Only auto-fix errors you're 100% confident about
  • Require manual approval for complex fixes
  • Set up alerts for all auto-fixes
  • Review agent actions regularly
  • Have a rollback plan

Troubleshooting

Agent Not Triggering

  • Check workflow names: Ensure they match exactly (case-sensitive)
  • Verify permissions: Check that permissions are set correctly
  • Check workflow_run event: It only fires when the agent's workflow file exists on the repository's default branch
  • Check Actions tab: Look for any error messages

Fixes Not Working

  • Review error logs: Check Actions logs for details
  • Check fix commands: Verify commands are correct
  • Verify git permissions: Ensure the agent can commit
  • Check branch protection: Some branches may prevent direct commits

Too Many Auto-Fixes

  • Add confidence thresholds: Only fix if confidence is high
  • Require manual approval: For certain fix types
  • Limit to specific error types: Only auto-fix known safe errors
  • Add rate limiting: Limit number of fixes per day
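
A confidence threshold and a daily limit can be combined into one gate that runs before any fix is applied. This is a minimal sketch: the function name should_autofix, the environment variables MIN_CONFIDENCE and MAX_FIXES_PER_DAY, and their defaults of 0.8 and 5 are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an auto-fix may proceed.
# should_autofix, MIN_CONFIDENCE, and MAX_FIXES_PER_DAY are hypothetical names.

should_autofix() {
  local confidence="$1"   # e.g. the "confidence" field from the AI analysis
  local fixes_today="$2"  # number of auto-fix commits already made today
  local min_confidence="${MIN_CONFIDENCE:-0.8}"
  local max_per_day="${MAX_FIXES_PER_DAY:-5}"

  # Bash cannot compare decimals, so awk does it; exit status 0 means "high enough".
  if ! awk -v c="$confidence" -v m="$min_confidence" 'BEGIN { exit !(c + 0 >= m + 0) }'; then
    return 1
  fi

  # Integer comparison for the daily limit.
  [ "$fixes_today" -lt "$max_per_day" ]
}
```

One way to obtain the second argument, assuming every fix commit message contains "Auto-fix", is to count matching commits since midnight with git log --since=midnight --grep="Auto-fix" --oneline | wc -l.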

Cost Considerations

Free Option (Current)

  • Uses GitHub Actions (free for public repos)
  • Pattern matching (no API costs)
  • Basic error detection

Perfect for: Most use cases, especially if you have a public repository

AI-Powered Option

Service          Cost per Analysis            Best For
OpenAI API       ~$0.01-0.10                  Complex error analysis
Ollama           Free                         Self-hosted runners
GitHub Copilot   Included with subscription   GitHub Enterprise users

Cost Estimate: If you have 10 CI failures per week and use the OpenAI API, that's approximately $0.10-1.00 per week, or roughly $5-52 per year (52 weeks).
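
To redo this estimate for your own failure rate, the arithmetic is failures per week × cost per analysis × 52. A small sketch, with the hypothetical function name estimate_yearly_cost:

```shell
#!/usr/bin/env bash
# Sketch: yearly API cost = failures per week * cost per analysis * 52 weeks.
# estimate_yearly_cost is a hypothetical helper name.

estimate_yearly_cost() {
  local failures_per_week="$1" cost_per_analysis="$2"
  # awk handles the decimal arithmetic that plain bash cannot.
  awk -v f="$failures_per_week" -v c="$cost_per_analysis" \
    'BEGIN { printf "%.2f\n", f * c * 52 }'
}
```

For the figures above, estimate_yearly_cost 10 0.01 gives the low end of the range and estimate_yearly_cost 10 0.10 gives the high end.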

Next Steps

  1. Enable the workflow in your repository
  2. Test it by triggering a known failure
  3. Monitor the first few auto-fixes
  4. Enhance with AI if needed
  5. Expand to more error types

Conclusion

This agent autonomously fixes CI failures, saving you time and keeping your builds green. Start with the basic pattern matching version, then enhance with AI as needed.

Remember:

  • Start simple: Pattern matching works for most common errors
  • Test first: Always test on non-critical branches
  • Monitor closely: Review agent actions regularly
  • Expand gradually: Add more error types over time