Building an AI Code Review Bot
Prerequisites
Before we start, you’ll need:
- GitHub account with repo access
- OpenAI API key (and the willingness to watch your credits evaporate)
- Node.js 18+
- Basic GitHub Actions knowledge
- A mass-produced motivational desk poster (“Teamwork Makes The Dream Work” or similar)
What We’re Building
A GitHub bot that automatically reviews pull requests. It catches the obvious stuff, enforces style, and gives your human reviewers more time for the things that actually matter. You know, architecture discussions, bikeshedding over variable names, the important work.
I’ve wanted something like this for ages. Every team I’ve worked on has the same problem: PRs sit in a queue while reviewers context-switch away from their own work, and by the time anyone looks, half the comments are “missing semicolon” or “unused import.” Tedious for everyone involved.
So let’s build a bot that handles the boring bits.
The Approach
- Create GitHub App
- Build webhook handler
- Parse PR diffs
- Generate reviews with LLM
- Post comments
Nothing too wild. The whole thing is surprisingly straightforward once you see it laid out.
Step 1: Create GitHub App
- Go to GitHub, then Settings, then Developer Settings, then GitHub Apps
- Create new app with permissions:
  - Pull requests: Read & Write
  - Contents: Read
- Subscribe to events: Pull request, Pull request review
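- Generate a private key and save it, and note the App ID and webhook secret (the code in Step 2 reads all three from environment variables)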
Keep that private key somewhere safe. Not in your repo. Not in a Slack message. You know who you are.
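One way to keep the key out of the repo is to load it from an environment variable at startup. Here’s a minimal sketch, assuming the key lives in GITHUB_PRIVATE_KEY (the variable the Step 2 code reads) and that your hosting dashboard may have flattened its newlines into literal \n sequences. loadPrivateKey is a hypothetical helper of mine, not part of Octokit.

```typescript
// Hypothetical helper: reads a PEM key from an environment-like object and
// restores newlines that were stored as the two characters "\n".
function loadPrivateKey(env: Record<string, string | undefined>): string {
  const raw = env.GITHUB_PRIVATE_KEY;
  if (!raw) {
    throw new Error('GITHUB_PRIVATE_KEY is not set');
  }
  // Turn escaped "\n" sequences back into real line breaks.
  const pem = raw.replace(/\\n/g, '\n').trim();
  if (!pem.startsWith('-----BEGIN')) {
    throw new Error('GITHUB_PRIVATE_KEY does not look like a PEM key');
  }
  return pem;
}
```

You would call it as loadPrivateKey(process.env) and pass the result wherever the private key is needed. Failing loudly at startup beats a confusing authentication error on the first webhook.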

Step 2: Set Up Project
mkdir ai-code-reviewer && cd ai-code-reviewer
npm init -y
npm install @octokit/app @octokit/webhooks @octokit/rest openai express
npm install --save-dev typescript @types/express @types/node
Standard fare. Octokit for GitHub, OpenAI for the brains, Express because it’s there and it works.
// src/index.ts
import express from 'express';
import { App } from '@octokit/app';
import { Octokit } from '@octokit/rest';
import { createNodeMiddleware } from '@octokit/webhooks';
import { handlePullRequest } from './handler';

// Pass the REST client so installation clients expose methods such as
// octokit.pulls.listFiles; the default core client does not have them.
export const app = new App({
  appId: process.env.GITHUB_APP_ID!,
  privateKey: process.env.GITHUB_PRIVATE_KEY!,
  webhooks: {
    secret: process.env.WEBHOOK_SECRET!,
  },
  Octokit,
});

app.webhooks.on('pull_request.opened', handlePullRequest);
app.webhooks.on('pull_request.synchronize', handlePullRequest);

const server = express();
// By default the middleware answers POST /api/github/webhooks.
server.use(createNodeMiddleware(app.webhooks));
server.listen(3000);
We’re listening for two events: when a PR is opened and when new commits are pushed to it. The synchronize event is the one people forget about. Without it, your bot only reviews the initial push and then goes silent while the author frantically force-pushes fixes.
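Those rapid force-pushes have a side effect: a review of an older commit can finish after a newer one has already arrived. Here’s a minimal sketch of a guard against that, assuming a single server process. HeadTracker and the "owner/repo#number" key format are my own inventions for illustration.

```typescript
// Hypothetical in-memory tracker: remembers the newest head SHA per PR so a
// slow review of an older commit can be dropped instead of posted.
class HeadTracker {
  private latest = new Map<string, string>();

  // Call when a webhook arrives; prKey is something like "owner/repo#42".
  record(prKey: string, headSha: string): void {
    this.latest.set(prKey, headSha);
  }

  // Call just before posting; false means a newer push has been recorded.
  isCurrent(prKey: string, headSha: string): boolean {
    return this.latest.get(prKey) === headSha;
  }
}

// Usage: the second push arrives while the first review is still running.
const tracker = new HeadTracker();
tracker.record('acme/api#42', 'aaa111');
tracker.record('acme/api#42', 'bbb222');
const firstReviewStillValid = tracker.isCurrent('acme/api#42', 'aaa111'); // false
```

Because the state lives in memory, it resets on restart and isn’t shared between instances. That’s fine for a single small deployment; anything bigger needs a shared store.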
Step 3: Fetch PR Diff
// src/github.ts
import { Octokit } from '@octokit/rest';

// Exported so the reviewer and cost helpers can share the type.
export interface FileDiff {
  filename: string;
  patch: string;
  additions: number;
  deletions: number;
}

export async function getPRDiff(
  octokit: Octokit,
  owner: string,
  repo: string,
  pullNumber: number
): Promise<FileDiff[]> {
  const { data: files } = await octokit.pulls.listFiles({
    owner,
    repo,
    pull_number: pullNumber,
    // The default page size is 30; PRs touching more than 100 files need pagination.
    per_page: 100,
  });
  return files
    .filter((file) => file.patch)
    .map((file) => ({
      filename: file.filename,
      patch: file.patch!,
      additions: file.additions,
      deletions: file.deletions,
    }));
}

export async function getFileContent(
  octokit: Octokit,
  owner: string,
  repo: string,
  path: string,
  ref: string
): Promise<string> {
  const { data } = await octokit.repos.getContent({
    owner,
    repo,
    path,
    ref,
  });
  // Directories come back as arrays and have no content field.
  if ('content' in data) {
    return Buffer.from(data.content, 'base64').toString('utf-8');
  }
  throw new Error('Not a file');
}
The .filter((file) => file.patch) is doing quiet but critical work here. Binary files don’t have patches, and sending a PNG diff to an LLM is a fantastic way to burn tokens on absolute nonsense.
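Each patch is a unified diff, and its hunk headers carry the line numbers you’ll want later when attaching comments to specific lines. Here’s a minimal sketch of extracting them. commentableLines is a hypothetical helper, and it only tracks the new side of the diff, ignoring removed lines.

```typescript
// Returns the new-file line numbers that appear in a unified diff patch
// (added lines and unchanged context lines).
function commentableLines(patch: string): Set<number> {
  const lines = new Set<number>();
  let newLine = 0;
  let inHunk = false;
  for (const row of patch.split('\n')) {
    // Hunk header, e.g. "@@ -10,2 +11,3 @@": capture the new-file start line.
    const header = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(row);
    if (header) {
      newLine = parseInt(header[1], 10);
      inHunk = true;
      continue;
    }
    if (!inHunk) continue;
    // '+' rows are additions, ' ' rows are context; both exist in the new file.
    if (row.startsWith('+') || row.startsWith(' ')) {
      lines.add(newLine);
      newLine++;
    }
    // '-' rows and "\ No newline at end of file" markers are skipped.
  }
  return lines;
}

// Usage: one hunk that replaces a line and adds another.
const samplePatch = '@@ -1,3 +1,4 @@\n line1\n-old\n+new\n+added\n line3';
const sampleLines = commentableLines(samplePatch); // contains 1, 2, 3, 4
```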
Step 4: Build the Reviewer
This is where it gets fun. We’re handing the diff to GPT-4o and asking it to be a code reviewer. The system prompt is doing most of the heavy lifting.
// src/reviewer.ts
import OpenAI from 'openai';
import type { FileDiff } from './github';

// The client reads OPENAI_API_KEY from the environment.
const openai = new OpenAI();

// Exported because post-review.ts imports it.
export interface ReviewComment {
  path: string;
  line: number;
  body: string;
  severity: 'error' | 'warning' | 'suggestion';
}

const SYSTEM_PROMPT = `You are a code reviewer. Review the following diff and identify issues.
For each issue found, respond with JSON:
{
  "comments": [
    {
      "path": "filename",
      "line": line_number,
      "body": "description of issue and suggestion",
      "severity": "error|warning|suggestion"
    }
  ]
}
Focus on:
- Bugs and logic errors
- Security vulnerabilities
- Performance issues
- Code style problems
- Missing error handling
Be specific. Include code examples for fixes.
If the code looks good, return an empty comments array.`;

export async function reviewDiff(files: FileDiff[]): Promise<ReviewComment[]> {
  // One fenced diff section per file, labelled with its filename.
  const diffText = files
    .map((f) => `### ${f.filename}\n\`\`\`diff\n${f.patch}\n\`\`\``)
    .join('\n\n');
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },
      { role: 'user', content: diffText },
    ],
    response_format: { type: 'json_object' },
    temperature: 0,
  });
  // content can be null, so treat a missing reply as an empty review.
  const content = response.choices[0]?.message.content;
  if (!content) return [];
  const result = JSON.parse(content);
  return result.comments || [];
}
Setting temperature: 0 is important. You want reviews that are as repeatable and pedantic as possible, not creative writing (even at zero, output can still vary a little between runs). Nobody needs their code reviewer having an artistic moment.
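Even in JSON mode, the model can return comments with missing fields or a severity you never asked for, so it’s worth validating the response before trusting it. Here’s a minimal sketch, assuming the response shape from the system prompt. parseReviewResponse and isComment are hypothetical helpers, not part of the OpenAI SDK.

```typescript
type Severity = 'error' | 'warning' | 'suggestion';

interface ParsedComment {
  path: string;
  line: number;
  body: string;
  severity: Severity;
}

const SEVERITIES: readonly string[] = ['error', 'warning', 'suggestion'];

// Type guard: checks one untrusted value against the shape the prompt asks for.
function isComment(value: unknown): value is ParsedComment {
  if (typeof value !== 'object' || value === null) return false;
  const c = value as Record<string, unknown>;
  return (
    typeof c.path === 'string' &&
    typeof c.line === 'number' && Number.isInteger(c.line) && c.line > 0 &&
    typeof c.body === 'string' && c.body.length > 0 &&
    typeof c.severity === 'string' && SEVERITIES.includes(c.severity)
  );
}

// Never throws: malformed output yields an empty list, and individual
// comments that fail validation are dropped.
function parseReviewResponse(raw: string | null): ParsedComment[] {
  if (!raw) return [];
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [];
  }
  const comments = (parsed as { comments?: unknown } | null)?.comments;
  return Array.isArray(comments) ? comments.filter(isComment) : [];
}
```

Dropping bad comments silently is a judgment call. You may prefer to log them so you can see how often the prompt misfires.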

Step 5: Post Review Comments
// src/post-review.ts
import { Octokit } from '@octokit/rest';
import { ReviewComment } from './reviewer';

export async function postReview(
  octokit: Octokit,
  owner: string,
  repo: string,
  pullNumber: number,
  comments: ReviewComment[],
  commitId: string
): Promise<void> {
  // Nothing to report: approve the exact commit that was reviewed.
  if (comments.length === 0) {
    await octokit.pulls.createReview({
      owner,
      repo,
      pull_number: pullNumber,
      commit_id: commitId,
      event: 'APPROVE',
      body: '✅ No issues found. LGTM!',
    });
    return;
  }

  // Only genuine errors escalate the review to a change request.
  const hasErrors = comments.some((c) => c.severity === 'error');
  await octokit.pulls.createReview({
    owner,
    repo,
    pull_number: pullNumber,
    commit_id: commitId,
    event: hasErrors ? 'REQUEST_CHANGES' : 'COMMENT',
    body: `AI Review found ${comments.length} issue(s)`,
    comments: comments.map((c) => ({
      path: c.path,
      line: c.line,
      body: formatComment(c),
    })),
  });
}

function formatComment(comment: ReviewComment): string {
  const icons = {
    error: '🔴',
    warning: '🟡',
    suggestion: '💡',
  };
  return `${icons[comment.severity]} **${comment.severity.toUpperCase()}**\n\n${comment.body}`;
}
The REQUEST_CHANGES vs COMMENT distinction matters. If the bot only finds suggestions and warnings, it comments but doesn’t block the merge. Actual errors? It requests changes. This way your team can still merge when the bot is being overly cautious, but genuine problems get flagged properly.
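One caveat on inline comments: as far as I know, GitHub rejects the whole review with a validation error if any comment points at a line that isn’t part of the diff, and model-generated line numbers aren’t always right. Here’s a minimal sketch of a safety net, assuming you already have the set of valid line numbers per file. partitionComments, renderOverflow, and the commentable map are hypothetical names.

```typescript
interface BotComment {
  path: string;
  line: number;
  body: string;
  severity: 'error' | 'warning' | 'suggestion';
}

// Splits comments into those that can be attached inline and those that
// cannot. `commentable` maps each file path to the line numbers present in
// its diff (an assumed input; build it from the patches however you like).
function partitionComments(
  comments: BotComment[],
  commentable: Map<string, Set<number>>
): { inline: BotComment[]; overflow: BotComment[] } {
  const inline: BotComment[] = [];
  const overflow: BotComment[] = [];
  for (const c of comments) {
    const lines = commentable.get(c.path);
    if (lines !== undefined && lines.has(c.line)) {
      inline.push(c);
    } else {
      overflow.push(c);
    }
  }
  return { inline, overflow };
}

// Renders the leftovers as a bullet list for the review's top-level body.
function renderOverflow(overflow: BotComment[]): string {
  return overflow
    .map((c) => `- ${c.path}:${c.line} (${c.severity}): ${c.body}`)
    .join('\n');
}
```

The inline half goes into the comments array as before, and the rendered overflow gets appended to the review body, so nothing the model found is lost.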
Step 6: Wire It Together
// src/handler.ts
import { EmitterWebhookEvent } from '@octokit/webhooks';
import { Octokit } from '@octokit/rest';
import { getPRDiff } from './github';
import { reviewDiff } from './reviewer';
import { postReview } from './post-review';

// @octokit/app attaches an installation-authenticated `octokit` to each
// webhook event. Typing it as the REST client assumes the App was created
// with the Octokit class from @octokit/rest.
export async function handlePullRequest(
  event: EmitterWebhookEvent<'pull_request'> & { octokit: Octokit }
): Promise<void> {
  const { octokit } = event;
  const { pull_request, repository, installation } = event.payload;
  if (!installation) return;

  console.log(`Reviewing PR #${pull_request.number}`);
  const files = await getPRDiff(
    octokit,
    repository.owner.login,
    repository.name,
    pull_request.number
  );

  // Only review source files the prompt is tuned for.
  const relevantFiles = files.filter(
    (f) =>
      f.filename.endsWith('.ts') ||
      f.filename.endsWith('.tsx') ||
      f.filename.endsWith('.js')
  );
  if (relevantFiles.length === 0) {
    console.log('No reviewable files');
    return;
  }

  const comments = await reviewDiff(relevantFiles);
  await postReview(
    octokit,
    repository.owner.login,
    repository.name,
    pull_request.number,
    comments,
    pull_request.head.sha
  );
  console.log(`Posted ${comments.length} comments`);
}
Notice we’re filtering to .ts, .tsx, and .js files only. You could expand this, but I’d recommend starting narrow. Reviewing markdown files or config YAML with an LLM is like hiring a Michelin-star chef to microwave your leftovers.
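If you do expand the list, it helps to pull the rule out of the handler so it’s easy to tweak. Here’s a minimal sketch, assuming you also want to skip generated and vendored files. isReviewable and the skip patterns are my own examples, not something the handler above already does.

```typescript
// Extensions the bot reviews (the same three as the handler).
const REVIEWABLE_EXTENSIONS = ['.ts', '.tsx', '.js'];

// Assumed examples of files that match an extension but aren't worth tokens.
const SKIP_PATTERNS = [
  /\.d\.ts$/, // type declarations
  /\.min\.js$/, // minified bundles
  /(^|\/)dist\//, // build output
  /(^|\/)node_modules\//, // vendored dependencies
];

function isReviewable(filename: string): boolean {
  if (SKIP_PATTERNS.some((pattern) => pattern.test(filename))) return false;
  return REVIEWABLE_EXTENSIONS.some((ext) => filename.endsWith(ext));
}

// Usage: filter a list of changed paths.
const changed = ['src/app.ts', 'dist/bundle.js', 'README.md', 'types/api.d.ts'];
const toReview = changed.filter(isReviewable); // only 'src/app.ts' remains
```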
Step 7: GitHub Actions Alternative
If running a dedicated app feels like overkill (and honestly, for smaller teams it probably is), you can do the whole thing as a GitHub Action instead:
# .github/workflows/ai-review.yml
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]
# The default token needs write access to pull requests to post a review.
permissions:
  contents: read
  pull-requests: write
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Get diff
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > diff.txt
      - name: Run AI review
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        # Assumes the reviewer from the earlier steps is packaged as a CLI called ai-reviewer.
        run: npx ai-reviewer review --diff diff.txt --pr ${{ github.event.pull_request.number }}
Same idea, less infrastructure. The tradeoff is you lose the real-time webhook feel and it runs on Actions minutes instead. Pick your poison.
Step 8: Rate Limiting and Costs
This bit isn’t glamorous but it’ll save you from a nasty surprise on your OpenAI bill.
// Illustrative USD rates per 1K tokens; check your model's current pricing.
const COST_PER_1K_INPUT = 0.01;
const COST_PER_1K_OUTPUT = 0.03;
const MAX_DIFF_LINES = 500;

// Rough estimate: about 4 characters per token, plus a fixed 500-token reply.
function estimateCost(diffText: string): number {
  const inputTokens = diffText.length / 4;
  const outputTokens = 500;
  return (inputTokens / 1000) * COST_PER_1K_INPUT +
    (outputTokens / 1000) * COST_PER_1K_OUTPUT;
}

// Skip PRs whose total changed lines exceed the cap (FileDiff is the type from src/github.ts).
function shouldReview(files: FileDiff[]): boolean {
  const totalLines = files.reduce((sum, f) => sum + f.additions + f.deletions, 0);
  return totalLines <= MAX_DIFF_LINES;
}
That 500-line cap is generous but sensible. Massive PRs should be broken up anyway. If someone opens a 2,000-line PR, the bot politely declining to review it might be the best feedback it could give.
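To make the bot actually decline politely, the two checks can be folded into a single decision with a reason attached. Here’s a minimal sketch, reusing the same illustrative rates and assuming you track the day’s spend somewhere. decideReview, the daily budget, and the wording of the messages are hypothetical.

```typescript
interface DiffStats {
  additions: number;
  deletions: number;
  patch: string;
}

const MAX_LINES = 500;
const INPUT_USD_PER_1K = 0.01; // illustrative rate, not current pricing
const OUTPUT_USD_PER_1K = 0.03; // illustrative rate, not current pricing
const ASSUMED_OUTPUT_TOKENS = 500;

type Decision =
  | { review: true; estimatedUsd: number }
  | { review: false; reason: string };

// Applies the line cap first, then checks the estimated cost against
// whatever is left of a daily budget.
function decideReview(
  files: DiffStats[],
  spentTodayUsd: number,
  dailyBudgetUsd: number
): Decision {
  const totalLines = files.reduce((sum, f) => sum + f.additions + f.deletions, 0);
  if (totalLines > MAX_LINES) {
    return {
      review: false,
      reason: `This PR changes ${totalLines} lines; the limit is ${MAX_LINES}. Consider splitting it.`,
    };
  }
  // Same rough heuristic as estimateCost: about 4 characters per token.
  const inputTokens = files.reduce((sum, f) => sum + f.patch.length, 0) / 4;
  const estimatedUsd =
    (inputTokens / 1000) * INPUT_USD_PER_1K +
    (ASSUMED_OUTPUT_TOKENS / 1000) * OUTPUT_USD_PER_1K;
  if (spentTodayUsd + estimatedUsd > dailyBudgetUsd) {
    return { review: false, reason: 'Daily review budget reached; skipping this PR.' };
  }
  return { review: true, estimatedUsd };
}
```

The reason string is what you’d post as a plain PR comment, so the author knows the bot skipped on purpose rather than falling over.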

The Result
What you end up with:
- Automatic first-pass code review on every PR
- Common issues caught before a human even looks
- Consistent feedback across the whole team
- Configurable severity levels so you can tune the noise
- Review capacity that scales with PR volume rather than reviewer headcount (your API bill scales with it too)

What I’d Do Differently
Add context about the codebase. Generic reviews miss project-specific patterns and conventions. Including your README, style guide, and architecture docs in the prompt makes a massive difference. Without that context, the bot is reviewing code in a vacuum, and its suggestions will feel generic because they are.
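Here’s a minimal sketch of how that context could be folded into the system prompt without blowing the token budget. buildSystemPrompt, ContextDoc, and the character cap are hypothetical, and loading the documents is left to you.

```typescript
interface ContextDoc {
  title: string;
  text: string;
}

// Appends project documents to the base prompt, in order, until the
// character budget is used up. Later documents are truncated or dropped.
function buildSystemPrompt(
  basePrompt: string,
  docs: ContextDoc[],
  maxContextChars: number
): string {
  let remaining = maxContextChars;
  const sections: string[] = [];
  for (const doc of docs) {
    if (remaining <= 0) break;
    const excerpt = doc.text.slice(0, remaining);
    sections.push(`## ${doc.title}\n${excerpt}`);
    remaining -= excerpt.length;
  }
  if (sections.length === 0) return basePrompt;
  return (
    `${basePrompt}\n\n` +
    'Project context (follow these conventions when reviewing):\n\n' +
    sections.join('\n\n')
  );
}
```

Order matters here: put the style guide first if that’s what you most want enforced, since whatever comes last is the first thing to get cut.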
The other thing I’d change is adding a feedback loop. Let reviewers react to bot comments with thumbs up or down, then use that signal to refine the prompt over time. A bot that learns what your team actually cares about is worth ten times more than one running a stock prompt.
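Here’s a minimal sketch of the bookkeeping side of that loop, assuming you’ve already collected the reactions and know which severity each bot comment had. tallyFeedback and the Feedback shape are hypothetical; '+1' and '-1' are the values GitHub uses for thumbs up and thumbs down.

```typescript
type Severity = 'error' | 'warning' | 'suggestion';

// One reviewer reaction on one bot comment.
interface Feedback {
  severity: Severity;
  reaction: '+1' | '-1';
}

interface Tally {
  up: number;
  down: number;
  approvalRate: number | null; // null when nobody reacted
}

const ALL_SEVERITIES: Severity[] = ['error', 'warning', 'suggestion'];

// Counts thumbs per severity and computes the share that were thumbs up.
function tallyFeedback(feedback: Feedback[]): Record<Severity, Tally> {
  const tally: Record<Severity, Tally> = {
    error: { up: 0, down: 0, approvalRate: null },
    warning: { up: 0, down: 0, approvalRate: null },
    suggestion: { up: 0, down: 0, approvalRate: null },
  };
  for (const f of feedback) {
    if (f.reaction === '+1') {
      tally[f.severity].up++;
    } else {
      tally[f.severity].down++;
    }
  }
  for (const severity of ALL_SEVERITIES) {
    const t = tally[severity];
    const total = t.up + t.down;
    t.approvalRate = total === 0 ? null : t.up / total;
  }
  return tally;
}
```

A low approval rate on suggestions, say, is a hint to tighten that part of the prompt or drop the category entirely.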
AI code review isn’t replacing humans. It’s handling the tedious bits so your reviewers can focus on design, architecture, and arguing about whether that variable should be called data or result.