Intermediate

Step 2: Code Analysis with LLM

Design prompts that turn an LLM into an effective code reviewer, handle large diffs that exceed token limits, and parse structured JSON output for downstream processing.

The Prompt is Everything

The quality of your AI code review depends almost entirely on the prompt. A vague prompt produces vague results. A well-structured prompt with clear instructions, examples, and output format produces actionable, line-specific feedback.

Our prompt strategy has three components:

  1. System prompt — Defines the reviewer persona and rules
  2. Review instructions — What to look for and what to ignore
  3. Output format — Structured JSON so we can programmatically post comments

The System Prompt

Create src/prompts.js with the prompt templates:

// src/prompts.js

const SYSTEM_PROMPT = `You are an expert code reviewer. You review pull request
diffs and identify issues. You are thorough but not pedantic.

Rules:
- Only comment on ADDED or MODIFIED lines (lines starting with +)
- Never comment on removed lines or unchanged context lines
- Focus on: bugs, security vulnerabilities, performance issues, error handling
- Ignore: style preferences, formatting, minor naming opinions
- Be specific: reference the exact line number and explain WHY it is an issue
- Be constructive: suggest a fix for every issue you find
- Be concise: one issue per comment, no filler text
- Rate severity: "critical", "warning", or "suggestion"

If the code looks good and you have no issues to report, return an empty array.
Do NOT invent issues just to have something to say.`;

const REVIEW_PROMPT = `Review the following code diff. For each issue found,
return a JSON array of objects with this exact structure:

[
  {
    "file": "path/to/file.js",
    "line": 42,
    "severity": "critical" | "warning" | "suggestion",
    "title": "Brief issue title",
    "description": "What is wrong and why",
    "suggestion": "The suggested fix (code or explanation)"
  }
]

If no issues are found, return: []

IMPORTANT: Return ONLY the JSON array. No markdown, no explanation, no code fences.

Here is the diff to review:

`;

module.exports = { SYSTEM_PROMPT, REVIEW_PROMPT };
💡
Why this prompt works: We explicitly tell the model to only comment on added lines (preventing noise), define severity levels (enabling filtering), require structured JSON (enabling automation), and give it permission to return nothing (preventing hallucinated issues).
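Constraining severity to three known values is what makes downstream filtering trivial. A minimal sketch (the sample issues below are hypothetical, shaped like the analyzer's output):

```javascript
// Hypothetical parsed issues, in the shape our prompt asks the model to return
const issues = [
  { file: 'src/a.js', line: 3, severity: 'critical', title: 'SQL injection' },
  { file: 'src/b.js', line: 9, severity: 'suggestion', title: 'Rename variable' },
];

// Structured severity lets you gate a CI job on critical findings only
const critical = issues.filter(i => i.severity === 'critical');
console.log(`${critical.length} critical issue(s)`);
```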

The LLM Analyzer

Create src/analyzer.js to send diffs to the LLM and parse results:

// src/analyzer.js
const OpenAI = require('openai');
const { SYSTEM_PROMPT, REVIEW_PROMPT } = require('./prompts');
const { hunksToString } = require('./diff-parser');

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Token limit for the model (leave room for response)
const MAX_INPUT_TOKENS = 6000;
const APPROX_CHARS_PER_TOKEN = 4;
const MAX_INPUT_CHARS = MAX_INPUT_TOKENS * APPROX_CHARS_PER_TOKEN;

/**
 * Analyze an array of parsed file objects and return issues.
 */
async function analyzeCode(files) {
  // Chunk files to fit within token limits
  const chunks = chunkFiles(files);
  console.log(`Analyzing ${files.length} files in ${chunks.length} chunk(s)`);

  const allIssues = [];

  for (const chunk of chunks) {
    const issues = await analyzeChunk(chunk);
    allIssues.push(...issues);
  }

  return allIssues;
}

/**
 * Send a single chunk of file diffs to the LLM for review.
 */
async function analyzeChunk(files) {
  // Build the diff text for this chunk
  const diffText = files.map(f => hunksToString(f)).join('\n---\n');
  const userMessage = REVIEW_PROMPT + diffText;

  try {
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        { role: 'system', content: SYSTEM_PROMPT },
        { role: 'user', content: userMessage },
      ],
      temperature: 0.1,  // Low temperature for consistent, precise output
      max_tokens: 2000,
    });

    const content = response.choices[0].message.content.trim();
    return parseResponse(content);
  } catch (error) {
    console.error('LLM analysis failed:', error.message);
    return [];
  }
}

/**
 * Parse the LLM response into structured issue objects.
 * Handles edge cases like markdown code fences and invalid JSON.
 */
function parseResponse(content) {
  // Strip markdown code fences if the model included them
  let cleaned = content
    .replace(/^```json\s*/i, '')
    .replace(/^```\s*/i, '')
    .replace(/\s*```$/i, '')
    .trim();

  try {
    let parsed = JSON.parse(cleaned);

    if (!Array.isArray(parsed)) {
      console.warn('LLM returned non-array response, wrapping');
      parsed = [parsed];  // Fall through so the wrapped object is still validated below
    }

    // Validate each issue has required fields
    return parsed.filter(issue => {
      const line = Number(issue.line);  // The model occasionally returns line as a string
      if (!issue.file || !Number.isFinite(line) || !line || !issue.description) {
        console.warn('Skipping malformed issue:', issue);
        return false;
      }
      issue.line = line;
      // Normalize severity
      issue.severity = normalizeSeverity(issue.severity);
      return true;
    });
  } catch (error) {
    console.error('Failed to parse LLM response as JSON:', error.message);
    console.error('Raw response:', content.substring(0, 200));
    return [];
  }
}

function normalizeSeverity(severity) {
  const s = (severity || '').toLowerCase();
  if (s === 'critical' || s === 'error' || s === 'high') return 'critical';
  if (s === 'warning' || s === 'medium' || s === 'warn') return 'warning';
  return 'suggestion';
}

/**
 * Split files into chunks that fit within the token limit.
 * Each chunk is an array of file objects.
 */
function chunkFiles(files) {
  const chunks = [];
  let currentChunk = [];
  let currentSize = 0;

  for (const file of files) {
    const fileText = hunksToString(file);
    const fileSize = fileText.length;

    // If a single file exceeds the limit, it gets its own chunk
    // (sent as-is; an oversized request may be rejected or truncated
    // by the model, so consider splitting very large files by hunk)
    if (fileSize > MAX_INPUT_CHARS) {
      if (currentChunk.length > 0) {
        chunks.push(currentChunk);
        currentChunk = [];
        currentSize = 0;
      }
      chunks.push([file]);
      continue;
    }

    // If adding this file would exceed the limit, start a new chunk
    if (currentSize + fileSize > MAX_INPUT_CHARS) {
      chunks.push(currentChunk);
      currentChunk = [file];
      currentSize = fileSize;
    } else {
      currentChunk.push(file);
      currentSize += fileSize;
    }
  }

  if (currentChunk.length > 0) {
    chunks.push(currentChunk);
  }

  return chunks;
}

module.exports = { analyzeCode, analyzeChunk, parseResponse, chunkFiles };
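LLM API calls fail transiently (rate limits, timeouts), and the analyzer above simply returns an empty array on error. Wrapping the call in a retry with exponential backoff is a common hardening step; here is a sketch (withRetry and its parameters are our own helper, not part of either SDK):

```javascript
// Retry an async function up to `retries` times with exponential backoff.
// Hypothetical helper -- not part of the OpenAI or Anthropic SDKs.
async function withRetry(fn, retries = 3, baseDelayMs = 500) {
  let lastError;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Wait baseDelayMs, then double it on each subsequent attempt
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

Usage inside analyzeCode would look like `const issues = await withRetry(() => analyzeChunk(chunk));`.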

Using Anthropic Claude Instead

If you prefer to use Claude instead of OpenAI, the swap is straightforward. Install the Anthropic SDK and create an alternative analyzer:

npm install @anthropic-ai/sdk
// src/analyzer-claude.js (alternative provider)
const Anthropic = require('@anthropic-ai/sdk');
const { SYSTEM_PROMPT, REVIEW_PROMPT } = require('./prompts');
const { parseResponse } = require('./analyzer');  // Reuse the same JSON parser
const { hunksToString } = require('./diff-parser');

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

async function analyzeChunkClaude(files) {
  const diffText = files.map(f => hunksToString(f)).join('\n---\n');

  try {
    const response = await anthropic.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 2000,
      system: SYSTEM_PROMPT,
      messages: [
        { role: 'user', content: REVIEW_PROMPT + diffText },
      ],
    });

    const content = response.content[0].text.trim();
    return parseResponse(content);  // Same parser works for both
  } catch (error) {
    console.error('Claude analysis failed:', error.message);
    return [];
  }
}

module.exports = { analyzeChunkClaude };
🛠
Provider-agnostic design: Because our prompt outputs structured JSON, the same parseResponse() function works regardless of which LLM provider you use. You can even support both and let users choose via the config file (Lesson 5).
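Letting users choose a provider can be as small as a factory keyed on a config value. A sketch (the provider names and stub analyzers here are illustrative; in the real bot the registry would hold analyzeChunk and analyzeChunkClaude):

```javascript
// Hypothetical provider registry -- stubs stand in for the real analyzers
const providers = {
  openai: async (files) => [/* ...call OpenAI... */],
  anthropic: async (files) => [/* ...call Claude... */],
};

function getAnalyzer(name = 'openai') {
  const analyzer = providers[name];
  if (!analyzer) {
    throw new Error(`Unknown LLM provider: ${name}`);
  }
  return analyzer;
}
```

Because both analyzers accept the same file array and return the same issue shape, the rest of the pipeline never needs to know which provider is active.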

Handling Large Diffs

Real-world PRs can be huge. Here is how our chunking strategy handles edge cases:

Small PR (under 6K tokens)

All files go in a single chunk. One LLM call. Fastest and cheapest. This covers most PRs.

Medium PR (6K-30K tokens)

Files split into multiple chunks. Each chunk reviewed independently. Results merged. Slightly more cost but complete coverage.

Large PR (30K+ tokens)

Many chunks, each reviewed separately. Consider adding a summary step that combines findings across chunks for a holistic review.

Single huge file

Gets its own chunk. The LLM reviews what fits. For extremely large files, consider splitting by hunk rather than by file.
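Splitting a huge file by hunk can reuse the same size-based grouping as chunkFiles, just one level down. A sketch, assuming each file object carries a hunks array of strings (a hypothetical shape; adapt it to what your diff-parser actually produces):

```javascript
// Split one file's hunks into sub-files that each fit under maxChars.
// Assumes file = { path, hunks: [string, ...] } -- a hypothetical shape.
function splitFileByHunks(file, maxChars) {
  const parts = [];
  let current = [];
  let size = 0;

  for (const hunk of file.hunks) {
    // Start a new sub-file when adding this hunk would exceed the limit
    if (size + hunk.length > maxChars && current.length > 0) {
      parts.push({ path: file.path, hunks: current });
      current = [];
      size = 0;
    }
    current.push(hunk);
    size += hunk.length;
  }
  if (current.length > 0) parts.push({ path: file.path, hunks: current });
  return parts;
}
```

Each sub-file can then flow through chunkFiles like any other file, so the rest of the pipeline is unchanged.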

Testing the Analyzer

Create a test file to verify the analyzer works end-to-end:

// test-analyzer.js
require('dotenv').config();
const { analyzeCode, parseResponse } = require('./src/analyzer');

// Test the JSON parser with a sample response
const sampleResponse = `[
  {
    "file": "src/utils.js",
    "line": 13,
    "severity": "critical",
    "title": "Missing null check",
    "description": "item.price could be undefined, causing NaN in the total",
    "suggestion": "Add: if (item.price == null) continue;"
  },
  {
    "file": "src/utils.js",
    "line": 30,
    "severity": "warning",
    "title": "Weak email validation",
    "description": "Checking only for @ is insufficient validation",
    "suggestion": "Use a regex: /^[^\\\\s@]+@[^\\\\s@]+\\\\.[^\\\\s@]+$/.test(email)"
  }
]`;

const issues = parseResponse(sampleResponse);
console.log('Parsed issues:', JSON.stringify(issues, null, 2));
console.log(`Found ${issues.length} issues`);
💡
Cost tip: During development, test with the JSON parser first (no API calls needed). Only call the real API when you are confident the pipeline works. A typical code review costs $0.01-$0.05 with GPT-4o.
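You can ballpark the cost of a review from the diff size before spending anything, using the same 4-chars-per-token approximation as the chunker. A sketch (the per-token prices below are illustrative placeholders, not current OpenAI pricing; check your provider's rate card):

```javascript
// Rough cost estimate for one review call.
// Prices are ILLUSTRATIVE placeholders -- substitute your provider's real rates.
const INPUT_PRICE_PER_1M_TOKENS = 2.50;   // assumed input price, USD
const OUTPUT_PRICE_PER_1M_TOKENS = 10.00; // assumed output price, USD

function estimateCostUSD(diffChars, maxOutputTokens = 2000) {
  const inputTokens = Math.ceil(diffChars / 4); // ~4 chars per token
  const inputCost = (inputTokens / 1_000_000) * INPUT_PRICE_PER_1M_TOKENS;
  // Worst case: the model uses its full output budget
  const outputCost = (maxOutputTokens / 1_000_000) * OUTPUT_PRICE_PER_1M_TOKENS;
  return inputCost + outputCost;
}
```

At these assumed rates, a full 6,000-token chunk plus a maxed-out 2,000-token response comes to a few cents, consistent with the range above.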

What's Next

The LLM analyzer is complete. We can now send code diffs to the AI and get back structured issues with file paths, line numbers, and severity levels. In the next lesson, we will build Step 3: Posting Review Comments — taking these issues and posting them as inline comments directly on the GitHub PR.