It feels like a new “agentic coding tool” crops up every day now. So here’s how they actually work, and how you can build one yourself.

1. Gather Context

Before an AI coding tool like Claude Code or Cursor can write a single line of code, it has to decide what information to look at.

Models can’t read your entire codebase at once. They’re limited by something called the context window, which is basically the model’s short-term memory. If your repo has 100,000 lines of code and the model’s context window holds only 10,000 lines worth, it has to choose carefully what to load.
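
To make that concrete, here’s a toy version of the budget check these tools run constantly. The ~4 characters per token ratio is just a common rule of thumb; real tools count with the model’s actual tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token. Real tools use the model's tokenizer.
    return len(text) // 4

def fits_in_context(paths: list[str], budget_tokens: int = 32_000) -> bool:
    # Add up the candidate files and compare against the window we have to work with.
    total = sum(estimate_tokens(open(p, encoding="utf-8").read()) for p in paths)
    return total <= budget_tokens
```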

So the system gathers context selectively, using four main tools (I’ve sketched each of them in code right after this list):

  1. Shell Search

    1. This is the model’s equivalent of using grep, ls, or find.
      It searches through file names, directories, and code symbols to identify pieces of code related to your query (e.g., auth.py or login_handler.js when you ask about authentication) and loads only those into its working memory.

  2. Semantic Search

    1. While shell search looks for text matches, semantic search looks for meaning. It typically uses a vector database to find snippets or files that are conceptually similar to your request, even if they don’t contain the same words. For example, asking about “login” might surface functions that mention “credentials” or “session,” even if “login” never appears in the file.

  3. Sub-Agent

    1. Sometimes the model needs to go deeper — like running a small experiment, fetching API docs, or scanning dependencies. Instead of doing that itself and bloating the context window, it spawns a sub-agent to handle the task. The sub-agent runs independently, then reports back only key findings, preserving the main model’s limited memory while still unlocking full agentic capability.

  4. Compacting

    1. If the context window still starts overflowing, or a single file is too long, the system uses compaction. It summarizes the file, keeping only the important parts, and replaces the full text with that summary. This way, the model stays aware of what’s inside the file without exceeding its memory budget.
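
Here’s roughly what shell search looks like if you wire it up yourself: shell out to grep (real tools tend to use ripgrep) and keep the file paths that match.

```python
import subprocess

def shell_search(query: str, repo_root: str = ".") -> list[str]:
    # -r: recurse, -l: print only matching file names, -i: case-insensitive
    result = subprocess.run(
        ["grep", "-r", "-l", "-i", query, repo_root],
        capture_output=True, text=True,
    )
    return result.stdout.splitlines()

# shell_search("auth") might return ["./auth.py", "./handlers/login_handler.js"]
```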
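
Semantic search is nearest-neighbor lookup over embeddings. In this sketch, embed stands in for whatever embedding model you’re using, and index is a precomputed map of file paths to vectors; in practice a vector database handles the storage and ranking for you.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: how closely two embedding vectors point in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query: str, index: dict[str, list[float]], embed, top_k: int = 5) -> list[str]:
    # Rank every indexed file by similarity to the query and keep the closest few.
    q = embed(query)
    ranked = sorted(index.items(), key=lambda item: cosine(q, item[1]), reverse=True)
    return [path for path, _ in ranked[:top_k]]
```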
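
A sub-agent can be as simple as a fresh model call with its own empty context, where only a short summary comes back. call_model is a stand-in for whatever LLM client you’re using.

```python
def run_subagent(task: str, call_model) -> str:
    # The sub-agent does its digging in an isolated context...
    findings = call_model(
        "You are a research sub-agent. Task: " + task + "\n"
        "Investigate and reply with at most five bullet points of key findings."
    )
    # ...and only this condensed report flows back into the main agent's context.
    return findings
```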
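
And compaction is essentially “summarize it if it won’t fit.” Again, call_model is a placeholder for your LLM client, and the token estimate is the same rough heuristic as before.

```python
def compact(path: str, call_model, max_tokens: int = 2_000) -> str:
    # Small files pass through untouched; oversized ones get replaced by a summary.
    text = open(path, encoding="utf-8").read()
    if len(text) // 4 <= max_tokens:
        return text
    return call_model(
        "Summarize this file in a few hundred words. Keep function names, signatures, "
        "and anything a future code change might need to know:\n\n" + text
    )
```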

2. Execute

Once the model has gathered enough context, it starts actually doing things. There are three main components that make this possible (again, sketches follow the list):

  1. MCP Servers

    1. These are the tools that allow the agent to interface with the world outside your shell / IDE. Internet access, API calls, and references to library documentation all come from here.

  2. Scripts

    1. Scripts handle local or repeatable automation tasks. They can format files, run linters, execute build steps, or apply consistent transformations without invoking the full model. They’re fast, reliable, and reusable, and they also help preserve context.

  3. Code Gen / Sub-agent

    1. This is where code actually gets written to your files. It can be done with a sub-agent operating independently, taking instructions and generating code changes while the main process keeps its context clean.
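
Under the hood, an MCP tool call is a JSON-RPC request. The envelope below follows that shape; the fetch_docs tool name and its arguments are made up for illustration, and how you deliver the request (stdio pipe, HTTP) depends on the server.

```python
import json

def mcp_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    # Build the JSON-RPC envelope for a tool call; transport is up to the server.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

# e.g. asking a (hypothetical) docs server for library documentation:
print(mcp_tool_call("fetch_docs", {"library": "requests", "topic": "sessions"}))
```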
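
The scripts layer is the cheapest part to build: run the repo’s existing tooling and check the exit codes, no model tokens spent. I’m assuming ruff and black here; swap in whatever your project already uses.

```python
import subprocess

def run_checks() -> bool:
    # Deterministic, reusable checks that never touch the context window.
    commands = [
        ["ruff", "check", "."],
        ["black", "--check", "."],
    ]
    return all(subprocess.run(cmd).returncode == 0 for cmd in commands)
```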
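
The code-gen step can be a sub-agent that takes an instruction plus the current file and hands back the new contents. call_model is again a placeholder; real tools usually apply targeted diffs rather than rewriting whole files, but the flow is the same.

```python
def generate_and_write(path: str, instruction: str, call_model) -> None:
    # Ask the sub-agent for the updated file and write it to disk.
    current = open(path, encoding="utf-8").read()
    new_contents = call_model(
        "Rewrite this file to satisfy the instruction.\n"
        "Instruction: " + instruction + "\n\n"
        "Current contents:\n" + current + "\n\n"
        "Reply with the complete new file and nothing else."
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_contents)
```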

3. Verify

After the model executes its plan, it moves into the verification stage to make sure the changes were applied successfully.

Verification is essential because (like me) even advanced models make mistakes. They might misunderstand a dependency, write slightly incorrect syntax, or fail a test.

So instead of assuming success, the system runs a series of validation steps to confirm correctness before looping back or presenting results to the user.

There are three main ways this happens (the first two are sketched after the list):

  1. Unit Tests

    1. The simplest and most reliable form of verification. If your repo already has tests, the model can run them automatically to confirm that existing functionality wasn’t broken, or that new functionality behaves as expected. It’s the same idea as any CI pipeline: if the tests fail, the model knows something’s wrong and can re-plan or patch the issue before proceeding.

  2. LLM Verification

    1. Even when there aren’t explicit tests, the model can self-check its work using reasoning. It rereads its own output and asks questions like:

      1. Does this function handle edge cases?

      2. Did I forget to close the database connection?

      3. Does this change align with the user’s intent?

    2. This layer uses the model’s own reasoning ability (essentially doing an internal code review) to catch logic or design errors before relying on human feedback.

  3. MCP Validation

    1. The system can use MCP servers for specific tests, e.g., driving a browser through Selenium or Cypress (using the same tool-call shape sketched in the Execute section) to confirm the UI still behaves as expected.
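
Running the tests is nearly a one-liner if the repo uses pytest (adjust the command for your stack). The important part is handing the output back to the agent when they fail, so it has something concrete to fix.

```python
import subprocess

def run_tests() -> tuple[bool, str]:
    # Run the suite and capture the output so the agent can read the failures.
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```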
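
LLM verification is essentially a second model call that plays code reviewer. A minimal version, with call_model once more standing in for your LLM client:

```python
def llm_review(diff: str, user_request: str, call_model) -> str:
    # Ask the model to re-read its own change before a human does.
    return call_model(
        "Review this change as a skeptical senior engineer.\n"
        "The user asked for: " + user_request + "\n\n"
        "Diff:\n" + diff + "\n\n"
        "Check edge cases, resource cleanup, and whether the change matches the intent. "
        "Reply with PASS, or a list of problems."
    )
```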

If the verification fails, the agent can re-run the context and execution stages to try to fix errors before handing control back to the user.
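
Put together, the whole thing is a loop: gather, execute, verify, and feed any failures back in as fresh context. In this sketch, gather_context and execute are placeholders for the pieces above, and run_tests is the pytest helper from the verification step.

```python
def agent_loop(task: str, call_model, max_attempts: int = 3) -> bool:
    # Gather -> execute -> verify, retrying with the failure output as extra context.
    for _ in range(max_attempts):
        context = gather_context(task)        # shell + semantic search, compaction, sub-agents
        execute(task, context, call_model)    # MCP calls, scripts, code generation
        ok, feedback = run_tests()
        if ok:
            return True
        task = task + "\n\nThe previous attempt failed verification:\n" + feedback
    return False
```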

See you next week,
– Arjay

PS: I recently started posting on X. No promises it will be software only, but if you want to hear my random thoughts, tag me in your demos, etc., it’s the best place for it right now.
