Act via Code

January 2, 2025

Four and a half years after the launch of the GPT-3 API, code assistants have emerged as arguably the premier use case for LLMs. The rapid adoption of AI-powered IDEs and prototype builders isn’t surprising: code is structured, deterministic, and rich with patterns, making it an ideal domain for machine learning. Developers actively working with tools like Cursor (myself included) have an exhilarating yet uncertain sense that the field of software engineering is approaching an inflection point.

Yet there’s a striking gap between understanding and action for today’s code assistants. When provided proper context, frontier LLMs can analyze massive enterprise codebases and propose practical paths towards sophisticated, large-scale improvements. But with today’s AI assistants, actually implementing changes that touch more than a handful of files is, in practice, infeasible. The good news is that for focused, file-level changes, we’ve found real success: AI-powered IDEs (Windsurf, Cursor) are transforming how developers write and review code, while chat-based assistants (v0, lovable.dev, bolt.new) are revolutionizing how we bootstrap and prototype new applications.

However, a whole class of critical engineering tasks remains out of reach: tasks that are fundamentally programmatic and deal with codebase structure at scale. Much of modern engineering effort goes towards eliminating tech debt, managing migrations, analyzing dependency graphs, enforcing type coverage, and other global concerns. Today’s AI assistants can propose solutions to these problems but lack the mechanisms to execute them. The intelligence is there, but it’s trapped in your IDE’s text completion window.

The bottleneck isn’t intelligence—it’s tooling. The solution is giving AI systems the ability to programmatically interact with codebases through code execution environments. These environments are the most expressive tools we can offer agents, enabling composition, abstraction, and systematic manipulation of complex systems. By combining code execution with custom APIs for large-scale operations, we unlock new high-value use cases.

Beating Minecraft with Code Execution

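In mid-2023, a research project called Voyager set a new bar for LLM agents playing Minecraft. Its authors report 3.3x more unique items collected, 2.3x longer distances traveled, and key tech-tree milestones unlocked up to 15.3x faster than prior SOTA. This success wasn’t about raw intelligence. It came from providing a more expressive action space: code.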

GPT-4, when allowed to write and execute JavaScript programs through a clean API, could craft high-level behaviors and reuse learned “action programs” across tasks. This enabled skill accumulation, experience recall, and systematic reuse.

“We opt to use code as the action space instead of low-level motor commands because programs can naturally represent temporally extended and compositional actions…” (from the Voyager paper)
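
The skill-reuse idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Voyager's implementation (which, as noted above, had GPT-4 write JavaScript): `SkillLibrary`, `add`, `run`, and the example skills are hypothetical names, and each "action" is just a string.

```python
from typing import Callable, Dict, List


class SkillLibrary:
    """Toy store of named, reusable action programs (hypothetical design)."""

    def __init__(self) -> None:
        self._skills: Dict[str, Callable[..., List[str]]] = {}

    def add(self, name: str, program: Callable[..., List[str]]) -> None:
        # Save a program under a name so later tasks can call it.
        self._skills[name] = program

    def run(self, name: str, *args) -> List[str]:
        # Look up a stored program and execute it.
        return self._skills[name](*args)


library = SkillLibrary()

# A primitive skill: returns the low-level actions it would perform.
library.add("mine", lambda block, count: [f"mine {block}"] * count)


# A composite skill built by calling the primitive one.
def craft_pickaxe() -> List[str]:
    actions = library.run("mine", "wood", 3)
    actions += library.run("mine", "stone", 3)
    actions.append("craft stone_pickaxe")
    return actions


library.add("craft_pickaxe", craft_pickaxe)

# A later task reuses the stored composite skill with a single call.
plan = library.run("craft_pickaxe")
```

The point of the sketch is that one call to a stored program expands into many low-level steps, which is what makes the action space "temporally extended and compositional."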

Code is an Ideal Action Space

Letting AI act through code rather than atomic commands yields a step change in capability. In software engineering, this means expressing assistant behavior through code that manipulates codebases.

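# `codebase` is assumed to be an already-initialized handle on the repository.
# A plain loop and conditional act as a programmable `grep`:
# find every function whose name contains 'Page' and move it to its own file.
for function in codebase.functions:
    if 'Page' in function.name:
        function.move_to_file(f'/pages/{function.name}.tsx')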

This paradigm brings multiple advantages:

API-Driven Extensibility: Agents can use any operation exposed via a clean API.
Programmatic Efficiency: Batch operations across large codebases are fast and systematic.
Composability: Agents can chain simple operations to form more complex ones.
Constrained Action Space: APIs act as guardrails, preventing invalid actions.
Objective Feedback: Errors provide clear debugging signals.
Natural Collaboration: Code is human-readable and reviewable.
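
Two of these properties, the constrained action space and objective feedback, can be illustrated with a toy harness. The sketch below is a stand-in under loud assumptions, not Codegen's API: `ToyCodebase`, `move_function`, and `run_agent_program` are hypothetical names, the "codebase" is just a dictionary, and `exec` is not a real sandbox.

```python
import traceback
from typing import Dict, Optional


class ToyCodebase:
    """Hypothetical in-memory codebase: maps function name -> file path."""

    def __init__(self, functions: Dict[str, str]) -> None:
        self.functions = dict(functions)

    def move_function(self, name: str, new_path: str) -> None:
        # Guardrails: the API rejects invalid actions instead of applying them.
        if name not in self.functions:
            raise KeyError(f"no function named {name!r}")
        if not new_path.endswith(".tsx"):
            raise ValueError(f"unsupported target file: {new_path!r}")
        self.functions[name] = new_path


def run_agent_program(source: str, codebase: ToyCodebase) -> Optional[str]:
    """Run agent-written code; return None on success or the error text."""
    try:
        # NOTE: exec is NOT a sandbox; a real system needs proper isolation.
        exec(source, {"codebase": codebase})
        return None
    except Exception:
        # The last traceback line is a compact signal the agent can act on.
        return traceback.format_exc().strip().splitlines()[-1]


codebase = ToyCodebase(
    {"HomePage": "/app.tsx", "AboutPage": "/app.tsx", "helper": "/app.tsx"}
)

good_program = """
for name in list(codebase.functions):
    if 'Page' in name:
        codebase.move_function(name, '/pages/' + name + '.tsx')
"""
bad_program = "codebase.move_function('MissingPage', '/pages/MissingPage.tsx')"

ok_feedback = run_agent_program(good_program, codebase)      # batch edit succeeds
error_feedback = run_agent_program(bad_program, codebase)    # invalid action is rejected
```

In this sketch the invalid call changes nothing and comes back as a one-line error message, which an agent could read and use to correct its next attempt.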

Code Manipulation Programs

To match how developers think about code, agents need high-level APIs, not raw AST surgery. We’re building a framework that reflects actual engineering intuition and abstracts over common edge cases, while preserving correctness.

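# High-level semantic operations: `usages` is resolved by the framework, not by text search.
# Here, every component that nothing else references is renamed with a 'Page' suffix.
for component in codebase.jsx_components:
    if not component.usages:
        component.rename(component.name + 'Page')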

This isn’t string substitution. The framework understands structure: React hierarchies, type systems, usage graphs. It enables both rapid analysis and safe edits.
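
To make the contrast with string substitution concrete, here is a minimal sketch using Python's standard `ast` module on a small Python snippet rather than JSX. It is not how the framework is implemented; `RenameFunction` and the sample source are hypothetical, and the transformer ignores scoping, so it would also rename an unrelated local variable that happened to share the name.

```python
import ast

SOURCE = '''
def Card():
    return "card"

def CardList():
    return [Card(), Card()]

label = "Card"
'''


class RenameFunction(ast.NodeTransformer):
    """Rename one function definition and every name that refers to it."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # keep walking into the function body
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        # Identifier references only; string literals are Constant nodes.
        if node.id == self.old:
            node.id = self.new
        return node


tree = RenameFunction("Card", "CardPage").visit(ast.parse(SOURCE))
structured = ast.unparse(tree)  # requires Python 3.9+

# Naive text replacement for comparison.
naive = SOURCE.replace("Card", "CardPage")
```

The structure-aware rename updates the definition and both call sites while leaving `CardList` and the string literal alone. The naive replacement turns `CardList` into `CardPageList` and rewrites the label text as well.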

We’re also extending this interface to systems outside the repo: AWS, Datadog, and CI/CD platforms. This is the path to autonomous software engineering.

Codegen is now OSS

We’re excited to release Codegen as open source under Apache 2.0 and build out this vision with the developer community. Schedule a demo or join our Slack community to share ideas and feedback.

— Jay Hack, Founder
