Jules: Google's Async AI Coding Agent Actually Changes How I Think About Code Reviews

This is a submission for the Google I/O 2026 Challenge: Explore Google I/O 2026

Google announced a lot at I/O 2026. Most of it was Gemini branding on existing products. Jules is not that.

Jules is an asynchronous AI coding agent that takes a GitHub issue, spins up a cloud VM, clones your repo, reads the code, writes a plan, implements the fix, runs the test suite, and opens a pull request, all while you're doing something else entirely.

I've used Copilot for autocomplete and Cursor for in-editor AI. Jules is a different category of tool. Here's my honest take.

What Jules Actually Does

The workflow is dead simple:

You link Jules to a GitHub repo
You point it at an issue: "Handle rate limiting for the /api/search endpoint"
Jules spins up a cloud VM, clones the repo, and reads the relevant code
It writes a plan (visible to you), implements the changes, and runs the tests
It opens a PR, and you review it like you'd review a teammate's work

The key word is asynchronous. You don't sit and watch it work. You come back to a PR.

This sounds like every AI coding demo you've seen. It isn't. The difference is what happens in the cloud VM step: Jules doesn't just generate code. It actually runs the code. It sees failing tests. It iterates. When it's wrong, the test suite tells it.
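To make the run-and-iterate idea concrete, here is a toy sketch of a generate/test/retry loop. It is not Jules's implementation, which this post doesn't describe; `implement` and `run_tests` are hypothetical callables standing in for "edit the code" and "run the suite in the VM", and `max_attempts=3` is an arbitrary cap chosen for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SuiteResult:
    """Outcome of one test-suite run (hypothetical shape for this sketch)."""
    passed: bool
    failures: list[str] = field(default_factory=list)


def implement_until_green(
    plan: list[str],
    implement: Callable[[list[str], list[str]], str],
    run_tests: Callable[[str], SuiteResult],
    max_attempts: int = 3,
) -> tuple[Optional[str], int]:
    """Write a patch, run the tests, feed failures back in, repeat.

    Returns (patch, attempts_used); patch is None if the suite never passes.
    """
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        patch = implement(plan, feedback)  # hypothetical "edit the code" step
        result = run_tests(patch)          # hypothetical "run the suite in the VM" step
        if result.passed:
            return patch, attempt
        feedback = result.failures         # the next attempt sees what broke
    return None, max_attempts


# Usage with fakes: the "agent" only gets it right after seeing a failure.
def fake_implement(plan: list[str], feedback: list[str]) -> str:
    return "fixed patch" if feedback else "first draft"


def fake_run_tests(patch: str) -> SuiteResult:
    ok = patch == "fixed patch"
    return SuiteResult(passed=ok, failures=[] if ok else ["test_search_rate_limit"])


print(implement_until_green(["add rate limiter"], fake_implement, fake_run_tests))
# ('fixed patch', 2)
```

The attempt cap is a design choice in this sketch: an agent whose patches never make the suite pass has to stop at some point and hand the task back to a human.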

Where It Genuinely Surprised Me

It reads the whole repo, not just one file.

When I gave Jules an issue about adding rate limiting, it found the existing middleware stack, noticed there was already a request logger, and plugged the rate limiter into that chain rather than creating a separate one. A naive code generator would've added a duplicate layer. Jules read the architecture first.
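Here's roughly what "plugging into the existing chain" means, as a minimal Python sketch. It assumes a framework-free setup where a middleware is a function that wraps a handler; `request_logger`, `rate_limiter`, and `build_chain` are hypothetical names, not the code Jules actually wrote, and the limiter is a simple fixed-window counter per client.

```python
import time
from typing import Callable

# Hypothetical request/response shapes for the sketch.
Request = dict
Response = tuple[int, str]
Handler = Callable[[Request], Response]
Middleware = Callable[[Handler], Handler]


def request_logger(log: list[str]) -> Middleware:
    """The pre-existing middleware: records every request and its status."""
    def middleware(next_handler: Handler) -> Handler:
        def handler(req: Request) -> Response:
            status, body = next_handler(req)
            log.append(f"{req['path']} -> {status}")
            return status, body
        return handler
    return middleware


def rate_limiter(limit: int, window_s: float,
                 path: str = "/api/search",
                 clock: Callable[[], float] = time.monotonic) -> Middleware:
    """Fixed-window limiter: at most `limit` requests per client per window."""
    windows: dict[str, tuple[float, int]] = {}  # client -> (window start, count)

    def middleware(next_handler: Handler) -> Handler:
        def handler(req: Request) -> Response:
            if req["path"] != path:
                return next_handler(req)  # only the search endpoint is limited
            now = clock()
            start, count = windows.get(req["client"], (now, 0))
            if now - start >= window_s:
                start, count = now, 0  # window expired: start a fresh one
            if count >= limit:
                return 429, "Too Many Requests"
            windows[req["client"]] = (start, count + 1)
            return next_handler(req)
        return handler
    return middleware


def build_chain(handler: Handler, middlewares: list[Middleware]) -> Handler:
    """Wrap `handler` so the first middleware in the list runs outermost."""
    for mw in reversed(middlewares):
        handler = mw(handler)
    return handler


def search(req: Request) -> Response:
    return 200, "results"


# Usage: the limiter joins the existing chain instead of forming a parallel one.
log: list[str] = []
fake_now = [0.0]  # injectable clock so the example is deterministic
app = build_chain(search, [request_logger(log),
                           rate_limiter(2, 60.0, clock=lambda: fake_now[0])])
req = {"path": "/api/search", "client": "alice"}
print([app(req)[0] for _ in range(3)])  # [200, 200, 429]
print(log[-1])                          # /api/search -> 429
```

Putting `request_logger` first in the list makes it the outermost layer, so rejected requests still show up in the log. That ordering is the kind of decision that requires reading the existing chain rather than bolting on a new one.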

It writes a plan before touching any code.

Before Jules writes a single line, it produces a step-by-step plan that you can review and amend. This is more useful than it sounds — it's how you catch when Jules misunderstood the issue before it wastes 10 minutes implementing the wrong thing.
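The plan-first gate is easy to model. This toy sketch uses a hypothetical `Plan` class (not Jules's API) to show the property that matters: steps can be edited while the plan is under review, and nothing gets implemented until someone approves it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Plan:
    """A reviewable plan: ordered steps plus an approval flag (hypothetical)."""
    issue: str
    steps: list[str] = field(default_factory=list)
    approved: bool = False

    def amend(self, index: int, new_step: str) -> None:
        # Edits are only allowed while the plan is still under review.
        if self.approved:
            raise ValueError("plan already approved")
        self.steps[index] = new_step

    def approve(self) -> None:
        self.approved = True


def execute(plan: Plan, apply_step: Callable[[str], str]) -> list[str]:
    """Apply each step, but refuse to touch code before review."""
    if not plan.approved:
        raise PermissionError("plan has not been reviewed yet")
    return [apply_step(step) for step in plan.steps]


plan = Plan("Handle rate limiting for the /api/search endpoint",
            ["Add a new standalone RateLimit middleware module",
             "Return 429 when a client exceeds the limit"])
# Review catches the misunderstanding before any code is written.
plan.amend(0, "Add the limiter to the existing middleware chain")
plan.approve()
print(execute(plan, lambda step: f"done: {step}"))
```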

It runs the test suite and fixes failures.

This is the part no autocomplete tool does. If tests break, Jules doesn't just stop. It reads the failure, traces it back to its own change, and fixes it. It's not perfect — deeply integration-tested code with external dependencies will still stump it — but for unit-tested Python and TypeScript code, it's surprisingly capable.
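One simple way to "trace a failure back to its own change" is to intersect the failing test's traceback with the lines the patch touched. This is a heuristic sketch under that assumption, not a description of how Jules does it; `frames_in_change` is a hypothetical helper, and in practice the `changed_lines` map would come from the patch's diff.

```python
import re

# Matches CPython traceback frames such as: File "app/x.py", line 41, in handler
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+)')


def frames_in_change(traceback_text: str,
                     changed_lines: dict[str, set[int]]) -> list[tuple[str, int]]:
    """Hypothetical helper: traceback frames that point at lines the patch modified."""
    hits = []
    for m in FRAME_RE.finditer(traceback_text):
        path, line = m["path"], int(m["line"])
        if line in changed_lines.get(path, set()):
            hits.append((path, line))
    return hits


tb = '''Traceback (most recent call last):
  File "tests/test_search.py", line 12, in test_limit
    assert client.get("/api/search").status == 200
  File "app/middleware.py", line 41, in handler
    start, count = windows[req["client"]]
KeyError: 'alice'
'''
# Lines the patch added or edited, per file (hypothetical diff summary).
changed = {"app/middleware.py": {38, 39, 40, 41, 42}}
print(frames_in_change(tb, changed))  # [('app/middleware.py', 41)]
```

An empty result suggests the failure comes from code the patch didn't touch, or from the environment, rather than from the change itself.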

The Language Support Is Real

Jules currently supports Python, TypeScript, JavaScript, Go, Rust, and Java. This isn't a Python-mostly tool with half-baked Go support. I tested it on a Go service with table-driven tests and it handled the test structure correctly — something most AI tools fumble.
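For readers who haven't met the pattern: a table-driven test lists inputs and expected outputs as data and runs one loop over them. In Go that's typically a slice of structs with `t.Run(tc.name, ...)` per row; the sketch below shows the same shape in Python using `unittest`'s `subTest`, to keep one language across this post's examples. `clamp_page_size` is a hypothetical function under test.

```python
import unittest


def clamp_page_size(requested: int, default: int = 20, maximum: int = 100) -> int:
    """Hypothetical function under test: sanitize a page-size parameter."""
    if requested <= 0:
        return default
    return min(requested, maximum)


class TestClampPageSize(unittest.TestCase):
    # The "table": one row per case, like a Go []struct{name string; in, want int}.
    CASES = [
        ("zero falls back to default", 0, 20),
        ("negative falls back to default", -5, 20),
        ("in range passes through", 50, 50),
        ("above maximum is capped", 500, 100),
    ]

    def test_table(self) -> None:
        for name, requested, want in self.CASES:
            # subTest plays the role of Go's t.Run: each row reports on its own.
            with self.subTest(name):
                self.assertEqual(clamp_page_size(requested), want)


# Run the table programmatically (normally you'd use `python -m unittest`).
table_suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestClampPageSize)
table_result = unittest.TextTestRunner(verbosity=2).run(table_suite)
```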

Where Jules Falls Short

It struggles with ambiguous issues.

If your issue says "the login is broken," Jules will make a guess about what broken means. That guess might be wrong. Jules works best on issues that are specific: a concrete bug with reproduction steps, or a clearly defined feature with a stated interface.

Complex integration tests with real databases will trip it up.

If your test suite requires a running Postgres container, Jules will hit errors it can't easily recover from in its sandboxed VM. It handles unit tests and in-process integration tests well. Anything that requires external services is still rough.
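The difference between an in-process integration test and one that needs an external service is easy to see in code. This sketch uses hypothetical `save_user`/`find_user` helpers against Python's built-in `sqlite3` with an in-memory database, so the test needs nothing outside the interpreter. The same test pointed at a real Postgres server would also need that server running and reachable, which is exactly what a sandboxed VM may not provide.

```python
import sqlite3
import unittest
from typing import Optional


def save_user(conn: sqlite3.Connection, name: str) -> int:
    """Hypothetical repository function: insert a user, return its id."""
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def find_user(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    """Hypothetical repository function: the user's name, or None if absent."""
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None


class UserRepoIntegrationTest(unittest.TestCase):
    def setUp(self) -> None:
        # ":memory:" gives each test a fresh database inside this process:
        # no container, no network, nothing for a sandbox to be missing.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

    def tearDown(self) -> None:
        self.conn.close()

    def test_round_trip(self) -> None:
        user_id = save_user(self.conn, "ada")
        self.assertEqual(find_user(self.conn, user_id), "ada")

    def test_missing_user(self) -> None:
        self.assertIsNone(find_user(self.conn, 999))


repo_suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserRepoIntegrationTest)
repo_result = unittest.TextTestRunner(verbosity=2).run(repo_suite)
```

The trade-off is fidelity: SQLite is not Postgres, so a test like this catches logic bugs in your own code but not dialect- or driver-specific behavior.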

Review culture still matters.

Jules opens PRs. Someone still needs to review them. The dangerous failure mode is treating Jules PRs as pre-approved because "the AI wrote it." The code is good enough that it's easy to rubber-stamp, which is exactly when you'll miss the subtle issue Jules didn't see. Your review process doesn't go away — it just shifts from "write the code" to "evaluate the code."

The Mental Model Shift

Here's what I think Jules actually changes: the cost of starting a task.

The hardest part of addressing a backlog of small issues isn't implementation time. It's the context-switching cost of picking up each one and understanding the relevant code well enough to write the fix. Jules collapses that cost. Small bugs that would sit in the backlog for three sprints become things you triage in a Slack message and review over coffee.

Large, complex features still need human engineers doing human engineering. But there's a huge class of work — small bugs, refactors with clear scope, adding tests to existing code, migrating deprecated APIs — where Jules is going to be the right tool.

Bottom Line

Jules isn't a replacement for software engineers. It's a replacement for the cognitive overhead of small, well-defined tasks. That's a meaningful thing to automate, and Google's implementation is more capable than I expected.

The async model is the right design. The plan-first approach is the right design. Running tests in the loop is the right design. These aren't accidents — they're the three things that actually make an AI coding agent useful rather than a party trick.

Try it on your most boring GitHub backlog issue. You'll understand immediately why this is the version of AI coding assistance that matters.

Links: Jules on Google Labs · Google I/O 2026 Dev Keynote