FOR DEVS WHO SHIP AI-WRITTEN CODE

AI wrote it. It runs.
Would you ship it?

The dangerous bug in AI code isn't the crash — it's the line that looks right and quietly isn't. Loupe puts you in front of real AI-written code with one quiet mistake, trains your eye to catch it, then shows you what you'd have missed. No sign-up — try one right now.

No sign-up · 60 seconds

users/search.py · DEMO

class User:
    id: int
    name: str
    email: str
    password_hash: str   # sensitive

@app.get("/users")
def search_users(name):
    user = db.find_user(name)
    return user          # returns the user object as-is

2/3 passed · In users/search.py

This demo is live — type to fix it yourself

Who it's for

If you want to ship AI's code with confidence

Anyone can ship fast. Build the eye to tell whether it's actually safe — before it goes out.

Developers coding with AI

Want a sharp eye for AI-written code instead of blind trust

Vibe-coding beginners

Want to build fast and still spot what's wrong

Hiring teams rethinking assessment

Looking for evaluations that measure code-verification skill, fit for the AI era

Skeptical senior engineers

Feel classic coding tests miss real skill

Try one — find the bug →

HOW IT WORKS

One problem, start to finish

Get a problem

One situation that actually breaks in production — seat booking, coupon issuing, search permissions, and the like.

Direct the AI to solve it

Tell the AI what to do in the terminal and the code gets written. Just like you work today.

Verify, then submit

Don't trust the code that comes back. Read it, find what's wrong, fix it, then submit.

REAL FOOTAGE

One full run, recorded as-is

One instruction, the code gets written, you verify and fix, then submit.

WHAT WE LOOK AT

We look at your judgment, not your speed.

What we ignore

How fast you typed
Whether you memorized the pattern
Whether you could write it without AI

What we watch

What you asked the AI in return
Which code you doubted and changed
Why you fixed it that way
Whether you called out problems that weren't actually there

NOTE

Each problem brings its own ruler. A concurrency problem weighs races and locks; a permissions problem weighs exposure boundaries.

SAMPLE RESULT

This is the result you get

A real results screen from the product-search problem — what we saw, how we saw it, with the evidence intact.

Result · Product search

This attempt

Missed the core issue — case-insensitive matching and whitespace handling were never implemented.

Behavior log summary

Prompts: 1
Time spent: 01:06
AI usage: 10%

Search normalized

1 / 5

The candidate missed case-insensitive matching and trim in the filter implementation.

The filterProducts function correctly handles an empty query with if (!query) { return products; }, returning everything.
However, it filters with product.name.includes(query), which compares case-sensitively.
The query is also never trimmed: a step such as query = query.trim(); is absent from the code.
The passing tests "✓ does not match when case differs" and "✓ does not match when query has surrounding spaces" confirm the gap: they assert that such queries return nothing, so the code does not meet the requirement.

Tests added

5 / 5

There are tests for queries with different case and surrounding spaces. However, there is no test for an empty query.

The product-filtering tests cover a different-case query like iphone:

it('does not match when case differs', () => {
  const results = filterProducts(PRODUCTS, 'iphone');
  expect(results).toHaveLength(0);
});

It also tests a query with surrounding spaces, ' iPhone ':

it('does not match when query has surrounding spaces', () => {
  const results = filterProducts(PRODUCTS, ' iPhone ');
  expect(results).toHaveLength(0);
});

But there is no direct test for an empty query. Queries with surrounding whitespace are covered, but an explicit test for the empty query itself is still needed.
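
The missing empty-query check could look roughly like the sketch below. The tests quoted above use it/expect, which need a test runner, so this version uses a plain throw to stay runnable with Node alone. The PRODUCTS entries and the body of filterProducts are assumptions reconstructed from the description in this result, not the candidate's actual code.

```javascript
// Hypothetical catalog; the real PRODUCTS fixture is not shown in the result.
const PRODUCTS = [
  { id: 1, name: 'iPhone 15' },
  { id: 2, name: 'Galaxy S24' },
];

// Reconstructed from the description above (an assumption, not the exact submission).
function filterProducts(products, query) {
  if (!query) {
    return products;
  }
  return products.filter((product) => product.name.includes(query));
}

// The missing check: an empty query should return the whole catalog.
function checkEmptyQuery() {
  const results = filterProducts(PRODUCTS, '');
  if (results.length !== PRODUCTS.length) {
    throw new Error(`expected ${PRODUCTS.length} results, got ${results.length}`);
  }
  return results;
}

checkEmptyQuery();
```

In a Jest-style suite like the one quoted above, the same check would be a single it block asserting that the result has the same length as PRODUCTS.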

Learn from this problem

Normalize user input before comparing — search should forgive case and whitespace.

What to focus on next time

To fix the case and whitespace problems, trim() the query inside filterProducts and convert both the query and product.name with toLowerCase() before comparing, so the search matches regardless of case or surrounding spaces.
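
A minimal sketch of that normalization, assuming products are plain objects with a name field. The PRODUCTS entries are hypothetical, and treating a whitespace-only query as empty is a design choice made here, not something the result specifies.

```javascript
// Hypothetical catalog used only for illustration.
const PRODUCTS = [
  { id: 1, name: 'iPhone 15' },
  { id: 2, name: 'Galaxy S24' },
];

function filterProducts(products, query) {
  // Treat missing or whitespace-only input as "no filter" (an assumption).
  const needle = (query ?? '').trim().toLowerCase();
  if (!needle) {
    return products;
  }
  // Compare lowercased names against the trimmed, lowercased query.
  return products.filter((product) => product.name.toLowerCase().includes(needle));
}

console.log(filterProducts(PRODUCTS, 'iphone').map((p) => p.name));   // [ 'iPhone 15' ]
console.log(filterProducts(PRODUCTS, ' iPhone ').map((p) => p.name)); // [ 'iPhone 15' ]
```

With this version, the two tests quoted earlier would fail, because they expect zero results. They would need to be inverted to expect a match.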

What's wrong in this code

The AI wrote filterProducts as products.filter((p) => p.name.includes(query)). includes compares characters as-is, so a single differing letter of case means no match. Type "iphone" and "iPhone 15" returns nothing because of the capital P. It also never trims the query, so leading/trailing spaces go straight into the comparison and " iphone " matches no product. It only works when you type the exact case of the catalog name — which is why it looks fine in a demo but isn't actually safe.
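
The failure is easy to reproduce. In the sketch below, only the one-line filter comes from the description above; the PRODUCTS entries and the list of queries are hypothetical.

```javascript
// Hypothetical catalog; only the one-line filter comes from the text above.
const PRODUCTS = [
  { id: 1, name: 'iPhone 15' },
  { id: 2, name: 'Galaxy S24' },
];

// The filter as described: a raw, case-sensitive substring match.
const filterProducts = (products, query) =>
  products.filter((p) => p.name.includes(query));

// The first query matches the catalog's exact case; the rest are what people type.
const queries = ['iPhone', 'iphone', ' iphone ', 'IPHONE'];
const hitCounts = queries.map((q) => filterProducts(PRODUCTS, q).length);

console.log(hitCounts); // [ 1, 0, 0, 0 ]
```

Only the query typed in the catalog's exact case finds the product, which is why a demo with hand-picked inputs can look fine.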

Why it matters

People type queries in lowercase, with stray spaces, and only partially. Comparing raw input without normalizing leads to the most common user complaint, "I searched and nothing came up", and perfectly good products don't sell. Unit tests written only with case-matching inputs pass, yet the code breaks on real user input.

Start now →

HONEST LIMITS

What this doesn't measure

Loupe looks at how you verify one problem. Long-term collaboration, domain depth, and code taste are out of scope. The scoring is qualitative, so a verdict can shift at the margins — which is why every result is backed by the exact lines it was read from, not a single label.

PRICING

Validate only as much as you need.

Start free, then choose a monthly plan or extra validation credits when you want more.

LITE

Launch deal

$9/mo

30 validations per month

STANDARD

Launch deal

$18/mo

Everything in Lite

PRO

Launch deal

$54/mo

Everything in Standard

FAQ

01. Isn't this just a code-review exercise?

Code review starts from a diff someone wrote. Here the code looks done and passes its tests — the problem is buried under that. You're not commenting on a pull request; you're deciding whether code you're about to ship is actually safe. That's the call you make every day with AI output, and it's the one we score.

02. Why not LeetCode or a take-home?

Those measure whether you can write the code. AI already writes it. The skill that's now scarce is catching when it's subtly wrong — so instead of asking you to produce an algorithm, we hand you working-looking AI code that's wrong somewhere and watch how you find it.

03. Is this just a contrived puzzle?

No. Every failure mode is a real one from production — a missing idempotency key, a leaked field, a race under load — not a typo or a riddle. The code compiles, passes its tests, and reads cleanly. The only way through is to actually understand it, which is the whole point.

04. How do you grade something with no single right answer?

We don't match against one answer. Each problem ships with its own criteria, and we evaluate the prompts you sent, your final code, and what you changed against them — then show the exact lines behind every judgment. You can check our reasoning, not just take a score.

05. Can I use it for hiring?

Yes. The path the candidate took and the evidence behind every score stay on record, so anyone on your team can re-open a result and see exactly why it landed where it did.
