AI-NATIVE · CODE VERIFICATION

Anyone can ship AI's code.
Knowing it's wrong is the skill.

Code takes seconds now. What's rare is the eye that catches when it's wrong. No sign-up — try one problem right now.

users/search.py · DEMO
@app.get("/users")
def search_users(name):
    user = db.find_user(name)
    # returns as-is: leaks password_hash
    return user

2/3 passed · In users/search.py

↑ This is the real product UI. Try typing in the terminal.
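
One way to close the leak in the demo above is to return only an allowlist of fields instead of the raw record. This is a minimal sketch, not the product's reference answer: PUBLIC_FIELDS, to_public, and the find_user parameter are hypothetical names, and the real set of public fields depends on the schema.

```python
# Hypothetical allowlist: the real public fields depend on the actual schema.
PUBLIC_FIELDS = ("id", "name")


def to_public(user):
    """Copy only allowlisted fields, so newly added sensitive columns stay private by default."""
    if user is None:
        return None
    return {key: user[key] for key in PUBLIC_FIELDS if key in user}


def search_users(name, find_user):
    # find_user stands in for db.find_user from the demo (an assumption).
    return to_public(find_user(name))


# Made-up record for illustration only.
record = {"id": 7, "name": "mina", "password_hash": "not-a-real-hash"}
public = search_users("mina", lambda _name: record)
```

An allowlist is preferred here over deleting password_hash from the result, because a denylist silently leaks the next sensitive field someone adds.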

HOW IT WORKS

One problem, start to finish

01

Get a problem

One situation that actually breaks in production — seat booking, coupon issuing, search permissions, and the like.

02

Solve it — by hand or with AI

Edit it yourself, or hand it to the AI. Just as you normally work.

03

Verify, then submit

Don't just trust the code that comes back — check it yourself, fix it, then submit.

THE HIDDEN RISK

Code that looks fine — and isn't.

It compiles. It passes review. That's exactly why it slips by. The real skill is stopping at that one line.

billing/charge.py
def charge(order_id, card, amount):
    token = gateway.tokenize(card)
    return gateway.capture(token, amount)   # retry = double charge
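
A common guard against the retry problem is to make the charge idempotent per order. The sketch below assumes order_id identifies exactly one charge; FakeGateway and the in-memory _results store are hypothetical stand-ins, and a real system would persist the record and handle concurrent retries as well.

```python
class FakeGateway:
    """Stand-in for a payment gateway; counts captures so retries are visible."""

    def __init__(self):
        self.captures = 0

    def tokenize(self, card):
        return f"tok_{card}"

    def capture(self, token, amount):
        self.captures += 1
        return {"token": token, "amount": amount, "capture_no": self.captures}


# order_id -> stored capture result. In-memory only: an assumption for the sketch.
_results = {}


def charge(gateway, order_id, card, amount):
    # A retry for the same order returns the stored result instead of capturing again.
    if order_id in _results:
        return _results[order_id]
    token = gateway.tokenize(card)
    result = gateway.capture(token, amount)
    _results[order_id] = result
    return result


gateway = FakeGateway()
first = charge(gateway, "order-1", "card-x", 500)
retry = charge(gateway, "order-1", "card-x", 500)  # same order, no second capture
```

This still leaves a gap if the process fails after capture but before the result is stored, which is why the key usually needs to be enforced on the gateway or database side too.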

WHAT'S ON IT

We test you on what actually breaks in production

Concert seats oversold under a rush of concurrent bookings, a coupon issued twice to one person, a search that reaches into fields it shouldn't or leaks sensitive data, training data that bleeds across users: situations that look fine in a demo but break in production. Each problem also has its own evaluation criteria.

Concurrent seat booking · One coupon per person · Search permissions & sensitive data · Training-data leakage
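
To make the first scenario concrete, here is a minimal sketch of why concurrent booking oversells and one way to prevent it: the "is it free?" check and the write happen under a single lock. SeatMap and rush are hypothetical names, and an in-process lock only covers a single server; a multi-server deployment would need the same guarantee from the database.

```python
import threading


class SeatMap:
    def __init__(self, seat_ids):
        self._owner = {seat: None for seat in seat_ids}
        self._lock = threading.Lock()

    def book(self, seat, user):
        # Check and write under one lock, so two callers cannot both see the seat as free.
        with self._lock:
            if seat not in self._owner or self._owner[seat] is not None:
                return False
            self._owner[seat] = user
            return True


def rush(seat_map, seat, n_users):
    """Simulate n_users trying to book the same seat at once; return how many succeeded."""
    results = []

    def attempt(user):
        results.append(seat_map.book(seat, user))

    threads = [
        threading.Thread(target=attempt, args=(f"user-{i}",)) for i in range(n_users)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results.count(True)


winners = rush(SeatMap(["A1"]), "A1", 50)
```

Without the lock, two threads can both pass the check before either writes, which is the oversell the scenario describes.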

WHAT WE LOOK AT

We look at your judgment, not your speed.

What we ignore
  • How fast you typed
  • Whether you memorized the pattern
  • Whether you could write it without AI

What we watch
  • What you asked the AI in return
  • Which code you doubted and changed
  • Why you fixed it that way
  • Whether you invented problems that weren't there

NOTE

Each problem brings its own ruler. A concurrency problem weighs races and locks; a permissions problem weighs exposure boundaries.

WHAT YOU GET

An evaluation that shows its reasons.

01

Comments per axis

For each axis, we point out what you did well and what you missed.

02

Cited evidence

We cite the exact line and change we read it from. Nothing judged on a hunch.

03

A lesson to take away

When you finish, we sum up what the AI got wrong and how to fix it.

SAMPLE RESULT

This is the result you get

A real results screen from the product-search problem — what we saw, how we saw it, with the evidence intact.

Result · Product search

This attempt

Missed the core defect — case-insensitive matching and whitespace handling were never implemented.

Behavior log summary

Prompts: 1
Time spent: 01:06
Tokens spent: 15,378 / 150,000

Search normalized · Needs work

The candidate missed case-insensitive matching and trim in the filter implementation.

  • The filterProducts function correctly handles an empty query with if (!query) { return products; }, returning everything.
  • However, it filters with product.name.includes(query) and does nothing to compare case-insensitively.
  • There is also no trim on the query: a step such as query = query.trim(); is missing from the code.
  • The tests "does not match when case differs" and "does not match when query has surrounding spaces" both passed. Because they assert that nothing matches, their passing confirms the code does not meet the requirement.

Tests added · Strong

There are tests for queries that differ in case and for queries with surrounding spaces. However, there is no test for an empty query.

  • The product-filtering tests cover a different-case query like iphone:
it('does not match when case differs', () => {
  const results = filterProducts(PRODUCTS, 'iphone');
  expect(results).toHaveLength(0);
});
  • It also tests a query with surrounding spaces, ' iPhone ':
it('does not match when query has surrounding spaces', () => {
  const results = filterProducts(PRODUCTS, ' iPhone ');
  expect(results).toHaveLength(0);
});
  • But there is no direct test for an empty query. Queries with surrounding whitespace are covered, but an explicit test for the empty query itself is still needed.

What to learn from this problem

Normalize user input before comparing — search should forgive case and whitespace.

What to watch on your next attempt

To fix the case and whitespace problems, inside filterProducts convert both query and product.name with toLowerCase(), and trim() the query, so the search returns accurate results.

What the AI got wrong

The AI wrote filterProducts as products.filter((p) => p.name.includes(query)). includes compares characters as-is, so a single letter that differs in case means no match. Type "iphone" and "iPhone 15" does not come up, because of the capital P. It also never trims the query, so leading and trailing spaces go straight into the comparison and " iphone " matches no product. It only works when you type the exact case of the catalog name, which is why it looks fine in a demo but is a trap in production.

Why it matters

People type queries in lowercase, with stray spaces, and only partially. Comparing raw input without normalizing leads to the most common user complaint, "I searched and nothing came up", and perfectly good products don't sell. Unit tests written only with case-matching inputs pass, yet the code breaks on real user input.
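
The normalization this sample result asks for can be sketched as follows. The sample's code is JavaScript; this is a hypothetical Python port (filter_products) written to match the page's other Python snippets, and the catalog is made up apart from "iPhone 15". Treating a whitespace-only query like an empty one is an assumption, not something the sample states.

```python
def filter_products(products, query):
    """Case-insensitive, whitespace-tolerant substring search over product names."""
    needle = query.strip().lower()
    # Assumption: a blank or whitespace-only query returns everything,
    # mirroring the sample's empty-query behavior.
    if not needle:
        return list(products)
    return [p for p in products if needle in p["name"].lower()]


# Made-up catalog for illustration only.
PRODUCTS = [{"name": "iPhone 15"}, {"name": "Galaxy S24"}]

lower_case_hits = filter_products(PRODUCTS, "iphone")
padded_hits = filter_products(PRODUCTS, " iPhone ")
```

Both sides are normalized before comparing, so the query's case and padding no longer decide whether a product is found.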

HONEST LIMITS

We'll also tell you what this doesn't measure

We look at your verification behavior on a single problem. Long-term collaboration, domain depth, and code taste are out of scope. Evaluation is qualitative, so the result can vary a little — which is why we'd point you to the evidence before any one-line verdict.

FAQ
01 How do you assess something with no single answer?

It isn't about matching an answer. For each problem, we evaluate the prompts you sent, your final code, and what you changed against that problem's declared rules, and we show the evidence alongside the result.

02 Doesn't this just favor people better at using AI?

Writing good prompts and doubting the output are different skills. Loupe measures the latter: not who produced the most code the fastest, but who stopped at the one dangerous line.

03 Can I use it for hiring?

Yes. The process you took and the evidence behind it stay on record, so anyone can re-examine the result.