June 7, 2026

When Verbatim gets it wrong (and what that tells you)

A critique system that won't tell you when it can't see something is just confidence dressed up as rigor. Sometimes the most useful thing Verbatim can do is get it wrong in a way that teaches you when not to use it.

We recently ran a real workflow message through Verbatim's Critique mode. The content was internal: file paths, FAQ counts, schema fields, a list of decisions about a homepage update. Critique flagged most of it as "unsubstantiated by available evidence."

It wasn't wrong about the standard. It was wrong about the scope. The claims were correct. The critic just had no way to verify them, because the evidence lived in a private codebase it couldn't read.

The Critique output called out specifics:

"The response makes multiple repository-specific claims as if they are established facts. I could not verify those files or paths from public sources."

"The claim that the user flagged exactly those seven FAQs is asserted as factual process-history without supporting record."

"The recommendation to add fields such as shortTitle, metaDescription, homepageBlurb may be reasonable, but the response treats them as approved schema changes without showing any source of truth."

Each of those claims was true. Each was unverifiable from outside the project. Critique correctly refused to rubber-stamp them.

Adversarial review needs symmetric evidence

Critique works when it has context. The response under review is part of a conversation, with prior turns, attached files, and a chain of intent. Strip that context away and you're left with text whose grounding has been amputated. The reviewer can still produce output. It just won't be the right kind of review.

This is the hidden cost of tool hopping, where users move a response from one AI tool to another for a second opinion. The reviewing model never sees what the first model was responding to. The shared PDF, the screenshot, the question three turns back that set up the constraints are all gone. The second opinion is technically a second opinion, but it's reviewing a stripped artifact.

Verbatim works differently. It runs inside the conversation, so the response under review still has its surrounding turns, its attached files, and its native thread. That's not a copy-paste workflow, and it's why Verbatim can do the kind of review that copy-paste cannot.

But context only helps when the context itself is in principle reachable. Some content is grounded in evidence the critic can't access no matter how good the integration is: private codebase state, internal decisions, project history only insiders can verify. There, the form factor doesn't fix anything, because the missing context isn't in the thread. It's in your repository, or in your head.

The boundary is public versus private

Two kinds of content sit outside Verbatim's useful range, for two different reasons:

The first is content that's been separated from its source: a response pasted between tools, a screenshot of an AI conversation, an excerpt without the question that produced it. Verbatim's form factor solves this by default. Use it inside the conversation and the context is already there. Use it on copy-paste and you're working with the same stripped artifact every other tool would see.

The second is content grounded in private state: codebase decisions, workflow history, internal facts that cannot be verified from outside, even in principle. No form factor fixes this. Cross-examination requires symmetric evidence, and when one side holds ground truth the other can't reach, the critic's correct calibration produces systematic false positives.
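One way to make the two cases concrete is to tag each claim by where its evidence lives before deciding whether a review is worth running. What follows is a minimal Python sketch, not Verbatim's API: the Evidence categories, the Claim type, and the triage and advice functions are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    """Where the support for a claim lives (hypothetical categories)."""
    IN_THREAD = "in_thread"  # prior turns or attached files in the conversation
    PUBLIC = "public"        # checkable against public sources
    DETACHED = "detached"    # was in the original thread, lost in copy-paste
    PRIVATE = "private"      # private codebase state, internal decisions


@dataclass
class Claim:
    text: str
    evidence: Evidence


def triage(claims):
    """Split claims into those a critic can check and those it cannot see."""
    reachable = {Evidence.IN_THREAD, Evidence.PUBLIC}
    reviewable = [c for c in claims if c.evidence in reachable]
    blind = [c for c in claims if c.evidence not in reachable]
    return reviewable, blind


def advice(claims):
    """Suggest how to treat a review, given where the evidence lives."""
    reviewable, blind = triage(claims)
    if not blind:
        return "review"
    if any(c.evidence is Evidence.PRIVATE for c in blind):
        # Private state stays unreachable regardless of where the review runs.
        if not reviewable:
            return "skip"
        return "review, expecting false positives on private claims"
    # Only detached evidence is missing, which the original thread restores.
    return "review inside the original conversation"


if __name__ == "__main__":
    message = [
        Claim("The user flagged exactly those seven FAQs", Evidence.PRIVATE),
        Claim("shortTitle is an approved schema field", Evidence.PRIVATE),
    ]
    print(advice(message))
```

Under these assumptions, a message like the one in this post, where every claim rests on private state, lands in the "skip" branch. A mixed message is still worth reviewing, as long as flags on the private claims are read as "could not verify" and not as "wrong".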

Inside the boundary, Critique catches real errors. Outside it, Critique produces high-confidence false positives. Knowing the boundary is the difference between a tool you trust and one you fight.

Verbatim runs this kind of review on your actual AI responses, in place, as you work. Use it where it sharpens. Skip it where it can't see.