I trusted it once without checking. It cost me two hours of public embarrassment.

I write technical articles on the side. Mostly about Laravel, AWS and the kind of infrastructure mistakes developers make quietly and fix without telling anyone. To write well about those things, I need accurate numbers. Pricing figures. Version release dates. Exact error messages. Details that are either right or wrong with no room in between.
Before AI tools, research for a single article took me a few hours. Five browser tabs open, cross-referencing AWS pricing pages with community threads with release notes, trying to piece together something accurate enough to publish without a correction in the comments a week later.
Now it takes twenty minutes. I ask, I get an answer, I verify the important parts and I move on. The workflow is genuinely faster and I will not pretend otherwise.
But a while back I skipped the verification step on one article. I was in a hurry and the answer came back confident and detailed. It had the right tone. It had numbers. It cited what sounded like a reasonable understanding of how the service worked.
It was wrong. Specifically, it told me that a particular AWS service cost a fixed monthly amount per availability zone. That was the old pricing model. The service had changed. I published the figure anyway. The correction came within forty-eight hours, from a reader who worked at AWS.
That was the last time I published a number I had not checked against the primary source.
The Problem With Confident Answers
The thing about AI tools is that confidence is part of the product. The answer does not hedge unless you push it to. It does not say “I think this was true in 2023 but you should probably check.” It says the number. It states the fact. It sounds like someone who knows.
This is useful most of the time. The confidence means you are not wading through a paragraph of qualifications just to get to the part that answers your question. But it means the responsibility for knowing when to trust it sits entirely with you.
The gap is not between right and wrong. It is between a confident answer and a verified one. Those are different things and mixing them up is where the problems start.
How I Judge AI Research Now
After the AWS incident I built a simple judgment process that runs every time I use AI output in something I publish. It takes a few minutes and has saved me from three more public corrections that I know of.
Step 1. Identify the claim type.
Not every claim carries the same risk. I sort what the AI told me into two buckets.
The first bucket is stable facts. How HTTP caching works. What a database index does. The difference between synchronous and asynchronous processing. These change slowly, if at all, and are documented across hundreds of sources. I trust these without a separate check.
The second bucket is volatile facts. Pricing numbers. Version release dates. Default behaviours introduced or changed in a recent update. API endpoints. Anything that a product team could change between last year and today. I verify everything in this bucket against the primary source before publishing.
Step 2. Ask how the AI would know this.
For any volatile claim, I ask myself: where would this information come from? If the answer is “a pricing page that gets updated” or “release notes from a recent version” then I go find that page myself. If the answer is “conceptual knowledge that has been stable for years” I usually move on.
Step 3. Check the primary source directly.
Not a blog post that references the primary source. Not a Stack Overflow answer that quotes it. The actual documentation page, pricing calculator or changelog. This takes two minutes. If the numbers match, I am confident. If they do not match, I rewrite before publishing.
Step 4. Note the date.
Anything that has a number attached to it gets a mental timestamp. AWS pricing, package versions, API rate limits. When I read something from a search result or an AI answer that is undated, I treat it as suspect until I find a dated primary source.
That is the whole process. Four steps, a few minutes. It is not complicated but it requires actually doing it every time, which is the part that is easy to skip when you are in a hurry.
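To make the shape of steps 1 and 2 concrete, here is a hypothetical sketch of the triage as code. This is not a tool I run; the keyword patterns are illustrative assumptions, and real judgment stays manual.

```python
import re

# Illustrative markers of "volatile" claims: anything a product team
# could change between last year and today. These patterns are
# assumptions for the sketch, not an exhaustive list.
VOLATILE_MARKERS = [
    r"\$\d",                          # dollar amounts
    r"\bv?\d+\.\d+",                  # version numbers like 2.4 or v10.3
    r"\bper (month|hour|GB|request)\b",
    r"\b(rate limit|pricing|default)\b",
]

def classify_claim(claim: str) -> str:
    """Return 'volatile' if the claim mentions an obvious marker,
    otherwise 'stable'. Note this simple sketch defaults unmatched
    claims to stable; rule 5 of the skill below is stricter and
    treats anything uncertain as volatile."""
    for pattern in VOLATILE_MARKERS:
        if re.search(pattern, claim, re.IGNORECASE):
            return "volatile"
    return "stable"

claims = [
    "An index speeds up lookups at the cost of slower writes.",
    "The service costs $0.10 per hour per availability zone.",
]
for claim in claims:
    print(classify_claim(claim), "-", claim)
```

Only the second claim gets flagged, which is exactly the one that burned me.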
The Fact-Check Skill I Put in Every Prompt
After doing this manually for a few months I built it into how I prompt. I have a reusable block of instructions that I include whenever I am asking AI to help with research for something I will publish. I call it my fact-check skill.
Here is how it reads:
FACT-CHECK SKILL
When providing factual claims in your response, apply the following rules:
1. Clearly separate stable facts from volatile facts.
- Stable: Concepts, architecture patterns, general behaviour that has not changed in years.
- Volatile: Prices, version numbers, rate limits, default values, recent feature changes.
2. For every volatile fact, state your confidence level and the likely source:
- HIGH: Directly documented in stable official docs, unlikely to have changed.
- MEDIUM: From official docs but in a section that changes with releases.
- LOW: From my training data, likely accurate but should be verified against live source.
3. For LOW-confidence volatile facts, append a verification note:
- "Verify against: [specific page or documentation section]"
4. Never present a volatile fact without a confidence label.
5. If you are uncertain whether a fact is stable or volatile, treat it as volatile.

I paste this block at the top of my prompt before the actual question. It changes how the response is structured. Instead of getting a paragraph of confident statements I cannot easily audit, I get facts labelled by their risk level with verification pointers attached.
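Mechanically, attaching the skill is nothing more than prepending a stored block of text. A minimal sketch, assuming the skill lives in a constant (the names here are my own, hypothetical):

```python
# The reusable skill text, stored once. Abbreviated here; the full
# five rules from the block above go in the string.
FACT_CHECK_SKILL = (
    "FACT-CHECK SKILL\n"
    "When providing factual claims in your response, "
    "apply the following rules:\n"
    "...\n"  # rules 1-5 as written above
)

def build_prompt(question: str) -> str:
    """Prepend the skill, separated by a divider, so the model
    reads the rules before the actual research question."""
    return FACT_CHECK_SKILL + "\n---\n" + question

print(build_prompt("What does this service cost per availability zone?"))
```

The divider matters less than the ordering: the constraints come first, the question second.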
How I Built It
The skill started as a single line: “tell me which facts I should verify before publishing.” That helped a little but the output was vague. It would say things like “double-check the pricing” without telling me where to look or how confident it was in the first place.
I iterated on it over about ten prompting sessions. Each time I noticed a gap I added a rule. The confidence levels came from realising that “should verify” was not specific enough. I needed to know whether something was probably fine or definitely risky. The verification note came from getting tired of knowing I needed to check something but not knowing where to start. The rule about treating uncertain items as volatile came from a session where the AI hedged on whether a default value had changed, and I had to dig for twenty minutes before finding the answer.
The current version took four or five iterations to reach. Each iteration fixed one failure mode from the previous session.
It is not a sophisticated piece of engineering. It is just a set of constraints that make the output easier to audit. The harder part was noticing which constraints were missing.
A Real Use Case
Here is a recent example. I had a system architecture document attached — a PDF covering an existing monolith that needed to be broken into microservices. I wanted to research best practices for service boundaries, database distribution patterns and running cost implications before forming my own opinion on the design.
The attachment is important context. It changes what “volatile” means. A general statement about microservices patterns is stable. A claim about what a specific database engine costs to run at a given replication factor is volatile. The fact-check skill needs to apply to both.
My prompt looked like this:
FACT-CHECK SKILL
When providing factual claims in your response, apply the following rules:
1. Clearly separate stable facts from volatile facts.
- Stable: Concepts, architecture patterns, general behaviour that has not changed in years.
- Volatile: Prices, version numbers, rate limits, default values, recent feature changes.
2. For every volatile fact, state your confidence level and the likely source:
- HIGH: Directly documented in stable official docs, unlikely to have changed.
- MEDIUM: From official docs but in a section that changes with releases.
- LOW: From my training data, likely accurate but should be verified against live source.
3. For LOW-confidence volatile facts, append a verification note:
- "Verify against: [specific page or documentation section]"
4. Never present a volatile fact without a confidence label.
5. If you are uncertain whether a fact is stable or volatile, treat it as volatile.
---
[attachment: architecture-doc.pdf]
Based on the attached system architecture document, research the following:
1. Best practices for defining service boundaries in this kind of monolith-to-microservices split
2. Database distribution patterns suitable for the data relationships described in the document
3. Running cost implications: what gets more expensive when you distribute a single database across services, and what concrete cost drivers should I verify before committing to a pattern
Apply the fact-check skill to every claim. Flag anything derived from the attached document separately from general knowledge.

The response came back in three sections matching the three questions. The pattern claims came back marked HIGH — things like bounded context principles and the database-per-service pattern are well-documented and stable. The cost claims came back MEDIUM and LOW with verification pointers attached. It looked roughly like this:
- Bounded context as the primary boundary heuristic: [HIGH] Well-established in domain-driven design literature, stable across sources.
- Database-per-service pattern to avoid shared schema coupling: [HIGH] Standard recommendation, consistent across major architecture references.
- Cross-service query cost via API calls vs direct joins: [MEDIUM] Pattern is stable but latency and cost figures vary by infrastructure. Verify against your cloud provider’s inter-service network pricing.
- Managed database instance cost per service (vs single shared instance): [LOW] Highly dependent on engine, region and instance tier. Verify against: cloud provider pricing pages for your chosen database engine. Current figures from my training data may be outdated.
- Read replica cost for distributed read-heavy services: [LOW] Pricing changes with provider updates. Verify against live pricing before using in any cost estimate.
The split between HIGH and LOW told me exactly where to spend my verification time. The conceptual sections I read and moved on. The cost sections I opened three separate pricing pages for and checked line by line.
Without the skill, the response would have looked uniform. Confident statements across the board with no signal about which ones I should trust and which ones I should open a browser tab for. I would have either verified everything (slow) or verified nothing (risky).
What the skill bought me was triage. Not the facts themselves but a clear list of which facts were safe to use and which ones needed a second source.
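The triage itself could even be mechanised. As a hypothetical sketch, assuming the bracketed label format shown above (the helper name is my own invention), pulling the items that need a browser tab into a checklist might look like:

```python
import re

# Example response lines in the labelled format the skill produces.
response_lines = [
    "- Bounded context as the primary boundary heuristic: [HIGH] Well-established.",
    "- Cross-service query cost via API calls: [MEDIUM] Verify network pricing.",
    "- Managed database instance cost per service: [LOW] Verify against: pricing pages.",
]

def verification_checklist(lines):
    """Collect every claim labelled MEDIUM or LOW, i.e. the ones
    worth checking against a primary source before publishing."""
    checklist = []
    for line in lines:
        match = re.search(r"\[(HIGH|MEDIUM|LOW)\]", line)
        if match and match.group(1) in ("MEDIUM", "LOW"):
            checklist.append(line.lstrip("- ").strip())
    return checklist

for item in verification_checklist(response_lines):
    print(item)
```

The HIGH items drop out, which is the whole point: verification time goes where the risk is.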
What Did Not Change
The research time is shorter. The verification is structured. The output is cleaner.
But the judgment about what matters, what to publish and what to push back on is still mine. The skill does not decide whether a claim is important enough to verify. It does not know that a pricing error in a technical article will get caught by a reader who works at the company. It does not know that getting it wrong in public has a cost that is different from getting it wrong in a private note.
That context comes from experience and it does not transfer into a prompt.
The practical version of this is simple. Use AI for the parts of your work that require time but not expertise. Summarising, structuring, first drafts, background context you need in order to ask better questions. Those are the places where the speed gain is real and the risk is low.
For anything your readers could check, check it yourself first. Not because the answer will definitely be wrong. Because you will not know it is wrong until after you have published.
The research time is gone. The accountability is not. That is the deal and it is worth knowing the terms before you take it.