
Confidence is Cheap. Evidence is Architecture.

David Harvey
May 1

We use large language models ourselves.

That matters, because this is not an argument against LLMs. It is an argument about what they should, and should not, be asked to do.

Large language models are extraordinary. They can explain complexity in language people can actually use. They can synthesise vast context into clear reasoning. They can write, reason, translate, and adapt their tone across almost any domain. Inside SHE ZenAI powered by Omega*, we use LLMs for exactly these things.

That is precisely why the distinction matters.

The question is not whether language models are useful. They are. The question is whether they should be trusted to act as the memory, the evidence ledger, the audit trail, and the judge of truth — all at the same time.

Our answer is no. Not because LLMs are weak — but because strong systems need clear roles.

Voice is not evidence

An LLM is a powerful voice. It is very good at taking context and producing human-readable language. That matters enormously. A system that cannot explain itself clearly will struggle to be used, trusted, or adopted.

But voice is not evidence.

A confident answer is not an audit trail. A fluent explanation is not a verified record. A helpful tone is not proof. A good summary is not the same as memory.

This is why “just make the model more careful” is not the answer. The real question is architectural: what should the LLM be trusted to do, and what should be handled by a different layer?

What happens when the LLM is asked to do everything

When a single model carries the full weight of memory, evidence, reasoning, and narration, several problems become structural rather than occasional.

It may sound certain when the evidence is weak. It may imply continuity where no real memory exists. It may summarise a past decision without knowing whether the action was completed. It may say a risk has been resolved when no outcome has been recorded. It may explain a recommendation without being able to point to the source, receipt, test, or event that supports it.

That is not just a hallucination problem. It is an evidence problem.

For high-trust AI, the issue is not only whether the model can produce a good answer. The issue is whether the system can show what the answer is based on.

NIST’s AI Risk Management Framework describes trustworthy AI through a set of characteristics: valid and reliable, safe, secure and resilient, accountable and transparent, explainable and interpretable, privacy-enhanced, and fair with harmful bias managed. Those are system properties, not writing-style improvements. ISO/IEC 42001, the international standard for AI management systems, takes the same view at the organisational level: responsible AI is not a model question. It is a question of how the whole organisation operates.

The research confirms it at scale

This architectural failure is not theoretical. One recent study has measured the bias directly; another has measured how much decision support people now seek from LLMs.

A March 2026 Harvard Business Review study by researchers from Esade, the University of Sydney, and NYU Stern tested seven leading LLMs across seven strategic trade-offs in thousands of simulations. Every model clustered toward the same buzzword-aligned answers regardless of context. Better prompting moved the bias by less than 2%. Richer context shifted it by only 11%. The researchers named the pattern “trendslop” — AI optimising for the positive emotional valence of words rather than evidence-grounded reasoning.

A September 2025 NBER working paper by economists from Harvard, Duke, and OpenAI analysed a sample of real ChatGPT conversations, at a time when the service had around 700 million weekly users. It found that Making Decisions and Solving Problems was the most common work activity across every occupation group studied, without exception. Nearly half of all messages (49%) were classified as “Asking”: people seeking information or advice to inform decisions. None of those interactions produced a verifiable evidence record.

Read the companion article: Confidence is Cheap. Evidence is Architecture.

Two systems. Two different safety postures.

The difference between an LLM-first and an evidence-first system is not subtle. It changes what the system can say — and stand behind.

An LLM-first system:

Prompt → Generated answer → Confidence by tone

An evidence-first system:

Decision → Evidence → Graph memory → Learning event → Persona guidance

The second pattern is harder to build. It is also much safer to operate.
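To make the contrast concrete, here is a minimal Python sketch of what each pattern returns. The class names (`LLMFirstAnswer`, `Claim`, `EvidenceFirstAnswer`) are hypothetical illustrations, not part of Omega*. The only point is that an evidence-first answer carries references the system can check, so it can list exactly which statements it cannot yet stand behind.

```python
from dataclasses import dataclass, field

# Hypothetical shapes for illustration only; not an Omega* API.

@dataclass
class LLMFirstAnswer:
    # Prompt -> generated answer: the only signal of confidence is the tone of the text.
    text: str

@dataclass
class Claim:
    statement: str
    evidence_refs: list[str] = field(default_factory=list)  # ids of receipts, tests, events

    @property
    def supported(self) -> bool:
        return len(self.evidence_refs) > 0

@dataclass
class EvidenceFirstAnswer:
    # Every statement is paired with the records that back it (possibly none).
    claims: list[Claim]

    def unsupported(self) -> list[str]:
        """Statements the system cannot yet point to a record for."""
        return [c.statement for c in self.claims if not c.supported]

llm_answer = LLMFirstAnswer("Launch readiness looks strong.")
evidence_answer = EvidenceFirstAnswer([
    Claim("Payment proof exists", ["receipt:R-1"]),
    Claim("Repeatability is proven"),  # no record yet
])

print(evidence_answer.unsupported())  # ['Repeatability is proven']
```

The LLM-first answer has nothing to inspect beyond its wording; the evidence-first answer can be audited claim by claim.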

What OmegaSense actually does

OmegaSenseKernel is the deterministic evidence substrate that sits beneath the language layer inside SHE ZenAI powered by Omega*. It can operate without an LLM call.

From structured system facts, it can read a Decision Thread, read a payment or receipt record, create evidence nodes, connect evidence to a decision, detect known gaps and tensions, record learning events, persist a snapshot, and prepare a report for Maya, the SHE ZenAI voice persona, to render.
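Part of that sequence (evidence nodes, linking, gap detection, learning events) can be sketched as a small deterministic graph builder. This is an illustration under stated assumptions, not the OmegaSenseKernel implementation: the record fields (`thread_id`, `receipt_id`, `category`), the `REQUIRED_EVIDENCE` checklist used for gap detection, and every function name are invented for the example, and tension detection is left out.

```python
from dataclasses import dataclass, field

# Hypothetical sketch, not the OmegaSenseKernel API.
# Assumed gap checklist: evidence categories a decision needs before it counts as proven.
REQUIRED_EVIDENCE = ("payment", "repeatability")

@dataclass
class EvidenceGraph:
    nodes: dict[str, dict] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (source, relation, target)
    learning_events: list[str] = field(default_factory=list)

    def add_node(self, node_id: str, kind: str, **attrs) -> None:
        self.nodes[node_id] = {"kind": kind, **attrs}

    def link(self, src: str, relation: str, dst: str) -> None:
        self.edges.append((src, relation, dst))

def ingest(thread: dict, receipts: list[dict]) -> EvidenceGraph:
    """Build evidence nodes from receipts and connect them to the decision."""
    g = EvidenceGraph()
    decision_id = f"decision:{thread['thread_id']}"
    g.add_node(decision_id, "decision", title=thread["title"])
    for r in receipts:
        if r.get("thread_id") != thread["thread_id"]:
            continue  # only link records that reference this decision thread
        ev_id = f"evidence:{r['receipt_id']}"
        g.add_node(ev_id, "evidence", category=r["category"], source="receipt")
        g.link(ev_id, "supports", decision_id)
        g.learning_events.append(f"linked {ev_id} to {decision_id}")
    return g

def gaps(g: EvidenceGraph, decision_id: str) -> list[str]:
    """Required evidence categories with no supporting edge into the decision."""
    covered = {
        g.nodes[src]["category"]
        for src, rel, dst in g.edges
        if rel == "supports" and dst == decision_id
    }
    return [c for c in REQUIRED_EVIDENCE if c not in covered]

thread = {"thread_id": "T-42", "title": "Launch pilot"}
receipts = [
    {"receipt_id": "R-1", "thread_id": "T-42", "category": "payment"},
    {"receipt_id": "R-2", "thread_id": "T-99", "category": "repeatability"},
]
g = ingest(thread, receipts)
print(gaps(g, "decision:T-42"))  # ['repeatability']
```

R-2 references a different thread, so it is never linked, and repeatability surfaces as a gap instead of being assumed.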

No model required. Same input, same output. Fully testable. Fully auditable.
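“Same input, same output” is something you can test directly. One common approach, assumed here, not taken from Omega*, is to serialise the substrate state canonically (sorted keys, fixed separators) and hash it, so two runs over the same facts produce the same digest regardless of key order. The `snapshot` function and state fields are hypothetical.

```python
import hashlib
import json

def snapshot(state: dict) -> tuple[str, str]:
    """Serialise state canonically and return (payload, SHA-256 hex digest)."""
    # sort_keys plus fixed separators make the text independent of dict insertion order.
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return payload, digest

state_a = {"decision": "T-42", "evidence": ["R-1"], "gaps": ["repeatability"]}
state_b = {"gaps": ["repeatability"], "evidence": ["R-1"], "decision": "T-42"}  # same facts, new order
state_c = {"decision": "T-42", "evidence": ["R-1"], "gaps": []}  # a gap was closed

_, h_a = snapshot(state_a)
_, h_b = snapshot(state_b)
_, h_c = snapshot(state_c)
print(h_a == h_b, h_a == h_c)  # True False
```

List order still matters to the hash, so a real substrate would also need a stable ordering for its collections before snapshotting.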

That is a very different safety posture from asking a model to “remember” what happened.

Five problems the substrate solves

1 — Hallucinated continuity

A model can imply that something was previously proven because it sounds consistent with the conversation. A substrate can check whether a record exists. Did the receipt exist? Was it linked? Was the outcome recorded? Was a snapshot persisted? That is the difference between remembered language and recorded evidence.
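The four questions above map naturally onto record lookups. A minimal sketch, with a hypothetical `Records` store standing in for whatever persistence the substrate actually uses:

```python
from dataclasses import dataclass, field

# Hypothetical record store; a real substrate would query its own persistence layer.
@dataclass
class Records:
    receipts: set[str] = field(default_factory=set)
    links: set[tuple[str, str]] = field(default_factory=set)  # (receipt_id, decision_id)
    outcomes: dict[str, str] = field(default_factory=dict)    # decision_id -> outcome
    snapshots: set[str] = field(default_factory=set)          # decisions with a persisted snapshot

def continuity_check(rec: Records, decision_id: str, receipt_id: str) -> dict[str, bool]:
    """Answer each continuity question from records, never from conversation history."""
    return {
        "receipt_exists": receipt_id in rec.receipts,
        "receipt_linked": (receipt_id, decision_id) in rec.links,
        "outcome_recorded": decision_id in rec.outcomes,
        "snapshot_persisted": decision_id in rec.snapshots,
    }

rec = Records(receipts={"R-1"}, links={("R-1", "T-42")}, snapshots={"T-42"})
result = continuity_check(rec, "T-42", "R-1")
print(result)
print("previously proven:", all(result.values()))  # previously proven: False
```

Because no outcome was recorded, the system cannot say the decision was previously proven, however consistent that might sound in conversation.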

2 — Fake certainty

LLMs can sound confident even when the proof is weak. OmegaSense separates confidence from evidence. Instead of “Launch readiness looks strong,” the system says: “Payment proof: evidenced. Repeatability proof: still a gap. Learning events recorded: 11. Snapshot persisted: yes.” That is a much more useful kind of confidence.
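A report like that can be rendered straight from facts, with no confidence score anywhere. The sketch below assumes a simple facts dictionary; the field names and the `render_status` function are hypothetical, and the learning-event count is the example figure from the text.

```python
def render_status(facts: dict) -> str:
    """Render one line per recorded fact; there is no overall confidence score to inflate."""
    parts = []
    for name, evidenced in facts["proofs"].items():
        status = "evidenced" if evidenced else "still a gap"
        parts.append(f"{name.capitalize()} proof: {status}.")
    parts.append(f"Learning events recorded: {facts['learning_events']}.")
    parts.append(f"Snapshot persisted: {'yes' if facts['snapshot_persisted'] else 'no'}.")
    return " ".join(parts)

# Hypothetical facts, as a substrate might supply them.
facts = {
    "proofs": {"payment": True, "repeatability": False},
    "learning_events": 11,
    "snapshot_persisted": True,
}
print(render_status(facts))
# Payment proof: evidenced. Repeatability proof: still a gap. Learning events recorded: 11. Snapshot persisted: yes.
```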

3 — Missing audit trail

A trustworthy system should not need to imagine its own audit trail. It should have one. The substrate records the nodes, edges, evidence references, learning events, and snapshots that explain why a report says what it says — for governance, due diligence, clinical workflows, financial decision support, and any environment where “because the model said so” is not good enough.
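Answering “why does the report say this?” then becomes a graph walk. The breadth-first trace below runs over a hypothetical edge list and returns every upstream edge, ending at root records such as receipts, learning events, and snapshots. Node names, relations, and the `trace` function are invented for illustration.

```python
from collections import deque

# Hypothetical edges (source, relation, target): evidence flows toward the report.
EDGES = [
    ("receipt:R-1", "evidences", "node:payment-proof"),
    ("event:L-7", "recorded", "node:payment-proof"),
    ("node:payment-proof", "supports", "report:launch-readiness"),
    ("snapshot:S-3", "persists", "report:launch-readiness"),
]

def trace(target: str, edges: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Return every edge upstream of `target`, nearest first (breadth-first)."""
    incoming: dict[str, list[tuple[str, str, str]]] = {}
    for edge in edges:
        incoming.setdefault(edge[2], []).append(edge)
    seen, queue, path = {target}, deque([target]), []
    while queue:
        node = queue.popleft()
        for edge in incoming.get(node, []):
            path.append(edge)
            if edge[0] not in seen:  # guard against cycles and shared ancestors
                seen.add(edge[0])
                queue.append(edge[0])
    return path

upstream = trace("report:launch-readiness", EDGES)
for src, rel, dst in upstream:
    print(f"{src} --{rel}--> {dst}")

# Root records: sources in the trace that nothing else points to.
roots = sorted({s for s, _, _ in upstream} - {d for _, _, d in upstream})
print(roots)  # ['event:L-7', 'receipt:R-1', 'snapshot:S-3']
```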

4 — Persona drift

Maya — the SHE ZenAI voice persona — can be warm, clear, and useful. But Maya should not be the source of truth. If the persona layer is allowed to freely invent continuity, the system becomes charming but unsafe. If Maya renders from substrate facts, she can be helpful without drifting beyond the evidence. The persona becomes more trustworthy, not less human.
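One way to keep a persona from drifting, sketched here as an assumption, not a description of how Maya is built, is to let it phrase only the facts the substrate hands it: open gaps are stated as gaps, and topics with no record are reported as unrecorded. The `TEMPLATES` table and `render_persona` function are hypothetical.

```python
# Hypothetical phrasing table; the persona may word facts warmly but cannot add new ones.
TEMPLATES = {
    "payment": "Your payment has been received and linked to this decision.",
    "repeatability": "We have shown this result can be repeated.",
}

def render_persona(substrate_facts: dict[str, bool], requested: list[str]) -> list[str]:
    """Phrase only what the substrate recorded; state open gaps and missing records plainly."""
    lines = []
    for topic in requested:
        if topic not in substrate_facts:
            # Nothing recorded: say so instead of inventing continuity.
            lines.append(f"I have no record of {topic} yet.")
        elif substrate_facts[topic]:
            lines.append(TEMPLATES.get(topic, f"{topic.capitalize()} is evidenced."))
        else:
            lines.append(f"{topic.capitalize()} is still an open gap, so I can't confirm it yet.")
    return lines

facts = {"payment": True, "repeatability": False}
for line in render_persona(facts, ["payment", "repeatability", "outcome"]):
    print(line)
```

The warmth lives in the templates; the truth of each line lives in the substrate.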

5 — Compute waste

Not every task needs a large model. If a deterministic layer can check whether a receipt exists, whether an outcome has been recorded, or whether a known gap is present — using a large model for that task is unnecessary overhead. At 2.5 billion ChatGPT messages per day and growing, the energy cost of that overhead is no longer abstract.
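A router that answers record-lookup questions deterministically and reserves the model for language work might look like this. The record store, task names, and `call_llm` stub are all hypothetical; a real system would replace the stub with an actual model call.

```python
from typing import Callable

# Hypothetical record store and task names, for illustration only.
RECORDS = {
    "receipts": {"R-1"},
    "outcomes": {"T-42": "shipped"},
    "gaps": {"T-42": ["repeatability"]},
}

# Lookups a deterministic layer can answer exactly, with no model call.
DETERMINISTIC_CHECKS: dict[str, Callable[[str], object]] = {
    "receipt_exists": lambda key: key in RECORDS["receipts"],
    "outcome_recorded": lambda key: key in RECORDS["outcomes"],
    "known_gaps": lambda key: RECORDS["gaps"].get(key, []),
}

def call_llm(task: str, key: str) -> str:
    """Stub standing in for a real model call; only language work should reach it."""
    return f"[model handles {task!r} for {key!r}]"

def route(task: str, key: str) -> tuple[str, object]:
    check = DETERMINISTIC_CHECKS.get(task)
    if check is not None:
        return "deterministic", check(key)
    return "llm", call_llm(task, key)

print(route("receipt_exists", "R-1"))        # ('deterministic', True)
print(route("known_gaps", "T-42"))           # ('deterministic', ['repeatability'])
print(route("explain_to_user", "T-42")[0])   # llm
```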

AI without costing the Earth

The International Energy Agency estimates that data centres consumed approximately 415 TWh of electricity in 2024 — around 1.5% of global electricity consumption — with demand growing significantly faster than overall electricity use.

Efficient AI architecture is not only about cost. It is about using the right tool for the right job. Use deterministic systems where they are stronger. Use LLMs where language, synthesis, and reasoning support are genuinely needed.

That is one route toward AI without costing the Earth.

The architecture that matters

Inside SHE ZenAI powered by Omega*, each layer has a defined role:

The LLM becomes the voice.

OmegaSense becomes the evidence-bound memory.

Omega* becomes the decision system.

Ask Omega* becomes the user interface.

Maya becomes the persona that explains what the evidence shows.

The voice should not be the only memory. The memory should not invent evidence. The persona should not decide what is true. The decision system should not rely on style as proof.

Figure: Ask Omega* Decision Creation Flow, an exploded microservices architecture showing the 8-step CTO-level system design. The architecture in production: Intent → Context → Reasoning → Decision → Evidence. Every step deterministic. Every output auditable.

The real breakthrough

The real breakthrough is not that an AI system can answer. Many systems can answer.

The breakthrough is that the system can begin to say:

Here is what was decided. Here is the evidence. Here is what remains unresolved. Here is what changed. Here is what was learned. Here is what Maya can safely explain.

That is the beginning of adaptive intelligence. Not because the model sounds more confident. Because the system has evidence-bound memory.

The next generation of AI systems will not be judged only by how impressive their answers sound. They will be judged by whether they can prove what they know, remember what happened, learn from outcomes, and explain themselves without inventing the audit trail.

Confidence is cheap. Evidence is architecture.

Join the founding cohort

Ask Omega* is now open to a founding cohort of 100. We are looking for practitioners — clinicians, CTOs, analysts, and founders — who already know that confident AI is not the same as trustworthy AI. Five decisions. Five days. Under US$25 to find out if the architecture holds.

To register your interest, click Learn More at the top of this page. Fill in your details and we will be in touch within 24 hours.

Figure: Ask Omega* interface. Tap the Orb. Speak your question. For clarity, certainty, and comfort in your decisions. Evidence-bound intelligence, ready when you are.

No lock-in. No performance. Just evidence.

References

[1] Romasanta, A., Thomas, L.D.W., & Levina, N. (2026). Researchers Asked LLMs for Strategic Advice. They Got “Trendslop” in Return. Harvard Business Review, March 2026.

[2] Chatterji, A., Cunningham, T., Deming, D.J., et al. (2025). How People Use ChatGPT. NBER Working Paper No. 34255, September 2025.

[3] National Institute of Standards and Technology (2023). AI Risk Management Framework (AI RMF 1.0). NIST, January 2023.

[4] International Energy Agency (2025). Electricity 2025: Analysis and Forecast to 2027. IEA, 2025.

Omega* Sensing is part of the Omega* Unified Ecosystem, developed by Design By Zen, an NZ-based AI Lab. Omega* is the algorithmic engine beneath the ecosystem. SHE ZenAI is the brand of a governed clinical intelligence framework designed for high-trust domains where evidence, not confidence, is the currency of care. Version 1.0, April 2026.
