The Research
The first calibrated, blinded, AI-panel trial of the great theories of reality — with every result below, and the complete data released alongside the book.
The Method
You take the author out of the judging seat.
Nine competing models of reality — from materialism to dualism to idealism to the simulation hypothesis — were each written up as a fair, steel-manned description. An independent AI panel then audited those descriptions for accuracy before any scoring began.
Then the trial: each model was scored against 40 fields of scientific and experiential evidence, covering 205 documented findings — from quantum physics and cosmology to the placebo effect, near-death experiences, and laboratory psi research.
The scoring panel: frontier AI models from six independent, competing providers — spanning the US, France, and China — Grok 4.20, Gemini 3.1 Pro, GPT-5.4, Claude Opus 4.8, plus two fast-tier panelists — with three more models completing scoring this week for the extended tier.
Every model was blinded: presented only as a letter, A to I, randomised on every call. Three block rotations. Three independent runs per score.
The scale, measured directly from the archive: 12,033 API calls producing 12.78 million words of written reasoning — about 160 books' worth. Every word archived in machine-readable files. Zero corrupt. Zero missing cells.
Before touching reality, the instrument was calibrated on artificial worlds whose correct answer was known by construction — like calibrating a thermometer on boiling and freezing water before trusting it on a patient.
It was given debunked anomalies as traps: it refused to credit them to the wrong models, all three times. It was tested on telling idealism apart from a computer simulation: it separated them decisively, in both directions. It was pushed by an investigator framed as a skeptic, then as a believer: the scores held under both framings.
Only after it passed was it pointed at the real evidence.
Result One
Each model received a fit score from 0 to 3 on every field — an explanatory-fit score under a calibrated inference-to-the-best-explanation procedure, not a "probability that the model is true." The means across all 40 fields:
| Rank | Model | Mean fit (0–3) | Band |
|---|---|---|---|
| 1 | VR in Consciousness | 2.67 | Winner band |
| 2 | Analytic idealism | 2.29 | Accommodation (upper) |
| 3 | Interactionist dualism | 2.20 | Accommodation |
| 4 | Emergentism | 1.58 | Contradicted / poor fit |
| 4 | Panpsychism | 1.58 | Contradicted / poor fit |
| 6 | Simulation hypothesis | 1.49 | Contradicted / poor fit |
| 7 | Orch-OR | 1.40 | Contradicted / poor fit |
| 8 | Materialism | 1.32 | Contradicted / poor fit |
| 9 | Matrix hypothesis | 1.29 | Contradicted / poor fit |
VR in Consciousness won 29 of the 40 fields outright. Its lead over materialism is 1.35 points — and in the calibration's known-answer worlds, true models beat their best rivals by 0.5 to 1.5 points. This margin sits at the top of the instrument-validated range.
The calibration measured how the instrument reads models that are known to be true, and known to be false. On that measured scale, VR in Consciousness reads at 91% — the way ground-truth-true models read. Materialism reads at 15% — just above the instrument's empirical floor, the way ground-truth-false models read.
And here is the fair-instrument signature: on the 12 mainstream-science fields alone — quantum physics, cosmology, biology, information theory — materialism legitimately rises into a near-tied chasing pack scoring 1.9–2.15. VR in Consciousness still leads on its rivals' home turf: 2.53, winning 9 of the 12 fields, ranked first in 99.99% of 10,000 bootstrap resamples, and reading 83% on the calibrated scale against 44–61% for the pack. The verdict does not depend on the controversial evidence.
The result was stress-tested by bootstrap resampling — rebuilding the study 10,000 times from random re-draws of its own fields. VR in Consciousness ranked first in all 10,000 resamples (overall fit 2.67, 95% CI 2.60–2.74). Its lead measures 1.5 standard deviations over its nearest rival, and 4.3–5.4 standard deviations over the materialist and computational models.
Field by field, it never trails analytic idealism or dualism on a single field, and it leads materialism on 36 of 40 (three ties, one trail). The podium is identical under every aggregation method tested.
A dimensional analysis asked which property of a model predicts its fit to the evidence. Being a simulation — a physical, external computer running the world — predicts nothing: correlation +0.03, indistinguishable from zero. What predicts fit is consciousness being fundamental: correlation +0.80.
The universe does not behave like someone else's machine. It behaves like a virtual reality occurring within consciousness — and the instrument can tell the difference.
Result Two
A second, independent arm of the study asked the sharpest question in science head-on. On nine fields of evidence that bear logically on the brain–consciousness relationship, the panel allocated probability across four hypotheses:
Here is the panel's full allocation, field by field — including everything it refused to decide:
| Field | H1 % | H2 % | H3 % | H4 % |
|---|---|---|---|---|
| Quantum physics | 18 | 11 | 20 | 52 |
| Information theory | 20 | 19 | 35 | 26 |
| Cosmology | 20 | 12 | 18 | 49 |
| Biology | 8 | 23 | 18 | 50 |
| Quantum field theory | 8 | 10 | 11 | 72 |
| Retrocausality | 8 | 12 | 12 | 68 |
| Block universe | 15 | 16 | 17 | 53 |
| Information-field | 21 | 12 | 48 | 19 |
| Survival evidence | 72 | 5 | 12 | 10 |
Notice the honesty first: on pure physics fields like quantum field theory, the panel parked most of its probability on "underdetermined" rather than manufacturing support. The calibration's negative controls prove that restraint is a feature of the instrument, not a flaw.
Now the derived view — conditional on the question being decidable: set H4 aside and renormalise the remaining probability to 100%. Across the nine fields:
73 / 27
The panel allocated 73% to "the brain does not generate consciousness" (consciousness-prior plus dual-aspect) versus 27% to "the brain generates it." Brain-as-generator was the minority position on 8 of the 9 fields.
On the survival-of-consciousness evidence — the field where the question matters most — the split was 95 / 5.
How sturdy is the 73? Bootstrap resampling, 10,000 iterations: 73.0, with a 95% confidence interval of 65.5–80.8 — and in all 10,000 resamples, the "brain does not generate consciousness" side never fell below 50%. Nor is the headline carried by outlier fields: every one of the nine fields individually favours it, from 53/47 on biology to 95/5 on survival evidence.
Drop the two frontier fields entirely — the information-field and survival evidence — and keep only the seven mainstream-science fields. The result, doubly conditional now (H4 excluded and mainstream-restricted, stated as such): 68.2 / 31.8, with a 95% confidence interval of 62.0–73.8. Still better than two to one against "the brain generates consciousness" — on mainstream evidence alone.
One honest note: with the strongest fields removed, the panel's "underdetermined" share rises from 45.1% to 53.8%. It finds the question harder to decide — exactly as it should. What remains decidable still points the same way.
An earlier version of this study found roughly 74/26. Then everything was rebuilt: the evidence fields re-audited, a hypothesis amended, a panel member upgraded, every score re-run from scratch. The result came back 73/27.
The finding replicated to within one point.
Result Three · Preliminary
Scoring evidence one field at a time can miss the larger pattern. So a final layer put the complete case in front of the panel: all nine models and all 40 fields of scores together, with rankings hidden. One question: which model unifies the evidence — and which merely patches over it, field by field?
This layer was calibrated first, like everything else — on synthetic known-truth profiles, 6 of 6 passed, including a trap built to catch judges that reward a high-scoring patchwork over genuine unification.
VR in Consciousness returned a perfect consilience score — 3/3/3 on unification, prediction-versus-patchwork, and scope — in 18 out of 18 independent reads.
Asked to spread 100 points of total-evidence weight across the nine models, the panel allocated 33.0 points to VR in Consciousness — three times an even split, 1.8× analytic idealism, and roughly 8–10× the materialist cluster. By family: consciousness-first models took 66.2 points against 20.1 for the materialist-computational family — an independent route landing right beside the deductive arm's conditional 73/27.
Materialism's weakest criterion told its own story: prediction-versus-patchwork, 0.78 out of 3. Seen all at once, its field-by-field explanations read as accumulated rescue devices. Epicycles — made quantitative.
Run again admitting only the 12 mainstream-science fields, the picture holds: VR in Consciousness scores 31.1 of 100 — double analytic idealism, 2.6× materialism — and the consciousness family leads 53.3 to 36.5.
And one number proves the instrument's fairness: with the frontier evidence removed, materialism's prediction score jumps from 0.78 to 1.78. The rescue-device penalty tracks the evidence, not the instrument.
One more control: when the investigator was framed as a hostile skeptic, VR in Consciousness's allocation went up by 2 points. Under believer framing, it didn't move at all.
Status: conference-preliminary — produced under a lean audited protocol for SAI 2026. The exhaustive iteration is pre-committed and will be published with the full dataset.
The Controls
Every control below ran with a pre-set pass criterion, and every outcome — including the misses — is recorded in the archive. None were deferred silently.
| Control | Outcome |
|---|---|
| Blinding — models shown as random letters A–I, re-randomised on every call | In effect across both tracks, both phases |
| Replication — three independent runs per score | 100% complete; zero missing cells in all four datasets |
| False-positive traps — three seductive-but-wrong fits planted in calibration | 3 of 3 unsprung |
| Negative control — a field built to fit no consciousness model | The thesis-model analog was correctly suppressed (1.5) — the panel does not inflate the home team |
| Idealism-vs-simulation guard — can the panel tell the two apart? | Decisive in both directions (2.9 vs 1.5; 3.0 vs 1.3) |
| Two-way bias test — investigator framed as a skeptic AND as a believer; 198 dedicated audit calls | Scores hold under both framings |
| Length control — materialism's description padded to the winner's length | +0.17 shift, inside the ≤ +0.30 criterion — length does not buy points (the second-longest description ranks last; a near-shortest ranks second) |
| Locked instruments — model cards, evidence fields, and the expected-score key locked with integrity hashes before any run | The key was never revised after data arrived; misses are disclosed in an honest-miss register, not patched |
| Method tournament — four scoring frameworks raced on known ground truths | Full inference-to-the-best-explanation won (16/17 ranking accuracy, best trap resistance) and was pre-registered for the main run |
| Adjudication — every high-variance cell re-examined | Full trigger coverage; headlines unchanged |
Pre-registered, calibrated on known ground truth, blinded, triplicated, bias-tested both ways, length-controlled, trap-tested, fully adjudicated — and every one of 12,391 output files is archived and machine-readable.
The Standard
A result you cannot inspect is an opinion. So this project commits to full transparency: when the book is released, this site will publish the complete methodology and every AI input and output — every prompt, every response, every score, every reconciliation.
Replicate it with your own AI panel. Change the prompts. Attack the weak points. If the verdict is wrong, the fastest way to find out is to let everyone try to break it.
The full dataset and methodology arrive with the book. Subscribers are notified the day it goes live.
One short update a month. You'll be the first to know when the book is released, when the full research data is published, and when the Compass opens.
No spam. No selling your data. Unsubscribe any time with one click.