The Statistics the CIA Chose Not to Announce

CIA headquarters at Langley, Virginia, home of the program the 1995 AIR review was commissioned to evaluate.

The official story of Project Stargate ends in 1995 with a CIA review that determined remote viewing had not been proven to work. That summary is accurate as far as it goes. It leaves out the finding that complicated the recommendation: a statistical result that both lead reviewers acknowledged but could not agree how to interpret.

Jessica Utts was a statistician at UC Davis. Ray Hyman was a psychologist at the University of Oregon. They were chosen as the review's two primary evaluators because they represented genuinely different orientations: Utts thought the laboratory evidence for psi phenomena was credible, Hyman thought it was not.

The AIR Report: Who Commissioned It and Why

The review was commissioned by Congress through the CIA. The American Institutes for Research (AIR), a behavioral science research organization in Washington, D.C., was contracted to conduct the evaluation. AIR's job was not to study psi phenomena directly. It was to assess whether the existing body of research justified continued government funding. This is an important distinction. The reviewers were not asked "is remote viewing real?" They were asked "has the program produced intelligence value, and is continued investment warranted?"

AIR assembled a team that included the two primary evaluators, Utts and Hyman, along with additional staff who reviewed the operational history of the program. The review examined both the laboratory research conducted at SRI International and the operational record of the program under its various codenames. The Congressional Research Service had also prepared a separate, shorter assessment that reached broadly similar conclusions: the evidence was mixed, the methodology had improved over time, and the question of whether the effect was real remained open.

The structure of the AIR report effectively guaranteed a split conclusion. By choosing one evaluator who had published favorably on psi research and one who had published critically, the commissioning body ensured that the review would present both perspectives. Whether this was intellectual rigor or political cover depends on your reading of the situation. It gave the CIA the ability to cite whichever conclusion supported the decision it was already inclined to make.

What the Numbers Said: Utts's Findings

Utts reviewed the statistical record of the laboratory experiments and found that some subjects had scored between 5 and 15 percentage points above chance across enough trials for the difference to be statistically significant. To non-statisticians, 5 to 15 points sounds like almost nothing. In controlled experimental conditions with randomized targets and blinded evaluation, it is not nothing.

Utts's analysis was built on meta-analytic methods, the same statistical techniques used to evaluate the cumulative evidence for pharmaceutical drugs, educational interventions, and other areas where individual studies show small effects. She examined the body of experiments conducted at SRI and later at Science Applications International Corporation (SAIC), focusing on the studies with the strongest methodological controls. Her findings were specific: the overall hit rate across the best-controlled experiments was approximately 34 percent, where chance would predict 25 percent on a four-choice task. The associated p-values were extremely small, on the order of 10^-12, meaning that results at least this far above chance would be vanishingly unlikely if nothing but chance were operating.
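To make the arithmetic concrete, here is a minimal sketch of a one-sided exact binomial test of a 34 percent hit rate against a 25 percent chance rate. The trial counts are hypothetical and chosen only for illustration; they are not the counts from the SRI or SAIC experiments, and this is not a reconstruction of Utts's meta-analysis. The function name is also an assumption of this sketch.

```python
import math


def binomial_upper_tail(hits: int, trials: int, chance: float) -> float:
    """Return P(X >= hits) for X ~ Binomial(trials, chance).

    Each term is computed in log space so large trial counts do not overflow.
    """
    log_p = math.log(chance)
    log_q = math.log1p(-chance)
    total = 0.0
    for k in range(hits, trials + 1):
        log_pmf = (
            math.lgamma(trials + 1)
            - math.lgamma(k + 1)
            - math.lgamma(trials - k + 1)
            + k * log_p
            + (trials - k) * log_q
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


CHANCE_RATE = 0.25    # four-choice task, as described in the text
OBSERVED_RATE = 0.34  # hit rate quoted in the text

# Hypothetical trial counts, NOT the real experimental totals.
for trials in (100, 1000):
    hits = round(OBSERVED_RATE * trials)
    p_value = binomial_upper_tail(hits, trials, CHANCE_RATE)
    print(f"{hits}/{trials} hits: one-sided p = {p_value:.3g}")
```

The same 34 percent hit rate yields a very different p-value depending on how many trials stand behind it, which is why the size of the pooled dataset matters as much as the hit rate itself.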

The effect size Utts reported was modest but consistent. She compared it to effect sizes in accepted medical research and pointed out that many pharmaceutical treatments approved by the FDA are based on effect sizes smaller than what the remote viewing data showed. Aspirin's effect on heart attack prevention, for instance, was established with an effect size smaller than the one Utts found in the psi data. This comparison was not intended to prove that remote viewing works. It was intended to demonstrate that the statistical standard being applied to psi research was different from the standard applied to other fields, and that the data, judged by the same criteria used everywhere else in science, showed something.

Hyman's Objections

Hyman agreed with Utts on the finding itself: something in the data exceeded chance expectations. Where they diverged was on what that meant. Hyman argued the experiments had methodological flaws significant enough to explain the results. Utts countered that even in the most methodologically rigorous subset of the experiments, the above-chance results held.

Hyman's objections were specific and worth understanding in detail. He raised concerns about randomization procedures, arguing that some of the target selection methods may not have been truly random. He questioned whether the judging process, in which independent evaluators matched viewer descriptions to potential targets, was sufficiently blinded. He noted that in some experimental series, the same subjects appeared repeatedly, raising the possibility that statistical anomalies were being driven by a small number of individuals rather than reflecting a general phenomenon.

Perhaps most significantly, Hyman argued that the improvements in methodology over time made it difficult to pool early and late experiments into a single meta-analysis. He suggested that the strongest results came from the earlier, less well-controlled studies, and that as the experimental controls tightened, the effect sizes shrank. Utts disputed this characterization, pointing to the SAIC experiments from the early 1990s, which she considered the most methodologically rigorous in the entire archive, and which still showed statistically significant above-chance results.

The disagreement between Utts and Hyman was not about whether the data showed an anomaly. Both agreed it did. The disagreement was about whether the anomaly was best explained by a real perceptual phenomenon or by subtle methodological artifacts that had not been fully eliminated. This is a meaningful distinction, and it is the same kind of disagreement that occurs in many areas of science where small effects are being measured in noisy conditions.

What Got Left Out of the Press Release

The CIA's public summary emphasized Hyman's conclusion. Utts's conclusion, that the statistical evidence for an anomalous effect was real and deserved further rigorous investigation, appeared in the same official report and was not emphasized. She was not claiming spies can read minds. She was saying the data did not look like random noise and a serious scientist should want to understand why. Hers was not the only finding left out of the public narrative: the program's most productive viewer, Joseph McMoneagle, had received a Legion of Merit for his contributions, a fact the termination announcement did not address.

"It is clear to this author that anomalous cognition is possible and has been demonstrated. This conclusion is not based on belief, but on conventional scientific standards."

— Jessica Utts, UC Davis, 1995 AIR Review.

That statement appears in the official CIA-commissioned report. Most news coverage of the 1995 termination never mentioned it.

Why the CIA Chose Hyman Over Utts

The CIA's decision to foreground Hyman's skeptical assessment and downplay Utts's statistical conclusions was, in retrospect, as much a political decision as a scientific one. By 1995, the Cold War was over. The Soviet threat that had justified the program for two decades no longer existed. The intelligence community was undergoing budget cuts and reorganization. Stargate had always been a bureaucratic orphan, too strange to champion publicly and too inexpensive to kill for budget reasons alone. What the CIA needed was a defensible reason to end the program, and Hyman's dissent provided one.

There is a second factor worth considering. The CIA's statement emphasized that remote viewing had not been demonstrated to have operational intelligence value, even if a laboratory effect existed. This was Hyman's position: that a small statistical anomaly in a lab setting did not translate into a reliable intelligence tool. The distinction between "statistically real" and "operationally useful" is legitimate, and it gave the CIA a way to acknowledge the data without committing to its implications. You can shut down a program because it does not produce useful intelligence even if the underlying phenomenon is genuine.

The Congressional Research Service assessment, which received even less public attention than Utts's findings, had been somewhat more measured than either the CIA's press statement or the news coverage that followed it. The CRS noted that the experimental evidence was stronger than critics typically acknowledged and weaker than proponents typically claimed, and that the question of whether the effect was real remained scientifically open. This middle-ground position was not useful to anyone with a decision to make, and so it was largely ignored.

Context: These Effect Sizes Compared to Medicine

The comparison between remote viewing effect sizes and accepted medical research deserves more attention than it usually gets. The effect size that Utts reported for the best-controlled remote viewing experiments is in the range of 0.11 to 0.25 (Cohen's d), depending on how the analysis is structured. For context, the effect size of aspirin on heart attack prevention, which is considered a well-established medical finding, is approximately 0.03. That figure is usually quoted as a correlation-type effect size rather than as Cohen's d, so the comparison is one of rough magnitude, not an exact like-for-like. The effect size of many commonly prescribed antidepressants compared to placebo is in the range of 0.15 to 0.30.
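For readers who want to see how a hit rate becomes an effect size, the sketch below uses Cohen's h, a standard measure for the difference between two proportions. Whether this is the metric behind the figures quoted above is an assumption; the inputs are the 34 percent hit rate and 25 percent chance rate mentioned earlier in the article.

```python
import math


def cohens_h(p_observed: float, p_chance: float) -> float:
    """Cohen's h: difference between two proportions on the arcsine scale."""
    return 2 * math.asin(math.sqrt(p_observed)) - 2 * math.asin(math.sqrt(p_chance))


# Rates taken from the article; the choice of Cohen's h is this sketch's assumption.
h = cohens_h(0.34, 0.25)
print(f"Cohen's h for 34% vs 25%: {h:.3f}")
```

Effect-size measures are not interchangeable. A value of h, a value of d, and a correlation r sit on different scales, so any cross-field comparison should state which measure each number uses.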

This does not mean that remote viewing is as well-established as the efficacy of aspirin. The volume of research, the number of independent replications, and the theoretical framework supporting the two findings are not comparable. What it does mean is that the statistical bar being applied to remote viewing research is considerably higher than the bar applied to other fields. A pharmaceutical company presenting the same effect size and p-values to the FDA would have a reasonable case for approval. A psi researcher presenting the same numbers to a psychology journal faces rejection on the grounds that the effect is implausible.

Utts made this argument explicitly and has continued to make it in the decades since the AIR review. Her position is not that the comparison proves remote viewing works. Her position is that the scientific community applies a double standard: effects of similar or smaller magnitude are accepted in other fields without controversy, while effects in psi research are rejected on prior grounds. Whether this double standard is justified depends on how much weight you give to theoretical plausibility versus empirical data, and that is a philosophical question masquerading as a scientific one.

What It Means for Practice

If an anomalous effect exists, even a small and inconsistent one, then tracking your own session accuracy over many trials is a meaningful thing to do. You cannot evaluate a 5% above-chance effect in five sessions. You can see it emerge over time across hundreds. The argument for keeping a record of your practice is the same argument the statistician made: if there is a signal in the noise, more data is how you find it.
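A rough sample-size calculation shows why five sessions cannot settle the question. The sketch below uses the usual normal approximation for a one-sided, one-sample test of a proportion. It assumes a four-choice task with a 25 percent chance rate, reads "5% above chance" as a true hit rate of 30 percent, and uses conventional settings of 5 percent significance and 80 percent power. All of those are assumptions of this example, not figures from the AIR review.

```python
import math
from statistics import NormalDist


def trials_needed(chance: float, true_rate: float,
                  alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of trials for a one-sided test of a proportion.

    Normal-approximation formula for detecting true_rate > chance.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value under chance
    z_power = NormalDist().inv_cdf(power)      # quantile for the desired power
    numerator = (
        z_alpha * math.sqrt(chance * (1 - chance))
        + z_power * math.sqrt(true_rate * (1 - true_rate))
    )
    return math.ceil((numerator / (true_rate - chance)) ** 2)


# Assumed scenario: 25% chance rate, 30% true hit rate.
print(trials_needed(chance=0.25, true_rate=0.30))
```

Under these assumptions the answer comes out in the hundreds of trials, and a smaller true effect pushes the requirement higher still.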
