How to read a GWAS study: effect sizes, p-values, and what they actually mean for you
If you've ever clicked through to a paper from a genetic variant page (on Expressive, SNPedia, ClinVar, or anywhere else), there's a good chance the paper was a GWAS. Genome-wide association studies are the dominant tool for finding common genetic variants that influence common traits and diseases. Here is how they work, from the ground up, so you can read one without taking the headline at face value.
Most coverage of GWAS findings in consumer media is bad. The findings get reported with confident-sounding language ("variant X linked to Y!") that obscures the actual effect sizes, the sample populations, and the certainty of the finding. A reader who wants to evaluate a claim about their own genome needs to look past the headline and check the actual numbers.
This post walks through how to do that. Once you can read a GWAS critically, you can tell the difference between a finding that's robust and one that's preliminary, which is the difference between something worth changing your behavior over and something to file under "interesting research signal."
What a GWAS actually does
A GWAS works like this:
- Recruit a large cohort of people, typically tens of thousands, sometimes millions.
- Genotype each person at hundreds of thousands of SNP positions.
- Measure or record some trait: a disease diagnosis, height, blood pressure, cholesterol levels, whatever.
- For each SNP, test whether one allele is more common in people with the trait (or, for a continuous trait, whether it tracks higher or lower values).
- Report the SNPs whose association with the trait is stronger than chance would predict.
The statistical machine is regression: at each SNP, you fit a model predicting the trait from the allele dose (0, 1, or 2 copies of one allele), adjusted for covariates like age, sex, and ancestry. The output is a coefficient (the effect size) and a p-value.
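To make the regression step concrete, here is a minimal sketch of a single-SNP test for a continuous trait. It is deliberately simplified: it leaves out the covariates (age, sex, ancestry) that real pipelines include, it uses a normal approximation for the p-value, which is reasonable at GWAS-scale sample sizes, and the cohort is simulated. The function name `snp_association` is hypothetical, not part of any real GWAS tool.

```python
import math
import random

def snp_association(dosages, traits):
    """Regress a continuous trait on allele dosage (0, 1, or 2) at one SNP.

    Simplified sketch: no covariates, and the two-sided p-value uses a
    normal approximation to the t-distribution.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(traits) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, traits))
    beta = sxy / sxx  # effect size: trait change per extra copy of the allele
    intercept = mean_y - beta * mean_x
    residual_ss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, traits))
    se = math.sqrt(residual_ss / (n - 2) / sxx)  # standard error of beta
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return beta, se, p_value

# Simulated cohort: allele frequency 0.3, true effect 0.1 trait units per allele.
random.seed(1)
allele_freq, true_beta, n = 0.3, 0.1, 20_000
dosages = [sum(random.random() < allele_freq for _ in range(2)) for _ in range(n)]
traits = [true_beta * d + random.gauss(0, 1) for d in dosages]

beta, se, p = snp_association(dosages, traits)
print(f"beta={beta:.3f}  se={se:.4f}  p={p:.2e}")
```

A real GWAS repeats this test at every SNP, which is exactly why the multiple-testing threshold below matters.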
That's the entire shape; everything else is detail. Most published hits are curated into the NHGRI-EBI GWAS Catalog, the canonical public registry of associations and the place to start if you want to see a finding in context rather than through a press release. The Catalog's FAQ, written by the people who maintain it, is also the cleanest short explainer of its inclusion thresholds and curation process.
What to look at when reading a GWAS hit
When you see a GWAS finding cited, the numbers that matter are:
Sample size. Bigger is more reliable, basically without exception. A GWAS of 200,000 people finding a hit is much more trustworthy than a GWAS of 2,000 people finding a hit. The 2,000-person hit might disappear in a larger study. The 200,000-person hit is much more likely to survive.
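One way to see why sample size matters so much: the standard error of an effect estimate shrinks with the square root of the sample size, so a study 100 times larger has a standard error about 10 times smaller. The sketch below assumes a trait scaled to a standard deviation of 1 and a small effect, in which case the standard error is roughly 1 / sqrt(2 · f · (1 − f) · n) for allele frequency f. The variant (0.03 SD per allele, frequency 0.3) is hypothetical, and the z-values are the expected ones, not results from any real study.

```python
import math

def expected_z(beta_sd, allele_freq, n):
    """Approximate expected z-statistic for an additive SNP effect on a trait with SD 1.

    Assumes a small effect, so residual variance is about 1 and
    SE(beta) is roughly 1 / sqrt(2 * f * (1 - f) * n).
    """
    se = 1 / math.sqrt(2 * allele_freq * (1 - allele_freq) * n)
    return beta_sd / se

def two_sided_p(z):
    """Two-sided p-value from a standard normal z-statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# The same hypothetical variant studied in a small and a large cohort.
for n in (2_000, 200_000):
    z = expected_z(0.03, 0.3, n)
    p = two_sided_p(z)
    print(f"n={n:>7,}  z={z:5.2f}  p={p:.1e}  genome-wide significant: {p < 5e-8}")
```

The same true effect is invisible at 2,000 people and overwhelming at 200,000. The flip side is that a "hit" in a small study has to be a large or lucky estimate to clear the bar at all, which is why small-study hits so often shrink or vanish later.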
P-value. The standard threshold for "genome-wide significant" is p < 5x10^-8. That sounds dramatic, but it's deliberately conservative: GWAS tests so many SNPs at once that even rare coincidences become common. A p-value of 5x10^-8 in a GWAS context is roughly equivalent to a p-value of 0.05 in a single-hypothesis test, after multiple-testing correction. Anything weaker than 5x10^-8 is "suggestive" and shouldn't be treated as confirmed. So when a GWAS result is called statistically significant, that is all it means: it cleared the multiple-testing bar, nothing more.
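Where does 5x10^-8 come from? It is a Bonferroni correction: divide the usual 0.05 error rate by the number of independent tests. The sketch assumes roughly one million effectively independent common-variant tests, the figure conventionally used to justify the threshold; the real number depends on ancestry and on how many variants are tested.

```python
# Bonferroni correction: divide the per-test error rate by the number of tests.
# Assumption: about one million effectively independent common-variant tests.
alpha = 0.05
independent_tests = 1_000_000

genome_wide_threshold = alpha / independent_tests
print(f"genome-wide threshold: {genome_wide_threshold:.0e}")

# Without correction, a million null SNPs tested at p < 0.05 would yield
# about 50,000 false "hits" by chance alone; at the corrected threshold,
# the expected number of false hits drops to about 0.05.
expected_false_hits_uncorrected = alpha * independent_tests
expected_false_hits_corrected = genome_wide_threshold * independent_tests
print(expected_false_hits_uncorrected, expected_false_hits_corrected)
```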
Effect size. This is the number that gets buried, but it matters most: it tells you how much the variant actually shifts the trait. It usually appears as:
- Odds ratio (OR) for binary outcomes (disease yes/no). OR = 1.0 means no effect. OR = 1.05 means "5% higher odds." OR = 2.0 means "twice the odds." OR = 5.0 is "five times the odds."
- Beta coefficient for continuous outcomes (height, BMI, lipid levels). The interpretation depends on the trait's units.
For most common GWAS hits, effect sizes are tiny. An OR of 1.05-1.20 is typical. That sounds non-trivial, but an odds ratio multiplies your odds, not your risk, so the absolute change depends on how common the trait is to begin with. For a trait with a 10% baseline risk, an OR of 1.20 raises it to about 11.8%, roughly 2 extra cases per 100 carriers. For any one person, the absolute risk shift is small.
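Converting an odds ratio into an absolute risk takes two steps: turn the baseline probability into odds, multiply by the OR, then turn the result back into a probability. A minimal sketch, assuming the baseline is the risk for someone without the risk allele (a simplification); the function name is hypothetical.

```python
def risk_with_odds_ratio(baseline_risk, odds_ratio):
    """Convert a baseline probability plus an odds ratio into an absolute probability.

    An odds ratio multiplies the odds p / (1 - p), not the probability itself.
    Assumes baseline_risk is the non-carrier risk, a simplifying assumption.
    """
    baseline_odds = baseline_risk / (1 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)

# The same OR of 1.20 applied to rare, moderately common, and common traits.
for baseline in (0.01, 0.10, 0.40):
    risk = risk_with_odds_ratio(baseline, 1.20)
    extra_per_100 = (risk - baseline) * 100
    print(f"baseline {baseline:.0%} -> {risk:.1%}  ({extra_per_100:.1f} more per 100 carriers)")
```

For a rare trait, an OR of 1.20 adds a fraction of a case per 100 people; the same OR moves a common trait further in absolute terms, but still only by a few points.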
Replication. Has this finding been seen in independent cohorts? Studies that include explicit replication in a second dataset are far more reliable. A finding that's appeared in one study, even a large one, is preliminary until it shows up again.
Population studied. Most published GWAS used to be run entirely in European-ancestry cohorts. This has slowly shifted, but it means that for many published variants, the effect sizes were measured only in Europeans. Effect sizes in other populations can be similar, smaller, or absent. Always check.
How to use this information for yourself
When you look up a variant such as rs9939609 in the FTO gene (the famous "obesity gene" SNP), the standard story is "OR ~1.20 for obesity." That's true. It's also nearly meaningless for an individual.
Here's why. Adult obesity prevalence in some Western populations is roughly 40%. GWAS odds ratios are usually reported per copy of the risk allele, so an OR of 1.20 shifts that baseline to roughly 44% for one copy and roughly 49% for two copies. Compare that to the effect of, say, regularly drinking sugar-sweetened beverages or having an inactive lifestyle: those factors have effect sizes that dwarf rs9939609.
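Because GWAS odds ratios are usually per allele, the effect compounds across genotypes under the common log-additive model: two copies multiply the odds by 1.20 twice, for a total of 1.44. A sketch of the FTO arithmetic, assuming an OR of 1.20 per allele and treating the 40% figure as the risk for people with zero copies (a simplification, since 40% is really a population-wide prevalence). The function name is hypothetical.

```python
def genotype_risks(baseline_risk, per_allele_or):
    """Absolute risk for 0, 1, or 2 risk alleles under a log-additive (multiplicative-odds) model.

    Assumes baseline_risk is the risk for people carrying zero copies.
    """
    baseline_odds = baseline_risk / (1 - baseline_risk)
    risks = {}
    for copies in (0, 1, 2):
        odds = baseline_odds * per_allele_or ** copies  # each copy multiplies the odds again
        risks[copies] = odds / (1 + odds)
    return risks

for copies, risk in genotype_risks(0.40, 1.20).items():
    print(f"{copies} risk allele(s): {risk:.1%}")
```

Even the worst-case genotype moves the probability by under ten percentage points, which is the gap the paragraph above compares against lifestyle factors.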
The honest reading: rs9939609 is real (large studies, multiple replications, established mechanism). It tilts your probability slightly. It is not why you're overweight, if you are. It is not a determining factor. It is one of hundreds of small genetic contributors that, in aggregate, explain a small fraction of population variance in BMI, and a much smaller fraction of any individual person's BMI.
This is the consistent pattern for common-variant GWAS hits. They're real but small. They add up to explain a portion of population-level variation. They almost never determine the outcome for any one person. This aggregation is the motivation behind polygenic risk scores: combine hundreds or thousands of small-effect variants into a single weighted score, and you get a number that tracks population-level risk better than any single SNP does, while still being weak at predicting any one individual's outcome.
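At its core, a polygenic risk score is a weighted sum: for each SNP, multiply the number of effect alleles a person carries by that SNP's GWAS effect size, then add everything up. A minimal sketch; the rsIDs, weights, and genotypes are hypothetical placeholders, and real scoring tools also handle missing genotypes, strand flips, and ancestry-specific calibration, which this skips.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages: sum over SNPs of weight_i * dosage_i.

    dosages: rsID -> number of effect alleles carried (0, 1, or 2)
    weights: rsID -> per-allele effect size (a beta, or a log odds ratio) from a GWAS
    SNPs missing from the person's genotype data are simply skipped here.
    """
    return sum(weights[rsid] * dosages[rsid] for rsid in weights if rsid in dosages)

# Hypothetical weights and genotypes, for illustration only.
weights = {"rsA": 0.08, "rsB": -0.05, "rsC": 0.03}
person = {"rsA": 2, "rsB": 1, "rsC": 0}
print(polygenic_score(person, weights))
```

The raw score only means something relative to a reference population (for example, which percentile it falls in), which is one reason scores built from European-ancestry GWAS can mislead when applied to other ancestries.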
When a GWAS finding is actually clinically meaningful
A small number of common variants have effects large enough to matter clinically. The standard examples:
- APOE ε4 for Alzheimer's: OR of 3-15 depending on heterozygous vs homozygous. Large enough to inform screening timing. Our APOE post walks through it.
- HLA-B*57:01 for abacavir hypersensitivity: large enough that pre-prescription testing is standard. Genotype dictates whether the drug is safe.
- CYP2C19*2 for clopidogrel response: poor metabolizers get less benefit. FDA-labeled.
- Factor V Leiden for venous thromboembolism risk: meaningful for hormonal contraceptive and pregnancy-related VTE risk.
- BRCA1/2 pathogenic variants (technically rare variants, not common GWAS hits): large enough to warrant surveillance and prophylactic interventions.
For nearly everything else (the long tail of GWAS hits for BMI, height, intelligence, personality, sleep duration, blood pressure, and so on), the effect sizes are small, the clinical actionability is low, and any service that confidently tells you to change behavior based on these variants is overinterpreting.
What to skip
When you read a GWAS-based claim, a few patterns are red flags:
- No effect size reported. "Variant X is associated with Y" without an OR or beta means the writer doesn't want you to evaluate the actual size.
- "Significant" without context. A p-value below 5x10^-8 in a GWAS is just the entry criterion. It's not evidence of clinical importance.
- No mention of sample size. Small studies often don't replicate. The sample size should be the first number you check.
- No mention of population. A finding in 4,000 South Asian patients doesn't necessarily transfer to a 70-year-old European-ancestry reader.
- Causal language. Most GWAS hits don't establish causality; they show statistical association. The mechanism (does this variant change protein function, regulatory activity, or splicing?) is usually a separate question.
A trustworthy summary acknowledges all of this. A bad one elides it.
How Expressive surfaces this
When you look at a variant page on Expressive for any well-studied variant, the page surfaces:
- The trait associations, with the strongest cited evidence
- The effect size (odds ratio or beta coefficient where available)
- The sample size of the supporting studies
- The replication status across studies
- Population context where it's known to vary
We do this because we think the right way for a non-specialist to make decisions about their genome is to see the underlying evidence quality, not just the conclusion. If a finding is preliminary, we want to say so. If a finding is robust, we want to say so. The numbers matter.
If you've got your raw genetic file from 23andMe, AncestryDNA, MyHeritage, or anywhere else, you can upload it to Expressive and see what the research actually says about the 600,000+ variants in it, with the evidence quality explicit on every claim. We don't prescribe, we describe: your genome stays yours, and the citations stay visible.
Or you can do it the hard way: look up each variant on dbSNP for the variant record and on PubMed for the underlying papers. Either approach beats taking confident-sounding claims at face value.