How we anchor every recommendation in real research

A consumer genetic-health product can choose between two failure modes.

One: the platform invents recommendations the literature doesn't actually support, because empty output looks bad. The user gets a confident-sounding action plan; the underlying citations don't hold up.

Two: the platform refuses to produce recommendations when the evidence is thin, even though many users would prefer "something to do" over "nothing concrete." The user gets fewer bullets; the bullets they get are real.

We chose the second. Here's what that means in practice, and why evidence-based genomics has to mean more than a logo on a marketing page.

The citation rules

Every recommendation in the Expressive corpus has to anchor on one of four kinds of evidence, all curated upstream:

  1. A PMID: the identifier of a peer-reviewed study indexed in PubMed that appears in the variant's anchored article corpus. The article body has to mention the rsid in prose (not just in a supplementary table), and the article has to have fetch_status='complete' with a real abstract or full text. Methodology-wrap papers that cite the variant only in supplementary tables are filtered out.
  2. A PharmGKB clinical annotation: these are curated by hand at PharmGKB, scored on a four-tier evidence ladder (levels 1 through 4, with 1A the highest), and tied to specific gene-drug interactions. A pharmacogenomic recommendation anchored on a PharmGKB 1A annotation gets strength: high. 1B is moderate. We don't surface level 2 or below (2A, 2B, 3, 4) to users without a clinical context flag.
  3. A GWAS Catalog row: the structured association data that the NHGRI-EBI GWAS Catalog curates from published genome-wide association studies. Each row carries the trait, the p-value, the odds ratio, the sample size, the population, and the source PMID. These rows are primary evidence even when the cited paper's prose doesn't name the variant (which it often doesn't: consortium GWAS papers tend to list variants only in supplementary tables). If you've ever wanted the GWAS Catalog explained in one paragraph, that's it: structured rows tying a risk allele to a trait, with a p-value and an effect size you can actually read.
  4. A ClinVar entry: a submitted clinical variant interpretation with an explicit review status. We surface Pathogenic / Likely Pathogenic entries with expert-panel or multi-submitter review; we ignore Conflicting and Uncertain entries unless there's separate trait-level evidence. ClinVar variant interpretation is the closest thing genomics has to a public ledger of clinical judgement, and we treat it accordingly.

A recommendation that can't anchor on at least one of these doesn't get persisted. The model running our generator (Anthropic's Haiku 4.5) is given the four-kind citation rule inline in the prompt: a recommendation whose citations array is empty is invalid.
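To make the persistence rule concrete, here is a minimal sketch of how such a gate might look. It is not our production code: the dictionary shape and every field name except fetch_status (kind, rsid_in_prose, level, clinical_context, trait, source_pmid, significance, review) are hypothetical, and the Conflicting / Uncertain exception for ClinVar is deliberately left out to keep it short.

```python
# Sketch only: every field name except fetch_status is hypothetical.
PHARMGKB_STRENGTH = {"1A": "high", "1B": "moderate"}
CLINVAR_SIGNIFICANCE = {"Pathogenic", "Likely Pathogenic"}
CLINVAR_REVIEW = {"expert-panel", "multi-submitter"}


def anchors(citation: dict) -> bool:
    """Return True if one citation satisfies the rule for its kind."""
    kind = citation.get("kind")
    if kind == "pmid":
        # Article must be fully fetched and name the rsid in its prose.
        return (citation.get("fetch_status") == "complete"
                and citation.get("rsid_in_prose") is True)
    if kind == "pharmgkb":
        level = citation.get("level", "")
        # 1A/1B anchor directly; lower levels need a clinical context flag.
        return level in PHARMGKB_STRENGTH or (
            bool(level) and citation.get("clinical_context") is True)
    if kind == "gwas":
        # Assumption: a usable row carries at least a trait and a source PMID.
        return bool(citation.get("trait")) and bool(citation.get("source_pmid"))
    if kind == "clinvar":
        return (citation.get("significance") in CLINVAR_SIGNIFICANCE
                and citation.get("review") in CLINVAR_REVIEW)
    return False


def should_persist(recommendation: dict) -> bool:
    """A recommendation is kept only if at least one citation anchors."""
    return any(anchors(c) for c in recommendation.get("citations", []))


def pharmgkb_strength(level: str):
    """Map a PharmGKB level to a strength tier, or None if unmapped."""
    return PHARMGKB_STRENGTH.get(level)
```

Under these assumptions, a recommendation with an empty citations list never passes should_persist, and a PharmGKB 2A annotation passes only when the clinical context flag is set.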

The "refuse to invent" rule

The biggest engineering challenge wasn't getting the model to produce recommendations. It was getting it to not produce recommendations.

Language models are reward-shaped to be helpful. When you hand one a variant and ask for a lifestyle recommendation, the default behavior is to give you one, whether or not the evidence is there. Left unchecked, a Phe508del CFTR variant (which causes cystic fibrosis, a serious Mendelian disease) might come back with "consider increasing leafy green intake."

The prompt rule we built around this is explicit: output zero recommendations rather than invent any. The model is told that the correct output for most variants is the empty array. It's told what counts as "evidence that doesn't anchor a concrete action": Mendelian disease findings, pure population-genetics markers, GWAS hits below genome-wide significance, and structurally curated rows without intervention literature. When it refuses, it has to explain why.
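A rough sketch of how the generator's output could be checked against this rule follows. The JSON shape (recommendations, citations, refusal_reason) is hypothetical, and the 5e-8 cutoff is the conventional genome-wide significance threshold rather than a number taken from our pipeline.

```python
import json

# Conventional genome-wide significance threshold (an assumption here,
# not a value quoted from the pipeline).
GENOME_WIDE_SIGNIFICANCE = 5e-8


def gwas_hit_can_anchor(p_value: float) -> bool:
    """GWAS hits that miss genome-wide significance cannot anchor an action."""
    return p_value < GENOME_WIDE_SIGNIFICANCE


def parse_generator_output(raw: str) -> dict:
    """Accept cited recommendations or an explained refusal, nothing else.

    The keys used here are hypothetical names chosen for illustration.
    """
    out = json.loads(raw)
    recs = out.get("recommendations", [])
    if recs:
        for rec in recs:
            # A recommendation with no citations is invalid.
            if not rec.get("citations"):
                raise ValueError("recommendation has no citations")
        return {"status": "recommended", "recommendations": recs}
    # Zero recommendations is a valid answer, but it must come with a reason.
    reason = (out.get("refusal_reason") or "").strip()
    if not reason:
        raise ValueError("refusal must explain why")
    return {"status": "refused", "reason": reason}
```

The design point is that the empty array is a first-class result: it is parsed, validated, and stored with its reason, not treated as an error to retry until something comes back.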

The result, today: about 14% of variants we've run produce real recommendations; the rest produce reasoned refusals with citations to the catalog row showing why. That ratio is honest. It would be trivially easy to push it higher by relaxing the rules. We've chosen not to.

Conflict between variants

Each recommendation is generated from a single variant. Real human genomes carry millions of variants that interact with each other in non-obvious ways. The classic example: an MTHFR variant (look it up in dbSNP by rsid and you'll see the allele frequencies vary substantially across populations) might recommend "favor methylfolate over folic acid"; a downstream methylation-pathway variant might separately recommend "moderate methylated B-vitamins." Both are anchored. Both are right in isolation. Together they need a human to think.

The consolidation engine that takes a user's genome and produces an action plan looks for these collisions. Every recommendation in the corpus carries a contradicts_keys field, a list of canonical recommendation slugs that this rec actively disagrees with. When two variants point at recommendations with overlapping contradiction keys, the consolidation engine surfaces them as a conflict for the user to resolve, with the underlying citations in hand.

It's not the action plan deciding for you. It's the action plan saying: here are the two anchored views, here's the contradiction, here are the studies. You call it.
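One way to read the collision check is sketched below. It assumes each recommendation carries a slug alongside its contradicts_keys list, and it treats two recommendations as conflicting when either one names the other's slug. Both the slug field and that matching rule are assumptions for illustration, as are the example slugs.

```python
from itertools import combinations


def find_conflicts(recs: list[dict]) -> list[tuple[str, str]]:
    """Return pairs of recommendation slugs that contradict each other.

    Assumption: a pair conflicts when either side lists the other's
    slug in its contradicts_keys.
    """
    conflicts = []
    for a, b in combinations(recs, 2):
        a_names_b = b["slug"] in a.get("contradicts_keys", [])
        b_names_a = a["slug"] in b.get("contradicts_keys", [])
        if a_names_b or b_names_a:
            conflicts.append((a["slug"], b["slug"]))
    return conflicts


# Hypothetical slugs for the methylfolate example.
plan = [
    {"slug": "favor-methylfolate",
     "contradicts_keys": ["moderate-methylated-b-vitamins"]},
    {"slug": "moderate-methylated-b-vitamins", "contradicts_keys": []},
    {"slug": "unrelated-recommendation", "contradicts_keys": []},
]
conflicts = find_conflicts(plan)
```

Note that the function only reports the pairs. It makes no attempt to rank or resolve them, which matches the intent of handing the decision to the user.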

Why this matters for trust

The temptation to ship a slicker product is constant. Every "this needs anchoring" decision pushes the launch date out and reduces the rec count we can show. Every "this is a Mendelian finding, route to physician" decision means we're not the platform that fixed it for you.

But the alternative, a confident-sounding action plan with citations that don't hold up, is the failure mode that ends the company. Genetics is an intimate, high-trust domain. Users will tolerate "we don't have enough evidence to tell you anything yet." Users will not tolerate, in retrospect, having been told to take supplements that didn't help, or avoid foods that were never the problem, or worry about variants that don't matter for them.

We don't prescribe, we describe. Evidence quality is always visible on every recommendation we ship: the citation count, the strength tier, the source ladder. That is what honest genomics looks like in production, and it is the only posture compatible with the anti-grift genetics stance the rest of the consumer market badly needs.

The discipline is the product.

