How Expressive handles your DNA: a technical explanation
This post is written for readers who read privacy policies and want to know how the cryptography actually works, not how the marketing copy describes it. If you're new to the topic, our post on why your DNA is the most sensitive data you own is the right starting point. If you've already read it and want the specifics (which algorithms, what key custody model, which threats are and aren't addressed), this post is for you.
The question "who owns your genetic data" has a different answer at Expressive than at most direct-to-consumer genomics providers, and the difference is enforced by the cryptography below, not by a clause in a privacy policy.
The design principle
The principle we built around: minimize the amount of plaintext genomic data that exists outside the user's direct control.
This translates into several concrete decisions:
- The encryption key for a user's genome is derived from a signature the user produces with a wallet they control.
- The key never lives at rest on our servers in any form.
- Server-side processing reads encrypted records and decrypts them only with key material supplied through an active user session.
- Lookups against the user's records use HMAC-derived identifiers rather than plaintext indices.
- Backups and transit are encrypted with the same key custody.
The result, from a threat-modeling standpoint, is that a compromise of our infrastructure yields ciphertext, not data. A subpoena compelling us to produce records yields ciphertext, and we cannot produce plaintext without the user's cooperation. A breach of our backup store yields nothing actionable. This is what genetic data sovereignty means in operational terms: not a slogan, but a property of the system that survives a hostile audit.
For readers who want the regulatory baseline this sits on top of, the HHS overview of HIPAA individual rights and the NHGRI explainer on GINA, the Genetic Information Nondiscrimination Act, describe the legal floor. A client-side encryption architecture for DNA data is the engineering layer above that floor.
Key derivation
When a user signs up, they connect a wallet (Ethereum-style or similar) and sign a fixed message, "Expressive DNA: derive encryption key v1". Provided the wallet signs deterministically (as RFC 6979 specifies for ECDSA), the resulting signature is the same every time for that wallet, yet unpredictable to anyone without the wallet's private key.
That signature is fed through a key-derivation function (Argon2id, configured for roughly 250 ms of work on the user's machine) to produce a 256-bit encryption key. The key is derived client-side and never goes over the network. The primitives are standard rather than bespoke: AES-256-GCM follows NIST's cryptographic standards and guidelines (FIPS 197 and SP 800-38D), and Argon2id is specified in RFC 9106. Rolling our own primitives is exactly the kind of move that turns encrypted genetic data storage into theatre.
The Argon2id parameters and the message template are versioned. If we ever need to migrate the derivation, users can re-derive under the new parameters with their consent, and old derivations remain valid for unlocking old ciphertext.
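A minimal sketch of the derivation step, in Python, to make the versioning concrete. Two assumptions to flag: Argon2id is not in Python's standard library, so `hashlib.scrypt` (another memory-hard KDF) stands in for it here, and the salt and cost values in the hypothetical `KDF_VERSIONS` table are illustrative, not Expressive's actual parameters.

```python
import hashlib

# Hypothetical versioned parameters. The post specifies Argon2id; scrypt is a
# stand-in only because Argon2id is not in Python's standard library.
KDF_VERSIONS = {
    "v1": {
        "message": "Expressive DNA: derive encryption key v1",
        "salt": b"expressive-dna-kdf-v1",  # illustrative domain-separation salt
        "n": 2**14,  # CPU/memory cost; would be tuned toward the ~250 ms target
        "r": 8,
        "p": 1,
    },
}


def message_to_sign(version: str = "v1") -> str:
    """The fixed message the wallet signs for a given derivation version."""
    return KDF_VERSIONS[version]["message"]


def derive_master_key(signature: bytes, version: str = "v1") -> bytes:
    """Stretch a wallet signature into a 256-bit key. Meant to run client-side only."""
    params = KDF_VERSIONS[version]
    return hashlib.scrypt(
        signature,
        salt=params["salt"],
        n=params["n"],
        r=params["r"],
        p=params["p"],
        dklen=32,  # 32 bytes = 256 bits
    )
```

Migrating would mean adding a `"v2"` entry alongside `"v1"`. Because `"v1"` stays in the table, keys derived under it can still be recomputed from the same signature to unlock older ciphertext.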
What happens at upload
The user uploads their raw genetic file (23andMe, AncestryDNA, MyHeritage, VCF, etc.) through the browser. The upload goes over TLS to our upload endpoint. The file is held in encrypted form on our infrastructure. If you came here looking to interpret 23andMe raw data or upload a DNA file for health reports, this is the path your bytes take.
When the file arrives, our processing pipeline:
- Encrypts the raw bytes immediately with AES-256-GCM, using a per-record content key derived from the user's master key. The per-record key is itself wrapped with the user's master key, so we can encrypt new derivatives without re-asking the user to sign.
- Parses the file (variant by variant) to extract a structured representation: rsID, chromosome, position, genotype.
- Builds an HMAC-indexed lookup table. Each index entry is an HMAC keyed with the user's master key, computed over the variant identifier plus a stable per-record salt. This means we can find a user's data by querying a hash, but the hash can't be linked back to the variant or the user's identity without the key, which is only available inside the user's signed session.
- Computes the report content (the variant-by-variant interpretations, anchored to public references like dbSNP, ClinVar, and the GWAS Catalog) and stores it in encrypted form.
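The parse, key-derivation, and indexing steps above can be sketched with Python's standard library. This is an illustration under stated assumptions: the parser handles only the 23andMe-style tab-separated layout (rsid, chromosome, position, genotype); HKDF (RFC 5869) is our choice for deriving the per-record content key, since the post says that key is derived from the master key without naming the function; and `record_salt` is read as a stable value stored with the uploaded file that the client can also fetch at lookup time. All function names are hypothetical. The AES-256-GCM encryption itself is omitted because AES is not in Python's standard library.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_raw_genotypes(text: str) -> list[Variant]:
    """Parse 23andMe-style lines (rsid, chromosome, position, genotype), skipping '#' comments."""
    variants = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")[:4]
        variants.append(Variant(rsid, chromosome, int(position), genotype.strip()))
    return variants


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF with SHA-256 per RFC 5869: extract a pseudorandom key, then expand it."""
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def content_key(master_key: bytes, record_id: str) -> bytes:
    """Hypothetical per-record 256-bit content key derived from the master key."""
    return hkdf_sha256(master_key, b"content-key:" + record_id.encode())


def blind_index(master_key: bytes, record_salt: bytes, rsid: str) -> str:
    """HMAC-SHA256 lookup identifier; meaningless to anyone without the master key."""
    msg = record_salt + b"|" + rsid.encode()
    return hmac.new(master_key, msg, hashlib.sha256).hexdigest()


def prepare_records(master_key: bytes, record_salt: bytes, variants: list[Variant]) -> dict[str, Variant]:
    """Map each lookup identifier to its variant.

    Each value must be encrypted under content_key(...) before it is stored;
    that AES-256-GCM step is not shown here.
    """
    return {blind_index(master_key, record_salt, v.rsid): v for v in variants}
```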
The original raw bytes are encrypted at rest and never decrypted server-side without an active user session.
What happens at read
When the user signs in and asks for their report:
- The user re-derives their master key client-side from the wallet signature (a one-time action per session, ~250ms).
- The browser establishes a session with the server, exchanging a session token. The master key never leaves the browser.
- For each variant lookup, the client computes the HMAC for the variant ID using the master key, sends the HMAC to the server, receives the encrypted record back, decrypts it client-side, and renders it.
The server side of this flow sees an authenticated session, a sequence of HMACs, and the encrypted bytes it returns. It does not see the user's master key, the variant IDs being looked up (only HMACs of them), or the plaintext content of the records it returns. This is about as close as a production system gets to anonymous DNA analysis while still being able to answer "what does this variant mean for me?"
This is what "we can't decrypt your data server-side" means in practice. The cryptography enforces it.
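The read path can be simulated to make the server's view concrete. In this sketch the hypothetical `RecordServer` stores opaque ciphertext keyed by lookup token and logs every token it is asked for. Decryption is passed in as a callback, because in the browser it would be AES-256-GCM via WebCrypto, which this Python sketch does not reproduce. The token follows the HMAC-over-variant-ID scheme described above, with an assumed per-record salt.

```python
import hashlib
import hmac
from typing import Callable, Optional


def lookup_token(master_key: bytes, record_salt: bytes, rsid: str) -> str:
    """Client side: HMAC of the variant ID. This is all the server learns about a lookup."""
    msg = record_salt + b"|" + rsid.encode()
    return hmac.new(master_key, msg, hashlib.sha256).hexdigest()


class RecordServer:
    """Hypothetical server: opaque ciphertext keyed by lookup token, no keys held."""

    def __init__(self) -> None:
        self._records: dict[str, bytes] = {}
        self.observed_tokens: list[str] = []  # everything the server sees per lookup

    def put(self, token: str, ciphertext: bytes) -> None:
        self._records[token] = ciphertext

    def get(self, token: str) -> Optional[bytes]:
        self.observed_tokens.append(token)
        return self._records.get(token)


def read_variant(
    server: RecordServer,
    master_key: bytes,
    record_salt: bytes,
    rsid: str,
    decrypt: Callable[[bytes], bytes],
) -> Optional[bytes]:
    """Client side: compute the token, fetch the ciphertext, decrypt locally."""
    ciphertext = server.get(lookup_token(master_key, record_salt, rsid))
    return None if ciphertext is None else decrypt(ciphertext)
```

The HMAC is deterministic: under a given key, the same rsID always maps to the same token, which is what lets the server find the record without ever learning the rsID itself.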
What about server-side processing for new analyses?
A common question: if you can't decrypt server-side, how do you run any analysis on the data after upload?
The answer is in two parts.
First, for one-time analyses that need to run on every variant in the file (e.g., the initial report generation, or a periodic refresh as new research lands), we do these synchronously during a user-authenticated session. The user signs in. They authorize a one-time pipeline run. The server-side pipeline decrypts the relevant data into RAM only, never to disk, for the duration of the run, generates the report content, encrypts the result, writes the encrypted result back, and discards the plaintext from RAM. We don't keep plaintext at rest at any step.
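The shape of a one-time run can be sketched as follows. Treat it as best-effort illustration rather than a guarantee: the hypothetical `plaintext_in_ram` helper keeps decrypted bytes in a mutable buffer and zeroes it when the run ends, but Python, like most managed runtimes, may hold other copies it cannot wipe (for example the bytes returned by `decrypt`, or the generated report). The `decrypt`, `generate_report`, `encrypt`, and `write_encrypted` callbacks are assumptions standing in for the real pipeline stages.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def plaintext_in_ram(ciphertext: bytes, decrypt: Callable[[bytes], bytes]) -> Iterator[bytearray]:
    """Hold decrypted data in a mutable buffer and overwrite it with zeros on exit."""
    buf = bytearray(decrypt(ciphertext))
    try:
        yield buf
    finally:
        buf[:] = bytes(len(buf))  # same-length slice assignment overwrites in place


def run_one_time_pipeline(
    ciphertext: bytes,
    decrypt: Callable[[bytes], bytes],
    generate_report: Callable[[bytes], bytes],
    encrypt: Callable[[bytes], bytes],
    write_encrypted: Callable[[bytes], None],
) -> None:
    """Decrypt in memory, build the report, and hand only the encrypted result to storage."""
    with plaintext_in_ram(ciphertext, decrypt) as plaintext:
        report = generate_report(bytes(plaintext))
        write_encrypted(encrypt(report))  # the only data passed to storage is encrypted
```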
Second, for ongoing analyses (e.g., your action plan recalculating when a new study is published for a variant you carry), we use a delayed-batch model. New research signals are accumulated. The next time the user signs in, the pipeline replays the relevant analysis against their data during the authenticated session. Until they sign in, the new research signal is held server-side as a pending recompute; the user's plaintext data isn't touched.
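One way to model the delayed-batch approach is a global log of research signals plus a per-user watermark recording the last signal replayed for that user. Nothing here touches user data until `on_sign_in` runs inside an authenticated session, and because signals aren't addressed to particular users, the server doesn't need to know which variants anyone carries. Class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ResearchSignal:
    seq: int  # arrival order, strictly increasing
    description: str


@dataclass
class PendingRecomputes:
    signals: list[ResearchSignal] = field(default_factory=list)
    watermarks: dict[str, int] = field(default_factory=dict)  # user id -> last seq replayed

    def publish(self, description: str) -> None:
        """Record a new research signal; no user data is read."""
        self.signals.append(ResearchSignal(len(self.signals) + 1, description))

    def pending_for(self, user_id: str) -> list[ResearchSignal]:
        last = self.watermarks.get(user_id, 0)
        return [s for s in self.signals if s.seq > last]

    def on_sign_in(self, user_id: str, replay: Callable[[ResearchSignal], None]) -> int:
        """Replay every signal the user hasn't seen; call only within an authenticated session."""
        pending = self.pending_for(user_id)
        for signal in pending:
            replay(signal)
        if pending:
            self.watermarks[user_id] = pending[-1].seq
        return len(pending)
```

A user who hasn't signed in for months simply accumulates a longer pending list; the cost is paid at their next session rather than by keeping their data readable in the meantime.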
This is slower than the "always have access to your data" pattern most consumer services use. The tradeoff is real and intentional: we give up speed in exchange for the guarantee that we cannot access plaintext data without your active participation.
Threat models, what this defends against
Server compromise. An attacker who gains full access to our infrastructure and our databases finds AES-256-GCM ciphertext with no readily available decryption keys. The keys live on user devices. Even if the attacker were to access our database backups, the same applies.
Insider threat. No employee at Expressive has standing access to user genomic data in plaintext. The pipeline can decrypt within a user session, not unilaterally. An employee with full database access has the same information an external attacker has: ciphertext.
Legal subpoena for user data. A subpoena can compel us to produce all records associated with a user. The records are ciphertext. We cannot produce plaintext because we don't have the decryption capability. The subpoena would have to be redirected to the user, who has the key. This is a meaningful procedural protection.
Bankruptcy or acquisition. If Expressive itself changes hands (through a sale, an acquisition, or dissolution), the new entity inherits ciphertext databases and an architecture that requires user-side keys to read them. It cannot unilaterally repurpose user data. The recent history of consumer genomics shows why this matters: when a provider's corporate status changes, the question of who owns your genetic data resolves to whoever currently holds the plaintext. Here that answer is, structurally, still you.
Research data sharing. We don't share with third-party researchers. The architecture is currently designed for individual access only. If we ever offer explicit research opt-in, the opt-in flow will be its own consent (signed in-browser by the user, defining what specific data is contributed, to what specific study, for what specific purpose), separate from sign-up.
Threat models, what this does NOT defend against
It's important to be explicit about the limits.
Client compromise. If your device is compromised (malware, or an attacker with physical access during an authenticated session), the in-browser plaintext can be exfiltrated. Our cryptography doesn't help if your endpoint is compromised; essentially every end-to-end-encrypted system shares this limitation.
You sharing your data. If you choose to export your data, or screenshot your reports, or share with a third party, that's data flow outside our cryptography. We can't prevent it; we shouldn't try to. Your data, your choice.
Inference from non-DNA data. Some traits and conditions are partially inferable from other data we may collect (browsing patterns, IP-derived location, etc.). We try to minimize what we collect of this kind. But "we can't access your DNA plaintext" doesn't mean "we have no other data about you." Re-identification risk in genetic data is a real research literature, not a marketing concern, and we treat it as such: the less ancillary metadata we hold, the harder any future correlation attack becomes.
Future cryptographic breaks. There are no known practical attacks on AES-256-GCM, and AES-256 is generally considered robust against known quantum attacks: Grover's algorithm would roughly halve its effective key strength, to about 128 bits. If that assessment changes, which is unlikely but possible, ciphertext already stored on our infrastructure could in theory become decryptable retroactively. The mitigation is migrating to post-quantum schemes if and when that is warranted.
Side channels in the user's own device or browser. Timing attacks, cache attacks, and similar side channels against browser-based crypto are known to exist in theory and occasionally in practice. We use established crypto libraries (WebCrypto for the AES operations) rather than rolling our own, but we don't claim resistance to a sufficiently sophisticated side-channel attack against the user's browser.
Why we open-source the relevant crypto code
The threat-modeling section is only meaningful if the implementation matches. We publish the encryption layer, the key derivation, and the HMAC lookup logic in our open repository so anyone can verify that the code does what the documentation says it does.
This isn't an academic exercise. The combination of (1) handling sensitive data and (2) making strong privacy claims requires being publicly checkable. We'd rather have a critical reader find a bug than ship a vulnerability quietly.
What this means for you
If you're considering uploading your DNA to a service, ours or anyone else's, the questions worth asking are:
- Where do the encryption keys live? On your device, or on the company's servers? If on theirs, the privacy story is much weaker.
- Can the company technically decrypt your data without your participation? If yes, that capability exists for legal compulsion, internal misuse, and breach. If no, those vectors are substantially closed.
- What's the data retention model? Is deletion verifiable? Are backups encrypted with user-held keys?
- What's the policy on research sharing? Bundled into sign-up consent, or its own opt-in?
- Is the relevant code public? Privacy claims are stronger when they're publicly checkable.
This post answers these questions for Expressive. We hope competitors answer them too. The industry will be in a better place when privacy-first genetic testing is treated as a first-class product feature rather than a marketing afterthought, and when "your genome stays yours" is something a user can verify by reading code rather than something they have to take on faith.
If you want to use Expressive with this architecture in mind, you can sign up here. The wallet-derived-key flow is part of onboarding; no separate steps are required to get the encryption protections. This is genetic data you actually own, in the only sense of "own" that matters when the database is subpoenaed, sold, or breached.
Want updates when we ship new variant pages or a research deep-dive? Read the latest issue or get notified about early access.