Choosing a scientific data platform in 2026 starts on AWS for almost every biotech R&D team. HealthOmics, Bedrock, SageMaker, Glue, and Lake Formation all assume S3 is the durable substrate, and most managed bioinformatics tools ship an AWS deployment path before anything else. The harder question is what sits on top of S3, because that decision shapes how scientists find data, how QA proves integrity, and how your pipelines scale past the first dozen users.
This post is a working guide for teams running that evaluation. It covers what "AWS-native" actually requires, the three areas where most platforms succeed or fail (automation, governance, and the catalog layer), and a concrete checklist you can take into vendor conversations. We work on the Quilt Data Platform and use it as a reference architecture in the second half. Where there are trade-offs, we'll say so.
The term gets used loosely, so it helps to be specific. A platform that's AWS-native in a way that matters for biotech R&D should:

- Keep data in S3 buckets you own, in your own AWS account, as the only storage of record.
- Build governance on AWS primitives (IAM, KMS, S3 Object Lock, CloudTrail) rather than running a parallel permission system.
- Interoperate directly with the AWS analytics and AI stack — Glue, Athena, Bedrock, SageMaker, HealthOmics — without copying data out first.
The reason this matters specifically for biotech is the size and longevity of the data. A single NGS run can produce hundreds of gigabytes; a research program produces petabytes over its lifetime. Once that data is in S3 with retention policies and KMS encryption, moving it out is operationally and politically costly. The platform you choose should make data more usable where it already lives, rather than pulling it into a second silo.
Inari follows this pattern in production. Their NGS outputs, imaging, and field data live in their own AWS account, and Quilt provides the catalog and packaging layer on top. The data never leaves their environment. The Inari case study walks through the full architecture.
If you want to predict whether a platform will hold up at scale, look at the path a new dataset takes from instrument or pipeline to "findable in the catalog with all of its metadata attached." Anything that requires a human to fill in a web form before data is registered will collapse under its own weight once you pass a few dozen users.
The patterns that work in production:

- Event-driven registration: an S3 write from an instrument or pipeline triggers packaging automatically, with no human in the loop.
- Metadata captured at creation time: the pipeline that produced the data attaches its parameters, sample identifiers, and run context as part of registration.
- Validation at the door: registration fails loudly when required metadata is missing or violates the schema, instead of quietly admitting undocumented data.
The failure mode to watch for is a platform that expects a scientist to do the metadata work after the fact. By the time the dataset is interesting enough to find, no one remembers the parameters.
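The event-driven pattern above can be sketched in a few lines. This is a hypothetical Lambda-style handler, assuming S3 `ObjectCreated` notifications and an illustrative `<instrument>/<run_id>/<file>` key layout; the package-name convention and metadata keys are ours, not a Quilt or AWS contract. The actual registration call is left as a comment because it needs live AWS credentials.

```python
from datetime import datetime, timezone

# Assumed naming convention for registered packages (illustrative only).
PACKAGE_PREFIX = "instruments"

def build_registration(event):
    """Turn an S3 ObjectCreated event record into a package name and metadata.

    The real registration step would hand these to quilt3, e.g.:
        pkg = quilt3.Package()
        pkg.set_dir(".", f"s3://{bucket}/{prefix}")
        pkg.set_meta(metadata)
        pkg.push(package_name, registry=f"s3://{bucket}")
    Here we only assemble the inputs, so no human fills in a form later.
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]     # e.g. "novaseq/run_042/out.fastq.gz"
    instrument, run_id = key.split("/")[:2]  # assumed key layout
    package_name = f"{PACKAGE_PREFIX}/{instrument}-{run_id}"
    metadata = {
        "source_bucket": bucket,
        "source_key": key,
        "instrument": instrument,
        "run_id": run_id,
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }
    return package_name, metadata

sample_event = {"Records": [{"s3": {
    "bucket": {"name": "acme-ngs-raw"},
    "object": {"key": "novaseq/run_042/out.fastq.gz"},
}}]}
name, meta = build_registration(sample_event)
print(name)  # instruments/novaseq-run_042
```

The point of the sketch is the shape, not the specifics: metadata is derived from the event and the producing context at write time, never typed in afterward.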
Most evaluations check the "RBAC" and "audit log" boxes and move on. The questions that matter in a 2026 biotech context go deeper:

- Can you prove a dataset hasn't changed since registration, not merely that the bucket has versioning turned on?
- Who controls the encryption keys, and do they stay in your account?
- Can an auditor reconstruct, from the platform's own records, who registered what, when, and with which contents?
- Are metadata schemas enforced at write time, or is governance a convention scientists are asked to follow?
An AWS-native answer to those questions leans on the primitives AWS already provides (S3 Object Lock, KMS, CloudTrail, IAM, Config) and adds the higher-level concepts AWS doesn't ship out of the box: schemas, workflow contracts, package-level immutability, and metadata that auditors can read.
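As a concrete starting point, the storage-side primitives can be declared in a few lines of CloudFormation. This is a sketch, assuming a compliance-style retention policy and a KMS key defined elsewhere in the template; the resource names and the seven-year retention are placeholders, not recommendations.

```yaml
Resources:
  GovernedDataBucket:                    # illustrative name
    Type: AWS::S3::Bucket
    Properties:
      VersioningConfiguration:
        Status: Enabled                  # required for Object Lock
      ObjectLockEnabled: true
      ObjectLockConfiguration:
        ObjectLockEnabled: Enabled
        Rule:
          DefaultRetention:
            Mode: COMPLIANCE             # WORM: no one can shorten retention
            Years: 7                     # retention period is a policy decision
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: aws:kms
              KMSMasterKeyID: !Ref DataKmsKey   # assumes a key defined elsewhere
```

Everything above is file-level; the dataset-level guarantees come from the package layer on top.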
The Quilt approach is to treat every dataset as an immutable, versioned package addressed by a cryptographic hash. Once registered, the contents cannot drift. A new revision produces a new hash. CloudTrail records who registered each revision; the package itself records what was inside. The combination is enough to put a defensible audit trail in front of a regulator without writing custom tooling.
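The mechanics are easy to illustrate. The sketch below is a deliberately simplified model of content-addressed packaging, not the actual quilt3 manifest format: each file hashes to SHA-256, the package hash covers the serialized manifest, and so changing any file produces a new package address.

```python
import hashlib
import json

def file_hash(data: bytes) -> str:
    """SHA-256 of a single file's contents."""
    return hashlib.sha256(data).hexdigest()

def top_hash(entries: dict) -> str:
    """Hash of the manifest (logical key -> file hash), serialized canonically.

    Sorting makes the hash independent of insertion order, so identical
    contents always resolve to the identical address.
    """
    manifest = json.dumps(sorted(entries.items()))
    return hashlib.sha256(manifest.encode()).hexdigest()

v1 = {
    "reads/sample1.fastq.gz": file_hash(b"ACGT..."),
    "qc/report.html": file_hash(b"<html>...</html>"),
}
# Revising one file yields a new manifest, hence a new package hash.
v2 = dict(v1, **{"qc/report.html": file_hash(b"<html>rev2</html>")})

assert top_hash(v1) != top_hash(v2)          # any change => new address
assert top_hash(v1) == top_hash(dict(v1))    # same contents => same address
```

This is why "the contents cannot drift": the address *is* a function of the contents, so verification is recomputation.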
This is the area most evaluations underweight going in, and it's the one that determines whether the platform creates value past the first quarter. Storage is the easy part. Finding the right version of the right dataset two years later is where teams quietly give up and Slack their colleagues for file paths.
A catalog that scientists actually use has four working properties:

- Findable: full-text and structured search across all packages, so locating a dataset doesn't require knowing a file path.
- Filterable: faceted queries over metadata — assay=RNA-seq and tissue=liver and project=KRAS-001 — without writing Athena queries.
- Versioned: every result resolves to a specific, immutable revision, not whatever happens to sit at a path today.
- Accessible everywhere: the same packages and metadata reachable from the web UI for scientists and from the API for pipelines and agents.

Inari's experience is illustrative. A single catalog used by computational scientists, lab scientists, and field analysts compounded value across teams that previously couldn't share file paths. The catalog didn't replace anyone's existing tools. It became the common denominator under them.
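To make the filtering property concrete, here is a toy in-memory version of a faceted metadata query. In a real catalog this runs server-side over indexed metadata; the package names and fields below are invented for illustration.

```python
# Hypothetical catalog entries: each package carries a flat metadata dict.
packages = [
    {"name": "ngs/run-017",     "meta": {"assay": "RNA-seq", "tissue": "liver", "project": "KRAS-001"}},
    {"name": "ngs/run-018",     "meta": {"assay": "RNA-seq", "tissue": "lung",  "project": "KRAS-001"}},
    {"name": "imaging/plate-3", "meta": {"assay": "HCS",     "tissue": "liver", "project": "KRAS-001"}},
]

def facet_filter(pkgs, **facets):
    """Return packages whose metadata matches every requested facet."""
    return [p for p in pkgs
            if all(p["meta"].get(k) == v for k, v in facets.items())]

hits = facet_filter(packages, assay="RNA-seq", tissue="liver", project="KRAS-001")
print([p["name"] for p in hits])  # ['ngs/run-017']
```

The useful observation is that this query is only answerable because the metadata was attached at registration time; no amount of catalog UI rescues datasets registered without it.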
The architecture we recommend as a baseline (with substitutions allowed for components you already operate):
┌─────────────────────────────────────────────────────────────┐
│    Scientists (Python, R, web UI) · AI agents · Auditors    │
└───────────────────────────────┬─────────────────────────────┘
                                │
                ┌───────────────▼────────────────┐
                │  Quilt Web Catalog + quilt3    │  discovery, packaging, governance
                └───────────────┬────────────────┘
                                │
       ┌────────────────────────┼────────────────────────┐
       ▼                        ▼                        ▼
┌──────────────┐        ┌───────────────┐        ┌──────────────┐
│  Amazon S3   │◀──────▶│  AWS Glue +   │◀──────▶│ AWS Bedrock  │
│ (data plane) │        │    Athena     │        │ / SageMaker  │
└──────┬───────┘        └───────────────┘        └──────────────┘
       │
       ├── Object Lock + KMS + Versioning (governance primitives)
       ├── CloudTrail + Config (audit + posture)
       └── HealthOmics / Batch / Nextflow (compute)
Three properties make this work in practice. S3 is the only storage of record, so every AWS-native tool keeps functioning without translation. Governance is enforced at the package layer on top of AWS primitives, which means both file-level and dataset-level integrity. And the catalog is the user interface for everyone, with the same packages and metadata reachable via the web for scientists and via quilt3 for engineers and AI agents.
Use this checklist when scoring any AWS-native scientific data platform. For biotech R&D in 2026, you want most of these to be "yes" without qualifiers:

- Does data stay in S3 buckets you own, in your own account, as the only storage of record?
- Can datasets be registered automatically from instruments and pipelines, with metadata attached at creation time?
- Are datasets immutable and versioned, with a verifiable hash per revision?
- Does governance build on IAM, KMS, Object Lock, and CloudTrail rather than a parallel permission system?
- Can scientists find and filter datasets by metadata without writing queries?
- Is everything reachable both from a web UI and from an API, for pipelines and AI agents alike?
Before booking vendor demos, audit your own data. Pick three high-value datasets: your most-cited NGS output, your lead candidate's assay data, and your most recent submission package. For each, try to answer four questions in under sixty seconds:

- Where, exactly, does it live?
- Which version is the authoritative one?
- What pipeline or instrument produced it?
- With what parameters and inputs?
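If you want to run that audit mechanically, the sixty-second test reduces to a key check on whatever metadata record each dataset has. The required keys below are an illustrative convention mirroring the four questions, not a standard; substitute whatever your own manifests call these fields.

```python
# One assumed key per audit question (illustrative names, not a schema standard).
REQUIRED = {
    "s3_uri",       # where, exactly, does it live?
    "version",      # which revision is authoritative?
    "produced_by",  # what pipeline or instrument created it?
    "parameters",   # with what inputs and settings?
}

def audit(meta: dict) -> set:
    """Return the audit questions a dataset cannot answer (missing/empty keys)."""
    return {k for k in REQUIRED if not meta.get(k)}

well_packaged = {
    "s3_uri": "s3://acme-ngs/ngs/run-017",
    "version": "a1b2c3",
    "produced_by": "nextflow/rnaseq",
    "parameters": {"genome": "GRCh38"},
}
typical_legacy = {"s3_uri": "s3://acme-legacy/final_v2_REAL.xlsx"}

print(audit(well_packaged))   # set() — passes in well under sixty seconds
print(audit(typical_legacy))  # three unanswerable questions
```

A non-empty result for any of your three chosen datasets is the packaging problem the next paragraph describes.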
Datasets where any of the four are hard to answer point to a packaging problem more than a platform problem. The right AWS-native scientific data platform is the one that makes those questions trivial to answer on every dataset, going forward.
That's the bar to hold every vendor to, ours included. If you want to walk three datasets through the checklist together, the team at Quilt is happy to do a working session: quilt.bio/demo.