By Kevin Moore, CEO, Quilt
I recently had the privilege of speaking at the 2025 Nextflow Summit, where I explored one of the most persistent challenges in life sciences data – metadata – and how Quilt uses the abstraction of data containers to help teams bridge the gap between the raw data researchers generate and the context they need to actually use it.
In most scientific environments, your data isn’t the problem – your metadata is.
You may have petabytes of raw sequencing data stored in S3, collected across multiple systems, platforms, and labs. But when it’s time to analyze it, share it, or reproduce a result six months later, things fall apart. Why?
Because critical metadata is still scattered across spreadsheets, internal databases, LIMS platforms like Benchling, or worse, forgotten in the minds of scientists who have since moved on.
At Quilt, we believe that fixing this isn’t just a matter of governance hygiene – it’s a strategic necessity. AI can’t scale on top of brittle, tribal-knowledge workflows. Reproducibility and reuse won’t happen when files are unlabeled, siloed, or stale.
That’s why we’re building a better foundation.
Why Metadata Matters More Than Ever
As I related in the presentation:
“We talk to biopharma companies all the time who say, ‘Yeah, I have a lot of FASTQs. I’m really not sure what they are.’”
It’s not just about having data. It’s about knowing where it came from, what it means, and how to use it. Metadata is what transforms raw files into reusable, trustworthy, queryable information.
When that metadata is scattered or incomplete, even the simplest analytical questions – like identifying cancer cell lines with high EGFR expression – require tedious manual effort.
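To make that concrete, here is roughly what that question looks like once the metadata lives on the packages themselves. This is a minimal sketch using our open-source quilt3 client; the bucket name, metadata fields, and expression threshold are hypothetical stand-ins for whatever schema your packages carry.

```python
import quilt3

REGISTRY = "s3://example-bucket"  # hypothetical registry bucket

# Scan package-level metadata for cancer cell lines with high EGFR expression.
# Field names ("sample_type", "cell_line", "egfr_expression") and the threshold
# are placeholders for whatever schema your packages actually use.
for name in quilt3.list_packages(registry=REGISTRY):
    pkg = quilt3.Package.browse(name, registry=REGISTRY)
    meta = pkg.meta or {}
    if meta.get("sample_type") == "cancer_cell_line" and meta.get("egfr_expression", 0) > 8.0:
        print(name, meta.get("cell_line"))
```

In practice you would lean on the catalog’s indexed search rather than scanning every package, but the point stands: when metadata travels with the data, the question becomes a query instead of a scavenger hunt.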
Rather than asking researchers to stop and engineer schema-heavy warehouses upfront, we meet teams where they are – with messy, real-world data – and provide them with tools to wrap that data in logic, context, and structure.
We designed Quilt Data Packages to bring structure, portability, and traceability to raw scientific data, just like Docker containers do for software.
The result: self-contained, reproducible units of data and metadata that can be shared, searched, governed, and trusted.
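Here is a hedged sketch of what building one of those units looks like with the quilt3 Python client; the package name, registry bucket, and metadata fields are illustrative rather than prescriptive.

```python
import quilt3

# Build a self-contained unit: the files plus the metadata that explains them.
pkg = quilt3.Package()
pkg.set("reads/sample01_R1.fastq.gz", "results/sample01_R1.fastq.gz",
        meta={"read": "R1"})  # per-file metadata travels with the file
pkg.set_meta({                # package-level metadata; fields are illustrative
    "experiment": "EXP-042",
    "instrument": "NovaSeq 6000",
    "pipeline": "nf-core/rnaseq 3.14.0",
})

# push() versions the package, so every revision is captured and recoverable.
pkg.push("sequencing/run-2025-06-01",
         registry="s3://example-bucket",
         message="Automated package for run 2025-06-01")
```

Creating one package per run, as above, is the pattern behind the next point.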
Instead of building the data warehouse first and re-engineering your workflows to fit it, you let the warehouse emerge as your existing pipelines create packages for every run and every experiment.
Many teams already use LIMS platforms such as Benchling or sequencing tools like BaseSpace. Quilt doesn’t replace those systems – it connects them.
With our new Quilt Packaging Engine, we support standards like RO-Crate to ingest and build packages automatically, converting your sequencing workflows into versioned, queryable datasets without copy-pasting metadata from system to system.
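To show the underlying idea rather than the Packaging Engine’s actual internals, here is a rough sketch of lifting an RO-Crate’s metadata into a package so it stays attached to the files it describes; the paths, package name, and metadata key are assumptions for illustration.

```python
import json
import quilt3

# The Packaging Engine automates this; the sketch just shows the underlying
# idea of lifting RO-Crate metadata into a package. Paths, the package name,
# and the "ro_crate" metadata key are assumptions for illustration.
crate_dir = "results/run-2025-06-01"  # a pipeline output directory with a crate
with open(f"{crate_dir}/ro-crate-metadata.json") as f:
    crate = json.load(f)

pkg = quilt3.Package()
pkg.set_dir("outputs", crate_dir)            # capture the run's files
pkg.set_meta({"ro_crate": crate["@graph"]})  # keep the crate's entity graph
pkg.push("sequencing/run-2025-06-01", registry="s3://example-bucket")
```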
Metadata governance shouldn’t require perfection – it should prioritize capturing, connecting, and providing context.
By encapsulating data and metadata together, you get datasets that carry their own context wherever they go. Perhaps most critically, you reduce your dependency on “hero workflows” – those undocumented, one-off manual hacks that never scale.
Quilt Data Packages are to scientific data what Docker containers were to software: a clean abstraction that makes something chaotic finally tractable. They bring reproducibility, context, and composability to the scientific stack, ensuring your datasets are versioned, searchable, and compliant with the increasing demands of regulated environments.
If your team is struggling to find, reuse, or trust your data, Quilt can help.
Reach out, or better yet, come see a data package in action.