
Quilt + Benchling Integration: Seamless Data Packaging for Life Science Labs

Life science R&D teams today grapple with floods of experimental data scattered across many tools.  Bench scientists love using Benchling’s Electronic Lab Notebook (ELN) to record experiments, but Benchling alone isn’t built for storing terabytes of raw data. In fact, academic Benchling accounts cap storage at 10 GB and often rely on external cloud storage for big files. This gap between lab notebooks and data lakes has long caused headaches: data becomes siloed, context is lost, and wet lab scientists struggle to share large datasets with computational teams.

Enter the Quilt–Benchling integration – a new bridge between Benchling and Quilt’s data packaging platform.  This integration automatically converts Benchling notebook entries into versioned Quilt data packages. In practice, that means researchers can keep working in Benchling as usual while their large datasets and results are transparently shuttled to Amazon S3 as Quilt packages. The result is unlimited, version-controlled cloud storage attached directly to your Benchling notebooks. In this post, we’ll explore what this integration does, how it works, and why it’s a game-changer for scientists and lab managers seeking a unified, collaborative data workflow.

What Does the Integration Do?

Quilt’s Benchling integration provides an automated, real-time link between Benchling and cloud data storage. Whenever a scientist creates or updates an entry in Benchling, the integration springs into action – packaging the entry’s data and metadata into a Quilt package (a versioned dataset in S3). It also embeds an interactive Quilt panel inside the Benchling entry for immediate access to those packages. Here’s a quick overview of its key capabilities:

  • Real-time Data Packaging: Every time you save or modify a Benchling notebook entry, the integration automatically creates or updates a versioned Quilt data package with that entry’s files and information. This ensures that every experiment snapshot is captured with exact versions of data, giving a complete audit trail of changes over time.

  • Benchling Canvas Integration: The integration adds an interactive “Quilt” canvas block inside your Benchling notebook page. This panel displays links to the corresponding Quilt package and lets you trigger package updates on demand. Scientists can click an “Update Package” button in Benchling to instantly create a new version of the dataset, making the latest results available in the cloud.

  • Asynchronous, Robust Processing: Large data exports won’t slow you down. The integration uses background workflows (AWS Step Functions and Lambda) to handle packaging without blocking the Benchling UI. You receive notifications via the Benchling canvas when packaging is complete – typically within seconds – and can continue working uninterrupted.

  • Automated Documentation: Each Quilt package includes a README and rich metadata generated from the corresponding Benchling entry. The integration automatically carries over the experiment context (such as the notebook entry text, metadata fields, timestamps, etc.) into the package documentation. This means every dataset is self-documenting, saving scientists from manual note-taking in multiple places.

  • Multi-Channel Event Handling: Beyond simple save events, the integration listens to various Benchling events – from entry creation/updates to interactive canvas button clicks and app lifecycle hooks. It routes these events intelligently: a user clicking “Update Package” in the canvas triggers the packaging workflow immediately, while routine entry edits may be batched or debounced.

  • Unlimited Scalable Storage: By offloading data to Amazon S3 via Quilt, your Benchling notebooks can reference terabytes of data without slowing down. Quilt’s packages store data in cloud buckets, effectively giving Benchling users unlimited storage with version control. This eliminates the need for ad-hoc solutions like shared drives or Dropbox links and ensures data is stored in a secure, centralized repository.

In short, the integration turns Benchling into a gateway to a data lake. Scientists continue to use Benchling as their familiar ELN interface, but behind the scenes, their data is packaged, versioned, and made accessible to the whole team through Quilt. Next, let’s peek under the hood to see how this works.

How It Works: Architecture Overview

Architecture of the Benchling–Quilt integration. Benchling notebook events (saves or user actions) trigger a serverless workflow on AWS. The workflow exports data via Benchling’s API, stores files in S3, and creates a versioned Quilt package. A Benchling Canvas panel provides real-time links and lets users trigger updates on demand.

The integration is built on a serverless AWS architecture to ensure scalability and responsiveness. Here’s a simplified look at the components involved:

  • Benchling Webhooks: Benchling sends out webhook events for various actions (e.g., an entry is created or modified, or a user clicks a canvas button). The integration registers a webhook endpoint, so it’s notified in real-time whenever relevant events occur.

  • API Gateway: Incoming webhook events hit an AWS API Gateway endpoint, which serves as the entry point to AWS. This securely receives the event data from Benchling.

  • Step Functions (Orchestration): The heavy lifting is coordinated by AWS Step Functions, which use state machines to orchestrate workflows. There are two main state machines:

    • Webhook Router: The first state machine processes the initial webhook event. It checks the event type and decides what needs to happen. For example, if it’s an “entry updated” event or a canvas “Update Package” click, it will trigger the packaging workflow. It can also handle Benchling app lifecycle events (like installation or removal) as needed.

    • Packaging Workflow: The second state machine handles the end-to-end process of exporting data and creating the Quilt package. This workflow runs asynchronously so as not to keep Benchling waiting.

  • Lambda Functions: Within the packaging workflow, AWS Lambda functions perform specific tasks. One Lambda uses Benchling’s APIs (with proper OAuth 2.0 authorization) to retrieve the latest entry data and any attached files. Another Lambda might generate the README.md or metadata for the package. These functions are small, focused pieces of code that run on demand.

  • S3 and Quilt Packaging: The data and generated documentation are then stored in an S3 bucket. At this point, the integration places a message in an SQS queue to notify Quilt’s packaging engine (which monitors the queue) that a new package version is ready to be finalized. Quilt’s packaging service then assembles the files and metadata from S3 into an immutable versioned package.

  • Quilt Catalog and Links: Once the Quilt package is built, it’s available in the Quilt data catalog (which is essentially a friendly web UI for the S3-backed data package). The integration updates the Benchling canvas panel with a direct link to the package (and the latest version ID). The canvas can display a list of package versions, so a scientist can click to open any version on Quilt. All of this happens within the Benchling interface – the scientist doesn’t have to leave Benchling to browse their data, unless they want to dive into the Quilt web catalog for advanced visualization.
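The routing decision made by the first state machine can be sketched as a small dispatcher. The event-type strings and handler names below are illustrative assumptions, not Benchling's exact webhook schema:

```python
# Minimal sketch of webhook routing. The event-type strings and the
# handler names are illustrative assumptions, not Benchling's exact schema.

def start_packaging(entry_id):
    """Stand-in for kicking off the packaging state machine."""
    return f"packaging:{entry_id}"

def handle_lifecycle(event):
    """Stand-in for app install/uninstall handling."""
    return f"lifecycle:{event['type']}"

def route_webhook(event):
    """Decide what a webhook event should trigger."""
    etype = event.get("type", "")
    if etype in ("entry.created", "entry.updated"):
        return start_packaging(event["entryId"])
    if etype == "canvas.userInteracted":          # e.g. "Update Package" click
        return start_packaging(event["entryId"])
    if etype.startswith("app."):                  # lifecycle hooks
        return handle_lifecycle(event)
    return None                                   # ignore unrelated events
```

Keeping this router separate from the packaging workflow is what lets Benchling get an immediate acknowledgment while the heavy export work runs asynchronously.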

Despite the complex cloud machinery under the hood, the user experience is straightforward. Benchling becomes the control panel for data packaging, and Quilt becomes the cloud-based back-end for data storage, versioning, and sharing. This design also means the system can scale seamlessly – AWS handles large workloads, and there are no servers to manage. Whether your lab makes one notebook entry a day or hundreds, the integration will keep up without manual intervention.

Why Connect Benchling and Quilt? (Benefits for Scientists)

Connecting an ELN like Benchling with a data packaging platform like Quilt yields powerful benefits for research teams:

  • Seamless Data Flow: Lab scientists can continue using Benchling’s friendly interface to record experiments, while behind the scenes, their raw data (instrument outputs, assay results, images, etc.) flows automatically to a secure cloud repository. This bridges the gap between the wet lab and cloud storage. Researchers no longer have to manually upload files to some shared drive or paste links – it’s all handled by the integration.

  • Unlimited, Organized Storage: Benchling is terrific for notes and structured data, but not built for large unstructured files. With Quilt, each Benchling entry is backed by effectively unlimited storage in the form of a Quilt package in S3. Instead of hitting size limits or slowing down the ELN, big files (like sequencing data or microscopy images) reside in S3, where they’re cheap to store and easy to retrieve. The integration automatically organizes these files into logical packages with metadata, so nothing gets lost in a random folder.

  • Version Control & Provenance: Every change in Benchling triggers a new versioned snapshot in Quilt. This provides a complete audit trail of how data evolved – a huge plus for reproducible science and compliance. If a month ago a certain analysis was done, you can go back and fetch the exact dataset (files and all) as it was then, because Quilt preserves each version. No more confusion about which CSV or file version corresponds to which experimental run – it’s all tracked.

  • Cross-Platform Collaboration: Often, not everyone who needs the data has a Benchling account – e.g., a data scientist or external collaborator might not work in Benchling. By packaging data in Quilt (which is built on open AWS storage), it becomes easy to share with colleagues outside Benchling. You can send a link to a Quilt package (with appropriate permissions), and a collaborator can access the data via the Quilt web catalog or Python APIs, without needing access to the Benchling ELN. This opens up your lab data to the whole organization in a controlled, secure way.

  • Improved Lab Efficiency: The integration removes a lot of manual steps. Wet lab scientists don’t need to double-enter information or worry about uploading files to multiple places. They save time and also reduce errors (no missing files, no typos in filenames, etc.). Lab managers get peace of mind knowing data is properly archived the moment it’s generated, with consistent documentation and labels. Overall, teams can move faster from data generation to analysis since the hand-off is instantaneous.

  • Bridging Wet and Dry Labs: Perhaps most importantly, connecting Benchling to Quilt helps break down the wall between wet lab data capture and dry lab data analysis. Benchling remains the system of record for experimental context, while Quilt serves as the system of record for the data itself. Together, they ensure that context and data stay linked. This means a bioinformatician looking at a dataset in Quilt can trace it back to the Benchling notebook entry that produced it, and a bench scientist in Benchling can immediately find where the data lives in the cloud. It’s a unified ecosystem rather than isolated islands of information.

From a broader perspective, what we’re seeing is an example of a “connected lab” in action – an environment where different tools (ELN, LIMS, data lakes, analysis pipelines) all talk to each other. Benchling itself encourages integration with data lakes for this reason. The Quilt-Benchling integration is a turnkey solution to achieve that connectivity, making rich experimental data as easy to access and version as code.

User Workflow: From Experiment to Quilt Package

How do you actually use this integration in practice? Here’s a step-by-step look at a typical workflow for a scientist or lab:

  1. One-Time Setup: A lab admin installs the Quilt app within Benchling (via Benchling’s app marketplace or developer console) and configures the webhook endpoint provided by Quilt. This setup might involve entering your Benchling tenant name and giving the Quilt app permission (via OAuth2) to read data. Once configured, Benchling entries can start talking to Quilt.

  2. Perform Experiment & Record in Benchling: A researcher carries out an experiment and records all notes, results, and attachments in a Benchling notebook entry as usual. They might attach raw data files or results to the entry – for example, uploading a set of microscopy images or a CSV of instrument readings into Benchling.

  3. Insert the Quilt Canvas: In the Benchling entry, the scientist adds the “Quilt Integration” canvas block (this is as simple as hitting the plus/add-block button and selecting the Quilt app canvas). Immediately, this canvas will display a panel (for example, showing “No package yet. Save the entry to create one,” or an “Update Package” button if a package exists).

  4. Automatic Package Creation: Upon saving the entry (or at scheduled intervals, depending on configuration), Benchling sends a webhook event. The integration receives the event and triggers the packaging workflow. Within moments, a Quilt package is created containing all the attached files and a snapshot of the entry’s content. The Benchling canvas updates to show a link to this package in Quilt. For example, it might show “Dataset Package: User/BenchlingNotebook/EntryName – Version 1 created at 2025-09-25 14:05”. The researcher can click that link to view the package in Quilt’s web UI if desired.

  5. On-Demand Updates: Later, if the researcher updates the Benchling entry (perhaps adding new results or correcting data), they can trigger a new version. Simply clicking the “Update Package” button on the Quilt canvas will send off another event to Quilt. The integration then packages the updated data as a new Quilt package version. The canvas will update to reflect “Version 2” with the latest timestamp. In this way, the scientist can version their dataset with one click, whenever they reach a new milestone or want to snapshot the current state.

  6. Accessing Data via Quilt: All the data is now available through Quilt. If the team later wants to analyze the data, anyone with access can use Quilt’s tools to pull it. For instance, a computational biologist could use the Quilt Python SDK to load the dataset directly into a Jupyter notebook or download it from the Quilt web catalog. The links in Benchling make it effortless to find the right package. Even months later, anyone reviewing the Benchling entry can retrieve the exact data by following the Quilt link. Nothing is lost, and everything is documented.

For lab managers or IT admins, there are also straightforward processes to manage this integration (monitoring the AWS components, setting up permissions, etc.), but those details are beyond the scope of this post. The takeaway is that from a user’s perspective, it’s extremely simple: write in Benchling, click save, and your data is packaged and ready to go. The learning curve is minimal, since the integration sits invisibly in the background until you need it.

Under the Hood: Modern Cloud Design and Security

It’s worth noting some of the technical innovations that make this integration reliable and secure (even if you don’t see them directly):

  • Event-Driven Architecture: The integration is entirely event-driven, responding to Benchling’s webhooks in real time. This means there’s no polling or manual exporting needed – data flows as soon as it’s generated. It also means the system scales out automatically under higher loads (multiple events can be processed in parallel by separate Lambda instances and Step Function executions).

  • OAuth2.0 Secure Access: The Quilt app uses Benchling’s OAuth2 for authentication, so it only accesses data that the user or organization has explicitly permitted. All data transfers happen over HTTPS. No credentials are stored in plain text – instead, secure tokens (often stored in AWS Secrets Manager) are used by the Lambdas to call the Benchling API. This ensures the integration adheres to enterprise security practices while connecting the two platforms.

  • Robust Error Handling and Retries: Lab workflows can be unpredictable – maybe an internet hiccup or a Benchling API rate limit could cause a failed export. The integration is built with robust error handling and retry logic. If a packaging attempt fails, it can retry or queue the task without losing the event. AWS Step Functions can maintain state and attempt steps again on failure, and dead-letter queues catch any unprocessed events. This design avoids data falling through the cracks; lab managers can trust that every entry will eventually be packaged, even if minor issues occur along the way.

  • Template-Driven Metadata: The system uses templates to generate README documentation and metadata for each Quilt package. This ensures a consistent structure across packages – for example, every package might include sections like “Originating Benchling Entry URL”, “Entry author”, “Experiment date”, etc., pulled from Benchling. This consistency makes it easier to search and navigate packages on Quilt (e.g., you could search the Quilt catalog for all packages from a certain project or with a certain assay type, if those are included as metadata). It basically brings the rich context from Benchling into the data lake, so you don’t end up with unlabeled files in S3.

  • Scalable & Serverless: Because the integration leverages AWS Lambda, Step Functions, S3, and SQS, it requires zero infrastructure maintenance. There are no servers to run or update. It also scales with usage – if 50 experiments finish at the same time, AWS will spawn the necessary Lambdas in parallel. If nothing is happening (e.g., no one is using Benchling at midnight), there’s no compute running at all (and no cost incurred except minimal standby resources). This makes it cost-efficient and highly reliable. The architecture is designed to handle enterprise workloads (hundreds of scientists, thousands of notebook entries) without a hiccup.
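The template-driven documentation described above needs nothing more exotic than the Python standard library. This sketch uses hypothetical field names; the real templates in the integration may differ:

```python
# Sketch of template-driven README generation; field names are illustrative.
from string import Template

README_TEMPLATE = Template("""\
# $entry_name

- **Originating Benchling Entry URL:** $entry_url
- **Entry author:** $author
- **Experiment date:** $date

$notes
""")

def render_readme(entry):
    """Fill the template from fields pulled out of a Benchling entry."""
    return README_TEMPLATE.substitute(
        entry_name=entry["name"],
        entry_url=entry["url"],
        author=entry["author"],
        date=entry["date"],
        notes=entry.get("notes", ""),
    )
```

Because every package README comes from the same template, the resulting packages share a predictable structure that catalog search can rely on.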

For those interested in implementing or customizing this integration, these technical choices mean it’s both powerful and extensible. The open-source repository (the “Benchling Packager”) demonstrates patterns that could be adapted to other ELNs or data sources as well. But even if you’re not an engineer, you can appreciate that a lot of thoughtful engineering underpins the smooth user experience.
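For those adapting these patterns, the secure token handling mentioned earlier might be sketched roughly as follows. The secret name and tenant hostname are assumptions, and the Benchling endpoint path is illustrative rather than a guaranteed contract:

```python
# Rough sketch of how a Lambda might fetch an OAuth token and call the
# Benchling API. Secret name, tenant hostname, and endpoint path are
# illustrative assumptions.
import json
import urllib.request

def get_benchling_token(secret_name="quilt-benchling/oauth"):
    """Read stored OAuth credentials from AWS Secrets Manager."""
    import boto3  # available in the Lambda runtime
    client = boto3.client("secretsmanager")
    secret = client.get_secret_value(SecretId=secret_name)
    return json.loads(secret["SecretString"])["access_token"]

def fetch_entry(entry_id, token, tenant="example.benchling.com"):
    """Retrieve a notebook entry over HTTPS with a bearer token."""
    req = urllib.request.Request(
        f"https://{tenant}/api/v2/entries/{entry_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Note that no credential ever appears in code or configuration files; the Lambda resolves it from Secrets Manager at runtime, which matches the security posture described above.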

Conclusion: A Step Toward the Connected Lab

The Quilt–Benchling integration represents a significant step toward a truly connected lab environment. Automatically linking a bench scientist’s notebook to a cloud-based data hub eliminates data silos and manual transfers, allowing researchers to focus on science rather than IT logistics. Wet lab scientists gain the benefits of modern data management (version control, cloud backups, shareable datasets) without leaving their ELN, and dry lab scientists or data engineers get immediate access to freshly generated data in a structured way.

Such seamless integration is valuable not just for convenience, but for scientific rigor – every piece of data is accounted for, versioned, and associated with its experimental context. Lab managers can more easily enforce data management best practices without relying on nagging their teams; the infrastructure simply takes care of it. And teams can onboard new members or external partners more smoothly by pointing them to Quilt packages instead of emailing large files or granting access to a patchwork of systems.

In a broader sense, connecting Benchling with Quilt is about bridging the wet–dry lab divide. It acknowledges that modern life science research spans both physical experiments and digital data analysis, and it provides a highway between the two. We believe this will help labs accelerate discovery – when data moves freely and is accessible in the right format to the right people, insights come faster.

For scientists and lab leaders considering this integration, the message is: you don’t have to change how you work in Benchling, but you’ll instantly gain a robust data backbone powered by Quilt. It’s an easy win for better data management. We’re excited to see how labs use this capability – whether it’s tracking an assay’s lineage across versions, collaborating with bioinformaticians on live datasets, or ensuring compliance with a full audit trail of your research data.

Interested in trying it out? You can find the Benchling integration documentation and setup instructions on our website and GitHub.

Here’s to more connected, efficient, and data-driven science!
