Synthetic biology has reached the same inflection point that computer science reached in the 1950s.
The foundational pieces are emerging in the form of standardized DNA parts, packaged for combinatorial assembly using standards such as BioBricks.
But:
- Creating new DNA parts is tedious and time-consuming.
- Constructing systems from sets of DNA parts is an ad hoc, manual process that limits the size, complexity, and capability of the resulting systems.
The time is right for automation of biological design.
We work at the intersection of synthetic biology and computer science. We have introduced and pursued a vision of a toolchain that stretches from high-level languages to cellular implantation of genetic circuits. We have developed tools for high-level design and data representation. Our efforts have also focused on the reproducibility of results, improving device libraries, and high-precision prediction.
We apply lessons learned and techniques from computer science and artificial intelligence to synthetic biology. Engineering practices such as libraries of parts, modularization, standards and interfaces, computer aided design, and AI techniques can advance the capabilities of synthetic biology.
- Built the first end-to-end toolchain for synthetic biology design automation, including the BioCompiler, which outperforms human designers.
- Participate actively in the Synthetic Biology Open Language (SBOL), which serves as a hub for linking many different synthetic biology resources.
- Led the iGEM interlab study, which examines characterization and reproducibility of results across hundreds of laboratories.
- Developed a calibrated flow cytometry method to measure, compare, and combine biological circuit components, which enabled RTX BBN Technologies' high-precision quantitative prediction software to make more accurate predictions.
Our research spans the boundary between academia and industry and between data-driven/AI methods and wet-lab investigations.
Areas of work include:
- Calibrated measurement and characterization of genetic devices
- Engineering of high-performance genetic regulation devices
- Representation and sharing of designs, protocols, and data
- Sequence-based detection of pathogens, toxins, and signs of engineering
- Modeling and manipulating the 3D growth and differentiation of cells
- Applied capabilities of multi-organism communities
- Advanced biological detection methods
Synthetic biology is important for diverse applications, including:
- new medical diagnostics and therapies,
- environmental remediation and sensing, and
- chemical production or detection.
Under DARPA’s Bio Reporters for Subterranean Surveillance program we are exploring a method to use naturally occurring fungus to detect buried TNT. By using natural soil fungal webs to propagate engineered bacteria underground and send signals back up to the surface, we plan to create a warning system that glows under ultraviolet light to indicate the presence of buried TNT.
Under DARPA’s Friend or Foe program, scientists at Raytheon Technologies are developing a portable device to detect bacteria and evaluate their potential to cause harm as soon as or even before they pose a threat to civilian and military populations. This system will characterize bacteria quickly by examining their behavior, whereas current surveillance techniques don’t work on undiscovered bacterial strains or on bacteria engineered to evade detection.
Right now, there is no technology that can quickly detect engineered microorganisms. For the Intelligence Advanced Research Projects Activity (IARPA), we’re developing a system that adapts proven microfluidics hardware and uses our proven cybersecurity techniques to identify microorganisms based on their DNA sequences.
Publications
Journal Articles
Bartley, Bryan, Jacob Beal, Miles Rogers, Daniel Bryce, Robert P. Goldman, Benjamin Keller, Peter Lee, Vanessa Biggers, Joshua Nowak, and Mark Weston. "Building an Open Representation for Biological Protocols." ACM Journal on Emerging Technologies in Computing Systems 19, no. 3 (2023): 1-21. https://www.biorxiv.org/content/10.1101/2022.07.05.498808v1
Abstract
Laboratory protocols are critical to biological research and development, yet difficult to communicate and reproduce across projects, investigators, and organizations. While many attempts have been made to address this challenge, there is currently no available protocol representation that is unambiguous enough for precise interpretation and automation, yet simultaneously abstract enough to enable reuse and adaptation. The Protocol Activity Markup Language (PAML) is a free and open protocol representation aiming to address this gap, building on a foundation of UML, Autoprotocol, and SBOL RDF. PAML provides a representation both for protocols and for records of their execution and the resulting data, as well as a framework for exporting from PAML for execution by either humans or laboratory automation. PAML is currently implemented in the form of an RDF knowledge representation, specification document, and Python library, can be exported for execution as either a manual “paper protocol” or Autoprotocol, and is being further developed as an open community effort.
Cummins, Breschine, Justin Vrana, Robert C. Moseley, Hamed Eramian, Anastasia Deckard, Pedro Fontanarrosa, Daniel Bryce, Jacob Beal, Bryan Bartley, Tom Mitchell, Tramy Nguyen, Nicholas Roehner, et al. "Robustness and reproducibility of simple and complex synthetic logic circuit designs using a DBTL loop." Synthetic Biology 8, no. 1 (2023): ysad005. https://pubmed.ncbi.nlm.nih.gov/37073283/
Abstract
Computational tools addressing various components of design-build-test-learn (DBTL) loops for the construction of synthetic genetic networks exist but do not generally cover the entire DBTL loop. This manuscript introduces an end-to-end sequence of tools that together form a DBTL loop called Design Assemble Round Trip (DART). DART provides rational selection and refinement of genetic parts to construct and test a circuit. Computational support for experimental process, metadata management, standardized data collection and reproducible data analysis is provided via the previously published Round Trip (RT) test-learn loop. The primary focus of this work is on the Design Assemble (DA) part of the tool chain, which improves on previous techniques by screening up to thousands of network topologies for robust performance using a novel robustness score derived from dynamical behavior based on circuit topology only. In addition, novel experimental support software is introduced for the assembly of genetic circuits. A complete design-through-analysis sequence is presented using several OR and NOR circuit designs, with and without structural redundancy, that are implemented in budding yeast. The execution of DART tested the predictions of the design tools, specifically with regard to robust and reproducible performance under different experimental conditions. The data analysis depended on a novel application of machine learning techniques to segment bimodal flow cytometry distributions. Evidence is presented that, in some cases, a more complex build may impart more robustness and reproducibility across experimental conditions.
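As a rough illustration of what segmenting a bimodal flow cytometry distribution involves, the sketch below splits fluorescence values into low and high populations using one-dimensional two-means clustering on a log scale. This is a swapped-in stand-in, not the machine learning method used in the paper; the function name `split_bimodal` and the sample readings are hypothetical.

```python
import math


def split_bimodal(values, max_iterations=100):
    """Return a threshold separating two populations (two-means on log10 values).

    Illustrative only: the paper's actual segmentation method is not reproduced here.
    """
    logs = [math.log10(v) for v in values if v > 0]
    if not logs:
        raise ValueError("need at least one positive value")
    # Start the two cluster centers at the extremes of the data.
    lo, hi = min(logs), max(logs)
    for _ in range(max_iterations):
        midpoint = (lo + hi) / 2
        low = [x for x in logs if x <= midpoint]
        high = [x for x in logs if x > midpoint]
        if not low or not high:
            break  # only one population is present
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    # Convert the midpoint between the two centers back to linear units.
    return 10 ** ((lo + hi) / 2)


# Hypothetical fluorescence readings (arbitrary units), not measured data.
readings = [100, 120, 90, 110, 10000, 12000, 9000]
threshold = split_bimodal(readings)
off_cells = [v for v in readings if v <= threshold]
on_cells = [v for v in readings if v > threshold]
```

Working on a log scale is a common choice for fluorescence data, which typically spans orders of magnitude; a mixture model would be a natural next step if the two populations overlap.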
Mante, Jeanet, Julian Abam, Sai P. Samineni, Isabel M. Pötzsch, Jacob Beal, and Chris J. Myers. "Excel-SBOL Converter: Creating SBOL from Excel Templates and Vice Versa." ACS Synthetic Biology 12, no. 1 (2023): 340-346. https://pubs.acs.org/doi/10.1021/acssynbio.2c00521
Abstract
Standards support synthetic biology research by enabling the exchange of component information. However, using formal representations, such as the Synthetic Biology Open Language (SBOL), typically requires either a thorough understanding of these standards or a suite of tools developed in concurrence with the ontologies. Since these tools may be a barrier for use by many practitioners, the Excel–SBOL Converter was developed to facilitate the use of SBOL and integration into existing workflows. The converter consists of two Python libraries: one that converts Excel templates to SBOL and another that converts SBOL to an Excel workbook. Both libraries can be used either directly or via a SynBioHub plugin.
Aldulijan, Ibrahim, Jacob Beal, Sonja Billerbeck, Jeff Bouffard, Gaël Chambonnier, Nikolaos Ntelkis, Isaac Guerreiro et al. "Functional synthetic biology." Synthetic Biology 8, no. 1 (2023): ysad006. https://arxiv.org/abs/2207.00538
Abstract
Synthetic biologists have made great progress over the past decade in developing methods for modular assembly of genetic sequences and in engineering biological systems with a wide variety of functions in various contexts and organisms. However, current paradigms in the field entangle sequence and functionality in a manner that makes abstraction difficult, reduces engineering flexibility, and impairs predictability and design reuse. Functional Synthetic Biology aims to overcome these impediments by focusing the design of biological systems on function, rather than on sequence. This reorientation will decouple the engineering of biological devices from the specifics of how those devices are put to use, requiring both conceptual and organizational change, as well as supporting software tooling. Realizing this vision of Functional Synthetic Biology will allow more flexibility in how devices are used, more opportunity for reuse of devices and data, improvements in predictability, and reductions in technical risk and cost.
Beal, Jacob, Adam Clore, and Jeff Manthey. "Studying pathogens degrades BLAST-based pathogen identification." Scientific Reports 13, no. 1 (2023): 5390. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10068195/
Abstract
As synthetic biology becomes increasingly capable and accessible, it is likewise increasingly critical to be able to make accurate biosecurity determinations regarding the pathogenicity or toxicity of particular nucleic acid or amino acid sequences. At present, this is typically done using the BLAST algorithm to determine the best match with sequences in the NCBI nucleic acid and protein databases. Neither BLAST nor any of the NCBI databases, however, are actually designed for biosafety determination. Critically, taxonomic errors or ambiguities in the NCBI nucleic acid and protein databases can also cause errors in BLAST-based taxonomic categorization. With heavily studied taxa and frequently used biotechnology tools, even low frequency taxonomic categorization issues can lead to high rates of errors in biosecurity decision-making. Here we focus on the implications for false positives, finding that BLAST against NCBI’s protein database will now incorrectly categorize a number of commonly used biotechnology tool sequences as the pathogens or toxins with which they have been used. Paradoxically, this implies that problems are expected to be most acute for the pathogens and toxins of highest interest and for the most widely used biotechnology tools. We thus conclude that biosecurity tools should shift away from BLAST against general purpose databases and towards new methods that are specifically tailored for biosafety purposes.
Beal, Jacob, Vinoo Selvarajah, Gaël Chambonnier, Traci Haddock, Alejandro Vignoni, Gonzalo Vidal, and Nicholas Roehner. "Standardized Representation of Parts and Assembly for Build Planning." ACS Synthetic Biology 12, no. 12 (2023): 3646-3655. https://pubs.acs.org/doi/10.1021/acssynbio.3c00418
Abstract
The design and construction of genetic systems, in silico, in vitro, or in vivo, often involve the handling of various pieces of DNA that exist in different forms across an assembly process: as a standalone “part” sequence, as an insert into a carrier vector, as a digested fragment, etc. Communication about these different forms of a part and their relationships is often confusing, however, because of a lack of standardized terms. Here, we present a systematic terminology and an associated set of practices for representing genetic parts at various stages of design, synthesis, and assembly. These practices are intended to represent any of the wide array of approaches based on embedding parts in carrier vectors, such as BioBricks or Type IIS methods (e.g., GoldenGate, MoClo, GoldenBraid, and PhytoBricks), and have been successfully used as a basis for cross-institutional coordination and software tooling in the iGEM Engineering Committee.
Mo, Yuanqiu, Soura Dasgupta, and Jacob Beal. "Stability and Resilience of Distributed Information Spreading in Aggregate Computing." IEEE Transactions on Automatic Control, Jan 5, 2023. doi: 10.1109/TAC.2022.3140253 https://ieeexplore.ieee.org/document/9670637
Abstract
Spreading information through a network of devices is a core activity for most distributed systems. Self-stabilizing algorithms for information spreading are one of the key building blocks enabling aggregate computing to provide resilient coordination in open complex distributed systems. This article improves a general spreading block in the aggregate computing literature by making it resilient to network perturbations, establishes its global uniform asymptotic stability, and proves that it is ultimately bounded under persistent disturbances. The ultimate bounds depend only on the magnitude of the largest perturbation and the network diameter, and three design parameters trading off competing aspects of performance. For example, as in many dynamical systems, values leading to greater resilience to network perturbations slow convergence and vice versa.
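For readers unfamiliar with self-stabilizing information spreading, the sketch below shows a basic distance-to-source estimate of the kind that such spreading blocks generalize: each device repeatedly sets its value to zero if it is a source, and otherwise to the smallest neighbor value plus the connecting edge length. It is a minimal illustration under assumed names (`spread_step`, `run_until_stable`) and a made-up four-device network; it does not include the resilience improvements or the three design parameters described in the article.

```python
import math


def spread_step(values, edges, sources):
    """One synchronous round of the basic distance-estimate update."""
    updated = {}
    for node in values:
        if node in sources:
            updated[node] = 0.0
        else:
            # Smallest neighbor estimate plus edge length; infinity if isolated.
            updated[node] = min(
                (values[nbr] + length for nbr, length in edges.get(node, [])),
                default=math.inf,
            )
    return updated


def run_until_stable(values, edges, sources, max_rounds=1000):
    """Repeat rounds until no value changes; return the stable values."""
    for _ in range(max_rounds):
        updated = spread_step(values, edges, sources)
        if updated == values:
            return values
        values = updated
    raise RuntimeError("values did not stabilize within max_rounds")


# Hypothetical line network a - b - c - d with edge lengths 1, 2, 1.
edges = {
    "a": [("b", 1)],
    "b": [("a", 1), ("c", 2)],
    "c": [("b", 2), ("d", 1)],
    "d": [("c", 1)],
}
# Arbitrary (wrong) starting values, standing in for the state after a perturbation.
initial = {"a": 5.0, "b": 0.0, "c": 7.0, "d": 0.5}
stable = run_until_stable(initial, edges, sources={"a"})
```

Starting from deliberately wrong values shows the self-stabilizing property: the estimates settle on the true distances from the source regardless of the initial state.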
Buecherl, Lukas, Thomas Mitchell, James Scott-Brown, Prashant Vaidyanathan, Gonzalo Vidal, Hasan Baig, Bryan Bartley, Jacob Beal, et al. "Synthetic biology open language (SBOL) version 3.1.0." Journal of Integrative Bioinformatics 20, no. 1 (2023): 20220058. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10063177/
Abstract
Synthetic biology builds upon genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. When designing a synthetic system, synthetic biologists need to exchange information about multiple types of molecules, the intended behavior of the system, and actual experimental measurements. The Synthetic Biology Open Language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, following an open community process involving both bench scientists and scientific modelers and software developers, across academia, industry, and other institutions. This document describes SBOL 3.1.0, which improves on version 3.0.0 by including a number of corrections and clarifications as well as several other updates and enhancements. First, this version includes a complete set of validation rules for checking whether documents are valid SBOL 3. Second, the best practices section has been moved to an online repository that allows for more rapid and interactive sharing of these conventions. Third, it includes updates based upon six community-approved enhancement proposals. Two enhancement proposals are related to the representation of an object’s namespace. In particular, the Namespace class has been removed and replaced with a namespace property on each class. Another enhancement is the generalization of the CombinatorialDerivation class to allow direct use of Features and Measures. Next, the Participation class now allows Interactions to be participants to describe higher-order interactions. Another change is the use of Sequence Ontology terms for Feature orientation. Finally, this version of SBOL has generalized from using Uniform Resource Identifiers (URIs) to Internationalized Resource Identifiers (IRIs) to support international character sets.
Goldman, Robert P., Robert Moseley, Nicholas Roehner, Breschine Cummins, Justin D. Vrana, Katie J. Clowers, Daniel Bryce, Jacob Beal, Bryan Bartley, Richard Markeloff, Tom Mitchell, Tramy Nguyen, Daniel Sumorok, et al. "Highly-automated, high-throughput replication of yeast-based logic circuit design assessments." Synthetic Biology 7, no. 1 (2022): ysac018. https://pubmed.ncbi.nlm.nih.gov/36285185/
Abstract
We describe an experimental campaign that replicated the performance assessment of logic gates engineered into cells of Saccharomyces cerevisiae by Gander et al. Our experimental campaign used a novel high-throughput experimentation framework developed under Defense Advanced Research Projects Agency's Synergistic Discovery and Design program: a remote robotic lab at Strateos executed a parameterized experimental protocol. Using this protocol and robotic execution, we generated two orders of magnitude more flow cytometry data than the original experiments. We discuss our results, which largely, but not completely, agree with the original report and make some remarks about lessons learned.
Conference Papers
Rosenstein, J. K.; Yin, Y.; Hu, K.; Epstein, S.; Wanunu, M.; Adler, A.; Larkin, J. W., “Live-Cell Imaging with Integrated Capacitive Sensor Arrays”, In 2023 International Electron Devices Meeting (IEDM) (pp. 1-4). IEEE. (2023, December) https://ieeexplore.ieee.org/abstract/document/10413778
Abstract
Capacitive imaging is a near-field sensing technique that detects changes in the dielectric properties and geometry of materials above a microelectrode array. Taking advantage of modern integrated circuits, dense capacitive sensing arrays can now be created at the scales of single biological cells. These non-optical imaging arrays offer intriguing possibilities for cell culture monitoring, offering low cost, portability, single-cell resolution, a wide field of view, and co-integration with multiple electrochemical sensing and stimulation modes. Here we review state-of-the-art examples of capacitive imaging arrays and present new demonstrations of all-electrical imaging of growing bacterial cultures.
Wyschogrod, Daniel, Steven Murphy, Jacob Beal, and Allison Taggart, "Securing Fieldable Bioinformatics", 10th International Workshop on High Performance Computing on Bioinformatics at IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2023), December 2023. https://www.computer.org/csdl/proceedings-article/bibm/2023/10385290/1TOc43AaM2Q
Abstract
Nanopore sequencing offers promising potential for rapid and target-agnostic sensing and diagnostics in field applications. However, realizing this potential necessitates addressing not only sample collection and processing challenges but also concerns related to information security and privacy when deployed in field devices. We thus introduce the Secure Bloom-Filter Analysis and Compression (SB-FAC) architecture, which breaks bioinformatics computations into a Bloom-filter-based field preprocessing stage that identifies regions of interest in the raw read data and a server-side interpretation stage that combines and interprets these identified regions. Sensitive information encoded in the Bloom filter can be protected from extraction by the use of a cryptographic hash paired with a salt in a Trusted Platform Module (TPM). We experimentally validate the predicted scaling of this approach, confirming that cost per operation is linear in salt length, while reverse engineering cost is exponential.
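The general idea of a Bloom filter keyed by a salted cryptographic hash can be sketched as follows. This is a minimal illustration, not the SB-FAC implementation: the class and function names, the use of SHA-256, the k-mer length, and the sequences are all assumptions made for the example, and the salt is held in ordinary memory rather than in a TPM.

```python
import hashlib


class SaltedBloomFilter:
    """Bloom filter whose bit positions depend on a secret salt (illustrative)."""

    def __init__(self, num_bits, num_hashes, salt):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # must be below 256 for the counter byte
        self._salt = salt  # assumption: a fielded device would keep this in a TPM
        self._bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # One bit position per hash function: SHA-256(salt || counter || item).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(self._salt + bytes([i]) + item.encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def regions_of_interest(read, bloom, k):
    """Return start offsets of k-mers in the read that hit the filter."""
    return [i for i, kmer in enumerate(kmers(read, k)) if kmer in bloom]


# Hypothetical target sequence and read, for illustration only.
K = 8
target = "ACGTTGCATGCCGATAGGCT"
bloom = SaltedBloomFilter(num_bits=1 << 16, num_hashes=4, salt=b"hypothetical-salt")
for kmer in kmers(target, K):
    bloom.add(kmer)

read = "AAAAAAAA" + target[4:16] + "AAAAAAAA"
hits = regions_of_interest(read, bloom, K)
```

Offsets 8 through 12 of `read` are always reported, because a Bloom filter has no false negatives; other offsets can appear only as rare false positives. Without the salt, the bit pattern alone does not reveal which k-mers were stored, which is the property the salted hash is meant to provide.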
Conference Abstracts
Aaron Adler, Jacob Beal, Partha Pal, Miles Rogers, Dan Wyschogrod. "Peering into the Cyberbio Threat Horizon." In Hudson CM, Pattengale ND, Iyer RK, Kalbarczyk ZT, Alli N. Genomic and Synthetic Biology Digital Biosecurity. Pac Symp Biocomput. 2022;27:402-406. PMID: 34890167.
Abstract
As threats in the realm of digital biosecurity escalate, can we learn from past cyber domain experience to distinguish imminent critical risks from science fiction?
The history of cyber security provides both a cautionary tale and a potential roadmap for anticipating and mitigating digital threats in the expanding bioeconomy. Since the relatively primitive and low-consequence cyber attacks of the 1990s, threats and countermeasures have co-evolved. As the varieties of attacks expanded from the purely digital into the realm of the “cyber-physical”, attackers have held the advantage despite enormous investments in mitigations. Much of this advantage stems from two facts. First, digital infrastructure such as networks, operating systems, and mobile devices was designed for innovative functionality rather than security. Second, whatever hardening was provided was designed around current-day threats.
Synthetic biology and automated laboratory processes are now increasingly intertwined with the cyber world. When discussing threats, we often hear the refrain "that's not a current threat". By focusing exclusively on enhancing the capabilities of automated biology and ignoring the domain-specific threats, we risk repeating the past and handing attackers a long-lasting advantage as vulnerable systems become universal standards. It is therefore necessary to focus on a wide variety of threats, characterize them by time horizons, and prioritize effective mitigations that respect the pace of innovation.
We will argue that cyberbio threats belong to three categories:
- Conventional cyber attacks (e.g. database pollution, information theft, etc.)
- Biological cyber-physical attacks, which rely on or attack biological processes in the physical world (e.g. sequencers, desktop synthesis, sensors and IoT devices)
- Hybrid attacks, which co-opt benign processes and labs to unwittingly synthesize attacker-directed products
We will present Category 1 attacks that exist today and Category 2 attacks that are either feasible now or in the near future. Although the complex attacks in Category 3 may currently be hypothetical in light of current synthetic biology capabilities, we argue for a careful assessment of the attack time horizon given increasing network connectivity and the rapid pace of biological advancement.
We will discuss examples of attacks in each of the above categories, analyzing the obstacles attackers would have to overcome, how difficult that would be, and the necessary time horizon. We will also discuss the predictive countermeasures that researchers, laboratories, and businesses can implement to forestall these attacks.
Aaron Adler, Jacob W. Crandall, Michael A. Goodrich, Knut Hinkelmann, Mayank Kejriwal, Eric Kildebeck, Andreas Martin, Abhinav Shrivastava. "Reports of the Association for the Advancement of Artificial Intelligence’s 2022 Spring Symposium Series" AI Magazine. Volume 43, Issue 2. Jul 1, 2022.
Bryan Bartley, Jacob Beal, Alexis Casas, Jeremy Cahill, Timothy Fallon, Daniel Bryce, Robert P. Goldman, Luiza Hesketh, Tim Dobbs, Alejandro Vignoni, “Implementing Cross-Platform Protocol Execution with the Protocol Activity Modeling Language,” International Workshop on Bio-Design Automation (IWBDA), October, 2022.
Abstract
Laboratory protocols are used for a wide range of purposes in research and development, at many different stages, including experiment design, execution, data analysis, interpretation, and communication and sharing with other groups. However, protocols are often difficult to communicate or reproduce, given the differences in context, skills, instruments, and other resources between different projects, investigators, and organizations. To this end, the Bioprotocols Working Group has developed a draft specification for a unified protocol modeling language, called the Protocol Activity Modeling Language (PAML). The PAML data model has been designed to support the following needs:
- Execution by either humans or machines
- Maintaining execution records and associated metadata markup
- Mapping protocols from one laboratory environment to another
- Recording modifications of protocols and the relationship between different versions
- Verification and validation of protocol completeness and coherence
- Planning, scheduling, and allocation of laboratory resources
Here we describe recent progress implementing PAML and demonstrating that it can be translated to and executed across different laboratory platforms in order to address use cases presented by the stakeholder community.
Bryan Bartley, Alexis Casas, Jacob Beal, “Early Adoption of Machine-readable Protocols in the iGEM Community,” HARMONY 2022, April 2022.
Abstract
The annual International Genetically Engineered Machines (iGEM) competition, involving hundreds of collegiate teams each year, is an ideal forum for promoting democratized science and early-career adoption of standards. One of the goals of iGEM is to educate and promote good engineering practice among students. Toward this end, the iGEM Engineering Committee organizes interlaboratory studies to develop reproducible measurement practices, in which students are both contributors and beneficiaries. This year the Interlaboratory Working Group is piloting the use of the Protocol Activity Markup Language (PAML) for machine-readable protocols in the iGEM community. In this talk, we will discuss the outlook for and progress toward early adoption of PAML in the iGEM community.
Robert Goldman, Dan Bryce, Bryan Bartley, Jacob Beal, “The Container Ontology and Its Server,” HARMONY 2022, April 2022.
Abstract
The Container Ontology is a shared model of containers used in biological protocols. In work on the proposed protocol representation standard, PAML, we determined that a key task in adapting a protocol from one site to another was mapping the containers used in one lab to those available at another. The containers available at a lab are in turn constrained by the equipment available there, as well as miscellaneous other considerations. This PAML requirement argued for a shared framework for describing containers, and further that this shared framework should support identifying containers meeting descriptions. This, and a desire to be PAML-compatible, led us to base our model on the Web Ontology Language (OWL). In order to make uptake easier, since commissioning OWL tools can be difficult, we distribute a ready-to-use Docker image with the container ontology. We will describe the container ontology, the server, and directions for further development.
Jacob Beal, “Implementing Safe Cross-Document RDF References,” COMBINE 2022, October 2022
Abstract
RDF has been widely used as a means of sharing bioinformatic knowledge, e.g., for biomedical ontologies and for RDF-based design representations such as SBOL 3. While RDF makes it easy for documents or datastores to refer to knowledge in other documents or datastores, such references are fragile and subject to unexpected failure, since there are no guarantees provided regarding changes to the contents of the knowledge that is referred to. As a result, RDF representations tend to accrete either broken links or confusing collections of "forked" information.
In SBOL, this has become a critical problem regarding the sharing of information about genetic designs. To this end, the SBOL Enhancement Proposal (SEP) 054 has proposed a set of practices for managing genetic design packages, which allows design information to be treated more like version controlled software.
The keystone for implementing this proposal is management of cross-document references in a way that both maintains RDF's URI guarantees and supports version control of information. In this talk, we will discuss the pragmatics of this challenge, current implementation in SBOL utilities, and potential implications for other uses of RDF.
Daniel Bryce, Jacob Beal, Bryan Bartley, Timothy Fallon, Jeremy Cahill, Mark Doerr, “Laboratory Open Protocol (LabOP),” COMBINE 2020, October 2020.
Timothy R. Fallon, Daniel Bryce, Jacob Beal, Jeremy Cahill, Mark Doerr, “Laboratory Open Protocol: developing an open community standard for biological protocols,” Global Biofoundries Alliance 2022, October 2022. https://dl.acm.org/doi/10.1145/3604568
Abstract
Laboratory protocols are critical to biological research and development, yet difficult to communicate and reproduce across projects, investigators, and organizations. While many attempts have been made to address this challenge, there is currently no available protocol representation that is unambiguous enough for precise interpretation and automation, yet simultaneously “human friendly” and abstract enough to enable reuse and adaptation. The Laboratory Open Protocol language (LabOP) is a free and open protocol representation aiming to address this gap, building on a foundation of UML, Autoprotocol, Aquarium, SBOL RDF, and the Provenance Ontology. LabOP provides a linked-data representation both for protocols and for records of their execution and the resulting data, as well as a framework for exporting from LabOP for execution by either humans or laboratory automation. LabOP is currently implemented in the form of an RDF knowledge representation, specification document, and Python library, and supports execution as manual “paper protocols,” by Autoprotocol or by Opentrons. From this initial implementation, LabOP is being further developed as an open community effort.
Ibrahim Aldulijan, Jacob Beal, Sonja Billerbeck, Jeff Bouffard, Gael Chambonnier, Nikolaos Delkis, Isaac Guerreiro, Martin Holub, Daisuke Kiga, Jacky Loo, Paul Ross, Vinoo Selvarajah, Noah Sprent, Gonzalo Vidal, Alejandro Vignoni, “Steps Towards Functional Synthetic Biology,” International Workshop on Bio-Design Automation (IWBDA), October, 2022. https://easychair.org/publications/preprint/dHXK
Abstract
While synthetic biology has made great progress in methods for modular assembly of genetic sequences and in engineering biological systems with a wide variety of functions, current paradigms entangle sequence and functionality in a manner that makes abstraction difficult, reduces engineering flexibility, and impairs predictability and design reuse. Functional Synthetic Biology proposes a roadmap to overcome these limits by focusing on behavior descriptions, predictability, flexibility, and risk reduction, so synthetic biologists can more effectively share successes and avoid failures. The iGEM community, like other synthetic biology communities, faces challenges in the effective sharing and reuse of biological devices. These are particularly acute for iGEM, since iGEM teams need to execute projects in only a few months and many team members have little prior experience. At the same time, barriers for adoption are lowered by the culture of openness, sharing, and reuse that is encouraged by iGEM. For these reasons, the iGEM Engineering Committee has been working to implement the early phases of the Functional Synthetic Biology roadmap in the context of iGEM’s annual DNA distribution.
Jacob Beal, “FAST-NA Scanner: increased speed and accuracy for biological threat screening”, Biodetection, June 2022.
Abstract
FAST-NA Scanner adapts methods from cybersecurity to produce significant improvements in the detection of biological pathogens and toxins in nucleic acid sequences. Using these methods, minimum sequence length can be reduced from 200bp to 50bp while simultaneously reducing false positive rates below 2%. Moreover, detection speed is orders of magnitude faster than BLAST, allowing this to be applied not only to nucleic acid screening, but diverse other applications including combinatorial oligo screening, interpretation of sequencer reads, and screening of sample and design collections.
Jacob Beal, Vinoo Selvarajah, Gael Chambonnier, Traci Haddock-Angelli, Alejandro Vignoni, Gonzalo Vidal, Nicholas Roehner, “Standardizing the Representation of Parts and Devices for Build Planning,” International Workshop on Bio-Design Automation (IWBDA), October, 2022. https://easychair.org/publications/preprint/kRPr
Standardizing the Representation of Parts and Devices for Build Planning
Abstract
One of the most common tasks in synthetic biology is building genetic constructs by assembling smaller parts. Despite this commonality, however, there is often much confusion when practitioners communicate about parts, sequences, and build plans. Parts often go through many stages during a build process, each with a different sequence. For example, a fragment of DNA may be synthesized as an insert into a vector backbone, then digested out of that backbone and assembled together with other fragments to produce a final construct. At present, without a shared standard for describing build plans, it is often difficult to tell which stage a given sequence is describing, leading to frequent confusion, errors, difficulty sharing information, and waste.
We address this problem with a standard vocabulary for describing build plans, which we have further mapped into a concrete representation using the SBOL 3 standard. Specifically, we target representation of assembly based on digestion and ligation, supporting at least BioBricks Assembly and Type IIS assemblies like GoldenGate, MoClo, and GoldenBraid. The resulting vocabulary should be useful to practitioners no matter what tools or representations they may be using, while representation in SBOL 3 provides full details for use by software tool builders.
Jacob Beal, Dan Wyschogrod, Adam Clore, Jeff Manthey, Tom Mitchell, Steve Murphy, “FAST-NA Scanner: high-speed, low-SWaP computational assessment of biological threats,” 2022 Chemical and Biological Defense Science and Technology (CBD S&T) Conference, December 2022.
Abstract
As DNA synthesis becomes cheaper and more accessible, there is a corresponding increase in opportunities for synthesis of dangerous pathogenic sequences by either malicious or careless actors. To mitigate this threat, major DNA synthesis providers screen sequence orders for pathogenic content, following guidance from the US Department of Health and Human Services and the International Genome Synthesis Consortium (IGSC).
Current methods for screening, however, have been unable to scale sufficiently to keep up. The current dominant method for screening is to evaluate sequence homology, using BLAST (or similar) to test if the sequence's best alignment is with a controlled pathogenic organism. This approach produces a high rate of false positives, estimated at more than 4% from a survey of IGSC member companies, worsened by the fact that these methods generally search for all genes in an organism, including harmless “housekeeping” genes and others that have no functional relationship to pathogenesis. Moreover, the rate of false positives increases markedly as sequence length shortens. Due to the cost of resolving false positives, synthesis providers thus typically only screen dsDNA sequences that are at least 200 bp long and do not screen oligonucleotides at all.
We hypothesized that these challenges could be addressed by adapting methods for detection of malware in network traffic, which faces even greater challenges of scale. To this end, we adapted the Framework for Autogenerated Signature Technology (FAST) signature extraction method for use with nucleic acid sequences, producing the FAST for Nucleic Acids (FAST-NA) method for DNA screening. Our resulting implementation of FAST-NA is able to detect DNA sequences far faster than BLAST-based methods, and with equivalent sensitivity and significantly improved specificity, even while reducing the minimum scanning window from 200bp to 50bp.
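To illustrate the general idea of signature-based screening, as opposed to alignment-based screening, here is a minimal sketch. It is not the FAST-NA implementation: it assumes that a signature is simply a short exact subsequence (k-mer) that appears in sequences of concern but never in benign ones. The toy sequences, the value of K, and all function names are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def extract_signatures(threat_seqs, benign_seqs, k):
    """Keep k-mers seen in threat sequences but never in benign ones."""
    threat = set().union(*(kmers(s, k) for s in threat_seqs))
    benign = set().union(*(kmers(s, k) for s in benign_seqs))
    return threat - benign


def scan(query, signatures, k):
    """Return start positions in query where a signature matches exactly."""
    query = query.upper()
    return [i for i in range(len(query) - k + 1)
            if query[i:i + k] in signatures]


# Toy data (hypothetical): real screening would use curated collections.
K = 8
threat_seqs = ["ATGGCTAGCTAGGATCCGATTACA"]
benign_seqs = ["ATGGCTAGCTAGTTTTTTTTTTTT"]
signatures = extract_signatures(threat_seqs, benign_seqs, K)

# A query embedding part of the threat-specific region is flagged...
hits = scan("CCCCGGATCCGATTACACCCC", signatures, K)
# ...while a query made only of material shared with the benign set is not.
clean = scan("ATGGCTAGCTAGTTTT", signatures, K)
print(hits, clean)
```

Each window is checked with a set lookup rather than an alignment, which is one reason a signature approach can be fast and can work on short windows. Removing k-mers shared with benign sequences is a crude stand-in for avoiding matches on harmless "housekeeping" content.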
Jacob Beal, Bryan Bartley, Tom Mitchell, “SBOL utilities,” HARMONY 2022, April 2022.
Abstract
SBOL-utilities is a collection of scripts and functions for manipulating SBOL 3 data that can be run from the command line or as functions in Python. SBOL 2 libraries tended to be complex, embedding a great deal of functionality and assumptions about workflows. With SBOL 3, on the other hand, the core libraries are currently much more minimal, and we are aiming to support a more flexible space of possible workflows by collecting "micro-service" utilities in the SBOL-utilities library, with the aim that these should be readily composable in many configurations, similar to Unix shell utilities. Currently, SBOL-utilities includes scripts for diagramming SBOL information, generating SBOL from Excel templates, converting between SBOL and other genetic design formats, expanding combinatorial derivations, calculating DNA sequences, computing the difference between SBOL documents, and various "macros" for easier creation and manipulation of SBOL objects. SBOL-utilities is available on GitHub and PyPI and welcomes community contributions.
Jacob Beal, Tom Mitchell, Bryan Bartley, Nicholas Roehner, “Agile Data Curation,” AI4Synbio Symposium at AAAI 2022 Spring Symposia, March 2022.
Abstract
Engineering fields require the sharing of reusable design and performance data, but such information is still extremely scarce in synthetic biology. One key reason for this is that curating data and metadata for sharing, while a widely embraced ideal, is not part of the daily practices of synthetic biology research and development. As a consequence, the information that is shared through articles and public databases is often erroneous, incomplete, and incompatible. Software engineering has historically had analogous challenges relating to testing, documentation, and integration. Over the past two decades, however, the agile software community has radically transformed professional software development by developing processes that bring management of correctness, completeness, and compatibility into the core activities of software development and supporting them with complementary automation tools (e.g., linting, regression testing, continuous integration, merge review). We observe that, with appropriate choices of representation and process controls, the same processes and tools can be directly applied to synthetic biology designs, data, metadata, and models. Early applications of this approach have given promising results, which we illustrate with two examples: collective development of genetic designs for the iGEM 2022 distribution, and model-driven analysis of tunable CRISPR safety switch architectures.
Robert P. Goldman, Daniel Bryce, Jacob Beal, Bryan Bartley, “Protocol Modeling for High Throughput Experimentation, Data Analysis, and Replication,” AI4Synbio Symposium at AAAI 2022 Spring Symposia, March 2022.
Abstract
Gaining the most scientific knowledge at the least cost motivates the question of “New AI or New Data?” We highlight recent efforts to improve experiment replicability through experimental protocol representations and how this new data holds opportunities for applying AI to advance science. We present a range of solutions for experimental metadata capture that vary from manual to highly automated and offer different cost-benefit profiles. The most advanced of these, the Protocol Activity Modeling Language (PAML), holds the potential for enabling AI decision making through automated planning or reinforcement learning.
Tom Mitchell, Bryan Bartley, Jacob Beal, “pySBOL3”, HARMONY 2022, April 2022.
Abstract
This talk announces the release of pySBOL3 1.0, the first stable release of a Python implementation of SBOL3. pySBOL3 is a native Python package that provides the ability to create, read, modify, and write SBOL3 documents. pySBOL3 is in use in a number of synthetic biology projects.
Nicholas Roehner, Aaron Adler, Brian Basnight, John Grothendieck, Tyler Marshall, Miles Rogers, Helen Scott, Allison Taggart, Benjamin Toll, Fusun Yaman, Daniel Wyschogrod, “Rapid Analysis of Biothreats via Biological Decompilation,” Chemical and Biological Defense Science and Technology, San Francisco, CA, Defense Threat Reduction Agency, Dec. 6th-9th, 2022.
Nilesh K Sharma, Felipe Carrillo, Jaclyn Thompson, Allison Taggart, Jacob Beal, Miles Rogers, Natalie Farny, Eric M Young, “Fungal highways enable migration and communication of engineered bacteria in soil,” Microbial Engineering II conference, Albufeira, Portugal, April 3-7, 2022.
Fungal highways enable migration and communication of engineered bacteria in soil
Abstract
Study of natural soil microbiomes has revealed that bacterial migration and long-distance chemical signaling is facilitated by filamentous fungal highways. These emergent properties in soil are of particular interest for nutrient cycling, pollution remediation, and underground chemical detection, yet they are only just beginning to be understood. Here, we took a synthetic biology approach to construct a cross-kingdom consortium that enables bacterial migration and quorum sensing over several centimeters. First, we identified Pseudomonas putida as the ideal bacterial host since it survives in soil and is genetically tractable. Yet, it was unknown which genetic parts would function in soil. Therefore, we tested several genetic circuits, eventually identifying quorum sensing systems Lux and Las as the most robust, achieving 15-fold induction in soil. Next, we investigated several potential fungal partners, quantifying growth rate, soil penetration, and compatibility with P. putida. We found that Lyophyllum atratum extends P. putida soil survival and accelerates migration. Finally, we tested bacterial migration and quorum sensing signal propagation through soil. Without fungal highways, signal propagated 35 mm and fluorescent protein expression lasted for 48 hours in response to inducer. With L. atratum, signal propagated 80 mm and P. putida fluorescent protein expression lasted for 120 hours. These results show that interkingdom networks are key to engineering robust genetic circuit function in soil. Thus, this study builds the foundation for diverse applications of engineered biology in agriculture, nature, and the built environment.
Dan Wyschogrod, Jeff Manthey, Tom Mitchell, Steven Murphy, Adam Clore, Jacob Beal, “Adapting Malware Detection to DNA Screening,” International Workshop on Bio-Design Automation (IWBDA), October, 2022. https://easychair.org/publications/preprint/RMqB
Adapting Malware Detection to DNA Screening
Abstract:
As DNA synthesis becomes cheaper and more accessible, there is a corresponding increase in opportunities for synthesis of dangerous pathogenic sequences by either malicious or careless actors. To mitigate this threat, major DNA synthesis providers screen sequence orders for pathogenic content, following guidance from the US Department of Health and Human Services and the International Genome Synthesis Consortium.
Current methods for screening, however, have been unable to scale sufficiently to keep up. The current dominant method for screening is to evaluate sequence homology, using BLAST (or similar) to test if the sequence’s best alignment is with a controlled pathogenic organism. This approach produces a high rate of false positives, estimated at more than 4% from a survey of IGSC member companies, worsened by the fact that these methods generally search for all genes in an organism, including harmless “housekeeping” genes and others that have no functional relationship to pathogenesis. Moreover, the rate of false positives increases markedly as sequence length shortens. Due to the cost of resolving false positives, synthesis providers thus typically only screen dsDNA sequences that are at least 200 bp long and do not screen oligonucleotides at all.
We hypothesized that these challenges could be addressed by adapting methods for detection of malware in network traffic, which faces even greater challenges of scale. To this end, we adapted the Framework for Autogenerated Signature Technology (FAST) signature extraction method for use with nucleic acid sequences, producing the FAST for Nucleic Acids (FAST-NA) method for DNA screening. Our resulting implementation of FAST-NA is able to detect DNA sequences far faster than BLAST-based methods, and with equivalent sensitivity and significantly improved specificity, even while reducing the minimum scanning window from 200bp to 50bp.
Archival Preprints
Jacob Beal, Adam Clore, Jeff Manthey, "Studying Pathogens Degrades BLAST-based Pathogen Identification," bioRxiv, https://www.biorxiv.org/content/10.1101/2022.07.12.499705v1 , July 2022.
Studying Pathogens Degrades BLAST-based Pathogen Identification
Abstract
As synthetic biology becomes increasingly capable and accessible, it is likewise increasingly critical to be able to make accurate biosecurity determinations regarding the pathogenicity or toxicity of particular nucleic acid or amino acid sequences. At present, this is typically done using the BLAST algorithm to determine the best match with sequences in the NCBI databases. Neither BLAST nor the NCBI databases, however, are actually designed for biosafety determination. Critically, taxonomic errors or ambiguities in the NCBI databases can also cause errors in BLAST-based taxonomic categorization. With heavily studied taxa and frequently used biotechnology tools, even low frequency taxonomic categorization issues can lead to high rates of errors in biosecurity decision-making. Here we focus on the implications for false positives, finding that NCBI BLAST will now incorrectly categorize a number of commonly used biotechnology tool sequences as the pathogens or toxins with which they have been used. Paradoxically, this implies that problems are expected to be most acute for the pathogens and toxins of highest interest and the most widely used biotechnology tools. We thus conclude that biosecurity tools should shift away from BLAST against NCBI and towards new methods that are specifically tailored for biosafety purposes.
Ibrahim Aldulijan, Jacob Beal, Sonja Billerbeck, Jeff Bouffard, Gael Chambonnier, Nikolaos Delkis, Isaac Guerreiro, Martin Holub, Paul Ross, Vinoo Selvarajah, Noah Sprent, Gonzalo Vidal, Alejandro Vignoni, "Functional Synthetic Biology," arXiv, https://arxiv.org/abs/2207.00538 , July 2022.
Abstract
Synthetic biologists have made great progress over the past decade in developing methods for modular assembly of genetic sequences and in engineering biological systems with a wide variety of functions in various contexts and organisms. However, current paradigms in the field entangle sequence and functionality in a manner that makes abstraction difficult, reduces engineering flexibility, and impairs predictability and design reuse. Functional Synthetic Biology aims to overcome these impediments by focusing the design of biological systems on function, rather than on sequence. This reorientation will decouple the engineering of biological devices from the specifics of how those devices are put to use, requiring both conceptual and organizational change, as well as supporting software tooling. Realizing this vision of Functional Synthetic Biology will allow more flexibility in how devices are used, more opportunity for reuse of devices and data, improvements in predictability, and reductions in technical risk and cost.
Bryan Bartley, Jacob Beal, Miles Rogers, Daniel Bryce, Robert P Goldman, Benjamin Keller, Peter Lee, Vanessa Biggers, Joshua Nowak, Mark Weston, "Building an Open Representation for Biological Protocols," bioRxiv, https://www.biorxiv.org/content/10.1101/2022.07.05.498808v1 , July 2022.
Building an Open Representation for Biological Protocols
Abstract
Laboratory protocols are critical to biological research and development, yet difficult to communicate and reproduce across projects, investigators, and organizations. While many attempts have been made to address this challenge, there is currently no available protocol representation that is unambiguous enough for precise interpretation and automation, yet simultaneously abstract enough to enable reuse and adaptation. The Protocol Activity Markup Language (PAML) is a free and open protocol representation aiming to address this gap, building on a foundation of UML, Autoprotocol, and SBOL RDF. PAML provides a representation both for protocols and for records of their execution and the resulting data, as well as a framework for exporting from PAML for execution by either humans or laboratory automation. PAML is currently implemented in the form of an RDF knowledge representation, specification document, and Python library, can be exported for execution as either a manual “paper protocol” or Autoprotocol, and is being further developed as an open community effort.
Breschine Cummins, Robert C Moseley, Anastasia Deckard, Mark Weston, George Zheng, Daniel Bryce, Joshua Nowak, Marcia Gameiro, Tomas Gedeon, Konstantin Mischaikow, Jacob Beal, Tessa Johnson, Matthew Vaughn, Niall Gaffney, Shweta Gopaulakrishnan, Joshua Urrutia, Robert P Goldman, Bryan Bartley, Tramy T Nguyen, Nicholas Roehner, Tom Mitchell, Justin D Vrana, Katie J Clowers, Narendra Maheshri, Diveena Becker, Ekaterina Mikhalev, Vanessa Biggers, Trissha Higa, Lorraine Mosqueda, Steven B Haase, "Computational prediction of synthetic circuit function across growth conditions," bioRxiv, https://www.biorxiv.org/content/10.1101/2022.06.13.495701v1 , June 2022.
Computational prediction of synthetic circuit function across growth conditions
Abstract
A challenge in the design and construction of synthetic genetic circuits is that they will operate within biological systems that have noisy and changing parameter regimes that are largely unmeasurable. The outcome is that these circuits do not operate within design specifications or have a narrow operational envelope in which they can function. This behavior is often observed as a lack of reproducibility in function from day to day or lab to lab. Moreover, this narrow range of operating conditions does not promote reproducible circuit function in deployments where environmental conditions for the chassis are changing, as environmental changes can affect the parameter space in which the circuit is operating. Here we describe a computational method for assessing the robustness of circuit function across broad parameter regions. Previously designed circuits are assessed by this computational method and then circuit performance is measured across multiple growth conditions in budding yeast. The computational predictions are correlated with experimental findings, suggesting that the approach has predictive value for assessing the robustness of a circuit design.
Breschine Cummins, Justin Vrana, Robert C Moseley, Hamed Eramian, Anastasia Deckard, Pedro Fontanarrosa, Daniel Bryce, Mark Weston, George Zheng, Joshua Nowak, Francis C Motta, Mohammed Eslami, Kara Layne Johnson, Robert P Goldman, Chris J Myers, Tessa Johnson, Matthew W Vaughn, Niall Gaffney, Joshua Urrutia, Shweta Gopaulakrishnan, Vanessa Biggers, Trissha Higa, Lorraine Mosqueda, Marcia Gameiro, Tomas Gedeon, Konstantin Mischaikow, Jacob Beal, Bryan Bartley, Tom Mitchell, Tramy T Nguyen, Nicholas Roehner, Steven B Haase, "Robustness and reproducibility of simple and complex synthetic logic circuit designs using a DBTL loop," bioRxiv, https://www.biorxiv.org/content/10.1101/2022.06.10.495560v1 , June 2022.
Abstract
Computational tools addressing various components of design-build-test-learn loops (DBTL) for the construction of synthetic genetic networks exist, but do not generally cover the entire DBTL loop. This manuscript introduces an end-to-end sequence of tools that together form a DBTL loop called DART (Design Assemble Round Trip). DART provides rational selection and refinement of genetic parts to construct and test a circuit. Computational support for experimental process, metadata management, standardized data collection, and reproducible data analysis is provided via the previously published Round Trip (RT) test-learn loop. The primary focus of this work is on the Design Assemble (DA) part of the tool chain, which improves on previous techniques by screening up to thousands of network topologies for robust performance using a novel robustness score derived from dynamical behavior based on circuit topology only. In addition, novel experimental support software is introduced for the assembly of genetic circuits. A complete design-through-analysis sequence is presented using several OR and NOR circuit designs, with and without structural redundancy, that are implemented in budding yeast. The execution of DART tested the predictions of the design tools, specifically with regard to robust and reproducible performance under different experimental conditions. The data analysis depended on a novel application of machine learning techniques to segment bimodal flow cytometry distributions. Evidence is presented that, in some cases, a more complex build may impart more robustness and reproducibility across experimental conditions.
Journal Articles
M. Eslami, Aaron Adler, R. Caceres, J. Dunn, N. Kelley-Loughnane, V. Varaljay, and H. Martin, "Artificial intelligence for synthetic biology," Communications of the ACM, Vol. 65, No. 5, pp. 88-97, (Online 25 Apr 2022). https://dl.acm.org/doi/10.1145/3500922
Artificial intelligence for synthetic biology
Abstract
The opportunities and challenges of adapting and applying AI principles to synbio.
Bryan Bartley, "Tyto: A Python Tool Enabling Better Annotation Practices for Synthetic Biology Data-Sharing," ACS Synthetic Biology, Vol. 11, Article 3, pp. 1373-1376, (Online 28 Feb 2022). https://pubs.acs.org/doi/10.1021/acssynbio.1c00450
Tyto: A Python Tool Enabling Better Annotation Practices for Synthetic Biology Data-Sharing
Abstract
As synthetic biology becomes increasingly automated and data-driven, tools that help researchers implement FAIR (findable-accessible-interoperable-reusable) data management practices are needed. Crucially, in order to support machine processing and reusability of data, it is important that data artifacts are appropriately annotated with metadata drawn from controlled vocabularies. Unfortunately, standardized annotation practices are difficult for many research groups to adopt, given the set of specialized database science skills usually required to interface with ontologies. In response to this need, Take Your Terms from Ontologies (Tyto) is a lightweight Python tool that supports the use of controlled vocabularies in everyday scripting practice. While Tyto has been developed for synthetic biology applications, its utility may extend to users working in other areas of bioinformatics research as well. Tyto is available as a Python package distribution or as source at https://github.com/SynBioDex/tyto.
D. Bryce, R. Goldman, M. DeHaven, Jacob Beal, Bryan Bartley, Tramy T. Nguyen, Nicholas Walczak, M. Weston, G. Zheng, J. Nowak, P. Lee, J. Stubbs, N. Gaffney, M. Vaughn, C. Myers, R. Moseley, S. Haase, A. Deckard, B. Cummins, and N. Leiby, "Round Trip: An Automated Pipeline for Experimental Design, Execution, and Analysis," ACS Synthetic Biology, Vol. 11, pp. 608-622, 2022. https://pubs.acs.org/doi/10.1021/acssynbio.1c00305
Round Trip: An Automated Pipeline for Experimental Design, Execution, and Analysis
Abstract:
Synthetic biology is a complex discipline that involves creating detailed, purpose-built designs from genetic parts. This process is often phrased as a Design-Build-Test-Learn loop, where iterative design improvements can be made, implemented, measured, and analyzed. Automation can potentially improve both the end-to-end duration of the process and the utility of data produced by the process. One of the most important considerations for the development of effective automation and quality data is a rigorous description of implicit knowledge encoded as a formal knowledge representation. The development of knowledge representation for the process poses a number of challenges, including developing effective human–machine interfaces, protecting against and repairing user error, providing flexibility for terminological mismatches, and supporting extensibility to new experimental types. We address these challenges with the DARPA SD2 Round Trip software architecture. The Round Trip is an open architecture that automates many of the key steps in the Test and Learn phases of a Design-Build-Test-Learn loop for high-throughput laboratory science. The primary contribution of the Round Trip is to assist with and otherwise automate metadata creation, curation, standardization, and linkage with experimental data. The Round Trip’s focus on metadata supports fast, automated, and replicable analysis of experiments as well as experimental situational awareness and experimental interpretability. We highlight the major software components and data representations that enable the Round Trip to speed up the design and analysis of experiments by 2 orders of magnitude over prior ad hoc methods. These contributions support a number of experimental protocols and experimental types, demonstrating the Round Trip’s breadth and extensibility. 
We describe both an illustrative use case using the Round Trip for an on-the-loop experimental campaign and overall contributions to reducing experimental analysis time and increasing data product volume in the SD2 program.
Y. Mo, S. Dasgupta, and Jacob Beal, "Stability and Resilience of Distributed Information Spreading in Aggregate Computing," IEEE Transactions on Automatic Control, Vol. 68, Issue 1, Jan 2023. https://ieeexplore.ieee.org/document/9670637
Stability and Resilience of Distributed Information Spreading in Aggregate Computing
Abstract:
Spreading information through a network of devices is a core activity for most distributed systems. Self-stabilizing algorithms for information spreading are one of the key building blocks enabling aggregate computing to provide resilient coordination in open complex distributed systems. This article improves a general spreading block in the aggregate computing literature by making it resilient to network perturbations, establishes its global uniform asymptotic stability, and proves that it is ultimately bounded under persistent disturbances. The ultimate bounds depend only on the magnitude of the largest perturbation and the network diameter, and three design parameters trading off competing aspects of performance. For example, as in many dynamical systems, values leading to greater resilience to network perturbations slow convergence and vice versa.
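As a rough illustration of what self-stabilizing information spreading means, the sketch below runs the classic distance-to-source computation in synchronous rounds, starting from an arbitrary (corrupted) state. This is a textbook baseline, not the improved spreading block analyzed in the article. The line-shaped network, unit edge lengths, and function names are assumptions chosen for illustration.

```python
def spread_round(values, neighbors, sources, edge_len=1.0):
    """One synchronous round: sources hold 0, every other node takes
    the smallest neighbor estimate plus the edge length."""
    new_values = {}
    for node in values:
        if node in sources:
            new_values[node] = 0.0
        else:
            new_values[node] = min(values[n] + edge_len
                                   for n in neighbors[node])
    return new_values


def run(values, neighbors, sources, rounds):
    """Apply spread_round repeatedly and return the final estimates."""
    for _ in range(rounds):
        values = spread_round(values, neighbors, sources)
    return values


# Hypothetical four-node line network 0 - 1 - 2 - 3 with node 0 as source.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Start from arbitrary state: no node holds a correct estimate.
corrupted = {0: 7.0, 1: 0.0, 2: 3.0, 3: 9.0}
stable = run(corrupted, neighbors, sources={0}, rounds=10)
print(stable)
```

Regardless of the starting values, the estimates settle to the true hop distances from the source and then stop changing, which is the self-stabilization property. How quickly they settle, and how they behave while the network keeps being perturbed, is the kind of question the article addresses.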
Jacob Beal, C. Telmer, A. Vignoni, Y. Boada, G. Baldwin, L. Hallett, T. Lee, V. Selvarajah, S. Billerbeck, B. Brown, G. Cai, L. Cai, E. Eisenstein, D. Kiga, D. Ross, N. Alperovich, N. Sprent, J. Thompson, E. Young, D. Endy, and T. Haddock-Angelli, "Multicolor plate reader fluorescence calibration," Synthetic Biology, Vol. 7, No. 1, 2022. https://pubmed.ncbi.nlm.nih.gov/35949424/
Multicolor plate reader fluorescence calibration
Abstract:
Plate readers are commonly used to measure cell growth and fluorescence, yet the utility and reproducibility of plate reader data is limited by the fact that it is typically reported in arbitrary or relative units. We have previously established a robust serial dilution protocol for calibration of plate reader measurements of absorbance to estimated bacterial cell count and for green fluorescence from proteins expressed in bacterial cells to molecules of equivalent fluorescein. We now extend these protocols to calibration of red fluorescence to the sulforhodamine-101 fluorescent dye and blue fluorescence to Cascade Blue. Evaluating calibration efficacy via an interlaboratory study, we find that these calibrants do indeed provide comparable precision to the prior calibrants and that they enable effective cross-laboratory comparison of measurements of red and blue fluorescence from proteins expressed in bacterial cells.
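The arithmetic behind a serial dilution calibration can be sketched as follows. This is a simplified illustration, not the published protocol: it assumes a two-fold dilution series of a calibrant with a known starting molecule count, a single blank reading, and a conversion factor taken as the mean ratio of known molecules to blank-subtracted signal. All numbers and function names are hypothetical.

```python
from statistics import mean


def serial_dilution(start, levels, factor=2.0):
    """Known calibrant molecule counts for each dilution level."""
    return [start / factor**i for i in range(levels)]


def conversion_factor(known_molecules, readings, blank):
    """Mean molecules per arbitrary unit, using blank-subtracted readings."""
    ratios = [n / (r - blank)
              for n, r in zip(known_molecules, readings)
              if r > blank]
    return mean(ratios)


def calibrate(reading, blank, factor):
    """Convert a raw reading into calibrant-equivalent molecules."""
    return (reading - blank) * factor


# Hypothetical calibrant dilution series and plate reader readings (a.u.).
molecules = serial_dilution(start=6.0e12, levels=4)
readings = [30050.0, 15050.0, 7550.0, 3800.0]
blank = 50.0

factor = conversion_factor(molecules, readings, blank)
sample = calibrate(1050.0, blank, factor)
print(f"{factor:.3g} molecules per a.u.; sample = {sample:.3g} molecules")
```

Reporting the sample in calibrant-equivalent molecules instead of arbitrary units is what makes values from different instruments and laboratories comparable.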
Y. Mo, G. Audrito, S. Dasgupta, and Jacob Beal, "Near-optimal knowledge-free resilient leader election," Automatica Vol. 146, Paper 110583 (Online Dec 2022). https://dl.acm.org/doi/abs/10.1016/j.automatica.2022.110583
Near-optimal knowledge-free resilient leader election
Abstract:
Leader election is a fundamental coordination problem in distributed systems. It has been addressed in many ways for different systems. Among these approaches, resilient leader election algorithms are of particular interest due to the ongoing emergence of open, complex distributed systems such as smart cities and the Internet of Things. However, previous algorithms attaining the optimal scaling of O(diameter) stabilization time complexity either assume some prior knowledge of the network or else that very large messages can be sent. In this paper, we present a resilient leader election algorithm with O(diameter) stabilization time, small messages, and no prior knowledge of the network. This algorithm is based on aggregate computing, which provides a layered approach to algorithm development based on composition of resilient algorithmic “building blocks.” With our algorithm, a key design function g(⋅) defines important performance attributes: a fast-growing g(⋅) will delay discarding of obsolete data, while a slow-growing g(⋅) will slow down convergence to a single leader. We prove that the best asymptotic behavior for g(x) is (1 + √2)x + o(x), guaranteeing a near-optimal time complexity of (2 + 2√2) diameter + o(diameter) rounds for stabilization.
Jacob Beal, B. Teague, J. Sexton, S. Castillo-Hair, N. DeLateur, M. Samineni, J. Tabor, R. Weiss, and the Calibrated Flow Cytometry Study Consortium, "Meeting measurement precision requirements for effective engineering of genetic regulatory networks," ACS Synthetic Biology, Vol. 11, No. 3, pp. 1196-1207, (Online 22 Feb 2022). https://pubs.acs.org/doi/abs/10.1021/acssynbio.1c00488
Meeting measurement precision requirements for effective engineering of genetic regulatory networks
Abstract:
Reliable, predictable engineering of cellular behavior is one of the key goals of synthetic biology. As the field matures, biological engineers will become increasingly reliant on computer models that allow for the rapid exploration of design space prior to the more costly construction and characterization of candidate designs. The efficacy of such models, however, depends on the accuracy of their predictions, the precision of the measurements used to parametrize the models, and the tolerance of biological devices for imperfections in modeling and measurement. To better understand this relationship, we have derived an Engineering Error Inequality that provides a quantitative mathematical bound on the relationship between predictability of results, model accuracy, measurement precision, and device characteristics. We apply this relation to estimate measurement precision requirements for engineering genetic regulatory networks given current model and device characteristics, recommending a target standard deviation of 1.5-fold. We then compare these requirements with the results of an interlaboratory study to validate that these requirements can be met via flow cytometry with matched instrument channels and an independent calibrant. On the basis of these results, we recommend a set of best practices for quality control of flow cytometry data and discuss how these might be extended to other measurement modalities and applied to support further development of genetic regulatory network engineering.
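One way to check a set of replicate measurements against a fold-based precision target is sketched below. It assumes that the 1.5-fold target can be read as a geometric standard deviation, computed as the exponential of the standard deviation of the log-transformed values. That reading, the example data, and the function names are assumptions for illustration, not the article's analysis pipeline.

```python
import math
from statistics import mean, stdev


def geometric_stats(values):
    """Return (geometric mean, geometric standard deviation) of positive values."""
    logs = [math.log(v) for v in values]
    return math.exp(mean(logs)), math.exp(stdev(logs))


def meets_precision_target(values, target_fold=1.5):
    """True if the geometric standard deviation is within the fold target."""
    _, gsd = geometric_stats(values)
    return gsd <= target_fold


# Hypothetical replicate measurements in calibrated units.
tight = [1.0e4, 1.2e4, 0.9e4, 1.1e4]
loose = [1.0e4, 4.0e4, 0.3e4, 2.0e4]

for name, data in (("tight", tight), ("loose", loose)):
    gmean, gsd = geometric_stats(data)
    print(f"{name}: geometric mean {gmean:.3g}, "
          f"{gsd:.2f}-fold, ok={meets_precision_target(data)}")
```

Working in log space suits fluorescence data, where variation is typically multiplicative and values span orders of magnitude, so a "fold" figure is more meaningful than an additive standard deviation.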
Tom Mitchell, Jacob Beal, and Bryan Bartley, "pySBOL3: SBOL3 for Python Programmers," ACS Synthetic Biology Vol. 11, Article 7, pp. 2523-2526, (Online 29 July 2022). https://doi.org/10.1021/acssynbio.2c00249
pySBOL3: SBOL3 for Python Programmers
Abstract:
The Synthetic Biology Open Language version 3 (SBOL3) provides a data model for representation of synthetic biology information across multiple scales and throughout the design-build-test-learn workflow. To support practical use of this data model, we have developed pySBOL3, a Python library that allows programmers to create and edit SBOL3 documents. Here we describe this library and key engineering decisions in its design. The resulting implementation is a compact and maintainable core that provides both a familiar, pythonic interface for manipulating SBOL3 objects as well as mechanisms for building additional extensions and representations on this base.
Helen Scott, Dashan Sun, Jacob Beal, and Samira Kiani, "Simulation-Based Engineering of Time-Delayed Safety Switches for Safer Gene Therapies," ACS Synthetic Biology, Vol. 11, No. 5, pp. 1782-1789, Apr 2022. https://pubs.acs.org/doi/10.1021/acssynbio.1c00621
Simulation-Based Engineering of Time-Delayed Safety Switches for Safer Gene Therapies
Abstract:
CRISPR-based gene editing is a powerful tool with great potential for applications in the treatment of many inherited and acquired diseases. The longer that CRISPR gene therapy is maintained within a patient, however, the higher the likelihood that it will result in problematic side effects such as off-target editing or immune response. One approach to mitigating these issues is to link the operation of the therapeutic system to a safety switch that autonomously disables its operation and removes the delivered therapeutics after some amount of time. We present here a simulation-based analysis of the potential for regulating the time delay of such a safety switch using one or two transcriptional regulators and/or recombinases. Combinatorial circuit generation identifies 30 potential architectures for such circuits, which we evaluate in simulation with respect to tunability, sensitivity to parameter values, and sensitivity to cell-to-cell variation. This modeling predicts one of these circuit architectures to have the desired dynamics and robustness, which can be further tested and applied in the context of CRISPR therapeutics.
A. Pfotenhauer, A. Occhialini, M. Nguyen, Helen Scott, L. Dice, S. Harbison, L. Li, D. Reuter, T. Schimel, C. Stewart Jr, Jacob Beal, and S. Lenaghan, "Building the Plant Syn Bio Toolbox through Combinatorial Analysis of DNA Regulatory Elements," ACS Synthetic Biology, Vol. 11, No. 8, pp. 2741-2755, (Online 28 July 2022). https://pubs.acs.org/doi/10.1021/acssynbio.2c00147
Building the Plant Syn Bio Toolbox through Combinatorial Analysis of DNA Regulatory Elements
Abstract:
While the installation of complex genetic circuits in microorganisms is relatively routine, the synthetic biology toolbox is severely limited in plants. Of particular concern is the absence of combinatorial analysis of regulatory elements, the long design-build-test cycles associated with transgenic plant analysis, and a lack of naming standardization for cloning parts. Here, we use previously described plant regulatory elements to design, build, and test 91 transgene cassettes for relative expression strength. Constructs were transiently transfected into Nicotiana benthamiana leaves and expression of a fluorescent reporter was measured from plant canopies, leaves, and protoplasts isolated from transfected plants. As anticipated, a dynamic level of expression was achieved from the library, ranging from near undetectable for the weakest cassette to a ∼200-fold increase for the strongest. Analysis of expression levels in plant canopies, individual leaves, and protoplasts were correlated, indicating that any of the methods could be used to evaluate regulatory elements in plants. Through this effort, a well-curated 37-member part library of plant regulatory elements was characterized, providing the necessary data to standardize construct design for precision metabolic engineering in plants.
Nicholas Roehner, Aaron Adler, Brian Basnight, John Grothendieck, Tyler Marshall, Miles Rogers, Helen Scott, Allison Taggart, Benjamin Toll, Fusun Yaman, and Daniel Wyschogrod, "Rapid Analysis of Biothreats via Biological Decompilation," Chemical and Biological Defense Science and Technology, San Francisco, CA, Defense Threat Reduction Agency, Dec. 6th-9th, 2022.
(no summary)
Engineered yeast genomes accurately assembled from pure and mixed samples
Abstract
Yeast whole genome sequencing (WGS) lacks end-to-end workflows that identify genetic engineering. Here we present Prymetime, a tool that assembles yeast plasmids and chromosomes and annotates genetic engineering sequences. It is a hybrid workflow—it uses short and long reads as inputs to perform separate linear and circular assembly steps. This structure is necessary to accurately resolve genetic engineering sequences in plasmids and the genome. We show this by assembling diverse engineered yeasts, in some cases revealing unintended deletions and integrations. Furthermore, the resulting whole genomes are high quality, although the underlying assembly software does not consistently resolve highly repetitive genome features. Finally, we assemble plasmids and genome integrations from metagenomic sequencing, even with 1 engineered cell in 1000. This work is a blueprint for building WGS workflows and establishes WGS-based identification of yeast genetic engineering.
Curation Principles derived from the Analysis of the SBOL iGEM Data Set
Abstract
As an engineering endeavor, synthetic biology requires effective sharing of genetic design information that can be reused in the construction of new designs. While there are a number of large community repositories of design information, curation of this information has been limited. This in turn limits the ways in which design information can be put to use. The aim of this work was to improve this situation by creating a curated library of parts from the International Genetically Engineered Machines (iGEM) registry data set. To this end, an analysis of the Synthetic Biology Open Language (SBOL) version of the iGEM registry was carried out using four different approaches (simple statistics, SnapGene auto annotation, SYNBICT auto annotation, and expert analysis), the results of which are presented herein. Key challenges encountered include the use of free text, insufficient part provenance, part duplication, lack of part removal, and insufficient continuous curation. On the basis of these analyses, the focus has shifted from the creation of a curated iGEM part library to instead the extraction of a set of lessons, which are presented here. These lessons can be exploited to facilitate the creation and curation of other part libraries using a simpler and less labor intensive process.
Stability and Resilience of Distributed Information Spreading in Aggregate Computing
Abstract
Spreading information through a network of devices is a core activity for most distributed systems. As such, self-stabilizing algorithms implementing information spreading are one of the key building blocks enabling aggregate computing to provide resilient coordination in open complex distributed systems. This paper improves a general spreading block in the aggregate computing literature by making it resilient to network perturbations, establishes its global uniform asymptotic stability and proves that it is ultimately bounded under persistent disturbances. The ultimate bounds depend only on the magnitude of the largest perturbation and the network diameter, and three design parameters trade off competing aspects of performance. For example, as in many dynamical systems, values leading to greater resilience to network perturbations slow convergence and vice versa.
Intent Parser: A Tool for Codification and Sharing of Experimental Design
Abstract
Communicating information about experimental design among a team of collaborators is challenging because different people tend to describe experiments in different ways and with different levels of detail. Sometimes, humans can interpret missing information by making assumptions and drawing inferences from information already provided. Doing so, however, is error-prone and typically requires a high level of interpersonal communication. In this paper, we present a tool that addresses this challenge by providing a simple interface for incremental formal codification of experiment designs. Users interact with a Google Docs word-processing interface with structured tables, backed by assisted linking to machine-readable definitions in a data repository (SynBioHub) and specification of available protocols and requests for execution in the Open Protocol Interface Language (OPIL). The result is an easy-to-use tool for generating machine-readable descriptions of experiment designs with which users in the DARPA SD2 program have collected data from 80,208 samples using a variety of protocols and instruments over the course of 181 experiment runs.
Abstract
Microphysiological organ-on-chip models offer the potential to improve the prediction of drug safety and efficacy through recapitulation of human physiological responses. The importance of including multiple cell types within tissue models has been well documented. However, the study of cell interactions in vitro can be limited by complexity of the tissue model and throughput of current culture systems. Here, we describe the development of a co-culture microvascular model and relevant assays in a high-throughput thermoplastic organ-on-chip platform, PREDICT96. The system consists of 96 arrayed bilayer microfluidic devices containing retinal microvascular endothelial cells and pericytes cultured on opposing sides of a microporous membrane.
Abstract
Drug development suffers from a lack of predictive and human-relevant in vitro models. Organ-on-chip (OOC) technology provides advanced culture capabilities to generate physiologically appropriate, human-based tissue in vitro, therefore providing a route to a predictive in vitro model. However, OOC technologies are often created at the expense of throughput, industry-standard form factors, and compatibility with state-of-the-art data collection tools. Here we present an OOC platform with advanced culture capabilities supporting a variety of human tissue models including liver, vascular, gastrointestinal, and kidney. The platform has 96 devices per industry standard plate and compatibility with contemporary high-throughput data collection tools. Specifically, we demonstrate programmable flow control over two physiologically relevant flow regimes: perfusion flow that enhances hepatic tissue function and high-shear stress flow that aligns endothelial monolayers.
Synthetic Biology Curation Tools (SYNBICT)
Abstract
Much progress has been made in developing tools to generate component-based design representations of biological systems from standard libraries of parts. Most biological designs, however, are still specified at the sequence level. Consequently, there exists a need for a tool that can be used to automatically infer component-based design representations from sequences, particularly in cases when those sequences have minimal levels of annotation. Such a tool would assist computational synthetic biologists in bridging the gap between the outputs of sequence editors and the inputs to more sophisticated design tools, and it would facilitate their development of automated workflows for design curation and quality control. Accordingly, we introduce Synthetic Biology Curation Tools (SYNBICT), a Python tool suite for automation-assisted annotation, curation, and functional inference for genetic designs. We have validated SYNBICT by applying it to genetic designs in the DARPA Synergistic Discovery & Design (SD2) program and the International Genetically Engineered Machines (iGEM) 2018 distribution. Most notably, SYNBICT is more automated and parallelizable than manual design editors, and it can be applied to interpret existing designs instead of only generating new ones.
Synthetic biology open language visual (SBOL visual) version 3.0
Abstract
People who engineer biological organisms often find it useful to draw diagrams in order to communicate both the structure of the nucleic acid sequences that they are engineering and the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. SBOL Visual aims to organize and systematize such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 3.0 of SBOL Visual, a new major revision of the standard. The major difference between SBOL Visual 3 and SBOL Visual 2 is that diagrams and glyphs are defined with respect to the SBOL 3 data model rather than the SBOL 2 data model. A byproduct of this change is that the use of dashed undirected lines for subsystem mappings has been removed, pending future determination on how to represent general SBOL 3 constraints; in the interim, this annotation can still be used as an annotation. Finally, deprecated material has been removed from collection of glyphs: the deprecated “insulator” glyph and “macromolecule” alternative glyphs have been removed, as have the deprecated BioPAX alternatives to SBO terms.
Towards collaborative and automated development of resources for data standards in synthetic biology
Abstract
Data standards in synthetic biology are becoming ever more important as the number of tools addressing different needs increases, such as designing genetic circuits and visualizing and storing the designs. The Synthetic Biology Open Language (SBOL) [4] has been developed to provide a mechanism for the electronic exchange and common understanding of these designs and related information. Moreover, SBOL Visual [1] standardizes the representation of genetic circuit designs via well-defined glyphs.
Excel-SBOL Converter: Creating SBOL from Excel Templates and Vice Versa
Abstract
Synthetic biology is bringing together engineers and biologists [10]. Associated with this interdisciplinary movement is the need for reusable tools that supplement the current understanding of genetic sequences. To satisfy this need, synthetic biology communities across the world have developed tools and ontologies to help describe their unique semantic annotations [1, 3–9, 13, 14, 17–19, 22]. Shared representations for data and metadata, grounded in well-defined ontology terms, can help reduce confusion when sharing materials between practitioners [20]. The Synthetic Biology Open Language (SBOL) [5] is one of the approaches that has been developed to address this challenge. SBOL provides a standardized format for the electronic exchange of information on the structural and functional aspects of biological designs, supporting use of engineering principles of abstraction, modularity, and standardization in synthetic biology. Many tools have been created that work with SBOL, including the SynBioHub repository software for storing and sharing designs [12].
Data Representation in the DARPA SD2 Program
Abstract
Modern scientific enterprises are often highly complex and multidisciplinary, particularly in areas like synthetic biology where the subject at hand is itself inherently complex and multidisciplinary. Collaboration across many organizations is necessary to efficiently tackle such problems [6, 15], but remains difficult. The challenge is further amplified by automation that increases the pace at which new information can be produced, and particularly so for matters of fundamental research, where concepts and definitions are inherently fluid and may rapidly change as an investigation evolves [7].
Cyberbiosecurity and Public Health in the Age of COVID-19
Abstract
Cyberbiosecurity, the aspect of biosecurity involving the digital representation of biological data, had already been emerging as a matter of public concern even prior to the onset of the COVID-19 pandemic. Key issues of concern include, among others, the privacy of patient data, the security of public health databases, the integrity of diagnostic test data, the integrity of public biological databases, the security implications of automated laboratory systems, and the security of proprietary biological engineering advances.
Effect of Monotonic Filtering on Graph Collection Dynamics
Abstract
Distributed data collection is a fundamental task in open systems. In such networks, data is aggregated across a network to produce a single aggregated result at a source device. Though self-stabilizing, algorithms performing data collection can produce large overestimates of aggregates in the transient phase. For example, in [1] we demonstrated that in a line graph, a switch of sources after initial stabilization may produce overestimates that are quadratic in the network diameter. We also proposed monotonic filtering as a strategy for removing such large overestimates. Monotonic filtering prevents the transfer of data from device A to device B unless the distance estimate at A is more than that at B at the previous iteration.
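The filtering rule in the last sentence can be sketched in a few lines. This is a simplified illustration, not the algorithm analyzed in the paper: it assumes a line graph, synchronous rounds, hop-count distance estimates that have already stabilized, and a sum aggregate. All names are hypothetical.

```python
def accepted_senders(neighbors, prev_dist, b):
    """Neighbors A whose data device b may accept under monotonic filtering:
    A's distance estimate at the previous iteration must exceed b's."""
    return [a for a in neighbors[b] if prev_dist[a] > prev_dist[b]]

def collect_round(neighbors, prev_dist, prev_total, local):
    """One synchronous round of sum-collection toward the source.
    Each device adds its own value to the totals of its accepted senders."""
    return {
        b: local[b] + sum(prev_total[a]
                          for a in accepted_senders(neighbors, prev_dist, b))
        for b in neighbors
    }

# Line graph 0 - 1 - 2 - 3 with the source at device 0.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dist = {0: 0, 1: 1, 2: 2, 3: 3}          # assumed already stabilized
local = {d: 1 for d in neighbors}        # every device contributes 1
total = dict(local)
for _ in range(3):                       # diameter of this line graph is 3
    total = collect_round(neighbors, dist, total, local)
print(total[0])
```

Because data only flows from larger to smaller distance estimates, a device never folds in a total from a neighbor that is closer to the source than itself. On a general graph a sum aggregate would also need each device to pick a single parent, which this sketch leaves out.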
Abstract
Many synthetic gene circuits are restricted to single-use applications or require iterative refinement for incorporation into complex systems. One example is the recombinase-based digitizer circuit, which has been used to improve weak or leaky biological signals. Here we present a workflow to quantitatively define digitizer performance and predict responses to different input signals. Using a combination of signal-to-noise ratio (SNR), area under a receiver operating characteristic curve (AUC), and fold change (FC), we evaluate three small-molecule inducible digitizer designs demonstrating FC up to 508x and SNR up to 3.77 dB. To study their behavior further and improve modularity, we develop a mixed phenotypic/mechanistic model capable of predicting digitizer configurations that amplify a synNotch cell-to-cell communication signal (ΔSNR up to 2.8 dB). We hope the metrics and modeling approaches here will facilitate incorporation of these digitizers into other systems while providing an improved workflow for gene circuit characterization.
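The three metrics named above can be sketched as follows. The definitions used here are assumptions for illustration and may differ in detail from those in the paper: FC is taken as the ratio of geometric means, SNR in decibels compares the log-scale separation of the ON and OFF states with twice the larger log-scale standard deviation, and AUC is computed by pairwise comparison of ON and OFF samples. Function names and sample values are hypothetical.

```python
import math
import statistics

def geo_mean(xs):
    """Geometric mean, computed in log10 space."""
    return 10 ** statistics.fmean([math.log10(x) for x in xs])

def log_std(xs):
    """Sample standard deviation of log10 values (log10 of the geometric std)."""
    return statistics.stdev([math.log10(x) for x in xs])

def fold_change(on, off):
    """Ratio of ON to OFF geometric means."""
    return geo_mean(on) / geo_mean(off)

def snr_db(on, off):
    """Assumed SNR: log-scale separation over twice the worst-case log-scale
    noise, in dB. Undefined (division by zero) if both states have zero spread."""
    signal = abs(math.log10(fold_change(on, off)))
    noise = 2 * max(log_std(on), log_std(off))
    return 20 * math.log10(signal / noise)

def auc(on, off):
    """Probability that a random ON sample exceeds a random OFF sample,
    counting ties as one half (equivalent to the area under the ROC curve)."""
    wins = sum((a > b) + 0.5 * (a == b) for a in on for b in off)
    return wins / (len(on) * len(off))

# Hypothetical fluorescence samples for the ON and OFF states.
on_state = [100.0, 1000.0]
off_state = [1.0, 10.0]
print(fold_change(on_state, off_state), snr_db(on_state, off_state),
      auc(on_state, off_state))
```

The three numbers answer different questions: FC measures how far apart the states are, SNR measures that distance relative to the spread, and AUC measures how well a single threshold separates individual samples.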
Synthetic biology open language visual (SBOL Visual) version 2.3
Abstract
People who are engineering biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. The Synthetic Biology Open Language Visual (SBOL Visual) has been
Abstract
Reproducibility is a key challenge of synthetic biology, but the foundation of reproducibility is only as solid as the reference materials it is built upon. Here we focus on the reproducibility of fluorescence measurements from bacteria transformed with engineered genetic constructs. This comparative analysis comprises three large interlaboratory studies using flow cytometry and plate readers, identical genetic constructs, and compatible unit calibration protocols. Across all three studies, we find similarly high precision in the calibrants used for plate readers. We also find that fluorescence measurements agree closely across the flow cytometry results and two years of plate reader results, with an average standard deviation of 1.52-fold, while the third year of plate reader results are consistently shifted by more than an order of magnitude, with an average shift of 28.9-fold. Analyzing possible sources of error indicates this shift is due to incorrect preparation of the fluorescein calibrant. These findings suggest that measuring fluorescence from engineered constructs is highly reproducible, but also that there is a critical need for access to quality controlled fluorescent calibrants for plate readers.
Levels of Autonomy in Synthetic Biology Engineering
Abstract
Engineering biological organisms is a complex, challenging, and often slow process. Other engineering domains have addressed such challenges with a combination of standardization and automation, enabling a divide-and-conquer approach to complexity and greatly increasing productivity. For example, standardization and automation allow rapid and predictable translation of prototypes into fielded applications (e.g., “design for manufacturability”), simplify sharing and reuse of work between groups, and enable reliable outsourcing and integration of specialized subsystems. Although this approach has also been part of the vision of synthetic biology, almost since its very inception (Knight & Sussman, 1998), this vision still remains largely unrealized (Carbonell et al, 2019).
Field-based Coordination with the Share Operator
Abstract
Field-based coordination has been proposed as a model for coordinating collective adaptive systems, promoting a view of distributed computations as functions manipulating data structures spread over space and evolving over time, called computational fields. The field calculus is a formal foundation for field computations, providing specific constructs for evolution (time) and neighbor interaction (space), which are handled by separate operators (called rep and nbr, respectively). This approach, however, intrinsically limits the speed of information propagation that can be achieved by their combined use. In this paper, we propose a new field-based coordination operator called share, which captures the space-time nature of field computations in a single operator that declaratively achieves: (i) observation of neighbors’ values; (ii) reduction to a single local value; and (iii) update and converse sharing to neighbors of a local variable.
Robust Estimation of Bacterial Cell Count from Optical Density
Abstract
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
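A serial-dilution calibration of this kind can be sketched as below. This is an illustrative simplification and not the study's protocol: it assumes blank-subtracted OD readings that lie within the instrument's linear range, a known starting particle count, and a single conversion factor estimated in log space. All names and numbers are hypothetical.

```python
import math

def dilution_series(initial_count, steps, factor=2.0):
    """Expected particle counts for a serial dilution by a constant factor."""
    return [initial_count / factor ** i for i in range(steps)]

def particles_per_od(counts, ods):
    """Conversion factor (particles per OD unit) as the geometric mean of
    the count/OD ratios across the dilution series."""
    logs = [math.log(c / od) for c, od in zip(counts, ods)]
    return math.exp(sum(logs) / len(logs))

def fold_residuals(counts, ods, scale):
    """Fold error of each calibrated point; 1.0 means perfect agreement."""
    return [math.exp(abs(math.log(scale * od / c)))
            for c, od in zip(counts, ods)]

# Hypothetical data: four dilutions and the OD readings they produced.
counts = dilution_series(3.0e8, 4)
ods = [0.31, 0.148, 0.076, 0.037]
scale = particles_per_od(counts, ods)
residuals = fold_residuals(counts, ods, scale)
within = sum(r < 1.2 for r in residuals) / len(residuals)
print(scale, residuals, within)
```

Checking the fraction of residuals below a fold threshold gives a simple quality-control number, and points that drift away from 1.0 at the ends of the series indicate where the instrument leaves its linear range.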
The Synthetic Biology Open Language (SBOL) Version 3: Simplified Data Exchange for Bioengineering
Abstract
The Synthetic Biology Open Language (SBOL) is a community-developed data standard that allows knowledge about biological designs to be captured using a machine-tractable, ontology-backed representation that is built using Semantic Web technologies. While early versions of SBOL focused only on the description of DNA-based components and their sub-components, SBOL can now be used to represent knowledge across multiple scales and throughout the entire synthetic biology workflow, from the specification of a single molecule or DNA fragment through to multicellular systems containing multiple interacting genetic circuits. The third major iteration of the SBOL standard, SBOL3, is an effort to streamline and simplify the underlying data model with a focus on real-world applications, based on experience from the deployment of SBOL in a variety of scientific and industrial settings. Here, we introduce the SBOL3 specification both in comparison to previous versions of SBOL and through practical examples of its use. Keywords: synthetic biology, data standards, data exchange, knowledge representation, SBOL
A Lyapunov Analysis of a Most Probable Path Finding Algorithm
Abstract
Distributed information spreading algorithms are important building blocks in Aggregate Computing. We consider a special case, namely for finding a most probable path for message delivery from a set of sources to each device in a network. We formulate a Lyapunov function to prove its regional stability subject to initialization of estimated probabilities to the natural interval [0,1). We also prove that the algorithm converges in a finite time, and is ultimately bounded under persistent measurement errors. We provide tight bounds for convergence time, the ultimate bound, and the time for its attainment.
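The kind of computation described here can be sketched as a synchronous, Bellman-Ford-style iteration. This is an assumed simplification rather than the paper's exact update rule: each source holds probability 1, and every other device takes the best product of a neighbor's estimate and the delivery probability of the connecting link. Names and link values are hypothetical.

```python
import math

def step(est, links, sources):
    """One synchronous round. links[i] maps each neighbor j of device i
    to the delivery probability of the link between i and j."""
    new = {}
    for i in links:
        if i in sources:
            new[i] = 1.0
        else:
            new[i] = max((est[j] * p for j, p in links[i].items()),
                         default=0.0)
    return new

def run(links, sources, init, rounds):
    """Iterate the update from initial estimates for a fixed number of rounds."""
    est = dict(init)
    for _ in range(rounds):
        est = step(est, links, sources)
    return est

# Triangle network: the direct link s-b (0.4) is worse than s-a-b (0.9 * 0.5).
links = {
    "s": {"a": 0.9, "b": 0.4},
    "a": {"s": 0.9, "b": 0.5},
    "b": {"s": 0.4, "a": 0.5},
}
init = {"s": 0.5, "a": 0.99, "b": 0.99}   # arbitrary estimates in [0, 1)
final = run(links, {"s"}, init, rounds=5)
print(final)
```

In this example the deliberately wrong initial estimates are flushed out within a few rounds, after which device b settles on the two-hop path through a rather than the weaker direct link.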
Priority-enabled Load Balancing for Dispersed Computing
Abstract
Opportunistic managed access to local in-network compute resources can improve the performance of distributed applications and reduce the dependence on shared network resources. Instead of backhauling application data to a centralized cloud for processing, networked services may be adaptively and continuously dispersed into shared compute resources that are closer to the source of need. While this approach has several benefits, support for mission-aware access to computation is often an afterthought, and is implemented as a brittle extension over traditional load-balancer solutions. In this work, we investigate the design of two priority-aware resource allocation strategies and two load-balancing dispatching strategies as first class citizens in an open-source dispersed computing middleware. We present a theoretic analysis of these load-balancing primitives to identify weaknesses and strengths in our design, and recommend future directions. We then prototype two priority-aware allocation algorithms to validate our priority predictions. In initial experiments our prototype shows substantial gains in processing prioritized load. Finally, we make our source-code and experimental configurations open source.
Incomplete Cell Sorting Creates Engineerable Structures with Long-Term Stability
Abstract
Adhesion-mediated cell sorting has long been considered an organizing principle in developmental biology. While most computational models have emphasized the dynamics of segregation to fully sorted structures, cell sorting can also generate a plethora of transient, incompletely sorted states. The timescale of such states in experimental systems is unclear: if they are long-lived, they can be harnessed by development or engineered in synthetic tissues. Here, we use experiments and computational modeling to demonstrate how such structures can be systematically designed by quantitative control of cell composition. By varying the number of highly adhesive and less adhesive cells in multicellular aggregates, we find the cell-type ratio and total cell count control pattern formation, with resulting structures maintained for several days. Our work takes a step toward mapping the design space of self-assembling structures in development and provides guidance to the emerging field of shape engineering with synthetic biology.
CMOS Electrochemical Imaging Arrays for the Detection and Classification of Microorganisms
Abstract:
Microorganisms account for most of the biodiversity on earth. Yet while there are increasingly powerful tools for studying microbial genetic diversity, there are fewer tools for studying microorganisms in their natural environments. In this paper, we present recent advances in CMOS electrochemical imaging arrays for detecting and classifying microorganisms. These microscale sensing platforms can provide non-optical measurements of cell geometries, behaviors, and metabolic markers. We review integrated electronic sensors appropriate for monitoring microbial growth, and present measurements of single-celled algae using a CMOS sensor array with thousands of active pixels. Integrated electrochemical imaging can contribute to improved medical diagnostics and environmental monitoring, as well as discoveries of new microbial populations.
Abstract
Laboratory automation now commonly allows high-throughput sample preparation, culturing, and acquisition of microscopy images, but quantitative image analysis is often still a painstaking and subjective process. This is a problem especially significant for work on programmed morphogenesis, where the spatial organization of cells and cell types is of paramount importance. To address the challenges of quantitative analysis for such experiments, we have developed TASBE Image Analytics, a software pipeline for automatically segmenting collections of cells using the fluorescence channels of microscopy images. With TASBE Image Analytics, collections of cells can be grouped into spatially disjoint segments, the movement or development of these segments tracked over time, and rich statistical data output in a standardized format for analysis. Processing is readily configurable, rapid, and produces results that closely match hand annotation by humans for all but the smallest and dimmest segments. TASBE Image Analytics can thus provide the analysis necessary to complete the design-build-test-learn cycle for high-throughput experiments in programmed morphogenesis, as validated by our application of this pipeline to process experiments on shape formation with engineered CHO and HEK293 cells.
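The core idea of grouping cells into spatially disjoint segments from a fluorescence channel can be illustrated with a minimal sketch. This is not the TASBE Image Analytics implementation: the fixed threshold, the 4-connected flood fill, the function name `segment`, and the tiny example image are all assumptions chosen for illustration.

```python
from collections import deque

def segment(image, threshold):
    """Group pixels at or above `threshold` into 4-connected segments.

    `image` is a list of rows of intensity values (one fluorescence channel).
    Returns a list of sets of (row, col) coordinates, one set per segment.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = set()
    segments = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or (r, c) in seen:
                continue
            # Breadth-first flood fill starting from an unvisited bright pixel.
            queue = deque([(r, c)])
            seen.add((r, c))
            pixels = set()
            while queue:
                y, x = queue.popleft()
                pixels.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and image[ny][nx] >= threshold):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            segments.append(pixels)
    return segments

# Hypothetical 3x4 single-channel image with two bright regions.
example_image = [
    [0, 9, 0, 0],
    [0, 9, 0, 8],
    [0, 0, 0, 8],
]
example_segments = segment(example_image, threshold=5)
segment_sizes = sorted(len(s) for s in example_segments)
```

Tracking segments over time and exporting statistics, as described in the abstract, would build on per-frame output of this kind.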
Levels of Autonomy in Synthetic Biology Engineering
Abstract
Engineering biological organisms is a complex and challenging process that could benefit from a combination of standardization and automation. This Commentary discusses the advantages and challenges of achieving high levels of autonomy in synthetic biology.
Field-based Coordination with the Share Operator
Abstract
Recent work in the area of coordination models and collective adaptive systems promotes a view of distributed computations as functions manipulating computational fields (data structures spread over space and evolving over time), and introduces the field calculus as a formal foundation for field computations. In field calculus, evolution (time) and neighbor interaction (space) are handled by separate functional operators: however, this intrinsically limits the speed of information propagation that can be achieved by their combined use. In this paper, we propose a new field-based coordination operator called share, which captures the space-time nature of field computations in a single operator that declaratively achieves: (i) observation of neighbors’ values; (ii) reduction to a single local value; and (iii) update and converse sharing to neighbors of a local variable. In addition to conceptual economy, use of the share operator also allows many prior field calculus algorithms to be greatly accelerated, which we validate empirically with simulations of a number of frequently used network propagation and collection algorithms.
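The propagation-speed argument can be illustrated with a toy model. The sketch below is not the field calculus or its formal semantics: it assumes fully synchronous rounds on a line of devices, and it stands in for the separate time and space operators with one extra round of staleness on the values a device sees from its neighbors. The function name `rounds_to_converge` and the parameter `extra_delay` are hypothetical.

```python
import math

def rounds_to_converge(n_devices, extra_delay):
    """Count synchronous rounds until each device on a line knows its hop
    distance to device 0.

    extra_delay=0: a device reads what its neighbors computed in the previous
    round (a share-like single operator in this toy model).
    extra_delay=1: neighbor values arrive one round staler (standing in for
    separate evolution and neighbor-interaction operators).
    """
    target = list(range(n_devices))
    # History of past states, oldest first; every device starts at infinity.
    history = [[math.inf] * n_devices for _ in range(extra_delay + 1)]
    rounds = 0
    while history[-1] != target:
        visible = history[0]  # the state that neighbors' messages carry
        new_state = []
        for i in range(n_devices):
            if i == 0:
                new_state.append(0)  # device 0 is the source
            else:
                nbrs = [visible[j] for j in (i - 1, i + 1) if 0 <= j < n_devices]
                new_state.append(min(nbrs) + 1)
        history = history[1:] + [new_state]
        rounds += 1
    return rounds

share_like_rounds = rounds_to_converge(5, extra_delay=0)
separate_ops_rounds = rounds_to_converge(5, extra_delay=1)
```

In this toy model the share-like variant advances information one hop per round, while the delayed variant needs two rounds per hop, which is the kind of slowdown the abstract attributes to combining separate operators.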
Robust Estimation of Bacterial Cell Count from Optical Density
Abstract
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
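The arithmetic behind calibrating OD against a serial dilution of particles can be sketched as follows. This is an illustration under stated assumptions, not the study's protocol or analysis code: the starting particle count, the 2-fold dilution, the blank value, the readings, and the use of a simple arithmetic mean are all hypothetical choices.

```python
def particles_per_od(start_particles, od_readings, blank_od, dilution_factor=2.0):
    """Estimate a particles-per-OD conversion factor from a serial dilution.

    start_particles: assumed particle count in the first well.
    od_readings: raw OD for each successive dilution step.
    blank_od: OD of a well with no particles, subtracted from every reading.
    """
    ratios = []
    for step, od in enumerate(od_readings):
        expected = start_particles / (dilution_factor ** step)
        net_od = od - blank_od
        if net_od > 0:  # skip wells indistinguishable from the blank
            ratios.append(expected / net_od)
    if not ratios:
        raise ValueError("no readings above the blank")
    return sum(ratios) / len(ratios)

def estimate_count(sample_od, blank_od, factor):
    """Convert a sample's raw OD into an estimated particle (cell) count."""
    return max(sample_od - blank_od, 0.0) * factor

# Hypothetical values: 3e8 particles in the first well, 2-fold dilutions.
factor = particles_per_od(3.0e8, [0.35, 0.20, 0.125, 0.0875], blank_od=0.05)
estimated = estimate_count(0.25, blank_od=0.05, factor=factor)
```

A fuller treatment would also check each well's fold-residual against the fit and restrict the calculation to the instrument's effective linear range, both of which the abstract names as benefits of the microsphere protocol.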
Capturing Multicellular System Designs Using the Synthetic Biology Open Language
Abstract
Synthetic biology aims to improve the development of biological systems and increase their reproducibility through the use of engineering principles, such as standardization and modularization. It is important that these systems can be represented and shared in a standard way to ensure they are easily understood, reproduced, and utilized by other researchers. The Synthetic Biology Open Language (SBOL) is a data standard for sharing biological designs and information about their implementation and characterization. Thus far, this standard has been used to represent designs in homogeneous systems, where the same design is implemented in every cell. In recent years there has been increasing interest in multicellular systems, where biological designs are split across multiple cells to optimize the system behavior and function. Here we show how the SBOL standard can be used to represent such multicellular systems, and hence how researchers can better share designs with the community.
The Synthetic Biology Open Language (SBOL) Version 3: Simplified Data Exchange for Bioengineering
Abstract
The Synthetic Biology Open Language (SBOL) is a community-developed data standard that allows knowledge about biological designs to be captured using a machine-tractable, ontology-backed representation that is built using Semantic Web technologies. While early versions of SBOL focused only on the description of DNA-based components and their sub-components, SBOL can now be used to represent knowledge across multiple scales and throughout the entire synthetic biology workflow, from the specification of a single molecule or DNA fragment through to multicellular systems containing multiple interacting genetic circuits. The third major iteration of the SBOL standard, SBOL3, is an effort to streamline and simplify the underlying data model with a focus on real-world applications, based on experience from the deployment of SBOL in a variety of scientific and industrial settings. Here, we introduce the SBOL3 specification both in comparison to previous versions of SBOL and through practical examples of its use.
Improving Collection Dynamics by Monotonic Filtering
Abstract
A key coordination problem in distributed open systems is distributed sensing, as achieved by cooperation and interaction among individual devices. An archetypal operation of distributed sensing is data summarization over a region of space, by which many higher level problems can be addressed, including counting items, measuring space, averaging environmental values, etc. A typical coordination strategy to perform data summarization in a peer-to-peer scenario, where devices can communicate only with a neighborhood, is to progressively accumulate information towards one or more collector devices, though this typically exhibits problems of reactivity and fragility. In this paper, we present a monotonic filtering strategy for improving the dynamics of single path collection algorithms. The strategy consists of inhibiting communication across devices whose distance towards the collector device is not decreasing. We prove that single path collection in a line graph results in quadratic overestimates after a source change and that these overestimates disappear with the application of monotonic filtering. These preliminary results suggest that monotonic filtering is likely to improve the dynamics of single path collection algorithms, by preventing excessive overestimates.
Round-Trip: An Automated Pipeline for Experimental Design, Execution, and Analysis
Abstract
Synthetic biology is a complex discipline that involves creating detailed, purpose-built designs from genetic parts. This process is often phrased as a Design-Build-Test-Learn loop, where iterative design improvements can be made, implemented, measured, and analyzed. Automation can potentially improve both the end-to-end duration of the process and the utility of data produced by the process. One of the most important considerations for the development of effective automation and quality data is a rigorous description of implicit knowledge encoded as a formal knowledge representation. The development of knowledge representation for the process poses a number of challenges, including developing effective human–machine interfaces, protecting against and repairing user error, providing flexibility for terminological mismatches, and supporting extensibility to new experimental types. We address these challenges with the DARPA SD2 Round Trip software architecture.
Intent Parser: a tool for codifying experiment design
Abstract
Communicating information about experimental design among a team of collaborators is challenging because different people tend to describe experiments in different ways and with different levels of detail. Sometimes, humans can interpret missing information by making assumptions and drawing inferences from information already provided. Doing so, however, is error-prone and typically requires a high level of interpersonal communication. In this paper, we present a tool that addresses this challenge by providing a simple interface for incremental formal codification of experiment designs. Users interact with a Google Docs word-processing interface with structured tables, backed by assisted linking to machine-readable definitions in a data repository (SynBioHub) and specification of available protocols and requests for execution in the Open Protocol Interface Language (OPIL). The result is an easy-to-use tool for generating machine-readable descriptions of experiment designs with which users in the DARPA SD2 program have collected data from 80,208 samples using a variety of protocols and instruments over the course of 181 experiment runs.
Collaborative Terminology: SBOL Project Dictionary
Abstract
Sharing information about biological experiments between researchers is often challenging. Reagents, strains, and genetic constructs are often given “shorthand” names that are ambiguous (e.g., “ara” for L-arabinose), differ between researchers (e.g., “L-arab” vs. “Arabinose”) or are unknown outside of a particular group (e.g., “plasmid 37”). Likewise, the particular combinations used in each sample of an experiment are often expressed in variable personal shorthands, often accidentally omitting important details.
Describing engineered biological systems with SBOL3 and ShortBOL2
Abstract
Data standards are essential to exchange information about the engineering of biological systems. The Synthetic Biology Open Language (SBOL) is a community-driven standard that facilitates the exchange of data relating to the design, implementation, testing and refinement of engineered biological systems [4]. Versions 1 and 2 of SBOL have gained widespread adoption, with over 170 developers, 29 SBOL supporting software tools and 42 institutions involved in their development and deployment (as of June 2020). Recently, SBOL was refactored to simplify its data model, resulting in the release of the SBOL3 specification [1].
Synthetic Biology Open Language (SBOL) Version 3.0.0
Abstract
Synthetic biology builds upon genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. When designing a synthetic system, synthetic biologists need to exchange information about multiple types of molecules, the intended behavior of the system, and actual experimental measurements. The Synthetic Biology Open Language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, following an open community process involving both wet bench scientists and dry scientific modelers and software developers, across academia, industry, and other institutions. This document describes SBOL 3.0.0, which condenses and simplifies previous versions of SBOL based on experiences in deployment across a variety of scientific and industrial settings. In particular, SBOL 3.0.0, (1) separates sequence features from part/sub-part relationships, (2) renames Component Definition/Component to Component/Sub-Component, (3) merges Component and Module classes, (4) ensures consistency between data model and ontology terms, (5) extends the means to define and reference Sub-Components, (6) refines requirements on object URIs, (7) enables graph-based serialization, (8) moves Systems Biology Ontology (SBO) for Component types, (9) makes all sequence associations explicit, (10) makes interfaces explicit, (11) generalizes Sequence Constraints into a general structural Constraint class, and (12) expands the set of allowed constraints.
Synthetic Biology Open Language Visual (SBOL Visual) Version 2.2
Abstract
People who are engineering biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. The Synthetic Biology Open Language Visual (SBOL Visual) has been developed as a standard for organizing and systematizing such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 2.2 of SBOL Visual, which builds on the prior SBOL Visual 2.1 in several ways. First, the grounding of molecular species glyphs is changed from BioPAX to SBO, aligning with the use of SBO terms for interaction glyphs. Second, new glyphs are added for proteins, introns, and polypeptide regions (e.g., protein domains), the prior recommended macromolecule glyph is deprecated in favor of its alternative, and small polygons are introduced as alternative glyphs for simple chemicals.
Synthetic biology open language (SBOL) version 2.3
Abstract
Synthetic biology builds upon the techniques and successes of genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. The field still faces substantial challenges, including long development times, high rates of failure, and poor reproducibility. One method to ameliorate these problems is to improve the exchange of information about designed systems between laboratories. The synthetic biology open language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, filling a need not satisfied by other pre-existing standards. This document details version 2.3.0 of SBOL, which builds upon version 2.2.0 published in last year’s JIB Standards in Systems Biology special issue.
Abstract
Synthetic biology needs to adopt sound scientific and industry‐like standards in order to achieve its ambitious goals of efficient and accurate engineering of biological systems.
Embrace experimentation in biosecurity governance
Abstract
As biological research and its applications rapidly evolve, new attempts at the governance of biology are emerging, challenging traditional assumptions about how science works and who is responsible for governing. However, these governance approaches often are not evaluated, analyzed, or compared. This hinders the building of a cumulative base of experience and opportunities for learning. Consider “biosecurity governance,” a term with no internationally agreed definition, here defined as the processes that influence behavior to prevent or deter misuse of biological science and technology. Changes in technical, social, and political environments, coupled with the emergence of natural diseases such as coronavirus disease 2019 (COVID-19), are testing existing governance processes. This has led some communities to look beyond existing biosecurity models, policies, and procedures. But without systematic analysis and learning across them, it is hard to know what works.
Abstract
Standardizing the visual representation of genetic parts and circuits is essential for unambiguously creating and interpreting genetic designs. To this end, an increasing number of tools are adopting well-defined glyphs from the Synthetic Biology Open Language (SBOL) Visual standard to represent various genetic parts and their relationships. However, the implementation and maintenance of the relationships between biological elements or concepts and their associated glyphs has up to now been left up to tool developers. We address this need with the SBOL Visual 2 Ontology, a machine-accessible resource that provides rules for mapping from genetic parts, molecules, and interactions between them, to agreed SBOL Visual glyphs. This resource, together with a web service, can be used as a library to simplify the development of visualization tools, as a stand-alone resource to computationally search for suitable glyphs, and to help facilitate integration with existing biological ontologies and standards in synthetic biology.
Abstract
The Synthetic Biology Open Language (SBOL) is an emerging synthetic biology data exchange standard, designed primarily for unambiguous and efficient machine-to-machine communication. However, manual editing of SBOL is generally difficult for nontrivial designs. Here, we describe ShortBOL, a lightweight SBOL scripting language that bridges the gap between manual editing, visual design tools, and direct programming. ShortBOL is a shorthand textual language developed to enable users to create SBOL designs quickly and easily, without requiring strong programming skills or visual design tools.
Organizing Genome Editing for the Gigabase Scale
Abstract
Genome-scale engineering holds great potential to impact science, industry, medicine, and society, and recent improvements in DNA synthesis have enabled the manipulation of megabase genomes. However, coordinating and integrating the workflows and large teams necessary for gigabase genome engineering remains a considerable challenge. We examine this issue and recommend a path forward by: 1) adopting and extending existing representations for designs, assembly plans, samples, data, and workflows; 2) developing new technologies for data curation and quality control; 3) conducting fundamental research on genome-scale modeling and design; and 4) developing new legal and contractual infrastructure to facilitate collaboration.
Automated Detection of Yeast Genetic Engineering in Whole Genomes and Metagenomes with Prymetime
Abstract
Yeast genomes can be assembled from sequencing data, but genetic engineering changes often fail to be resolved with accuracy, completeness, and contiguity. Further, searching for engineered sequences in sequence data is currently a manual process. To overcome these challenges, we applied nanopore assembly and short read error correction to create an integrated workflow that achieves accurate whole genome and plasmid sequences of engineered yeasts, automatically annotating synthetic biology parts. We named this workflow Prymetime, "Pipeline for Recombinant Yeast genoMEs That Illuminates Markers of Engineering."
Verification of genetic engineering in yeasts with nanopore whole genome sequencing
Abstract
Yeast genomes can be assembled from sequencing data, but genetic engineering changes often fail to be resolved with accuracy, completeness, and contiguity. Further, searching for engineered sequences in sequence data is currently a manual process. To overcome these challenges, we applied nanopore assembly and short read error correction to create an integrated workflow that achieves accurate whole genome and plasmid sequences of engineered yeasts, automatically annotating synthetic biology parts. We named this workflow Prymetime, "Pipeline for Recombinant Yeast genoMEs That Illuminates Markers of Engineering."
2018
Small molecule-based regulation of gene expression for RNA-delivered circuits in mammalian cells
Abstract
Synthetic mRNA is an attractive vehicle for gene therapies because of its transient nature and improved safety profile over DNA. However, unlike DNA, broadly applicable methods to control expression from mRNA are lacking. Here we describe a platform for small-molecule-based regulation of expression from modified RNA (modRNA) and self-replicating RNA (replicon) delivered to mammalian cells. Specifically, we engineer small-molecule-responsive RNA binding proteins to control expression of proteins from RNA-encoded genetic circuits. Coupled with specific modRNA dosages or engineered elements from a replicon, including a subgenomic promoter library, we demonstrate the capability to externally regulate the timing and level of protein expression. These control mechanisms facilitate the construction of ON, OFF, and two-output switches, with potential therapeutic applications such as inducible cancer immunotherapies. These circuits, along with other synthetic networks that can be developed using these tools, will expand the utility of synthetic mRNA as a therapeutic modality.
2018
Capturing Multicellular System Designs Using the Synthetic Biology Open Language (SBOL)
Abstract
Synthetic biology aims to improve the development of biological systems and increase their reproducibility through the use of engineering principles, such as standardization and modularization. It is important that these systems can be represented and shared in a standard way to ensure they are easily understood, reproduced, and utilized by other researchers. The Synthetic Biology Open Language (SBOL) is a data standard for sharing biological designs and information about their implementation and characterization. Thus far, this standard has been used to represent designs in homogeneous systems, where the same design is implemented in every cell. In recent years there has been increasing interest in multicellular systems, where biological designs are split across multiple cells to optimize the system behavior and function. Here we show how the SBOL standard can be used to represent such multicellular systems and hence how researchers can better share designs with the community.
2018
Formalizing Sample Transformation Plans
Abstract
Experimental protocols are typically represented in either a natural language that is hard to replicate or compare, or in procedural languages that are difficult to automatically synthesize, detach from a specific experimental design for reuse, or analyze. We introduce a new approach based on techniques from automated planning. We describe how to represent transformation operators that manipulate samples in terms of applying conditions to samples. We define the semantics of this representation. We also present a simplified version of the notation that removes much of the modeling burden required of scientists. The resulting representation supports automated planning, provides sample provenance and metadata tracking at no cost by virtue of a plan’s causal structure, and separates protocol specification from experimental design.
2018
Time to get serious about measurement in synthetic biology
Abstract
For synthetic biology to mature, composition of devices into functional systems must become routine. This requires widespread adoption of comparable and replicable units of measurement. Interlaboratory studies organized through the International Genetically Engineered Machine (iGEM) competition show that fluorescence can be calibrated with simple, low-cost protocols, so fluorescence should no longer be published without units.
2018
Toward Programming 3D Shape Formation in Mammalian Cells
Abstract
Biological cells are remarkably effective at predictable and resilient formation of complex three-dimensional shapes, as aptly demonstrated by most multicellular life on this planet. Not only can intricate shapes be formed with high reliability, but organisms also maintain functional integration of the entire system throughout development, as well as adapting form in response to environmental conditions, damage, and other disruptions. Moreover, these feats of manufacturing are accomplished entirely with reprocessed locally harvested materials.
2018
Abstract
A critical bottleneck for large-scale engineering collaboration in synthetic biology has been the inability to integrate data through successive stages of the design-build-test-learn (DBTL) engineering life-cycle. These workflows generate large volumes of data and physical artifacts (e.g., DNA samples and cell stocks) that are difficult to organize, track, and manage without systematized, automated tool chains.
2018
Specifying Combinatorial Designs with the Synthetic Biology Open Language
Abstract
During the last decade, new technologies have been developed for the combinatorial assembly of genetic parts [8, 9], enabling synthetic biologists to more readily generate libraries of genetic construct variants. These types of combinatorial libraries can play an important role in genetic design by allowing designers to explore the impact of part choice, order, and orientation on construct behavior. In order to support the design of such libraries, new tools and formalisms have been developed to enable the specification, permutation, and sampling of combinatorial genetic design spaces [1, 2]. In turn, these formalisms have given rise to the need for a standard representation of combinatorial genetic designs in order to enable sharing of such designs between tools and laboratories and to simplify human and machine reasoning over them.
2018
Quantification of Bacterial Fluorescence using Independent Calibrants
Abstract
Fluorescent reporters are commonly used to quantify activities or properties of both natural and engineered cells. Fluorescence is still typically reported only in arbitrary or normalized units, however, rather than in units defined using an independent calibrant, which is problematic for scientific reproducibility and even more so when it comes to effective engineering. In this paper, we report an inter-laboratory study showing that simple, low-cost unit calibration protocols can remedy this situation, producing comparable units and dramatic improvements in precision over both arbitrary and normalized units. Participants at 92 institutions around the world measured fluorescence from E. coli transformed with three engineered test plasmids, plus positive and negative controls, using simple, low-cost unit calibration protocols designed for use with a plate reader and/or flow cytometer. In addition to providing comparable units, use of an independent calibrant allows quantitative use of positive and negative controls to identify likely instances of protocol failure. The use of independent calibrants thus allows order of magnitude improvements in precision, narrowing the 95% confidence interval of measurements in our study up to 600-fold compared to normalized units.
2018
XPlan: Experiment Planning for Synthetic Biology
Abstract
We describe preliminary work on XPlan as a system for experiment planning in synthetic biology. In synthetic biology, as in other emerging fields, scientific exploration and engineering design must be interleaved because of uncertainty about the underlying mechanisms. Through its experiment planning, XPlan provides a coordinating linchpin in DARPA’s Synergistic Discovery and Design (SD2) platform to automate scientific discovery, closing the loop between multiple machine learning analysis and biological design tools and wet labs to guide the discovery and design process.
2018
Engineering modular intracellular protein sensor-actuator devices
Abstract
Understanding and reshaping cellular behaviors with synthetic gene networks requires the ability to sense and respond to changes in the intracellular environment. Intracellular proteins are involved in almost all cellular processes, and thus can provide important information about changes in cellular conditions such as infections, mutations, or disease states. Here we report the design of a modular platform for intrabody-based protein sensing-actuation devices with transcriptional output triggered by detection of intracellular proteins in mammalian cells. We demonstrate reporter activation response (fluorescence, apoptotic gene) to proteins involved in hepatitis C virus (HCV) infection, human immunodeficiency virus (HIV) infection, and Huntington’s disease, and show sensor-based interference with HIV-1 downregulation of HLA-I in infected T cells. Our method provides a means to link varying cellular conditions with robust control of cellular behavior for scientific and therapeutic applications.
2018
Synthetic Biology Open Language (SBOL) Version 2.2.0
Abstract
Synthetic biology builds upon the techniques and successes of genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. The field still faces substantial challenges, including long development times, high rates of failure, and poor reproducibility. One method to ameliorate these problems would be to improve the exchange of information about designed systems between laboratories. The synthetic biology open language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, filling a need not satisfied by other pre-existing standards. This document details version 2.2.0 of SBOL that builds upon version 2.1.0 published in last year’s JIB special issue. In particular, SBOL 2.2.0 includes improved description and validation rules for genetic design provenance, an extension to support combinatorial genetic designs, a new class to add non-SBOL data as attachments, a new class for genetic design implementations, and a description of a methodology to describe the entire design-build-test-learn cycle within the SBOL data model.
2018
Synthetic Biology Open Language Visual (SBOL Visual) Version 2.0
Abstract
People who are engineering biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. The Synthetic Biology Open Language Visual (SBOL Visual) has been developed as a standard for organizing and systematizing such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 2.0 of SBOL Visual, which builds on the prior SBOL Visual 1.0 standard by expanding diagram syntax to include functional interactions and molecular species, making the relationship between diagrams and the SBOL data model explicit, supporting families of symbol variants, clarifying a number of requirements and best practices, and significantly expanding the collection of diagram glyphs.
2018
Managing Bioengineering Complexity
Abstract
Engineering the behavior of cells by modification of their genetic machinery holds the potential for revolutionary advances in many important application areas, including medical therapies, vaccination, manufacturing of proteins and other organic compounds, and environmental remediation. As capabilities and potential applications grow, the complexity and cross-disciplinary knowledge required to employ them is also growing rapidly. Managing the complexity of biological engineering is thus a problem of increasing importance. The rapid pace of advancement makes it important to have good methods for integration of new knowledge and procedures into organism engineering workflows.
2017
A Visual Language for Protein Design
Abstract
As protein engineering becomes more sophisticated, practitioners increasingly need to share diagrams for communicating protein designs. To this end, we present a draft visual language, Protein Language, that describes the high-level architecture of an engineered protein with a few easy-to-draw glyphs, intended to be compatible with other biological diagram languages such as SBOL and SBGN. Protein Language consists of glyphs for representing important features (e.g., globular domains, recognition and localization sequences, sites of covalent modification, cleavage and catalysis), rules for composing these glyphs to represent complex architectures, and rules constraining the scaling and styling of diagrams.
2017
Toward Quantitative Comparison of Fluorescent Protein Expression Levels via Fluorescent Beads
Abstract
Establishing an effective engineering discipline always requires standardized and comparable units of measurement. Such measurements serve as a means of communication between the people and machines interacting with a project, ensure compatibility between components, and allow prediction of the results of design decisions. Regulating gene expression is foundational for organism engineering, and flow cytometry is an excellent means of quantifying large numbers of single cell gene expression measurements. At present, however, flow cytometry data is still often acquired in arbitrary or relative units, without standardizing the measurement by comparison to an independent reference material (i.e., one enabling precise calibration of measurements). Some have proposed standardizing to a biological cultured reference material (e.g., [3]), but fluorescence from such materials varies strongly, unpredictably, and often not in proportion to the samples it is intended to be a reference for, thus resulting in a large degree of uncertainty in measurement.
2017
Biochemical complexity drives log-normal variation in genetic expression
Abstract
Cells exhibit a high degree of variation in levels of gene expression, even within otherwise homogeneous populations. The standard model to describe this variation centers on a gamma distribution driven by stochastic bursts of translation. Stochastic bursting, however, cannot account for the well-established behavior of strong transcriptional repressors. Instead, it can be shown that the very complexity of the biochemical processes involved in gene expression drives an emergent log-normal distribution of expression levels. Emergent log-normal distributions can account for the observed behavior of transcriptional repressors, are still compatible with stochastically constrained distributions, and have important implications for both analysis of gene expression data and the engineering of biological organisms.
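The emergence of a log-normal distribution from many compounding steps can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the model from the paper: each cell's expression level is taken to be the product of several independent positive random factors, and the step count, the uniform factor range, and the function names are all hypothetical choices made for illustration. Because the logarithm of a product is a sum, the log of the level tends toward a normal distribution.

```python
import math
import random
import statistics

def simulate_expression(n_cells, n_steps, rng):
    """Toy model: each cell's level is a product of independent random factors."""
    levels = []
    for _ in range(n_cells):
        level = 1.0
        for _ in range(n_steps):
            # Hypothetical per-step multiplicative factor (assumption for illustration).
            level *= rng.uniform(0.5, 1.5)
        levels.append(level)
    return levels

def skewness(values):
    """Population skewness: third standardized moment."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(0)
levels = simulate_expression(n_cells=5000, n_steps=30, rng=rng)

raw_mean = statistics.fmean(levels)
raw_median = statistics.median(levels)
log_skew = skewness([math.log(v) for v in levels])

print("mean / median of raw levels:", raw_mean / raw_median)
print("skewness of log levels:", log_skew)
```

In this sketch the raw levels are strongly right-skewed (the mean sits well above the median), while the log-transformed levels are close to symmetric, which is the signature of an approximately log-normal distribution.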
2017
A Standard-Enabled Workflow for Synthetic Biology
Abstract
A synthetic biology workflow is composed of data repositories that provide information about genetic parts, sequence-level design tools to compose these parts into circuits, visualization tools to depict these designs, genetic design tools to select parts to create systems, and modeling and simulation tools to evaluate alternative design choices. Data standards enable the ready exchange of information within such a workflow, allowing repositories and tools to be connected from a diversity of sources. The present paper describes one such workflow that utilizes, among others, the Synthetic Biology Open Language (SBOL) to describe genetic designs, the Systems Biology Markup Language to model these designs, and SBOL Visual to visualize these designs. We describe how a standard-enabled workflow can be used to produce types of design information, including multiple repositories and software tools exchanging information using a variety of data standards. Recently, the ACS Synthetic Biology journal has recommended the use of SBOL in their publications.
2017
Reducing DNA context dependence in bacterial promoters
Abstract
Variation in the DNA sequence upstream of bacterial promoters is known to affect the expression levels of the products they regulate, sometimes dramatically. While neutral synthetic insulator sequences have been found to buffer promoters from upstream DNA context, there are no established methods for designing effective insulator sequences with predictable effects on expression levels. We address this problem with Degenerate Insulation Screening (DIS), a novel method based on a randomized 36-nucleotide insulator library and a simple, high-throughput, flow-cytometry-based screen that randomly samples from a library of 4^36 potential insulated promoters. The results of this screen can then be compared against a reference uninsulated device to select a set of insulated promoters providing a precise level of expression. We verify this method by insulating the constitutive, inducible, and repressible promoters of a four transcriptional-unit inverter (NOT-gate) circuit, finding both that order dependence is largely eliminated by insulation and that circuit performance is also significantly improved, with a 5.8-fold mean improvement in on/off ratio.
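As a quick check on the scale involved: a fully randomized 36-nucleotide region, with four possible bases at each position, has 4^36 possible sequences. The short sketch below computes that number; the function name is hypothetical and the calculation assumes every position is independently and fully degenerate.

```python
def randomized_library_size(n_positions, alphabet_size=4):
    """Number of distinct sequences in a fully degenerate region."""
    return alphabet_size ** n_positions

size = randomized_library_size(36)
print(size)            # exact integer count of possible insulators
print(f"{size:.2e}")   # same value in scientific notation
```

The result is on the order of 10^21, which is why a screen over such a library can only ever sample it randomly rather than enumerate it.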
2016
Mathematical Foundations of Variation in Gene Expression
Abstract
A key challenge in engineering biological organisms is the high degree of cell-to-cell variation commonly observed in gene expression. The inherently discrete and stochastic nature of the chemical reactions that underlie gene expression has been proposed as an explanation for the highly asymmetric distributions that are frequently observed [1], with bursts of expression leading to a Gamma distribution. While this may explain the behavior of systems with very low expression, it is insufficient to account for the high degree of cell-to-cell variation that is typically still observed even with strong expression (e.g., more than 2-fold standard deviation with a mean of many millions of molecules in [2]). In essence, with strong expression there are typically so many molecules involved that the law of large numbers will generally render the impact of chemical stochasticity largely insignificant.
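The law-of-large-numbers argument can be made concrete with a small calculation. As a simplifying assumption (not the model used in the paper), suppose molecule counts were Poisson-distributed: the standard deviation is then the square root of the mean, so the relative spread shrinks as one over the square root of the mean count. The function name below is hypothetical.

```python
import math

def poisson_relative_sd(mean_molecules):
    """Relative standard deviation (sd / mean) of a Poisson count."""
    return math.sqrt(mean_molecules) / mean_molecules

for mean_molecules in (10, 1_000, 1_000_000):
    rel = poisson_relative_sd(mean_molecules)
    print(f"mean = {mean_molecules:>9,d}  relative sd = {rel:.4f}")
```

Under this assumption, a mean of a million molecules gives a relative spread of about 0.1%, far smaller than the more than 2-fold variation cited above, so purely count-based stochasticity cannot be the main source of variation at strong expression.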
2016
Synthetic Biology Open Language (SBOL) Version 2.1.0
Abstract
Synthetic biology builds upon the techniques and successes of genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. The field still faces substantial challenges, including long development times, high rates of failure, and poor reproducibility. One method to ameliorate these problems would be to improve the exchange of information about designed systems between laboratories. The Synthetic Biology Open Language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, filling a need not satisfied by other pre-existing standards. This document details version 2.1 of SBOL that builds upon version 2.0 published in last year’s JIB special issue. In particular, SBOL 2.1 includes improved rules for what constitutes a valid SBOL document, new role fields to simplify the expression of sequence features and how components are used in context, and new best practices descriptions to improve the exchange of basic sequence topology information and the description of genetic design provenance, as well as miscellaneous other minor improvements.
2016
Managing Bioengineering Complexity with AI Techniques
Abstract
Our capabilities for systematic design and engineering of biological systems are rapidly increasing. Effectively engineering such systems, however, requires the synthesis of a rapidly expanding and changing complex body of knowledge, protocols, and methodologies. Many of the problems in managing this complexity, however, appear susceptible to being addressed by artificial intelligence (AI) techniques, i.e., methods enabling computers to represent, acquire, and employ knowledge. Such methods can be employed to automate physical and informational “routine” work and thus better allow humans to focus their attention on the deeper scientific and engineering issues. This paper examines the potential impact of AI on the engineering of biological organisms through the lens of a typical organism engineering workflow. We identify a number of key opportunities for significant impact, as well as challenges that must be overcome.
2016
Design for Improved Repression in RNA Replicons
Abstract
RNA replicons are an emerging platform for synthetic biology, in which the infective capsid of an RNA virus is replaced with an engineered payload while its self-replication capability is retained [4, 3, 1, 6]. This self-replication capability allows RNA replicons entering a cell to amplify their engineered elements, providing strong expression from a low initial dose without integration into host DNA or propagation to other cells. Replicons thus offer an attractive platform for developing medical applications such as vaccines [2, 3] and stem-cell generation [7], combining both strong expression and relative genetic isolation. Development of RNA replicons to date has focused primarily on derivatives of alphaviruses, a well-characterized family of positive-strand RNA viruses, and most particularly the Sindbis and VEE vectors [4]. Protein expression from RNA replicons can be precisely predicted and controlled [1], and can support standard synthetic circuits such as cascades and toggle switches [6].
2016
Abstract
Research is communicated more effectively and reproducibly when articles depict genetic designs consistently and fully disclose the complete sequences of all reported constructs. ACS Synthetic Biology is now providing authors with updated guidance and piloting a new tool and publication workflow that facilitate compliance with these recommended practices and standards for visual representation and data exchange.
2016
Sharing Structure and Function in Biological Design with SBOL 2.0
Abstract
The Synthetic Biology Open Language (SBOL) is a standard that enables collaborative engineering of biological systems across different institutions and tools. SBOL is developed through careful consideration of recent synthetic biology trends, real use cases, and consensus among leading researchers in the field and members of commercial biotechnology enterprises. We demonstrate and discuss how a set of SBOL-enabled software tools can form an integrated, cross-organizational workflow to recapitulate the design of one of the largest published genetic circuits to date, a 4-input AND sensor. This design encompasses the structural components of the system, such as its DNA, RNA, small molecules, and proteins, as well as the interactions between these components that determine the system’s behavior/function.
2016
IWBDA 2015 (editorial) Jacob Beal
Abstract
The International Workshop on Bio-Design Automation (IWBDA) brings together researchers from the synthetic biology, systems biology, and design automation communities. One of the key challenges of synthetic biology is the sheer complexity of engineering biological systems, with regard to both the nature of biological organisms and the profusion of components, protocols, and methods with which these organisms are engineered. The motivating goal of IWBDA is to address these challenges by fostering cross-disciplinary discussion and collaboration between researchers with backgrounds in biology, computation, and other relevant disciplines. The seventh IWBDA, organized by the nonprofit Bio-Design Automation Consortium (BDAC), was held at the University of Washington in Seattle, Washington on August 19th through 21st, 2015. This special ACS Synthetic Biology issue includes eight papers associated with the work presented at IWBDA, spanning a wide range of different topics and focus areas.
2016
libSBOLj 2.0: A Java Library to Support SBOL 2.0
Abstract
The Synthetic Biology Open Language (SBOL) is an emerging data standard for representing synthetic biology designs. The goal of SBOL is to improve the reproducibility of these designs and their electronic exchange between researchers and/or genetic design automation tools. The latest version of the standard, SBOL 2.0, enables the annotation of a large variety of biological components (e.g., DNA, RNA, proteins, complexes, small molecules, etc.) and their interactions. SBOL 2.0 also allows researchers to organize components into hierarchical modules, to specify their intended functions, and to link modules to models that describe their behavior mathematically. To support the use of SBOL 2.0, we have developed the libSBOLj 2.0 Java library, which provides an easy to use Application Programming Interface (API) for developers, including manipulation of SBOL constructs, serialization to and from an RDF/XML file format, and migration support in the form of conversion from the prior SBOL 1.1 standard to SBOL 2.0. This letter describes the libSBOLj 2.0 library and key engineering decisions involved in its design.
2016
Reproducibility of Fluorescent Expression from Engineered Biological Constructs in E. coli
Abstract
We present results of the first large-scale inter-laboratory study carried out in synthetic biology, as part of the 2014 and 2015 International Genetically Engineered Machine (iGEM) competitions. Participants at 88 institutions around the world measured fluorescence from three engineered constitutive constructs in E. coli. Few participants were able to measure absolute fluorescence, so data was analyzed in terms of ratios. Precision was strongly related to fluorescent strength, ranging from 1.54-fold standard deviation for the ratio between strong promoters to 5.75-fold for the ratio between the strongest and weakest promoter, and while host strain did not affect expression ratios, choice of instrument did. This result shows that high quantitative precision and reproducibility of results is possible, while at the same time indicating areas needing improved laboratory practices.
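A "fold standard deviation" such as the 1.54-fold and 5.75-fold figures above can be read as a geometric standard deviation: the standard deviation is taken on log-transformed values and then exponentiated, so the spread is expressed as a multiplicative factor. The sketch below assumes that interpretation; the function name and the sample values are hypothetical and are not data from the study.

```python
import math
import statistics

def geometric_mean_and_fold_sd(values):
    """Return (geometric mean, geometric standard deviation) of positive values."""
    logs = [math.log(v) for v in values]
    geo_mean = math.exp(statistics.fmean(logs))
    fold_sd = math.exp(statistics.stdev(logs))  # sample sd on the log scale
    return geo_mean, fold_sd

# Hypothetical ratios reported by three laboratories (illustration only).
ratios = [1.0, 10.0, 100.0]
geo_mean, fold_sd = geometric_mean_and_fold_sd(ratios)
print(f"geometric mean = {geo_mean:.2f}, fold sd = {fold_sd:.2f}")
```

A fold standard deviation of 1.0 would mean perfect agreement; larger values mean that typical measurements differ from the geometric mean by that multiplicative factor.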
2016
Abstract
Multipart and modular DNA part libraries and assembly standards have become common tools in synthetic biology since the publication of the Gibson and Golden Gate assembly methods, yet no multipart modular library exists for use in bacterial systems. Building upon the existing MoClo assembly framework, we have developed a publicly available collection of modular DNA parts and enhanced MoClo protocols to enable rapid one-pot, multipart assembly, combinatorial design, and expression tuning in Escherichia coli. The Cross-disciplinary Integration of Design Automation Research lab (CIDAR) MoClo Library is openly available and contains promoters, ribosomal binding sites, coding sequence, terminators, vectors, and a set of fluorescent control plasmids. Optimized protocols reduce reaction time and cost by >80% from that of previously published protocols.
2015
SBOL Visual: Standard Schematics for Synthetic Genetic Constructs
Abstract
Synthetic Biology Open Language (SBOL) Visual is a graphical standard for genetic engineering. It consists of symbols representing DNA subsequences, including regulatory elements and DNA assembly features. These symbols can be used to draw illustrations for communication and instruction, and as image assets for computer-aided design. SBOL Visual is a community standard, freely available for personal, academic, and commercial use (Creative Commons CC0 license). We provide prototypical symbol images that have been used in scientific publications and software tools.
2015
Cas9 gRNA engineering for selectable genome editing, activation and repression
Abstract
We demonstrate that by altering the length of Cas9-associated guide RNA (gRNA) we were able to control Cas9 nuclease activity and simultaneously perform genome editing and transcriptional regulation with a single Cas9 protein. We exploited these principles to engineer mammalian synthetic circuits with combined transcriptional regulation and kill functions governed by a single multifunctional Cas9 protein.
2015
Design of Biological Circuits Using Signal-to-Noise Ratio
Abstract
Biological computing circuits have a role to play in many synthetic biology applications, such as precision cancer therapy, sensing chemical threats, or control of biosynthesis processes. Actually realizing such circuits effectively, however, has been quite difficult: until recently, neither high-precision prediction nor high-performance component libraries were available. Thus, although many design approaches for selecting components to realize a circuit have been proposed (e.g., [11, 6, 9], to name a few), it has been unclear which, if any, of these approaches was likely to actually be practical for the realization of biological circuits.
2015
Copyright and Licensing of BBF RFCs
Abstract
The BBF RFC process currently is managed by the BioBricks Foundation, and BBF RFC documents are made available as PDF files through DSpace@MIT. Until now notification of the licensing terms for BBF RFC documents have been indicated on the BioBricks Foundation’s website and on DSpace@MIT, but have not been included on the actual BBF RFC documents. Because PDF files travel freely over the internet, the lack of a licensing notice on the actual BBF RFC documents has led to unnecessary confusion.
2015
Synthetic Biology Open Language (SBOL) Version 2.0.0
Abstract
Synthetic biology builds upon the techniques and successes of genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. These principles include standardization, modularity, and design abstraction. The field still faces substantial challenges, including long development times, high rates of failure, and poor reproducibility. A common factor of these challenges is the exchange of information about designed systems between laboratories. When designing a synthetic system, synthetic biologists need to exchange information about multiple types of molecules and their expected behavior in the design. Furthermore, there are often multiple degrees of separation between a specified nucleic acid sequence (e.g., a sequence that encodes an enzyme or transcription factor) and the molecular interactions that a designer intends to result from said sequence (e.g., chemical modification of metabolites or regulation of gene expression), yet these different perspectives need to be connected together in the engineering of biological systems.
2015
Associated abstract at IWBDA'15
Abstract
The initial version of the Synthetic Biology Open Language (SBOL) was designed for the exchange of information about biological designs at the DNA level. As the field of synthetic biology matures, however, there is a clear need to extend SBOL to capture the function of biological designs and their structure beyond annotated DNA sequences [2]. To support the specification of increasingly complex and diverse biological designs, standards need to represent data on both biological structure and function in a modular, hierarchical fashion. These include data on biological interactions, which are especially important for the functional composition of biological components, and meta-data on computational models, which are important for linking biological designs to more detailed descriptions of their behavior in specific biological contexts.
2015
Signal-to-noise ratio measures efficacy of biological computing devices and circuits
Abstract
Engineering biological cells to perform computations has a broad range of important potential applications, including precision medical therapies, biosynthesis process control, and environmental sensing. Implementing predictable and effective computation, however, has been extremely difficult to date, due to a combination of poor composability of available parts and of insufficient characterization of parts and their interactions with the complex environment in which they operate. In this paper, the author argues that this situation can be improved by quantitative signal-to-noise analysis of the relationship between computational abstractions and the variation and uncertainty endemic in biological organisms.
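One way to make a signal-to-noise analysis of a digital-style biological device concrete is sketched below. The formula is an assumption introduced here for illustration and is not taken from the abstract: the "signal" is half the log-scale separation between the geometric means of the high and low output states, the "noise" is the log of the geometric standard deviation, and the ratio is reported in decibels. All names and example numbers are hypothetical.

```python
import math

def snr_db(mean_high, mean_low, fold_sd):
    """Assumed log-scale SNR in decibels for a two-state device.

    mean_high, mean_low: geometric mean outputs in the high and low states.
    fold_sd: geometric standard deviation of expression (must be > 1).
    """
    signal = abs(math.log(mean_high / mean_low)) / 2.0
    noise = math.log(fold_sd)
    return 20.0 * math.log10(signal / noise)

# Hypothetical device: 100-fold on/off separation.
print(snr_db(mean_high=100.0, mean_low=1.0, fold_sd=10.0))  # noise matches signal
print(snr_db(mean_high=100.0, mean_low=1.0, fold_sd=2.0))   # tighter distribution
```

Under this assumed definition, a value near 0 dB means the cell-to-cell variation is as large as the separation between states, and positive values indicate states that are increasingly distinguishable.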
2015
Accurate Predictions of Genetic Circuit Behavior from Part Characterization and Modular Composition
Abstract
A long-standing goal of synthetic biology is to rapidly engineer new regulatory circuits from simpler devices. As circuit complexity grows, it becomes increasingly important to guide design with quantitative models, but previous efforts have been hindered by lack of predictive accuracy. To address this, we developed Empirical Quantitative Incremental Prediction (EQuIP), a new method for accurate prediction of genetic regulatory network behavior from detailed characterizations of their components. In EQuIP, precisely calibrated time-series and dosage-response assays are used to construct hybrid phenotypic/mechanistic models of regulatory processes. This hybrid method ensures that model parameters match observable phenomena using phenotypic formulation where current hypotheses about biological mechanisms do not agree closely with experimental observations. We demonstrate EQuIP’s precision at predicting distributions of cell behaviors for six transcriptional cascades and three feed-forward circuits in mammalian cells. Our cascade predictions have only 1.6-fold mean error over a 261-fold mean range of fluorescence variation, owing primarily to calibrated measurements and piecewise-linear models. Predictions for three feed-forward circuits had a 2.0-fold mean error on a 333-fold mean range, further demonstrating that EQuIP can scale to more complex systems. Such accurate predictions will foster reliable forward engineering of complex biological circuits from libraries of standardized devices.
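A "fold mean error" such as the 1.6-fold figure above can be computed in more than one way. The sketch below shows one plausible reading, assumed here rather than taken from the paper: each prediction's error is its ratio to the observation on a log scale, the absolute log errors are averaged, and the average is exponentiated back into a fold value. The function name and the numbers are hypothetical.

```python
import math
import statistics

def mean_fold_error(predicted, observed):
    """Geometric mean of per-point fold errors (always >= 1)."""
    log_errors = [abs(math.log(p / o)) for p, o in zip(predicted, observed)]
    return math.exp(statistics.fmean(log_errors))

# Hypothetical predicted vs. observed fluorescence values (illustration only).
predicted = [2.0, 50.0]
observed = [1.0, 100.0]
print(mean_fold_error(predicted, observed))  # each point is off by a factor of 2
```

Working on the log scale treats over- and under-prediction by the same factor symmetrically, which suits measurements that span several orders of magnitude.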
2015
Bridging the Gap: A Roadmap to Breaking the Biological Design Barrier
Abstract
This paper presents an analysis of an emerging bottleneck in organism engineering, and paths by which it may be overcome. Recent years have seen the development of a profusion of synthetic biology tools, largely falling into two categories: high-level “design” tools aimed at mapping from organism specifications to nucleic acid sequences implementing those specifications, and low-level “build and test” tools aimed at faster, cheaper, and more reliable fabrication of those sequences and assays of their behavior in engineered biological organisms. Between the two families, however, there is a major gap: we still largely lack the predictive models and component characterization data required to effectively determine which of the many possible candidate sequences considered in the design phase are the most likely to produce useful results when built and tested.
2015
Model-Driven Engineering of Gene Expression from RNA Replicons
Abstract
RNA replicons are an emerging platform for engineering synthetic biological systems. Replicons self-amplify, can provide persistent high-level expression of proteins even from a small initial dose, and, unlike DNA vectors, pose minimal risk of chromosomal integration. However, no quantitative model sufficient for engineering levels of protein expression from such replicon systems currently exists. Here, we aim to enable the engineering of multigene expression from more than one species of replicon by creating a computational model based on our experimental observations of the expression dynamics in single- and multi-replicon systems.
2015
Proposed Data Model for the Next Version of the Synthetic Biology Open Language
Abstract
While the first version of the Synthetic Biology Open Language (SBOL) has been adopted by several academic and commercial genetic design automation (GDA) software tools, it only covers a limited number of the requirements for a standardized exchange format for synthetic biology. In particular, SBOL Version 1.1 is capable of representing DNA components and their hierarchical composition via sequence annotations. This proposal revises SBOL Version 1.1, enabling the representation of a wider range of components with and without sequences, including RNA components, protein components, small molecules, and molecular complexes. It also introduces modules to instantiate groups of components on the basis of their shared function and assert molecular interactions between components. By increasing the range of structural and functional descriptions in SBOL and allowing for their composition, the proposed improvements enable SBOL to represent and facilitate the exchange of a broader class of genetic design.
2014
Precision Design of Expression from RNA Replicons
Abstract
RNA replicons are an emerging platform of increasing interest, particularly for vaccination and therapeutic applications [3]. A replicon is based on a virus, but replaces the infective capsid proteins with engineered “payload” genes [5]. Here we focus on replicons derived from alphavirus, a positive-strand RNA virus, with architecture and lifecycle shown in Figure 1: the replicon RNA begins with a complex of nonstructural proteins (NSPs) that create “viral factories” where it replicates [4]. A subgenomic promoter next induces production of shorter mRNAs containing engineered payload genes, which are translated to produce the proteins encoded by the payload sequences. Finally, both mRNA and proteins are removed by normal processes of dilution and decay.
2014
Abstract
The re-use of previously validated designs is critical to the evolution of synthetic biology from a research discipline to an engineering practice. Here we describe the Synthetic Biology Open Language (SBOL), a proposed data standard for exchanging designs within the synthetic biology community. SBOL represents synthetic biology designs in a community-driven, formalized format for exchange between software tools, research groups and commercial service providers. The SBOL Developers Group has implemented SBOL as an XML/RDF serialization and provides software libraries and specification documentation to help developers implement SBOL in their own software. We describe early successes, including a demonstration of the utility of SBOL for information exchange between several different software tools and repositories from both academic and industrial partners. As a community-driven standard, SBOL will be updated as synthetic biology evolves to provide specific capabilities for different aspects of the synthetic biology workflow.
2013
Functional synthesis of genetic regulatory networks
Abstract
As synthetic biologists improve their ability to engineer complex computations in living organisms, there is increasing interest in using programming languages to assist in the design and composition of biological constructs. In this paper, we argue that there is a natural fit between functional programming and genetic regulatory networks, exploring this connection in depth through the example of BioProto, a piggyback DSL on the Proto general-purpose spatial language. In particular, we present the first formalization of BioProto syntax and semantics, and compare these to the formal syntax and semantics of the parent language Proto. Finally, we examine the pragmatics of implementing BioProto and challenges to proving correctness of BioProto programs.
2013
How can AI help Synthetic Biology?
Abstract
Our primary goal in this talk is to draw the attention of the AI community to a novel and rich application domain, namely Synthetic Biology. Synthetic biology is the systematic design and engineering of biological systems. Synthetic organisms are currently designed at the DNA level, which limits the complexity of the systems. In our talk we will introduce the domain, describe the current workflow used by synthetic biologists, and demonstrate the feasibility of progress in this domain. Problems specific to each AI topic area will be highlighted.
2013
Synthetic Biology Open Language Visual: an ontological use case
Abstract
Synthetic Biology Open Language (SBOL) is a data exchange standard for the specification of forward engineered genetic designs (Galdzicki 2012). SBOL Visual is the graphical counterpart to SBOL, used to represent designs in a human readable manner. The central element in the SBOL data model is the DNA Component, which represents the design of a contiguous piece of DNA. DNA Components have an assigned functional role, generally referred to as ‘part type’ among synthetic biologists and which is analogous to the feature keys in annotated DNA sequences.
2013
Accurate Predictions of Genetic Circuit Behavior from Part Characterization and Modular Composition
Abstract
A long-standing goal of synthetic biology is to rapidly engineer new regulatory circuits from simpler regulatory elements [8, 16, 2, 7]. As the complexity of engineered circuits increases, it becomes increasingly important to utilize quantitative models to guide circuit construction effectively, but previous efforts have been hindered by lack of accuracy in predictions of circuit behavior [13, 10]. To address this shortcoming, we have developed Empirical Quantitative Incremental Prediction (EQuIP), a new method for accurate prediction of genetic regulatory network behavior. EQuIP predictions are based on a composable black-box model derived solely from empirical observations of steady state and dynamic behavior.
2013
Online Tools for Characterization, Design, and Debugging
Abstract
The engineering of biological systems can be greatly aided by better models, derived from high-quality characterization data, and by better means for designing and debugging new genetic circuits. Web-based tools and repositories have proven a successful approach to distributing such techniques, particularly because the centralization of infrastructure greatly decreases adoption cost for new users. Notable examples include the Parts Registry [8], the RBS calculator [10], GeneDesign [9], GenoCAD [4], BioFab [7], and JBEI ICE [6].
2013
Synthetic Biology Open Language Visual: An Open-Source Graphical Notation for Synthetic Biology
Abstract
The Synthetic Biology Open Language Visual (SBOL Visual) project is an effort toward developing a community-driven open standard for visual representation of genetic designs. Standardized visual notation for communicating designs has proven to be useful in many engineering disciplines. A de facto visual notation does exist in synthetic biology; however, it is incomplete, is often extended ad hoc, and exists as a poorly defined, voluntary, communal convention rather than an explicit standard. Because synthetic biology endeavors often require a multidisciplinary team, a common visual system of communication with well-defined semantics is vital.
2013
Recent Advances in the Synthetic Biology Open Language
Abstract
A significant concern in the synthetic biology community is the difficulty in reproducing results reported in the literature [5]. To address this problem, in 2008, a small group of researchers proposed the development of the synthetic biology open language (SBOL), an open-source standard for the exchange of genetic designs. In 2011, the first version of the SBOL core data model was released [2]. In 2013, the first version of a standard for visualization of genetic designs expressed in SBOL was also released [6]. Leveraging libSBOLj, a Java-based library for SBOL’s core data model, 18 software tools now support SBOL. While this represents excellent progress, there is still a lot of work to do.
2013
Synthetic Biology Open Language Visual (SBOL Visual), version 1.0.0
Abstract
The Synthetic Biology Open Language Visual (SBOL Visual) project is an effort to create an open-source graphical notation to support the description and specification of genetic designs. SBOL Visual is intended for use by biological engineers in forward engineering projects. It aims to encourage and support model-driven engineering by establishing a common set of symbols.
2012
An End-to-End Workflow for Engineering of Biological Networks from High-Level Specifications
Abstract
We present a workflow for the design and production of biological networks from high-level program specifications. The workflow is based on a sequence of intermediate models that incrementally translate high-level specifications into DNA samples that implement them. We identify algorithms for translating between adjacent models and implement them as a set of software tools, organized into a four-stage toolchain: Specification, Compilation, Part Assignment, and Assembly. The specification stage begins with a Boolean logic computation specified in the Proto programming language. The compilation stage uses a library of network motifs and cellular platforms, also specified in Proto, to transform the program into an optimized Abstract Genetic Regulatory Network (AGRN) that implements the programmed behavior.
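As a rough illustration of the idea of chaining intermediate models, the sketch below passes an artifact through four stages named after those in the abstract. It is not the workflow's actual software: the `make_stage` and `run_toolchain` helpers, the artifact dictionary, and the model labels are all hypothetical placeholders, and each stage only records that it ran.

```python
from typing import Callable, Dict, List, Tuple

Artifact = Dict[str, object]
Stage = Tuple[str, Callable[[Artifact], Artifact]]


def make_stage(name: str, output_model: str) -> Stage:
    """Build a placeholder stage (hypothetical) that only relabels the artifact."""
    def run(artifact: Artifact) -> Artifact:
        history = list(artifact["history"]) + [name]
        return {"model": output_model, "history": history}
    return (name, run)


# Stage names follow the abstract; the output labels are illustrative guesses.
STAGES: List[Stage] = [
    make_stage("Specification", "Boolean program"),
    make_stage("Compilation", "abstract genetic regulatory network"),
    make_stage("Part Assignment", "list of DNA parts"),
    make_stage("Assembly", "assembly plan"),
]


def run_toolchain(source: str, stages: List[Stage] = STAGES) -> Artifact:
    """Feed each stage's output model into the next stage, in order."""
    artifact: Artifact = {"model": source, "history": []}
    for _name, run in stages:
        artifact = run(artifact)
    return artifact


result = run_toolchain("placeholder specification")
print(result["history"])
```

The point of the structure is that each stage consumes only the model produced by the stage before it, so a stage can be replaced without touching the others.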
2012
Synthetic Biology Open Language (SBOL) Version 1.1.0
Abstract
In this BioBricks Foundation Request for Comments (BBF RFC), we specify the Synthetic Biology Open Language (SBOL) Version 1.1.0 to enable the electronic exchange of information describing DNA components used in synthetic biology. We define: (1) the vocabulary, a set of preferred terms, and (2) the core data model, a common computational representation.
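To make the split between a vocabulary and a data model concrete, here is a deliberately simplified sketch. It is not the SBOL schema: the class names `Component` and `Annotation`, their fields, and the terms in `PREFERRED_TERMS` are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical vocabulary: a small set of preferred terms for annotated features.
PREFERRED_TERMS = {"promoter", "rbs", "cds", "terminator"}


@dataclass
class Annotation:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    role: str   # must be one of PREFERRED_TERMS


@dataclass
class Component:
    name: str
    sequence: str
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, start: int, end: int, role: str) -> None:
        """Attach a feature, rejecting terms outside the vocabulary."""
        if role not in PREFERRED_TERMS:
            raise ValueError(f"unknown term: {role}")
        if not (1 <= start <= end <= len(self.sequence)):
            raise ValueError("annotation lies outside the sequence")
        self.annotations.append(Annotation(start, end, role))


demo = Component(name="demo", sequence="atgcatgcatgc")
demo.annotate(1, 4, "promoter")
```

Checking each annotation against a shared term list is what lets two tools that exchange such records agree on what a feature means.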
2012
Automated Selection of Synthetic Biology Parts for Genetic Regulatory Networks
Abstract
Raising the level of abstraction for synthetic biology design requires solving several challenging problems, including mapping abstract designs to DNA sequences. In this paper we present the first formalism and algorithms to address this problem. The key steps of this transformation are feature matching, signal matching, and part matching. Feature matching ensures that the mapping satisfies the regulatory relationships in the abstract design. Signal matching ensures that the expression levels of functional units are compatible. Finally, part matching finds a DNA part sequence that can implement the design. Our software tool MatchMaker implements these three steps.
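The first two steps can be sketched as constraint checks over a candidate assignment of abstract nodes to library entries. This is a brute-force illustration under invented assumptions, not MatchMaker's algorithm: the regulator names `R1` to `R3`, the `represses` relation, and the numeric levels are all hypothetical, and the final part matching step (choosing DNA sequences) is left out.

```python
from itertools import permutations

# Hypothetical library: which regulator can repress which, plus signal levels.
LIBRARY = ["R1", "R2", "R3"]
REPRESSES = {("R1", "R2"), ("R2", "R3")}
OUT_HIGH = {"R1": 10, "R2": 2, "R3": 8}      # highest output level produced
IN_THRESHOLD = {"R1": 5, "R2": 4, "R3": 3}   # input level needed to switch


def feature_matches(edges, assignment):
    """Every abstract regulatory edge must exist between the assigned entries."""
    return all((assignment[s], assignment[d]) in REPRESSES for s, d in edges)


def signal_matches(edges, assignment):
    """The upstream output must reach the downstream input threshold."""
    return all(OUT_HIGH[assignment[s]] >= IN_THRESHOLD[assignment[d]]
               for s, d in edges)


def assign(nodes, edges):
    """Return the first assignment passing both checks, or None."""
    for combo in permutations(LIBRARY, len(nodes)):
        assignment = dict(zip(nodes, combo))
        if feature_matches(edges, assignment) and signal_matches(edges, assignment):
            return assignment
    return None


two_node = assign(["A", "B"], [("A", "B")])
three_node = assign(["A", "B", "C"], [("A", "B"), ("B", "C")])
print(two_node, three_node)
```

With this toy data the two-node design can be assigned, while the three-node chain cannot: the only assignment that satisfies the regulatory relationships fails the signal check, because the middle regulator's output is too weak for the next stage.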
2012
Abstract
The TASBE (A Tool-Chain to Accelerate Synthetic Biological Engineering) project [2] developed a tool-chain (Figure 1) to design and build synthetic biology systems. These tools convert a circuit description written in a high-level language to an implementation in cells, assembled with laboratory robots. Each tool addresses a different sub-problem. This paper describes each tool and its key results.
2012
Toward Automated Design of Cell State Detectors
Abstract
There is a wide range of applications in which it would be useful to have a small synthetic biology circuit that could reliably classify cell state. For example, in [5], the authors propose cancer therapy based on a circuit that uses miRNA markers to test whether a cell belongs to a particular type of cancer and then kills only those cells. The authors then demonstrate a miRNA classifier that can distinguish between HeLa cells and several other cell lines. This same approach might be applied to therapeutics for many other diseases, as well as for high-precision assays that can monitor the cell-by-cell progress of a disease being studied, and for many other possible applications.
2012
A Method for Fast, High-Precision Characterization of Synthetic Biology Devices
Abstract
Engineering biological systems with predictable behavior is a foundational goal of synthetic biology. To accomplish this, it is important to accurately characterize the behavior of biological devices. Prior characterization efforts, however, have generally not yielded enough high-quality information to enable compositional design. In the TASBE (A Tool-Chain to Accelerate Synthetic Biological Engineering) project we have developed a new characterization technique capable of producing such data. This document describes the techniques we have developed, along with examples of their application, so that the techniques can be accurately used by others.
2011
Bridging Biology and Engineering Together with Spatial Computing
Abstract
Biological systems can often be viewed as spatial computers: space-filling collections of computational devices with strongly localized communication. Applying a continuous-space abstraction allows the behavior of such systems to be modeled or specified in terms of aggregate geometry and information flow. This can simplify both the engineering of biological systems and the application of biological models to the engineering of non-biological systems, as illustrated by examples from synthetic biology and morphogenetic engineering.
2011
Abstract
The field of synthetic biology promises to revolutionize our ability to engineer biological systems, providing important benefits for a variety of applications. Recent advances in DNA synthesis and automated DNA assembly technologies suggest that it is now possible to construct synthetic systems of significant complexity. However, while a variety of novel genetic devices and small engineered gene networks have been successfully demonstrated, the regulatory complexity of synthetic systems that have been reported recently has somewhat plateaued due to a variety of factors, including the complexity of biology itself and the lag in our ability to design and optimize sophisticated biological circuitry.
2011
High-Level Programming Languages for Bio-Molecular Systems
Abstract
In electronic computing, high-level languages hide many of the details, allowing non-experts and sometimes even children to program and create systems. High-level languages for bio-molecular systems aim to achieve a similar level of abstraction, so that a system might be designed on the basis of the behaviors that are desired, rather than the particulars of the genetic code that will be used to implement these behaviors. The drawback to this sort of high-level approach is that it generally means giving up control over some aspects of the system and having decreased efficiency relative to hand-tuned designs.
2011
TASBE: A Tool-Chain to Accelerate Synthetic Biological Engineering
Abstract
There is a pressing need for design automation tools for synthetic biology systems. Compared to electronic circuits, cellular information processing has more complex elementary components and a greater complexity of interactions among components. Moreover, chemical computation within a cell is strongly affected both by other computations simultaneously occurring in the cell and by the cell’s native metabolic processes and its external environment. This complexity implies an engineering work-flow that is currently highly iterative, error-prone, and extremely slow—critical problems that must all be addressed in order to realize the potential of synthetic biology.
2011
Abstract
Synthetic biology is an emerging field in which biologists modify or design the behavior of organisms to engineer systems that perform computation in diverse biological applications. Synthetic biologists design such a complex system by composing basic functional units—e.g., a promoter or a gene—into a regulatory network that exhibits the desired transcriptional behavior. As the desired behavior becomes more sophisticated, the size of the network grows, the complexity of the design becomes an impending concern [2], and its assembly and verification, an arduous task.
2008
Cells Are Plausible Targets for High-Level Spatial Languages
Abstract
High-level languages greatly increase the power of a programmer at the cost of programs that consume more resources than those written at a lower level of abstraction. This inefficiency is a major concern for the programming of biological systems: although advances in synthetic biology are beginning to allow bacteria to be programmed at an “assembly language” level, metabolic and chemical constraints currently place tight limits on the computational resources available. We find, however, that the semantics of the Proto spatial computing language appear to be a good match for engineered genetic regulatory networks, and particularly for describing the spatial differentiation necessary to construct tissues or organs.