Integrating Systems and Synthetic Biology to Understand and Engineer Microbiomes

Microbiomes are complex and ubiquitous networks of microorganisms whose seemingly limitless chemical transformations could be harnessed to benefit agriculture, medicine, and biotechnology. The spatial and temporal changes in microbiome composition and function are influenced by a multitude of molecular and ecological factors. This complexity yields both versatility and challenges in designing synthetic microbiomes and perturbing natural microbiomes in controlled, predictable ways. In this review, we describe factors that give rise to emergent spatial and temporal microbiome properties and the meta-omics and computational modeling tools that can be used to understand microbiomes at the cellular and system levels. We also describe strategies for designing and engineering microbiomes to enhance or build novel functions. Throughout the review, we discuss key knowledge and technology gaps for elucidating the networks and deciphering key control points for microbiome engineering and highlight examples where multiple omics and modeling approaches can be integrated to address these gaps.


INTRODUCTION
Diverse communities of microorganisms inhabit every known environment, including oceans, soil, the surface and proximity of plants, and the intestines of humans, animals, and insects. Owing to the myriad functions they perform, from biogeochemical cycling of nutrients to transforming dietary substrates into nutrients for multicellular hosts, microbiomes attract immense attention from industry and academic researchers alike. Efforts to understand and engineer microbiomes frequently require integrated approaches that blur the lines between microbiology, ecology, medicine, computer science, mathematics, and engineering. Natural and synthetic microbiomes that robustly perform target functions could be exploited to address grand challenges in human health, agriculture, bioremediation, and bioprocessing that face society.
Target microbiome engineering goals include the ability to predictably modulate community composition, enhance existing functions, or install novel capabilities. Harnessing the properties of microbiomes remains difficult because we do not yet fully understand the molecular and ecological mechanisms that govern systems-level behaviors, and therefore we lack the capability to predict their multifunctional properties. Microbiomes are immensely complex; they can consist of hundreds to thousands of organisms, exhibit spatial and temporal variability, and establish dynamic feedback loops with the environment. A detailed and quantitative understanding of microbiomes could ultimately inform the design of interventions to predictably modify system properties or guiding principles for how to construct desired community functions from the bottom up. Exploiting and understanding the full functional potential of microbiomes necessitate integration of multiple experimental and computational methodologies that bridge many different disciplines.
Here, we describe tools currently used to understand microbiomes in diverse habitats and outline how meta-omics tools may be used and integrated to characterize microbiome composition and function. We describe representative case studies that showcase the various spatial and temporal scales that influence the composition and collective function of microbiomes, and we describe how interactions between microorganisms can lead to emergent functions that cannot be predicted on the basis of each community member's behaviors in isolation. We discuss the relative advantages of several microbiome modeling approaches of varying degrees of coarse-graining and highlight recent efforts to integrate multi-omics data and multiscale considerations into a single model. Finally, we present recent successes in microbiome engineering and biocontainment of engineered communities and provide perspectives on how to move the field closer to wide-scale deployment of engineered communities for broad applications in bioenergy, agriculture, and human health. Throughout the review, we highlight opportunities to improve our understanding of the causal links between microbiome composition and function and our ability to engineer them for societal benefit.
as the gut-brain axis (9)(10)(11), although work remains in parsing the causality of these observed associations (12). Evolutionary theory informs the hypothesis that although many host-associated microbes are symbiotic, they are also under strong selective pressure to compete with other community members even at the cost of their beneficial functions to the host (13,14).

Microbial Interactions Determine Microbiome Behavior and Stability
Microbial interactions are multidimensional, and diverse mechanisms can combine to influence the net effect of one organism on another. At a coarse-grained level, the net effect of an organism on another can be represented as competitive (−/−), mutualistic (+/+), predatory (+/−), neutral (0/0), amensal (0/−), or commensal (0/+). Positive and negative interactions can be established by diverse molecular mechanisms, including resource competition and the release of molecules that affect growth rates and metabolic activities (15). Microbial interactions change as a function of time due to intracellular networks that sense and respond to environmental shifts. Nevertheless, simplified mathematical models that represent single-organism growth parameters and pairwise microbial interactions have been shown to predict multispecies community behaviors (16)(17)(18). How these microbial interaction networks map to community-level properties such as stability, diversity, and function remains largely unresolved. Theoretical work has demonstrated that negative interactions can drive networks toward stability (19), whereas positive interactions have been associated with enhanced diversity and metabolic activities (20,21). To gain a predictive understanding of how microbial interaction networks are shaped by environmental stimuli and combine to generate community-level properties, we must consider the molecular mechanisms of interactions as well as how they change as a function of time and spatial proximity.
Stability: rate of return to a system's steady state in response to an environmental perturbation
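To make the coarse-grained picture above concrete, the following is a minimal sketch of one common family of simplified models, the generalized Lotka-Volterra (gLV) equations, in which each organism has a single growth parameter and each pair has two interaction coefficients. The choice of gLV, the function names, and all parameter values are illustrative assumptions and are not taken from the studies cited above.

```python
# Generalized Lotka-Volterra (gLV) sketch:
#   dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)
# r_i is the monoculture growth parameter; A[i][j] is the effect of j on i.
# All numbers below are hypothetical, chosen only for illustration.


def classify_interaction(a_ij, a_ji, tol=1e-9):
    """Label a pair of organisms from the signs of their two coefficients."""
    def sign(value):
        if abs(value) < tol:
            return 0
        return 1 if value > 0 else -1

    labels = {
        (-1, -1): "competitive",
        (1, 1): "mutualistic",
        (1, -1): "predatory",
        (-1, 1): "predatory",
        (0, 0): "neutral",
        (0, -1): "amensal",
        (-1, 0): "amensal",
        (0, 1): "commensal",
        (1, 0): "commensal",
    }
    return labels[(sign(a_ij), sign(a_ji))]


def glv_derivative(x, r, A):
    """Right-hand side of the gLV equations for abundances x."""
    n = len(x)
    return [x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n))) for i in range(n)]


def simulate_glv(x0, r, A, dt=0.01, steps=5000):
    """Integrate the gLV equations with a fixed-step fourth-order Runge-Kutta scheme."""
    x = list(x0)
    for _ in range(steps):
        k1 = glv_derivative(x, r, A)
        k2 = glv_derivative([xi + 0.5 * dt * k for xi, k in zip(x, k1)], r, A)
        k3 = glv_derivative([xi + 0.5 * dt * k for xi, k in zip(x, k2)], r, A)
        k4 = glv_derivative([xi + dt * k for xi, k in zip(x, k3)], r, A)
        # Clamp at zero so numerical error cannot produce negative abundances.
        x = [
            max(0.0, xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d))
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
        ]
    return x


# Hypothetical two-member community with mutual inhibition.
growth = [1.0, 0.8]
interactions = [[-1.0, -0.5],
                [-0.4, -1.0]]
print(classify_interaction(interactions[0][1], interactions[1][0]))
print(simulate_glv([0.1, 0.1], growth, interactions))
```

With these assumed parameters the two populations settle near the coexistence point obtained by solving r + Ax = 0, which here is approximately 0.75 and 0.50. A fixed pairwise matrix of this kind does not capture interactions that change with time or spatial proximity.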

Spatial Organization and Biofilms Are Key Properties of Microbiome Resilience
Microbial populations exhibit heterogeneous spatial distributions in natural environments. The spatial organization of a microbiome is a major determinant of systems-level properties, including community metabolism and the response to environmental perturbations such as antibiotics (22)(23)(24). In biofilms, a predominant form of microbial life in nature (25), the occurrence of intracolony channels can mediate the transfer of molecules to cells that are deeply embedded in the matrix (26). Although biofilms can be exploited for specific applications such as lignocellulose valorization (27) and environmental remediation (28), they can be detrimental in the context of human disease (29). Beyond biofilms, microbes can also self-organize to establish complex physical structures in the environment. For example, fungi play an integral role in maintaining the spatial structure of the soil matrix, and soil bacteria can use fungal hyphae to traverse pores in the soil that would otherwise prohibit dispersal (30)(31)(32). Therefore, the spatial organization of a microbiome can be harnessed or targeted to alter system properties.
At a smaller scale, membrane proteins such as receptors and transporters are key drivers of microbiome function and are involved in cellular processes as diverse as sensing and uptake of nutrients, secretion of enzymes and small molecules, virulence, motility, and adhesion. For example, bacterial two-component systems are involved in quorum sensing (33) and chemotaxis (34), and substrate-specific transporter proteins are central to metabolic cross-feeding and flux through biochemical pathways (35). These proteins enable microbial constituents to respond to changes as they enter and exit habitats to which they are well (or ill) adapted, all while maintaining a favorable intracellular environment for metabolism, and they can be engineered to influence substrate utilization, production of target molecules, and formation of biofilms (36)(37)(38)(39) (see the sidebar titled Membrane Proteins).

MEMBRANE PROTEINS
Membrane proteins are at the core of how microorganisms interact with each other and with their environment. All cells navigate their environment by using membrane-embedded receptor proteins to sense and respond to changes in their environment. Typical receptor proteins, such as bacterial two-component systems, have a domain that is exposed to the extracellular environment and that recognizes a ligand with high specificity. Ligand binding induces a conformational shift in the receptor protein that leads to an intracellular response mediated, for example, by enzymatic cascades or by protein-DNA interactions. The extracellular ligands comprise a tremendous variety of ions and molecules (215). Motor proteins allow cells to swim toward nutrients and away from toxins; adhesion proteins facilitate cell-cell connections and adhesion to the environment. Meanwhile, a large variety of transporter proteins enable the efficient and highly selective uptake of nutrients as well as secretion of waste products, enzymes, and effector molecules. About one-quarter of the genes in any typical organism appear to encode integral membrane proteins that are involved in cell development and growth, energy conversions, communication, mobility, defense, and virulence (216,217).

Investigating and Designing Microbiomes from the Bottom Up and Top Down
Microbiomes can be studied and engineered through two different and complementary approaches: top down and bottom up (15,40,41). A top-down approach investigates natural communities by introducing them into highly controlled laboratory environments. Such top-down manipulations can be used to understand microbiome dynamics and functions in response to environmental inputs (e.g., nutrient availability or antibiotic stress) (42). By contrast, isolated species can also be assembled in vitro to form synthetic microbial communities, which have reduced complexity compared with natural microbiomes and greater controllability via manipulation of initial community composition (16,43). Although molecular and ecological mechanisms of synthetic communities can be more easily dissected, these simplified systems can display reduced temporal stability in composition and/or function, limiting their deployment in real-world environments and for biotechnological applications (41). At the core, if we understand the temporal changes in "which microbe is there" and "which microbe can do what, when, and how," we become better equipped to tailor microbiomes for, say, medical and agricultural purposes. A promising approach is to combine ecological studies, quantitative measurements, and computational modeling to map the functional potential of microbiomes with increasing resolution (44).
Synthetic microbial community: combination of three or more individual microorganisms (often not isolated from the same habitat) in a single environment

TOOLS FOR UNDERSTANDING MICROBIOMES
To harness the properties of microbiomes, we must develop tools that decipher which microbes have the capability and flexibility to perform specific functions, quantify their functional activities across space and time, and decipher interactions between organisms and between organisms and the environment. High-throughput sequencing has significantly enhanced our ability to investigate microbiome composition and functional activities, as today's next-generation and emerging so-called third-generation sequencing tools (45) can rapidly process billions of DNA base pairs with continuous read lengths greater than 100 kbp (46) (>2% of the average bacterial genome size). These technologies enable characterization of phylogeny (amplicon sequencing), functional potential (shotgun metagenomics), and gene expression (metatranscriptomics) in thousands of species or synthetic communities simultaneously. Beyond nucleic acid sequencing, metaproteomics and metabolomics can analyze the activities of microbiomes by quantifying the abundance of enzymes that perform key chemical transformations and the metabolites that mediate interspecies interactions (Figure 2). Such meta-omics tools can be applied to quantify the spatial distribution of organisms within microbiomes, characterize low-abundance members and assess cellular heterogeneity, identify the organisms that perform key chemical transformations, and elucidate the web of metabolic interactions that ultimately drives microbiome functions.

Quantifying Microbiome Composition and Functional Potential via Metagenomics
Quantitative measurements of microbial abundance are critical for understanding the spatial and temporal behaviors of microbiomes. Microbiome composition is frequently determined via highly conserved marker genes for ribosomal RNA (rRNA), usually the 16S rRNA gene in prokaryotes and the 18S, 28S, and internal transcribed spacer regions in eukaryotes (47). Amplicon sequencing is particularly useful for characterizing community composition in systems contaminated by host DNA or in samples with low DNA template concentrations. However, these methods typically provide only genus-level resolution, and sequence-dependent variations in (nominally) universal primer affinity between clades, along with differences in DNA extraction efficiency and variation in gene copy numbers (48,49), can bias abundance results.
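One of the biases named above, variation in marker gene copy number, can be illustrated with a short calculation. The sketch below divides each taxon's read count by an assumed number of 16S rRNA gene copies per genome and renormalizes. The function name, taxon labels, and copy numbers are hypothetical, and the correction does not address primer affinity or extraction efficiency.

```python
def copy_number_corrected_fractions(read_counts, copies_per_genome):
    """Convert amplicon read counts to genome-equivalent relative abundances.

    read_counts: dict mapping taxon -> number of marker gene reads
    copies_per_genome: dict mapping taxon -> assumed marker gene copies per genome
    """
    corrected = {
        taxon: read_counts[taxon] / copies_per_genome[taxon]
        for taxon in read_counts
    }
    total = sum(corrected.values())
    if total <= 0:
        raise ValueError("at least one taxon must have a positive read count")
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical example: taxon_a carries 7 gene copies, taxon_b carries 3.
reads = {"taxon_a": 700, "taxon_b": 300}
copies = {"taxon_a": 7, "taxon_b": 3}
print(copy_number_corrected_fractions(reads, copies))
```

In this made-up case the raw reads suggest a 70:30 split, whereas the copy-number-corrected estimate is an even split, because each taxon contributes 100 genome equivalents. In practice the copy numbers themselves are often uncertain for uncultivated clades.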
In shotgun metagenomics, a library is constructed with all community DNA and the reads can be assembled into genomes of individual species called metagenome-assembled genomes (MAGs). These genome sequences provide simultaneous quantification of the microbiome's functional potential and its phylogenetic composition. Shotgun metagenomics has several advantages, including finer phylogenetic resolution than amplicon-based sequencing down to the strain level (50)(51)(52)(53) and detection of viral DNA, with the trade-off that more reads are needed to confidently quantify more genes. To probe rare microbes with potentially unique functions, researchers have leveraged DNA extraction methodologies targeted for different species to assemble near-complete genomes of bacteria present below 1% relative abundance within a community by employing differential coverage binning (54). In addition, the variations in genome copy number in different regions of the chromosome have been used to infer the bacterial replication rates in natural environments (55), providing key insights into the distribution of metabolic activity states within a community.
Differential coverage binning: extraction of nucleic acid samples by multiple techniques to improve resolution of species and strains
Sequencing-based methods provide relative abundance or compositional data, which presents challenges for statistical analyses and can lead to spurious correlations. Therefore, methods to quantify absolute abundance are critical for understanding microbiomes, including for determining correlations between organisms, growth rates, per-cell metabolic activities, and total microbial loads present in a host. Absolute DNA-based quantification of microbiome composition is a major challenge that has been approached with spike-in (56), quantitative polymerase chain reaction (PCR) (57), flow cytometry (58), and total DNA quantification (59) methods; however, all of these methods have inherent biases and limitations. Recently, amplicon sequencing was coupled with digital PCR in a microfluidic format for absolute DNA-based quantification of microbiome composition, with the advantage of concurrently evaluating the limits of both clade detection and clade quantification (60).
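The difference between relative and absolute abundance can be shown with a small worked example. The sketch below scales relative abundances by an independently measured total microbial load (for instance, a cell count), which is one simple way to combine the two kinds of measurement. The function name and all numbers are hypothetical, and the total load measurement would carry its own biases.

```python
def to_absolute(relative_abundance, total_load):
    """Scale relative abundances (fractions summing to 1) by a total load.

    total_load is an independently measured quantity, e.g. cells per gram,
    and is treated here as exact for the purpose of illustration.
    """
    return {taxon: fraction * total_load
            for taxon, fraction in relative_abundance.items()}


# Hypothetical samples: taxon A stays at 100 cells while taxon B grows.
sample_1 = {"A": 0.5, "B": 0.5}   # total load 200 cells
sample_2 = {"A": 0.1, "B": 0.9}   # total load 1000 cells

print(to_absolute(sample_1, 200))
print(to_absolute(sample_2, 1000))
```

From the relative data alone, taxon A appears to fall fivefold between the two samples, which would suggest a negative association with taxon B. After scaling by total load, taxon A is unchanged at 100 cells in both samples, so the apparent decline is an artifact of compositionality.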
Researchers use sequencing methods to study the spatial distribution of clades within a microbiome by sampling from different locations. This approach has been used to map the biogeography of the mammalian gut (61) and to demonstrate how anaerobic digestion communities self-assemble into distinct microbiomes when reactors are connected in series (62). Although the spatial resolution that can be achieved by sampling different locations is limited (~20 μm), micron- and nanometer-level spatial variation can be elucidated by imaging approaches (see the sidebar titled Microbiome Imaging). To uncover how spatial clustering of clades influences community functions and interactions, researchers must elucidate both the spatial distribution of functions and the identity of organisms.

MICROBIOME IMAGING
Fluorescence in situ hybridization (FISH) and mass spectrometry imaging (MSI) techniques may be used to visualize localization of phylogenetic groups and specific molecules and isotopes, respectively. In FISH, a fluorescent reporter is attached to a probe with a nucleic acid sequence complementary to that of a target sequence in the microbiome, often for ribosomal RNA. The specificity of the FISH probe can vary, from targeting an entire kingdom to targeting a single genus or species, and if multiple probes are used for one sample, a different fluorescent reporter can be attached to each probe for simultaneous resolution of numerous microbial clades at the micron scale (218,219).
In MSI, the spatial organization of both microbes and small molecules within a microbiome can be quantified over nanometer lengths, as reviewed in Reference 220. When integrated with FISH (221) or stable isotope probing (SIP) (222; described below), nanoscale secondary ion mass spectrometry (NanoSIMS) can map the location of both microbes and specific metabolites. In SIP-NanoSIMS, the fate of specific moieties derived from an isotope-labeled substrate can be quantified with nanometer resolution and linked to microbial identity, enabling quantification of localized functional activity within microbiomes.
In vivo imaging of microbiomes in systems such as the human gut promises to improve our understanding of host-microbiota interactions and how these vary in space and time. However, in the gut and other anaerobic systems, such as tumor interiors, microbe labeling with fluorescent proteins is difficult, as most fluorescent proteins require oxygen to develop their chromophores, and anaerobic fluorescent proteins are typically not bright enough to be deployed in vivo. Researchers have overcome these limitations by fluorescently labeling selected gut bacteria and their polysaccharides with bio-orthogonal click chemistry (222a) and by growing and imaging bacterial communities in low oxygen concentrations in which obligate anaerobes survive but chromophores still form (222b). See References 104 and 222c for reviews and perspectives on live-cell anaerobic imaging and related in vivo applications.
By evaluating putative metabolic pathways, as well as which metabolites may be taken up or secreted by certain microbes, researchers can use the information in MAGs to predict the chemical transformations the system is capable of performing. For example, lignocellulose-degrading enzymes in the porcupine gut microbiome were identified via shotgun metagenomics and expressed in Escherichia coli, leading to the discovery of an active endo-1,4-β-xylanase even though the microbe encoding this gene was unknown (63). Accurate predictions of functional potential rely heavily on reference genomes and metagenomics datasets, available in databases such as those outlined in Table 1.
Our ability to collect metagenomics data has outpaced our ability to functionally annotate the data for interpretation of biological context (i.e., which microbe has the capability to do what in the microbiome). In genetically tractable organisms, a powerful approach to functionally annotate genes involves quantification of strain fitness within pooled genomewide mutant libraries that can be grown in monoculture or coculture (64) across many different environmental conditions (65). However, because most strains lack genetic tools, bioinformatic approaches, including the Integrated Gut Genomes Database and its associated IGGsearch tool (66) and the Distilled and Refined Annotation of Metabolism (DRAM) tool (67), can be used to analyze large sequencing datasets containing potentially thousands of interacting species.

Using Metatranscriptomics to Map Organism Identities to Functional Activities
Metatranscriptomics, by which total community messenger RNA (mRNA) is extracted, reverse transcribed to complementary DNA, and sequenced, provides insight into the potential functions performed by organisms within a community as well as which microbes may be performing them. For example, metagenomics and metatranscriptomics were combined to identify sugar-fermenting and fatty-acid-chain-elongating microbes in an anaerobic bioreactor and to propose routes of metabolite exchange between the clades (42).
By quantifying nitrogen fixation transcripts, Gómez-Godínez et al. (68) reported that Azospirillum brasilense was the predominant nitrogen-fixing bacterium in a synthetic consortium of plant growth-promoting bacteria on maize roots. Although metatranscriptomics also generates many unannotated hypothetical genes, the ability to identify genes that change across different conditions greatly facilitates the downstream identification of genes involved in microbiome processes, including the breakdown of lignocellulose or the biogeochemical cycling of elements. Further, investigating the genomewide transcriptional activity of organisms within a community may guide the development of hypotheses about mechanisms involved in observed microbiome states (e.g., dysbiosis and steady-state recovery after perturbation). For example, a combined metagenomics and metatranscriptomics study of the fecal microbiome of 308 adult men revealed that pathways that were encoded in the genomes of many members of the microbiome were actually transcribed by a small subset of species (69).
Notably, even when reference genomic data are lacking, metatranscriptomics can be used to mine microbiomes for enzymes with a desired function. A transcriptomics survey of anaerobic gut fungi harvested from the intestinal tract of herbivores revealed that these unusual and understudied eukaryotes produce an unrivaled array of biomass-degrading enzymes (70), marking these fungi as attractive targets for sourcing of valuable enzymes (71). Similarly, He et al. (72) identified 125,252 putative CAZymes in a sheep gut microbiome, most of which had less than 75% identity to known proteins in the CAZy database or the National Center for Biotechnology Information (NCBI) database, but 19 of the 30 tested candidates showed cellulase activity when heterologously expressed.

CAZyme: carbohydrate-active enzyme that degrades biomass into smaller, fermentable sugars
In microbes, mRNA transcripts represent less than 10% of the total RNA; therefore, rRNA should typically be removed prior to sequencing. Methods for prokaryotic rRNA depletion vary in efficacy on the basis of microbiome type (biofilm versus planktonic) and composition, and tool development is an active area of research (73). Methods for single-cell prokaryotic rRNA depletion are only beginning to show some success (74,75) and will be useful for interrogating the unique activities of low-abundance organisms as well as the cell-to-cell heterogeneity of gene expression within a given species that is not observable with bulk methods. The identification of optimal spatial locations and time points to discover ecological driver organisms or novel biochemical pathways mediating microbiome functions remains unresolved. Further, because transcript number does not always correlate with protein abundance or activity, it is difficult to estimate from metatranscriptomics data alone the relative contribution of different metabolic reactions and pathways to the overall function of the microbiome.

Quantifying Microbiome Functional Capabilities via Metaproteomics
Metaproteomics, which studies all proteins recovered from a microbiome sample, can provide critical information about microbiome functional capabilities. Liquid chromatography can be coupled with mass spectrometry (LC-MS) or with tandem mass spectrometry (LC-MS/MS) to detect tens of thousands of peptides in one sample (76,77). In the future, nanopore-based devices hold promise to revolutionize proteomics and biotechnology by enabling amino acid sequencing of intact proteins, allowing structural characterization of proteins larger than those characterized by LC-MS (78). In addition to differential enzyme expression, metaproteomics may be employed to quantify the abundance of individual organisms in a microbiome on a biomass basis (79), which offers an alternative method for determining microbiome composition. Indeed, proteinaceous biomass may be a better representation of composition for systems composed of eukaryotic and prokaryotic cells where the size and weight of organisms vary significantly, but it may be more biased in cases where the intracellular protein content varies widely across species.
Beyond prospecting for genes with predicted functions, metaproteomics can be used to identify posttranslational modifications as well as directly quantify the abundance of proteins in a community, which may not necessarily correlate with transcript abundances (80). This approach is greatly improved by integration with metagenomics analysis owing to the difficulties of mapping fragmented peptide sequences to genes. In particular, mining microbiomes for membrane proteins such as transporters is critical for elucidating molecular mechanisms involved in interspecies interactions but remains challenging owing to their low abundance compared with soluble proteins and technically and analytically challenging owing to their hydrophobicity (81).

Metabolomics Reveals the Chemical Repertoire of Microbiomes
Microbes are exquisite chemists, and the chemical mediators produced and utilized by constituent community members are a major driving force of microbiome functions. Metabolomics can be used to detect small-molecule metabolites, including intermediates and end products of cellular metabolism. Integration of metabolomics with metagenomics is particularly useful for formulating hypotheses about the role of measured metabolites in interspecies interactions and microbiome functions, as metabolites typically cannot be assigned to specific organisms. Therefore, the integration of other meta-omics tools is necessary to determine which metabolic pathways are active in a community and to hypothesize how different metabolites may be utilized, released, and exchanged to form an integrated community metabolic network.
Many metabolomics approaches employ gas chromatography (GC) instead of LC to precede MS analysis, offering greater chromatographic separation of metabolites. Nuclear magnetic resonance spectroscopy offers an alternative, more quantitative measure of metabolites without the sample preparation and derivatization steps required in MS studies but typically cannot detect metabolites below micromolar concentration (82). Untargeted metabolomics seeks to characterize the structures of as many of the metabolites present in a sample as possible. However, it is impossible to characterize all classes of metabolites with a single solvent and column chemistry, and many metabolites in databases remain unannotated (83). Therefore, strategies must be developed to predict unknown chemical structures and link them to microbes and biosynthetic pathways. Recently, researchers (84) developed the Pickaxe tool for generating novel metabolites and predicting the associated enzymes and putative pathways on the basis of Enzyme Commission numbers and the MINE (Metabolic In Silico Network Expansion) database (Table 1).
Enzyme Commission number: a numerical classification of a metabolic enzyme based on the chemical reaction it performs

Investigating Metabolic Flux in Microbial Communities
Metabolic flux analysis (MFA) can quantify the distribution (flux) of carbon in cellular metabolism, directly measuring the activity of metabolic networks. When this method is used, cells are typically exposed to 13C-labeled carbon and the degree of labeling of biomass components like glycogen and proteins is quantified via GC-MS (85). On the basis of these data, researchers use an organism-specific metabolic model to estimate the flux through each pathway by using software such as METRAN. Although MFA has been used to study well-characterized and simplified communities (86), the challenge associated with assigning metabolites to microbes in complex communities has stymied the broader application of MFA. However, analysis of isotope-labeled peptides, which can be mapped to individual microbes with reference genomes by metaproteomics, may unlock MFA for microbiomes (87). Nevertheless, this method may perform best on communities composed of microbes with dissimilar metabolisms or of microbes that can be spatially separated. To quantify interspecies metabolic interactions and community-wide fluxes, researchers must develop metafluxomic protocols and software to translate isotopic labeling and multi-omics data to quantitative descriptions of community metabolic networks (88).
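The "degree of labeling" mentioned above is commonly summarized from a mass isotopomer distribution (MID), that is, the relative intensities of the M+0, M+1, ..., M+n peaks of a fragment containing n carbon atoms. The sketch below computes the mean fractional 13C enrichment from such a distribution. It assumes the intensities have already been corrected for natural isotope abundance, and the function name and example values are hypothetical. Estimating fluxes from these values requires fitting a metabolic model, which is not shown here.

```python
def fractional_enrichment(mid_intensities):
    """Mean fraction of labeled carbon atoms in a fragment.

    mid_intensities: list of intensities [M+0, M+1, ..., M+n] for a fragment
    with n carbon atoms, assumed to be corrected for natural isotope abundance.
    Returns sum(i * m_i) / (n * sum(m_i)).
    """
    n_carbons = len(mid_intensities) - 1
    if n_carbons < 1:
        raise ValueError("need at least the M+0 and M+1 intensities")
    total = sum(mid_intensities)
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    labeled = sum(i * m for i, m in enumerate(mid_intensities))
    return labeled / (n_carbons * total)


# Hypothetical three-carbon fragment: half unlabeled, half fully labeled.
print(fractional_enrichment([0.5, 0.0, 0.0, 0.5]))
```

For the made-up distribution shown, half of the molecules carry three labeled carbons and half carry none, so the mean enrichment is 0.5.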

Stable Isotope Probing in Microbiomes
Stable isotope probing (SIP) has emerged as a promising technique to enrich for rare microbes, link microbe identities to functions, and investigate interaction networks within microbiomes. In DNA-SIP or RNA-SIP, isotope-labeled substrates are differentially taken up and incorporated into nucleic acids by community members. After extraction, nucleic acids are fractionated by density to simultaneously enrich for nucleic acids from rare microbes and link the affinity for the labeled substrate to microbe identity (89,90). For example, DNA-SIP was used to curate a complete genome of a member of the phylum Saccharibacteria with less than 1× coverage in the bulk metagenome and to decipher the metabolite exchange networks within the Saccharibacteria's surrounding community (91).
A major limitation is that the SIP culturing procedure may not mimic a microbiome's natural microenvironment. Although most microbes remain uncultivated, meta-omics analyses may elucidate clues for isolating and culturing previously uncharacterized species (66). Nucleic acid-SIP requires that isotopes be incorporated directly into nucleic acids, and the incubation time with the isotope influences the degree of community labeling. For example, short incubation times exclude isotope uptake by slow-growing microbes, and long incubation times may lead to nonspecific cross-feeding of isotopes across the community, skewing which clades initially metabolized the substrate (92,93).
In principle, SIP can also be used to quantify incorporation of labels into proteins (protein-SIP) (94), metabolites (metabolome-SIP), and phospholipid-derived fatty acids (PLFA-SIP), though linking these to microbe identity requires excellent reference genomes or concurrent metagenomics and/or metatranscriptomics analyses. When integrated with meta-omics analyses, SIP may link microbe identity to function and uncover interspecies metabolite exchange mechanisms and community-wide metabolic networks. For example, protein-SIP coupled with amplicon sequencing and shotgun metagenomics was used to map acetate metabolism to organism identity in anaerobic digester consortia (95). In addition, RNA-SIP, metagenomics, and metatranscriptomics have been integrated to characterize the predominant CO2 fixation pathways and their transcribing microbes in deep-sea hydrothermal vent microbiomes at a range of temperatures (92). Finally, DNA-SIP has been combined with differential coverage binning to enhance resolution of MAGs with specific activity in anaerobic digesters (96). See Reference 97 for a review of SIP applied to in vivo and ex vivo human and animal gut systems.

Microfluidic Devices for Microbiome Fractionation, Analysis, and Cultivation
Microfluidic devices have provided major advances in cell culturing and analysis by enabling micron-level spatial precision, temporal control of environmental inputs, and ultrahigh-throughput analysis of cellular and molecular systems. For example, physical separation of microbes within a microfluidic device can be used to sort clades on the basis of function for single-cell analysis or subsequent cultivation. Using droplet microfluidics, Schaerli & Hollfelder (98) generated picoliter to nanoliter aqueous droplets at kilohertz rates to create millions of compartments to study microbial behaviors. When coupled with DNA barcoding, this approach has enabled processing of more than 50,000 cells per run to sequence and assemble genomes from single cells (99). Microfluidics-enabled sorting of cells with desired genomic traits or metabolic activities is useful for screening strains that display greater enzyme productivity while consuming a millionfold fewer reagents (100).
In addition to single-cell screening, droplet microfluidics has enabled massively parallelized, compartmentalized culturing of human-associated intestinal organisms to enrich for low-fitness community members that would otherwise be outcompeted in a well-mixed culture (101). In addition, droplet microfluidics has been used to decipher pairwise and higher-order interactions in microbial communities across different environmental conditions (102). Coupling droplet microfluidics to a microwell array platform, by way of droplet dye barcoding and fluorescent reporters for strain abundance or metabolic activities, was used to construct and analyze approximately 100,000 subcommunities composed of 19 soil microbes (103). Although these methods enable the characterization of many subcommunities, they have relied largely on fluorescence labeling of specific strains, which is restricted to organisms that can be manipulated with genetic tools. Therefore, label-free methods are needed to more broadly apply these techniques to study diverse communities, including anaerobic systems (104).
Lower-throughput but higher-precision microfluidic devices have been used to investigate the effects of spatial separation on the dynamics of bacterial signaling information transmission and metabolic cross-feeding (105). Microfluidic platforms have also been developed to study host-microbiome interactions (106). The intestine-on-a-chip can recapitulate physiologically relevant microenvironments such as oxygen gradients found within the human intestine as well as mechanical forces reflecting peristalsis (107). Although these techniques can capture critical features of natural microenvironments and enable detailed control of the environment, a major limitation is the complexity of platform design, which is a barrier to widespread adoption by and deployment to end users.

COMPUTATIONAL MODELS TO PREDICT MICROBIOME DYNAMICS AND FUNCTIONS
Mathematical models, based on ecological, thermodynamic, and biochemical principles, can be used to simulate microbiome population dynamics and metabolic functions on many length scales and timescales. These models range from data-driven differential equation-based models of community composition and interactions to mechanistic genome-scale models of metabolic flux and interspecies metabolite exchange (Figure 3). In the sections below, we discuss relative advantages and limitations of several modeling approaches and how they may be used to enable microbiome engineering.

Ordinary Differential Equation and Evolutionary Game Theory Models of Microbiome Population Dynamics and Interactions
Ordinary differential equation (ODE) models, such as the generalized Lotka-Volterra (gLV) model (16), have been used to model microbiome population dynamics with time series data of absolute organism abundance for parameter estimation and experimental validation (108,109). In the gLV model, the temporal changes in the abundance of each species are a function of its growth rate and intraspecies and interspecies interactions (110). Modified gLV models have been used to analyze dynamic behaviors, including the response to perturbations such as dilution rate and the response to antibiotics and temperature fluctuations (111,112). In addition, the inferred parameters of the gLV model can be visualized as an interaction network to examine the distribution of negative and positive interactions and to identify ecological driver species (16). The gLV model could be used to design community cultivation strategies to achieve desired community compositions and stability properties.
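As an illustration of the gLV structure described above, the following minimal sketch integrates a two-species gLV system, dx_i/dt = x_i (r_i + sum_j a_ij x_j), with a hand-written fourth-order Runge-Kutta scheme. The growth rates `r`, the interaction matrix `A`, the initial abundances, and the function names are hypothetical values chosen for illustration; they are not parameters inferred in any of the cited studies.

```python
def glv_rhs(x, r, A):
    """gLV right-hand side: dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)."""
    n = len(x)
    return [x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n))) for i in range(n)]


def simulate_glv(x0, r, A, dt=0.01, steps=5000):
    """Integrate the gLV equations with classical RK4; returns the full trajectory."""
    x = list(x0)
    traj = [list(x)]
    for _ in range(steps):
        k1 = glv_rhs(x, r, A)
        k2 = glv_rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)], r, A)
        k3 = glv_rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)], r, A)
        k4 = glv_rhs([xi + dt * k for xi, k in zip(x, k3)], r, A)
        x = [xi + dt * (a + 2 * b + 2 * c + d) / 6.0
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        traj.append(list(x))
    return traj


# Hypothetical two-species community with mutual competition.
r = [1.0, 0.8]            # intrinsic growth rates (assumed)
A = [[-1.0, -0.5],        # A[i][j]: effect of species j on species i (assumed)
     [-0.4, -1.0]]
traj = simulate_glv([0.1, 0.1], r, A)
final = traj[-1]          # abundances at the end of the simulation
```

For these assumed parameters, the coexistence state solves r + A x = 0, so the simulated abundances can be compared against that algebraic solution. In practice, `r` and `A` would be estimated from time series of absolute abundances, as described in the text.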
Evolutionary game theory (EGT) can also be used to model microbiome population dynamics (as described in 109,113,114). The fitness parameters that dictate the outcome of metabolic games in microbiomes can be difficult to estimate, as they are influenced by nonlinear environmental and intracellular conditions. For this purpose, EGT can be integrated with genome-scale models (GEMs) (described in the next section), enabling prediction of interspecies interactions and stable steady-state fluxes and species abundances. The system states at which each microbe locally maximizes its own growth, but not necessarily the global maximum community growth, can be identified as Nash equilibria and evolutionarily steady solutions (a subset of Nash equilibria) (115) or as asymptotically stable solutions to dynamic replicator equations, all of which pose candidates for stable coexistence states that the microbiome could exhibit. Evolutionary stability is a key factor to consider when designing bottom-up communities or altering the composition of a native system. Although gLV and EGT models capture context-dependent pairwise interactions, these models fail to capture higher-order or metabolite-based interactions in the community. Public goods games are, in principle, extendable to multispecies interactions and integrable with other modeling approaches like GEMs; some associated challenges are reviewed in Reference 113.
Public goods game: game in which participants can choose to donate their own resources to improve the fitness of the community
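The replicator equations mentioned above can be sketched for a two-strategy game as follows. This is a toy example under stated assumptions: the payoff matrix is a hypothetical snowdrift-type game between a metabolite producer and a nonproducer, the function names are illustrative, and a simple forward Euler scheme is used. In a GEM-coupled setting the payoffs would instead come from metabolic simulations.

```python
def replicator_step(x, payoff, dt):
    """One forward Euler step of dx_i/dt = x_i * (f_i - mean_f), with f = payoff @ x."""
    n = len(x)
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    return [x[i] + dt * x[i] * (fitness[i] - mean_fitness) for i in range(n)]


def simulate_replicator(x0, payoff, dt=0.01, steps=5000):
    """Iterate the replicator dynamics from initial strategy frequencies x0."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x


# Hypothetical payoffs: row = focal strategy, column = partner strategy.
# Strategy 0 secretes a shared metabolite (producer); strategy 1 does not.
payoff = [[2.0, 1.0],
          [3.0, 0.0]]
x_final = simulate_replicator([0.1, 0.9], payoff)
```

With these assumed payoffs, the two strategies have equal fitness when producers make up half of the population, and the dynamics approach that mixed state from an interior starting point. Such asymptotically stable rest points are the candidates for stable coexistence discussed in the text.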

Predicting Microbiome Fluxes and Interactions with Genome-Scale Models and Machine Learning
Mechanistic GEMs can predict microbial behaviors in untested conditions and serve as useful platforms for synthesizing multi-omics data into one comprehensible and interactive format. GEMs mathematically represent an organism's metabolic pathways (with gene-protein-reaction associations in metadata) as a stoichiometric matrix of reactions and metabolites (116,117). Through flux balance analysis (FBA) (118) and related techniques, including flux variability analysis (119,120), GEMs can be used to assess the effects of media and substrate changes and genetic edits on fluxes of target compounds; this approach has been used extensively to guide metabolic engineering (121-123). As metabolic engineers look to cocultures for specialty products (124,125) and as systems biology is applied to medicine (122,126), genome-scale modeling of microbiomes is becoming increasingly useful.
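For reference, FBA is commonly written as a linear program over the stoichiometric matrix. The notation below is an assumption introduced here for illustration: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the objective (often a biomass-forming reaction), and v^lb and v^ub are condition-specific flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0 \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
```

The equality constraint encodes the steady-state assumption on intracellular metabolites. Changes in media, substrates, or gene content enter through the bounds, for example by setting both bounds of a deleted reaction to zero.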

Automated Genome-Scale Reconstructions: Scaffolds for Genome-Scale Models
Advances in meta-omics tools described above have enabled semiautomated construction of metabolic networks for hundreds of species that compose a microbiome (tools reviewed in 127); however, manual curation is needed to accurately recapitulate metabolism in silico (128). For example, Magnúsdóttir et al. (129) published the AGORA (Assembly of Gut Organisms through Reconstruction and Analysis) database, complete with 773 genome-scale reconstructions of human gut microbes and simulated pairwise microbial interactions when fed different diets. Follow-up studies to this report highlight the distinction between a reconstruction and a context-specific predictive GEM (see the sidebar titled Metabolic Reconstructions and Models) and the importance of identifying and applying appropriate constraints before attempting to simulate metabolism in silico (130,131).

METABOLIC RECONSTRUCTIONS AND MODELS
A metabolic reconstruction (sometimes called a genome-scale network reconstruction) is a compact and accessible representation of metabolism linked to the genome and can feasibly be created for an entire microbiome at once given high-quality bioinformatic data, albeit with some experimental and bioinformatic gap-filling (223). A genome-scale model (GEM) based on a reconstruction is used to simulate metabolism. In communities, such simulations are limited to a few well-characterized species at a time, as they require a high-quality GEM for each constituent member; this necessitates that each member be cultivable in isolation in defined media. Importantly, a metabolic simulation will only ever be as accurate as the condition-specific constraints applied to it. These constraints dictate reaction reversibility and impose upper and lower bounds on the flux through each reaction in an organism or community (including transport and interspecies exchange) on the basis of enzyme turnover, nutrient uptake rates, and many other factors (224,225).
Databases exist for published genome-scale reconstructions such as BiGG (132) and ModelSEED (133). Significant progress has been made in cataloging reconstructions for the human gut microbiome and human body at large (129,134,135). These reconstructions serve as templates for context-specific GEMs, such as a known viral infection of a macrophage cell (136) or dysbiosis of the gut (137). In an important step toward universalizing GEMs over different sequence annotation styles, programming languages, and operating systems, Lieven et al. (138) recently published MEMOTE, which scores GEMs for completeness and feasibility. However, owing to the highly variable and environment-dependent nature of enzyme kinetics and transcriptional regulation, the individual constraints that make models predictive cannot yet be generally cataloged this way.

Constraining and Optimizing Community Genome-Scale Models
When a GEM is curated from a genome-scale network reconstruction, integration of multi-omics data and experimental metabolite concentrations (139) facilitates filling in pathway gaps, constraining the solution space (140), and validating and improving simulations of microbiome function. To this end, Pandey et al. (141) published REMI (Relative Expression and Metabolomic Integrations) in 2019 for integrating transcriptomic, thermodynamic, and metabolomic data from differential expression analyses into GEMs. Hadadi et al. (142) used REMI to simulate the transition of Pseudomonas veronii from exponential to stationary phase, as well as from culture in liquid media to soil, demonstrating progress in our ability to model microbial adaptation to environmental perturbations.
In addition to flux constraints, choosing and defining the objective function to be optimized are particularly challenging for community models (119). In single-organism FBA systems, the objective function is usually to maximize flux through a biomass-forming reaction based on the organism's macromolecular composition (i.e., to grow as fast as possible). For multispecies FBA systems, many optimization strategies exist. In so-called supraorganism approaches, the metabolic pathways in all organisms are combined into one stoichiometric matrix, and organisms are partitioned into separate compartments that exchange metabolites (109,143). Optimization of the weighted community growth rate may require individual species to grow at suboptimal growth rates, which is not always an accurate assumption (especially in competitively interacting systems). Bi-level optimization has been implemented to maximize both individual and community growth rates in algorithms such as OptCom (144) and CASINO (Community and System-Level Interactive Optimization) (145).
Objective function: the function to be optimized to select for particular solutions within a multidimensional solution space
To model microbiome fluxes at steady state, community FBA (cFBA) (146) and SteadyCom (147) impose a fixed community growth rate that is adopted by all organisms; however, this is not always achievable in practice. Alternatively, EGT can be integrated with FBA to predict evolutionarily stable interactions and steady-state flux distributions (148). In other bi-level optimization strategies, a microbe may be predicted to produce a metabolite that benefits the community but does not necessarily maximize its own growth rate. To avoid imposing this forced altruism, Cai et al. (149) developed NECom, which predicts steady-state community fluxes and pairwise interactions by identifying Nash equilibria and removes any influence by the community optimization problem on a microbe's incentive to secrete a metabolite in community GEMs.
Although GEMs are highly useful for predicting and describing microbial fluxes and interactions, constructing and curating predictive models from automated reconstructions requires significant time and resources. Because a microbiome's composition must be fully defined to accurately apply FBA, GEMs are currently limited to bottom-up microbiomes with a few representative species, which often lack long-term stability. Further, experimental measurements of community growth rates and fluxes are critical to GEM validation and improvement but remain a challenge to obtain for large (more than three members) communities.

Machine Learning Can Identify Complex Mappings Between Microbiome Inputs and Outputs
Machine learning can be employed to predict and link microbiome composition and functions by using multi-omics data (as reviewed in 150,151). Machine learning can learn complex and nonlinear relationships between microbiome inputs (e.g., cultivation conditions and initial species compositions) and outputs (e.g., metabolite concentrations, gene expression profiles, and community structure) that may enable design of novel microbial consortia with target functions without the need to rigorously characterize each organism in isolation. A major limitation is that these models may not provide insight into the biological mechanisms that generate the observed microbiome states. To address this challenge, researchers have integrated machine learning with GEMs to extract information from multi-omics experiments and metabolic simulations that is relevant to the engineering objective at hand (152).

Dynamic Models of Microbiome Flux
Dynamic computational models are particularly useful when the time-varying microbiome fluxes or 3D structure is of interest (153). Dynamic FBA (dFBA) has been used extensively to model transient compositions and flux distributions in small (two to three members) microbial communities (see 119 for a review). In these cases, estimation of metabolite uptake and secretion kinetics is particularly important for accurately simulating growth and fluxes. Characterization of transporter membrane protein specificity and influx/efflux kinetics promises to significantly improve both dynamic and steady-state FBA simulations (154, 155) but remains a major challenge (156).
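The role of uptake kinetics in dFBA can be illustrated with a deliberately simplified loop. In this sketch, a Michaelis-Menten expression sets the upper bound on substrate uptake at each time step, a stand-in "FBA" step maximizes growth subject to that bound, and biomass and substrate are then updated by forward Euler. The inner optimization is replaced by its closed-form solution for a hypothetical one-reaction network; a real dFBA simulation would solve a genome-scale linear program at that point. All parameter values and function names are assumptions.

```python
def uptake_bound(s, vmax, km):
    """Michaelis-Menten upper bound on the substrate uptake flux."""
    return vmax * s / (km + s)


def toy_fba(max_uptake, biomass_yield):
    """Stand-in for an FBA solve on a one-reaction network.

    Maximizing mu = biomass_yield * v subject to 0 <= v <= max_uptake
    always places v at its upper bound, so no LP solver is needed here.
    """
    v = max_uptake
    return biomass_yield * v, v


def simulate_dfba(x0, s0, vmax=10.0, km=0.5, biomass_yield=0.1, dt=0.01, steps=2000):
    """Static-optimization dFBA loop: bound uptake, optimize, then update states."""
    x, s = x0, s0
    history = [(0.0, x, s)]
    for k in range(1, steps + 1):
        bound = uptake_bound(s, vmax, km)
        # Never take up more substrate than is left in this time step.
        bound = min(bound, s / (x * dt))
        mu, v = toy_fba(bound, biomass_yield)
        # Both updates use the biomass from the start of the step.
        x, s = x + mu * x * dt, max(s - v * x * dt, 0.0)
        history.append((k * dt, x, s))
    return history


history = simulate_dfba(x0=0.05, s0=10.0)
t_end, x_end, s_end = history[-1]
```

Because growth is tied to uptake through a fixed yield in this toy network, the quantity x + yield * s is conserved, which offers a simple sanity check. The sketch also shows why poorly known `vmax` and `km` values propagate directly into the simulated growth curve.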
Spatial heterogeneity can be incorporated into community GEMs (153,157,158), typically by means of FBA to find each species' growth rate at each time step in numerical solutions of reaction-diffusion partial differential equations (159). Biofilms, in particular, are increasingly being modeled with spatiotemporal GEMs (21,158,160,161). With the goal of accounting for their moving boundary conditions brought on by film growth and expansion, it is useful to simulate biofilms as collections of individual microbes in so-called agent-based models (AbMs).
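A generic form of such a coupled reaction-diffusion system is sketched below. The symbols are assumptions introduced for illustration: c_m is the concentration of metabolite m, D_m its diffusion coefficient, X_i the local biomass of species i, and v_{i,m} and \mu_i the exchange flux and growth rate that FBA returns for species i at position r and time t. Biomass transport terms are omitted for brevity, and individual frameworks differ in how they treat them.

```latex
\begin{aligned}
\frac{\partial c_m(\mathbf{r},t)}{\partial t} &= D_m \nabla^{2} c_m(\mathbf{r},t) + \sum_{i} v_{i,m}(\mathbf{r},t)\, X_i(\mathbf{r},t) \\
\frac{\partial X_i(\mathbf{r},t)}{\partial t} &= \mu_i(\mathbf{r},t)\, X_i(\mathbf{r},t)
\end{aligned}
```

At each time step, the local concentrations set the uptake bounds for an FBA problem at each grid point, and the resulting fluxes feed back into the source terms.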

Agent-Based Models Simulate Predefined Physical and Metabolic Interactions in Microbiomes
In AbMs, microbes are treated as individuals with specified traits rather than as concentration state variables as in other methods. AbMs can capture gene regulation (162) and metabolic and mechanical interactions between microbes in a community, and they are well suited to model biofilm formation, deformation, and disruption (163). Metabolism can be coarse-grained to allow only for reactions involving exchange with the environment, or AbMs can integrate genome-scale metabolism to compute fluxes with FBA. The latter approach is taken in BacArena, an R package that was demonstrated to model spatiotemporal metabolic interactions among seven human gut microbes (164).
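The distinction between individuals and concentration state variables can be made concrete with a minimal, coarse-grained AbM. In this sketch, each cell is an object with a position and a biomass on a one-dimensional lattice; cells consume a local nutrient, grow, and divide into a neighboring site. The class, the function names, and all parameter values are hypothetical, and metabolism is reduced to a single uptake reaction rather than computed with FBA as in BacArena.

```python
import random


class Cell:
    """A single microbe with its own lattice position and biomass."""

    def __init__(self, position, biomass=1.0):
        self.position = position
        self.biomass = biomass


def step(cells, nutrient, rng, uptake=0.2, yield_=0.5, divide_at=2.0):
    """One update: each cell eats from its site, grows, and may divide."""
    rng.shuffle(cells)  # randomize update order so no cell is always first
    newborns = []
    for cell in cells:
        eaten = min(uptake, nutrient[cell.position])
        nutrient[cell.position] -= eaten
        cell.biomass += yield_ * eaten
        if cell.biomass >= divide_at:
            cell.biomass /= 2.0
            offset = rng.choice([-1, 1])
            # Daughter is placed in a neighboring site, clamped to the lattice.
            new_pos = min(max(cell.position + offset, 0), len(nutrient) - 1)
            newborns.append(Cell(new_pos, cell.biomass))
    cells.extend(newborns)


def run(n_sites=21, n_steps=200, seed=0):
    """Seed one cell at the lattice center and iterate the update rule."""
    rng = random.Random(seed)
    nutrient = [5.0] * n_sites
    cells = [Cell(n_sites // 2)]
    for _ in range(n_steps):
        step(cells, nutrient, rng)
    return cells, nutrient


cells, nutrient = run()
occupied_sites = sorted({cell.position for cell in cells})
```

Even this toy version produces spatial structure, because growth depletes nutrient locally and daughters spread outward from the founder. Replacing the uptake rule with a per-cell FBA call is what makes genome-scale AbMs computationally demanding.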
Although they are computationally demanding, AbMs are versatile models that suggest priority experiments for answering specific questions of microbiome function. For example, van Hoek et al. (165) simulated metabolism in the human large intestine with a coupled dFBA-mass transport model of individuals from a supraorganism metabacterium. Although they did not include experimental validation, their model generated hypotheses for the effects of diarrhea (a macroscale system state) on the microscale spatial organization and relative abundance of microbes with various flux profiles, which can be explored experimentally. Similarly, Doloman et al. (166) used a multispecies AbM (without genome-scale metabolism) to predict cultivation conditions that maximize methane productivity in spatially structured anaerobic sludge granules.

ENGINEERING MICROBIOMES
As microbiomes are complex, dynamic, and yet often resistant to change, how can we best modulate and design functional and predictable microbiomes for targeted use? Combining diverse omics tools with computational modeling facilitates the identification of microbial and molecular targets as well as the prediction of system behaviors. The widespread interest in engineering microbiomes for diverse applications is evidenced by the increasing number of biotechnology companies that seek to design microbiome interventions for human health, livestock fitness, and agriculture (Table 2). The approaches through which microbial populations, interactions, and biochemical fluxes may be tuned can be summarized as either modifying existing functional activities or engineering novel functions (Figure 4).

Microbiome Functions Can Be Modified by External Inputs
Manipulation of environmental factors has been widely used to modify the composition and functions of microbiomes. This strategy has attracted interest from the medical and nutritional fields that aim to tailor microbiomes to enhance host health. In gut models, prebiotics have been used to enrich for specific beneficial microbes. It is well established that diet affects the composition of intestinal microbes (167,168), and prebiotics are used to increase the abundance of specific organisms associated with beneficial health outcomes, typically by modulating the intake of ingested fermentable dietary fibers (169).
Prebiotics: chemical compounds that promote growth and metabolism of certain microbes to benefit host health
Beyond chemical inputs, microorganisms can be administered to microbiomes with the goal of altering a target function. Several probiotic products, containing selected nonpathogenic and predicted beneficial bacterial strains, mitigate symptoms of gastrointestinal dysbiosis and protect against pathogens (170,171). In addition to human health, probiotics have been used in animal feed (172), and the addition of rhizobacteria to fertilizer has been shown to promote plant growth (173). However, the efficacy of probiotics remains unclear due to challenges in stable engraftment into the resident community (174) and to the unpredictable activities of beneficial biochemical pathways within probiotics that are continuously modified by the environment.

Probiotics: live bacteria that promote human gastrointestinal tract health when consumed
The introduction of microbial communities, as opposed to isolated strains, into the intestinal tract of animals or into the roots of crops is an alternative method that has attracted considerable interest from the medical and agricultural communities. For example, fecal microbiota transplantation (FMT), the procedure of transferring a fecal sample from a healthy donor to a patient with dysbiosis to reestablish a functioning microbial community, is highly effective against recurrent Clostridium difficile infections (175). However, FMT remains an investigational procedure and is associated with potential health risks, including transfer of antibiotic-resistant pathogens in the samples, though these are typically screened for in samples from human donors (176). In other words, using black box natural microbiomes may result in the inadvertent transfer of multidrug-resistant organisms and other pathogens (177), spurring interest in exploring carefully designed microbial consortia.
Rationally designed consortia comprise organisms that maintain stability in desired functions over long periods of time. A potentially powerful approach involves using top-down and bottom-up approaches to characterize system properties and responses to environmental inputs and to exploit these insights to guide the design of consortia that perform target functions (15). Both top-down and bottom-up approaches, with the aid of computational and statistical modeling (178), have had some success in introducing desirable functions in animals and plants (179,180). Several microbiome biotechnology companies, including Vedanta and Seres, have designed microbial consortia in various stages of clinical trials, targeting human-gut-related diseases, from C. difficile infection to ulcerative colitis (180a). However, the efficacy and effects of prebiotics and probiotics are not yet predictable, as many complex factors, including abiotic factors and interactions between gut bacteria, ultimately determine how an individual microbiome responds to different interventions.

Rewiring Microbial Functions by Genetic Engineering and Laboratory Evolution
Genetic engineering involves modification of the genome of a given organism. These modifications include, for example, deletion of genes to redirect metabolic fluxes and introduction of exogenous genes or entire pathways that allow the organism to make a valuable molecule or utilize a novel substrate. Genetically engineered microorganisms have been deployed into microbial communities associated with both plants and animals (181-183). In addition to metabolic pathway engineering, novel genetic circuits are introduced into engineered bacteria to sense and report disease-associated biomarkers in animal models and the human gut (recently reviewed in 183a). Further, engineered microbes are being explored for the prevention and treatment of cancer, for example, through the targeted delivery of enzymes and toxins that inhibit tumor growth (reviewed in 183b). Another method activates the stimulator of interferon genes (STING) pathway for localized immune activation and impact on tumor growth (183c); this approach is currently in a phase I clinical trial by Synlogic Therapeutics.
Although many molecular tools exist to modify genomic content of microorganisms, the clustered regularly interspaced short palindromic repeat (CRISPR)/Cas9 system has recently gained momentum as a particularly versatile tool (reviewed in 184). An alternative strategy exploits microbial horizontal gene transfer (HGT), which has the potential to influence the functional potential of organisms in situ (185). As revealed by high-resolution genomic data, HGT events occur not only between prokaryotes but also between prokaryotes and eukaryotes (186). HGT can be used to deliver genetic payloads encoding, for example, CRISPR machinery for destruction or incorporation of genes in target cells (187,188). Lysogenic phages have been similarly explored for the integration of genes into bacterial chromosomes in order to subsequently alter target microbial communities (189,190). The use of HGT or phage-delivered payloads is particularly attractive for nonmodel microorganisms for which established molecular engineering tools are lacking. Furthermore, the use of established genetic engineering tools that rely on given nucleotide motifs (e.g., CRISPR) may be hampered in microbes that have extreme AT or GC richness (191). Challenges of HGT and phage-based tools for microbiome engineering include mechanisms to enable stable propagation of circuits over time and regulatory elements for the control of gene expression within diverse organisms in response to specific environmental cues.
Horizontal gene transfer (HGT): transfer of genetic material between organisms with different genomes, often aided by membrane proteins and transfer origins
Evolution continuously shapes the properties of microbiomes in unpredictable ways, but efforts aim to harness adaptive laboratory evolution for microbial engineering. Laboratory evolution can be used to select for single strains or communities with desired traits, including improved substrate utilization (192), product titer (193), stress tolerance (194), and controlled shift in host characteristics (e.g., flowering time of host plant) (195). A major challenge in applying directed evolution strategies to modify properties of communities is developing capabilities to predict how microbial communities' functions evolve toward specific discrete states or a continuum of different states in response to environmental inputs (196). Indeed, there is evidence that community context shapes the evolutionary dynamic trajectories, indicating the critical role of biotic interactions in driving evolutionary processes (197).

Biocontainment Strategies for Microbiome Engineering
All microbiome engineering strategies involving genetically engineered organisms require biocontainment mechanisms that restrict the growth of the organisms within a given environment for a designated period of time. Whereas for some applications long-term colonization may be needed to achieve the desired outcome, in other cases the transient presence of a nonnative organism or community may be sufficient and also provides biocontainment (198). To prevent engineered organisms from escaping into the surrounding ecosystem, researchers can use auxotrophy as a biocontainment strategy (199). One emerging method involves engineering metabolic niches by introducing unique metabolic capabilities and novel molecules that cannot otherwise be metabolized by native members of the microbiome (200). The discovery of unique metabolic niches can be aided by genome sequencing, metabolomics, and detailed characterization of novel metabolic pathways within microbial communities.

CONCLUSION AND PERSPECTIVES
Microbiome engineering should consider the complex interplay and feedback loops between the environment and the resident microbiome, which together drive community dynamics and multifunctional properties. Advancements in our understanding of microbiomes gleaned from ecology and systems biology must be leveraged to develop new strategies to program prescribed functions by using tools from synthetic biology. Top-down design of microbiomes presents challenges in predictably modifying composition and function, and the engineering of stable and highly functional synthetic microbial communities from the bottom up also remains difficult. Development of microbiome engineering that can achieve high-precision, robust, and predictable outcomes holds promise for diverse applications in biotechnology, medicine, agriculture, and the environment. A comprehensive understanding of microbiomes necessitates an integration of multi-omics data as well as quantification of the spatial and ecological factors that contribute to community functions.

(a) Microbiome stability is a function of the community's ability to recover its original functions following a disturbance. Functions could include the production of specific molecules over time. (b) Microbiome composition and function are shaped by multiple spatial and temporal factors. (i) At the broadest scale, availability of carbon and energy defines possible ecological niches within each environment. (ii) Within a community, interspecies interactions, including social behaviors, modulate how cells respond to their environment by modifying substrate availability (i.e., syntrophy or competition), releasing effectors, or occupying available space. Biofilms, in particular, provide resilience to specific environmental perturbations. (iii) Within each cell, function is constrained by individual metabolic capacity, which can be altered through genetic mutations or horizontal gene transfer. Membrane-bound proteins transduce signals and molecules from the environment to the cytosol as well as facilitate secretion of products from the cell to the environment.
Different meta-omics tools are suited to answer different questions about microbiome composition and function. Amplicon metagenomics can reveal which organisms are present in a microbiome but not necessarily what each microbe's role in the community is. Shotgun metagenomics elucidates which microbes are present in the community and what functions they have the capacity to perform. Metatranscriptomics and metaproteomics are necessary to uncover which functions are actually being performed in the community; assigning these transcripts and proteins to the microbes that produced them typically requires high-quality reference genomes or concurrent metagenomics analyses. Metabolomics and fluxomics quantify the chemical composition of the microbiome environment; however, linking metabolites to the microbes that produce or consume them is challenging, even with reference genomes. Linking microbiome composition and function is facilitated by integrating multiple meta-omics techniques, for example, concurrent shotgun metagenomics, metaproteomics, and metabolomics studies to assess which enzymes are producing an observed small molecule of interest and which microbes could produce those enzymes.
Microbiomes can be modeled on many scales, and the choice of modeling technique depends on the question at hand. At the most mechanistic level, molecular simulations may be used to model the thermodynamics and kinetics of individual enzymes identified through metaproteomics; however, these are not scalable to encompass the entire microbiome. GEMs enable prediction of the metabolic fluxes and end-product profiles within a microbiome and can offer mechanistic insight into metabolomic observations given high-quality genomic reconstructions and sufficient experimental model validation. Evolutionary game theory models and differential equation-based models are particularly useful when microbiome population dynamics are of the greatest interest, because detailed metabolic reconstructions are not needed for each organism to be modeled. AbMs offer flexibility in that the user may define which inputs and outputs to include in the model and are often the technique of choice when integrating both metabolic and physical interactions between microbes. Data-driven models, including emerging machine learning-based models, offer empirical predictions of microbiome behaviors under specified conditions given appropriate training data. Although less mechanistic than GEMs, machine learning-based models are a pragmatic approach to synthesizing large amounts of different data types into interpretable conclusions, for example, rate constant estimations for process-level models of microbiome function. The structure of the molecular enzyme model is from PDB ID 4QLK. The structure of the AbM is reproduced with permission from Reference 162. The microbes in the evolutionary game theory models panel and the entire machine learning-based models panel were adapted from images created with BioRender.com. Abbreviations: AbM, agent-based model; EGT, evolutionary game theory; GEM, genome-scale model; ODE, ordinary differential equation; PDB, Protein Data Bank.
Microbiome engineering tools can be used to modify existing functions within the community or to introduce novel functions. These tools vary substantially in scale, from modification of the genome of a single organism to introduction of an entirely new engineered community into the microbiome. Highlighted methods include environmental perturbations such as the addition of a molecular antibiotic or a wild-type or engineered strain or community to an existing microbiome. More recently, genetic engineering efforts and adaptive laboratory evolution, including bacteriophage-assisted gene transfer, increasingly sophisticated synthetic biology tools, and targeted horizontal gene transfer, have gained traction.