Quantitative microbiome profiling links gut community variation to microbial load


Current sequencing-based analyses of faecal microbiota quantify microbial taxa and metabolic pathways as fractions of the sample sequence library generated by each analysis1,2. Although these relative approaches permit detection of disease-associated microbiome variation, they are limited in their ability to reveal the interplay between microbiota and host health3,4. Comparative analyses of relative microbiome data cannot provide information about the extent or directionality of changes in taxa abundance or metabolic potential5. If microbial load varies substantially between samples, relative profiling will hamper attempts to link microbiome features to quantitative data such as physiological parameters or metabolite concentrations5,6. Saliently, relative approaches ignore the possibility that altered overall microbiota abundance itself could be a key identifier of a disease-associated ecosystem configuration7. To enable genuine characterization of host–microbiota interactions, microbiome research must exchange ratios for counts4,8,9. Here we build a workflow for the quantitative microbiome profiling of faecal material, through parallelization of amplicon sequencing and flow cytometric enumeration of microbial cells. We observe up to tenfold differences in the microbial loads of healthy individuals and relate this variation to enterotype differentiation. We show how microbial abundances underpin both microbiota variation between individuals and covariation with host phenotype. Quantitative profiling bypasses compositionality effects in the reconstruction of gut microbiota interaction networks and reveals that the taxonomic trade-off between Bacteroides and Prevotella is an artefact of relative microbiome analyses. Finally, we identify microbial load as a key driver of observed microbiota alterations in a cohort of patients with Crohn’s disease10, here associated with a low-cell-count Bacteroides enterotype (as defined through relative profiling)11,12.

Figure 1: Faecal microbial loads vary across enterotypes.
Figure 2: Relative versus quantitative microbiome profiling.
Figure 3: Relative versus quantitative microbiota network reconstruction.
Figure 4: Quantitative microbiome alterations in Crohn’s disease.

We thank all study participants, F. Giraldo for enabling sample collection at the PXL Hasselt, L. Rymenans and C. Verspecht for faecal DNA extraction and library preparation, K. Verbeke for facilitating moisture content determinations, and P. Goncalves for advice on simulating microbial data for benchmarking the QMP and RMP approaches. The main funding for this study comes from a KU Leuven CREA grant. D.V. is supported by the Agency for Innovation by Science and Technology (IWT). G.K., K.D., M.V.-C., S.V.-S., and J.W. are funded by the Research Foundation Flanders (FWO-Vlaanderen). This work is further supported through funding by VIB, the Rega Institute for Medical Research, KU Leuven, FP7 METACARDIS (HEALTH-F4-2012-305312), and H2020 SYSCID (grant agreement 733100).

Author information

This study was conceived by G.F. Experiments were designed by D.V., S.V., G.F., and J.R. Sampling of cohorts was set up and carried out by D.V., G.K., K.D., S.V.-S., M.V.-C., J.S., J.W., R.Y.T., L.D.C., and G.F. Optimization of sequencing protocols was performed by R.Y.T.; data pre-processing by D.V., M.V.-C., J.S., J.W., and Y.D.; flow cytometry analyses by G.K. and K.D.; statistical analyses by D.V., G.K., K.D., S.V.-S., M.V.-C., J.S., J.W., and G.F.; network analyses by S.V.-S.; and simulation experiments by D.V. and S.V.-S. G.F. developed the QMP protocol. S.V.-S., G.F., and J.R. drafted the manuscript. All authors revised the article and approved the final version for publication.

Corresponding author

Correspondence to Jeroen Raes.

Extended data figures and tables

Extended Data Figure 1 Quantification of microbial loads of frozen faecal samples.

a, Microbial cell counts in fresh and frozen faecal aliquots strongly correlate with one another (study cohort, n = 39; Pearson’s r = 0.91, two-sided P = 4.9 × 10⁻¹⁶). Data points represent mean values. Error bars represent the s.d. of duplicate (fresh) and triplicate (frozen) cell counts. b, Comparison between flow cytometric assessment of microbial loads and estimation of bacterial abundances on the basis of qPCR (study cohort, n = 40). Data points represent mean values. Error bars represent the s.d. of triplicate values (qPCR and cell counts). Although comparing cell-based and molecular enumeration methods is not recommended18, the measurements were correlated (Pearson’s r = −0.53, two-sided P = 4.7 × 10⁻⁴). In a complex ecosystem, enumerating bacteria on the basis of qPCR would introduce biases through the extraction, purification, and amplification of DNA, 16S rRNA gene copy number variation, and community replication rate (shown to differ between enterotypes44), among others. Moreover, qPCR has been reported to be only sensitive enough to detect twofold changes in gene concentration or microbial load45. Flow cytometry is less specific and results might be affected by formation of aggregates8. For QMP analyses, we opted for a flow cytometry approach given its technical straightforwardness (limited number of technical manipulations; see Methods), reproducibility46,47, and throughput.

Extended Data Figure 2 Intra-individual versus inter-individual microbial cell count variation in human faeces.

Twenty healthy individuals were sampled daily over the course of a week (maximum of one sample per day; longitudinal cohort). Healthy individuals have significantly higher cell counts than patients with Crohn’s disease. Patients from the disease cohort (n = 29; CD, purple) were compared with all daily samples (n = 132; Ind. All, grey) and with the average cell count per individual (Ind. av, orange). Two-sided Wilcoxon rank-sum test, ***P < 0.001. The body of the box plot represents the first and third quartiles of the distribution and the median line. The whiskers extend from the quartiles to the last data point within 1.5× interquartile range, with outliers beyond.

Extended Data Figure 3 Faecal microbial loads correlate with sample moisture content and genus richness.

a, Microbial cell counts and moisture content were negatively correlated in the longitudinal and validation cohorts, though not in the study cohort (study cohort, n = 37, Spearman’s ρ = −0.12, two-sided FDR = 0.56; longitudinal cohort, n = 132, Spearman’s ρ = −0.52, two-sided FDR = 1.6 × 10⁻⁹; validation cohort, n = 54, Spearman’s ρ = −0.56, two-sided FDR = 9.1 × 10⁻⁵). b, Microbial cell counts and observed richness correlated mildly (study cohort, n = 40; Spearman’s ρ = 0.36, two-sided P = 2.3 × 10⁻²).

Extended Data Figure 4 Faecal microbial loads vary across enterotypes.

Microbial load differences between the four enterotypes in the disease cohort (n = 95). Box plot representation of microbial load (cells per gram of faeces) distribution across the four enterotypes. The body of the box plot represents the first and third quartiles of the distribution and the median line. The whiskers extend from the quartiles to the last data point within 1.5× interquartile range, with outliers beyond. Two-sided Dunn’s adjusted test, **P < 0.01, ***P < 0.001.

Extended Data Figure 5 Microbial loads are not associated with sequencing depth.

Sequencing depth did not reflect the total microbial load of a sample (Spearman’s ρ = 0.17, two-sided P = 0.28). Samples are ranked according to decreasing cell counts.

Extended Data Figure 6 Illustration of differences between RMP and QMP methodology.

Two samples, each containing four genera, are analysed (numbers are illustrative). Genus abundance distributions in samples A and B are markedly distinct, with the microbial load in sample B more than double that in sample A. Genus ‘purple’ carries two copies of the 16S rRNA gene. (1) DNA extraction and library preparation. Neither RMP nor QMP corrects for biases introduced by DNA extraction, primer specificity, PCR amplification, or other library preparation steps. The resulting sequencing depth is independent of microbial load. (2) By rarefying to an even number of reads per sample, RMP assumes similar genus abundance distributions in samples A and B: sample A is therefore sequenced far more intensively than sample B. As a result, the profiles poorly reflect the genus distribution in the original samples. Because ‘purple’ carries multiple copies of the 16S rRNA gene, its relative abundance is overestimated. (3) The first step of QMP corrects for 16S rRNA copy number variation. In the resulting copy-corrected profile (CCP), each read corresponds to a single bacterium sequenced. (4) Sampling depth is estimated for each sample by dividing the CCP read total (R) by the microbial load (X). For samples A and B, sampling depth is [R]A divided by [X]A and [R]B divided by [X]B, respectively. The sampling depth for B is the lower of the two (3.33%); sample A is rarefied to the same level. As a result, ‘orange’ is no longer detected. As ‘orange’ was equally abundant in A and B, its inclusion in the sample A RMP can be considered an artefact of uneven sampling intensity. The resulting rarefied genus abundances are proportional to sample microbial loads and can be extrapolated to generate QMPs expressed as number of cells per gram.
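
Steps (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the genus names other than ‘purple’ and ‘orange’, the read counts, and the microbial loads are all hypothetical, and copy-corrected reads are rounded to whole numbers so that they can be subsampled.

```python
import random

def copy_correct(reads, copies):
    """Step 3: divide reads by 16S copy number (rounded to whole reads; an assumption)."""
    return {g: round(n / copies.get(g, 1)) for g, n in reads.items()}

def rarefy(counts, n_reads, rng):
    """Draw n_reads reads without replacement from a genus count table."""
    pool = [g for g, n in counts.items() for _ in range(n)]
    drawn = rng.sample(pool, min(n_reads, len(pool)))
    return {g: drawn.count(g) for g in counts}

def quantitative_profile(reads, loads, copies, seed=0):
    """Return genus abundances in cells per gram for each sample."""
    rng = random.Random(seed)
    ccp = {s: copy_correct(r, copies) for s, r in reads.items()}
    # Step 4: sampling depth = CCP read total (R) / microbial load (X).
    depth = {s: sum(c.values()) / loads[s] for s, c in ccp.items()}
    lowest = min(depth.values())
    profiles = {}
    for s, counts in ccp.items():
        # Rarefy every sample to the lowest sampling depth observed.
        kept = rarefy(counts, round(lowest * loads[s]), rng)
        # Extrapolate rarefied reads to cells per gram.
        cells_per_read = loads[s] / sum(kept.values())
        profiles[s] = {g: n * cells_per_read for g, n in kept.items()}
    return profiles

# Hypothetical input: raw reads per genus, loads in cells per gram.
reads = {
    "A": {"purple": 4000, "orange": 200, "green": 3000, "blue": 2800},
    "B": {"purple": 2000, "orange": 600, "green": 5400, "blue": 1000},
}
loads = {"A": 0.8e11, "B": 2.4e11}
copies = {"purple": 2}  # 'purple' carries two 16S rRNA gene copies

profiles = quantitative_profile(reads, loads, copies)
for sample, profile in profiles.items():
    print(sample, {g: f"{v:.2e}" for g, v in profile.items()})
```

In this toy input both samples yield the same number of copy-corrected reads, so sample B, with three times the load, has the lower sampling depth and sample A is the one that gets subsampled. After extrapolation, each profile sums to the microbial load of its sample.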

Extended Data Figure 7 Distribution of RMP and QMP abundances of Bacteroides and Prevotella in healthy controls.

Samples (n = 66) are ranked according to decreasing Bacteroides abundance in both the RMP and QMP panels (stacked bars). The trade-off between Bacteroides and Prevotella (RMP; Spearman’s ρ = −0.59, two-sided FDR = 2 × 10⁻⁴) was no longer significant after correction for microbial load (QMP; Spearman’s ρ = −0.33, two-sided FDR = 1).
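
How relative profiling can manufacture such a trade-off is easiest to see in a toy case. The sketch below uses invented absolute abundances for two hypothetical genera and a constant remainder of the community; it is not derived from the study data. Both genera increase together in absolute terms, yet their relative abundances are perfectly anticorrelated. The Spearman function is a minimal version that assumes no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical absolute abundances (arbitrary units) across six samples.
genus_a = [10, 20, 30, 40, 50, 60]
genus_b = [12, 14, 16, 18, 20, 22]
other = [30, 30, 30, 30, 30, 30]

totals = [a + b + o for a, b, o in zip(genus_a, genus_b, other)]
rel_a = [a / t for a, t in zip(genus_a, totals)]
rel_b = [b / t for b, t in zip(genus_b, totals)]

rho_absolute = spearman(genus_a, genus_b)
rho_relative = spearman(rel_a, rel_b)
print("absolute:", rho_absolute, "relative:", rho_relative)
```

Because genus_a grows faster than the rest of the community, the fraction occupied by genus_b shrinks even though its absolute abundance rises, which is the compositionality effect that quantitative profiling is meant to bypass.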

Extended Data Figure 8 False discovery rate and sensitivity in network reconstruction based on QMP and RMP simulated data.

QMP resulted in increased sensitivity and decreased FDR compared to RMP (two-sided t-test, P < 10⁻¹⁵). For a two-, four- and eightfold maximum difference in microbial load, QMP FDRs were 11%, 15%, and 22% lower, respectively, than for RMP; and QMP true positive discovery rate (sensitivity) was increased by 10%, 12%, and 15%. Data points depict repetitions (n = 50). The body of the box plot represents the first and third quartiles of the distribution and the median line. The whiskers extend from the quartiles to the last simulated data point within 1.5× interquartile range.
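
For readers unfamiliar with the two metrics, the sketch below scores an inferred network against a known one using the standard definitions: FDR as the fraction of inferred edges that are false, and sensitivity as the fraction of true edges that are recovered. Whether the simulations used exactly these definitions is an assumption, and the function name and edge lists are hypothetical.

```python
def network_metrics(true_edges, inferred_edges):
    """Return (FDR, sensitivity) for an inferred undirected network."""
    # frozenset makes (a, b) and (b, a) the same undirected edge.
    truth = {frozenset(e) for e in true_edges}
    inferred = {frozenset(e) for e in inferred_edges}
    tp = len(truth & inferred)      # true edges that were recovered
    fp = len(inferred - truth)      # inferred edges that are not real
    fn = len(truth - inferred)      # true edges that were missed
    fdr = fp / (tp + fp) if inferred else 0.0
    sensitivity = tp / (tp + fn) if truth else 0.0
    return fdr, sensitivity

# Hypothetical simulated ground truth and one reconstruction of it.
true_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
inferred_edges = [("b", "a"), ("c", "d"), ("a", "c")]

fdr, sensitivity = network_metrics(true_edges, inferred_edges)
print(f"FDR = {fdr:.2f}, sensitivity = {sensitivity:.2f}")
```

Here two of the three inferred edges are real and two of the four true edges are found.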

Extended Data Figure 9 RMP underestimates the decrease in microbiome richness associated with Crohn’s disease.

Observed richness in healthy controls (n = 66) and patients with Crohn’s disease (n = 29). The decrease in richness associated with Crohn’s disease is more pronounced in QMP. The body of the box plot represents the first and third quartiles of the distribution and the median line. The whiskers extend from the quartiles to the last data point within 1.5× interquartile range. Two-sided Wilcoxon test, ***P < 0.001, **P < 0.01.

Extended Data Figure 10 Flow cytometry gating strategy.

A fixed gating and staining approach was applied28. Both blank and sample solutions were stained with SYBR Green I. a, The FL1-A/FL3-A acquisition plot of a blank sample (0.85% w/v physiological solution) with gate boundaries indicated. A threshold value of 2,000 was applied on the FL1 channel. b, Secondary gating was performed on the FSC-A/SSC-A channels to further discriminate between debris or background and microbial events. c, d, FL1-A/FL3-A count acquisition of a faecal sample (c) with secondary gating on FSC-A/SSC-A channels on the basis of blank analyses (d). Total counts were defined as events registered in the FL1-A/FL3-A gating area, excluding debris or background events observed in the FSC-A/SSC-A R1 gate. The flow rate was set at 14 μl per minute and the acquisition rate did not exceed 10,000 events per second. Each panel reflects events registered over the course of a 30-s acquisition period.
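
The acquisition settings above fix the volume analysed per measurement: 14 μl per minute over 30 s is 7 μl. The following sketch shows how a gated event count could be converted into cells per gram under those settings. The flow rate and acquisition time are taken from the caption; the event count, the dilution factor, and the grams of faeces per millilitre of suspension are hypothetical placeholders, and the conversion itself is an assumed illustration rather than the protocol described in Methods.

```python
FLOW_RATE_UL_PER_MIN = 14.0  # from the caption
ACQUISITION_S = 30.0         # from the caption

def analysed_volume_ul(flow_rate=FLOW_RATE_UL_PER_MIN, seconds=ACQUISITION_S):
    """Volume passing the laser during one acquisition, in microlitres."""
    return flow_rate * seconds / 60.0

def cells_per_gram(gated_events, dilution_factor, grams_per_ml):
    """Convert gated events to cells per gram of faeces.

    dilution_factor and grams_per_ml are hypothetical parameters that
    describe how the faecal suspension was prepared.
    """
    volume_ml = analysed_volume_ul() / 1000.0
    cells_per_ml = gated_events / volume_ml * dilution_factor
    return cells_per_ml / grams_per_ml

volume = analysed_volume_ul()
# Hypothetical measurement: 70,000 gated events, 1,000-fold dilution,
# 0.1 g of faeces per ml of undiluted suspension.
load = cells_per_gram(70_000, dilution_factor=1_000, grams_per_ml=0.1)
print(f"{volume} ul analysed, about {load:.1e} cells per gram")
```

With these placeholder values the result is on the order of 10¹¹ cells per gram.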

