Modern biological and medical research is scarcely conceivable without the deep insights gained from next-generation sequencing (NGS). These technologies have been improved continuously and today yield up to 20 billion reads per run. This allows highly complex environments such as gut, soil, or biogas reactor sludge to be studied at high resolution, but it also requires adequate analysis of the huge amount of data generated. A plethora of software packages for NGS data analysis is available today; however, most of them demand extensive knowledge of computer science or even require a bioinformatics specialist to run them.
CoMA (Comparative Microbiome Analysis) is a free pipeline for intuitive and user-friendly analysis of amplicon-sequencing data, available for all common computer platforms. The software package uses various open-source third-party tools and combines them with its own scripts into a linear analysis workflow in the form of a Bash script, starting with the raw input files (in FASTQ format) and resulting in aesthetically pleasing, publication-ready graphics. In addition, output files in standardized formats are provided, such as a tab-delimited abundance table, an abundance table in BIOM format, and a tree file in NEWICK format. These allow for subsequent secondary analysis, for example using R.
The operation of this tool is remarkably intuitive and makes it accessible even for entry-level users. A graphical user interface facilitates handling, which is a major advantage over command-line based applications. Nevertheless, multiple adjustable parameters and the high degree of automation also make CoMA suitable for advanced users who are looking for an efficient and streamlined workflow. The tool can handle data from today’s most important NGS platforms, including Illumina MiSeq, Illumina HiSeq, Illumina NextSeq, and Illumina NovaSeq, as well as from the former 454 pyrosequencing technology. Although 454 was discontinued in 2016, its data are still in circulation and analysis tools are still needed.
CoMA can be run on every common computer operating system: Linux, Windows, and macOS. With version 3.0, four different installation options are available: a virtual appliance (which can be imported with tools like VMware Workstation, Oracle VM VirtualBox or Parallels Desktop for macOS), a Singularity image, a Docker image, and a direct Linux installer (Bash script). Each option has specific advantages, and users can select the one that best meets their needs. CoMA can also be used on high-performance computing (HPC) clusters. In this case, we suggest using the Singularity option. The following graphical summary is intended to help every user find the ideal usage strategy.
CoMA will be updated regularly, both to ensure excellent performance in the future and to implement additional features (e.g. a web-based platform for online data analysis). Each update will include the newest versions of the taxonomic databases available at the release date, so that results are as current as possible.
If you use CoMA for any published research, please include the following citation:
Hupfauf S, Etemadi M, Fernández-Delgado Juárez M, Gómez-Brandón M, Insam H, et al. (2020) CoMA – an intuitive and user-friendly pipeline for amplicon-sequencing data analysis. PLOS ONE 15(12): e0243241. https://doi.org/10.1371/journal.pone.0243241
Heatmap showing the microbial community composition (on family level) of bioreactors operated at 37, 45 and 55 °C over a period of 143 days
Taxonomic plots in CoMA can be created generally or specifically for Bacteria, Archaea, Fungi, or Eukaryota. Moreover, specific plots can be created for any given taxon. A minimum abundance threshold can be selected below which taxa are excluded from depiction. Depending on the settings, unassigned taxa are shown or excluded. Apart from heatmaps, CoMA offers bar charts for the depiction of the taxonomic information.
Bar chart showing the microbial community composition (on order level) of bioreactors operated at 37, 45 and 55 °C over a period of 143 days
Taxonomic plots in CoMA can be created generally or specifically for Bacteria, Archaea, Fungi, or Eukaryota. Moreover, specific plots can be created for any given taxon. A minimum abundance threshold can be selected below which taxa are excluded from depiction. Depending on the settings, unassigned taxa are shown or excluded. Apart from bar charts, CoMA offers heatmaps for the depiction of the taxonomic information.
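The threshold and unassigned-taxa settings described above can be illustrated with a minimal Python sketch. It is not taken from CoMA itself: the function name `filter_taxa`, the label "Unassigned", the family names, and the choice to compute relative abundance against the total read count of the sample are all assumptions made for illustration.

```python
def filter_taxa(counts, min_rel_abundance=0.01, keep_unassigned=True):
    """Return relative abundances of the taxa that pass the threshold.

    counts: dict mapping taxon name -> read count for one sample.
    Taxa below the threshold are dropped (assumed behavior, not pooled).
    "Unassigned" is a hypothetical label for reads without taxonomy.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    result = {}
    for taxon, n in counts.items():
        if taxon == "Unassigned" and not keep_unassigned:
            continue
        rel = n / total  # relative abundance within this sample
        if rel >= min_rel_abundance:
            result[taxon] = rel
    return result


# Hypothetical family-level counts for a single sample.
sample = {
    "Clostridiaceae": 500,
    "Methanobacteriaceae": 300,
    "Rare_family": 5,
    "Unassigned": 195,
}
filtered = filter_taxa(sample, min_rel_abundance=0.01, keep_unassigned=False)
```

With a 1 % threshold, the rare family (0.5 % of reads) is excluded, and the unassigned reads are removed because `keep_unassigned` is switched off.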
Bar chart showing Faith’s phylogenetic diversity (PD) in bioreactors operated at 37, 45 and 55 °C over time
The alpha diversity plots show means ± standard deviation if metadata are used to group the samples. Otherwise, individual bars are created for each sample. The color of the bars can be selected by the user. Apart from Faith’s PD, CoMA supports the following alpha diversity measures: OTU richness, Shannon-Wiener diversity, Simpson’s index, Pielou’s evenness, Good’s coverage of counts, and Chao1 richness estimator.
Bar chart showing the Shannon-Wiener diversity (H’) in bioreactors operated at three different temperatures over 143 days
The alpha diversity plots show means ± standard deviation if metadata are used to group the samples. Otherwise, individual bars are created for each sample. The color of the bars can be selected by the user. Apart from Shannon-Wiener diversity, CoMA supports the following alpha diversity measures: OTU richness, Simpson’s index, Pielou’s evenness, Good’s coverage of counts, Chao1 richness estimator, and Faith’s phylogenetic diversity.
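For readers unfamiliar with these measures, the following Python sketch shows how most of them can be computed from the OTU counts of a single sample. It is an illustration under stated assumptions, not CoMA's implementation: the natural logarithm is used for Shannon-Wiener diversity, Simpson's index is given in its 1 − D form, and Chao1 uses the bias-corrected formula. Other conventions exist for each. Faith's PD is omitted because it additionally requires a phylogenetic tree.

```python
import math


def alpha_diversity(counts):
    """Compute alpha diversity measures from the OTU counts of one sample."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("sample contains no reads")
    s_obs = len(counts)                       # OTU richness
    props = [c / n for c in counts]
    shannon = -sum(p * math.log(p) for p in props)   # natural log (assumed)
    simpson = 1.0 - sum(p * p for p in props)        # 1 - D form (assumed)
    pielou = shannon / math.log(s_obs) if s_obs > 1 else 0.0
    f1 = sum(1 for c in counts if c == 1)     # singletons
    f2 = sum(1 for c in counts if c == 2)     # doubletons
    goods = 1.0 - f1 / n                      # Good's coverage
    chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))   # bias-corrected Chao1
    return {
        "richness": s_obs,
        "shannon": shannon,
        "simpson": simpson,
        "pielou": pielou,
        "goods_coverage": goods,
        "chao1": chao1,
    }


# Hypothetical samples: one perfectly even, one with rare OTUs.
even = alpha_diversity([10, 10, 10, 10])
skewed = alpha_diversity([5, 3, 1, 1, 2])
```

In the even sample, Pielou's evenness reaches its maximum of 1. In the skewed sample, the two singletons lower Good's coverage and raise the Chao1 estimate above the observed richness.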
Dendrogram showing the hierarchical clustering of the microbiota of bioreactors operated at three different temperatures over time based on Bray-Curtis distance
Apart from Bray-Curtis distance, CoMA supports the following metrics: Euclidean distance, Cosine distance, City Block (Manhattan) distance, correlation distance, Jaccard-Needham dissimilarity, and Dice dissimilarity. For determination of the cluster distance, the user can choose between the following linkage methods: Single linkage (Minimum), Complete linkage (Maximum), Average linkage (UPGMA), Weighted (WPGMA), Centroid (UPGMC), Median (WPGMC), and Ward’s method (Incremental algorithm).
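To make the combination of a distance metric and a linkage method concrete, here is a small pure-Python sketch of Bray-Curtis distance with average linkage (UPGMA). It only illustrates the principle and says nothing about how CoMA computes its dendrograms internally. The sample names and counts are hypothetical.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis distance between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def upgma(profiles):
    """Average-linkage clustering of samples.

    profiles: dict mapping unique sample name -> abundance vector.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    names = list(profiles)
    # Pairwise distances between the original samples.
    dist = {
        frozenset((a, b)): bray_curtis(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }
    clusters = [(name,) for name in names]
    merges = []

    def cluster_distance(c1, c2):
        # UPGMA: unweighted mean over all sample pairs across both clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the two closest clusters.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical OTU counts for three reactor samples.
profiles = {
    "R37_d1": [10, 0, 5],
    "R37_d143": [9, 1, 5],
    "R55_d143": [0, 12, 3],
}
merges = upgma(profiles)
```

The two 37 °C samples are merged first because their profiles are nearly identical. The 55 °C sample joins at the mean of its distances to both. Swapping the mean in `cluster_distance` for `min` or `max` would give single or complete linkage, respectively.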
Ordination plot showing the similarity/dissimilarity of the microbiota in bioreactors operated at 37, 45 and 55 °C over a period of 143 days
Ordination plots in CoMA are created based on principal coordinates analysis (PCoA) using one of the following distance metrics: Minkowski distance, Euclidean distance, Manhattan (City Block) distance, Cosine distance, Jaccard-Needham dissimilarity, Dice dissimilarity, Canberra distance, Chebyshev distance, Bray-Curtis distance, Weighted UniFrac distance, and Unweighted UniFrac distance. Data points can be colored based on metadata; otherwise, a common color for all data points can be selected …
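The principle of PCoA can be sketched in a few lines of Python with NumPy. This is a generic textbook version (Gower double-centering followed by an eigendecomposition), shown only to clarify what the ordination does. It is an assumption-laden illustration, not CoMA's own code, and the function name `pcoa` and the example distance matrix are hypothetical.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Principal coordinates analysis of a symmetric distance matrix.

    Returns (coordinates, proportion of variation explained per axis).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower double-centering of the squared distances.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # b is symmetric, so eigh is appropriate; sort axes by eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues (possible for non-Euclidean metrics) are clipped.
    kept = np.clip(eigvals[:n_axes], 0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    explained = kept / eigvals[eigvals > 0].sum()
    return coords, explained


# Three hypothetical samples lying on a line at positions 0, 3 and 7.
dist = [
    [0, 3, 7],
    [3, 0, 4],
    [7, 4, 0],
]
coords, explained = pcoa(dist)
```

Because the example distances are Euclidean and one-dimensional, the first axis reproduces the original spacing of the samples (up to sign and shift) and explains essentially all of the variation.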
Line plot showing rarefaction curves based on OTU counts of sequencing data generated in the course of a biogas research project
In CoMA, rarefaction curves can be computed based on the following calculators: OTU count, Chao1 richness estimator, Shannon-Wiener diversity, Simpson’s index, and Good’s coverage of counts.
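The idea behind a rarefaction curve based on OTU count can be sketched as follows. This simplified Python example draws a single random ordering of the reads and counts the OTUs seen at increasing depths. Established tools typically average over many random subsamples, so this is a deliberately reduced illustration and not CoMA's procedure. The function name, step size, and counts are hypothetical.

```python
import random


def rarefaction_curve(counts, step=10, seed=0):
    """Observed OTU count at increasing subsampling depths for one sample.

    counts: list of read counts per OTU.
    Returns a list of (depth, observed OTUs) pairs.
    """
    # Expand the counts into one entry per read, labeled by OTU index.
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    random.Random(seed).shuffle(reads)  # one random read order (simplification)
    depths = list(range(step, len(reads) + 1, step))
    if not depths or depths[-1] != len(reads):
        depths.append(len(reads))  # always include the full depth
    curve = []
    seen = set()
    pos = 0
    for depth in depths:
        seen.update(reads[pos:depth])  # add the reads newly included
        pos = depth
        curve.append((depth, len(seen)))
    return curve


# Hypothetical sample with 100 reads spread over six OTUs.
curve = rarefaction_curve([50, 30, 15, 3, 1, 1], step=20)
```

A curve that flattens before the full depth is reached suggests that sequencing captured most of the OTUs present. The same loop could record another calculator, such as Shannon-Wiener diversity, instead of the OTU count.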
Venn plot showing differences in the microbial community of bioreactors operated at 37 and 45 °C after 143 days of operation
Venn plots in CoMA can be created using either metadata or manual group assignments entered by the user. For clarity, the number of compared groups is limited to three. Depending on user settings, labels are included or removed, which can counteract overlap issues in some cases.
Venn plot showing differences in the microbial community of bioreactors operated at 37, 45 and 55 °C over a period of 143 days
Venn plots in CoMA can be created using either metadata or manual group assignments entered by the user. For clarity, the number of compared groups is limited to three. Depending on user settings, labels are included or removed, which can counteract overlap issues in some cases.
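The numbers shown in such a plot correspond to the exclusive regions of up to three overlapping sets. The following Python sketch counts them, assuming that each group is represented by the set of OTUs detected in it. How CoMA defines presence in a group is not stated here, so this representation, the function name, and the group and OTU labels are assumptions.

```python
def venn_regions(groups):
    """Count OTUs in each exclusive region of a Venn diagram.

    groups: dict mapping group name -> set of OTU identifiers present.
    Returns a dict mapping a tuple of group names -> number of OTUs
    found in exactly those groups.
    """
    if not 1 <= len(groups) <= 3:
        raise ValueError("between one and three groups are supported")
    names = list(groups)
    all_otus = set().union(*groups.values())
    regions = {}
    for otu in all_otus:
        # The region is identified by the groups that contain this OTU.
        members = tuple(name for name in names if otu in groups[name])
        regions[members] = regions.get(members, 0) + 1
    return regions


# Hypothetical OTU sets for three temperature groups.
groups = {
    "37C": {"OTU1", "OTU2", "OTU3"},
    "45C": {"OTU2", "OTU3", "OTU4"},
    "55C": {"OTU3", "OTU5"},
}
regions = venn_regions(groups)
```

Here `regions[("37C", "45C", "55C")]` gives the size of the core shared by all three groups, while single-name keys such as `("55C",)` give the OTUs unique to one group.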
CoMA is a pipeline for NGS data analysis that uses various applications, tools, and scripts originating from internal and external sources (open-source third-party software). All external tools are listed in this section. Please follow the web links if you want more information on a specific tool. Keep in mind that some of these tools may themselves use third-party software; for more information, consult the respective manuals.