The following steps give a complete walk-through of using the OTUX database for taxonomic classification.

  1. Pre-requisites
  2. Downloading the OTUX database
  3. Performing reference-based OTU-picking/taxonomic classification with mothur
  4. Parsing mothur outputs to generate abundance profiles
  5. Visualization of taxonomies
  6. Cross-comparison of results using Greengenes IDs




For any reference-based OTU-picking or taxonomic classification approach, one requires the following prerequisites.



OTUX provides a set of 19 databases, each covering a V-region (or a stretch of V-regions) of 16S rRNA. The regions commonly targeted in amplicon sequencing are included. Download the database corresponding to the targeted region of your input sequence reads. If the V-region database of your choice is not available, please contact us.


Since the reads in ‘sample.fasta’ were sequenced targeting the V4 region, this walk-through requires the OTUX V4 database as the reference. Go to the download page and select single V-region from the first dropdown.



Select V4 from the second dropdown.



Once the V4 region is highlighted, click on it to download the V4 database.



An archive is downloaded which contains the following files:

  1. A FASTA file containing the sequences constituting the V4 regions, corresponding to OTUX OTU IDs
  2. The taxonomy file of the V4 database, which provides sequence-wise taxonomic information along with the OTUX IDs
  3. A mapping-back matrix for the V4 database (required for mapping the OTUX IDs obtained in the abundance profiles back to Greengenes IDs [4]). Please refer to this page for detailed information about mapping-back matrices.

For help with downloading a database for any other V-region or stretch of V-regions, please refer to this page.


Before starting OTU-picking, make sure the database (fasta file and taxonomy file) and the input fasta file are located in the directory where you want to execute the mothur command. If you are using the executable, enter the mothur environment by typing ‘./mothur’ in the terminal. If you have installed mothur on your computing device, you may type 'mothur' directly. For help with installing mothur, please follow the instructions provided by the authors of mothur at this link. Once you enter the mothur environment, you will see the following.


mothur v.1.39.5
Last updated: 3/20/2017

by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type 'help()' for information on the commands that are available

Type 'quit()' to exit program



mothur >

In the mothur environment, enter the following command:

mothur > classify.seqs(fasta=sample_V4.fasta, template=V4.otux.fasta, taxonomy=V4.otux.tax, processors=1, cutoff=80)

The provided parameters which are required for running the above command are explained below.

  1. fasta: your input fasta file
  2. template: fasta file of the reference database. OTUX V4 database in this case.
  3. taxonomy: taxonomy file of the reference database.
  4. processors: number of processors/cores of the computing machine on which you want to run this command. In this example we have used a single core.
    * Please note that these cores are used only for classifying sequences from the input fasta file; only one core is used for loading the database into memory. During the first run mothur builds a few files which store information pertaining to 8mer probabilities (required by Wang's method). This is a one-time process; in future runs mothur loads these template probabilities into memory.
  5. cutoff: Wang's algorithm divides the query sequence into 8mers and, by looking at all taxonomies represented in the template, calculates the probability that a sequence from a given taxonomy would contain a specific 8mer. It then calculates the probability of the query sequence being assigned to each taxonomy based on the 8mers it contains, and the taxonomy with the highest probability is assigned. In addition, Wang's algorithm performs bootstrapping to estimate the confidence of the taxonomic assignment: it randomly picks 1/8 of the 8mers in the query (with replacement) and finds the taxonomy again. To keep only taxonomic assignments with high confidence, we provide a cutoff (threshold) value. By default this value is 80%, which is the conventional norm followed in the RDP classifier.
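The bootstrapping idea behind the cutoff can be illustrated with a toy sketch. It is not mothur's implementation: the scoring function below simply counts shared 8mers as a stand-in for the naive Bayesian word probabilities, and the names `kmers`, `best_taxon`, `bootstrap_confidence` and `passes_cutoff`, as well as the 100 iterations, are assumptions made for illustration.

```python
import random
from collections import Counter

K = 8  # word size used by the Wang method


def kmers(seq, k=K):
    """Return the list of overlapping k-mers in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def best_taxon(words, reference):
    """Stand-in scorer: pick the taxon sharing the most words with the query.

    `reference` maps a taxon name to the set of k-mers seen in that taxon.
    The real method uses naive Bayesian word probabilities instead.
    Ties are resolved in favour of the alphabetically first taxon.
    """
    return max(sorted(reference),
               key=lambda taxon: sum(w in reference[taxon] for w in words))


def bootstrap_confidence(query, reference, iterations=100, seed=0):
    """Return (assigned taxon, bootstrap confidence in percent)."""
    rng = random.Random(seed)
    words = kmers(query)
    assigned = best_taxon(words, reference)
    sample_size = max(1, len(words) // K)  # 1/8 of the query's 8mers
    votes = Counter()
    for _ in range(iterations):
        # Sample with replacement, then classify the sample again.
        sample = [rng.choice(words) for _ in range(sample_size)]
        votes[best_taxon(sample, reference)] += 1
    return assigned, 100 * votes[assigned] // iterations


def passes_cutoff(confidence, cutoff=80):
    """Keep an assignment only if its confidence reaches the cutoff."""
    return confidence >= cutoff
```

An assignment whose confidence falls below the cutoff would be reported as unclassified at that level, which is why some reads in the taxonomy file stop at genus or class level.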

For more details regarding the sequence classification option in mothur please refer to their help page.
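If you prefer to run the classification without entering the interactive environment, mothur also accepts a command on the command line when it is prefixed with '#'. The following is a minimal sketch of a wrapper around that mode; the helper names `classify_seqs_command` and `run_mothur` are hypothetical, and the sketch assumes that the `mothur` executable is on your PATH.

```python
import subprocess


def classify_seqs_command(fasta, template, taxonomy, processors=1, cutoff=80):
    """Build the classify.seqs command string used in this walk-through."""
    return (
        f"classify.seqs(fasta={fasta}, template={template}, "
        f"taxonomy={taxonomy}, processors={processors}, cutoff={cutoff})"
    )


def run_mothur(command, mothur="mothur"):
    """Run a single mothur command non-interactively.

    The leading '#' tells mothur that the argument is a command
    rather than the name of a batch file.
    """
    return subprocess.run([mothur, f"#{command}"], check=True)


command = classify_seqs_command("sample_V4.fasta", "V4.otux.fasta", "V4.otux.tax")
# run_mothur(command)  # uncomment once mothur and the input files are in place
```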



The mothur software will generate two output files:

  1. A taxonomy file. The following gives a preview of the taxonomy file generated for the input file used in this walk-through.

    SEQUENCE_1 k__Bacteria(100);p__Firmicutes(100);c__Bacilli(100);o__Bacillales(98);f__[Exiguobacteraceae](90);g__Exiguobacterium(90);unclassified;unclassified;
    SEQUENCE_2 k__Bacteria(100);p__Proteobacteria(100);c__Gammaproteobacteria(100);o__Pseudomonadales(100);f__Pseudomonadaceae(100);g__Pseudomonas(100);u__OTX040002932(100);unclassified;
    SEQUENCE_3 k__Bacteria(100);p__Proteobacteria(100);c__Gammaproteobacteria(99);u__OTX040048496(90);unclassified;unclassified;unclassified;unclassified;
    SEQUENCE_4 k__Bacteria(100);p__Proteobacteria(100);c__Gammaproteobacteria(100);o__Pseudomonadales(100);f__Pseudomonadaceae(100);g__Pseudomonas(100);u__OTX040163422(99);unclassified;
    SEQUENCE_5 k__Bacteria(100);p__Proteobacteria(100);c__Gammaproteobacteria(100);o__Alteromonadales(100);f__Alteromonadaceae(100);g__Alteromonas(100);u__OTX040009408(89);unclassified;

  2. A tax summary file, which gives a consolidated summary of the taxonomic classification. The summary of the first OTUX ID, OTX040004433, in the tax summary file generated in this walk-through is shown below.

    taxlevel  rankID           taxon                     daughterlevels  total
    0         0                Root                      2               10000
    1         0.1              k__Archaea                1               1
    2         0.1.2            p__Euryarchaeota          1               1
    3         0.1.2.6          c__Methanococci           1               1
    4         0.1.2.6.1        o__Methanococcales        1               1
    5         0.1.2.6.1.1      f__Methanocaldococcaceae  1               1
    6         0.1.2.6.1.1.1    g__Methanocaldococcus     1               1
    7         0.1.2.6.1.1.1.2  u__OTX040004433           1               1
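To make the layout of the taxonomy file concrete, here is a minimal sketch of how one of its lines could be split into a read ID and a list of (taxon, confidence) pairs. It is not the parser shipped with OTUX; the function names `parse_taxonomy_line` and `otux_id` are hypothetical, and the sketch assumes the read ID is separated from the lineage by whitespace, as in the preview.

```python
import re

# A classified field looks like 'g__Pseudomonas(100)': a name plus a confidence.
FIELD_RE = re.compile(r"^(?P<name>.+)\((?P<conf>\d+)\)$")


def parse_taxonomy_line(line):
    """Return (read_id, [(taxon, confidence), ...]) for one taxonomy line."""
    read_id, lineage = line.strip().split(None, 1)
    ranks = []
    for field in lineage.split(";"):
        match = FIELD_RE.match(field)
        # Empty fields and 'unclassified' placeholders are skipped.
        if match and match.group("name") != "unclassified":
            ranks.append((match.group("name"), int(match.group("conf"))))
    return read_id, ranks


def otux_id(ranks):
    """Return the OTUX ID (the 'u__' field) of a parsed lineage, or None."""
    for name, _ in ranks:
        if name.startswith("u__"):
            return name[3:]
    return None


line = ("SEQUENCE_3 k__Bacteria(100);p__Proteobacteria(100);"
        "c__Gammaproteobacteria(99);u__OTX040048496(90);"
        "unclassified;unclassified;unclassified;unclassified;")
read_id, ranks = parse_taxonomy_line(line)
```

Here `otux_id(ranks)` gives `OTX040048496` for SEQUENCE_3, while a read such as SEQUENCE_1, which stops at genus level, gives `None`.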

The taxonomy file generated by mothur, named ‘sample_V4.otux.wang.taxonomy’, is parsed using the scripts provided by us on the download page. Before running the script, please make sure that the script and the required input files are present in the same directory. The script was tested on mothur output generated by version v.1.39.5 (last updated on 3/20/2017).


sh otux_parser.sh -m sample_V4.otux.wang.taxonomy -b mapping.V4 -g gg_13_8_99.gg.tax -f sample_V4.fasta

Arguments:
  -m  -  taxonomy output file generated by mothur
  -b  -  mapping-back file for the corresponding V-region
  -g  -  Greengenes taxonomy file
  -f  -  fasta file used as input for mothur

** The mapping-back file used above is downloaded along with the database. The Greengenes taxonomy file, which is required to map the taxonomies obtained using OTUX back to Greengenes taxonomies, can be downloaded from the download page.
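Since the script expects all of its inputs in the same directory, a quick check before launching it can save a failed run. The sketch below is only a convenience and not part of the OTUX scripts; the helper name `missing_inputs` is hypothetical, and the file names are the ones used in this walk-through.

```python
from pathlib import Path

# File names taken from the command shown in this walk-through.
REQUIRED = [
    "otux_parser.sh",
    "sample_V4.otux.wang.taxonomy",
    "mapping.V4",
    "gg_13_8_99.gg.tax",
    "sample_V4.fasta",
]


def missing_inputs(directory, filenames=REQUIRED):
    """Return the names in `filenames` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in filenames if not (base / name).is_file()]


# Example: list whatever is still missing from the current directory.
still_missing = missing_inputs(".")
```

An empty list means every input named on the command line is in place.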

Running this script generates:

  1. OTUX abundance profiles at each taxonomic level. One set gives the raw abundance counts while the other set gives percentage-normalized values.
  2. Abundance profiles mapped back from OTUX IDs to Greengenes IDs using the one-to-one as well as the one-to-many mapping-back schemes.
    In the one-to-one mapping-back scheme, one OTUX ID is mapped back to one Greengenes ID (the best representation of that OTUX ID), whereas in the one-to-many mapping-back scheme, the OTUX ID is mapped back to multiple Greengenes IDs. Please go through this help page for details on mapping OTUX IDs back to Greengenes IDs.
  3. A fasta file containing the sequences which remained unclassified at the OTU level using the OTUX V4 database. This file can be used to facilitate ‘open-reference OTU picking’. One can use de novo clustering tools like CD-HIT [5] or CROP [6] to cluster these sequences.
  4. A file in JSON format which stores the generated taxonomy in a structured way. This file may be used for visualization of taxonomic profiles using the D3 library.

These results are stored in a structured way (as shown in the figure below) in a directory whose name includes the prefix of the input file name followed by the current time.

[Figure: structure of the output directory]

The following section will give a directory-wise preview of each abundance profile. You may download the complete file by clicking on the respective links given above each preview.

1. OTUX

1.1. Raw Count

The taxonomies are assigned according to OTUX nomenclature. The abundances are represented as raw count values.

Abundance profile at Phylum level
Acidobacteria 6
Actinobacteria 1662
Aquificae 7
Armatimonadetes 1
Bacteroidetes 482
Chlamydiae 17
Chlorobi 4
Chloroflexi 5
Cyanobacteria 177
Deferribacteres 1

Abundance profile at Class level
Acidimicrobiia 5
Acidobacteria-6 1
Acidobacteriia 3
Actinobacteria 1646
Alphaproteobacteria 1017
Anaerolineae 2
Aquificae 7
Bacilli 2610
Bacteroidia 123
Betaproteobacteria 576

Abundance profile at Order level
Acholeplasmatales 100
Acidimicrobiales 5
Acidithiobacillales 15
Acidobacteriales 3
Actinomycetales 1615
Aeromonadales 111
Alteromonadales 193
Anaerolineales 1
Aquificales 7
Bacillales 1434

Abundance profile at Family level
Acaryochloridaceae 3
Acetobacteraceae 61
Acholeplasmataceae 100
Acidimicrobiaceae 3
Acidithiobacillaceae 15
Acidobacteriaceae 3
Actinomycetaceae 13
Actinopolysporaceae 4
Actinosynnemataceae 7
Aerococcaceae 10

Abundance profile at Genus level
1-68 1
Acaryochloris 1
Acetobacter 19
Acholeplasma 2
Achromobacter 30
Acidimicrobium 1
Acidiphilium 3
Acidisoma 1
Acidithiobacillus 15
Acidomonas 2

Abundance profile at Species level
Acholeplasma brassicae 1
Acholeplasma laidlawii 1
Acidithiobacillus albertensis 5
Acidomonas methanolica 2
Acinetobacter johnsonii 28
Acinetobacter lwoffii 13
Acinetobacter rhizosphaerae 26
Actinoallomurus iriomotensis 1
Actinobacillus parahaemolyticus 2
Actinomadura echinospora 1

Abundance profile at OTU level
OTX040000007 32
OTX040000017 5
OTX040000019 1
OTX040000023 60
OTX040000048 4
OTX040000052 5
OTX040000053 1
OTX040000133 6
OTX040000137 1
OTX040000138 7

Abundance profile of complete lineage
k__Archaea;p__Euryarchaeota;c__Methanococci;o__Methanococcales;f__Methanocaldococcaceae;g__Methanocaldococcus;u__OTX040004433; 1
k__Bacteria; 5
k__Bacteria;p__Acidobacteria;c__Acidobacteria-6;o__iii1-15;f__mb2424; 1
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;g__Granulicella;s__tundricola;u__OTX040121496; 1
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;u__OTX040007044; 1
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;u__OTX040117629; 1
k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__Solibacteraceae;g__Candidatus_Solibacter;u__OTX040038795; 1
k__Bacteria;p__Acidobacteria;c__Sva0725;o__Sva0725; 1
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;g__Acidimicrobium;u__OTX040013048; 1
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;g__Ferrimicrobium; 1

1.2. Percentage Normalized

The taxonomies are assigned according to OTUX nomenclature. The abundances are represented as percentage normalized values.

Abundance profile at Phylum level
Acidobacteria 0.06
Actinobacteria 16.62
Aquificae 0.07
Armatimonadetes 0.01
Bacteroidetes 4.82
Chlamydiae 0.17
Chlorobi 0.04
Chloroflexi 0.05
Cyanobacteria 1.77
Deferribacteres 0.01

Abundance profile at Class level
Acidimicrobiia 0.05
Acidobacteria-6 0.01
Acidobacteriia 0.03
Actinobacteria 16.46
Alphaproteobacteria 10.17
Anaerolineae 0.02
Aquificae 0.07
Bacilli 26.1
Bacteroidia 1.23
Betaproteobacteria 5.76

Abundance profile at Order level
Acholeplasmatales 1
Acidimicrobiales 0.05
Acidithiobacillales 0.15
Acidobacteriales 0.03
Actinomycetales 16.15
Aeromonadales 1.11
Alteromonadales 1.93
Anaerolineales 0.01
Aquificales 0.07
Bacillales 14.34

Abundance profile at Family level
Acaryochloridaceae 0.03
Acetobacteraceae 0.61
Acholeplasmataceae 1
Acidimicrobiaceae 0.03
Acidithiobacillaceae 0.15
Acidobacteriaceae 0.03
Actinomycetaceae 0.13
Actinopolysporaceae 0.04
Actinosynnemataceae 0.07
Aerococcaceae 0.1

Abundance profile at Genus level
1-68 0.01
Acaryochloris 0.01
Acetobacter 0.19
Acholeplasma 0.02
Achromobacter 0.3
Acidimicrobium 0.01
Acidiphilium 0.03
Acidisoma 0.01
Acidithiobacillus 0.15
Acidomonas 0.02

Abundance profile at Species level
Acholeplasma brassicae 0.01
Acholeplasma laidlawii 0.01
Acidithiobacillus albertensis 0.05
Acidomonas methanolica 0.02
Acinetobacter johnsonii 0.28
Acinetobacter lwoffii 0.13
Acinetobacter rhizosphaerae 0.26
Actinoallomurus iriomotensis 0.01
Actinobacillus parahaemolyticus 0.02
Actinomadura echinospora 0.01

Abundance profile at OTU level
OTX040000007 0.32
OTX040000017 0.05
OTX040000019 0.01
OTX040000023 0.6
OTX040000048 0.04
OTX040000052 0.05
OTX040000053 0.01
OTX040000133 0.06
OTX040000137 0.01
OTX040000138 0.07

Abundance profile of complete lineage
k__Archaea;p__Euryarchaeota;c__Methanococci;o__Methanococcales;f__Methanocaldococcaceae;g__Methanocaldococcus;u__OTX040004433; 0.01
k__Bacteria; 0.05
k__Bacteria;p__Acidobacteria;c__Acidobacteria-6;o__iii1-15;f__mb2424; 0.01
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;g__Granulicella;s__tundricola;u__OTX040121496; 0.01
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;u__OTX040007044; 0.01
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;u__OTX040117629; 0.01
k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__Solibacteraceae;g__Candidatus_Solibacter;u__OTX040038795; 0.01
k__Bacteria;p__Acidobacteria;c__Sva0725;o__Sva0725; 0.01
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;g__Acidimicrobium;u__OTX040013048; 0.01
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;g__Ferrimicrobium; 0.01
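The percentage-normalized values follow from the raw counts by dividing by the total number of reads and multiplying by 100. In this walk-through the total is 10000 (the Root total in the tax summary file), so a raw count of 1662 for Actinobacteria becomes 16.62. A minimal sketch is given below; the function name `percent_normalize` and the rounding to two decimals are assumptions based on the values shown in the previews.

```python
def percent_normalize(counts, total=None, digits=2):
    """Convert raw counts to percentages of `total`.

    If `total` is not given, the sum of the supplied counts is used.
    """
    if total is None:
        total = sum(counts.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {taxon: round(100.0 * n / total, digits) for taxon, n in counts.items()}


raw = {"Actinobacteria": 1662, "Bacteroidetes": 482, "Cyanobacteria": 177}
percent = percent_normalize(raw, total=10000)
# Actinobacteria: 1662 of 10000 reads -> 16.62
```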

2. Greengenes

2.1. One-to-one mapping back

The taxonomies shown below are obtained by following the one-to-one mapping-back scheme of OTUX IDs to Greengenes IDs. Please note that only sequences which have been assigned down to the OTU level are mapped back to Greengenes OTU identifiers.

Abundance profile at Phylum level
Acidobacteria 4
Actinobacteria 926
Aquificae 7
Armatimonadetes 1
Bacteroidetes 266
Chlamydiae 7
Chlorobi 3
Chloroflexi 4
Cyanobacteria 110
Deferribacteres 1

Abundance profile at Class level
Acidimicrobiia 4
Acidobacteriia 3
Actinobacteria 914
Alphaproteobacteria 592
Anaerolineae 2
Aquificae 7
Bacilli 1079
Bacteroidia 89
Betaproteobacteria 351
Chlamydiia 7

Abundance profile at Order level
Acholeplasmatales 64
Acidimicrobiales 4
Acidithiobacillales 5
Acidobacteriales 3
Actinomycetales 902
Aeromonadales 99
Alteromonadales 106
Anaerolineales 1
Aquificales 7
Bacillales 477

Abundance profile at Family level
Acaryochloridaceae 3
Acetobacteraceae 42
Acholeplasmataceae 64
Acidimicrobiaceae 2
Acidithiobacillaceae 5
Acidobacteriaceae 3
Actinomycetaceae 9
Actinopolysporaceae 3
Actinosynnemataceae 5
Aerococcaceae 5

Abundance profile at Genus level
1-68 1
Acaryochloris 1
Acetobacter 14
Acholeplasma 2
Achromobacter 17
Acidicapsa 1
Acidimicrobium 1
Acidiphilium 2
Acidisoma 1
Acidithiobacillus 5

Abundance profile at Species level
Acholeplasma brassicae 1
Acholeplasma laidlawii 1
Acidicapsa borealis 1
Acidithiobacillus albertensis 1
Acidomonas methanolica 2
Acidovorax avenae 1
Acidovorax temperans 1
Acinetobacter lwoffii 11
Acinetobacter rhizosphaerae 23
Actinoallomurus iriomotensis 1

Abundance profile at OTU level
1492 1
2216 1
2235 1
2446 1
2813 1
2885 2
3095 3
3270 1
3308 1
4157 2

Abundance profile of complete lineage
k__Archaea;p__Euryarchaeota;c__Methanococci;o__Methanococcales;f__Methanocaldococcaceae;g__Methanocaldococcus; 1
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae; 1
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;g__Acidicapsa;s__borealis; 1
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;g__Granulicella;s__tundricola; 1
k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__Solibacteraceae;g__Candidatus_Solibacter; 1
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae; 1
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;g__Acidimicrobium; 1
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Microthrixaceae;g__Candidatus_Microthrix;s__parvicella; 2
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales; 8
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae; 1

2.2. One-to-many mapping back

The taxonomies shown below are obtained by following the one-to-many mapping-back scheme of OTUX IDs to Greengenes IDs. The values are percentage normalized. Please note that only sequences which have been assigned down to the OTU level are mapped back to Greengenes OTU identifiers.

Abundance profile at Phylum level
Acidobacteria 0.0416055
Actinobacteria 8.51558
Aquificae 0.0718798
Armatimonadetes 0.0104014
Bacteroidetes 2.63731
Chlamydiae 0.0728097
Chlorobi 0.0312042
Chloroflexi 0.0416055
Cyanobacteria 1.11808
Deferribacteres 0.0101477

Abundance profile at Class level
Acidimicrobiia 0.0416055
Acidobacteriia 0.0312041
Actinobacteria 8.39077
Alphaproteobacteria 5.8427
Anaerolineae 0.0208028
Aquificae 0.0718798
Bacilli 10.5232
Bacteroidia 0.861744
Betaproteobacteria 3.46319
Chlamydiia 0.0728097

Abundance profile at Order level
ASSO-13 0.00111443
Acholeplasmatales 0.654314
Acidimicrobiales 0.0416055
Acidithiobacillales 0.0476798
Acidobacteriales 0.0312041
Actinomycetales 8.27316
Aeromonadales 0.823742
Alteromonadales 1.02251
Anaerolineales 0.0104014
Aquificales 0.0718798

Abundance profile at Family level
Acaryochloridaceae 0.0312042
Acetobacteraceae 0.426603
Acholeplasmataceae 0.654314
Acidimicrobiaceae 0.0208028
Acidithiobacillaceae 0.0476798
Acidobacteriaceae 0.0312041
Actinomycetaceae 0.0936123
Actinopolysporaceae 0.0312041
Actinosynnemataceae 0.0512639
Aerococcaceae 0.0514465

Abundance profile at Genus level
1-68 0.0102461
Acaryochloris 0.0104014
Acetobacter 0.13818
Acholeplasma 0.0208028
Achromobacter 0.155692
Acidicapsa 0.00520068
Acidimicrobium 0.0104014
Acidiphilium 0.0199707
Acidisoma 0.0104014
Acidithiobacillus 0.0476798

Abundance profile at Species level
Acholeplasma brassicae 0.0104011
Acholeplasma laidlawii 0.0104011
Acidicapsa borealis 0.00520054
Acidithiobacillus albertensis 0.0104011
Acidomonas methanolica 0.0208022
Acidovorax avenae 0.00120013
Acidovorax defluvii 0.000137808
Acidovorax konjaci 0.000200021
Acidovorax temperans 0.00972927
Acinetobacter guillouiae 0.021411

Abundance profile at OTU level
1186 0.000813513
1209 0.000472789
1395 0.00123826
1397 0.00619131
1475 0.000650085
1491 0.000650085
1492 0.00520068
1950 0.00297181
1992 0.000299936
2066 0.000650087

Abundance profile of complete lineage
k__Archaea;p__Euryarchaeota;c__Methanococci;o__Methanococcales;f__Methanocaldococcaceae;g__Methanocaldococcus; 0.0104014
k__Bacteria; 0.000800105
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae; 0.015602
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;g__Acidicapsa;s__borealis; 0.00520068
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;g__Granulicella;s__tundricola; 0.0104014
k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__Solibacteraceae;g__Candidatus_Solibacter; 0.0104013
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae; 0.0104014
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;g__Acidimicrobium; 0.0104014
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Microthrixaceae;g__Candidatus_Microthrix;s__parvicella; 0.0208027
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales; 0.0856069
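The difference between the two mapping-back schemes can be illustrated with a small sketch. The layout of the mapping-back matrix is not described here, so the structure used below (each OTUX ID mapped to Greengenes IDs with a weight) and the proportional split of abundance are assumptions made purely for illustration; the OTUX IDs `OTX_A` and `OTX_B` and both function names are hypothetical. Please refer to the help page on mapping-back matrices for the actual scheme.

```python
from collections import defaultdict


def map_back_one_to_many(otux_abundance, mapping):
    """Split each OTUX abundance across its Greengenes IDs in proportion to the weights."""
    gg_abundance = defaultdict(float)
    for otux, abundance in otux_abundance.items():
        weights = mapping.get(otux, {})
        total = sum(weights.values())
        if total == 0:
            continue  # no Greengenes counterpart recorded for this OTUX ID
        for gg_id, weight in weights.items():
            gg_abundance[gg_id] += abundance * weight / total
    return dict(gg_abundance)


def map_back_one_to_one(otux_abundance, mapping):
    """Assign each OTUX abundance entirely to its highest-weighted Greengenes ID."""
    gg_abundance = defaultdict(float)
    for otux, abundance in otux_abundance.items():
        weights = mapping.get(otux)
        if not weights:
            continue
        best = max(sorted(weights), key=weights.get)
        gg_abundance[best] += abundance
    return dict(gg_abundance)


# Hypothetical mapping: OTUX ID -> {Greengenes ID: weight}
mapping = {"OTX_A": {"1492": 0.75, "2216": 0.25}, "OTX_B": {"2216": 1.0}}
abundance = {"OTX_A": 4.0, "OTX_B": 2.0}
one_to_many = map_back_one_to_many(abundance, mapping)
one_to_one = map_back_one_to_one(abundance, mapping)
```

Under these assumptions the one-to-many scheme produces fractional abundances, which is consistent with the non-integer values seen in the profiles above.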

A few sequences remained unclassified at the OTU level using the OTUX V4 database. A fasta file containing these sequences was created, which can be used to facilitate ‘open-reference OTU picking’ using tools like CD-HIT or CROP.
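As an illustration of how such a file could be assembled, the sketch below selects the reads whose taxonomy string carries no OTU-level (`u__`) label. It is not the code used by otux_parser.sh; the function names are hypothetical, and reads that are absent from the taxonomy file are treated as unclassified.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unclassified_at_otu_level(fasta_lines, taxonomy_lines):
    """Return the FASTA records that have no 'u__' field in the taxonomy file."""
    has_otu = {}
    for line in taxonomy_lines:
        if line.strip():
            read_id, lineage = line.strip().split(None, 1)
            has_otu[read_id] = any(f.startswith("u__") for f in lineage.split(";"))
    records = []
    for header, sequence in read_fasta(fasta_lines):
        read_id = (header.split() or [""])[0]  # ID is the first word of the header
        if not has_otu.get(read_id, False):
            records.append((header, sequence))
    return records
```

The resulting records could then be written out in FASTA format and passed to a de novo clustering tool.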



One can visualize the taxonomies by uploading the JSON file on the visualization page. The following options are available for visualization.
Please note that a copy of the JSON file is generated as one of the output files. However, you need to have the D3 library installed to visualize it.


Hierarchical Bar Chart, Zoomable Circle Packing, Zoomable Sunburst

A brief description of the visualizations is given below.

  1. Hierarchical Bar Chart: Each bar represents a taxonomic level. The bar chart starts with the highest taxonomic level, i.e., phylum. The length of the bar is determined by the abundance of that taxon. Click on any bar to expand it and go to the daughter level of the represented taxon. To go to the parent level, click on the white space next to the bars.
    Source: Mike Bostock's Block available at this link.

  2. Zoomable Circle Packing: Each taxon is represented by a circle. The circle representing a parent-level taxon contains the circles representing its daughter levels in a hierarchical manner. Each circle can be zoomed into by clicking on it.
    Source: Mike Bostock's Block available at this link.

  3. Zoomable Sunburst: The taxonomic levels are shown as concentric circles, with the innermost circle being the highest taxonomy. Only two levels are shown at a time. The greater the abundance of a taxon, the larger the area representing it in its concentric circle. To zoom in on a taxon (go to its daughter levels), click on the area representing it. To zoom out (go to the parent level), click on the innermost circle.
    Source: Vasco Asturiano's Block available at this link.
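All three visualizations work on a nested parent/daughter structure. The sketch below shows one way a lineage profile could be turned into such a structure and serialized as JSON. The exact schema of the JSON file written by otux_parser.sh is not documented here, so the field names `name`, `children` and `size` are assumptions modelled on common D3 hierarchy examples, and the function name `build_tree` is hypothetical.

```python
import json


def build_tree(lineage_counts, root_name="Root"):
    """Nest lineage strings into a tree; each node stores the reads at or below it."""
    root = {"name": root_name, "size": 0, "children": []}
    for lineage, count in lineage_counts.items():
        node = root
        node["size"] += count
        for taxon in filter(None, lineage.split(";")):
            child = next((c for c in node["children"] if c["name"] == taxon), None)
            if child is None:
                child = {"name": taxon, "size": 0, "children": []}
                node["children"].append(child)
            child["size"] += count
            node = child
    return root


tree = build_tree({
    "k__Bacteria;p__Firmicutes;": 3,
    "k__Bacteria;p__Proteobacteria;": 2,
    "k__Archaea;": 1,
})
json_text = json.dumps(tree, indent=2)
```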


In the above steps we gave a detailed walk-through of obtaining the abundance profile of the sample.fasta file (with reads targeting the V4 region) at each taxonomic level in terms of Greengenes IDs. Suppose you want to compare your results with another study which targeted the V2V3 region (download results of this example study). You can use the abundance profiles of OTUX IDs to get abundance profiles in terms of Greengenes taxonomy.

Abundance profiles for each taxa level will also be generated. Using these abundance profiles you can compare the results of the two studies.

Calculating Unifrac distance [7] is one of the methods which can be used to compare microbial communities.
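As a simple starting point before a phylogeny-aware measure such as UniFrac, two Greengenes-level profiles can be aligned on the union of their taxa and summarized with a tree-free dissimilarity. The sketch below uses the Bray-Curtis dissimilarity, which is not UniFrac and is named here only as an easy-to-compute alternative; the function names and the example values are illustrative assumptions.

```python
def side_by_side(profile_a, profile_b):
    """Align two profiles on the union of their taxa; missing taxa count as 0."""
    taxa = sorted(set(profile_a) | set(profile_b))
    return [(t, profile_a.get(t, 0), profile_b.get(t, 0)) for t in taxa]


def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared taxa."""
    rows = side_by_side(profile_a, profile_b)
    total = sum(a + b for _, a, b in rows)
    if total == 0:
        raise ValueError("both profiles are empty")
    return sum(abs(a - b) for _, a, b in rows) / total


# Illustrative genus-level profiles from two hypothetical studies.
study_v4 = {"Acetobacter": 1, "Pseudomonas": 3}
study_v2v3 = {"Pseudomonas": 1, "Alteromonas": 1}
rows = side_by_side(study_v4, study_v2v3)
distance = bray_curtis(study_v4, study_v2v3)
```

Both profiles should be normalized in the same way (for example, percentage normalized) before they are compared.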



Please note that the OTUXv2 approach follows the same steps as mentioned above; only the FASTA, taxonomy and mapping files need to be changed.


  1. Wang, Q., Garrity, G.M., Tiedje, J.M. and Cole, J.R. (2007) Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy, Applied and Environmental Microbiology, 73, 5261-5267. [PubMed]
  2. Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., Lesniewski, R.A., Oakley, B.B., Parks, D.H., Robinson, C.J., Sahl, J.W., Stres, B., Thallinger, G.G., Van Horn, D.J. and Weber, C.F. (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities, Applied and Environmental Microbiology, 75, 7537-7541. [PubMed]
  3. Hartmann, M., Howes, C.G., Abarenkov, K., Mohn, W.W. and Nilsson, R.H. (2010) V-Xtractor: an open-source, high-throughput software tool to identify and extract hypervariable regions of small subunit (16S/18S) ribosomal RNA gene sequences, Journal of Microbiological Methods, 83, 250-253. [PubMed]
  4. DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, D., Hu, P. and Andersen, G.L. (2006) Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB, Applied and Environmental Microbiology, 72, 5069-5072. [PubMed]
  5. Li, W. and Godzik, A. (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics (Oxford, England), 22, 1658-1659. [PubMed]
  6. Hao, X., Jiang, R. and Chen, T. (2011) Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering, Bioinformatics (Oxford, England), 27, 611-618. [PubMed]
  7. Lozupone, C., Lladser, M.E., Knights, D., Stombaugh, J. and Knight, R. (2011) UniFrac: an effective distance metric for microbial community comparison, The ISME journal, 5, 169-172. [PubMed]