OTUX

Manual

The following steps give a complete walk-through of using the OTUX database for taxonomic classification.

Pre-requisites
Downloading OTUX database
Performing reference based OTU-picking/ taxonomic classification with mothur
Parsing mothur outputs to generate abundance profiles
Visualization of taxonomies
Cross-comparison of results using Greengenes IDs

Step 1: Pre-requisites

Any reference-based OTU-picking or taxonomic classification approach requires the following prerequisites:

Reference database: We will use an appropriate OTUX database for this walk-through.

A taxonomic classification algorithm/tool: For this walk-through we will use Wang's algorithm^[1] as implemented in the mothur project^[2]. mothur also implements a few other algorithms for taxonomic classification; the authors of mothur list them on this page.

An input file: The input file contains sequence reads in FASTA format. Ideally, the input file should be preprocessed by extracting the targeted V-region from the sequence reads with a V-region extraction tool (for example, V-Xtractor^[3]). For this walk-through we will use the FASTA file named ‘sample_V4.fasta’, which contains reads targeting the V4 region. A preview of this file is shown below.

>SEQUENCE_1
ccttttaagtctgatgtgaaagcccccggctcaaccggggagggtcattggaaactggaaggcttgagtacagaagagaagagtg
>SEQUENCE_2
tttgttaagttggatgtgaaatccccgggctcaacctggganctgcattcaaaactgactgactagagtatggtagagggtggtg
>SEQUENCE_3
cttgttaagtcggatgtgaaagccccgggcttaacctgggaatggcattcgatactggcaggctagagtttggtagagggaagtg
>SEQUENCE_4
tagtgttcagcaagtggattgaaatccccggctcacctggactgcatcaaactactgagctagagtacggtagagggtggtg
>SEQUENCE_5
tttgttaagctagatgtgaaagccccgggctcaacctgggatggtcatttagaactggcagactagagtcttggagaggggagtgg
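Before classification it can help to check that the input file parses cleanly and to see how many ambiguous bases each read carries. The following is a minimal sketch, not part of OTUX or mothur; the function names `read_fasta` and `ambiguous` are illustrative, and the two records are copied from the preview above.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ambiguous(seq):
    """Count characters that are not a, c, g or t (for example 'n')."""
    return sum(base not in "acgt" for base in seq.lower())


preview = """>SEQUENCE_1
ccttttaagtctgatgtgaaagcccccggctcaaccggggagggtcattggaaactggaaggcttgagtacagaagagaagagtg
>SEQUENCE_2
tttgttaagttggatgtgaaatccccgggctcaacctggganctgcattcaaaactgactgactagagtatggtagagggtggtg
"""

for name, seq in read_fasta(preview.splitlines()):
    print(name, len(seq), ambiguous(seq))
```

To run it on a real file, pass an open file handle instead of `preview.splitlines()`.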

Step 2: Downloading OTUX database

OTUX provides a set of 19 databases, each covering a V-region (or a stretch of V-regions) of the 16S rRNA gene. The regions commonly targeted in amplicon sequencing are included. Download the database corresponding to the targeted region of your input sequence reads. If the V-region database of your choice is not available, please contact us.

Since the reads in ‘sample_V4.fasta’ were sequenced targeting the V4 region, this walk-through requires the OTUX V4 database as reference. Go to the download page and select single V-region from the first dropdown.

Select V4 from the second dropdown.

Once the V4 region is highlighted, click on it to download the V4 database.

The download is an archive containing the following files:

A FASTA file containing the V4-region sequences corresponding to the OTUX OTU IDs
A taxonomy file for the V4 database, which provides sequence-wise taxonomic information along with the OTUX IDs
A mapping-back matrix for the V4 database (required for mapping the OTUX IDs in the abundance profiles back to Greengenes IDs^[4]). Please refer to this page for detailed information about mapping-back matrices.

For help with downloading a database for any other V-region or stretch of V-regions, please refer to this page.

Step 3: Performing reference based OTU-picking/ taxonomic classification with mothur

Before starting OTU-picking, make sure the database (FASTA file and taxonomy file) and the input FASTA file are located in the directory from which you will run mothur. If you are using the standalone executable, enter the mothur environment by typing ‘./mothur’ in the terminal; if mothur is installed on your system, type 'mothur' instead. For help with installing mothur, please follow the instructions provided by the authors of mothur at this link. Once you enter the mothur environment you will see the following.

mothur v.1.39.5
Last updated: 3/20/2017

by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type 'help()' for information on the commands that are available

Type 'quit()' to exit program

mothur >

In the mothur environment, enter the following command:

mothur > classify.seqs(fasta=sample_V4.fasta, template=V4.otux.fasta, taxonomy=V4.otux.tax, processors=1, cutoff=80)

The parameters used in the above command are explained below.

fasta: your input fasta file
template: fasta file of the reference database. OTUX V4 database in this case.
taxonomy: taxonomy file of the reference database.
processors: number of processors/cores of the machine to use for this command. In this example we have used a single core.
* Please note that these cores are used only for classifying sequences from the input FASTA file; only one core is used for loading the database into memory. During the first run mothur builds a few files that store the 8mer probabilities (required by Wang's method). This is a one-time process; in later runs mothur loads these template probabilities into memory.
cutoff: Wang's algorithm divides the query sequence into 8mers and, by looking at all taxonomies represented in the template, calculates the probability that a sequence from a given taxonomy would contain a specific 8mer. It then calculates the probability of the query sequence belonging to each taxonomy based on the 8mers it contains, and assigns the taxonomy with the highest probability. Wang's algorithm also performs bootstrapping to estimate the confidence of the assignment: it repeatedly picks 1/8 of the query's 8mers at random (with replacement) and determines the taxonomy from that subset. To keep only high-confidence taxonomic assignments, a cutoff (threshold) value is supplied. By default this value is 80%, the conventional threshold used with the RDP classifier.
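The 8mer scoring and bootstrapping described under cutoff can be illustrated with a toy implementation. This is a simplified sketch for intuition only, not mothur's code: the word prior and conditional probability follow the formulas published by Wang et al.^[1], while the two reference sequences, the taxon names, the fixed random seed and the 100 bootstrap iterations are assumptions made for the example.

```python
import math
import random

K = 8  # word size used by the Wang (RDP) method


def words(seq, k=K):
    """Set of k-mers in seq; words containing ambiguous bases are skipped."""
    seq = seq.lower()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= set("acgt")}


def log_score(sample, taxon_sets, all_sets):
    """Log-probability of the sampled words under one taxon."""
    total, members = len(all_sets), len(taxon_sets)
    score = 0.0
    for w in sample:
        n = sum(w in s for s in all_sets)      # references containing w
        prior = (n + 0.5) / (total + 1)        # word prior
        m = sum(w in s for s in taxon_sets)    # taxon members containing w
        score += math.log((m + prior) / (members + 1))
    return score


def classify(query, references, iterations=100, seed=0):
    """references maps a taxon name to a list of reference sequences."""
    rng = random.Random(seed)
    per_taxon = {t: [words(s) for s in seqs] for t, seqs in references.items()}
    all_sets = [s for sets in per_taxon.values() for s in sets]
    query_words = sorted(words(query))
    if not query_words:
        raise ValueError("query has no unambiguous 8-mers")

    def best(sample):
        return max(per_taxon,
                   key=lambda t: log_score(sample, per_taxon[t], all_sets))

    assigned = best(query_words)               # assignment from all words
    n_pick = max(1, len(query_words) // K)     # 1/8 of the query words
    hits = sum(best(rng.choices(query_words, k=n_pick)) == assigned
               for _ in range(iterations))
    return assigned, 100.0 * hits / iterations  # bootstrap confidence in %


# Made-up reference sequences, for illustration only.
refs = {
    "taxon_A": ["acgtacgtagctagctaggatcgatcgtacg"],
    "taxon_B": ["ttttggggccccaaaattttggggccccaaaa"],
}
taxon, confidence = classify("acgtacgtagctagctaggatcgatcg", refs)
print(taxon, confidence)
```

An assignment would be kept only if its confidence reaches the chosen cutoff (80 in this walk-through).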

For more details regarding the sequence classification option in mothur please refer to their help page.
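The same classification can also be launched without entering the interactive prompt, because mothur accepts a command string prefixed with '#' on the command line. The sketch below only builds that string and wraps the call; the helper names `classify_command` and `run_mothur` are illustrative, and the executable name 'mothur' assumes mothur is installed on the PATH (use './mothur' for the standalone executable).

```python
import subprocess


def classify_command(fasta, template, taxonomy, processors=1, cutoff=80):
    """Build the classify.seqs command string used in this walk-through."""
    return ("#classify.seqs(fasta={}, template={}, taxonomy={}, "
            "processors={}, cutoff={})").format(
                fasta, template, taxonomy, processors, cutoff)


def run_mothur(command, executable="mothur"):
    """Run one mothur command string in command-line mode."""
    subprocess.run([executable, command], check=True)


command = classify_command("sample_V4.fasta", "V4.otux.fasta", "V4.otux.tax")
print(command)
# run_mothur(command)  # uncomment once mothur and the input files are in place
```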

Step 4: Parsing mothur outputs to generate abundance profiles

mothur generates two output files:

A taxonomy file. The following gives a preview of the taxonomy file generated for the input file used in this walk-through.

SEQUENCE_1 k__Bacteria(100);p__Firmicutes(100);c__Bacilli(100);o__Bacillales(98);f__[Exiguobacteraceae](90);g__Exiguobacterium(90);unclassified;unclassified;
SEQUENCE_2 k__Bacteria(100);p__Proteobacteria(100);c__Gammaproteobacteria(100);o__Pseudomonadales(100);f__Pseudomonadaceae(100);g__Pseudomonas(100);u__OTX040002932(100);unclassified;
SEQUENCE_3 k__Bacteria(100);p__Proteobacteria(100);c__Gammaproteobacteria(99);u__OTX040048496(90);unclassified;unclassified;unclassified;unclassified;
SEQUENCE_4 k__Bacteria(100);p__Proteobacteria(100);c__Gammaproteobacteria(100);o__Pseudomonadales(100);f__Pseudomonadaceae(100);g__Pseudomonas(100);u__OTX040163422(99);unclassified;
SEQUENCE_5 k__Bacteria(100);p__Proteobacteria(100);c__Gammaproteobacteria(100);o__Alteromonadales(100);f__Alteromonadaceae(100);g__Alteromonas(100);u__OTX040009408(89);unclassified;

A tax summary file, which gives a consolidated summary of the taxonomic classification. The summary for the first OTUX ID, OTX040004433, in the tax summary file generated in this walk-through is shown below.

taxlevel	rankID	taxon	daughterlevels	total
0	0	Root	2	10000
1	0.1	k__Archaea	1	1
2	0.1.2	p__Euryarchaeota	1	1
3	0.1.2.6	c__Methanococci	1	1
4	0.1.2.6.1	o__Methanococcales	1	1
5	0.1.2.6.1.1	f__Methanocaldococcaceae	1	1
6	0.1.2.6.1.1.1	g__Methanocaldococcus	1	1
7	0.1.2.6.1.1.1.2	u__OTX040004433	1	1
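Each line of the taxonomy file previewed above holds a read ID followed by a semicolon-separated lineage in which every assigned taxon carries its bootstrap confidence in parentheses. The following sketch shows one way such a line could be split into its parts; it is not the parser shipped with OTUX, and the function names are illustrative.

```python
import re

# A field such as "g__Pseudomonas(100)": taxon name, then confidence.
_FIELD = re.compile(r"^(?P<taxon>.+)\((?P<conf>\d+)\)$")


def parse_taxonomy_line(line):
    """Return (read_id, [(taxon, confidence), ...]) for one taxonomy line."""
    read_id, lineage = line.split(None, 1)
    assigned = []
    for field in lineage.strip().split(";"):
        match = _FIELD.match(field)
        if match is None:  # empty trailing field or 'unclassified'
            continue
        assigned.append((match.group("taxon"), int(match.group("conf"))))
    return read_id, assigned


def otux_id(assigned):
    """Return the OTUX ID (the 'u__' entry) of a parsed lineage, or None."""
    for taxon, _ in assigned:
        if taxon.startswith("u__"):
            return taxon[3:]
    return None


line = ("SEQUENCE_3 k__Bacteria(100);p__Proteobacteria(100);"
        "c__Gammaproteobacteria(99);u__OTX040048496(90);"
        "unclassified;unclassified;unclassified;unclassified;")
read_id, assigned = parse_taxonomy_line(line)
print(read_id, otux_id(assigned), assigned)
```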

The taxonomy file generated by mothur, named ‘sample_V4.otux.wang.taxonomy’, is parsed using the scripts we provide on the download page. Before running the script, please make sure that the script and the required input files are present in the same directory. The script was tested on mothur output generated by version v.1.39.5 (last updated 3/20/2017).

sh otux_parser.sh -m sample_V4.otux.wang.taxonomy -b mapping.V4 -g gg_13_8_99.gg.tax -f sample_V4.fasta

Arguments:
  -m  -  taxonomy output file generated by mothur
  -b  -  mapping back file for the corresponding v-region
  -g  -  greengenes taxonomy file
  -f  -  fasta file used as input for mothur

^** The mapping-back file used above is downloaded along with the database. The Greengenes taxonomy file, which is required to map the taxonomies obtained using OTUX back to Greengenes taxonomies, can be downloaded from the download page.
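Since the parser expects the script and all of its inputs in one directory, a quick check before launching it can save a failed run. This is a small illustrative helper, not part of the OTUX scripts; the file names are the ones used in the command above.

```python
from pathlib import Path


def missing_files(directory, names):
    """Return the names that are not present as files in the directory."""
    folder = Path(directory)
    return [name for name in names if not (folder / name).is_file()]


required = [
    "otux_parser.sh",
    "sample_V4.otux.wang.taxonomy",
    "mapping.V4",
    "gg_13_8_99.gg.tax",
    "sample_V4.fasta",
]
absent = missing_files(".", required)
if absent:
    print("Missing before running otux_parser.sh:", ", ".join(absent))
else:
    print("All required files are present.")
```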

Running this script generates:

OTUX abundance profiles at each taxonomic level. One set gives the raw abundance counts while the other set gives percentage normalized values.
Mapped-back abundance profiles from OTUX IDs to Greengenes IDs, using one-to-one as well as one-to-many mapping-back schemes.
In the one-to-one mapping-back scheme, one OTUX ID is mapped back to one Greengenes ID (the best representative of that OTUX ID), whereas in the one-to-many scheme the OTUX ID is mapped back to multiple Greengenes IDs. Please go through this help page for details on mapping OTUX IDs back to Greengenes IDs.
A FASTA file containing the sequences that remained unclassified at OTU level using the OTUX V4 database. This file can be used to facilitate ‘open reference based OTU picking’: de novo clustering tools such as CD-HIT^[5] or CROP^[6] can be used to cluster these sequences.
A file in JSON format that stores the generated taxonomy in a structured way. This file may be used to visualize taxonomic profiles with the D3 library.
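The relation between the raw-count and percentage normalized profiles can be sketched as follows. This is an assumed reconstruction for illustration, not the OTUX parser itself: reads are counted per taxon at one rank, and each count is divided by the total number of reads. The function names and the four example lineages are made up.

```python
from collections import Counter


def rank_counts(lineages, prefix):
    """Count reads per taxon at the rank identified by prefix (e.g. 'p__')."""
    counts = Counter()
    for lineage in lineages:
        for taxon in lineage.split(";"):
            if taxon.startswith(prefix):
                counts[taxon[len(prefix):]] += 1
    return counts


def percent_normalise(counts, total_reads):
    """Express raw counts as a percentage of all reads in the sample."""
    return {taxon: 100.0 * n / total_reads for taxon, n in counts.items()}


# Made-up lineages, one per read, already filtered by the cutoff.
lineages = [
    "k__Bacteria;p__Firmicutes;c__Bacilli;",
    "k__Bacteria;p__Firmicutes;c__Bacilli;",
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;",
    "k__Bacteria;",
]
raw = rank_counts(lineages, "p__")
print(dict(raw))
print(percent_normalise(raw, total_reads=len(lineages)))
```

Reads that are not assigned at the requested rank (the last lineage in the example) contribute to the total but to no taxon, so the percentages at a rank need not add up to 100.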

These results are stored in a structured way (as shown in the figure below) in a directory whose name consists of the prefix of the input file name followed by the current time.

The following section will give a directory-wise preview of each abundance profile. You may download the complete file by clicking on the respective links given above each preview.

1. OTUX

1.1. Raw Count

The taxonomies are assigned according to OTUX nomenclature. The abundances are represented as raw count values.

Abundance profile at Phylum level

Acidobacteria		6
Actinobacteria		1662
Aquificae		7
Armatimonadetes		1
Bacteroidetes		482
Chlamydiae		17
Chlorobi		4
Chloroflexi		5
Cyanobacteria		177
Deferribacteres		1

Abundance profile at Class level

Acidimicrobiia		5
Acidobacteria-6		1
Acidobacteriia		3
Actinobacteria		1646
Alphaproteobacteria		1017
Anaerolineae		2
Aquificae		7
Bacilli		2610
Bacteroidia		123
Betaproteobacteria		576

Abundance profile at Order level

Acholeplasmatales		100
Acidimicrobiales		5
Acidithiobacillales		15
Acidobacteriales		3
Actinomycetales		1615
Aeromonadales		111
Alteromonadales		193
Anaerolineales		1
Aquificales		7
Bacillales		1434

Abundance profile at Family level

Acaryochloridaceae		3
Acetobacteraceae		61
Acholeplasmataceae		100
Acidimicrobiaceae		3
Acidithiobacillaceae		15
Acidobacteriaceae		3
Actinomycetaceae		13
Actinopolysporaceae		4
Actinosynnemataceae		7
Aerococcaceae		10

Abundance profile at Genus level

1-68		1
Acaryochloris		1
Acetobacter		19
Acholeplasma		2
Achromobacter		30
Acidimicrobium		1
Acidiphilium		3
Acidisoma		1
Acidithiobacillus		15
Acidomonas		2

Abundance profile at Species level

Acholeplasma brassicae		1
Acholeplasma laidlawii		1
Acidithiobacillus albertensis		5
Acidomonas methanolica		2
Acinetobacter johnsonii		28
Acinetobacter lwoffii		13
Acinetobacter rhizosphaerae		26
Actinoallomurus iriomotensis		1
Actinobacillus parahaemolyticus		2
Actinomadura echinospora		1

Abundance profile at OTU level

OTX040000007		32
OTX040000017		5
OTX040000019		1
OTX040000023		60
OTX040000048		4
OTX040000052		5
OTX040000053		1
OTX040000133		6
OTX040000137		1
OTX040000138		7

Abundance profile of complete lineage

k__Archaea;p__Euryarchaeota;c__Methanococci;o__Methanococcales;f__Methanocaldococcaceae;g__Methanocaldococcus;u__OTX040004433;		1
k__Bacteria;		5
k__Bacteria;p__Acidobacteria;c__Acidobacteria-6;o__iii1-15;f__mb2424;		1
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;g__Granulicella;s__tundricola;u__OTX040121496;		1
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;u__OTX040007044;		1
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;u__OTX040117629;		1
k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__Solibacteraceae;g__Candidatus_Solibacter;u__OTX040038795;		1
k__Bacteria;p__Acidobacteria;c__Sva0725;o__Sva0725;		1
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;g__Acidimicrobium;u__OTX040013048;		1
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;g__Ferrimicrobium;		1

1.2. Percentage Normalized

The taxonomies are assigned according to OTUX nomenclature. The abundances are represented as percentage normalized values.

Abundance profile at Phylum level

Acidobacteria		0.06
Actinobacteria		16.62
Aquificae		0.07
Armatimonadetes		0.01
Bacteroidetes		4.82
Chlamydiae		0.17
Chlorobi		0.04
Chloroflexi		0.05
Cyanobacteria		1.77
Deferribacteres		0.01

Abundance profile at Class level

Acidimicrobiia		0.05
Acidobacteria-6		0.01
Acidobacteriia		0.03
Actinobacteria		16.46
Alphaproteobacteria		10.17
Anaerolineae		0.02
Aquificae		0.07
Bacilli		26.1
Bacteroidia		1.23
Betaproteobacteria		5.76

Abundance profile at Order level

Acholeplasmatales		1
Acidimicrobiales		0.05
Acidithiobacillales		0.15
Acidobacteriales		0.03
Actinomycetales		16.15
Aeromonadales		1.11
Alteromonadales		1.93
Anaerolineales		0.01
Aquificales		0.07
Bacillales		14.34

Abundance profile at Family level

Acaryochloridaceae		0.03
Acetobacteraceae		0.61
Acholeplasmataceae		1
Acidimicrobiaceae		0.03
Acidithiobacillaceae		0.15
Acidobacteriaceae		0.03
Actinomycetaceae		0.13
Actinopolysporaceae		0.04
Actinosynnemataceae		0.07
Aerococcaceae		0.1

Abundance profile at Genus level

1-68		0.01
Acaryochloris		0.01
Acetobacter		0.19
Acholeplasma		0.02
Achromobacter		0.3
Acidimicrobium		0.01
Acidiphilium		0.03
Acidisoma		0.01
Acidithiobacillus		0.15
Acidomonas		0.02

Abundance profile at Species level

Acholeplasma brassicae		0.01
Acholeplasma laidlawii		0.01
Acidithiobacillus albertensis		0.05
Acidomonas methanolica		0.02
Acinetobacter johnsonii		0.28
Acinetobacter lwoffii		0.13
Acinetobacter rhizosphaerae		0.26
Actinoallomurus iriomotensis		0.01
Actinobacillus parahaemolyticus		0.02
Actinomadura echinospora		0.01

Abundance profile at OTU level

OTX040000007		0.32
OTX040000017		0.05
OTX040000019		0.01
OTX040000023		0.6
OTX040000048		0.04
OTX040000052		0.05
OTX040000053		0.01
OTX040000133		0.06
OTX040000137		0.01
OTX040000138		0.07

Abundance profile of complete lineage

k__Archaea;p__Euryarchaeota;c__Methanococci;o__Methanococcales;f__Methanocaldococcaceae;g__Methanocaldococcus;u__OTX040004433;		0.01
k__Bacteria;		0.05
k__Bacteria;p__Acidobacteria;c__Acidobacteria-6;o__iii1-15;f__mb2424;		0.01
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;g__Granulicella;s__tundricola;u__OTX040121496;		0.01
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;u__OTX040007044;		0.01
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;u__OTX040117629;		0.01
k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__Solibacteraceae;g__Candidatus_Solibacter;u__OTX040038795;		0.01
k__Bacteria;p__Acidobacteria;c__Sva0725;o__Sva0725;		0.01
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;g__Acidimicrobium;u__OTX040013048;		0.01
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;g__Ferrimicrobium;		0.01

2. Greengenes

2.1. One-to-one mapping back

The taxonomies shown below are obtained by following the one-to-one mapping-back scheme of OTUX IDs to Greengenes IDs. Please note that only sequences assigned down to OTU level are mapped back to Greengenes OTU identifiers.

Abundance profile at Phylum level

Acidobacteria		4
Actinobacteria		926
Aquificae		7
Armatimonadetes		1
Bacteroidetes		266
Chlamydiae		7
Chlorobi		3
Chloroflexi		4
Cyanobacteria		110
Deferribacteres		1

Abundance profile at Class level

Acidimicrobiia		4
Acidobacteriia		3
Actinobacteria		914
Alphaproteobacteria		592
Anaerolineae		2
Aquificae		7
Bacilli		1079
Bacteroidia		89
Betaproteobacteria		351
Chlamydiia		7

Abundance profile at Order level

Acholeplasmatales		64
Acidimicrobiales		4
Acidithiobacillales		5
Acidobacteriales		3
Actinomycetales		902
Aeromonadales		99
Alteromonadales		106
Anaerolineales		1
Aquificales		7
Bacillales		477

Abundance profile at Family level

Acaryochloridaceae		3
Acetobacteraceae		42
Acholeplasmataceae		64
Acidimicrobiaceae		2
Acidithiobacillaceae		5
Acidobacteriaceae		3
Actinomycetaceae		9
Actinopolysporaceae		3
Actinosynnemataceae		5
Aerococcaceae		5

Abundance profile at Genus level

1-68		1
Acaryochloris		1
Acetobacter		14
Acholeplasma		2
Achromobacter		17
Acidicapsa		1
Acidimicrobium		1
Acidiphilium		2
Acidisoma		1
Acidithiobacillus		5

Abundance profile at Species level

Acholeplasma brassicae		1
Acholeplasma laidlawii		1
Acidicapsa borealis		1
Acidithiobacillus albertensis		1
Acidomonas methanolica		2
Acidovorax avenae		1
Acidovorax temperans		1
Acinetobacter lwoffii		11
Acinetobacter rhizosphaerae		23
Actinoallomurus iriomotensis		1

Abundance profile at OTU level

1492		1
2216		1
2235		1
2446		1
2813		1
2885		2
3095		3
3270		1
3308		1
4157		2

Abundance profile of complete lineage

k__Archaea;p__Euryarchaeota;c__Methanococci;o__Methanococcales;f__Methanocaldococcaceae;g__Methanocaldococcus;		1
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;		1
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;g__Acidicapsa;s__borealis;		1
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;g__Granulicella;s__tundricola;		1
k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__Solibacteraceae;g__Candidatus_Solibacter;		1
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;		1
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;g__Acidimicrobium;		1
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Microthrixaceae;g__Candidatus_Microthrix;s__parvicella;		2
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;		8
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae;		1

2.2. One-to-many mapping back

The taxonomies shown below are obtained by following the one-to-many mapping-back scheme of OTUX IDs to Greengenes IDs. The values are percentage normalized. Please note that only sequences assigned down to OTU level are mapped back to Greengenes OTU identifiers.
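The difference between the two mapping-back schemes can be illustrated with a small sketch. The format of the mapping-back matrix is not described in this manual, so everything below is hypothetical: the dictionaries `best_hit` and `weights`, the placeholder IDs, and the assumption that the one-to-many scheme splits each OTUX count across Greengenes IDs by fractional weights are all made up for illustration.

```python
from collections import defaultdict


def map_one_to_one(otux_counts, best_hit):
    """Assign each OTUX count to its single representative Greengenes ID."""
    mapped = defaultdict(float)
    for otux, count in otux_counts.items():
        if otux in best_hit:
            mapped[best_hit[otux]] += count
    return dict(mapped)


def map_one_to_many(otux_counts, weights):
    """Split each OTUX count across several Greengenes IDs (assumed weights)."""
    mapped = defaultdict(float)
    for otux, count in otux_counts.items():
        for gg_id, weight in weights.get(otux, {}).items():
            mapped[gg_id] += count * weight
    return dict(mapped)


# Placeholder IDs and weights; not taken from any real mapping file.
otux_counts = {"OTX_A": 4, "OTX_B": 4}
best_hit = {"OTX_A": "GG_1", "OTX_B": "GG_2"}
weights = {
    "OTX_A": {"GG_1": 0.5, "GG_3": 0.5},
    "OTX_B": {"GG_2": 0.25, "GG_3": 0.75},
}
print(map_one_to_one(otux_counts, best_hit))
print(map_one_to_many(otux_counts, weights))
```

Under this assumption the one-to-many counts become fractional, which would be consistent with the non-integer values in the previews of this section once they are percentage normalized.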

Abundance profile at Phylum level

Acidobacteria		0.0416055
Actinobacteria		8.51558
Aquificae		0.0718798
Armatimonadetes		0.0104014
Bacteroidetes		2.63731
Chlamydiae		0.0728097
Chlorobi		0.0312042
Chloroflexi		0.0416055
Cyanobacteria		1.11808
Deferribacteres		0.0101477

Abundance profile at Class level

Acidimicrobiia		0.0416055
Acidobacteriia		0.0312041
Actinobacteria		8.39077
Alphaproteobacteria		5.8427
Anaerolineae		0.0208028
Aquificae		0.0718798
Bacilli		10.5232
Bacteroidia		0.861744
Betaproteobacteria		3.46319
Chlamydiia		0.0728097

Abundance profile at Order level

ASSO-13		0.00111443
Acholeplasmatales		0.654314
Acidimicrobiales		0.0416055
Acidithiobacillales		0.0476798
Acidobacteriales		0.0312041
Actinomycetales		8.27316
Aeromonadales		0.823742
Alteromonadales		1.02251
Anaerolineales		0.0104014
Aquificales		0.0718798

Abundance profile at Family level

Acaryochloridaceae		0.0312042
Acetobacteraceae		0.426603
Acholeplasmataceae		0.654314
Acidimicrobiaceae		0.0208028
Acidithiobacillaceae		0.0476798
Acidobacteriaceae		0.0312041
Actinomycetaceae		0.0936123
Actinopolysporaceae		0.0312041
Actinosynnemataceae		0.0512639
Aerococcaceae		0.0514465

Abundance profile at Genus level

1-68		0.0102461
Acaryochloris		0.0104014
Acetobacter		0.13818
Acholeplasma		0.0208028
Achromobacter		0.155692
Acidicapsa		0.00520068
Acidimicrobium		0.0104014
Acidiphilium		0.0199707
Acidisoma		0.0104014
Acidithiobacillus		0.0476798

Abundance profile at Species level

Acholeplasma brassicae		0.0104011
Acholeplasma laidlawii		0.0104011
Acidicapsa borealis		0.00520054
Acidithiobacillus albertensis		0.0104011
Acidomonas methanolica		0.0208022
Acidovorax avenae		0.00120013
Acidovorax defluvii		0.000137808
Acidovorax konjaci		0.000200021
Acidovorax temperans		0.00972927
Acinetobacter guillouiae		0.021411

Abundance profile at OTU level

1186		0.000813513
1209		0.000472789
1395		0.00123826
1397		0.00619131
1475		0.000650085
1491		0.000650085
1492		0.00520068
1950		0.00297181
1992		0.000299936
2066		0.000650087

Abundance profile of complete lineage

k__Archaea;p__Euryarchaeota;c__Methanococci;o__Methanococcales;f__Methanocaldococcaceae;g__Methanocaldococcus;		0.0104014
k__Bacteria;		0.000800105
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;		0.015602
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;g__Acidicapsa;s__borealis;		0.00520068
k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae;g__Granulicella;s__tundricola;		0.0104014
k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__Solibacteraceae;g__Candidatus_Solibacter;		0.0104013
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;		0.0104014
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Acidimicrobiaceae;g__Acidimicrobium;		0.0104014
k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__Microthrixaceae;g__Candidatus_Microthrix;s__parvicella;		0.0208027
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;		0.0856069

A few sequences remained unclassified at OTU level using the OTUX V4 database. A FASTA file of these sequences was created, which can be used to facilitate ‘open reference based OTU picking’ using tools like CD-HIT or CROP.
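Collecting the reads that lack an OTU-level assignment can be sketched as below. This is an illustration under the assumption that a read counts as unclassified at OTU level when its lineage contains no 'u__' entry; it is not the code used by the OTUX parser, and the function names and example records are made up.

```python
import io


def unclassified_ids(taxonomy_lines):
    """IDs of reads whose lineage has no OTUX ('u__') entry."""
    ids = []
    for line in taxonomy_lines:
        parts = line.split(None, 1)
        if len(parts) == 2 and "u__" not in parts[1]:
            ids.append(parts[0])
    return ids


def write_fasta(records, wanted, handle):
    """Write the (header, sequence) records whose header is in wanted."""
    wanted = set(wanted)
    for header, seq in records:
        if header in wanted:
            handle.write(">{}\n{}\n".format(header, seq))


taxonomy_lines = [
    "read_1 k__Bacteria(100);p__Firmicutes(100);unclassified;",
    "read_2 k__Bacteria(100);u__OTX040002932(100);unclassified;",
]
records = [("read_1", "acgt"), ("read_2", "ttga")]

out = io.StringIO()  # stands in for an open output file
write_fasta(records, unclassified_ids(taxonomy_lines), out)
print(out.getvalue(), end="")
```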

Step 5: Visualization of taxonomies

One can visualize the taxonomies by uploading the JSON file on the visualization page. The following options are available for visualization. Click on any one of the images to expand.
Please note that a copy of the JSON file is generated as one of the output files. However, you need to have the D3 library installed to visualize it yourself.

Hierarchical Bar Chart	Zoomable Circle Packing	Zoomable Sunburst

A brief description of the visualizations is given below.

Hierarchical Bar Chart: Each bar represents a taxon at a given taxonomic level. The bar chart starts with the highest taxonomic level, i.e., phylum. The length of a bar is determined by the abundance of that taxon. Click on any bar to expand it and go to the daughter level of the represented taxon. To go to the parent level, click on the white space next to the bars.
Source: Mike Bostock's Block available at this link.

Zoomable Circle Packing: Each taxon is represented by a circle. The circle representing a parent taxon contains the circles representing its daughter taxa in a hierarchical manner. Click on a circle to zoom into it.
Source: Mike Bostock's Block available at this link.

Zoomable Sunburst: Taxonomic levels are drawn as concentric rings, with the innermost ring being the highest level. Only two levels are shown at a time. The greater the abundance of a taxon, the larger its area in the ring. To zoom in on a taxon (go to its daughter levels), click on its area. To zoom out (go to the parent level), click on the innermost circle.
Source: Vasco Asturiano's Block available at this link.
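All three visualizations consume a nested, tree-shaped JSON document. The exact schema of the JSON file written by the OTUX parser is not described here, so the sketch below is an assumption: it uses the 'name' / 'children' / 'size' field names common in D3 hierarchy examples and builds the tree from complete-lineage counts. The function names and example lineages are illustrative.

```python
import json


def _finalise(node):
    """Convert the intermediate dict-of-dicts node into plain lists."""
    out = {"name": node["name"]}
    children = [_finalise(child) for child in node["children"].values()]
    if children:
        out["children"] = children
    if "size" in node:
        out["size"] = node["size"]
    return out


def build_tree(lineage_counts):
    """Nest 'k__...;p__...;' lineages and their counts into one tree."""
    root = {"name": "Root", "children": {}}
    for lineage, count in lineage_counts.items():
        node = root
        for taxon in (t for t in lineage.split(";") if t):
            node = node["children"].setdefault(
                taxon, {"name": taxon, "children": {}})
        node["size"] = node.get("size", 0) + count
    return _finalise(root)


tree = build_tree({
    "k__Bacteria;p__Firmicutes;": 2,
    "k__Bacteria;": 1,
})
print(json.dumps(tree, indent=2))
```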

Step 6: Cross-comparison of results using Greengenes IDs

In the above steps we gave a detailed walk-through of obtaining abundance profiles of the sample_V4.fasta file (having reads targeted for the V4 region) at each taxonomic level in terms of Greengenes IDs. Suppose you want to compare your results with another study that targeted the V2V3 region (download results of this example study). You can use the abundance profiles of OTUX IDs to get abundance profiles in terms of Greengenes taxonomy.

Abundance profiles for each taxa level will also be generated. Using these abundance profiles you can compare the results of the two studies.

Calculating the UniFrac distance^[7] is one of the methods that can be used to compare microbial communities.
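UniFrac needs a phylogenetic tree in addition to the abundance profiles, so it is not shown here. As a simpler stand-in, the sketch below computes the Bray-Curtis dissimilarity between two profiles that use the same Greengenes taxon names; this is a different metric from UniFrac, and the function name and the two example profiles are made up for illustration.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles."""
    taxa = set(profile_a) | set(profile_b)
    shared = sum(min(profile_a.get(t, 0.0), profile_b.get(t, 0.0))
                 for t in taxa)
    total = sum(profile_a.values()) + sum(profile_b.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    return 1.0 - 2.0 * shared / total


# Made-up percentage normalized profiles from two hypothetical studies.
study_v4 = {"Firmicutes": 50.0, "Proteobacteria": 50.0}
study_v2v3 = {"Firmicutes": 100.0}
print(bray_curtis(study_v4, study_v2v3))
```

A value of 0 means the two profiles are identical and 1 means they share no taxa.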

OTUXv2 version

Please note that the OTUXv2 approach follows the same steps as described above; only the FASTA, taxonomy and mapping files need to be changed.

References

[1] Wang, Q., Garrity, G.M., Tiedje, J.M. and Cole, J.R. (2007) Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy, Applied and Environmental Microbiology, 73, 5261-5267.
[2] Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., Lesniewski, R.A., Oakley, B.B., Parks, D.H., Robinson, C.J., Sahl, J.W., Stres, B., Thallinger, G.G., Van Horn, D.J. and Weber, C.F. (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities, Applied and Environmental Microbiology, 75, 7537-7541.
[3] Hartmann, M., Howes, C.G., Abarenkov, K., Mohn, W.W. and Nilsson, R.H. (2010) V-Xtractor: an open-source, high-throughput software tool to identify and extract hypervariable regions of small subunit (16S/18S) ribosomal RNA gene sequences, Journal of Microbiological Methods, 83, 250-253.
[4] DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, D., Hu, P. and Andersen, G.L. (2006) Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB, Applied and Environmental Microbiology, 72, 5069-5072.
[5] Li, W. and Godzik, A. (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics (Oxford, England), 22, 1658-1659.
[6] Hao, X., Jiang, R. and Chen, T. (2011) Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering, Bioinformatics (Oxford, England), 27, 611-618.
[7] Lozupone, C., Lladser, M.E., Knights, D., Stombaugh, J. and Knight, R. (2011) UniFrac: an effective distance metric for microbial community comparison, The ISME Journal, 5, 169-172.