Before starting OTU picking, make sure the database (fasta file and taxonomy file) and the input fasta file are located in the directory from which you will run the mothur command. If you are using the standalone executable, enter the mothur environment by typing './mothur' in the terminal. If mothur is installed on your system, type 'mothur' instead. For help installing mothur on your device, please check this link. Once you enter the mothur environment, you will see the following:


mothur v.1.34.4
Last updated: 12/22/2014

by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
pschloss@umich.edu
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type 'help()' for information on the commands that are available

Type 'quit()' to exit program



mothur >

In the mothur environment, enter the following command:

mothur > classify.seqs(fasta=sample.fasta, template=otux_v4.fasta, taxonomy=otux_v4.tax, processors=1, cutoff=80)

The parameters required to run the above command are explained below.

  1. fasta: your input fasta file
  2. template: fasta file of the reference database (the OTUX V4 database in this case).
  3. taxonomy: taxonomy file of the reference database.
  4. processors: number of processors/cores of the machine on which you want to run this command. In this example we have used a single core.
    * Please note that these cores are used only for classifying sequences from the input fasta file; only one core is used for loading the database into memory. During the first run, mothur builds a few files that store the 8-mer probabilities required by Wang's method. This is a one-time process; in future runs mothur loads these template probabilities directly into memory.
  5. cutoff: Wang's algorithm divides each query sequence into 8-mers and, by looking at all taxonomies represented in the template, calculates the probability that a sequence from a given taxonomy would contain a specific 8-mer. It then calculates the probability of the query sequence belonging to each taxonomy based on the 8-mers it contains, and assigns the taxonomy with the highest probability. In addition, Wang's algorithm performs bootstrapping to estimate the confidence of the taxonomic assignment: it repeatedly picks 1/8 of the 8-mers in the query at random (with replacement) and finds the taxonomy again. To keep only high-confidence assignments, we provide a cutoff (threshold) value. By default this value is 80%, which is the convention followed in the RDP classifier.
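
To make the role of the cutoff more concrete, here is a minimal Python sketch of the idea described above. It is not mothur's implementation. It assumes the word probabilities from the original RDP classifier paper, treats each taxonomy string as a single flat label (mothur reports confidence at each taxonomic level), and uses hypothetical names (train, best_taxon, classify) and an illustrative 100 bootstrap iterations.

```python
import math
import random
from collections import Counter, defaultdict

K = 8  # word size used by Wang's method


def kmers(seq, k=K):
    """Return the set of distinct k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references):
    """Build word counts from a list of (taxonomy, sequence) pairs."""
    word_total = Counter()             # references containing each word
    taxon_size = Counter()             # references per taxonomy
    taxon_word = defaultdict(Counter)  # per-taxonomy word counts
    for taxon, seq in references:
        taxon_size[taxon] += 1
        for word in kmers(seq):
            word_total[word] += 1
            taxon_word[taxon][word] += 1
    return len(references), word_total, taxon_size, taxon_word


def best_taxon(words, model):
    """Return the taxonomy with the highest log-probability for the words."""
    n_total, word_total, taxon_size, taxon_word = model
    best, best_score = None, -math.inf
    for taxon in sorted(taxon_size):  # sorted for a deterministic tie-break
        score = 0.0
        for word in words:
            # Assumed smoothing: overall word prior, then per-taxon probability.
            prior = (word_total[word] + 0.5) / (n_total + 1)
            score += math.log(
                (taxon_word[taxon][word] + prior) / (taxon_size[taxon] + 1)
            )
        if score > best_score:
            best, best_score = taxon, score
    return best


def classify(seq, model, iters=100, cutoff=80, seed=0):
    """Assign a taxonomy and a bootstrap confidence (in percent)."""
    words = sorted(kmers(seq))
    if not words:  # query shorter than one 8-mer
        return "unclassified", 0.0
    rng = random.Random(seed)
    assigned = best_taxon(words, model)
    n_pick = max(1, len(words) // K)  # 1/8 of the query's 8-mers
    hits = 0
    for _ in range(iters):
        sample = [rng.choice(words) for _ in range(n_pick)]  # with replacement
        if best_taxon(sample, model) == assigned:
            hits += 1
    confidence = 100 * hits / iters
    return (assigned if confidence >= cutoff else "unclassified"), confidence
```

In this sketch, an assignment is kept only when at least cutoff percent of the bootstrap replicates agree with the assignment made from all 8-mers; otherwise the query is reported as unclassified.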

For more details on sequence classification in mothur, please refer to the mothur help page.
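
If you run this step often, the file check and the command string used in this section can be scripted. The following is a small Python sketch under stated assumptions: the helper names (missing_inputs, build_classify_command) are hypothetical, and the file names are the ones used in the example command.

```python
from pathlib import Path


def missing_inputs(workdir, *filenames):
    """Return the names of required files that are not present in workdir."""
    return [name for name in filenames if not (Path(workdir) / name).is_file()]


def build_classify_command(fasta, template, taxonomy, processors=1, cutoff=80):
    """Build the classify.seqs command string used in this section."""
    params = {
        "fasta": fasta,
        "template": template,
        "taxonomy": taxonomy,
        "processors": processors,
        "cutoff": cutoff,
    }
    args = ", ".join(f"{key}={value}" for key, value in params.items())
    return f"classify.seqs({args})"


# Example (not executed here): check the inputs, then print the command so it
# can be pasted at the 'mothur >' prompt.
#
#   files = ("sample.fasta", "otux_v4.fasta", "otux_v4.tax")
#   missing = missing_inputs(".", *files)
#   if missing:
#       raise SystemExit(f"Missing input files: {missing}")
#   print(build_classify_command(*files))
```

The generated string can also be passed to mothur's command line mode by prefixing it with '#', for example ./mothur "#classify.seqs(...)", which runs the command without opening the interactive prompt.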