Using DTASelect with Sorcerer

DTASelect is a great tool from the Yates Lab, authored by Daniel Cociorva and colleagues, for filtering and organizing peptide identifications from Sequest. We have built support for running this program as a post-processing step on Sorcerer. This article describes how to obtain, install, and use DTASelect on Sorcerer.

Installing DTASelect

The program may be obtained directly from the Yates Laboratory website, under the Download tab. Review and agree to the license terms, then download a recent version of the DTASelect distribution, e.g. DTASelect2_0_25.tar.gz. Place this distribution in the /home/sorcerer/custom directory on Sorcerer. Extract the software by typing these commands as the sorcerer user:

cd /home/sorcerer/custom
tar -zxvf DTASelect2_0_25.tar.gz
ln -s DTASelect2_0_25 DTASelect

Also, grab a copy of the most recent manual at http://fields.scripps.edu/DTASelect/DTASelect-Manual.pdf. (This describes V1.9 of DTASelect; the current V2.0 version is generally compatible, but adds some new options, notably for search statistics, that are summarized in the online help.) DTASelect is a Java program and requires a Java environment of at least V1.6. Currently, Sorcerer has an older version of Java pre-installed. See the article on how to update Java on Sorcerer before using DTASelect. Verify your installation of DTASelect by typing:

java -cp /home/sorcerer/custom/DTASelect DTASelect -h | more

You should see a listing of DTASelect’s options, which you may find interesting to review. Scroll through the pages of output by hitting the space key. Note that the CGI options for external visualization modules do not apply to this installation, because the extra components are not included in the distribution. You may encounter broken hyperlinks in the reports if you try to invoke them.
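The Java requirement mentioned above (at least V1.6) can be checked before running DTASelect. The following is a minimal sketch; the function name java_version_ok is hypothetical, and it assumes the version string has the usual dotted form, such as 1.6.0_45 or 11.0.2.

```shell
# Hypothetical helper: succeeds if a Java version string is 1.6 or newer.
# Accepts the old "1.x" numbering (e.g. 1.6.0_45) and the newer "N.x" numbering (e.g. 11.0.2).
java_version_ok() {
    version=$1
    major=${version%%.*}      # text before the first dot
    rest=${version#*.}        # text after the first dot
    minor=${rest%%.*}         # second numeric field
    if [ "$major" -gt 1 ]; then
        return 0
    fi
    [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]
}

# Example use (java -version writes to stderr; the version is the quoted string on line 1):
#   installed=$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)
#   java_version_ok "$installed" || echo "Java $installed is too old for DTASelect"
```

If the check fails, update Java as described in the article referred to above before running DTASelect.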

Configuring DTASelect for Muse postprocessing

We have a Muse script that makes it easy to run DTASelect automatically on Sorcerer search results. After you install DTASelect as described above, install the Muse script and its support scripts by logging into Sorcerer as the sorcerer user and typing these commands (or by copying the block and pasting it into a terminal window):

cd ~/custom
wget -r -nd http://proteomics2.com/extras/dtaselect.muse
wget -r -nd http://proteomics2.com/extras/make_decoydb.mu
wget -r -nd http://proteomics2.com/extras/fasta-decoy.mu
chmod ugo+x dtaselect.muse make_decoydb.mu fasta-decoy.mu
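
To confirm that the downloads succeeded and the scripts are executable, a small check along these lines may help. This is only a sketch; check_executables is a hypothetical helper name, not part of the Sorcerer or DTASelect distributions.

```shell
# Hypothetical helper: report each named file that is missing or not executable
# in the given directory. Returns non-zero if at least one file fails the check.
check_executables() {
    dir=$1
    shift
    status=0
    for name in "$@"; do
        if [ ! -x "$dir/$name" ]; then
            echo "missing or not executable: $dir/$name"
            status=1
        fi
    done
    return $status
}

# Example use, with the directory and file names from the commands above:
#   check_executables ~/custom dtaselect.muse make_decoydb.mu fasta-decoy.mu
```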

How to run DTASelect automatically after a Sorcerer search

In the Sorcerer search setup page, do this:

  1. Select Advanced Options to expose more settings
  2. Select the Muse custom script option
  3. Type dtaselect.muse as the name of the script, followed by the DTASelect command line options you want

Considerations when running the dtaselect.muse script

  • Unless you’ve carefully set up decoy databases as described below, turn the statistics off with the --nostats option. This isn’t in the manual, but it’s necessary to get the program to behave as described in the manual.
  • If you are using decoy databases prepared with the Sorcerer make_decoydb script, then specify the option --decoy "##" (with plain straight quotes) to be consistent with the Sorcerer decoy sequence tag convention.
  • Also, don’t select --GUI when running a search from the queue. If you do, DTASelect will bring up the GUI at some future point when you may not expect it, and it will sit there until you do something with it. Better to run the GUI again later from the command line when you are ready to interact with the system.
  • DTASelect operates in the ‘original’ subdirectory of the results directory, but the dtaselect.muse script runs in the results directory itself (i.e., /home/sorcerer/output/nnn). For your convenience, and for easy browsing from the Sorcerer search results page, the script puts links to the DTASelect output reports in the results directory.
  • You can click DTASelect.html from the results page and get a user-friendly view in your browser. Just don’t expect all the links to external views to work if you haven’t installed the helper CGIs.
  • The format of the sequest.params file expected by DTASelect is slightly different from the Sorcerer format. dtaselect.muse makes the required changes and backs up the original.

Running dtaselect.muse from the command line

The script can also be invoked manually, either on an older search or to repeat an earlier analysis with different conditions. To do this:

  • Start the Gnome desktop (type startx from the sorcerer prompt)
  • Open a terminal window (right-click on the desktop)
  • Go to the appropriate results directory: cd /home/sorcerer/output/nnn
  • Type dtaselect.muse, and add any DTASelect options you may want, including the --GUI option if you like.
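
Because DTASelect works in the 'original' subdirectory of the results directory, it can be worth confirming that you are in a results directory before invoking the script by hand. A minimal sketch follows; is_results_dir is a hypothetical helper, and the presence of 'original' is the only thing it checks.

```shell
# Hypothetical pre-flight check: a Sorcerer results directory is expected to
# contain an 'original' subdirectory, which is where DTASelect does its work.
is_results_dir() {
    [ -d "$1/original" ]
}

# Example use (nnn stands for the number of the search, as in the steps above):
#   cd /home/sorcerer/output/nnn
#   is_results_dir . && dtaselect.muse --nostats
```

The --nostats option in the example is the one recommended earlier for searches without decoy sequences in the FASTA file; replace it with --decoy "##" for a tagged target-decoy database.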

Target-decoy searching and DTASelect

DTASelect needs the FASTA database that corresponds to the Sequest search results in order to do its analysis. Since Sorcerer does the actual search against a PeptideDB rather than a FASTA file directly, DTASelect needs to be pointed to the FASTA file from which the PeptideDB was derived. This is part of what the dtaselect.muse script does automatically. It does mean, however, that DTASelect cannot currently understand Sorcerer’s option for on-the-fly decoy searching, because those sequences aren’t in the FASTA file itself. (We are collaborating with the Yates lab and we expect the on-the-fly decoys will be supported eventually.)

To use target-decoy database analysis in DTASelect, which is enabled by default, you need decoy sequences in the database itself rather than Sorcerer’s on-the-fly decoys. You can create such a database with the make_decoydb.mu script that you already downloaded, as described below. If you do not have a FASTA database with properly labelled decoy sequences, DTASelect’s false positive analysis won’t work, and you should use the --nostats option to turn the analysis off altogether.

Using a target-decoy FASTA database with DTASelect on Sorcerer

There are a number of tools in the community for building reverse or other decoy sequence databases, or you may be able to find a FASTA database that already includes decoys. A suitable database will have:

  • Target and decoy sequences in the same file — e.g. a concatenated forward-reverse FASTA
  • Distinctive tags for the decoys as a prefix on the accession number (i.e., following the > in the FASTA)
    • Sorcerer uses ##, but REV_ is often seen, too.
    • Make sure you pass the same tag to the dtaselect.muse script, e.g. --decoy "##"
  • When in-silico digested, the distributions of decoy peptides by number, size, composition, etc., should be similar to those of the target peptides
    • this is more-or-less true for a simple reversed sequence database.
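
To make the idea of a concatenated forward-reverse database concrete, here is an illustrative sketch that appends a reversed, tagged copy of every record to a FASTA file. It is not the make_decoydb.mu script, whose internals are not described here; the function name make_reverse_decoys is hypothetical, and the sketch assumes a plain FASTA file with Unix line endings that ends in a newline.

```shell
# Illustrative sketch only, not the make_decoydb.mu script.
# Prints the original FASTA file, followed by one decoy record per target:
# the header gets the tag as a prefix, and the sequence is reversed.
# Multi-line sequences are joined, so each decoy sequence is a single line.
make_reverse_decoys() {
    fasta=$1
    tag=$2
    cat "$fasta"
    awk -v tag="$tag" '
        function emit(   i, rev) {
            if (header == "") return
            rev = ""
            for (i = length(seq); i > 0; i--) rev = rev substr(seq, i, 1)
            print ">" tag header
            print rev
        }
        /^>/ { emit(); header = substr($0, 2); seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$fasta"
}

# Example use (file names are placeholders):
#   make_reverse_decoys mydb.fasta "##" > mydb.TargetDecoy.fasta
```

Because each decoy is simply its target read backwards, the decoys have the same length and amino acid composition as the targets, which is why the peptide distributions come out roughly similar.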

Scripts to build target-decoy FASTAs on Sorcerer itself are included in the set of files in the download described above. For example, to use SwissProt for target-decoy analysis, do this:

  • Start the Gnome desktop (type startx from the sorcerer prompt)
  • Open a terminal window (right-click on the desktop)
  • Go to the database directory: cd ~/fasta
  • Type make_decoydb.mu swissprot.fasta (or whatever the name of the database is that you are working with)

This creates a new FASTA file called swissprot.TargetDecoy.fasta, with reverse sequences tagged with ##, which can be prepared and searched in the GUI in the normal way. It can then be used for DTASelect analysis by specifying the --decoy "##" option.
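
A quick sanity check on a target-decoy FASTA file is to count the tagged and untagged headers. The sketch below does that; count_targets_and_decoys is a hypothetical helper name, and it only assumes that decoy headers begin with > immediately followed by the tag.

```shell
# Hypothetical helper: print "<targets> <decoys>" for a FASTA file, counting header
# lines and treating those that start with ">" followed by the tag as decoys.
count_targets_and_decoys() {
    awk -v tag="$2" '
        /^>/ {
            if (index($0, ">" tag) == 1) decoys++
            else targets++
        }
        END { print targets + 0, decoys + 0 }
    ' "$1"
}

# Example use, with the file name and tag from the text above:
#   count_targets_and_decoys swissprot.TargetDecoy.fasta "##"
```

For a database in which every target has exactly one reversed decoy, the two numbers should be equal. A decoy count of zero suggests that the tag given here does not match the one in the file.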
