Muse


Our R&D team is busy working on the next major version of the Sorcerer-PE software, and expects to release it in the next few weeks to customers who are in warranty at that time. Early previews and beta tests of some of the components will be made available by arrangement to qualified customer sites.

Highlights of the upcoming release include:

  • ETD fragmentation support and analysis
  • MUSE scripting modules for rescoring peptide matches with Olsen-Mann and Sadygov-Coon scores
  • Interoperation with major components of the Yates lab Sequest suite, including the DTASelect filtering and statistical analysis tool, and the Census quantitation application
  • Enhancements to the SEQUEST engine which provide first-pass cross-correlation scoring and E-values for greater accuracy and sensitivity


We’ve developed a new Muse workflow for target-decoy analysis and false discovery rate estimation, based on our integration of DTASelect from the Yates lab. DTASelect can now use target-decoy FASTA files that are installed on Sorcerer to support its statistical analysis. It provides an easy-to-interpret results report complete with match statistics and estimated false discovery rates.

Our DTASelect on Sorcerer page on this blog has been updated to describe the target-decoy workflow, in addition to the existing material on installing, configuring and running DTASelect and the Muse script. Please visit it to get links to the latest scripts and for a detailed How-To.
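To illustrate the general idea behind target-decoy false discovery rate estimation, here is a minimal Python sketch. It assumes the simple and widely used estimate of decoy hits divided by target hits above a score threshold, with target and decoy databases of equal size. The function name and the input layout are assumptions made for this example, and DTASelect's own calculation may differ in detail.

```python
def estimate_fdr(matches, score_threshold):
    """Estimate the FDR among matches scoring at or above score_threshold.

    `matches` is an iterable of (score, is_decoy) pairs (an assumed layout,
    not the DTASelect file format).
    """
    targets = 0
    decoys = 0
    for score, is_decoy in matches:
        if score < score_threshold:
            continue
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        return 0.0  # nothing accepted, so there is nothing to estimate
    return decoys / targets


# Made-up scores purely for illustration.
example_matches = [
    (3.5, False), (3.1, False), (2.9, True),
    (2.8, False), (2.0, True), (1.5, False),
]
example_fdr = estimate_fdr(example_matches, 2.5)
```

With the made-up scores above and a threshold of 2.5, three target matches and one decoy match pass, so the estimate is one in three. Raising the threshold trades fewer accepted matches for a lower estimated rate.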

Many of our customers have found DTASelect to be a very useful postprocessing tool for Sequest results, and have reported success using it with Sorcerer output. Up until now, however, these customers have generally run the tool manually on a separate desktop computer. Now we have developed a Muse script to make it easy to do this automatically, on Sorcerer itself.

See our DTASelect on Sorcerer page on this blog for a detailed How-to on installing, configuring and running DTASelect and the Muse script.

If you are interested in using Ascore as described in the application note on the blog, please contact us for new Muse scripts for your Sorcerer. We’ve just updated them, and they are needed to work with the recent v4.0 release of TPP, which is what’s in the current Sorcerer release.

Here’s a how-to for technically advanced users who need to update the Java platform on Sorcerer. It’s not required for the base Sorcerer software, including ScaffoldBatch, but it may be necessary for Phenyx installation. Please consult our technical support staff before deciding to do the update.

These instructions assume that you have a recent 64-bit Sorcerer operating platform (either RHEL 5.2 or Centos 5-based), and that your Sorcerer software is at V3.5.

Here are the steps:

  1. Get the latest Java Development Kit (JDK)  (currently v6 update 18) from http://java.sun.com/javase/downloads/index.jsp. Click on the ‘Download JDK’ button. Get the Linux x64 platform, and download the non-rpm file which has a name like jdk-6u18-linux-x64.bin
  2. Log in as root in a terminal window and type: cd /opt
  3. Copy the file you downloaded to /opt, and unpack it:  /bin/sh jdk-6u18-linux-x64.bin
  4. Note the full path to java in the unpacked directory for use in the next step, e.g. /opt/jdk1.6.0_18/bin/java
  5. Type:  /usr/sbin/alternatives --install /usr/bin/java java /opt/jdk1.6.0_18/bin/java 2
    • This sets up a system of links from /usr/bin/java to the new installation
  6. Type: /usr/sbin/alternatives --config java
    • At the prompt, enter the selection number shown for the newly installed alternative (for example, ‘2’)
  7. Check you have the latest java by typing:  java -version

(Optional) Update Firefox Java plugin:

  1. Create a plugins directory in the Firefox installation directory if the plugins directory does not exist. Please check your version of Firefox to determine the correct path to use: mkdir /usr/lib64/firefox-3.x.x/plugins
  2. Create a symbolic link to the new Java plugin. Again please check your Firefox and JRE version for the correct paths: ln -s /opt/jdk1.6.0_18/jre/lib/amd64/libnpjp2.so /usr/lib64/firefox-3.0.5/plugins/

APEX (‘Absolute Protein Expression’) is a technique developed by Lu et al. for label-free quantitation of proteins based on MS-MS spectral counting of peptides. Basic methods of this sort suffer from variable detection probabilities that depend on the physicochemical properties of the peptides. APEX instead includes correction factors that predict the detection rates of the peptides, for a better protein quantitation result.

There is an open source APEX Quantitative Proteomics Tool that implements this technique and that can use Sequest-based protein IDs as analyzed by the Trans-Proteomic Pipeline. Sorcerer users had the idea of using the tool in conjunction with Sorcerer, and now we have developed a workflow and MUSE script to help other users use this combination.

For more information, please read the application note ‘Sorcerer Workflow for the APEX Quantitative Proteomics Tool’.
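To make the role of the correction factors concrete, here is a simplified Python sketch of the idea as we understand it from the published method: each protein's spectral count is weighted by its identification probability, divided by the number of peptides expected to be detected for that protein, and then normalized across all proteins. The function name, the input layout and the numbers are assumptions for illustration only. In the real tool the expected peptide counts come from a trained predictor, which this sketch does not attempt to reproduce.

```python
def apex_abundances(proteins, total_molecules=1.0):
    """Return a relative abundance per protein.

    `proteins` maps a protein name to a tuple of
    (spectral_count, id_probability, expected_peptide_count).
    `total_molecules` scales the result; 1.0 gives fractions that sum to 1.
    """
    weighted = {
        name: count * probability / expected
        for name, (count, probability, expected) in proteins.items()
    }
    normalizer = sum(weighted.values())
    if normalizer == 0:
        return {name: 0.0 for name in weighted}
    return {
        name: total_molecules * value / normalizer
        for name, value in weighted.items()
    }


# Two made-up proteins with equal spectral counts but different detectability.
example = apex_abundances({
    "protein_A": (10, 1.0, 5.0),
    "protein_B": (10, 1.0, 10.0),
})
```

In this made-up case both proteins have the same raw spectral count, but protein_A is expected to yield half as many detectable peptides, so it is assigned twice the abundance of protein_B.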

MSQuant, from the Centre for Experimental Bioinformatics (DK), is a leading tool used by the Matthias Mann group and others for quantitative proteomics, and in particular, SILAC analysis. It is a Windows program that is designed to take MS-MS raw files and protein IDs in the form of a Mascot Peptide Summary Report. So up until now, if you wanted to use MSQuant, the only practical way of doing it was to have Mascot installed and run it first.

Now, however, there is another option: MSQuant users can use a Sequest/TPP-based toolchain for protein ID, and using a conversion utility, they can transform the ProtXML/PepXML files from TPP into a format which MSQuant can load. Using Sorcerer’s scripting environment, MUSE, the transform can be done automatically as post-processing of a Sorcerer search. A further advantage of doing it this way is that the Sequest/TPP toolchain needs no special preparation of the input peaklist files to extract all the information that MSQuant requires for links to the scans in the raw file.


A ‘Muse’ is a Greek goddess with inspirational and creative power — perhaps someone you might expect to hang out with Sorcerers!

Indeed, MUSE® is a recursive acronym for “MUSE Utilities for Search Engines”. The MUSE platform was developed to allow rapid prototyping of new scoring algorithms, such as for “Proteomics 2.0” analyses of PTMs, ETD, and quantitation.

There are currently 3 big challenges in proteomic data analysis:

  1. Data scale and throughput
  2. Workflow integration
  3. Analysis flexibility

The few proteomic labs with the compute servers to handle large-scale data-sets, the know-how to integrate robust workflows, and the programming capability to develop semi-custom analyses and algorithms can do big science. Today, much more than instrumentation, data analysis capability separates the ‘haves’ from the ‘have-nots’ in proteomics research.

The SORCERER 2 appliance already addresses throughput and workflow integration. With the new MUSE integrated scripting platform, the SORCERER 2 appliance can now address all three to provide the most advanced platform for proteomic data analysis.

The MUSE platform is specifically designed to allow trained researchers to quickly interrogate, filter, and manipulate their large-scale data-sets interactively, along with easy-to-use scripting libraries for developing new scoring functions that compare spectra against a peptide sequence with PTMs.

Technically, the MUSE platform consists of two components: the MUSE scripting language and the MUSE scripting environment.

The MUSE scripting language is a proteomics extension of the Lua language, popularized by online video games due to its speed and extensibility. (It is considered one of the fastest scripting languages, is easier to read than Perl, and is syntactically similar to Java.)

The MUSE scripting environment is based on the Bash shell, and includes Perl, PHP, sed, awk, and other popular tools on a 64-bit Enterprise Linux platform, with three decades of robust history.

Even with the very first MUSE platform, it is possible to write single lines to make regular expression substitutions, sort search results by score or delta-mass, write new scoring functions, re-arrange or combine fields, or change formats.

In one test case, we were able to write out the search results into a virtual spreadsheet with 6000 rows and 6000 columns that could be filtered and sorted at will. With adequate training and tech support, researchers can, for example, rapidly sort results by XCorr or mass difference, search for phosphorylated sites, and convert PTM symbols to actual masses without programming.
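As an illustration of the kind of manipulation described above, here is a short sketch in Python (not in MUSE itself) that sorts results by XCorr and converts PTM symbols to masses. The symbol-to-mass mapping, the field names and the example peptides are hypothetical. The symbols actually used depend on how the modifications were defined in the search.

```python
import re

# Hypothetical mapping for illustration only: the symbol assigned to each
# modification depends on the search settings.
SYMBOL_MASSES = {"*": 15.9949, "#": 79.9663}


def symbols_to_masses(peptide, symbol_masses=SYMBOL_MASSES):
    """Replace each PTM symbol in a peptide string with a bracketed mass shift."""
    if not symbol_masses:
        return peptide
    symbol_class = "[" + re.escape("".join(symbol_masses)) + "]"
    return re.sub(
        symbol_class,
        lambda match: "[%+.4f]" % symbol_masses[match.group(0)],
        peptide,
    )


# Made-up results; the field names are assumptions.
results = [
    {"peptide": "K.M*PEPTIDER.A", "xcorr": 2.1},
    {"peptide": "R.PEPS#TIDEK.L", "xcorr": 3.4},
]

# Sort best score first, then rewrite the symbols as masses.
by_xcorr = sorted(results, key=lambda row: row["xcorr"], reverse=True)
converted = [symbols_to_masses(row["peptide"]) for row in by_xcorr]
```

The same pattern (a sort key plus a one-line substitution) covers sorting by mass difference or filtering for phosphorylated sites, for example by keeping only rows whose peptide contains the phosphorylation symbol.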

You can see MUSE examples at the Proteomics 2.0 blog by searching for “MUSE”:   http://www.proteomics2.com/ .

The MUSE script ‘sorcout.mu’ can be used to summarize the top peptide scores from SORCERER-SEQUEST into a CSV format for importing into Excel.

This is useful for performing non-standard analyses (i.e. separate from PeptideProphet or Scaffold), or for further manipulation of the data using scripting languages like Perl or MUSE.

Simply type “sorcout.mu” in the MUSE box (under Advanced Options in the Search page).

It can also be run interactively after the search, by running it inside the output directory for the search job (e.g. “/home/sorcerer/output/45/”), just above the ‘original’ directory.

It will search all subdirectories for *.out files, and turn the top peptide from each *.out file into a single CSV line.
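The directory walk described above can be sketched in Python as follows. This is not the sorcout.mu script itself: the function names are made up for this example, and the parsing of each *.out file is left to a caller-supplied function, since this sketch makes no assumptions about the layout of SEQUEST output. The stand-in parser below simply returns the first line of each file.

```python
import csv
import os
import tempfile


def summarize_out_files(root_dir, parse_top_hit, csv_path):
    """Write one CSV row per *.out file found under root_dir; return the row count."""
    rows_written = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for dirpath, dirnames, filenames in os.walk(root_dir):
            dirnames.sort()  # visit subdirectories in a stable order
            for filename in sorted(filenames):
                if not filename.endswith(".out"):
                    continue
                full_path = os.path.join(dirpath, filename)
                relative = os.path.relpath(full_path, root_dir)
                writer.writerow([relative] + list(parse_top_hit(full_path)))
                rows_written += 1
    return rows_written


def first_line_as_hit(path):
    """Hypothetical stand-in parser: returns the first line of the file as one field."""
    with open(path) as handle:
        return [handle.readline().strip()]


# Small demonstration on throwaway files (not real SEQUEST output).
with tempfile.TemporaryDirectory() as job_dir:
    os.makedirs(os.path.join(job_dir, "run1"))
    demo_files = [
        (os.path.join("run1", "a.out"), "PEPTIDEK 3.2\n"),
        (os.path.join("run1", "b.txt"), "ignored\n"),
        ("c.out", "SAMPLER 2.1\n"),
    ]
    for name, text in demo_files:
        with open(os.path.join(job_dir, name), "w") as handle:
            handle.write(text)
    csv_file = os.path.join(job_dir, "summary.csv")
    row_count = summarize_out_files(job_dir, first_line_as_hit, csv_file)
    with open(csv_file, newline="") as handle:
        summary_rows = list(csv.reader(handle))
```

Passing the parser in as an argument keeps the walk and the CSV writing separate from the file format, which mirrors the point that the script can be copied and adapted to a specific output format.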

As well, the MUSE script can be copied and modified as needed to customize to a specific format.

Note: sorcout.mu is available in Sorcerer PE v3.5+ revisions.

N-linked protein glycosylation is a common post-translational modification (PTM) in many cellular processes. Atwood et al (RCMS 2005) describe a tandem mass spec-based methodology to analyze N-linked glycopeptides.

Enriched glycopeptides are treated with peptide N-glycosidase F, which removes the carbohydrate moieties from the peptide backbone. Deglycosylated peptides are analyzed with a tandem mass spec. The resulting MS/MS spectra are searched against a modified protein sequence database that allows only PTMs on N’s within the consensus sequence N-x-y, where x is any residue other than proline, and y is either serine or threonine.

To analyze this PTM on the deglycosylated peptides on SORCERER, we need to search for a monoisotopic mass shift of 0.9840 Da on N’s only in the {N[^P][ST]} consensus sequence.

To search this PTM on the SORCERER, we do the following 3 steps:

1) Create a new protein sequence database that replaces ‘N’ with ‘J’ in the consensus sequence.

2) Prepare this new sequence database for searching by defining ‘J’ to have the same mass as ‘N’ using a static modification setting on ‘J’.

3) Submit a search on SORCERER with a variable modification search on ‘J’ with a mass shift of +0.9840 Da.

Create New Protein Database

Use the MUSE script ‘nlinkglyco-fasta.mu’ (part of SORCERER PE v3.5) to create a new protein sequence database that replaces each N in the consensus sequence with J.

Simply log onto SORCERER, go to the directory ‘/home/sorcerer/fasta/’ where the protein sequences are, and create a new fasta file from an existing one (for example, create ‘ipi.human_n2j.fasta’ from ‘ipi.HUMAN.fasta’). Then prepare this new fasta file for searching as you would any other protein sequence file.

Once you have logged onto the SORCERER, type the following 2 commands (do not type the ‘sorc$’, which is the SORCERER prompt):

   sorc$ cd /home/sorcerer/fasta/

   sorc$ nlinkglyco-fasta.mu < ipi.HUMAN.fasta > ipi.human_n2j.fasta

The latter command literally means to run the MUSE script using “standard input” from file ipi.HUMAN.fasta (after the ‘<’ symbol) and sending the “standard output” to the new file ipi.human_n2j.fasta (after the ‘>’ symbol).

(The script may be easily copied and modified for another consensus sequence. Contact TechTeam for details.)
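For readers who want to see the substitution spelled out, here is a minimal Python sketch of replacing N with J inside the N-x-[ST] consensus sequence. It is not the nlinkglyco-fasta.mu script, and the function names are made up for this example. It assumes that header lines start with ‘>’ and must be left unchanged, and it joins wrapped sequence lines so that a motif split across a line break is still found. Whether the MUSE script handles these cases the same way is not something this sketch claims.

```python
import re

# N followed by any residue except P, then S or T. The lookahead leaves the
# following residues unconsumed, so overlapping motifs (e.g. "NNST") are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def mark_sequons(sequence):
    """Replace N with J wherever it starts an N-x-[ST] motif (x is not P)."""
    return SEQUON.sub("J", sequence)


def convert_fasta(lines):
    """Yield FASTA lines with sequons marked; each record's sequence becomes one line."""
    sequence_parts = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if sequence_parts:
                yield mark_sequons("".join(sequence_parts))
                sequence_parts = []
            yield line  # header lines are passed through untouched
        elif line:
            sequence_parts.append(line)
    if sequence_parts:
        yield mark_sequons("".join(sequence_parts))


# Made-up records: the first has a motif split across two sequence lines.
converted_lines = list(convert_fasta([
    ">p1 N-glyco test",
    "AAN",
    "GTK",
    ">p2",
    "NPSNNST",
]))
```

To adapt the sketch to another consensus sequence, only the regular expression needs to change.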

Prepare Database for Searching

When the new protein sequence database is prepared for searching, assign a static modification ‘MakeN’ of -9885.95707256 Da. This will cause the final ‘J’ mass to be the monoisotopic mass of 114.04292744 Da. (The normally unused codes ‘J’ and ‘U’ are set at 10,000 Da to flag any inadvertent usage.) The resulting peptide database will be used for subsequent searching.
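The mass bookkeeping above can be checked with a few lines of Python. The values are taken from the text; the variable names are our own. Decimal arithmetic is used so that the eight decimal places are reproduced exactly.

```python
from decimal import Decimal

PLACEHOLDER_MASS = Decimal("10000")        # mass preset for the unused code 'J'
MAKE_N_SHIFT = Decimal("-9885.95707256")   # static modification 'MakeN'
GLYCO_SHIFT = Decimal("0.9840")            # variable modification searched on 'J'

# Mass of 'J' after the static modification: should equal the mass of N.
j_static_mass = PLACEHOLDER_MASS + MAKE_N_SHIFT

# Mass of 'J' when the variable modification is also present.
j_modified_mass = j_static_mass + GLYCO_SHIFT

print(j_static_mass)
print(j_modified_mass)
```

The first value is the monoisotopic mass quoted above for N, so an unmodified ‘J’ behaves exactly like ‘N’ in the search, and only residues in the consensus sequence can carry the extra 0.9840 Da.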

SORCERER Search

The search can now be submitted by creating a user-defined variable modification ‘Nlinkglyco’ with a mass of 0.9840 Da on the residue ‘J’ against the new peptide database.

 

We thank Dr. Rebekah Gundry from the Van Eyk Lab at Johns Hopkins for bringing this SORCERER application to our attention!

Reference: Atwood et al (Rapid Comm Mass Spec 2005; 19: 3002-3006 DOI: 10.1002/rcm.2162)