Articles by techteam



Prof. Josh Elias (left) of Stanford University receives a thank-you gift from David Chiang after his talk.

Ever wondered about target-decoy searching? Want to gain a better understanding and realistic expectations of this effective tool? SageNResearch’s video “Addressing Peptide Identification Signal-to-noise With Target-Decoy Searching”, given by Professor Josh Elias of Stanford University at our “Translational Proteomics 2.0” meeting, can help. Dr. Elias is an Assistant Professor in Chemical and Systems Biology at Stanford University, and was previously part of the Steven Gygi Lab at Harvard Medical School. His lab is keenly interested in developing and applying methods to meet the current challenges facing scientists engaged in large-scale proteome characterization.

Josh kicked off his talk with a stunning and very powerful visual to hit home the concept of what target-decoy database searching can do — you’ll never look at coffee beans in quite the same way. With this talk, you’ll know how to better find a happy medium for thresholds, smarter ways of designing your filtering criteria, when not to even consider using the method, how to get the most out of (really easy) decoy searching in SORCERER, and what’s so good about partial tryptic searches.

The 30-minute presentation is available at: http://www.scivee.tv/node/15544
To view slides, I recommend using the “full screen” mode. The slide set can also be downloaded as a PowerPoint file.


Prof. Alexey Nesvizhskii (left) of University of Michigan receives a thank-you gift from David Chiang after his talk.

If you really want to understand how peptide and protein identification is done, this video talk is a must-see!

Professor Alexey Nesvizhskii of the University of Michigan is one of the co-inventors (with Dr. Andy Keller) of the popular PeptideProphet/ProteinProphet algorithm for turning search engine results into statistically consistent peptide and protein identifications. (This algorithm is also the basis for the popular Scaffold software.)

At the “Translational Proteomics 2.0” meeting, we were privileged to have Alexey give his insightful talk, which reviews the various steps involved in inferring peptide and protein identifications from large spectral datasets.

In this talk, you will learn why False Discovery Rates are preferred over P-values, why you probably should not run more than 4 replicates of a MudPIT experiment, how FDR estimations from decoy differ from Peptide/ProteinProphet, how “The Two Prophets” compute probabilities by curve-fitting the score distributions, how sensitivity and FDR are computed, and the what and why of some advanced TPP options.

The talk is available at: http://www.scivee.tv/node/12671 (45 minutes).

I recommend using the “full screen” mode so you can view the slides, which are also available as a download from the site. (Please be aware that the slideset order is different from that in the presentation.)

(Note: Both Trans-Proteomic Pipeline and Scaffold Batch software are integrated into the SORCERER platforms.)


Jimmy Eng (left) of University of Washington receives a thank-you gift from David Chiang after his talk.

During our Translational Proteomics 2.0 Meeting, we were privileged to have Jimmy Eng (University of Washington) give us his uncommon insights into using SEQUEST with the Trans-Proteomic Pipeline (TPP).

This talk will be invaluable for advanced users of the SEQUEST search engine for sensitive translational proteomics analysis. All active SEQUEST users should listen to this talk!

Researchers will benefit by increasing their sensitivity and decreasing their false discovery rates when identifying proteins and post-translational modifications using proteomics mass spectrometers like the Orbitrap.

Jimmy has been one of the most prolific proteomics developers for almost two decades: he is the co-inventor (with John Yates) of proteomic search engines and SEQUEST, and the developer of a number of TPP tools.

Conclusions from slides:
- Semi-tryptic searches are better.
- Use monoisotopic masses for fragment ions. (Use monoisotopic masses for precursor ions too if the data come from a high-res instrument.)
- Narrow mass tolerance searches are better, provided the search considers precursor mass isotope assignment errors.

The talk is available at:  http://www.scivee.tv/node/11920 (31 minutes).

I recommend using the “full screen” mode so you can view the slides, which are also available as a download from the site.

APEX (‘Absolute Protein Expression’) is a technique developed by Lu et al. for label-free quantitation of proteins based on MS-MS spectral counting of peptides. Basic spectral counting methods suffer from variable detection probabilities that depend on the physicochemical properties of the peptides. APEX instead includes correction factors that predict the detection rates of the peptides, which gives a better protein quantitation result.
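The idea of correcting spectral counts by predicted detectability can be sketched in a few lines of Python. This is a minimal illustration based on our reading of the published approach, not the code of the APEX tool itself: the function name, the input layout, and the example numbers are all hypothetical, and the exact definitions should be checked against Lu et al.

```python
def apex_scores(proteins, total_concentration=1.0):
    """Corrected, normalized spectral counts (illustrative sketch only).

    proteins: dict mapping protein name -> (n, o, p), where
        n = observed spectral count,
        o = expected count predicted from peptide detectability (assumed > 0),
        p = protein identification probability.
    Each score is the protein's share of total_concentration.
    """
    weighted = {}
    for name, (n, o, p) in proteins.items():
        if o <= 0:
            raise ValueError(f"expected count for {name} must be positive")
        weighted[name] = n * p / o
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no spectral counts to normalize")
    return {name: w / total * total_concentration for name, w in weighted.items()}


# Hypothetical example: equal raw counts, but protein B is harder to detect.
scores = apex_scores({"A": (30, 10.0, 1.0), "B": (30, 5.0, 1.0)})
print(scores)
```

With equal raw counts, the protein whose peptides are predicted to be harder to detect receives the larger abundance estimate, which is the point of the correction.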

There is an open source APEX Quantitative Proteomics Tool that implements this technique and that can use Sequest-based protein IDs as analyzed by the Trans-Proteomic Pipeline. Sorcerer users had the idea of using the tool in conjunction with Sorcerer, and now we have developed a workflow and MUSE script to help other users use this combination.

For more information, please read the application note ‘Sorcerer Workflow for the APEX Quantitative Proteomics Tool’.

We are pleased to announce the availability of ISIS (Integrated Storage and Information System), which is configured and integrated to work directly with the SORCERER Enterprise bladecenter system to provide 4 to 100+ terabytes of integrated, protected storage for proteomics, genomics, imaging, and other repository needs. A second, backup ISIS system can be configured offsite to meet additional backup and disaster recovery needs. To simplify maintenance and warranty for our clients, it will be covered under the same warranty plan as the SORCERER system for 3 years or 5 years.

The base ISIS system will provide approximately 4.1 terabytes of secure storage in a “2U” height, rack-mount system, consisting of twelve 450 GB SAS disks with 2 disk redundancy in RAID6.
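The quoted capacity can be checked with a little arithmetic. The sketch below assumes that RAID6 sets aside two disks’ worth of space for parity and that the “4.1 terabytes” figure is expressed in binary units (2^40 bytes). Both are our assumptions, and the formatted capacity of a real system will be somewhat lower.

```python
def raid6_usable_bytes(disk_count: int, disk_bytes: int) -> int:
    """Raw usable capacity of a RAID6 group: two disks' worth goes to parity."""
    if disk_count < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (disk_count - 2) * disk_bytes


GB = 10**9    # decimal gigabyte, as printed on disk labels
TIB = 2**40   # binary terabyte (tebibyte)

usable = raid6_usable_bytes(disk_count=12, disk_bytes=450 * GB)
print(f"{usable / 10**12:.2f} TB (decimal), {usable / TIB:.2f} TiB (binary)")
```

Twelve 450 GB disks leave ten disks of data space, or 4.5 decimal terabytes, which is roughly 4.1 when counted in binary units.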

In most countries, the ISIS system consists of the following:

- ISIS storage integration software interface running on SORCERER platform
- Fujitsu ETERNUS DX80 with single controller
- Approximately 4.1 TB usable (12 x 1 TB SATA disks in RAID6) per 2U rack, with up to 20 racks
- Min 3 year warranty is included (subject to the TSP coverage of the SORCERER)

Note that future expansion to 100+ TB will require additional ISIS expansion units or higher density SAS drives.

New clients can order the SORCERER Enterprise blade system with the ISIS system together as two rack-mount units. Clients with newer SORCERER 2 integrated data appliances with at least 8 CPU cores can simply add the ISIS to their existing system. (Older SORCERER systems will require a hardware upgrade.)

Please contact sales@SageNResearch.com for more information.

We are lining up an exciting program for our SORCERER clients on Sunday, May 31, 2009, from 1:30pm to 5:00pm, in Philadelphia, just before the ASMS reception:
http://www.asms.org/Default.aspx?tabid=209

Like last year, we will have plenty of toys and give-aways. It will continue to be a closed meeting, open to in-warranty clients and special invited guests only. Pre-registration will be required due to space limitations.

Please stay tuned for more details.

To see and hear what some of our SORCERER clients have to say about their own experiences, just click the link below:

Click Here for Video of Dr. Rich James (Randall Moon Lab, University of Washington, WA, US)

Here are some notes from the TPP support group on using Tandem Mass Tags (i.e. similar to iTRAQ):

http://groups.google.com/group/spctools-discuss/browse_thread/thread/98dcb28f8dfa2349?hl=en

Here is Thermo’s TMT information:

http://www.thermo.com/com/cda/article/general/1,,20815,00.html

Note that TMT pre-dates iTRAQ, and is a significantly larger molecular tag. At present, iTRAQ has a larger market share than TMT.

MSQuant, from the Centre for Experimental Bioinformatics (DK), is a leading tool used by the Matthias Mann group and others for quantitative proteomics, and in particular, SILAC analysis. It is a Windows program that is designed to take MS-MS raw files and protein IDs in the form of a Mascot Peptide Summary Report. So up until now, if you wanted to use MSQuant, the only practical way of doing it was to have Mascot installed and run it first.

Now, however, there is another option: MSQuant users can use a Sequest/TPP-based toolchain for protein ID and then, using a conversion utility, transform the ProtXML/PepXML files from TPP into a format that MSQuant can load. Using Sorcerer’s scripting environment, MUSE, the transform can be done automatically as post-processing of a Sorcerer search. A further advantage of doing it this way is that the Sequest/TPP toolchain needs no special preparation of the input peaklist files to extract all the information that MSQuant requires for links to the scans in the raw file.


The MUSE script ‘sorcout.mu’ can be used to summarize the top peptide scores from SORCERER-SEQUEST into a CSV format for importing into Excel.

This is useful for performing non-standard analyses (i.e. separate from PeptideProphet or Scaffold), or for further manipulation of the data using scripting languages like Perl or MUSE.

Simply type “sorcout.mu” in the MUSE box (under Advanced Options in the Search page).

It can also be run interactively after the search, by running it inside the output directory for the search job (e.g. “/home/sorcerer/output/45/”), just above the ‘original’ directory.

It will search all subdirectories for *.out files, and turn the top peptide from each *.out file into a single CSV line.

As well, the MUSE script can be copied and modified as needed to customize to a specific format.

Note: sorcout.mu is available in Sorcerer PE v3.5+ revisions.

Electron transfer dissociation (ETD) is a promising dissociation technology for analyzing labile post-translational modifications (PTMs) such as phosphorylation. Unlike CID, ETD generates positively charged c and z* (z-radical) ions instead of b and y ions. There are two caveats in using standard SEQUEST for ETD tandem mass spectra:

  1. Standard c/z option doesn’t compute z* ions correctly.
  2. Standard SEQUEST allows only low charge states, and would not work for highly charged, long peptides.

It is important to note that z* ions are not the same as z ions, and have an extra hydrogen (1.008 Da monoisotopic mass). This means that the standard SEQUEST option of searching c/z ions will not search ETD spectra correctly, since the computed z ions will have the wrong mass. On SORCERER, correct c/z* ions can be obtained using user-defined static peptide terminus modifications on standard b/y searches, as described below. As well, SORCERER-SEQUEST* allows very high precursor charge states (up to +255) in order to accommodate highly charged species. Here is how to search ETD spectra using SORCERER …

1. Define peptide terminus mods that shift b/y ions to c/z* ions, and use these for ETD searches.

Define the following static peptide terminus modifications using the web interface (click “Add/edit modifications…” on the Search page, then click “New/edit modifications” on top):

  • Name: “BtoC” with Mono Mass: “17.02655” and Type: “N-Terminus”
  • Name: “YtoZrad” with Mono Mass: “-16.01872407” and Type: “C-Terminus”

In both cases, Residue is left blank.
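The two mass values follow from the elemental differences between the ion types: a c ion is the corresponding b ion plus NH3, and a z* ion is the corresponding y ion minus NH3 plus one hydrogen. The short Python check below reproduces them from standard monoisotopic atomic masses. It is only a sanity check, and the variable names are ours.

```python
# Monoisotopic atomic masses in Da (standard values, rounded to 8 decimals).
H = 1.00782503
N = 14.00307401

NH3 = N + 3 * H

# b -> c: add NH3 at the N-terminal fragment.
b_to_c = NH3
# y -> z* (z-radical): remove NH3, then add back one hydrogen.
y_to_z_radical = -NH3 + H

print(f"BtoC    : {b_to_c:+.5f} Da")
print(f"YtoZrad : {y_to_z_radical:+.8f} Da")
```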

2. Define a new search profile that incorporates the above peptide terminus mods.

In the Search page under “(2) Choose a Search profile”, select the most similar existing search profile, then click “Edit this profile…”. Be sure to name it something different and memorable, then select the above 2 mods under “Terminus modifications” and “Static”. Select other applicable options.

3. Include a MUSE script to generate an Excel-readable tab-delimited text (TDT) summary file of the SEQUEST top peptides.

In many cases, it can be useful to have a TDT file of the SEQUEST outputs for your Excel analysis, especially for ETD analysis of purified proteins or very simple mixtures. (See note below.) Simply include the MUSE script “sorcout.mu” (part of Sorcerer PE v3.5) as follows: Click Advanced Options “Expand”, and type “sorcout.mu” into the MUSE custom script box. (From now on, any submitted search will have a “sorcout.tdt” file automatically created in the appropriate ‘output’ directory.) Save the search profile. It is now ready for SEQUEST searches on ETD spectra.

4. Try the search using this test DTA file.

Download the following ETD test DTA file and search against SwissProt.

Right Click to Download Sample ETD DTA file

If using the TPP’s built-in Spectrum Viewer, simply set the display options to “c” and “z” ions (here, “z” really means “z*”). The z* ions should match pretty well against peptide “KLYNKEPSEIVELK”.

 

Note that many common post-SEQUEST probability re-scoring algorithms, such as PeptideProphet or Scaffold, are not tuned for ETD scores. From first principles, we believe that the resulting probabilities may not be wrong per se, but rather be lacking in specificity. Therefore, particularly for ETD analysis of PTMs in purified proteins or other simple mixtures, we recommend downloading the SEQUEST scores to an Excel spreadsheet for manual interpretation rather than using CID-tuned tools.

*The Yates Lab’s version of SEQUEST has 2 code modifications for ETD. The first is the increased charge state (same as in SORCERER-SEQUEST). The second is exclusion of the Proline cleavage, which is not implemented in standard SORCERER-SEQUEST. However, this can be done with a MUSE post-processing step in the future if it is found to have a large effect.

As always, in-warranty clients can contact our TechTeam for help on this and other advanced capabilities.

Article on Sage-N Research and Thermo Fisher Scientific collaboration:

http://www.drugdiscoverynews.com/index.php?newsarticle=2475

N-linked protein glycosylation is a common post-translational modification (PTM) in many cellular processes. Atwood et al (RCMS 2005) describe a tandem mass spec-based methodology to analyze N-linked glycopeptides.

Enriched glycopeptides are treated with peptide N-glycosidase F, which removes the carbohydrate moieties from the peptide backbone. Deglycosylated peptides are analyzed with a tandem mass spec. The resulting MS/MS spectra are searched against a modified protein sequence database that allows only PTMs on N’s within the consensus sequence N-x-y, where x is any residue other than proline, and y is either serine or threonine.

To analyze this PTM on the deglycosylated peptides on SORCERER, we need to search for a monoisotopic mass shift of 0.9840 Da on N’s only in the {N[^P][ST]} consensus sequence.

To search this PTM on the SORCERER, we do the following 3 steps:

1) Create a new protein sequence database that replaces ‘N’ with ‘J’ in the consensus sequence.

2) Prepare this new sequence database for searching by defining ‘J’ to have the same mass as ‘N’ using a static modification setting on ‘J’.

3) Submit a search on SORCERER with a variable modification search on ‘J’ with a mass shift of +0.9840 Da.

Create New Protein Database

Use the MUSE script ‘nlinkglyco-fasta.mu’ (part of SORCERER PE v3.5) to create a new protein sequence database that replaces each N in the consensus sequence with J.

Simply log onto SORCERER, go to directory ‘/home/sorcerer/fasta/’ where the protein sequences are, and create a new fasta file from an existing one (for example, create ‘ipi.human_n2j.fasta’ from ‘ipi.HUMAN.fasta’). Then prepare this new fasta file for searching as you would any other protein sequence file.

Once you have logged onto the SORCERER, type the following 2 commands (do not type the ‘sorc$’, which is the SORCERER prompt):

   sorc$ cd /home/sorcerer/fasta/

   sorc$ nlinkglyco-fasta.mu < ipi.HUMAN.fasta > ipi.human_n2j.fasta

The latter command literally means to run the MUSE script using “standard input” from file ipi.HUMAN.fasta (after the ‘<’ symbol) and sending the “standard output” to the new file ipi.human_n2j.fasta (after the ‘>’ symbol).

(The script may be easily copied and modified for another consensus sequence. Contact TechTeam for details.)
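For readers who want to see the substitution logic spelled out, here is a minimal Python sketch of the same idea. It is not the MUSE script and we have not compared its output against ‘nlinkglyco-fasta.mu’; the function names are ours, and it writes each sequence on a single line rather than preserving the original line wrapping.

```python
import re

# N followed by any residue except P, then S or T. The lookahead keeps the
# match to the N alone, so overlapping sequons (e.g. "NNST") are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def mark_sequons(sequence: str) -> str:
    """Replace the N of every N-x-[S/T] sequon (x != P) with J."""
    return SEQUON.sub("J", sequence)


def convert_fasta(lines):
    """Yield FASTA lines with sequon N's replaced by J.

    Residues are joined per record before matching, so a sequon that is
    split across two lines of the input is still recognised.
    """
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header
                yield mark_sequons("".join(chunks))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header
        yield mark_sequons("".join(chunks))


# Hypothetical two-line record; the last sequon (N-G-T) spans the line break.
example = [">demo protein", "MKNNSTPAN", "GTK"]
for out_line in convert_fasta(example):
    print(out_line)
```

To target a different consensus sequence, only the regular expression would need to change.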

Prepare Database for Searching

When the new protein sequence database is prepared for searching, assign a static modification ‘MakeN’ of -9885.95707256 Da. This will cause the final ‘J’ mass to be the monoisotopic mass of 114.04292744 Da. (The normally unused codes ‘J’ and ‘U’ are set at 10,000 Da to flag any inadvertent usage.) The resulting peptide database will be used for subsequent searching.
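The static modification value is simply the difference between the desired residue mass and the 10,000 Da placeholder. A quick Python check of that arithmetic, using only the numbers quoted in this note (the variable names are ours):

```python
PLACEHOLDER_MASS = 10000.0        # preset mass of the normally unused code 'J'
ASN_RESIDUE_MASS = 114.04292744   # monoisotopic mass of an N residue, in Da
GLYCO_SITE_SHIFT = 0.9840         # mass shift searched for on formerly glycosylated sites

make_n = ASN_RESIDUE_MASS - PLACEHOLDER_MASS
print(f"MakeN static modification: {make_n:.8f} Da")
print(f"J with the variable modification: {ASN_RESIDUE_MASS + GLYCO_SITE_SHIFT:.4f} Da")
```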

SORCERER Search

The search can now be submitted by creating a user-defined variable modification ‘Nlinkglyco’ with mass of 0.9840 Da on the residue ‘J’ against the new peptide database.

 

We thank Dr. Rebekah Gundry from the Van Eyk Lab at Johns Hopkins for bringing this SORCERER application to our attention!

Reference: Atwood et al (Rapid Comm Mass Spec 2005; 19: 3002-3006 DOI: 10.1002/rcm.2162)

Dr. John Yates from the Scripps Research Institute gave the talk “Driving Biological Discovery using Quantitative Mass Spectrometry” at the 2008 Proteomics 2.0 Meeting hosted by Sage-N Research.

 

The audio MP3 file is available by download here (click to play, right click to download):

   SageN002_JYates_2008Jun_57m.mp3

The complete slideset is available by download in 5 parts here (click to view, right click to download):

  SageN002_JYates_2008Jun_part1.pdf

   SageN002_JYates_2008Jun_part2.pdf

   SageN002_JYates_2008Jun_part3.pdf

   SageN002_JYates_2008Jun_part4.pdf

   SageN002_JYates_2008Jun_part5.pdf

The meeting was held on June 1, 2008 in Denver, just before the ASMS conference.
 

We were privileged to have talks by Drs. John Yates (Scripps), Roman Zubarev (Uppsala), Alexander Ivanov (Harvard), Sean Beausoleil (Harvard Med), and Aaron Klammer (U. Washington) on large-scale quantitative analysis, ETD, and phosphorylation (and other PTMs).

These talks offer a glimpse into the upcoming capabilities of the Sorcerer 2 platform. 


“Proteomics 2.0” Users Meeting Group Picture


Dr. John Yates, at right, is presented by David Chiang with a Nintendo Wii, which was given to all 5 speakers this year.


Dr. Nick Morrice (U. Dundee) won one of the three Wii door prizes. Other lucky winners were Drs. Lynn Spruce (Children’s Hospital of Philadelphia) and Patrick Everley (Harvard Med, now in US Army). Eight Wii systems were given away in all.

Find out why all the participants were so excited about this special meeting by listening to the talks! Stay tuned to this space for how to download the talks, as well as for a chance to attend our meeting next year. (The 2009 “Proteomics 2.0” Users Meeting will again be by invitation only.)

The following are a few of the comments from the meeting participants.


“Sage-N is in tune with the constantly evolving needs of proteomics labs throughout the world.”
Sean Beausoleil (Harvard Medical School)

“I find the Sorcerer convenient to use and when I dig carefully into selected data, I feel confident in the large-scale results from Sorcerer.”
Larry Brill (Burnham Institute)

“It was one of the best, if not the best event of this year’s ASMS! Fantastic speakers with talks about highly relevant subjects. I enjoyed it enormously. Thank you again.”
Markus Brosch (Wellcome Trust Sanger Institute, UK)

“Great opportunity to learn from field pioneers in an intimate setting.”
Josh Elias (Harvard Medical School)

“The scope of the meeting is excellent!”
Alexander Ivanov (Harvard School of Public Health)

“Great talks overall. Great prizes! Couldn’t imagine them any better.”
Aaron Klammer (University of Washington)

“Sage-N is enabling the forefront of proteomics research by providing data appliances that are robust, cutting edge, and easy to use.”
Mark Pitman (Geneva Bioinformatics SA)

“I thought you lined up a great list of speakers on useful current topics of interest. We found that Sage-N’s Sorcerer 2 product is a key component at our facility’s informatics system. The meeting was really worth my time! I was going to hop around, but I stayed on.”
Alexander B. Schilling (University of Illinois Chicago)

“A-list speakers - well worth the time.”
Lynn Spruce (Children’s Hospital of Philadelphia)

“It is a very nice small environment for people learn about the company and products.”
Ru Wei (Pfizer)

“Sorcerer provides rapid protein identification with minimal IT support requirement. It is a highly efficient tool for proteomics studies.”
Wenhong Zhu (Burnham Institute)

Sage-N Research will host a mini workshop focusing on analyses of phosphorylation, large datasets, and ECD/ETD as discussed by top experts. This meeting will be invaluable for gaining insider insights into advanced proteomics technology, and will also provide a glimpse into the future direction of the SORCERER platform.

Speakers include two Biemann Medal winners: Dr. John Yates and Dr. Roman Zubarev, as well as Ascore co-inventor Dr. Sean Beausoleil, and large dataset experts Dr. Michael MacCoss and Dr. Alexander Ivanov.

Please come to the meeting by 1:30pm to be eligible to win one of three hard-to-obtain Nintendo Wii game consoles! (This meeting is open to in-warranty Sorcerer clients and invited guests only.)

Announcing Sage-N Research Users Meeting 2008

“Proteomics 2.0: Putting the Pieces Together”
Sunday, June 1, 2008
Denver Marriott City Center
1701 California St., Downtown Denver (303-297-1300)
“Mattie Silks” Room, Lower Level 1
1:30pm to 5:00pm

Preliminary Agenda

1:30 Welcome Remarks
David Chiang (Sage-N Research)

1:40 Plenary Talk 
Dr. John R. Yates, III (Scripps Research Institute)

2:15 “Linking Proteomics to Systems Biology” 
Dr. Alexander Ivanov (Harvard School of Public Health)

2:45 Break

3:00 “ASCORE for Large-Scale Phosphorylation Site Localization” 
Dr. Sean Beausoleil (Gygi Lab, Harvard Med)

3:30 “Proteomic Profiling using Mass Spectrometry: A delicate balance between swimming and drowning in data”
Dr. Michael MacCoss and Dr. Aaron Klammer (University of Washington)

4:00 Break

4:15 “CAD/CID and ECD/ETD: Solo or duet?”
Dr. Roman Zubarev (Uppsala University)

4:45 Closing Remarks

5:00 Meeting End

This “Proteomics 2.0” meeting is hosted for the benefit of our in-warranty SORCERER clients, whose generous support makes this meeting and continuing product improvements possible. Recent purchasers of Scaffold software from Sage-N Research are also eligible to attend.

Attendance is by invitation only and must be confirmed prior to the meeting. Space is very limited by room size to fewer than 30 clients and special guests.

Please reserve your space soon, by emailing your intention to attend to: “usermeeting@SageNResearch.com”. Confirmations will be sent by return email.

Note: If you are currently not an in-warranty SORCERER client, you can also be eligible to attend the meeting if you purchase a Scaffold PC software license from Sage-N Research before the meeting.

High accuracy mass specs (e.g. Orbitrap) need to be calibrated every 3 days or so to maintain their mass accuracy, reported to be routinely around 2 ppm. But a common calibration solution can accumulate in the instrument and clog up the tubes, causing some labs to prolong the time between needed calibrations. What to do?

One Orbitrap user reports excellent results from the Acetonitrile solution marketed by Agilent for calibrating their instruments. The “ES Tuning Mix” (part number G2421A) fragments well, leaves not a trace, and is said to cost about $100 for a 100 ml bottle.

Email techteam@SageNResearch.com if you need more information. And thanks to B.G. for the tip!

The built-in mechanism for uploading and downloading files through the Sorcerer GUI is very convenient and right at one’s fingertips, but it is not recommended for very large files, as it is a relatively inefficient method. In the worst case, if you are sitting at your desktop and using the Web GUI to transfer a file from one directory to another on a server (either Sorcerer or some other machine), then that file has to travel over the network to your desktop computer and all the way back again. Not very efficient if you have a large file, a slow network and a creaky PC! So consider using Windows Explorer and Samba running on the server to make a move like this.

Using semi-enzyme or no-enzyme digests for database searches can be a powerful tool for some analyses where non-specific protease activity is being modelled, but those searches come at the expense of considerably larger databases and longer run times, not to mention more noise hits.

To keep the benefits while containing the downside, here are some pointers to optimize search conditions:

  • Work with a compact, non-redundant, preferably species-specific protein database (like the IPI series) that really represents the proteins you might expect to see. (This is a good rule in general.)
  • Make sure the precursor mass range is no wider than needed. Masses below a few hundred Da represent 2- or 3-mer peptides, and contribute little to the identifications, while increasing the noise and database size. Masses over a few kDa or 15-20 AA may be unexpected in an enzyme digest and will not be seen in your instrument anyway.
  • Use semi-enzyme in preference to no-enzyme if you can, for example if you are using trypsin, but you want to find the occasional non-enzymatic cleavage, or if you are generating background hits (to be filtered later) for statistical analysis. Using semi-tryptic conditions rather than full tryptic may increase the search space by an order of magnitude, but then going to full no-enzyme incurs a further two orders of magnitude increase, typically.
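To get a feel for how the enzyme setting changes the number of candidate peptides, the Python sketch below counts the substrings of a single sequence that each mode would allow. It is a toy model under our own assumptions (trypsin cuts after K or R but not before P, any number of missed cleavages, a fixed peptide length window) and not how SEQUEST builds its search space. On one short sequence it only illustrates the ordering of the three modes, not the magnitudes quoted above.

```python
def tryptic_sites(seq: str) -> set:
    """Positions where trypsin may cut: after K or R, but not before P.

    The protein termini (0 and len(seq)) count as valid peptide ends too.
    """
    sites = {0, len(seq)}
    for i in range(1, len(seq)):
        if seq[i - 1] in "KR" and seq[i] != "P":
            sites.add(i)
    return sites


def count_candidates(seq: str, mode: str, min_len: int = 6, max_len: int = 20) -> int:
    """Count candidate peptides under an enzyme mode.

    mode "full": both ends at cleavage sites (missed cleavages allowed),
    mode "semi": at least one end at a cleavage site,
    mode "none": any substring within the length window.
    """
    if mode not in ("full", "semi", "none"):
        raise ValueError(f"unknown mode: {mode}")
    sites = tryptic_sites(seq)
    count = 0
    for start in range(len(seq)):
        for end in range(start + min_len, min(len(seq), start + max_len) + 1):
            ends_ok = (start in sites) + (end in sites)
            if mode == "none" or (mode == "semi" and ends_ok >= 1) or ends_ok == 2:
                count += 1
    return count


# Arbitrary example sequence, used only for illustration.
example = "MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHRFKDLGEEHFK"
counts = {mode: count_candidates(example, mode) for mode in ("full", "semi", "none")}
print(counts)
```

Narrowing the length window (the `min_len` and `max_len` parameters here stand in for the precursor mass range) shrinks all three counts, which is the effect the second pointer above relies on.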

Steve Gygi (Harvard Medical School) writes…

We don’t find this to be a problem anymore. There used to be two problems:

1) The precursor mass in the header was sometimes from the pre-scan (15K resolution) in the Orbi and not the one in the actual MS scan.

2) Sometimes (for large peptides) the second isotope (1st carbon-13 isotope peak) was chosen for MS/MS (because it’s larger for peptides with masses above 1800).

Both of these problems are fixed by Xcalibur software when the radio checkbox is checked under the “exclude charge states” tab that says “Undetermined charge states.”

The mass spectrometer doesn’t collect MS/MS information if it doesn’t know the charge state. If it knows the charge state, then the right information gets put into the headers almost all the time. We always check that box now.

Finally, one can always check how well this works by just doing a search at 1.1 Da tolerance (instead of 50 ppm) and then examining the PPM values for the best-scoring peptides. Usually only one out of a hundred or so will be right (very high Xcorr) and have a PPM value that corresponds to exactly a 1.003355 Da shift.
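The check described here can be sketched in Python as follows. This is our own illustration, not code from the Gygi Lab: the function names, the 10 ppm default, and the example masses are hypothetical. It computes the PPM error of each identification and flags those whose error disappears once a single 1.003355 Da isotope shift is removed.

```python
C13_SHIFT = 1.003355  # Da, the isotope peak spacing quoted above


def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def classify_precursor(observed_mass: float, theoretical_mass: float,
                       tol_ppm: float = 10.0) -> str:
    """Return 'monoisotopic', 'isotope+1', or 'other'."""
    if abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm:
        return "monoisotopic"
    if abs(ppm_error(observed_mass - C13_SHIFT, theoretical_mass)) <= tol_ppm:
        return "isotope+1"
    return "other"


# Hypothetical observed masses for a peptide with theoretical mass 2000.0 Da.
for observed in (2000.0100, 2001.0134, 2000.5000):
    print(observed, classify_precursor(observed, 2000.0))
```

Counting how many high-scoring hits fall into the 'isotope+1' class gives a rough measure of how often the wrong isotope peak was assigned.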

Jimmy Eng (University of Washington) writes …

Regarding the question of ReAdW vs. extract_msn and the observation that ReAdW masses are almost always +10 to +15 ppm higher than extract_msn:

The current version of ReAdW is being distributed with the Trans-Proteomic Pipeline (TPP) and can also be individually downloaded directly from here:
http://sourceforge.net/project/showfiles.php?group_id=69281&package_id=68160

However, there is an imminent TPP release (3.5.1) which will include a new ReAdW which is the first update for this tool in well over a year. The new ReAdW incorporates a fix to profile scans that get centroided. Previously, centroided peaks were off by both m/z and intensity and we recently became aware of a proper Thermo function call to extract correct centroided data for Orbi/FT data using their Xcalibur Developer’s Kit (XDK) API. This new ReAdW also supports zlib peak compression which will generate in the range of 20-40% smaller files.

There is no change to the precursor mass determination though. For scans with a more precise precursor mass available, ReAdW reports what it gets out of the Thermo API (Monoisotopic M/Z Trailer Extra Value; I presume it’s the same accurate mass also visible in the scan header when viewing a spectrum in Qual Browser). There appears to be newer Thermo function calls to extract (more accurate?) precursor masses but the latest ReAdW continues to use the “Trailer Extra Value” masses for now. On one dataset I tested, the newer precursor mass function generated more accurate precursor masses within the high quality identifications. It had precursor mass accuracies in the range of 0-6PPM with distribution centered around 2PPM versus an error range of 0-12PPM with distribution centered around 4PPM for the “Trailer Extra Value” masses. But testing by others showed that there were enough scans where the new mass function failed so the precursor mass determination is left as-is for the time being.


According to Mike Senko of Thermo, there are two complexities with determining the precursor mass using the Extract-MSn program within Bioworks. The first is the occasional +1 Da that is included in the reported precursor monoisotopic mass. The second is the random mass error that is estimated to be between 5 and 10 PPM.

In an Orbitrap or FT acquisition, the instrument tries to determine the monoisotopic m/z while it picks precursors based on abundance. If the m/z can be confidently inferred, it will be listed in the scan header (also called the trailer) as “Monoisotopic M/Z”. If it cannot, a ‘0’ is listed.

For the first complexity, the occasional extra +1 Da in the precursor mass arises when Extract-MSn is used to generate the peaklist and a ‘0’ is encountered. In those cases, the mass of the most abundant isotopic peak is chosen, which is the M+1 peak for precursor masses higher than about 1700 Da. Therefore, the extra +1 Da arises not from the instrument itself, but from the way Extract-MSn extracts the information. The potential for this error still exists today.

For the second complexity involving the random mass error, the issue is that the analytical scan is not done when the data dependent scan decisions are made, which is during the first 25% of the time domain signal (called the Preview Scan). For this reason, the listed masses for the precursor and isotopic M/Z will not match those values in the analytical scan. In that case, Extract-MSn will go back to the analytical scan to extract the more accurate values. This capability is reportedly in Extract-MSn version 3.2 or later.

[Editor’s Note: Mike is a research scientist at Thermo responsible for optimizing the interface between the LTQ and Orbitrap/FT sections of hybrid instruments. We thank Mike for providing the above information, which we summarized for this newsletter.]

Analyzing millions of spectra, quantitation by spectral counting and improved filtering — these are just a few of the new features in the just-released version 2 of the Scaffold software. Currently supported Sage-N customers who are users of the stand-alone Scaffold program can update to the new software immediately, and users of the Scaffold Batch program that is bundled with Sorcerer can look forward to these benefits in the integrated Scaffold analysis shortly.

(Scaffold Batch, which is the Scaffold program that is integrated with Sorcerer, is an enhanced, scriptable version of the more familiar Scaffold desktop, GUI-based program. It provides many of the same capabilities, but it is designed for automated use with no need for manual intervention when loading and analyzing the data. It is supplied as a premium product configuration that also comes with the standard desktop Scaffold program.)

With the current version 1.7 software, a Scaffold analysis is practically limited to about 500,000 spectra. That sounds like a lot, and it’s certainly enough even for moderately large MudPIT analysis, but Sorcerer itself is capable of searching millions of spectra, as some larger experiments require. So we at Sage-N Research are excited about the new version of Scaffold, V2, which can scale to these sorts of experiments. The method involves ‘thinning’ the spectral data so that poor quality data, which does not contribute to the statistics of the results analysis, is excluded from Scaffold. In future Sorcerer software releases, we expect to provide this feature for automated Scaffold Batch analysis, but for now, it can be applied manually when importing data into Scaffold interactively.

The Scaffold file format has changed in version 2, but the program recognizes the old format and can convert it. So a user can generate a file using the existing version on Sorcerer, and then open it interactively in the new version to use the new features. One potentially useful feature is the ability to merge Scaffold files. For Sorcerer users who have asked about comparing the results of multiple Sequest searches in one Scaffold analysis, this offers a way to do it after the searches have run. Other new features that may be especially useful to Sorcerer users include peptide filters by precursor mass variance and by post-translational modification, and receiver operating characteristic (ROC) curves for peptide false positives.

To use Scaffold V2, customers will need to download the new software from http://www.proteomesoftware.com/Proteome_software_prod_Scaffold_download-main.html. They will also need a new license key, which is available from Sage-N Research for customers under support. Purchasers of standard Scaffold for individual computers may upgrade immediately. Upgrade license keys for Scaffold Batch are also available to current customers upon request. However, Sage-N is currently testing the interoperation of that software with Sorcerer and has already uncovered certain issues, so we recommend that most users wait until the validation is complete.

A single large file can be transferred far faster – sometimes hours faster – than hundreds of small files.

For example, you can use a single click in the web interface both to zip up a directory of DTA files and to submit it to Sorcerer for searching. Sorcerer’s Web GUI can automatically unpack the zip archive, ready for searching. To do this, bring up the web user interface and go to the page where you submit the raw or spectral files for search. Find the parent directory, right-click to bring up a menu, and select the compression function.
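If you prefer to build the archive yourself before transferring it, the same idea can be sketched in a few lines of Python. This is only an illustration using the standard `zipfile` module; the function name `pack_dta_directory` and the `.dta` file extension filter are our assumptions, and it is not what the Sorcerer web interface runs internally.

```python
import zipfile
from pathlib import Path


def pack_dta_directory(dta_dir, archive_path):
    """Pack every .dta file under dta_dir into one zip archive.

    Hypothetical helper for illustration. Returns the number of files written.
    """
    dta_dir = Path(dta_dir)
    count = 0
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(dta_dir.rglob("*.dta")):
            # Store names relative to the parent directory so the archive
            # unpacks into a single folder carrying the original name.
            zf.write(path, arcname=path.relative_to(dta_dir.parent).as_posix())
            count += 1
    return count
```

Calling `pack_dta_directory("run1", "run1.zip")` would produce one file to transfer in place of the many small DTA files in `run1`.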

Go wider than experimental conditions: e.g. 5x the precursor mass tolerance of the mass spec, semi-trypsin, and reversed decoy databases.

For example, if your instrument has an expected precursor mass tolerance of 10 ppm or less, search using 50 ppm. If you expect 95%+ of the peptides to be full tryptic, search using semi-trypsin. And if your workflow handles reversed decoy databases, include reversed peptides in your search.

The 2 reasons for this are: (1) it provides auxiliary information that the post-search validation software can use for filtering, and (2) many validation software tools rely on a “population distribution” for curve-fitting, so it is important to provide enough noise for the algorithm to work properly.

For example, you can gain increased confidence in a particular peptide hit if you searched under widened conditions and the top-scoring hit nevertheless has a mass error of 1 ppm, is fully tryptic, and is non-reversed.
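To make that reasoning concrete, here is a minimal sketch of how a post-search filter might use the auxiliary information from a widened search. The `Hit` record, its field names and the 10 ppm default are hypothetical, chosen only to mirror the example above. Real validation tools use statistical models, not a simple yes/no test like this one.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for the top-scoring match of one spectrum."""
    observed_mass: float     # precursor mass reported for the spectrum (Da)
    theoretical_mass: float  # mass of the matched peptide (Da)
    tryptic_termini: int     # 2 = fully tryptic, 1 = semi-tryptic, 0 = neither
    is_decoy: bool           # True if the match came from a reversed sequence


def ppm_error(observed, theoretical):
    """Signed precursor mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def passes_narrow_criteria(hit, instrument_ppm=10.0):
    """True if a hit from the widened search still meets the narrow,
    experimentally expected conditions."""
    return (
        abs(ppm_error(hit.observed_mass, hit.theoretical_mass)) <= instrument_ppm
        and hit.tryptic_termini == 2
        and not hit.is_decoy
    )
```

A hit found within a 50 ppm, semi-tryptic, target-plus-decoy search space that still passes this narrow test is less likely to be a random match than one that only fits the widened conditions.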

Keep 2000 instead of 500 top preliminary scores (Sp), especially for phosphorylated peptides.

SEQUEST uses a 2-pass approach, whereby the 1st pass keeps only the candidates with the top 500 preliminary scores (Sp) by default. These are in turn analyzed using the cross-correlation score (XCorr). However, we have found that 500 is too low for poorer-quality spectra, large protein sequence databases, or multiple variable modifications, because the population of random Sp values becomes so large that the true hit can be ranked beyond 500 or 1000.

This is especially true for phosphorylated peptides, whose MS-MS spectra are typically dominated by one or two precursor derivatives. The resulting Sp distribution becomes more concentrated, causing true hits to be crowded out.

Increasing the parameter to 2000 (or even higher) will increase search engine sensitivity for general searches, and especially phosphopeptide searches.

For sensitive SEQUEST searching, use Extract-MSn or ReAdW in preference to Mascot Distiller, which generates only 1/2 to 1/5 as many peaks.

For the peak-rich MS-MS spectra typical of ion traps, SEQUEST uses 2 to 5 times more peaks in its score than Mascot. Extract-MSn (in Bioworks) and ReAdW (in Trans-Proteomic Pipeline) faithfully reproduce the MS-MS peaks from the raw file in the peaklist (the SRF, DTA or mzXML file). In contrast, Mascot Distiller, which is optimized for Mascot, tries to distill the measured MS-MS spectra down to only the relevant peaks to the extent possible. For poor-quality spectra, this can result in lost information and sensitivity.

Set ‘print_duplicate_references’ to 10 or higher in SEQUEST for increased protein coverage in most workflows.

Do this when using Scaffold, DTASelect, or Bioworks for your post-SEQUEST analysis. Incorrectly leaving this parameter at its default value is one of the most common reasons why researchers mistakenly obtain lower than expected protein coverage from SEQUEST searches.

The parameter causes SEQUEST to print all the different protein references (up to the specified limit) containing the top peptide hit in its output file, which is then used by the protein inference program to re-assemble the protein from the peptide assignments.

Both Trans-Proteomic Pipeline (PeptideProphet and ProteinProphet) and Rosetta Elucidator re-compute the protein inferences directly from the peptides and do not depend on the reporting of multiple protein references.

Orbitrap or LTQ-FT MS-MS spectra may have a precursor mass error of ~1 amu, which must be taken into account by the search engine.

High mass accuracy instruments can resolve a precursor peak mass to < 10 ppm, but they may occasionally mis-assign a carbon-13 isotope peak as a carbon-12, and possibly vice versa. This is said to affect up to 10% of the spectra, resulting in a mass error of about one neutron. In SORCERER-SEQUEST, be sure to set the “isotope check” option, which allows a small precursor mass tolerance (e.g. 50 ppm) to be applied to the M, M-1, and M+1 masses, where M is the reported precursor mass. In Bioworks SEQUEST (including SEQUEST Cluster), which lacks this option, it may be necessary to search without the benefit of the mass accuracy (i.e. +/- 1.5 amu) to cover the isotope mass error.

For highly charged species, the maximum mass error can be 2 amu. Please contact our Tech Team for a Sorcerer scripting solution to help address this.
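The idea behind an isotope check can be sketched as follows. This is an illustration only and not the SORCERER-SEQUEST code. The function name, the `max_offset` argument and the use of the carbon-13/carbon-12 mass difference (about 1.003355 Da) as the isotope spacing are our assumptions.

```python
C13_C12_DELTA = 1.003355  # Da; 13C minus 12C mass, assumed isotope spacing


def isotope_check_match(reported_mass, candidate_mass, tol_ppm=50.0, max_offset=1):
    """Return the isotope offset k at which candidate_mass lies within tol_ppm
    of reported_mass + k * C13_C12_DELTA, trying k = 0 first.

    Returns None if no offset in [-max_offset, max_offset] matches.
    """
    for k in sorted(range(-max_offset, max_offset + 1), key=abs):
        shifted = reported_mass + k * C13_C12_DELTA
        if abs(shifted - candidate_mass) / candidate_mass * 1e6 <= tol_ppm:
            return k
    return None
```

With the default `max_offset=1` this tests M, M-1 and M+1 at a tight ppm tolerance, so the search keeps the benefit of the instrument’s mass accuracy without opening a +/- 1.5 amu window. Setting `max_offset=2` widens the check to cover the 2 amu case mentioned for highly charged species.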