Changing nature of software in Proteomics (or why you can’t buy great software for SILAC)

by David.Chiang@SageNResearch.com

Proteomics technology is now a robust discovery tool, at least in capable hands with the right tools, for characterizing post-translational modifications such as phosphorylation, right alongside gene expression and cellular imaging for tumor and stem cell research.

However, the complexity, scale, and criticality of the data from a modern mass spectrometer such as an Orbitrap Velos are well beyond the capability of desktop PCs and require specialized infrastructure IT solutions.

When losing data becomes catastrophic rather than merely annoying, it is time to move beyond PCs into robust infrastructure solutions, such as Sage-N Research’s SORCERER Enterprise system. Unlike traditional business-oriented IT systems, the SORCERER Enterprise system is optimized for the large multi-gigabyte data files of proteomics research.

Robust servers and storage systems provide the capacity, reliability, and throughput for storing and analyzing proteomics mass spec data that inexpensive PCs cannot. For example, a typical throughput of 300GB of raw data per week for a single mass spec will fill up a PC in less than a month. In addition, the lower-grade disk drives used in cost-sensitive, consumer-oriented PCs and external USB drives can lead to costly data loss and system downtime.
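The fill-up estimate is simple arithmetic, sketched below. The 300GB-per-week rate comes from the text above; the 1TB disk size is only an assumption for illustration (the article does not state one), and the function and constant names are hypothetical.

```python
def weeks_until_full(disk_capacity_gb: float, weekly_data_gb: float) -> float:
    """Return how many weeks of acquisition a disk can hold at a constant data rate."""
    if weekly_data_gb <= 0:
        raise ValueError("weekly_data_gb must be positive")
    return disk_capacity_gb / weekly_data_gb


WEEKLY_RAW_DATA_GB = 300    # per-instrument rate quoted in the text
ASSUMED_PC_DISK_GB = 1000   # assumption: a 1TB desktop drive, not a figure from the text

weeks = weeks_until_full(ASSUMED_PC_DISK_GB, WEEKLY_RAW_DATA_GB)
print(f"{weeks:.1f} weeks (~{weeks * 7:.0f} days)")
```

Under that assumption the drive is full in a little over three weeks, before counting the operating system, search results, or any backup copies.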

In addition, the nature of the data analysis needed for proteomics is changing, as it becomes more akin to hedge fund data mining than to an administrative assistant running an Excel spreadsheet. This is especially true for quantitation and ETD data analyses, where the field has not settled onto a de facto one-size-fits-all methodology, and where some semi-customization of the analysis to query and adapt to a particular data set will be necessary. This is why the large-scale SILAC papers are always done by research groups with their own bioinformatics resource, and why just about any off-the-shelf software you can download or buy will probably not work well for your needs without some customization.

Why does quantitation or ETD software need to be semi-customized? Consider the analogy of getting “accounting software” for your flower shop. You download prototype software for “accounting” based on someone’s “accounting software” poster, only to find it was for a dog-walking service with different input parameters that don’t apply to you. Such is the case for SILAC, where everyone’s requirement is a little different. But the good news is that, just as in accounting, there is significant commonality, so the best scenario is to invest in a flexible “platform” (i.e. SORCERER) that provides 80% of what’s common and customize the rest with scripting. (The SORCERER Enterprise product includes custom scripting support for a number of hours per month for this purpose.)

Hedge fund IT serves as an excellent model for how efficient proteomics data analysis works. On one side is the high-value trader (the George Soros type, dealing with high-value information), who combines his deep expertise in currency exchanges with the data analysis to discover new potential trades. On the other side are large computer servers and storage systems that run semi-custom software to slice and dice the data in different ways to find the billion-dollar hidden trend. In the middle are informatics specialists (who by the way are not “programmers” in the same way that “linguists” are not “writers”) who bridge the two sides by scripting new visualization or data presentation routines. Here, you can bet that the trader focuses on the trading, his unique value-added, and does not waste time installing software and maintaining computers.

Similarly, discovery proteomics should have the biologist focus on the science and let specialized proteomics IT experts handle the backroom servers and integrated storage systems. This is in stark contrast to most proteomics labs today, where the same few people try to run big experiments on little PCs and waste time managing computers rather than doing science.

Fifty years ago, in the 1960s when business computing was new, large retailers like Sears bought computer hardware, software, services, and support all together from IBM so that they could focus on their own business and let IBM do all the backroom accounting. Today, that combined “computing appliance plus support” model, as offered with the SORCERER Enterprise system, is also an excellent fit for translational proteomics: it allows scientists operating million-dollar mass specs to focus on the science and lets Sage-N Research handle the backroom proteomics IT analysis, storage, and workflow customization.

(Note: Sage-N Research is expanding and is interested in competent script developers familiar with Linux, Perl, and especially Lua to work as contractors with our growing base of SORCERER Enterprise clients. Interested parties can send their resumes or CVs to jobs@sagenresearch.com.)
