The SORCERER Methodology
Deep proteomics is a hi-res “biochemical x-ray” for low-abundance proteins in cells. It will revolutionize early detection of cancer and infections, and medical research in general. SorcererScore™ brings proteomics to its tipping point by making deep and reproducible proteomic research possible for the first time. Researchers skilled at deep data analysis will benefit most. No matter how accurate the data, deep insights come from interpreting ambiguous data that lies beyond the reach of fully automated workflows.
For Sage-N Research, SorcererScore represents the culmination of our founding objective: to make a lasting contribution to medicine through math and computing. Academia excels at ideas, but robust integration is the domain of professional companies.
3 R’s of Deep Proteomics
Robust
- (adjective) strong and healthy; vigorous
- (of an object) sturdy in construction.
- (of a process, system, organization, etc.) able to withstand or overcome adverse conditions.
The principles of robust peptide ID reside inside our patent-pending SorcererScore™ technology. This forms the foundation for robust clinical proteomics R&D, including protein quantitation and PTM characterization. Frankly, our analysis shows this is more or less the only way to do precision proteomics. The inherent non-robustness of traditional workflows, where each workflow gives slightly different, ambiguous results, stalls proteomics.
Rigorous
- (adjective) extremely thorough, exhaustive, or accurate.
- (of a rule, system, etc.) strictly applied or adhered to.
- (of a person) adhering strictly or inflexibly to a belief, opinion, or way of doing something.
A rigorous hard-science workflow is like a mathematical proof: it has few degrees of freedom and leads to essentially one true answer. Proteomics using high-accuracy mass spectrometry is fundamentally a “hard science” requiring large-scale simple math on powerful servers. Our solution is remarkably simple, almost obvious after the fact, to anyone trained in the mathematical sciences.
Reproducible
- (adjective) able to bring back into existence again; re-create.
- (of a measurement, experiment, etc.) capable of being reproduced at a different time or place and by different people.
For two decades proteomics was a soft science. Low data accuracy was the initial cause. Once accurate data became available, the remaining cause was the legacy soft-science workflow. A soft-science workflow can never identify LAMPs and hence is useless for high-value research.
We can mathematically prove that a hard-science workflow must:
(1) use a cross-correlation search engine
(2) apply a primary post-search filter that uses only hard mass data, particularly fragment-ion masses.
Empirically, fragment mass analysis must include the matched peak-count.
In other words, any hard-science workflow must fundamentally use our patent-pending SorcererScore technology. And for practical computing (i.e., without requiring 50x more CPUs), the cross-correlation search part requires our patented partial-index methodology.
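To make the two quantities named above concrete, here is a minimal Python sketch of a classical cross-correlation score and a matched peak-count. It is a generic textbook-style illustration, not the SorcererScore implementation and not the partial-index methodology. The function names, the 75-bin offset window, and the 0.02 amu fragment tolerance are all assumptions chosen for the example.

```python
def xcorr(observed, theoretical, max_offset=75):
    """Classical cross-correlation score on two equally binned spectra.

    The score is the dot product at zero offset minus the mean dot product
    over non-zero offsets in [-max_offset, +max_offset]. The window of 75
    bins is an illustrative assumption.
    """
    if len(observed) != len(theoretical):
        raise ValueError("spectra must be binned to the same length")
    n = len(observed)

    def dot_at(offset):
        # Dot product of the theoretical spectrum against the observed
        # spectrum shifted by `offset` bins; out-of-range bins are skipped.
        return sum(
            theoretical[i] * observed[i + offset]
            for i in range(n)
            if 0 <= i + offset < n
        )

    background = sum(
        dot_at(t) for t in range(-max_offset, max_offset + 1) if t != 0
    ) / (2 * max_offset)
    return dot_at(0) - background


def matched_peak_count(observed_mz, theoretical_mz, tolerance=0.02):
    """Count theoretical fragment masses that have an observed peak within
    `tolerance` (amu). The default tolerance is a hypothetical value."""
    return sum(
        1
        for t in theoretical_mz
        if any(abs(o - t) <= tolerance for o in observed_mz)
    )
```

A spectrum that matches only at zero offset scores high, while a flat, noise-like spectrum scores near zero because the background term cancels the zero-offset term. The matched peak-count is a separate, purely mass-based tally of how many predicted fragments were actually seen.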
Putting it all together, the following is a conceptually simple, bare-minimum robust workflow on which SorcererScore is based:
1) 5.5 to >1000 amu mass tolerance
2) Classical cross-correlation search engine, keeping top 100+ results per spectrum
3) Target-decoy search
4) Filter with abs(delta-mass)
The search mass tolerance may be as low as 5.5 amu for extremely clean data, but challenging data may require a very large tolerance to better separate correct IDs from background noise. As an added bonus, this workflow naturally allows for multiple peptide IDs for one spectrum, such as for data-independent acquisition (DIA).
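Steps 3 and 4 of the workflow above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SorcererScore filter itself: the `Psm` record, its field names, and the decoys-over-targets error estimate are hypothetical choices made for the example. The idea it shows is that after a wide-tolerance search, matches whose precursor delta-mass is far from zero can be discarded, and the surviving decoy hits give a rough estimate of the remaining error rate.

```python
from dataclasses import dataclass


@dataclass
class Psm:
    """One peptide-spectrum match (hypothetical record layout)."""
    spectrum_id: str
    peptide: str
    delta_mass: float  # observed minus theoretical precursor mass, in amu
    is_decoy: bool     # True if the peptide came from the decoy database


def filter_by_abs_delta_mass(psms, max_abs_delta):
    """Keep only matches with abs(delta-mass) at or below the cutoff."""
    return [p for p in psms if abs(p.delta_mass) <= max_abs_delta]


def estimate_fdr(psms):
    """Rough false-discovery estimate: decoy hits divided by target hits.

    This simple ratio is an assumption; other target-decoy estimators exist.
    """
    decoys = sum(1 for p in psms if p.is_decoy)
    targets = len(psms) - decoys
    return decoys / targets if targets else 0.0
```

Because several matches per spectrum are kept, the same filter can pass more than one peptide for a single spectrum, which is consistent with the multiple-ID case mentioned above.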
The SILAC workflow uses common SILAC search conditions:
1) +/- 5.5 amu mass tolerance
2) Variable modifications of K +8.0142; R +10.0083; M +15.995
3) Full-tryptic peptides of single-species protein sequences
The SorcererScore SILAC script is run after the search for both peptide ID and SILAC analysis.
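As a small worked illustration of the label masses listed above, the following Python sketch computes the expected heavy-minus-light mass shift of a peptide from its K and R counts. It is not the SorcererScore SILAC script; the function and constant names are hypothetical. The M +15.995 modification is left out because it is listed as a separate variable modification rather than a heavy label.

```python
# Heavy-label mass shifts in amu, taken from the search conditions above.
SILAC_HEAVY_SHIFTS = {"K": 8.0142, "R": 10.0083}


def silac_heavy_shift(peptide):
    """Total heavy-minus-light mass difference for a peptide sequence."""
    return sum(SILAC_HEAVY_SHIFTS.get(residue, 0.0) for residue in peptide)


def heavy_mz_offset(peptide, charge):
    """Spacing between light and heavy precursor peaks on the m/z axis."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return silac_heavy_shift(peptide) / charge
```

For a fully tryptic peptide ending in a single K, the heavy partner sits 8.0142 amu above the light one, which is about 4.0071 on the m/z axis at charge 2.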
Experienced scientists should immediately see that it is simple, that it works, and that it is hypothesis-driven.