Biomarkers in Computational Biology

Proteomics and Systems Biology for Understanding Diabetic Nephropathy.

Biomarkers and Systems Biology - 7 hours 19 min ago

J Cardiovasc Transl Res. 2012 May 12;

Authors: Starkey JM, Tilton RG

Abstract
Like many diseases, diabetic nephropathy is defined in a histopathological context and studied using reductionist approaches that attempt to ameliorate structural changes. Novel technologies in mass spectrometry-based proteomics have the ability to provide a deeper understanding of the disease beyond classical histopathology, redefine the characteristics of the disease state, and identify novel approaches to reduce renal failure. The goal is to translate these new definitions into improved patient outcomes through diagnostic, prognostic, and therapeutic tools. Here, we review progress made in studying the proteomics of diabetic nephropathy and provide an introduction to the informatics tools used in the analysis of systems biology data, while pointing out statistical issues for consideration. Novel bioinformatics methods may increase biomarker identification, and other tools, including selected reaction monitoring, may hasten clinical validation.

PMID: 22581264 [PubMed - as supplied by publisher]

Statistical interpretation of machine learning-based feature importance scores for biomarker discovery.

Biomarker Discovery - 7 hours 19 min ago

Bioinformatics. 2012 Apr 25;

Authors: Huynh-Thu VA, Saeys Y, Wehenkel L, Geurts P

Abstract
MOTIVATION: Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple and fast, and their output is easily interpretable by biologists, but they can only identify variables that provide a significant amount of information in isolation from the other variables. Since biological processes are expected to involve complex interactions between variables, univariate methods potentially miss some informative biomarkers. Variable relevance scores provided by machine learning techniques, however, are potentially able to highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the determination of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practitioners. RESULTS: We evaluated several existing and novel procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted in a statistical way, such as p-values, FDRs, or FWERs, for which it is easier to determine a significance level. Experiments were carried out on several artificial problems as well as on real microarray datasets. Although the methods differ in terms of computing times and the tradeoff they achieve in terms of false positives and false negatives, some of them greatly help in the extraction of truly relevant biomarkers and should thus be of great practical interest for biologists and physicians. As a side conclusion, our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive. AVAILABILITY AND IMPLEMENTATION: Python source code for all tested methods, as well as the MATLAB scripts used for data simulation, can be found in the supplementary material. CONTACT: vahuynh@ulg.ac.be, p.geurts@ulg.ac.be.

PMID: 22539669 [PubMed - as supplied by publisher]
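
The general idea in the abstract above, replacing a raw relevance score with a p-value or an FDR, can be illustrated with a label-permutation scheme. The sketch below is not the authors' code and does not claim to reproduce any of the procedures they evaluated. The function names, the toy score (absolute difference of class means) and the parameters are all assumptions made for illustration. Any score function with the same signature, for instance one returning importances from a tree ensemble, could be passed in instead.

```python
import random
from statistics import mean


def mean_difference_score(X, y):
    """Toy relevance score (an assumption, not from the paper):
    absolute difference between the class means of each feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores


def permutation_p_values(X, y, score_fn, n_permutations=200, seed=0):
    """Estimate one p-value per feature by shuffling the class labels
    and recomputing the relevance scores under this null."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    exceed = [0] * len(observed)
    labels = list(y)
    for _ in range(n_permutations):
        rng.shuffle(labels)
        null_scores = score_fn(X, labels)
        for j, (s_null, s_obs) in enumerate(zip(null_scores, observed)):
            if s_null >= s_obs:
                exceed[j] += 1
    # The +1 terms keep every estimated p-value strictly positive.
    return [(c + 1) / (n_permutations + 1) for c in exceed]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Features whose adjusted p-value falls below a chosen level (0.05 is a common but arbitrary choice) would then form the selected subset, which gives the threshold that a raw ranking lacks.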

Advancing the sensitivity of selected reaction monitoring-based targeted quantitative proteomics.

Biomarkers and Systems Biology - Tue, 05/15/2012

Proteomics. 2012 Apr;12(8):1074-92

Authors: Shi T, Su D, Liu T, Tang K, Camp DG, Qian WJ, Smith RD

Abstract
Selected reaction monitoring (SRM) - also known as multiple reaction monitoring (MRM) - has emerged as a promising high-throughput targeted protein quantification technology for candidate biomarker verification and systems biology applications. A major bottleneck for current SRM technology, however, is insufficient sensitivity, e.g., for detecting low-abundance biomarkers likely present at the low ng/mL to pg/mL range in human blood plasma or serum, or extremely low-abundance signaling proteins in cells or tissues. Herein, we review recent advances in methods and technologies, including front-end immunoaffinity depletion, fractionation, and selective enrichment of target proteins/peptides (including posttranslational modifications), as well as advances in MS instrumentation, which have significantly enhanced the overall sensitivity of SRM assays and enabled the detection of low-abundance proteins at low- to sub-ng/mL levels in human blood plasma or serum. General perspectives on the potential of achieving sufficient sensitivity for detection of pg/mL level proteins in plasma are also discussed.

PMID: 22577010 [PubMed - in process]

Development of systems biology-oriented biomarkers by permuted stepwise regression for the monitoring of seasonal allergic rhinitis treatment effects.

Biomarkers and Systems Biology - Sat, 05/12/2012

J Immunol Methods. 2012 Feb 12;

Authors: Baars EW, Nierop AF, Savelkoul HF

Abstract
BACKGROUND: The immune system, a complex set of integrated responses, often cannot be explained, predicted, or monitored by examining its separate components as biomarkers. Combining different components may therefore be a suitable approach to develop relevant biomarkers reflecting immune system functioning in an appropriate way. METHODS: Here we compute and test pattern variables that should reflect immune system functioning on the systems level. Computation was based on a dataset (from a randomized controlled trial comparing two routes of administration) of allergen-specifically induced expression levels of cytokines (IL-1β, IL-5, IL-10, IL-12, IL-13, IL-17, IFN-γ and TNF-α) and symptom severity scores from 22 seasonal allergic rhinitis (SAR) patients measured before and after six weeks of treatment with medicinal products containing Citrus and Cydonia. By means of stepwise regression analyses we explored and tested pattern variables of the immunological data using permuted stepwise regression (PStR) to distinguish optimally between (immunological) baseline and post-baseline data for the whole treatment group (22 patients) and the two separate treatment groups (11 patients in each group). The validity of the stepwise selection method for the computed pattern variables was tested by means of random permutation tests and evaluated with the cross-validated correct rate of classification (CV correct). RESULTS: For the total group a pattern variable was computed with three variables: IL-10 (day 7), TNF-α (day 1) and IL-10 (day 1) (CV correct: 0.91; p<0.001; R²=0.66), demonstrating a small improvement from the model with IL-10 (day 7) only (CV correct: 0.84; p<0.001; R²=0.47). For the subcutaneous injection group a pattern variable was computed with four variables: IL-10 (day 7), IL-10 (day 1), IL-17 (day 7) and IFN-γ (day 7) (CV correct: 0.90; p<0.01; R²=0.78), demonstrating a very small improvement from the model with IL-10 (day 7) only (CV correct: 0.86; p<0.01; R²=0.58). For the nasal spray group a pattern variable was computed with three variables: IL-10 (day 7), TNF-α (day 1) and IL-10 (day 1) (CV correct: 0.95; p<0.01; R²=0.79), demonstrating a moderate improvement from the model with IL-10 (day 7) only (CV correct: 0.79; p<0.05; R²=0.37). CONCLUSION/DISCUSSION: In this study three robust systems biology-oriented biomarkers for the monitoring of SAR were computed that demonstrated small to moderate improvement compared to monitoring of a single cytokine (IL-10 (day 7)) (CV correct improvement: 0.07 (total group), 0.04 (subcutaneous injection group), 0.16 (nasal spray group)). Further computation and biomarker validation with larger datasets, including data from healthy persons and SAR patients, are indicated.

PMID: 22349124 [PubMed - as supplied by publisher]
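
The CV correct improvements quoted in the conclusion above follow directly from the values reported in the results. As a quick arithmetic check, the sketch below subtracts the single-cytokine figure from the pattern-variable figure for each group. The dictionary layout and the function name are illustrative choices, and only the numbers come from the abstract.

```python
# (pattern-variable CV correct, IL-10 (day 7)-only CV correct), as reported above.
reported_cv_correct = {
    "total group": (0.91, 0.84),
    "subcutaneous injection group": (0.90, 0.86),
    "nasal spray group": (0.95, 0.79),
}


def cv_improvement(pattern_variable, single_cytokine):
    """Difference in cross-validated correct classification rate,
    rounded to two decimals to match the precision of the abstract."""
    return round(pattern_variable - single_cytokine, 2)


improvements = {
    group: cv_improvement(pattern, single)
    for group, (pattern, single) in reported_cv_correct.items()
}

for group, gain in improvements.items():
    print(f"{group}: {gain:.2f}")
```

The three differences agree with the 0.07, 0.04 and 0.16 stated in the conclusion.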

Bioinformatic identification of proteins with tissue-specific expression for biomarker discovery.

Biomarker Discovery - Sat, 04/28/2012

BMC Med. 2012 Apr 19;10(1):39

Authors: Prassas I, Chrystoja CC, Makawita S, Diamandis EP

Abstract
BACKGROUND: There is an important need for the identification of novel serological biomarkers for the early detection of cancer. Current biomarkers suffer from a lack of tissue-specificity, rendering them vulnerable to non-disease-specific increases. The present study details a strategy to rapidly identify tissue-specific proteins using bioinformatics. METHODS: Previous studies focus on either gene or protein expression databases for the identification of candidates. We developed a strategy that mines six publicly available gene and protein databases for tissue-specific proteins, selects proteins likely to enter the circulation, and integrates proteomic datasets enriched for the cancer secretome, to prioritize candidates for further verification and validation studies. RESULTS: Using colon, lung, pancreas, and prostate cancer as case examples, we identified 48 candidate tissue-specific biomarkers, of which 14 have been previously studied as biomarkers of cancer or benign disease. Twenty-six candidate biomarkers for these four cancer types are proposed. CONCLUSIONS: We present a novel strategy using bioinformatics to identify tissue-specific proteins that are potential cancer serum biomarkers. Investigation of the 26 candidates in disease states of the organs is warranted.

PMID: 22515324 [PubMed - as supplied by publisher]

Proteomic and bioinformatics analyses of mouse liver microsomes.

Biomarker Discovery - Sun, 04/22/2012

Int J Proteomics. 2012;2012:832569

Authors: Peng F, Zhan X, Li MY, Fang F, Li G, Li C, Zhang PF, Chen Z

Abstract
Microsomes are derived mostly from endoplasmic reticulum and are an ideal target to investigate compound metabolism, membrane-bound enzyme functions, lipid-protein interactions, and drug-drug interactions. To better understand the molecular mechanisms of the liver and its diseases, mouse liver microsomes were isolated and enriched with differential centrifugation and sucrose gradient centrifugation, and microsome membrane proteins were further extracted from isolated microsomal fractions by the carbonate method. The enriched microsome proteins were arrayed with two-dimensional gel electrophoresis (2DE) and carbonate-extracted microsome membrane proteins with one-dimensional gel electrophoresis (1DE). A total of 183 2DE-arrayed proteins and 99 1DE-separated proteins were identified with tandem mass spectrometry. A total of 259 nonredundant microsomal proteins were obtained and represent the proteomic profile of mouse liver microsomes, including 62 definite microsome membrane proteins. The comprehensive bioinformatics analyses revealed the functional categories of those microsome proteins and provided clues into biological functions of the liver. The systematic analyses of the proteomic profile of mouse liver microsomes not only reveal essential, valuable information about the biological function of the liver, but they also provide important reference data to analyze liver disease-related microsome proteins for biomarker discovery and mechanism clarification of liver disease.

PMID: 22500222 [PubMed - in process]

A Survey on Filter Techniques for Feature Selection in Gene Expression Microarray Analysis.

Biomarker Discovery - Sat, 04/14/2012

IEEE/ACM Trans Comput Biol Bioinform. 2012 Feb 13;

Authors: Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, de Schaetzen V, Duque R, Bersini H, Nowe A

Abstract
A plenitude of feature selection (FS) methods is available in the literature, most of them arising from the need to analyse data of very high dimension, usually hundreds or thousands of variables. Such datasets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging or bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers and embedded methods. More recently, a new group of methods has been added to the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities. This survey ends with a taxonomy proposal for filter FS methods applied in gene expression microarray data analysis.

PMID: 22350210 [PubMed - as supplied by publisher]
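
To make the notion of a filter method in the survey above concrete: a filter scores each gene independently of any classifier and keeps the top-ranked ones. The sketch below ranks genes by the absolute value of Welch's t-statistic between two sample groups. It is a minimal illustration, not a method taken from the survey itself. The function names, the input layout (a dict mapping gene name to per-sample expression values) and the gene names in the usage note are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0.0:
        # Degenerate case: no within-group variation at all.
        return 0.0 if diff == 0.0 else float("inf")
    return diff / se


def rank_genes(expression, labels, top_k):
    """Return the top_k gene names ranked by |t| between label 1 and label 0.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list of 0/1 class labels, aligned with the sample order
    """
    scores = {}
    for gene, values in expression.items():
        case = [v for v, lab in zip(values, labels) if lab == 1]
        ctrl = [v for v, lab in zip(values, labels) if lab == 0]
        scores[gene] = abs(welch_t(case, ctrl))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

For example, rank_genes({"geneA": [...], "geneB": [...]}, labels, top_k=1) returns the single gene whose expression differs most between the two groups. Because each gene is scored on its own, a filter like this is fast but cannot detect genes that are informative only in combination.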

Data integration and systems biology approaches for biomarker discovery: Challenges and opportunities for multiple sclerosis.

Biomarkers and Systems Biology - Wed, 02/22/2012

J Neuroimmunol. 2012 Jan 24;

Authors: Villoslada P, Baranzini S

Abstract
New "omic" technologies and their application to systems biology approaches offer new opportunities for biomarker discovery in complex disorders, including multiple sclerosis (MS). Recent studies using massive genotyping, DNA arrays, antibody arrays, proteomics, glycomics, and metabolomics from different tissues (blood, cerebrospinal fluid, brain) have identified many molecules associated with MS, defining both susceptibility and functional targets (e.g., biomarkers). Such discoveries involve many different levels in the complex organizational hierarchy of humans (DNA, RNA, protein, etc.), and integrating these datasets into a coherent model with regard to MS pathogenesis would be a significant step forward. Given the dynamic and heterogeneous nature of MS, validating biomarkers is mandatory. To develop accurate markers of disease prognosis or therapeutic response that are clinically useful, combining molecular, clinical, and imaging data is necessary. Such an integrative approach would pave the way towards better patient care and more effective clinical trials that test new therapies, thus bringing the paradigm of personalized medicine in MS one step closer.

PMID: 22281286 [PubMed - as supplied by publisher]

msCompare: a framework for quantitative analysis of label-free LC-MS data for comparative biomarker studies.

Biomarker Discovery - Wed, 02/22/2012

Mol Cell Proteomics. 2012 Feb 7;

Authors: Hoekman B, Breitling R, Suits F, Bischoff R, Horvatovich P

Abstract
Data processing forms an integral part of biomarker discovery and contributes significantly to the ultimate result. To compare and evaluate various publicly available open source label-free data processing workflows we developed msCompare, a modular framework that allows the arbitrary combination of different feature detection/quantification and alignment/matching algorithms in conjunction with a novel scoring method to evaluate their overall performance. We used msCompare to assess the performance of workflows built from modules of publicly available data processing packages such as SuperHirn, OpenMS and MZmine, and our in-house developed modules, on peptide-spiked urine and trypsin-digested cerebrospinal fluid (CSF) samples. We found that the quality of results varied greatly among workflows and, interestingly, heterogeneous combinations of algorithms often performed better than the homogeneous workflows. Our scoring method showed that the union of feature matrices of different workflows outperformed the original homogeneous workflows in some cases. msCompare is open-source software (https://trac.nbic.nl/mscompare), and we provide a web-based data processing service for our framework by integration into the Galaxy server of the Netherlands Bioinformatics Center (http://galaxy.nbic.nl/galaxy), in order to allow scientists to determine which combination of modules provides the most accurate processing for their particular LC-MS datasets.

PMID: 22318370 [PubMed - as supplied by publisher]

Mass spectrometry for protein quantification in biomarker discovery.

Biomarker Discovery - Fri, 02/10/2012

Methods Mol Biol. 2012;815:199-225

Authors: Wang M, You J

Abstract
Major technological advances have made proteomics an extremely active field for biomarker discovery in recent years due primarily to the development of newer mass spectrometric technologies and the explosion in genomic and protein bioinformatics. This leads to an increased emphasis on larger scale, faster, and more efficient methods for detecting protein biomarkers in human tissues, cells, and biofluids. Most current proteomic methodologies for biomarker discovery, however, are not highly automated and are generally labor-intensive and expensive. More automation and improved software programs capable of handling a large amount of data are essential to reduce the cost of discovery and to increase throughput. In this chapter, we discuss and describe mass spectrometry-based proteomic methods for quantitative protein analysis.

PMID: 22130994 [PubMed - in process]