Biomarker Discovery
NCBI: db=pubmed; Term="biomarker discovery"[Title/Abstract] AND (("computational approaches"[Title/Abstract] OR "bioinformatics"[Title/Abstract] OR "computational biology"[Title/Abstract]))
Updated: 7 hours 14 min ago
Statistical interpretation of machine learning-based feature importance scores for biomarker discovery.
Bioinformatics. 2012 Apr 25;
Authors: Huynh-Thu VA, Saeys Y, Wehenkel L, Geurts P
Abstract
MOTIVATION: Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple, fast, and their output is easily interpretable by biologists but they can only identify variables that provide a significant amount of information in isolation from the other variables. Since biological processes are expected to involve complex interactions between variables, univariate methods thus potentially miss some informative biomarkers. Variable relevance scores provided by machine learning techniques however are potentially able to highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the determination of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practicians. RESULTS: We evaluated several, existing and novel, procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted in a statistical way, such as p-values, FDRs, or FWERs, for which it is easier to determine a significance level. Experiments were carried out on several artificial problems as well as on real microarray datasets. Although the methods differ in terms of computing times and the tradeoff they achieve in terms of false positives and false negatives, some of them greatly help in the extraction of truly relevant biomarkers and should thus be of great practical interest for biologists and physicians. 
As a side conclusion, our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive. AVAILABILITY AND IMPLEMENTATION: Python source codes of all tested methods, as well as the MATLAB scripts used for data simulation, can be found in the supplementary material. CONTACT: vahuynh@ulg.ac.be, p.geurts@ulg.ac.be.
PMID: 22539669 [PubMed - as supplied by publisher]
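The idea in the abstract above, replacing raw relevance scores with p-values and FDR-adjusted values, can be illustrated with a generic label-permutation scheme. The sketch below is not one of the procedures evaluated by the authors and is not their supplementary code. It assumes a toy relevance score (absolute difference of class means) as a stand-in for a machine learning importance score, and all function names (abs_mean_difference, permutation_pvalues, benjamini_hochberg) are hypothetical.

```python
import random


def abs_mean_difference(X, y):
    """Toy relevance score per feature: |mean(class 1) - mean(class 0)|.

    Assumption: this is only a stand-in for an importance score that a
    machine learning model (e.g. a tree ensemble) would provide.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def permutation_pvalues(X, y, score_fn, n_perm=200, seed=0):
    """Per-feature p-value: share of label permutations whose score is at
    least as large as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    exceed = [0] * len(observed)
    labels = list(y)
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between features and labels
        permuted = score_fn(X, labels)
        for j, (perm_score, obs_score) in enumerate(zip(permuted, observed)):
            if perm_score >= obs_score:
                exceed[j] += 1
    return [(count + 1) / (n_perm + 1) for count in exceed]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Simulated data (an assumption for illustration): 40 samples, 5 features,
# only feature 0 differs between the two classes.
data_rng = random.Random(1)
y_demo = [1] * 20 + [0] * 20
X_demo = [
    [data_rng.gauss(3.0 if label == 1 else 0.0, 1.0)]
    + [data_rng.gauss(0.0, 1.0) for _ in range(4)]
    for label in y_demo
]
p_demo = permutation_pvalues(X_demo, y_demo, abs_mean_difference)
print(benjamini_hochberg(p_demo))
```

A threshold on the adjusted values (for example 0.05) then plays the role of the significance level that is hard to choose on raw relevance scores. Passing a different score_fn swaps in another ranking method without changing the permutation or correction steps.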
Sat, 04/28/2012
Bioinformatic identification of proteins with tissue-specific expression for biomarker discovery.
BMC Med. 2012 Apr 19;10(1):39
Authors: Prassas I, Chrystoja CC, Makawita S, Diamandis EP
Abstract
BACKGROUND: There is an important need for the identification of novel serological biomarkers for the early detection of cancer. Current biomarkers suffer from a lack of tissue-specificity, rendering them vulnerable to non-disease-specific increases. The present study details a strategy to rapidly identify tissue-specific proteins using bioinformatics. METHODS: Previous studies focus on either gene or protein expression databases for the identification of candidates. We developed a strategy that mines six publicly available gene and protein databases for tissue-specific proteins, selects proteins likely to enter the circulation, and integrates proteomic datasets enriched for the cancer secretome, to prioritize candidates for further verification and validation studies. RESULTS: Using colon, lung, pancreas, and prostate cancer as case examples, we identified 48 candidate tissue-specific biomarkers, of which 14 have been previously studied as biomarkers of cancer or benign disease. Twenty-six candidate biomarkers for these four cancer types are proposed. CONCLUSIONS: We present a novel strategy using bioinformatics to identify tissue-specific proteins that are potential cancer serum biomarkers. Investigation of the 26 candidates in disease states of the organs is warranted.
PMID: 22515324 [PubMed - as supplied by publisher]
Sun, 04/22/2012
Proteomic and bioinformatics analyses of mouse liver microsomes.
Int J Proteomics. 2012;2012:832569
Authors: Peng F, Zhan X, Li MY, Fang F, Li G, Li C, Zhang PF, Chen Z
Abstract
Microsomes are derived mostly from endoplasmic reticulum and are an ideal target to investigate compound metabolism, membrane-bound enzyme functions, lipid-protein interactions, and drug-drug interactions. To better understand the molecular mechanisms of the liver and its diseases, mouse liver microsomes were isolated and enriched with differential centrifugation and sucrose gradient centrifugation, and microsome membrane proteins were further extracted from isolated microsomal fractions by the carbonate method. The enriched microsome proteins were arrayed with two-dimensional gel electrophoresis (2DE) and carbonate-extracted microsome membrane proteins with one-dimensional gel electrophoresis (1DE). A total of 183 2DE-arrayed proteins and 99 1DE-separated proteins were identified with tandem mass spectrometry. A total of 259 nonredundant microsomal proteins were obtained and represent the proteomic profile of mouse liver microsomes, including 62 definite microsome membrane proteins. The comprehensive bioinformatics analyses revealed the functional categories of those microsome proteins and provided clues into biological functions of the liver. The systematic analyses of the proteomic profile of mouse liver microsomes not only reveal essential, valuable information about the biological function of the liver, but they also provide important reference data to analyze liver disease-related microsome proteins for biomarker discovery and mechanism clarification of liver disease.
PMID: 22500222 [PubMed - in process]
Sat, 04/14/2012
A Survey on Filter Techniques for Feature Selection in Gene Expression Microarray Analysis.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Feb 13;
Authors: Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, de Schaetzen V, Duque R, Bersini H, Nowe A
Abstract
A plenitude of feature selection (FS) methods is available in the literature, most of them rising as a need to analyse data of very high dimension, usually hundreds or thousands of variables. Such datasets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging or bioinformatics. As a general accepted rule, these methods are grouped in filters, wrappers and embedded methods. More recently, a new group of methods has been added in the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities. This survey ends with a taxonomy proposal for filter FS methods applied in gene expression microarray data analysis.
PMID: 22350210 [PubMed - as supplied by publisher]
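As an illustration of what a filter method in the sense of the survey above does, the sketch below scores each gene independently of any classifier and keeps the top-ranked ones. It assumes Welch's t-statistic as the scoring function, which is only one of many possible filter criteria. The function names, gene names, and expression values are made up for the example, and the code assumes every group has at least two samples with nonzero variance.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def filter_top_k(expression, labels, k):
    """Rank genes by |t| between the two label groups; return the top k names.

    expression maps a gene name to one value per sample; labels holds
    1 or 0 per sample, in the same sample order.
    """
    scored = []
    for gene, values in expression.items():
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        scored.append((abs(welch_t(group1, group0)), gene))
    scored.sort(reverse=True)  # largest absolute t-statistic first
    return [gene for _, gene in scored[:k]]


# Hypothetical expression values for three genes across eight samples.
expression_demo = {
    "geneA": [5.1, 4.9, 5.3, 5.0, 2.0, 2.2, 1.9, 2.1],
    "geneB": [3.0, 3.1, 2.9, 3.2, 3.0, 3.1, 2.8, 3.1],
    "geneC": [1.0, 4.0, 2.0, 3.0, 2.5, 1.5, 3.5, 2.0],
}
labels_demo = [1, 1, 1, 1, 0, 0, 0, 0]
print(filter_top_k(expression_demo, labels_demo, k=2))
```

Because each gene is scored on its own, a filter like this is fast but cannot see interactions between genes, which is the limitation that wrapper and embedded methods try to address.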
Wed, 02/22/2012
msCompare: a framework for quantitative analysis of label-free LC-MS data for comparative biomarker studies.
Mol Cell Proteomics. 2012 Feb 7;
Authors: Hoekman B, Breitling R, Suits F, Bischoff R, Horvatovich P
Abstract
Data processing forms an integral part of biomarker discovery and contributes significantly to the ultimate result. To compare and evaluate various publicly available open source label-free data processing workflows we developed msCompare, a modular framework that allows the arbitrary combination of different feature detection/quantification and alignment/matching algorithms in conjunction with a novel scoring method to evaluate their overall performance. We used msCompare to assess the performance of workflows built from modules of publicly available data processing packages such as SuperHirn, OpenMS and MZmine, and our in-house developed modules, on peptide-spiked urine and trypsin-digested cerebrospinal fluid (CSF) samples. We found that the quality of results varied greatly among workflows and, interestingly, heterogeneous combinations of algorithms often performed better than the homogenous workflows. Our scoring method showed that the union of feature matrices of different workflows outperformed the original homogenous workflows in some cases. msCompare is open-source software (https://trac.nbic.nl/mscompare), and we provide a web-based data processing service for our framework by integration into the Galaxy server of the Netherlands Bioinformatics Center (http://galaxy.nbic.nl/galaxy), in order to allow scientists to determine which combination of modules provides the most accurate processing for their particular LC-MS datasets.
PMID: 22318370 [PubMed - as supplied by publisher]
Fri, 02/10/2012
Mass spectrometry for protein quantification in biomarker discovery.
Methods Mol Biol. 2012;815:199-225
Authors: Wang M, You J
Abstract
Major technological advances have made proteomics an extremely active field for biomarker discovery in recent years due primarily to the development of newer mass spectrometric technologies and the explosion in genomic and protein bioinformatics. This leads to an increased emphasis on larger scale, faster, and more efficient methods for detecting protein biomarkers in human tissues, cells, and biofluids. Most current proteomic methodologies for biomarker discovery, however, are not highly automated and are generally labor-intensive and expensive. More automation and improved software programs capable of handling a large amount of data are essential to reduce the cost of discovery and to increase throughput. In this chapter, we discuss and describe mass spectrometry-based proteomic methods for quantitative protein analysis.
PMID: 22130994 [PubMed - in process]