
Thursday, August 25, 2011

Biomarker Studies: samples, hypothesis, and statistics


In a recent post on BiomarkerBlog, David Mosedale highlights a common problem with the design of biomarker discovery studies: reductionist selection of clinical samples.  While it is tempting to begin the search for new biomarkers in highly contrasted clinical samples (e.g., healthy vs. diseased, benign early cancer vs. advanced metastatic cancer), this approach is almost guaranteed to yield over-optimistic results that do not translate easily to the real, complex world.  As a solution, the author proposes that the design of the initial biomarker discovery study should reflect the intended application of the biomarker more accurately by including a spectrum of cases representative of the true complexity of the target patient population.  While I fully agree with David’s point, I would like to suggest an alternative view on this issue.

I would argue that the root cause of the disconnect between biomarker discovery and translation to medical use is the application of the right statistics to the wrong questions (i.e., the wrong statistical hypotheses).  Based on this premise, is there a fundamental issue with biomarker exploration using highly contrasted clinical samples?  I would argue that this approach can be useful as long as it is recognized for what it is: an initial screening step designed to test the minimal hypothesis of whether a distinguishing factor (or factors) can be detected under artificially contrasted conditions.  Thus, the strength of the statistical association between the distinguishing factor and the selected sample phenotypes reflects only the pre-defined sample choice, not the true strength of the factor’s association in the real-world population.  Hence, the use of this approach should be limited to selecting biomarker candidates that will then be studied in a representative clinical sample.
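To make this point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the four patient groups, their marker means and spreads, and the mix used for the "representative" sample are hypothetical, not data from any study. It compares how well the same marker separates cases from controls in a highly contrasted sample versus a more realistic one, using the area under the ROC curve (the probability that a randomly chosen case scores higher than a randomly chosen control).

```python
import random


def auc(cases, controls):
    """Probability that a random case scores above a random control (ties count half)."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def simulate(n=300, seed=0):
    """Return (AUC in contrasted sample, AUC in representative sample)."""
    rng = random.Random(seed)
    # Hypothetical marker levels in arbitrary units; all means and SDs are assumptions.
    healthy = [rng.gauss(0.0, 1.0) for _ in range(n)]
    lookalike = [rng.gauss(0.8, 1.0) for _ in range(n)]  # non-diseased, similar symptoms
    early = [rng.gauss(1.0, 1.0) for _ in range(n)]
    advanced = [rng.gauss(3.0, 1.0) for _ in range(n)]

    # Discovery-style design: advanced disease vs. healthy volunteers only.
    contrasted = auc(advanced, healthy)

    # Assumed real-world mix: mostly early cases, mostly symptomatic non-diseased controls.
    cases = early + advanced[: n // 4]
    controls = lookalike + healthy[: n // 4]
    representative = auc(cases, controls)
    return contrasted, representative


contrasted, representative = simulate()
print(f"AUC, contrasted sample:     {contrasted:.2f}")
print(f"AUC, representative sample: {representative:.2f}")
```

Under these assumed distributions the marker looks close to perfect in the contrasted sample and only modest in the representative one, even though nothing about the marker itself has changed. Only the question being asked of it has changed.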

Another case of inappropriate hypothesis definition is often encountered in the so-called validation of candidate biomarkers, where a subset of the clinical sample used for discovery is used to determine the predictive value of the candidate biomarker using techniques such as Receiver Operating Characteristic (ROC) curve analysis.  Here again, the predictive values (Positive and Negative Predictive Values) derived from this approach are skewed by the initial sample selection, offering limited information about the predictive value of candidate biomarkers in the real world.
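The dependence of predictive values on sample composition follows directly from Bayes' rule, and a short sketch shows how large the effect can be. The sensitivity, specificity, and prevalence figures below are hypothetical, and the sketch makes the simplifying assumption that sensitivity and specificity stay fixed when moving from the study sample to the target population (in practice they may shift too).

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical marker: 90% sensitivity and 90% specificity (assumed values).
# A 1:1 case-control study sample implies 50% "prevalence";
# the assumed target population has 5% prevalence.
for label, prevalence in [("study sample", 0.50), ("target population", 0.05)]:
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"{label}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
# study sample: PPV = 0.90, NPV = 0.90
# target population: PPV = 0.32, NPV = 0.99
```

With these assumed numbers, a positive result that looks 90% reliable in the balanced study sample is right only about one time in three in the lower-prevalence population.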

So what is the solution to this somewhat frustrating trend in biomarker research?  I would argue that biomarker scientists should learn to ask statisticians the right questions, and that statisticians should learn to challenge biomarker scientists about the actual hypothesis they wish to test.



Thierry Sornasse for Integrated Biomarker Strategy
