Navigating the Data Void

April 4, 2023

In my recent conversation with Christian Djurhuus (former Chief Digital Officer, Development at Novo Nordisk), he referred to the pharma industry operating in a "data void". This is not a term that I had heard before, but the more I thought about it, the more I came to love it.

Part of what I love is how it initially seems so discordant with reality: How can we be living in a data void, when the industry is clearly awash in data? Just think about all of the trial data, "real world" patient data, 'omics data, reimbursement data, market data, literature data, meta-research data, etc., etc. There is so much data! So many data vendors!

Indeed, it would seem that there has never been more data available to answer important questions in biomedicine. And yet, once you step back (or zoom out, if you prefer) to look at the overall epistemic landscape, it is immediately clear just how little we know and how much more data would be required to have the certainty we desire.

However, the fact that there is so much uncertainty should not prevent us from acting. On the contrary, I think this fact should spur us to action—but in order to know how best to act, we need to first understand the void and develop strategies for navigating it.

In what follows, I revisit a pair of studies from my academic career that shed light on the nature of the data void in biomedicine.

Validating surrogate endpoints

In 2020, I published a study with my Harvard colleagues in The Journal of the National Cancer Institute. We conducted a systematic review and meta-analysis of all the phase 2 and phase 3 clinical trials evaluating bevacizumab (BVZ, a.k.a. "Avastin") as a treatment for metastatic breast cancer (mBC).

Some quick background on this case: The FDA granted BVZ an accelerated approval for mBC in 2008 based on preliminary trial data that showed the drug prolonged progression-free survival (PFS). However, as the trial data became more mature, it was found that BVZ did not actually improve overall survival (OS). So the FDA withdrew the approval—and there was much controversy.

Thereafter, this case was often cited as showing that PFS was not a good trial-level surrogate for OS. But no study had actually gone back to look at the data. So this is what we did: We looked at all the trials to see what we could conclude about the validity of PFS as a surrogate for OS for this drug and patient population.

Our evidence review found 52 studies of BVZ in mBC, only 7 of which reported sufficient data to be included in the surrogate validity analysis. But when we put all the data together, we found that you could not conclude that PFS is a poor trial-level surrogate for OS. There simply wasn't enough data.

Figure 2 from our JNCI paper. Despite 7 clinical trials, there is still too much uncertainty to conclude whether PFS is a good or bad surrogate endpoint.

I think this is a critical insight into the data void: If it takes more than 7 RCTs to rigorously determine whether a surrogate endpoint is valid for a particular (type of) intervention in a particular patient population, then rigorous surrogate validation is going to be incredibly rare. It is simply not feasible to conduct all the trials we would need.
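To make the idea of trial-level surrogate validation concrete, here is a minimal sketch of the basic statistical move: regress each trial's treatment effect on OS against its treatment effect on PFS, weighting by trial size, and ask how much of the OS-effect variation the PFS effects explain. All of the numbers below are invented for illustration; they are not the data from our paper.

```python
# Hypothetical illustration of trial-level surrogate validation:
# regress each trial's log hazard ratio (HR) for OS on its log HR for
# PFS, weighting by trial size. All numbers below are invented.
import numpy as np

# Columns: log HR for PFS, log HR for OS, weight (~ trial size),
# for 7 hypothetical trials
trials = np.array([
    [-0.45, -0.10, 722],
    [-0.30, -0.05, 615],
    [-0.55, -0.20, 480],
    [-0.25,  0.02, 318],
    [-0.40, -0.12, 284],
    [-0.35,  0.04, 247],
    [-0.50, -0.08, 206],
])
x, y, w = trials[:, 0], trials[:, 1], trials[:, 2]

# Weighted least squares fit of y = a + b*x
X = np.column_stack([np.ones_like(x), x])
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Weighted R^2: the share of weighted OS-effect variance explained
# by the PFS effects. A valid surrogate needs this to be high AND
# precisely estimated -- hard to achieve with only 7 trials.
resid = y - X @ beta
ybar = np.average(y, weights=w)
r2 = 1 - (w * resid**2).sum() / (w * (y - ybar)**2).sum()
print(f"slope = {beta[1]:.2f}, weighted R^2 = {r2:.2f}")
```

With so few trials, the confidence interval around that R-squared is wide, which is exactly the problem: the point estimate alone cannot settle whether the surrogate is good or bad.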

Uncertainty about surrogates is thus going to be a feature of the data void.

Testing precision medicines

In 2020, I also published a study in Science Translational Medicine that mapped the evidence landscape for a precision medicine—the use of BRAF-inhibitors (BRAFi) as a treatment for metastatic melanoma (MM) patients whose tumors are found to have a V600 mutation.

Some quick background on this case: Two BRAFi's, vemurafenib and dabrafenib, were approved by the FDA in 2011 and 2013, respectively, as treatments for MM. These therapies are hypothesized to only be beneficial for patients whose MM tumors have V600 mutations, and indeed, the regulatory approvals for vemurafenib and dabrafenib are explicit that these treatments should only be given to patients whose tumors have been tested with the products' companion diagnostic and found to have the relevant BRAF mutation.

But to truly know (i.e., to have supporting data and evidence) whether a BRAF-mutated tumor is necessary to benefit from BRAFi therapies, there need to have been RCTs that (a) tested everyone for their BRAF-mutation status (ideally using the companion diagnostic), and (b) enrolled both BRAF-mutated and BRAF-wildtype patients (i.e., those whose tumors do not show a V600 mutation). Furthermore, the results of such an RCT would need to show that BRAF-mutated patients did better on the BRAF-inhibitor therapy than on the control therapy, while BRAF-wildtype patients did better on the control therapy than on the BRAF-inhibitor therapy.
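The design requirement above is, in statistical terms, a test for a treatment-by-biomarker interaction: the biomarker is predictive only if the treatment effect differs by marker status. A minimal sketch of that logic, using entirely invented response counts (not data from any real trial):

```python
# Hypothetical sketch of a biomarker-stratified trial analysis:
# a biomarker is "predictive" when the treatment effect differs by
# marker status (a treatment-by-biomarker interaction).
# All counts below are invented for illustration.
import math

# (responders, total) per arm, per biomarker stratum
data = {
    ("mutated",  "braf_inhibitor"): (48, 100),
    ("mutated",  "control"):        (20, 100),
    ("wildtype", "braf_inhibitor"): (18, 100),
    ("wildtype", "control"):        (22, 100),
}

def log_odds(responders, total):
    return math.log(responders / (total - responders))

# Treatment effect (log odds ratio) within each stratum
effect = {
    s: log_odds(*data[(s, "braf_inhibitor")]) - log_odds(*data[(s, "control")])
    for s in ("mutated", "wildtype")
}

# The interaction term: how much the treatment effect differs by
# marker status. A large, precisely-estimated interaction is what the
# "predictive biomarker" claim requires -- and what cannot be estimated
# if wildtype patients are never given the therapy.
interaction = effect["mutated"] - effect["wildtype"]
print(effect, interaction)
```

The point of the sketch is the last line: without wildtype patients in both arms, the interaction term simply cannot be computed, no matter how much other data exists.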

Given the FDA approval, when we undertook this study, we expected to find strong evidence that BRAF mutation testing was predictive of benefit with BRAFi therapy for MM patients, as well as weaker or disconfirming evidence that BRAF testing was useful for other types of therapies for MM patients.

In principle, BRAF V600 mutation testing could also be prognostically useful, regardless of therapy; or it may select a clinically-meaningful subgroup that can benefit from other kinds of therapies. For our review, we tried to capture and summarize the literature to illuminate all of these possibilities.

What we found was surprising to us. There was no rigorous evidence to show that BRAF-mutation testing was predictive of benefit for MM patients. This was (it seems) because it was believed to be unethical to expose MM patients to the therapy if they lacked a V600 mutation. So the trials necessary to answer this question were never conducted.

And unfortunately, this epistemic state of affairs appears to be the general rule, and not the exception, in precision medicine: We often do not have the data and evidence to support claims that either (a) a particular companion diagnostic is a necessary component of the therapeutic ensemble; or (b) that a targeted therapy (like BRAFi's) only provides benefit for the hypothesized population.

Now to be clear: This doesn't mean that BRAFi's are not beneficial for MM. It also doesn't mean that we shouldn't be trying to develop or treat patients with precision medicines. It is about understanding the data void. If we will not conduct the necessary biomarker-stratified trials (as I described above), then there will always be a lack of data to support the use of precision medicines.

Illuminating the void

In some of my old academic circles I had the reputation of being a nihilist. This always made me chuckle, because nothing could be further from the truth. But I understand why my colleagues or critics might have seen me this way. Much of my work was drawing attention to the data void (although I didn't have that nice, pithy way of putting it back then).

But there is an important difference between nihilism—which denies that we can have knowledge or certainty—and my philosophical beliefs—which emphasize the need for systematic evidence assessment, the value of scientific judgment, and the importance of clearly distinguishing the known from the unknown.

Given how much is unknown in biomedicine—about surrogate endpoints, about precision medicines, about pragmatic clinical trials, about digital health technologies, take your pick!—I believe that ignoring the unknowns, and "flying blind" as it were, is a mistake. And in the context of clinical trials, it is a mistake that will waste incredible amounts of resources and harm patients.

I believe that pretending we have knowledge when we do not is even worse.

Thus, I think of my work, and now Prism's mission, as trying to illuminate the data void—to illuminate the known and unknown. This is what a heuristic-driven philosophy of science can help us to do. This is what good, philosophically-grounded ontologies and data visualization can help us to do.

The data void is real. We are living in it. But if we help our clients and partners make the best of the data we have; if we think carefully and creatively about the evidence landscape we're exploring, I believe we can navigate the void safely and efficiently—making better decisions about how to better allocate our scarce scientific resources and getting better treatments to patients more quickly.
