In this post, I defend the bold claim that our platform can deliver “10x the insights” at “50x the speed”. I offer a 300-word, just-the-punchline version of this defense, followed by a longer presentation that unpacks the details and assumptions.
Prism replicated a published 2020 analysis that attempted to quantify the number of clinical trials that use digital health technologies. Based on the dates reported in the publication, we estimate that it took this team of 4 about 125 days to produce essentially 2 findings (i.e., “insights”): (1) a bar chart that shows the growth of trials using digital health technologies over time; and (2) a table that shows the top 5 digital health tech keywords that matched clinical trial records.
In 2 days, we were able to replicate the study and publish the complete results on our platform. Our analysis includes the same chart showing trials over time and the top keywords. Additionally, we include a breakdown of trials by therapeutic area; a breakdown of trials by status; the volume of trials over time, broken down by type of sponsor; a 4-dimensional bubble chart showing volume of trials over time, broken down by therapeutic area and sponsor; and the geographic distribution of trial sites.
All of our charts/results are also dynamically linked to filters, allowing the user to narrow the dataset and see updated versions of every chart and table. This means that each chart in our analysis actually contains within it thousands of potential insights (the value of which, of course, depends on the user’s particular needs).
There are many ways we might slice and dice this to derive an estimate for how many more insights are contained within our analysis, and how much faster we can produce them. If you are skeptical, please keep reading, because I will go into more depth about my assumptions and this calculation below. But here's the punchline: We believe it is reasonable to claim that Prism’s data science platform can deliver at least 10x the insights at 50x the speed.
Want more detail? You got it! Here's the longer version of this story:
The Current “Gold-Standard”
In April 2020, a team of 4 researchers, largely based at Harvard, published an analysis in npj Digital Medicine that attempted to quantify the number of digital health technologies (DHTs) used in clinical trials. To do this, they conducted a traditional systematic review: They first compiled a list of keywords that they believed likely to pick out DHTs in ClinicalTrials.gov records. They then wrote and ran an algorithm that searched ClinicalTrials.gov for each keyword and retrieved the number of trials matching it. Finally, they tabulated the results, wrote up a manuscript, and submitted it for publication.
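The keyword-counting step they describe can be sketched in a few lines. To be clear, this is my own illustration, not the team's actual code: the function names and the stubbed-in hit counts below are hypothetical, and in practice `query_fn` would wrap a real ClinicalTrials.gov search.

```python
from collections import Counter

def tabulate_keyword_hits(keywords, query_fn):
    """Count matching trial records per keyword and rank the results.

    `query_fn(keyword)` returns the number of registry records matching
    that keyword; it is injected here so the tabulation logic can be run
    without a live ClinicalTrials.gov query.
    """
    counts = Counter({kw: query_fn(kw) for kw in keywords})
    # Sorted descending by hit count: the basis for a "top keywords" table
    return counts.most_common()

# Illustrative stub standing in for a real registry search
fake_hits = {"wearable": 420, "mobile app": 310, "accelerometer": 95}
ranking = tabulate_keyword_hits(fake_hits, fake_hits.get)
```

The real work, of course, is in curating the keyword list and validating the matches, not in the tabulation itself.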
Their final publication reports essentially 2 “insights”: (1) the growth in trials using DHTs between 2000 and 2018, stratified by trial phase (this is figure 2 in the paper); and (2) a contrast between the top 5 keywords for 2004-2007 and the top 5 for 2015-2018 (this is figure 3 in the paper; the authors selected these timespans because the iPhone was released in 2007).
Note: I recognize that in the usual way of using the term “insight”, we would rightly say that multiple insights can be extracted from a single figure or table. However, for the purposes of this comparative analysis, we can count a static chart or table as 1 insight.
One has to dig into the paper's supplementary methods to estimate how long it took this team to produce these 2 insights. In the supplement, they report that they pulled the final data on November 30, 2019. The paper was published on April 3, 2020. This is 125 days.
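The 125-day figure checks out against the calendar (note that 2020 is a leap year, so February contributes 29 days):

```python
from datetime import date

# Final data pull (per the supplement) to publication date
elapsed = (date(2020, 4, 3) - date(2019, 11, 30)).days
print(elapsed)  # 125
```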
Now I know (from having worked on dozens of systematic reviews) that the whole process to produce this publication will have taken far longer than 125 days. In fact, I was at a workshop with the authors of this study in May 2019 and I recall them telling me about their plans and soliciting my feedback for this analysis. So this would push the “true” time it took to conduct this study closer to a year. The 125 days also includes the time it took for peer-review (which is astonishingly fast!).
But for the purposes of this analysis, I think it is reasonable to estimate that these authors generated their 2 insights in 125 days. This likely approximates the actual time it took for them to go from the list of keywords to the final chart and table (i.e., their primary figures 2 and 3).
And just in case that sounds dismissive, let me be clear: I believe that analyzing the landscape of DHTs in ClinicalTrials.gov data to extract and publish 2 insights is no small feat. This is innovative work on an important question, and this kind of output is entirely consistent with the gold-standard for systematic reviews in academic medicine.
The Prism Replication
So how does the Prism approach stack up?
In September 2020, as a component of a sensitivity analysis for a project Prism is undertaking with Janssen Clinical Innovation to estimate the volume of DHT use, I attempted a replication of the Harvard team’s study. Although there wasn’t quite enough information in the publication and supplementary materials to run a strict replication, the Harvard team did make the list of keywords available on GitHub. I downloaded this list, inspected it, and modified it slightly (mostly to make it more specific and accurate when querying ClinicalTrials.gov). I then queried ClinicalTrials.gov for every keyword, tabulated the results, wrote up the report, and published it to Prism's application.
Prism's report includes the same 2 insights as the Harvard team’s paper (i.e., trials over time, broken down by phase; top keywords), as well as 5 others:
- Breakdown of trials by therapeutic areas
- Breakdown of trials by status
- Volume of trials over time, broken down by type of sponsor
- 4-dimensional bubble chart showing volume of trials over time, broken down by therapeutic area and sponsor
- Geographic distribution of trials
Like all of the reports on our platform, this analysis comes with the ability to filter the dataset along any dimension in the data (I chose to include 7 such filters). The user also has the ability to click and drill down on any chart segment to inspect and download the underlying data.
This entire process took me less than 8 hours, spent across 2 days.
Now let’s do the calculation:
It took me (i.e., 1 person) roughly 2 days to produce the insights that it took a team of 4 researchers roughly 125 days to produce. I think there are too many unknowns to reasonably argue that Prism is “250x faster”, but I think it is entirely fair to claim that we are at least “50x faster” than the gold-standard review method.
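In person-day terms, the arithmetic behind these numbers looks like this; the discount from the raw ratio down to the claimed 50x is deliberate, reflecting the unknowns just mentioned:

```python
# Back-of-envelope comparison in person-days, using the numbers in the post
harvard_person_days = 4 * 125   # team of 4, ~125 days
prism_person_days = 1 * 2       # 1 person, 2 days

raw_speedup = harvard_person_days / prism_person_days
print(raw_speedup)  # 250.0 — claimed conservatively as "at least 50x"
```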
In those 2 days, I generated 7 insights to the Harvard team’s 2. Except, as I noted above, all of the charts and tables on Prism’s platform are dynamic—they update with filters. This allows the user to generate additional insights that are more narrowly focused on particular disease areas, sponsor portfolios, geographic regions, time spans, etc., in any combination.
For example, suppose you want insight into the types of keywords and devices that picked out trials in the area of nervous system diseases, conducted only by industry, and only the trials that specified a phase (which likely indicates a drug as the intervention). Insights of this kind (which would be informative for understanding the types of devices that have been successfully/unsuccessfully used in drug trials) cannot be derived from the Harvard team’s paper. With Prism’s platform, it is just a few clicks away.
Even using the overly strict definition of “insight” (i.e., 1 chart = 1 insight), this means that Prism’s DHT report contains many thousands more insights! But as before, given the many assumptions here, I think claiming thousands of times more insights is stretching it too far. However, I think it is fair to claim we offer at least “10x the insights”.
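To see where “many thousands” comes from, consider the combinatorics of the 7 filters. The per-filter value count below is a hypothetical placeholder (the report's actual filters differ in cardinality), but even a modest assumption multiplies out quickly:

```python
# Hypothetical: each of the 7 filters offers "all" plus 3 narrower values
values_per_filter = 4
n_filters = 7
n_charts = 7  # the 7 static charts/tables in the Prism report

filter_combinations = values_per_filter ** n_filters    # 4^7 = 16384
potential_chart_views = n_charts * filter_combinations  # 114688
```

Most of those views will be uninteresting for any given user, which is exactly why the post discounts “thousands of times more insights” down to a conservative “10x”.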