The Problem

More than 1 in 10 Americans over 65 are living with Alzheimer’s disease (AD). This number is projected to keep growing, and existing treatment options are bleak. Despite recently approved drugs of questionable efficacy, there is still no cure and no truly disease-modifying treatment.

For decades, drug research in AD has focused narrowly on two targets: amyloid and tau. Many experts believe this narrow focus is at least partly responsible for the lack of successful Alzheimer’s treatments.

Genentech asked Prism.bio for help in finding another path forward for AD research. What other ideas, besides amyloid and tau, are out there in the literature? Are some of them “neglected hypotheses” that could lead to breakthrough therapies? Of all the hypotheses overlooked because of the focus on amyloid and tau, which are the most promising?

The Solution 

Prism.bio started with the data. We searched the National Library of Medicine’s PubMed database for clinical trial records in the neurodegenerative disease space, focusing on records from 2002 to 2022, which yielded 8,749 records. We then used large language models (LLMs) to extract the central hypothesis of each clinical trial record from its title, abstract, substance terms, and keywords.
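A search like this can be scripted against NCBI’s public E-utilities API. The sketch below only builds the search URL and the LLM prompt, so it runs offline. The query string, the `retmax` value, the helper names, and the prompt wording are all hypothetical, since the exact search terms and prompt are not given here.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public API).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical query: a neurodegenerative-disease MeSH term AND the clinical-trial publication type.
QUERY = '"neurodegenerative diseases"[MeSH Terms] AND "clinical trial"[Publication Type]'


def build_esearch_url(term: str, min_year: int, max_year: int, retmax: int = 10000) -> str:
    """Build a PubMed esearch URL restricted to a publication-date range (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": str(min_year),
        "maxdate": str(max_year),
        "retmax": str(retmax),  # maximum number of PMIDs to return
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def build_hypothesis_prompt(record: dict) -> str:
    """Assemble an LLM prompt from one record's fields (hypothetical template)."""
    substances = ", ".join(record.get("substances", [])) or "none listed"
    keywords = ", ".join(record.get("keywords", [])) or "none listed"
    return (
        "State, in one sentence, the central scientific hypothesis tested by this clinical trial.\n"
        f"Title: {record['title']}\n"
        f"Abstract: {record['abstract']}\n"
        f"Substance terms: {substances}\n"
        f"Keywords: {keywords}\n"
    )


if __name__ == "__main__":
    print(build_esearch_url(QUERY, 2002, 2022))
```

The returned PMIDs would then be used to retrieve each full record (for example with the E-utilities `efetch` endpoint), supplying the title, abstract, substance, and keyword fields that `build_hypothesis_prompt` expects.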

Next, Prism.bio converted each textual hypothesis into a mathematical representation that machine learning methods can process: a 1,536-dimensional vector. We then clustered these vectors, so that each data point was an individual hypothesis and each cluster was a group of related hypotheses, and explored the clusters with 3D visualization. The clusters showed which groups of hypotheses have received the most research investment.
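The clustering step can be sketched as follows. The text does not name the clustering algorithm; this sketch assumes a density-based method (plain DBSCAN with cosine distance), because such methods label points that fit no cluster as “noise,” which matches the outlier language used later. In the described pipeline each vector would have 1,536 dimensions; the toy 3-dimensional vectors and the `eps` and `min_samples` values here are hypothetical stand-ins.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no dense cluster


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def dbscan(points: List[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id per point, or NOISE (-1) for outliers."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited

    def neighbors(i: int) -> List[int]:
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


if __name__ == "__main__":
    toy = [
        [1.0, 0.0, 0.0], [0.99, 0.05, 0.0], [0.98, 0.0, 0.05],  # group A
        [0.0, 1.0, 0.0], [0.05, 0.99, 0.0], [0.0, 0.98, 0.05],  # group B
        [0.0, 0.0, 1.0],                                         # isolated point
    ]
    print(dbscan(toy, eps=0.05, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Counting points per non-noise label (for example with `collections.Counter`) then shows which clusters are largest, that is, which hypothesis groups have attracted the most work. The 3D visualization step is not shown, and for thousands of vectors a library implementation such as scikit-learn’s `DBSCAN` or `HDBSCAN` would replace this quadratic-time loop.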

But we were not interested in the hypotheses that have dominated the research paradigm; we were seeking promising outliers. We therefore ran a second clustering analysis exclusively on the outliers, the hypotheses that the first clustering labeled as noise rather than assigning to a cluster. Finally, we used LLMs to summarize the 25 largest of these noise clusters, yielding a set of neglected hypotheses.
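The second pass can be sketched as a small function that keeps only the first-pass noise points, re-clusters them, and returns the largest resulting clusters for LLM summarization. The clustering routine is passed in as a parameter (any function returning one label per vector, with -1 for noise); `neglected_clusters`, `summary_prompt`, and the prompt wording are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

NOISE = -1  # label used by the clustering routine for unassigned points

# Any clustering function: one label per input vector, NOISE for outliers.
ClusterFn = Callable[[List[Sequence[float]]], List[int]]


def neglected_clusters(
    vectors: List[Sequence[float]],
    hypotheses: List[str],
    first_labels: List[int],
    recluster: ClusterFn,
    top_k: int = 25,
) -> Dict[int, List[str]]:
    """Re-cluster only the first-pass outliers; return the top_k largest new clusters (hypothetical helper)."""
    # 1. Keep only the points the first clustering left as noise.
    outlier_idx = [i for i, lab in enumerate(first_labels) if lab == NOISE]
    outlier_vecs = [vectors[i] for i in outlier_idx]
    # 2. Second clustering pass, run exclusively on the outliers.
    second_labels = recluster(outlier_vecs)
    # 3. Group hypothesis texts by second-pass cluster, skipping residual noise.
    groups: Dict[int, List[str]] = {}
    for idx, lab in zip(outlier_idx, second_labels):
        if lab != NOISE:
            groups.setdefault(lab, []).append(hypotheses[idx])
    # 4. Rank clusters by size and keep the largest top_k.
    sizes = Counter({lab: len(members) for lab, members in groups.items()})
    return {lab: groups[lab] for lab, _ in sizes.most_common(top_k)}


def summary_prompt(members: List[str]) -> str:
    """Hypothetical LLM prompt asking for the one hypothesis a cluster shares."""
    bullet_list = "\n".join(f"- {h}" for h in members)
    return (
        "These clinical-trial hypotheses were grouped together. "
        "Summarize the single underlying hypothesis they share:\n" + bullet_list
    )
```

Passing the clustering routine in, rather than hard-coding one, keeps the two-stage structure visible: the same or a differently tuned algorithm can be used for the second pass, and each returned cluster becomes one `summary_prompt` call.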

The Results

We selected the three most interesting of the 25 neglected hypotheses and verified that each was a promising but barely explored area. Our expert pharma collaborators agreed that these top hypotheses had a plausible scientific rationale and real potential.

This method is transferable to any organization seeking promising, underexplored research areas, and to any therapeutic area, not just neurodegenerative disease. As an added bonus, the process requires no human effort until the very last step: the final review of the top neglected hypotheses.

This development opens up a new era of scientific research, one in which the literature can be analyzed at a scale and with an accuracy that were previously impossible. In turn, this has the potential to accelerate scientific discovery by bringing promising but overlooked ideas to the foreground.

Want to learn more? Download our manuscript, accepted to the Neural Information Processing Systems (NeurIPS) 2023 Generative Biology Workshop.