🧬 Babylon Biosciences: Fine-Tuning AI to Predict Clinical Trial Success
Deep Dive | July 10, 2025
Welcome back to the “Deep Dive” series, where we bring you the coolest AI tools making a real impact in life sciences. In each episode, we feature exclusive insight from the teams building these tools — why they matter, what they do, and what the future holds!
Today we present Babylon Biosciences, a small molecule therapeutics company that’s teaming up with OpenAI and Sleuth Insights to tackle one of pharma’s biggest challenges: predicting clinical trial outcomes. We spoke with Babylon’s founder and CEO, Sacha Schermerhorn, to learn why he believes this fine-tuned AI could change the game for drug development.
🔴 The Problem
Bringing new medicines to patients is a massive gamble. Around 90% of drug candidates fail in the clinic, wasting an estimated $45 billion every year and costing patients precious time waiting for treatments.
Despite decades of modelling tools, the industry has not meaningfully improved success rates in more than 30 years. Part of the problem is that predicting trial outcomes requires understanding mountains of both structured data (like indication, phase, prior evidence) and unstructured context buried in protocols, eligibility criteria, and trial registries.
Without tools that truly integrate these clues, biopharma companies have no easy way to make smarter “go/no-go” decisions before investing millions in large studies.
💡 The Idea
Babylon teamed up with OpenAI and Sleuth to ask a bold question: Can we teach a foundation AI model to reason through complex clinical trial data, together with the myriad clues in previous trials and the scientific literature, and then output realistic success probabilities?
Using OpenAI’s Reinforcement Fine-Tuning (RFT) API, Babylon fine-tuned OpenAI’s o3-mini model on 430 historical trials covering oncology, neurology, metabolic disease, and rare disorders. The model combines structured data with unstructured protocol text to produce a probabilistic prediction of whether a trial will hit its primary endpoint.
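To make the idea of combining structured fields with protocol text concrete, here is a minimal Python sketch. It is an illustration only and not Babylon's pipeline or the OpenAI RFT API: the `TrialRecord` fields, the prompt wording, and the Brier-style reward are all assumptions, and the example trial is invented.

```python
from dataclasses import dataclass


@dataclass
class TrialRecord:
    # Hypothetical fields, loosely following the examples named in the article.
    indication: str
    phase: str
    prior_evidence: str
    protocol_excerpt: str
    met_primary_endpoint: bool


def build_prompt(trial: TrialRecord) -> str:
    """Flatten structured fields and free text into a single prompt string."""
    return (
        f"Indication: {trial.indication}\n"
        f"Phase: {trial.phase}\n"
        f"Prior evidence: {trial.prior_evidence}\n"
        f"Protocol excerpt: {trial.protocol_excerpt}\n"
        "Estimate the probability (0 to 1) that this trial meets "
        "its primary endpoint."
    )


def parse_probability(model_output: str) -> float:
    """Read a number from the model's reply and clamp it to [0, 1]."""
    value = float(model_output.strip())
    return min(1.0, max(0.0, value))


def brier_reward(predicted: float, outcome: bool) -> float:
    """Assumed reward: 1 minus squared error against the known outcome."""
    target = 1.0 if outcome else 0.0
    return 1.0 - (predicted - target) ** 2


# Invented example trial, for illustration only.
example = TrialRecord(
    indication="migraine",
    phase="Phase 3",
    prior_evidence="positive Phase 2 result",
    protocol_excerpt="Adults aged 18 to 65 with episodic migraine.",
    met_primary_endpoint=True,
)
prompt = build_prompt(example)
reward = brier_reward(parse_probability(" 0.8 "), example.met_primary_endpoint)
```

A reward of this shape favors confident predictions that turn out to be right and penalizes confident ones that turn out to be wrong, which is one plausible way to push a model toward realistic probabilities.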
When tested on 90 unseen trials, the fine-tuned model achieved an AUC (area under the ROC curve) of 0.84, up from a baseline of just 0.65. That is a major boost in predictive power that could help teams rank-order their pipeline and make faster, better-informed go/no-go calls.
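For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen successful trial receives a higher predicted score than a randomly chosen failed one. A score of 0.5 is no better than chance and 1.0 is a perfect ranking. The short Python sketch below computes it directly from that definition. The labels and scores are made-up toy values, not Babylon's data.

```python
def roc_auc(labels, scores):
    """Fraction of (success, failure) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one success and one failure")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = trial met its primary endpoint, 0 = it did not.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
auc = roc_auc(labels, scores)  # 8 of 9 pairs ranked correctly
```

Because AUC depends only on how trials are ordered, it matches the rank-ordering use case: it measures how reliably stronger candidates are placed ahead of weaker ones, not whether each individual probability is well calibrated.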
🔬 Why It’s Different
What sets Babylon apart is how they’ve hybridized cutting-edge AI with deep, proven drug development expertise. Babylon’s team has contributed to 46 FDA-approved drugs, including major drugs like NURTEC® and lisinopril — the third most prescribed drug in the US.
With this track record, Babylon is embedding fine-tuned AI directly into their portfolio strategy, using it as a practical decision tool to de-risk and discover their next wave of small molecule therapeutics and accelerate their path to patient-ready treatments.
As CEO and founder Sacha Schermerhorn puts it:
"Seeing our fine-tuned model hit a record AUC of 0.84 was a defining moment for all those involved. It opened up a broader conversation about what else these AI systems can help us unlock for patients and what other opportunities have been overlooked."