# 3: Life Science x AI

Kiin Bio

May 26, 2024

Welcome back to your weekly dose of AI news for Life Science!

Here’s what we have for you this week:

Benchmark and data 💿
Insights on AI adoption and challenges in Life Science 🧠
Life Science tools of the week 🛠️

If you like the article … follow us on LinkedIn

Don’t forget to take our short survey to help us understand how you currently use AI in your day-to-day!

Start Survey

Benchmark and data 💿

1/ PoseBench - Protein-ligand docking and structure generation

With how quickly AI tools are being applied to protein folding and protein-ligand structure prediction, it is important to have easy and reliable benchmarks! Introducing PoseBench, a comprehensive benchmark of protein-ligand structure generation methods! Why is PoseBench useful? Because with a single tool you can:

📌 Run protein-ligand structure prediction inference with many state-of-the-art tools (DiffDock, FABind, DynamicBind, NeuralPLexer, RoseTTAFold-All-Atom, AutoDock Vina, TULIP) … hopefully we will see OpenFold soon!

📌 Run an 'ensemble' inference, i.e. with a few commands you can generate predictions for a new protein target using all the methods and receive a list of predictions ranked with different approaches

📌 Easily pull benchmark datasets for protein-ligand structure prediction and create comparative plots of inference results across different tools
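To make the 'ensemble' ranking and the cross-tool comparison a bit more concrete, here is a minimal Python sketch under our own assumptions: each pose is a NumPy array of ligand heavy-atom coordinates with matching atom order, methods are keyed by name, and a prediction counts as a success if its RMSD to the reference ligand is ≤ 2 Å (a widely used docking threshold). The consensus ranking (prefer poses that agree with the other methods) is just one possible ranking approach, not necessarily the one PoseBench implements, and none of these function names are PoseBench's API.

```python
import numpy as np

def ligand_rmsd(pose_a: np.ndarray, pose_b: np.ndarray) -> float:
    """RMSD between two (n_atoms, 3) coordinate arrays with the same atom order.
    No superposition: docking poses are compared in the receptor frame."""
    diff = pose_a - pose_b
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def consensus_ranking(poses: dict[str, np.ndarray]) -> list[tuple[str, float]]:
    """Rank each method's pose by its mean RMSD to the poses of all other methods.
    A low score means the pose agrees with the rest of the ensemble."""
    names = list(poses)
    scores = []
    for name in names:
        others = [ligand_rmsd(poses[name], poses[o]) for o in names if o != name]
        scores.append((name, float(np.mean(others))))
    return sorted(scores, key=lambda item: item[1])

def success_rate(rmsds: list[float], threshold: float = 2.0) -> float:
    """Percentage of predictions whose RMSD to the reference is <= threshold (Å)."""
    if not rmsds:
        return 0.0
    return 100.0 * sum(r <= threshold for r in rmsds) / len(rmsds)

# Toy example: made-up coordinates (3 ligand atoms) for three hypothetical methods.
reference = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]])
poses = {
    "method_a": reference + 0.3,
    "method_b": reference + 0.5,
    "method_c": reference + 4.0,  # an outlier pose
}
ranking = consensus_ranking(poses)
print(ranking[-1][0])  # → method_c (the outlier ranks last)
print(round(success_rate([ligand_rmsd(p, reference) for p in poses.values()]), 1))  # → 66.7
```

A per-method success rate like this is the kind of number that typically ends up in the comparative plots; the consensus score is useful when no reference structure exists yet, as for a new target.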

You can check out some tutorials on how to use PoseBench here

2/ CyclicPepedia - Knowledge base of natural and synthetic cyclic peptides

Cyclic peptides have emerged as a prominent class of therapeutic agents across a variety of diseases, and represent about 66% of the FDA- and EMA-approved peptide drugs. Cyclic peptides have several advantages over linear peptides, particularly their enhanced pharmacokinetic and pharmacodynamic properties.

But how can you easily find information on cyclic peptides? Look no further! CyclicPepedia is a database that covers both synthetic and naturally occurring cyclic peptides, totalling 8,744 known cyclic peptides with summaries, structures, sequences, properties, assays, targets, prediction tools, manufacturers, external links, and relevant literature! Peptides are collected from various sources:

PubChem: 2,535
DrugBank: 91
UniProt: 605
CyBase: 1,236
DPL: 88
APD: 204
ConoServer: 3,416
DRAMP: 914
Norine: 576
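A quick check on the numbers above: the per-source counts add up to 9,665, which is 921 more than the 8,744 peptides in the database, suggesting that some peptides appear in more than one source. Merging sources is trickier for cyclic peptides than for linear ones, because the same ring can be written starting from any residue. The Python sketch below is our own illustration, not CyclicPepedia's pipeline: it sums the counts and deduplicates head-to-tail cyclic sequences by mapping each one to its lexicographically smallest rotation. It ignores ring direction, non-canonical residues and side-chain cyclisation, and the example sequences are made up.

```python
# Per-source counts as listed above.
SOURCE_COUNTS = {
    "PubChem": 2535, "DrugBank": 91, "UniProt": 605, "CyBase": 1236, "DPL": 88,
    "APD": 204, "ConoServer": 3416, "DRAMP": 914, "Norine": 576,
}
UNIQUE_PEPTIDES = 8744  # total reported for CyclicPepedia

def canonical_rotation(seq: str) -> str:
    """Smallest rotation of a head-to-tail cyclic sequence (same ring -> same string)."""
    if not seq:
        return seq
    return min(seq[i:] + seq[:i] for i in range(len(seq)))

def deduplicate_cyclic(sequences: list[str]) -> set[str]:
    """Collapse sequences that describe the same ring written from different start residues."""
    return {canonical_rotation(s) for s in sequences}

total = sum(SOURCE_COUNTS.values())
print(total, total - UNIQUE_PEPTIDES)  # → 9665 921

# Made-up records from two hypothetical sources: the first two are the same ring.
records = ["GLPVCGETCV", "CGETCVGLPV", "AKLRWY"]
print(len(deduplicate_cyclic(records)))  # → 2
```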

Insights on AI adoption and challenges in Life Science 🧠

In the past months there have been many surveys on AI adoption and its challenges. Every large consultancy and non-profit has run one, and it can be difficult to follow the trends. Here we summarise the problems highlighted across a few large surveys.

1/ Pistoia Alliance

The Pistoia Alliance is a global, not-for-profit alliance that advocates for greater collaboration in life sciences R&D. They surveyed over 300 Life Science experts from AbbVie, AstraZeneca, GSK, Elsevier, Roche, J&J, the FDA and more. Their key findings:

📌 70% of life sciences experts recognise AI's potential but acknowledge limited implementations within pharma

📌 63% of respondents expressed concern that poor data quality could lead to incorrect AI conclusions and potentially harmful clinical decisions.

📌 ~20% of respondents cited a lack of confidence in data privacy and AI safety as a barrier to AI adoption

2/ Boston Consulting Group (BCG) and Wellcome Trust

Their report focuses on the potential of AI in Drug Discovery. They interviewed 95 key experts across high- and low-income countries. What they found:

📌 The top pain points are similar in high- and low-income countries, but ranked in a different order

📌 For high-income countries (HIC), the largest pain point is the availability of large, high-quality datasets

📌 For low- and middle-income countries (LMIC), the largest pain point is the cost (and overall availability) of high-quality tools

📌 Lack of internal expertise on how AI works and how to implement it is a main barrier in both HIC and LMIC

3/ Schwartz Reisman Institute for Technology and Society

The Schwartz Reisman Institute for Technology and Society works with world-class experts across many sectors to ensure that powerful technologies like AI are responsible, inclusive, and beneficial to everyone. Although it does not focus on Life Science per se, its Global Public Opinion on Artificial Intelligence report includes a special section on AI adoption in healthcare, based on feedback from 1,000+ people in 21 countries. Their insights:

📌 ~50% agree that AI should be used in healthcare, including for triage and for developing robots that provide services to the elderly.

📌 Most popular use case: AI in diagnostic imaging (59%)

📌 Least popular use cases: AI to determine an individual's health plan and to write prescriptions (both 46%).

Interestingly, support varies widely between countries (e.g. AI for diagnostic imaging has the widest support overall, but only 38% of Australians agree with its use compared to 75% of Chinese respondents)

4/ ZoomRx

ZoomRx recently surveyed more than 200 life sciences professionals, including many from Big Pharma. The survey focused mostly on ChatGPT rather than on Life Science AI tools in general. Still, there are a few key insights:

📌 65% of the top 20 Big Pharmas do not use ChatGPT internally due to concerns that sensitive internal data could be leaked to competitors.

📌 83% of the respondents labelled AI overall as “overrated”

📌 8% of the respondents have not started adopting AI, 50% have some pilot AI use cases, and only 10% have production-ready AI applications.

5/ Our insights

From these surveys, our conversations with people and the news in the field, we believe AI is here to stay for the foreseeable future. We still see many problems with data quality, expertise and privacy, and there is so much noise around data and tools that it is difficult to know which solution is right for a specific use case.

Regardless of how impactful or disruptive a technology is, change takes time. We do not think AI will transform the field in the next 2-3 years, but we strongly believe that in 10 years AI will be pervasive in Life Science, helping to bring drugs to market faster and to improve diagnosis.

Life Science tools of the week 🛠️

1/ OpenFold - Protein folding

Everyone talks about AlphaFold (especially now that AlphaFold3 is out). Fewer people know that a fully open-source, trainable version of the famous protein folding model exists. Welcome to OpenFold, a complete reimplementation of AlphaFold2 with improved speed and efficiency. The tool uses the same architecture as AlphaFold2 and matches its accuracy … but can run up to four times faster while requiring less memory!

Thanks to this efficiency, OpenFold can handle proteins with 4,000 residues on a single GPU, while AlphaFold2 struggled on a single GPU with proteins exceeding 2,500 residues. Being trainable also let the authors study how the model learns: they retrained OpenFold multiple times under different scenarios and found that approximately 90% of its final accuracy was reached within about 3% of the training time, and that accuracy held up even as the training set was progressively reduced. While AlphaFold2 was trained on approximately 132,000 PDB protein structures, OpenFold achieved nearly the same accuracy with only about 7.6% of them. This suggests that a small but diverse training set can be enough for strong model accuracy.
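Why are long proteins so demanding? Here is a back-of-the-envelope Python sketch under our own simplifying assumptions: AlphaFold2-style models keep an L × L pair representation (128 channels in the AlphaFold2 paper), so its memory grows with the square of the sequence length L. The sketch counts only that one tensor in 32-bit floats; real peak memory also depends on activations, attention intermediates, chunking and numeric precision, so the absolute numbers are illustrative only. It also double-checks the training-set arithmetic quoted above.

```python
PAIR_CHANNELS = 128      # c_z, the pair-representation width in the AlphaFold2 paper
BYTES_PER_FLOAT32 = 4    # assumption: full precision, no chunking or offloading

def pair_representation_gb(n_residues: int, channels: int = PAIR_CHANNELS,
                           bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Memory (decimal GB) of a single L x L x channels pair tensor."""
    return n_residues * n_residues * channels * bytes_per_value / 1e9

for length in (2_500, 4_000):
    print(length, round(pair_representation_gb(length), 2))
# → 2500 3.2
# → 4000 8.19

# Quadratic growth: (4000 / 2500) ** 2
growth = pair_representation_gb(4_000) / pair_representation_gb(2_500)
print(round(growth, 2))  # → 2.56

# Training-set arithmetic from the text: ~7.6% of ~132,000 PDB structures.
print(round(0.076 * 132_000))  # → 10032
```

Going from 2,500 to 4,000 residues multiplies this one tensor's footprint by about 2.6x, which is why memory-saving engineering matters so much for long proteins.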

2/ PSAURON - Assessing protein annotation

More protein tools! A common problem is evaluating how accurate the protein-coding sequences in a genome annotation are. That’s where PSAURON (Protein Sequence Assessment Using a Reference ORF Network) can help! The tool was trained on a diverse dataset from over 1,000 plant and animal genomes and assigns a score to a coding DNA or protein sequence that reflects the likelihood that the sequence is a genuine protein-coding region.

Why is this important? Well, this helps to:

📌 Perform more accurate genome-wide protein annotation, which matters when proteins are used in downstream work, for example to identify potential new antibiotics!

📌 Rapidly identify potentially spurious annotated proteins in already characterised genomes.

📌 Get a single overall score of protein annotation quality for a genome

Thanks to its design, PSAURON does not need to be trained separately for each species: it only needs to be trained once (which has already been done!)
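To illustrate how per-sequence scores can be turned into the genome-level outputs listed above, here is a minimal Python sketch. It assumes you already have a score between 0 and 1 for each annotated protein (e.g. from running PSAURON); the 0.5 cut-off, the function and class names, the protein IDs and the scores are all hypothetical, and PSAURON's own reporting may differ.

```python
from dataclasses import dataclass

@dataclass
class AnnotationSummary:
    genome_score: float   # % of proteins scored as likely genuine (hypothetical metric)
    suspicious: list[str]  # protein IDs below the threshold, lowest score first

def summarise_annotation(scores: dict[str, float], threshold: float = 0.5) -> AnnotationSummary:
    """Aggregate per-protein probabilities into a genome-level summary."""
    if not scores:
        return AnnotationSummary(genome_score=0.0, suspicious=[])
    passing = sum(score >= threshold for score in scores.values())
    suspicious = sorted((pid for pid, s in scores.items() if s < threshold),
                        key=lambda pid: scores[pid])
    return AnnotationSummary(genome_score=100.0 * passing / len(scores),
                             suspicious=suspicious)

# Hypothetical scores for five annotated proteins.
example_scores = {"prot_001": 0.97, "prot_002": 0.88, "prot_003": 0.12,
                  "prot_004": 0.64, "prot_005": 0.31}
summary = summarise_annotation(example_scores)
print(summary.genome_score)  # → 60.0
print(summary.suspicious)    # → ['prot_003', 'prot_005']
```

The flagged list supports the "spurious proteins" use case, while the single percentage gives a quick way to compare annotation quality across genomes.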

3/ GigaPath - Whole-slide pathology foundation model

What happens when a doctor suspects a patient has cancer? Normally, a biopsy is taken and the cells are analysed under a microscope; once digitised, a single “gigapixel” slide can comprise tens of thousands of image tiles! Can AI help improve cancer diagnosis from images? Of course! The problem is the scale of the data: a standard Transformer architecture (the technology behind ChatGPT) simply cannot handle inputs this large, because the cost of its self-attention grows quadratically with the number of tiles.

A collaboration between the University of Washington and Microsoft Research teams developed Prov-GigaPath, a new Vision Transformer with dilated attention. What’s the novelty? Instead of letting every image tile attend to every other tile (dense attention), it splits the sequence of tiles into segments and, within each segment, only computes attention among tiles at regular intervals, such as 1 out of every 2 or 4 tiles (dilated attention). This lets it handle very large images efficiently without losing much performance. The authors trained the model on 1.3 billion 256 × 256 pathology image tiles from 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. The slides came from more than 30,000 patients covering 31 major tissue types.
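The difference between dense and dilated attention is easier to see in code. Below is a simplified NumPy sketch of a single dilated-attention pattern in the spirit of LongNet, the architecture Prov-GigaPath builds on for its slide-level encoder: the tile sequence is cut into segments, and within each segment only every `dilation`-th tile attends to the others. We run one pass per offset so every tile receives an output (LongNet spreads offsets across attention heads). Learned projections, multiple heads and the mix of several segment/dilation sizes used in practice are left out, so treat it as an illustration, not Prov-GigaPath's implementation.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    x = x - x.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def dense_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Standard scaled dot-product attention: every tile attends to every tile."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def dilated_attention(q, k, v, segment_len: int, dilation: int) -> np.ndarray:
    """One dilated pattern; one pass per offset so that every tile gets an output."""
    n = q.shape[0]
    out = np.zeros_like(v)
    for offset in range(dilation):
        for start in range(0, n, segment_len):
            idx = np.arange(start + offset, min(start + segment_len, n), dilation)
            if idx.size:
                out[idx] = dense_attention(q[idx], k[idx], v[idx])
    return out

def score_entries(n: int, segment_len: int, dilation: int) -> int:
    """Number of query-key scores computed (a proxy for compute and memory)."""
    total = 0
    for offset in range(dilation):
        for start in range(0, n, segment_len):
            m = len(range(start + offset, min(start + segment_len, n), dilation))
            total += m * m
    return total

rng = np.random.default_rng(0)
n_tiles, dim = 16, 8
q, k, v = (rng.standard_normal((n_tiles, dim)) for _ in range(3))

# With a single segment and no dilation, the pattern reduces to dense attention.
assert np.allclose(dilated_attention(q, k, v, segment_len=n_tiles, dilation=1),
                   dense_attention(q, k, v))
print(score_entries(16, segment_len=16, dilation=1))  # → 256 (dense)
print(score_entries(16, segment_len=8, dilation=2))   # → 64

# Hypothetical slide with 70,000 tiles: dilated vs dense score counts.
print(score_entries(70_000, segment_len=4096, dilation=4), 70_000 ** 2)
```

Dense attention needs n² scores, while this pattern needs roughly n × segment_len / dilation, which is what makes slide-scale sequences tractable.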

Why is this relevant for Life Science experts, and what can it do? GigaPath was tested on 9 cancer subtyping tasks and 17 pathomics tasks, and it:

📌 Attained state-of-the-art performance on 25 out of 26 tasks, including cancer subtyping and mutation prediction

📌 Showed a significant improvement over the second-best method on 18 tasks.

📌 Can also incorporate pathology reports (as text) to improve its predictions!

BITE-SIZED COOKIES FOR THE WEEK 🍪

Meta is doing amazing work on open-source software and has recently worked on a new foundation model for chemistry to predict atomic properties!

AstraZeneca is planning a $1.5 billion manufacturing facility for antibody-drug conjugates (ADCs) in Singapore, targeting 2029 for the start of production. ADCs are a novel therapeutic modality with great potential thanks to their potency and selectivity. You can read more in our article here

If you want to read about the best-selling drugs and pharma companies of 2023, check out this post. Quick sneak peek … Merck has the top-selling drug in the world (Keytruda, cancer) with $25B+ in sales!

Good news for folks working with clinical data … Epic, the electronic health records giant, released an open-source AI validation tool that helps health systems monitor model performance over time using standardised evaluation criteria
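What does "validating model performance over time" look like in practice? Here is a minimal sketch in plain Python, under our own assumptions: each record has a time period, a true outcome (0/1) and a model risk score, and performance is tracked as AUROC per period so that drift shows up as a drop. This is a generic monitoring pattern, not the API or metric set of Epic's tool, and the records are made up.

```python
from collections import defaultdict

def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auroc_by_period(records: list[tuple[str, int, float]]) -> dict[str, float]:
    """Group (period, label, score) records by period and compute AUROC for each."""
    grouped = defaultdict(lambda: ([], []))
    for period, label, score in records:
        grouped[period][0].append(label)
        grouped[period][1].append(score)
    return {period: auroc(ys, ss) for period, (ys, ss) in sorted(grouped.items())}

# Made-up monthly records: (period, true outcome, model score).
records = [
    ("2024-01", 1, 0.9), ("2024-01", 1, 0.8), ("2024-01", 0, 0.3), ("2024-01", 0, 0.2),
    ("2024-02", 1, 0.7), ("2024-02", 1, 0.4), ("2024-02", 0, 0.6), ("2024-02", 0, 0.1),
]
print(auroc_by_period(records))  # → {'2024-01': 1.0, '2024-02': 0.75}
```

A fall from 1.0 to 0.75 between periods is the kind of signal such a tool is meant to surface before clinicians lose trust in a model.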


Kiin Bio Weekly
