# 5: Life Science x AI

Kiin Bio

Jun 09, 2024

Welcome back to your weekly dose of AI news for Life Science!

Here’s what we have for you this week:

Chemical benchmarks 💿
The status of CRISPR therapies 🔮
Life Science tools of the week 🛠️

Don’t forget to take our short survey to help us understand how you currently use AI in your day-to-day!

Start Survey

Chemical benchmarks 💿

As more and more AI models are developed in Life Science, it is vital to have reliable tools to benchmark & verify the quality of the output.

Introducing ChemFH! ChemFH's goal is to reduce the time and money wasted during hit identification, the search for compounds that act on a drug target. There are two main approaches to hit identification: i) high-throughput screening (HTS), where thousands of compounds are tested in a lab for efficacy against a target, and ii) virtual screening, where billions of compounds are tested in silico. While virtual screening is very powerful, HTS (or other lab validation) is still needed to confirm the predictions. The problem is that up to 95% of results can be false positives, leading to delays and wasted resources in drug discovery!

How does ChemFH help?

📌 ChemFH provides an online platform that automatically evaluates compounds and flags potential false positives based on a variety of physicochemical properties and interfering factors (e.g. colloidal aggregation, autofluorescence, luciferase inhibition, and chemical reactivity) that complicate the identification of true hit compounds.

📌 ChemFH draws on several data sources to predict false positives, including:

823,391 compounds
1,441 representative alert substructures
10 commonly used frequent hitter screening rules.

ChemFH shows encouraging results, with an AUC (area under the ROC curve) of 0.91, and was validated on an external set of 75 compounds as well as five virtual screening libraries.
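If AUC is new to you: it is the probability that a randomly chosen true frequent hitter receives a higher score than a randomly chosen clean compound. Here is a minimal sketch of that definition. The labels and scores are invented for illustration and are not ChemFH data, and `roc_auc` is our own helper, not part of ChemFH.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = frequent hitter, 0 = clean compound.
    scores: model outputs, higher = more suspicious.
    Ties count as half a correctly ranked pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 11 of 12 pairs are ranked correctly
```

An AUC of 0.5 means the model ranks no better than chance, and 1.0 means every frequent hitter is ranked above every clean compound.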

How does it compare to other tools? It is faster and uses a large training dataset for its predictions!

For those of you who are more technical, ChemFH uses a multi-task directed message-passing neural network (DMPNN) architecture.
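To give a feel for what "directed message passing" means, here is a toy sketch, not ChemFH's implementation. In a DMPNN, hidden states live on directed bonds rather than on atoms, and a bond never receives a message from its own reverse bond. The sketch assumes scalar features, a single fixed weight instead of learned matrices, and no multi-task output heads; the function name and parameters are our own.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def dmpnn_atom_states(atom_feats, bonds, weight=0.5, steps=2):
    """Toy directed message passing with scalar features and one shared weight.

    atom_feats: dict mapping atom name -> float feature.
    bonds: list of undirected (a, b) pairs; each becomes two directed edges.
    Returns a dict mapping atom name -> hidden state after `steps` rounds.
    """
    neighbours = {a: set() for a in atom_feats}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Initial state of directed edge v->w comes from its source atom v.
    h0 = {(v, w): relu(atom_feats[v]) for v in atom_feats for w in neighbours[v]}
    h = dict(h0)

    for _ in range(steps):
        new_h = {}
        for (v, w) in h:
            # Sum the edges k->v flowing into v, skipping the reverse edge w->v.
            msg = sum(h[(k, v)] for k in neighbours[v] if k != w)
            new_h[(v, w)] = relu(h0[(v, w)] + weight * msg)
        h = new_h

    # Readout: each atom combines its own feature with all incoming edge states.
    return {
        v: relu(atom_feats[v] + weight * sum(h[(k, v)] for k in neighbours[v]))
        for v in atom_feats
    }


# Hypothetical three-atom chain a-b-c with made-up features.
states = dmpnn_atom_states({"a": 1.0, "b": 2.0, "c": 3.0}, [("a", "b"), ("b", "c")], steps=1)
```

In a real model the atom states would be pooled into a molecule-level vector and passed to one prediction head per task, which is where the "multi-task" part comes in.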

The tool is also available open-source on GitHub!

The status of CRISPR therapies 🔮

The CRISPR-Cas system is an amazing programmable gene-editing system that is important for gene therapy and personalised medicine. Put simply, it allows us to edit, modify and replace a specific gene at a specific location in the genome.

What is the status of CRISPR in clinical trials? We have summarised the insights from Innovative Genomics for you ⬇️

📌 First CRISPR Approval (late 2023): While the technology is relatively new, last year we saw the first-ever approval of a CRISPR-based medicine (Casgevy, from CRISPR Therapeutics and Vertex Pharmaceuticals) to treat sickle cell disease and transfusion-dependent beta thalassemia.

📌 Clinical Trials Expansion: Ongoing trials target a variety of diseases, including blood disorders, chronic bacterial infections, protein-folding diseases, inflammatory diseases, and cancers. The main players in the field are CRISPR Therapeutics, Intellia Therapeutics, Editas Medicine, Locus Biosciences, Beam Therapeutics, Caribou, Verve Therapeutics, and Excision Biotherapeutics.

📌 Key challenges: Three key challenges are preventing wider adoption of CRISPR-based therapies:

CRISPR drugs are scientifically and technologically challenging. More specialised infrastructure and expertise are needed.
CRISPR treatments are expensive, up to $2 million per patient, because of complex manufacturing. This limits the number of people who can afford them, which in turn discourages pharma investment.
CRISPR therapies require a chemotherapy pre-treatment, which carries safety risks (and adds cost).

Life Science tools of the week 🛠️

1/ scFoundation - Single-cell transcriptomics

New foundation model for single-cell transcriptomics data! scFoundation was developed by BioMap and is presented as the largest foundation model for single-cell data.

📌 It has 100 million parameters, covers about 20,000 genes, and was pretrained on over 50 million human single-cell transcriptomic profiles.

📌 Experiments showed its merit as a foundation model: it achieved state-of-the-art performance on a diverse array of single-cell analysis tasks, such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, single-cell perturbation prediction, cell type annotation and gene module inference.

📌 Some limitations still exist: 1) the model does not include genomic or epigenomic data; 2) the unsupervised pretraining process did not use human annotations or metadata, which might link cells' molecular features with phenotypes; 3) while all publicly available human scRNA-seq data was used for training, it may still not be sufficient to fully reflect the complexity of human organ development and health states.

🔗 Code: https://github.com/biomap-research/scFoundation

2/ ProSTAGE - Protein stability

Protein stability prediction remains a complex and challenging problem due to limited datasets and less sophisticated algorithms.

ProSTAGE is a cutting-edge deep learning method that combines structural and sequence data to predict changes in protein stability due to single point mutations. Why are we talking about it?

📌 It uses 2x the sample size used by other models, leading to more robust and accurate predictions. In particular, it is trained on a curated dataset of 11,304 mutations across 318 proteins.

📌 It beats previous state-of-the-art (SOTA) models in a variety of tests.

📌 It leverages the ProtT5-XL protein language model (in particular its embeddings) to capture long-range sequence information without additional training or knowledge, thus improving its understanding of the effect of a mutation.

3/ ABodyBuilder3 - Antibody design

Antibodies are crucial in drug discovery due to their ability to specifically target and neutralize pathogens. Their precision allows for the development of targeted therapies, enhancing treatment efficacy and minimizing side effects. This makes antibodies essential in creating innovative treatments for a wide range of diseases, from cancer to autoimmune disorders, driving advancements in personalized medicine.

To design antibodies efficiently, it is essential to understand their 3D structure and how they interact with antigens. Researchers at Exscientia have released the latest version of ABodyBuilder, a deep-learning model that predicts antibody structure! ABodyBuilder3 improves on its predecessor in three main areas ⬇️

📌 Language model representation: ABodyBuilder3 uses the ProtT5 protein language model to encode the sequence of the antibody's variable region. This representation is then processed through eight stages to create a detailed all-atom structure of the antibody and to provide an uncertainty estimate. Its predecessor, ABodyBuilder2, used a simpler method (one-hot encoding) to represent amino acids. Protein language models (PLMs) offer a richer way to represent the residues in the antibody, leading to better predictions of the crucial regions known as complementarity-determining regions (CDRs).

📌 Uncertainty estimation: ABodyBuilder2 uses an ensemble of four different models to produce a prediction confidence score. However, more models means more computation. ABodyBuilder3 replaces the ensemble with a predicted lDDT (pLDDT) score, the confidence measure used by AlphaFold2. This slightly increases the number of parameters but avoids the need for multiple models.

📌 Improved structure modelling and evaluation: The model was trained on the Structural Antibody Database (SAbDab) to improve its predictions. In addition, YASARA was used as a physics-based refinement step to fix stereochemical errors and produce realistic structures.
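The two ideas that ABodyBuilder3 moves away from can be sketched in a few lines: one-hot encoding of residues, and a confidence score derived from how much an ensemble of models disagrees. This is an illustration only, not code from either ABodyBuilder version. The 20-letter alphabet, the helper names and the toy coordinates are our assumptions, and the spread calculation assumes the predicted structures are already superimposed.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed vocabulary)
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-dimensional 0/1 vector per residue."""
    encoded = []
    for residue in sequence:
        if residue not in INDEX:
            raise ValueError(f"unknown residue: {residue!r}")
        vector = [0] * len(AMINO_ACIDS)
        vector[INDEX[residue]] = 1
        encoded.append(vector)
    return encoded


def per_residue_spread(predictions):
    """Root-mean-square deviation of each residue from the ensemble mean.

    predictions: one entry per model, each a list of (x, y, z) per residue.
    A larger spread means the models disagree more, i.e. lower confidence.
    """
    n_models = len(predictions)
    spreads = []
    for i in range(len(predictions[0])):
        mean = [sum(m[i][axis] for m in predictions) / n_models for axis in range(3)]
        sq_dists = [sum((m[i][axis] - mean[axis]) ** 2 for axis in range(3)) for m in predictions]
        spreads.append(math.sqrt(sum(sq_dists) / n_models))
    return spreads


# Hypothetical five-residue fragment: both V residues get the same vector,
# because one-hot encoding carries no information about sequence context.
encoded = one_hot_encode("EVQLV")

# Hypothetical four-model ensemble over two residues (made-up coordinates).
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
spreads = per_residue_spread(ensemble)
```

A PLM embedding would instead give each residue a dense vector that depends on its neighbours, and a pLDDT head produces a per-residue confidence in a single forward pass rather than requiring four separate models.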

🔗 Code: https://github.com/Exscientia/ABodyBuilder3
🔗 Paper: https://arxiv.org/pdf/2405.20863

BITE-SIZED COOKIES FOR THE WEEK 🍪

Yet another layoff in Pharma. Bristol Myers Squibb is cutting 860 employees in New Jersey as part of a $1.5B savings campaign. In 2024, BMS plans to lay off a total of 2,200 employees.

The EU is launching its first AI Office! The goal: “enabling the future development, deployment and use of AI in a way that fosters societal and economic benefits and innovation, while mitigating risks”.

SOPHiA GENETICS, a cloud-native company focused on data-driven medicine, announced a new data-driven consortium called SOPHiA UNITY. The goal? Accelerate cancer research globally using SOPHiA GENETICS' tech stack and the expertise of the community. For now, only Memorial Sloan Kettering Cancer Center has joined.
