# 7: Life Science x AI
Welcome back to your weekly dose of AI news for Life Science!
Here’s what we have for you this week:
New antibody database 💿
First successful EHR-to-EDC data stream 🩺
Life Science tools of the week 🛠️
We have started working on an AI Scientist to design, execute and troubleshoot scientific experiments. Join the waiting list to be the first to try it!
New antibody database 💿
Now that we are in the era of AI, it is ever more important to collect high-quality data to keep training and improving models. While many datasets for proteins exist (partially explaining the rise of protein language models), the same has not been true for antibodies … until now! Introducing AbNGS, the largest database of antibody sequences for machine learning and data-mining applications ⬇️
📌 It contains ~3b unique sequences (~2.1b heavy, ~370m light) from 135 human studies and 826 therapeutic antibodies
📌 Future updates are expected to cover on the order of 300 human bioprojects with ~11b productive sequences!
While this is still far from the approximately 10^18 possible (natural) antibodies … we are getting there!
🔗 Paper
🔗 Database
First successful EHR-to-EDC data stream 🩺
News for everyone involved in healthcare data cleaning and Electronic Health Record (EHR) systems! Everyone in the field knows the pain of working with EHRs, where the systems are not interoperable and don’t use the same data standards. In practice, this means that for each data point (i.e. value) you might spend 2-3 minutes mapping it to the data in a different EHR or in your clinical database (EDC).
📌 Yonalink successfully used AI to map and stream data from EHR to EDC (i.e. the clinical trial database) in real time across 21 medical centers in the US and Israel!
📌 Data was transferred for 15 patients in an oncology trial with 100% accuracy! While 15 patients are not many, each patient might have thousands of data points … and these are collected in different ways across the many medical centers in this study!
🙌 This is a massive breakthrough that gets us one step closer to seamless clinical data integration. Game-changing stuff! 🙌
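To make the mapping problem concrete, here is a minimal Python sketch of what harmonising a single lab value from two sites into one EDC field involves. Everything in it (the site names, field names, units and the `to_edc` helper) is hypothetical and hand-written for illustration. It is not Yonalink's method, which uses AI to do the mapping.

```python
# Hypothetical example: two sites record the same lab value differently.
# All site names, field names and units below are made up for illustration.

# Per-site mapping: local field name -> (EDC field name, factor to convert to the EDC unit)
SITE_MAPPINGS = {
    "site_a": {"HGB_g_dL": ("hemoglobin_g_dl", 1.0)},
    "site_b": {"Hemoglobin (g/L)": ("hemoglobin_g_dl", 0.1)},  # g/L -> g/dL
}


def to_edc(site, record):
    """Translate one site-specific record into EDC field names and units."""
    mapping = SITE_MAPPINGS[site]
    edc_record = {}
    for local_name, value in record.items():
        if local_name not in mapping:
            continue  # unmapped fields would need manual review
        edc_name, factor = mapping[local_name]
        edc_record[edc_name] = round(value * factor, 2)
    return edc_record


print(to_edc("site_a", {"HGB_g_dL": 13.2}))
print(to_edc("site_b", {"Hemoglobin (g/L)": 132.0}))
```

Both calls end up with the same EDC field and unit. Writing and checking a rule like this by hand for every field at every site is where the 2-3 minutes per data point come from.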
Life Science tools of the week 🛠️
1/ QupKake - pKa prediction
There are an estimated 10^60 small, “drug-like” molecules. How can we reduce that number to fast-track drug discovery? One way is to use computational chemistry to “filter out” molecules based on their physicochemical properties. One important property is the acid–base dissociation constant (pKa), which reflects the relative propensity of a molecule to donate or accept a proton, and thus affects its solubility, membrane permeability, protein binding affinity and stability. Introducing QupKake, a novel framework to calculate pKa
📌 QupKake combines quantum mechanics (GFN2 features) with a molecular graph neural network (GNN) and was trained on ChEMBL data (∼2.5 million predicted acidic and basic pKa values over ∼1.5 million molecules)
📌 QupKake provides state-of-the-art pKa predictions, with low prediction errors on five external test sets (RMSEs between 0.5 and 0.8 pKa units). Moreover … QupKake beats commercial software while being open source!
🔗 Paper
🔗 Code
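Why does pKa matter so much for solubility and permeability? The Henderson-Hasselbalch relation gives the fraction of a molecule that is charged at a given pH. The short Python sketch below is textbook chemistry for monoprotic acids and bases, not part of QupKake; the function names and the example pKa of 4.4 are illustrative assumptions.

```python
def fraction_ionized_acid(pka, ph):
    """Fraction of a monoprotic acid in its deprotonated (charged) form."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


def fraction_ionized_base(pka, ph):
    """Fraction of a monoprotic base in its protonated (charged) form.

    Here pka is the pKa of the conjugate acid.
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))


# Illustrative only: an acid with an assumed pKa of 4.4 at blood pH 7.4
print(f"{fraction_ionized_acid(4.4, 7.4):.4f}")
```

A shift of one pKa unit changes the charged-to-neutral ratio tenfold, which is why prediction errors below one pKa unit are meaningful for filtering.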
2/ CellRank2 - Cell fate prediction
Imagine predicting a cell's destiny! It could transform drug discovery. Introducing CellRank2, a powerful tool for studying blood cell development using massive single-cell datasets.
📌 CellRank2 combines multiple modalities (RNA velocity, cell similarity, time-series, metabolic labelling) to predict terminal states and fate probabilities for single cells
📌 CellRank2 can 1) compute initial, terminal and intermediate macrostates for cells; 2) infer fate probabilities and driver genes; 3) visualize and cluster gene expression trends
📌 CellRank2 can be extended with more modalities thanks to its flexible architecture
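One common way to think about fate probabilities is as absorption probabilities in a Markov chain over cell states: starting from a given state, how likely is a random walk to end up in each terminal state? The toy Python sketch below illustrates that idea on a hand-made 4-state chain. It does not use the CellRank API, the transition matrix is invented for illustration, and the helper names are our own.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def fate_probabilities(T, terminal, n_steps=200):
    """Approximate absorption probabilities by running the chain for many steps.

    T is a row-stochastic transition matrix in which terminal states are
    absorbing (they only transition to themselves).
    """
    P = T
    for _ in range(n_steps):
        P = matmul(P, T)
    # After many steps all probability mass sits in the terminal states.
    return [[row[j] for j in terminal] for row in P]


# Invented toy chain: states 0 and 1 are transient (e.g. progenitors),
# states 2 and 3 are terminal fates.
T = [
    [0.5, 0.3, 0.2, 0.0],
    [0.1, 0.5, 0.1, 0.3],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

fate = fate_probabilities(T, terminal=[2, 3])
for state, probs in enumerate(fate):
    print(state, [round(p, 3) for p in probs])
```

Each row sums to 1: every starting state eventually commits to one of the two terminal fates, and the row tells you how the odds split.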
3/ DragonFly - Pathology imaging
Amazing work from Together AI with DragonFly, a new "zoom-and-select" multimodal LLM architecture! Why is it important? ⬇️
📌 Instead of processing an entire large image at once (which is resource-intensive), DragonFly splits the image into multiple resolutions and sub-images, making it easier and faster to "zoom in" on a specific location
📌 DragonFly Llama3 beats Med-Gemini on pathology imaging Q&A (Path-VQA dataset, 92% vs 83%), showing good fine-grained visual understanding. The Path-VQA dataset comprises 32,632 question-answer pairs derived from 4,289 pathology images (including both open-ended and closed-ended (yes/no) questions)
📌 DragonFly also beats state-of-the-art models on a variety of metrics related to image captioning and radiology report generation tasks in the IU X-Ray radiology (3,955 reports and 7,470 frontal and lateral X-ray images) and Peir Gross (7,443 images across 21 pathology sub-categories) datasets
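The multi-resolution idea can be sketched in a few lines: view the image at several zoom levels, cut each level into a grid of sub-images, and let a later step choose which crops to keep. The Python sketch below only computes the crop boxes. The function name and the scale factors are assumptions for illustration, they are not taken from DragonFly's implementation, and the selection step is left out.

```python
def crop_boxes(width, height, scales=(1, 2, 4)):
    """Return {scale: [(left, top, right, bottom), ...]} for each zoom level.

    At scale s the image is cut into an s x s grid of sub-images. Each crop
    would then be resized to the vision encoder's input size, so higher
    scales give a more "zoomed-in" view of a smaller region.
    """
    boxes = {}
    for s in scales:
        step_x, step_y = width / s, height / s
        boxes[s] = [
            (round(col * step_x), round(row * step_y),
             round((col + 1) * step_x), round((row + 1) * step_y))
            for row in range(s)
            for col in range(s)
        ]
    return boxes


boxes = crop_boxes(1344, 1344)
for scale, tiles in boxes.items():
    print(scale, len(tiles), tiles[0])
```

The number of crops grows quadratically with the zoom level, which is why keeping only the most relevant high-resolution crops helps hold down the compute cost.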
BITE-SIZED COOKIES FOR THE WEEK 🍪
Color is using OpenAI's GPT-4o to revolutionise cancer care! They are rolling out a copilot application that merges patient data with clinical insights to create personalized treatment plans and accelerate diagnosis. The results so far? Clinicians identify 4x more missing labs, imaging, biopsy and pathology results, and need 5 minutes to analyze patient records and identify gaps.
Nice new GPT4-based tool that helps you create an imaging widget with just a prompt! How is it useful? For example, you can ask it to segment cell nuclei from images!
The National Institute for Health and Care Excellence and the NHS are exploring the use of AI at a national scale to pinpoint early signs of cardiac problems from CT scans. Early findings show that clinicians decided to change a patient’s treatment in 45% of cases after seeing the insights from AI tools!
Don’t forget to take our short survey to help us understand how you currently use AI in your day-to-day!