In this issue:
Welcome back to your weekly dose of AI news for Life Science!
This week, we cover three new tools: scGPT-spatial, Foldseek-Multimer, and DockCADD.
Dive into these game-changing innovations and explore how they are transforming the biotech and healthcare landscapes!
scGPT-spatial: Continual Pretraining of Single-Cell Foundation Model for Spatial Transcriptomics 🧬
In early 2024, Bo Wang’s lab introduced scGPT, one of the first foundation models for single-cell data. Spatial transcriptomics is considerably more complex: a model must capture not only single-cell or spot-level expression profiles but also intricate spatial relationships, while handling diverse protocols (imaging-based vs. sequencing-based). Now the same lab has presented scGPT-spatial, an extension of scGPT adapted to spatial transcriptomics data.
🔨 Applications:
scGPT-spatial can be used for a variety of tasks, including cell-type deconvolution (e.g., predicting the cell types present in a given spatial spot) and gene expression imputation.
📌 Key Insights:
Based on scGPT and adapted to spatial single-cell data, providing spatial context in tissues.
Incorporates both imaging-based (MERFISH, Xenium) and sequencing-based (Visium, Visium HD) technologies. Trained on a 30M-cell dataset (SpatialHuman30M) spanning 821 slides.
Primarily a decoder-only architecture, using masked attention to predict missing gene expression while predicting spatial context. Introduces a Mixture-of-Experts (MoE) decoder to capture modality-specific features.
Input embeddings combine gene names, binned expression values, a spatial spot token (representing the aggregated gene expression of the spatial location or cell), and a modality token (representing the spatial technology used).
Spatially-Aware Sampling: Groups spatially neighboring spots into local "patches," allowing the model to predict missing gene expression based on nearby cells. Avoids explicit coordinate encoding, improving generalizability across tissue slides.
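To make two of the ideas above concrete, here is a minimal sketch of expression binning and of grouping neighboring spots into a patch. It is an illustration only, not the scGPT-spatial implementation: the function names, the equal-width binning scheme, and the k-nearest-neighbor patch rule are all assumptions made for this example. Note that coordinates are used only to choose which spots go into a patch; they are never passed on as model input, which mirrors the "no explicit coordinate encoding" idea.

```python
import math
from typing import List, Sequence, Tuple


def bin_expression(values: Sequence[float], n_bins: int = 5) -> List[int]:
    """Map non-zero expression values to bins 1..n_bins; zeros stay in bin 0.

    Assumption for illustration: equal-width bins between the smallest and
    largest non-zero value of this one cell/spot.
    """
    nonzero = [v for v in values if v > 0]
    if not nonzero:
        return [0] * len(values)
    lo, hi = min(nonzero), max(nonzero)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when lo == hi
    binned = []
    for v in values:
        if v <= 0:
            binned.append(0)
        else:
            binned.append(min(int((v - lo) / width) + 1, n_bins))
    return binned


def sample_patch(
    coords: Sequence[Tuple[float, float]], center: int, k: int
) -> List[int]:
    """Return the indices of the center spot and its k nearest neighbors.

    Hypothetical patch rule: plain Euclidean k-nearest neighbors.
    """
    order = sorted(
        range(len(coords)),
        key=lambda i: math.dist(coords[i], coords[center]),
    )
    return order[: k + 1]


# Toy slide: three spots close together, two far away.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
patch = sample_patch(coords, center=0, k=2)
print(patch)
print(bin_expression([0, 1, 2, 10]))
```

In a setup like this, the expression profiles of the spots in `patch` would be tokenized together, so that masked genes in one spot can be predicted from its neighbors.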
Foldseek-Multimer - Fast protein complex alignment 🏃♂️➡️
The field of structural biology is rapidly advancing, with new computational tools making it easier to analyze and compare protein complexes. Understanding how proteins interact is essential for drug discovery, functional genomics, and biomolecular engineering, but traditional methods for aligning protein complexes are computationally expensive and time-consuming. Foldseek-Multimer is an innovative AI-powered framework designed to dramatically accelerate protein complex alignment, making it thousands of times faster than existing tools while maintaining high accuracy.
🔨 Applications:
Align protein structures and complexes, which is important for detecting distant evolutionary relationships between proteins and for working with 3D structures predicted by tools like AlphaFold.
📌 Key Insights:
Unmatched Speed: Can compare billions of protein complex pairs in just 11 hours, making it 3–4 orders of magnitude faster than US-align.
High Accuracy: Detects structural similarities in >95% of cases, achieving performance comparable to state-of-the-art methods.
Better Sensitivity: Identifies structurally similar complexes with low sequence identity, making it ideal for metagenomic studies.
User-Friendly & Open-Source: Available on GitHub and web platforms, allowing easy integration into existing research workflows.
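For readers who want to try it from a script, here is a small sketch that assembles a Foldseek multimer search command. It assumes Foldseek is installed and that the `easy-multimersearch` subcommand takes a query, a target, a result file, and a temporary directory, as described in the Foldseek README at the time of writing; check `foldseek --help` for your version. The directory and file names are placeholders.

```python
import shlex
from typing import List


def multimer_search_cmd(
    query: str, target: str, result: str, tmp_dir: str = "tmp"
) -> List[str]:
    """Build the argument list for a Foldseek multimer search.

    Assumed positional order: query, target, result file, temp directory.
    """
    return ["foldseek", "easy-multimersearch", query, target, result, tmp_dir]


# Placeholder paths: folders of complex structures (PDB/mmCIF files).
cmd = multimer_search_cmd("query_complexes/", "target_complexes/", "result")
print(shlex.join(cmd))

# To actually run the search (requires Foldseek on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.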
DockCADD: A streamlined in silico pipeline for the identification of potent ribosomal S6 Kinase 2 (RSK2) inhibitors 💻
Molecular docking is a crucial step in computational drug discovery, but it is often time-consuming and complex. It is essential for identifying potential drug candidates, yet it can become a bottleneck in preclinical studies. DockCADD is an open-source tool designed to streamline the docking process. It integrates various tools to automate receptor and ligand preparation, requiring only a ligand SMILES string and a PDB ID to start the docking simulation.
🔨 Applications:
Streamline docking to predict protein-ligand interactions
📌 Key Insights:
Requires minimal input from users and utilizes advanced tools like Vina scoring to provide accurate docking results.
Accelerates the drug discovery process by automating multiple steps
Successfully applied to screen for RSK2 inhibitors for cancer therapy; the predicted hits still need to be validated experimentally in future work.
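To give a feel for what "only a SMILES string and a PDB ID" can turn into, here is a rough sketch of two steps such a pipeline has to perform: locating the receptor file and assembling a Vina call. This is not DockCADD's code. The helper names are hypothetical, the PDB ID `1ABC` is a placeholder rather than an RSK2 structure, and the box center and size are made-up numbers. The sketch assumes the RCSB file download URL pattern and the standard AutoDock Vina command-line options; converting the receptor and ligand to PDBQT is left out.

```python
import re
from typing import List, Sequence

# Classic 4-character PDB IDs: one digit followed by three alphanumerics.
PDB_ID_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def receptor_url(pdb_id: str) -> str:
    """Return the RCSB download URL for a PDB entry (hypothetical helper)."""
    if not PDB_ID_PATTERN.match(pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def vina_cmd(
    receptor_pdbqt: str,
    ligand_pdbqt: str,
    center: Sequence[float],
    size: Sequence[float],
    out: str = "docked.pdbqt",
) -> List[str]:
    """Assemble an AutoDock Vina command line (hypothetical helper)."""
    cmd = ["vina", "--receptor", receptor_pdbqt, "--ligand", ligand_pdbqt]
    # The search box is defined by its center and edge lengths in Angstroms.
    for axis, c, s in zip("xyz", center, size):
        cmd += [f"--center_{axis}", str(c), f"--size_{axis}", str(s)]
    return cmd + ["--out", out]


print(receptor_url("1abc"))  # placeholder ID, not an RSK2 entry
docking = vina_cmd(
    "receptor.pdbqt", "ligand.pdbqt",
    center=(1.0, 2.0, 3.0),  # made-up box center
    size=(20, 20, 20),       # made-up box size
)
print(" ".join(docking))
```

A pipeline like DockCADD wraps steps of this kind, plus structure cleanup and format conversion, so the user does not have to run them by hand.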