Kiin Bio Weekly
ATOMICA: A Universal Database for Decoding Molecular Interactions in Drug Discovery

Harvard researchers unveil ATOMICA, a database and geometric deep learning framework for modeling atomic-scale intermolecular interactions across diverse biomolecules. Built on more than 2 million experimentally validated complexes, ATOMICA bridges structural biology and AI to accelerate drug discovery and functional annotation.

Core Innovations:

Universal Interaction Modeling:

  • First unified representation of interactions across proteins, nucleic acids, small molecules, and metal ions

  • Hierarchical SE(3)-equivariant GNNs capture atom-level, block-level (e.g., amino acids), and interface-level features

  • Trained on 2,037,972 complexes (Cambridge Structural Database, PDB, Q-BioLiP), with interaction interfaces defined by an 8 Å distance cutoff
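
To make the 8 Å interface idea concrete, here is a minimal sketch of picking out the atoms of two molecules that sit within a distance cutoff of each other. The function name `interface_atoms`, the toy coordinates, and the choice to apply the cutoff atom by atom (rather than per block) are assumptions for illustration, not ATOMICA's actual preprocessing code.

```python
import math

def interface_atoms(mol_a, mol_b, cutoff=8.0):
    """Return sorted indices of atoms in each molecule within `cutoff` Å of the other.

    mol_a, mol_b: lists of (x, y, z) coordinates in Å.
    """
    idx_a, idx_b = set(), set()
    for i, a in enumerate(mol_a):
        for j, b in enumerate(mol_b):
            # Any cross-molecule pair closer than the cutoff joins the interface
            if math.dist(a, b) <= cutoff:
                idx_a.add(i)
                idx_b.add(j)
    return sorted(idx_a), sorted(idx_b)

# Toy coordinates (hypothetical): three protein atoms, two ligand atoms
protein = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
ligand = [(9.0, 0.0, 0.0), (50.0, 0.0, 0.0)]

print(interface_atoms(protein, ligand))
```

The all-pairs loop is fine for a toy case; a real pipeline over millions of complexes would more likely use a spatial index such as a k-d tree.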

Self-Supervised Learning:

  • Pretrained via denoising rigid transformations and masked block prediction (AUPRC up to 0.71)

  • Recovers physicochemical patterns (hydrogen bonds, π-stacking) without labeled data

  • The latent space is organized by chemical properties (aligning with the periodic table) and by interaction type
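
The two pretraining signals above can be sketched as data corruptions: move one molecule rigidly and hide some block identities, then ask a model to undo both. This is a simplified illustration only. The helper names, the `<mask>` token, the z-axis-only rotation, and the masking fraction are all assumptions, and no model or loss is shown.

```python
import math
import random

MASK = "<mask>"  # hypothetical mask token

def rigid_perturb(coords, angle, shift):
    """Rotate coords about the z-axis through their centroid, then translate.

    A rigid transformation: internal distances are unchanged.
    """
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in coords:
        dx, dy = x - cx, y - cy
        out.append((cx + c * dx - s * dy + shift[0],
                    cy + s * dx + c * dy + shift[1],
                    z + shift[2]))
    return out

def mask_blocks(blocks, fraction, rng):
    """Hide a random subset of block labels; return the masked list and the targets."""
    k = max(1, round(fraction * len(blocks)))
    hidden = set(rng.sample(range(len(blocks)), k))
    masked = [MASK if i in hidden else b for i, b in enumerate(blocks)]
    targets = {i: blocks[i] for i in hidden}
    return masked, targets

rng = random.Random(0)
ligand = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.5), (0.0, 3.0, -1.0)]
# A denoising objective would try to recover this angle and shift
noisy = rigid_perturb(ligand, angle=rng.uniform(-0.5, 0.5), shift=(0.3, -0.2, 0.1))
# A masked-block objective would try to recover the hidden labels in `targets`
masked, targets = mask_blocks(["ALA", "HIS", "GLU", "CYS"], fraction=0.25, rng=rng)
```

Because the perturbation is rigid, the ligand's internal geometry is preserved and only its pose relative to the partner changes, which is what makes the task about the interaction rather than about the molecule itself.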

Disease Pathway Discovery:

  • Constructed ATOMICANets: modality-specific networks that link proteins by the similarity of their interactions with a given partner type (ions, lipids, nucleic acids, etc.)

  • Identified disease modules for 27 conditions, including autoimmune neuropathies and lymphoma

  • Predicted high-confidence targets (e.g., Kv1 channels in multiple sclerosis, zinc finger proteins in leukemia)
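
One way to picture a network that links proteins by interaction similarity is to connect any two proteins whose interface embeddings point in nearly the same direction. The sketch below uses cosine similarity with a fixed threshold. The similarity measure, the 0.9 threshold, the protein labels, and the three-dimensional toy vectors are assumptions, not the construction described by the authors.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_network(embeddings, threshold=0.9):
    """Return edges between proteins whose embeddings are at least `threshold` similar."""
    edges = []
    for p, q in combinations(sorted(embeddings), 2):
        if cosine(embeddings[p], embeddings[q]) >= threshold:
            edges.append((p, q))
    return edges

# Hypothetical interface embeddings for one modality (e.g., ion binding)
toy = {
    "P1": [1.0, 0.0, 0.2],
    "P2": [0.9, 0.1, 0.2],
    "P3": [0.0, 1.0, 0.0],
}
edges = build_network(toy)
print(edges)
```

Disease modules would then correspond to groups of disease-associated proteins that sit close together in such a network; how those modules are scored is not covered here.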

Dark Proteome Annotation:

  • Predicted 2,646 uncharacterized binding sites in structurally novel proteins

  • Discovered bacterial zinc finger motifs (HEXXH) and transmembrane cytochrome subunits validated by AlphaFold3 (ipTM > 0.7)

  • Enables functional annotation of ancient protein families across Bacteria, Archaea, and Eukarya

Technical Highlights:
✅ Multi-scale embeddings (atom → block → interface) with compositional algebra properties
✅ Outperforms ESM-2 in zero-shot binding residue identification (2.7 vs. 2.4 precision@10)
✅ Scales with dataset size, showing 190% AUPRC gains on low-data modalities (e.g., protein-DNA)
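
For readers unfamiliar with the metric, precision@k in its textbook form ranks residues by predicted score and asks what fraction of the top k are true binding residues. The sketch below follows that definition, which yields a value between 0 and 1; how the figures quoted above are scaled is not stated in this summary. The function name and toy scores are hypothetical.

```python
def precision_at_k(scores, true_sites, k=10):
    """Fraction of the k highest-scoring residues that are true binding residues.

    scores: per-residue predicted scores (index = residue position).
    true_sites: set of residue indices known to bind.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    hits = sum(1 for i in ranked if i in true_sites)
    return hits / k

# Toy example: six residues, three of which truly bind
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
true_sites = {0, 4, 5}
p_at_3 = precision_at_k(scores, true_sites, k=3)
```

Here the top three residues are 0, 2, and 4, of which two are true sites, so the result is two thirds.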

Applications:

  • Target discovery via shared interaction interfaces

  • Mechanism elucidation through multimodal interaction networks

  • Functional annotation of evolutionarily distant proteins

  • Structure-based drug design for understudied targets

Limitations & Future Work:

  • Relies on high-confidence structural predictions (pLDDT > 70)

  • Limited coverage of disordered protein regions

  • Planned integration with experimental interaction data (e.g., SPR, DEL)

ATOMICA redefines how we model molecular interactions, offering a systematic framework to decode the "interaction language" of life. By bridging structural biology, AI, and disease mechanisms, it opens new frontiers in drug discovery and proteome annotation.

Explore the blog: zitniklab.hms.harvard.edu/projects/ATOMICA
