Harvard researchers unveil ATOMICA, a database and geometric deep learning framework for modeling atomic-scale intermolecular interactions across diverse biomolecules. Built on more than 2 million experimentally validated complexes, ATOMICA bridges structural biology and AI to accelerate drug discovery and functional annotation.
Core Innovations:
Universal Interaction Modeling:
First unified representation of interactions across proteins, nucleic acids, small molecules, and metal ions
Hierarchical SE(3)-equivariant GNNs capture atomic, block (e.g., amino acids), and interface-level features
Trained on 2,037,972 complexes drawn from the Cambridge Structural Database, PDB, and Q-BioLiP, using 8 Å interaction interfaces
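To make the 8 Å interface idea concrete, here is a minimal sketch in plain Python. It assumes an interface is the set of atoms in each molecule that lie within 8 Å of any atom in the partner molecule. The exact rule ATOMICA applies (for example, atom-level versus block-level distances) is not stated in this post, and `interface_atoms` is a hypothetical helper, not part of the ATOMICA codebase.

```python
import math

def interface_atoms(coords_a, coords_b, cutoff=8.0):
    """Return sorted atom indices of each molecule within `cutoff` Å of the other.

    Assumption: an atom is "at the interface" if at least one atom of the
    partner molecule lies within the cutoff distance.
    """
    idx_a, idx_b = set(), set()
    for i, a in enumerate(coords_a):
        for j, b in enumerate(coords_b):
            if math.dist(a, b) <= cutoff:
                idx_a.add(i)
                idx_b.add(j)
    return sorted(idx_a), sorted(idx_b)

# Toy coordinates in Å (illustrative only, not real structures).
protein = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
ligand = [(9.0, 0.0, 0.0), (50.0, 0.0, 0.0)]

protein_idx, ligand_idx = interface_atoms(protein, ligand)
print(protein_idx, ligand_idx)
```

The double loop is quadratic in the number of atoms. For real structures, a spatial index such as a k-d tree would be the usual replacement.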
Self-Supervised Learning:
Pretrained via denoising rigid transformations and masked block prediction (AUPRC up to 0.71)
Recovers physicochemical patterns (hydrogen bonds, π-stacking) without labeled data
Latent space is organized by chemical properties (aligning with the periodic table) and by interaction types
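The two pretraining signals named above can be sketched as data corruption steps: perturb one molecule with a random rigid transformation, and hide a subset of blocks for the model to recover. This is an illustration under stated assumptions, not ATOMICA's training code. The noise ranges, the 15% default mask rate, the `<MASK>` token, and both function names are hypothetical.

```python
import math
import random

def random_rigid_transform(coords, max_angle=0.5, max_shift=1.0, rng=random):
    """Rotate coords about their centroid by a random axis and angle, then translate.

    A rigid transformation preserves all pairwise distances within the molecule.
    """
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    # Random unit rotation axis.
    axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(x * x for x in axis))
    ux, uy, uz = (x / norm for x in axis)
    angle = rng.uniform(-max_angle, max_angle)       # radians (assumed range)
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]  # Å (assumed)
    c, s = math.cos(angle), math.sin(angle)
    # Rodrigues rotation matrix for the chosen axis and angle.
    rot = [
        [c + ux * ux * (1 - c), ux * uy * (1 - c) - uz * s, ux * uz * (1 - c) + uy * s],
        [uy * ux * (1 - c) + uz * s, c + uy * uy * (1 - c), uy * uz * (1 - c) - ux * s],
        [uz * ux * (1 - c) - uy * s, uz * uy * (1 - c) + ux * s, c + uz * uz * (1 - c)],
    ]
    out = []
    for p in coords:
        d = [p[k] - centroid[k] for k in range(3)]
        out.append(tuple(
            sum(rot[r][k] * d[k] for k in range(3)) + centroid[r] + shift[r]
            for r in range(3)
        ))
    return out

def mask_blocks(blocks, mask_rate=0.15, mask_token="<MASK>", rng=random):
    """Replace a random subset of block identities; return (masked list, targets)."""
    masked, targets = list(blocks), {}
    for i, block in enumerate(blocks):
        if rng.random() < mask_rate:
            targets[i] = block      # what the model would be asked to predict
            masked[i] = mask_token
    return masked, targets

rng = random.Random(0)
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
noisy = random_rigid_transform(ligand, rng=rng)
masked, targets = mask_blocks(["ALA", "GLY", "HIS", "GLU"], mask_rate=0.5, rng=rng)
```

In a denoising setup of this kind, the model would see `noisy` and `masked` and be trained to recover the applied transformation and the entries in `targets`.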
Disease Pathway Discovery:
Constructed ATOMICANets, modality-specific networks that link proteins by interaction similarity (ions, lipids, nucleic acids, etc.)
Identified disease modules for 27 conditions, including autoimmune neuropathies and lymphoma
Predicted high-confidence targets (e.g., Kv1 channels in multiple sclerosis, Zn²⁺-finger proteins in leukemia)
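One way to picture a network that links proteins by interaction similarity is to connect any two proteins whose interface embeddings are close. The sketch below assumes cosine similarity and a fixed threshold of 0.9. Both are illustrative choices, since this post does not specify the similarity measure or cutoff behind ATOMICANets, and the embeddings and protein names are toy values.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def build_similarity_network(embeddings, threshold=0.9):
    """Return {(protein_p, protein_q): similarity} for pairs at or above threshold.

    `embeddings` maps a protein name to its interface embedding for ONE
    modality (e.g. ion binding); one such network would be built per modality.
    """
    edges = {}
    for p, q in combinations(sorted(embeddings), 2):
        sim = cosine(embeddings[p], embeddings[q])
        if sim >= threshold:
            edges[(p, q)] = sim
    return edges

# Toy embeddings (hypothetical values).
toy = {
    "protA": [1.0, 0.0, 0.1],
    "protB": [0.9, 0.1, 0.1],
    "protC": [0.0, 1.0, 0.0],
}
network = build_similarity_network(toy)
print(network)
```

Disease modules could then be sought as connected groups of known disease-associated proteins within such a graph.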
Dark Proteome Annotation:
Predicted 2,646 uncharacterized binding sites in structurally novel proteins
Discovered bacterial zinc finger motifs (HEXXH) and transmembrane cytochrome subunits validated by AlphaFold3 (ipTM > 0.7)
Enables functional annotation of ancient protein families across Bacteria, Archaea, and Eukarya
Technical Highlights:
✅ Multi-scale embeddings (atom → block → interface) with compositional algebra properties
✅ Outperforms ESM-2 in zero-shot binding residue identification (2.7 vs. 2.4 precision@10)
✅ Scales with dataset size, showing 190% AUPRC gains on low-data modalities (e.g., protein-DNA)
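The atom → block → interface hierarchy can be pictured as two rounds of pooling. The sketch below uses simple mean pooling as a stand-in. ATOMICA itself uses learned SE(3)-equivariant message passing, so this shows only the shape of the hierarchy, and the function names and toy vectors are hypothetical.

```python
def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def hierarchical_embed(atom_embeddings, atom_to_block):
    """Pool atom embeddings into block embeddings, then into one interface embedding.

    `atom_to_block[i]` is the block id (e.g. residue index) of atom i.
    """
    blocks = {}
    for emb, block_id in zip(atom_embeddings, atom_to_block):
        blocks.setdefault(block_id, []).append(emb)
    block_embeddings = {b: mean_pool(vs) for b, vs in blocks.items()}
    interface_embedding = mean_pool(list(block_embeddings.values()))
    return block_embeddings, interface_embedding

# Three atoms: the first two belong to block 0, the third to block 1.
atoms = [[1.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
block_emb, interface_emb = hierarchical_embed(atoms, [0, 0, 1])
print(block_emb, interface_emb)
```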
Applications:
Target discovery via shared interaction interfaces
Mechanism elucidation through multimodal interaction networks
Functional annotation of evolutionarily distant proteins
Structure-based drug design for understudied targets
Access & Tools:
Database: 2M+ interaction complexes on Harvard Dataverse
Code/Models: GitHub | Hugging Face
Interactive Tutorials: Binding site prediction, interface similarity analysis
Limitations & Future Work:
Relies on high-confidence structural predictions (pLDDT > 70)
Limited coverage of disordered protein regions
Planned integration with experimental interaction data (e.g., SPR, DEL)
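The pLDDT constraint in the first limitation amounts to a confidence filter on predicted structures. A minimal sketch, assuming per-residue pLDDT scores are available as a list and that the cutoff is applied residue by residue (the post does not say whether ATOMICA filters per residue or per structure; `confident_residues` is a hypothetical helper):

```python
def confident_residues(plddt, threshold=70.0):
    """Return 0-based indices of residues whose pLDDT exceeds the threshold."""
    return [i for i, score in enumerate(plddt) if score > threshold]

# Toy per-residue scores (illustrative only).
scores = [92.1, 65.0, 70.0, 88.4]
kept = confident_residues(scores)
print(kept)  # [0, 3]
```

Low-pLDDT stretches often coincide with disordered regions, which relates to the second limitation listed above.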
ATOMICA redefines how we model molecular interactions, offering a systematic framework to decode the "interaction language" of life. By bridging structural biology, AI, and disease mechanisms, it opens new frontiers in drug discovery and proteome annotation.
Explore the blog: zitniklab.hms.harvard.edu/projects/ATOMICA