What task in your drug discovery work is the most time consuming?
NVIDIA’s La-Proteina is a new state-of-the-art generative model for protein design, capable of jointly generating all-atom 3D protein structures and their amino acid sequences.
By combining explicit backbone generation with latent variable modelling of side chains and sequence, La-Proteina directly tackles one of protein design’s most complex challenges: the co-design of atomistic structure and sequence at scale.
Key Innovations and Capabilities:
1. Predict-Decode-Generate Capabilities
Predict: La-Proteina predicts atomically precise protein structures by modelling the joint distribution of α-carbon backbone coordinates and per-residue latent vectors that encode both amino acid identity and side-chain geometry.
Decode: A transformer-based decoder maps latent variables and α-carbons into full-atom structures and amino acid sequences, using the Atom37 format for compatibility with structural biology standards.
Generate: New protein sequences and structures are synthesised by sampling Gaussian noise for the α-carbon coordinates and the per-residue latents, integrating a flow matching model to denoise both, and then decoding the result into atomistic detail.
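The Generate step can be sketched as a simple Euler integration from noise. This is a minimal illustration and not the model's actual sampler: `toy_velocity` is a hypothetical stand-in for the learned network (it is handed the endpoint, which a trained model would have to predict), the two schedules are arbitrary choices, and the latent dimension used in the usage lines is an assumption.

```python
import numpy as np

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For the straight-line path x_t = (1 - t) * x_0 + t * x_1, the
    conditional velocity pointing at a known endpoint x_1 is
    (x_1 - x_t) / (1 - t). A trained model would predict this
    without being given the endpoint.
    """
    return (target - x) / (1.0 - t)

def sample(n_res, latent_dim, ca_target, z_target, n_steps=100,
           ca_schedule=lambda s: s, z_schedule=lambda s: s ** 2, seed=0):
    """Euler-integrate alpha-carbon coordinates and latents from noise.

    The schedules map a shared step variable s in [0, 1] to a time for
    each component; both must be increasing and reach 1 at s = 1.
    """
    rng = np.random.default_rng(seed)
    ca = rng.standard_normal((n_res, 3))          # noisy backbone
    z = rng.standard_normal((n_res, latent_dim))  # noisy per-residue latents
    grid = np.linspace(0.0, 1.0, n_steps + 1)
    for s0, s1 in zip(grid[:-1], grid[1:]):
        t_ca0, t_ca1 = ca_schedule(s0), ca_schedule(s1)
        t_z0, t_z1 = z_schedule(s0), z_schedule(s1)
        # A real model would evaluate one network on (ca, z) jointly;
        # the toy field treats the two components independently.
        ca = ca + (t_ca1 - t_ca0) * toy_velocity(ca, t_ca0, ca_target)
        z = z + (t_z1 - t_z0) * toy_velocity(z, t_z0, z_target)
    return ca, z

# Usage: 5 residues, 8-dimensional latents (dimension is an assumption).
ca_target = np.arange(15, dtype=float).reshape(5, 3)
z_target = np.ones((5, 8))
ca, z = sample(5, 8, ca_target, z_target, n_steps=50)
```

Because the toy field always points at the endpoint, the last Euler step lands on the target. With a learned field, the number of steps and the choice of schedule would trade speed against sample quality.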
2. Partially Latent, Long-Range, Residue-Level Representation
Maintains explicit control over α-carbon backbones for geometric accuracy across long proteins.
Encodes variable-length, mixed-type side chains into fixed-size, continuous latent vectors.
Supports residue-level independence during generation, allowing for structural and sequence diversity without collapse.
Enables localised edits and flexible conditional design tasks, such as motif scaffolding.
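To make the "variable-length side chains, fixed-size latents" point concrete, the sketch below builds an Atom37 occupancy mask for a few residue types and places a fixed-width latent array next to it. The slot order is assumed to follow the AlphaFold convention, only four residue types are listed, and `LATENT_DIM` is a hypothetical value chosen for illustration.

```python
import numpy as np

# Atom37 slot order, assumed to follow the AlphaFold convention.
ATOM37 = [
    "N", "CA", "C", "CB", "O", "CG", "CG1", "CG2", "OG", "OG1", "SG", "CD",
    "CD1", "CD2", "ND1", "ND2", "OD1", "OD2", "SD", "CE", "CE1", "CE2", "CE3",
    "NE", "NE1", "NE2", "OE1", "OE2", "CH2", "NH1", "NH2", "OH", "CZ", "CZ2",
    "CZ3", "NZ", "OXT",
]

# Heavy atoms for a few residue types (a subset, for illustration only).
RESIDUE_ATOMS = {
    "GLY": ["N", "CA", "C", "O"],
    "ALA": ["N", "CA", "C", "CB", "O"],
    "LYS": ["N", "CA", "C", "CB", "O", "CG", "CD", "CE", "NZ"],
    "TRP": ["N", "CA", "C", "CB", "O", "CG", "CD1", "CD2", "NE1", "CE2",
            "CE3", "CZ2", "CZ3", "CH2"],
}

def atom37_mask(sequence):
    """Return a (length, 37) boolean mask of which atom slots are occupied."""
    mask = np.zeros((len(sequence), len(ATOM37)), dtype=bool)
    for i, res in enumerate(sequence):
        for atom in RESIDUE_ATOMS[res]:
            mask[i, ATOM37.index(atom)] = True
    return mask

LATENT_DIM = 8  # hypothetical latent width, not taken from the source

seq = ["GLY", "ALA", "LYS", "TRP"]
mask = atom37_mask(seq)
coords = np.zeros((len(seq), 37, 3))        # full-atom layout, mostly empty slots
latents = np.zeros((len(seq), LATENT_DIM))  # same width for every residue

print(mask.sum(axis=1).tolist())  # [4, 5, 9, 14]
```

The number of occupied slots differs per residue, while the latent array has the same width everywhere, which is what lets a single continuous generative model cover all residue types.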
3. Scalable, Transformer-Based Framework
Trained on 46 million sequence-structure pairs from the AlphaFold Database.
Uses efficient transformer architectures with pair-biased attention and no triangular update layers, which avoids the cubic cost in sequence length that those layers incur and reduces memory usage.
Employs flow matching in latent and coordinate space to generate structures from noise.
Allows separate generation schedules for backbone and latent components, improving fidelity and control.
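Pair-biased attention, as mentioned in this section, can be sketched as ordinary attention with an extra additive term derived from pairwise features. The single-head NumPy version below is a minimal illustration under assumed shapes and weight names (`w_q`, `w_k`, `w_v`, `w_b` are all hypothetical), not the model's actual layer.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pair_biased_attention(seq, pair, w_q, w_k, w_v, w_b):
    """Single-head attention with an additive bias from pair features.

    seq:  (L, d_seq)      per-residue features
    pair: (L, L, d_pair)  pairwise features
    w_q, w_k, w_v: (d_seq, d_head) projection matrices
    w_b:  (d_pair,)       projects each pair feature to one scalar bias
    """
    q, k, v = seq @ w_q, seq @ w_k, seq @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1])  # (L, L) content term
    logits = logits + pair @ w_b             # (L, L) pair bias
    weights = softmax(logits, axis=-1)
    return weights @ v, weights

# Usage with random inputs.
rng = np.random.default_rng(0)
L, d_seq, d_pair, d_head = 6, 16, 4, 8
seq = rng.standard_normal((L, d_seq))
pair = rng.standard_normal((L, L, d_pair))
w_q, w_k, w_v = (rng.standard_normal((d_seq, d_head)) for _ in range(3))
w_b = rng.standard_normal(d_pair)
out, weights = pair_biased_attention(seq, pair, w_q, w_k, w_v, w_b)
```

In this sketch the pair tensor only contributes a bias to the attention logits, so the work per layer stays quadratic in the number of residues.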
4. Applications in Protein Design
All-Atom Generation: La-Proteina produces physically realistic, diverse proteins with sequences that fold into the predicted structures.
Motif Scaffolding: Performs indexed and unindexed scaffolding with either full-atom or tip-atom motifs, including multi-segment active sites.
Enzyme and Binder Design: Enables insertion and arrangement of functional motifs with high geometric accuracy and structural realism.
Controllable Editing: Latent perturbations affect only the associated residue, allowing for precise, interpretable manipulation of structure and sequence.
5. Performance and Validation
Co-designability: Up to 75 percent of samples pass the strict test where the generated sequence folds back into the generated structure with under 2 Å all-atom RMSD. The next best method achieves 36 percent.
Structural Validity: La-Proteina produces the best MolProbity scores across all tested lengths. Clash scores, bond geometry, rotamer distributions, and angle outliers all closely match experimental structures.
Motif Scaffolding: Outperforms all baselines across 26 benchmark tasks. Solves 21 to 25 tasks depending on setup, including unindexed and tip-atom conditions. The strongest baseline solves only 4.
Scalability: Successfully generates atomically detailed proteins up to 800 residues. Competing methods fail to scale beyond 500 residues due to memory or modelling limitations.
Speed: Can generate a 500-residue, co-designable protein in approximately 6 seconds. At shorter lengths and in batch mode, generation takes less than 1 second per sample.
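The co-designability criterion reported above (the generated sequence refolds to within 2 Å all-atom RMSD of the generated structure) reduces to a superposition followed by a threshold. The sketch below uses the Kabsch algorithm for the superposition. It assumes both coordinate sets list the same atoms in the same order, and that the refolded coordinates come from a structure predictor that is not shown here. The function names are illustrative.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c           # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def is_codesignable(generated, refolded, threshold=2.0):
    """True if the refolded structure is within `threshold` angstroms RMSD."""
    return kabsch_rmsd(generated, refolded) < threshold

# Usage: a rigidly moved copy of a structure should pass the check.
rng = np.random.default_rng(0)
generated = rng.standard_normal((50, 3)) * 10.0
c, s = np.cos(0.7), np.sin(0.7)
rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
refolded = generated @ rot_z.T + np.array([5.0, -3.0, 2.0])
print(is_codesignable(generated, refolded))  # True
```

A pass rate such as the 75 percent figure would then be the fraction of generated samples for which this check returns true.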
6. Limitations and Future Work
Current focus: The model is trained on monomeric proteins only. It is not yet capable of modelling multimeric complexes or protein-protein interactions.
Known challenges: Expanding to protein complexes, incorporating functional annotations, and modelling binding or enzymatic activity will require further innovation.
Roadmap: Extend to multi-chain generation, improve integration with conservation and function data, and support condition-specific or task-driven scaffolding.
Why It Matters
La-Proteina marks a shift in generative protein modelling. By combining explicit backbone geometry with continuous latent representations of side chains and sequences, it enables true joint design of full protein structures and sequences at atomic detail. This unlocks high-impact applications such as binder design, active-site engineering, and scaffold generation. It also lays a scalable foundation for future AI-driven tools in precision protein design.
Visit the project page: https://research.nvidia.com/labs/genair/la-proteina