The Protein Design Archive (PDA), a comprehensive database and analysis platform, has been launched to catalog and analyze de novo designed proteins with structural validation. As of 2025, the PDA contains over 1,500 structurally characterized designs, including unpublished historic examples, offering scientists a unified resource to explore four decades of progress in protein engineering. This open-access tool aims to streamline drug discovery by enabling researchers to learn from past successes, identify design trends, and develop novel therapeutic proteins or enzymes.
Key Features of the PDA:
Comprehensive Repository:
Curates protein structures from the RCSB PDB, filtered for synthetic origin and manually validated for quality.
Includes metadata such as sequence, 3D structure, publication source, and similarity scores to natural proteins.
Structured Analysis Tools:
Interactive timeline (1991–2024) visualizing the exponential growth of designs, categorized by dominant methodologies:
Minimal/rational design (1997–2010): 9.88 designs/year (R² = 97%).
Physics-based computational design (2010–2021): 78.61 designs/year (R² = 99%).
Deep learning-driven design (2021–present): 226 designs/year (R² = 99%).
Filters for novelty, release date, and sequence/structure similarity to natural proteins.
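The designs/year figures above are reported with R² values, which suggests each one is the slope of a straight-line fit within its era. As a minimal sketch of that kind of calculation, assuming an ordinary least-squares fit of cumulative design counts against year (the article does not specify the PDA's exact fitting procedure), the rate and R² could be computed as follows. The function name and the input numbers are illustrative only and are not PDA data.

```python
from statistics import mean


def linear_fit(years, cumulative_counts):
    """Ordinary least-squares fit of cumulative count against year.

    Returns (slope, intercept, r_squared); the slope is in designs per year.
    """
    x_bar, y_bar = mean(years), mean(cumulative_counts)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, cumulative_counts))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum(
        (y - (slope * x + intercept)) ** 2 for x, y in zip(years, cumulative_counts)
    )
    ss_tot = sum((y - y_bar) ** 2 for y in cumulative_counts)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up cumulative totals for illustration only; these are NOT PDA data.
years = [2010, 2011, 2012, 2013, 2014]
totals = [100, 180, 255, 340, 420]
slope, intercept, r2 = linear_fit(years, totals)
print(f"{slope:.1f} designs/year, R^2 = {r2:.3f}")
```

Fitting each era separately, as the timeline does, gives one slope per methodology, so the rates can be compared directly.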
Insights into Design Evolution:
Reveals biases in designed proteins (e.g., increased α-helical content vs. natural proteins) and trends toward greater structural complexity.
Quantifies progress using metrics like MMseqs2 sequence similarity scores and Foldseek structural similarity (LDDT scores).
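To illustrate how similarity scores of this kind can drive a novelty filter, here is a minimal sketch that keeps only designs falling below a sequence-similarity cutoff and a structural-similarity cutoff, optionally restricted by release date. The record fields, the 0-1 score scaling, and the cutoff values are all assumptions made for the example; they are not the PDA's schema, export format, or thresholds.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Design:
    # All field names are hypothetical; they are not the PDA's schema.
    design_id: str
    release_date: date
    seq_similarity: float     # best sequence match to a natural protein, scaled 0-1
    struct_similarity: float  # best structural match to a natural protein (LDDT-like), 0-1


def novel_designs(
    designs: Iterable[Design],
    max_seq: float = 0.3,      # arbitrary illustrative cutoff
    max_struct: float = 0.5,   # arbitrary illustrative cutoff
    released_after: Optional[date] = None,
) -> List[Design]:
    """Keep designs at or below both similarity cutoffs, optionally newer than a date."""
    kept = []
    for d in designs:
        if d.seq_similarity > max_seq or d.struct_similarity > max_struct:
            continue
        if released_after is not None and d.release_date <= released_after:
            continue
        kept.append(d)
    # Newest designs first
    return sorted(kept, key=lambda d: d.release_date, reverse=True)


# Made-up records for illustration only; these are NOT PDA entries.
examples = [
    Design("demo-a", date(2008, 5, 1), 0.62, 0.81),
    Design("demo-b", date(2016, 9, 12), 0.21, 0.47),
    Design("demo-c", date(2023, 3, 30), 0.08, 0.35),
    Design("demo-d", date(2022, 11, 2), 0.12, 0.66),
]
print([d.design_id for d in novel_designs(examples)])
```

Requiring both scores to be low is a deliberately strict choice: a design with an unfamiliar sequence but a familiar fold (like the hypothetical "demo-d") is excluded.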
Community-Driven Resource:
Regularly updated with new designs and analysis tools.
Open-source code for data processing and curation available on GitHub.
Applications in Drug Discovery:
Scaffold Identification: Rapidly browse validated protein structures for therapeutic or enzyme design.
Benchmarking: Compare new AI/ML design tools against historical datasets.
Functional Insights: Analyze sequence-structure relationships to guide engineering of binders, catalysts, or stable biologics.
Educational Use: Study the transition from rational design to AI-driven methods to inform future strategies.
Limitations and Future Directions:
Current Scope: Limited to structural data; excludes design protocols or biochemical validation (e.g., binding affinity).
Planned Expansions: Incorporate design workflows, sequence libraries, and experimental data to capture the full design lifecycle.
Improved Filtering: Address ambiguities in “de novo design” definitions with user-customizable filters.
Conclusion:
The PDA provides an unprecedented window into the past, present, and future of protein design. By unifying decades of structural data, it empowers researchers to accelerate the development of proteins for drug delivery, enzyme engineering, and beyond. As the field shifts toward AI-driven design, the PDA will serve as a critical benchmark for innovation.
Access the PDA:
Explore the database at https://pragmaticprotein-design.bio.ed.ac.uk/pda/. Data and curation code are freely available on GitHub.