FutureHouse has developed ether0, a 24B-parameter reasoning language model that solves complex chemistry problems by generating explicit, step-by-step reasoning traces before delivering answers. Trained via reinforcement learning on 640,730 experimentally grounded chemistry problems, ether0 outperforms frontier LLMs, specialized models, and human experts in molecular design while needing far fewer training examples than earlier specialized models.
Key Innovations and Capabilities:
1. Scientific Reasoning Beyond Math/Programming
Ether0 extends chain-of-thought reasoning, previously shown to be effective in mathematics and programming, to chemistry. It reasons "aloud" in natural language and then outputs its answer as a SMILES string describing a molecular structure, which makes its solutions transparent for tasks like retrosynthesis, solubility editing, and functional group design.
Outperforms Claude 3.7, GPT-4.5, and chemistry-specific models (e.g., ChemDFM) on open-answer tasks by 12–28% accuracy.
2. Data Efficiency
Achieves 70% accuracy in reaction prediction with only 46,000 training examples, surpassing the Molecular Transformer (64% accuracy) trained on 480,000 reactions.
Requires no domain-specific pretraining, reducing data needs by 3–5× compared to contemporary models.
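To put the reaction-prediction comparison in concrete terms, the short calculation below works out the ratio of training-set sizes and the accuracy gap using only the figures quoted above. The variable names are illustrative, and the calculation covers this one comparison only.

```python
# Figures quoted in the text for the reaction-prediction comparison.
ether0_examples = 46_000
ether0_accuracy = 0.70
molecular_transformer_examples = 480_000
molecular_transformer_accuracy = 0.64

# How many times fewer training reactions ether0 used.
data_ratio = molecular_transformer_examples / ether0_examples

# Absolute accuracy gap, expressed in percentage points.
accuracy_gap_points = (ether0_accuracy - molecular_transformer_accuracy) * 100

print(f"{data_ratio:.1f}x fewer training examples")
print(f"{accuracy_gap_points:.0f} percentage points higher accuracy")
```

On these numbers, ether0 used roughly a tenth of the training reactions while scoring about six percentage points higher.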
3. Training Pipeline
Multi-stage RL: Combines supervised fine-tuning on reasoning traces, task-specific reinforcement learning (Group Relative Policy Optimization), distillation into a generalist model, and safety alignment.
Advantage-based curriculum: Prioritizes challenging problems during training, boosting learning efficiency by 40%.
Safety-first: Refuses 80% of unsafe chemistry queries (e.g., designing explosives) without compromising core task performance.
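The core idea behind Group Relative Policy Optimization is to score each sampled answer relative to the other answers sampled for the same problem. The sketch below shows that normalization, together with one plausible reading of an advantage-based curriculum: keep the problems whose sampled answers disagree, since those are the only ones that yield a non-zero advantage. The function names, the 0/1 rewards, and the filtering rule are assumptions for illustration, not ether0's actual training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def has_learning_signal(rewards):
    """If every completion earns the same reward, all advantages are zero."""
    return max(rewards) != min(rewards)


# Hypothetical rewards: 1.0 if a sampled answer passed verification, else 0.0.
groups = {
    "problem_a": [1.0, 0.0, 0.0, 1.0],  # mixed outcomes: informative
    "problem_b": [0.0, 0.0, 0.0, 0.0],  # always wrong: zero advantage
    "problem_c": [1.0, 1.0, 1.0, 1.0],  # always right: zero advantage
}

for name, rewards in groups.items():
    print(name, has_learning_signal(rewards), group_relative_advantages(rewards))

# Assumed curriculum rule: prioritize problems that still carry signal.
prioritized = [name for name, rewards in groups.items() if has_learning_signal(rewards)]
print(prioritized)
```

Under this rule, problems that the model always solves or always fails drop out of the prioritized set, which concentrates training on problems at the edge of its ability.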
4. Applications in Drug Discovery
Solves 375 subtasks across 18 categories, including:
Molecular optimization: Modify solubility, blood-brain barrier permeability, or receptor binding.
Inverse problems: Design molecules meeting specific criteria (e.g., "Create a C16H12O7 compound from Plumbago spp.").
Synthesis planning: Predict reaction products or viable single-step retrosyntheses.
Excels in property-driven design, leveraging experimentally verified data from ChEMBL, COCONUT, and PubChem.
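Inverse problems like the C16H12O7 example lend themselves to automatic checking, because a proposed answer either satisfies the stated criteria or does not. The sketch below shows a minimal formula-matching check of that kind. It assumes the candidate's molecular formula has already been derived from its SMILES string by a cheminformatics toolkit, and it handles only flat formulas without parentheses or charges. The function names are hypothetical and are not taken from the released reward functions.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Parse a flat molecular formula such as 'C16H12O7' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def formula_reward(candidate_formula, target_formula):
    """Return 1.0 when the element counts match exactly, otherwise 0.0."""
    matches = parse_formula(candidate_formula) == parse_formula(target_formula)
    return 1.0 if matches else 0.0


target = "C16H12O7"
print(formula_reward("C16H12O7", target))  # same formula
print(formula_reward("O7C16H12", target))  # same atoms, different order
print(formula_reward("C15H10O7", target))  # different atom counts
```

A real reward for this task would need further checks, such as whether the SMILES string is valid and whether the molecule is plausible as a natural product, which this sketch does not attempt.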
5. Limitations and Future Work
Focused on organic chemistry (SMILES strings); struggles with inorganic tasks like crystal structure generation.
Lacks the multi-turn conversation capabilities of its base model (Mistral-Small-24B-Instruct).
Future versions will integrate tool usage (e.g., PubChem lookup) and expand to biochemical modalities.
Why It Matters
Ether0 shows that reasoning models can tackle scientific "inverse problems", where designing a solution (e.g., a drug molecule) is harder than evaluating one. By releasing model weights, benchmarks, and reward functions, FutureHouse aims to accelerate AI-driven discovery across chemistry, materials science, and biotechnology.