Disclosure first: I work on community at MiroMind.
One of our researchers just dropped the full MOOSE-Star collection on Hugging Face: 7B models post-trained for scientific hypothesis discovery, plus the dataset behind them. The paper is accepted at ICML 2026.
🤗 Collection: https://huggingface.co/collections/ZonglinY/moose-star-models-and-data
Inside:
- MS-IR-7B / MS-HC-7B / MS-7B: 7B models for inspiration retrieval (IR), hypothesis composition (HC), and both jointly. All are fine-tuned from DeepSeek-R1-Distill-Qwen-7B.
- TOMATO-Star: 108,717 NCBI papers decomposed into (background, hypothesis, inspirations), with every inspiration anchored to a real citation. Covers biology, chemistry, medicine, medical imaging, psychology, and cognitive science. ~38,400 A800 GPU-hours of preprocessing went into building it. (Loading sketch right after this list.)
- Strict temporal split for evaluation: train ≤ Sep 2025, test = Oct 2025 (after the base model's knowledge cutoff).
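If you want to poke at the data first, here's a minimal loading sketch. The repo id and the date field name are my guesses (check the collection page for the real ones); the (background, hypothesis, inspirations) fields are the decomposition described above.

```python
# Minimal sketch for loading TOMATO-Star with 🤗 datasets.
# Hypothetical: the repo id and the "date" column name; the
# background/hypothesis/inspirations fields follow the decomposition
# described in the post.
from datasets import load_dataset

ds = load_dataset("ZonglinY/TOMATO-Star", split="train")  # repo id is a guess

example = ds[0]
print(example["background"])    # research context the hypothesis grew out of
print(example["hypothesis"])    # the paper's stated hypothesis
print(example["inspirations"])  # citation-anchored inspiration papers

# Rough temporal split, assuming a "YYYY-MM" publication-date field:
train = ds.filter(lambda ex: ex["date"] <= "2025-09")
test = ds.filter(lambda ex: ex["date"] == "2025-10")
```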
Inspiration retrieval (IR) accuracy on the Oct 2025 test set:
| Model | IR accuracy |
|---|---|
| Random Selection | 6.70% |
| DeepSeek-R1-Distill-Qwen-7B (base) | 28.42% |
| Claude Sonnet 4.6 | 45.02% |
| DeepSeek-R1 | 45.11% |
| Gemini-3 Flash | 51.44% |
| GPT-5.4 | 51.50% |
| MS-7B (7B, joint IR + HC) | 54.34% |
| MS-IR-7B (7B, IR-only) | 54.37% |
| Gemini-3 Pro | 54.89% |
Locally: these are standard DeepSeek-R1-Distill-Qwen-7B fine-tunes, so anything that serves the base model serves them; llama.cpp, vLLM, and SGLang all work. Weights are ~14GB at fp16, so a single 24GB card is enough. Apache-2.0 code, CC-BY-4.0 data.
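For a quick smoke test, here's a vLLM sketch. The model repo id and the prompt shape are my assumptions, so check the GitHub repo for the exact prompt template:

```python
# Quick local-inference sketch with vLLM. Hypothetical: the repo id
# "ZonglinY/MS-IR-7B" and the plain-text prompt; the real template
# lives in the GitHub repo.
from vllm import LLM, SamplingParams

llm = LLM(model="ZonglinY/MS-IR-7B", dtype="float16")  # ~14GB of weights
params = SamplingParams(temperature=0.6, max_tokens=2048)

prompt = (
    "Background: <your research question / context>\n"
    "Candidates: <numbered list of candidate inspiration papers>\n"
    "Select the most promising inspirations and explain why."
)
print(llm.generate([prompt], params)[0].outputs[0].text)
```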
Stress-test it however you like! Questions and feedback welcome below!
📄 https://arxiv.org/abs/2603.03756
💻 https://github.com/ZonglinY/MOOSE-Star