Disclosure first: I work on community at MiroMind.
One of our researchers just dropped the full MOOSE-Star collection on Hugging Face: 7B models post-trained for scientific hypothesis discovery, plus the dataset behind them. The paper is accepted at ICML 2026.
🤗 Collection: https://huggingface.co/collections/ZonglinY/moose-star-models-and-data
Inside:
- MS-IR-7B / MS-HC-7B / MS-7B: 7B models for inspiration retrieval (IR), hypothesis composition (HC), and both jointly. All are fine-tuned from DeepSeek-R1-Distill-Qwen-7B.
- TOMATO-Star: 108,717 NCBI papers decomposed into (background, hypothesis, inspirations), with every inspiration anchored to a real citation. Covers biology, chemistry, medicine, medical imaging, psychology, and cognitive science. ~38,400 A800 GPU-hours of preprocessing went into building it. (Loading sketch right after this list.)
- Strict temporal split for evaluation: train ≤ Sep 2025, test = Oct 2025 (after the base model's knowledge cutoff).
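If you want to poke at the data first, here's a minimal loading sketch. The repo id and the date field name are my guesses (check the collection page for the real ones); the (background, hypothesis, inspirations) fields are the decomposition described above.

```python
# Minimal sketch for loading TOMATO-Star with 🤗 datasets.
# Hypothetical: the repo id and the "date" column name; the
# background/hypothesis/inspirations fields follow the decomposition
# described in the post.
from datasets import load_dataset

ds = load_dataset("ZonglinY/TOMATO-Star", split="train")  # repo id is a guess

example = ds[0]
print(example["background"])    # research context the hypothesis grew out of
print(example["hypothesis"])    # the paper's stated hypothesis
print(example["inspirations"])  # citation-anchored inspiration papers

# Rough temporal split, assuming a "YYYY-MM" publication-date field:
train = ds.filter(lambda ex: ex["date"] <= "2025-09")
test = ds.filter(lambda ex: ex["date"] == "2025-10")
```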
Inspiration retrieval (IR) accuracy on the Oct 2025 test set:
| Model | IR accuracy |
|---|---|
| Random Selection | 6.70% |
| DeepSeek-R1-Distill-Qwen-7B (base) | 28.42% |
| Claude Sonnet 4.6 | 45.02% |
| DeepSeek-R1 | 45.11% |
| Gemini-3 Flash | 51.44% |
| GPT-5.4 | 51.50% |
| MS-7B (7B, joint IR + HC) | 54.34% |
| MS-IR-7B (7B, IR-only) | 54.37% |
| Gemini-3 Pro | 54.89% |
Locally: these are standard DeepSeek-R1-Distill-Qwen-7B fine-tunes, so anything that serves the base model serves them; llama.cpp, vLLM, and SGLang all work. Weights are ~14GB at fp16, so a single 24GB card is enough. Apache-2.0 code, CC-BY-4.0 data.
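For a quick smoke test, here's a vLLM sketch. The model repo id and the prompt shape are my assumptions, so check the GitHub repo for the exact prompt template:

```python
# Quick local-inference sketch with vLLM. Hypothetical: the repo id
# "ZonglinY/MS-IR-7B" and the plain-text prompt; the real template
# lives in the GitHub repo.
from vllm import LLM, SamplingParams

llm = LLM(model="ZonglinY/MS-IR-7B", dtype="float16")  # ~14GB of weights
params = SamplingParams(temperature=0.6, max_tokens=2048)

prompt = (
    "Background: <your research question / context>\n"
    "Candidates: <numbered list of candidate inspiration papers>\n"
    "Select the most promising inspirations and explain why."
)
print(llm.generate([prompt], params)[0].outputs[0].text)
```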
Stress-test it however you like! Questions and feedback welcome below!
📄 https://arxiv.org/abs/2603.03756
💻 https://github.com/ZonglinY/MOOSE-Star