I ported Anthropic’s official skill-creator from Claude Code to OpenCode — now you can create and evaluate AI agent skills with any model

Hey r/LocalLLaMA — I open-sourced a tool that brings eval-driven development to AI agent skills. It's based on Anthropic's official skill-creator for Claude Code, but rewritten in TypeScript to work with OpenCode (which supports 300+ models including local ones).

The problem: creating skills for AI agents is trial and error. You write a skill, test it manually, and hope it triggers on the right prompts. There's no systematic way to measure whether a skill actually works.

What this does:

  • Guided skill creation with an intake interview
  • Auto-generates eval test sets (should-trigger and should-not-trigger queries)
  • Runs evals with and without the skill to measure trigger accuracy
  • Optimizes skill descriptions through an iterative LLM loop (60/40 train/test split, up to 5 iterations)
  • Visual HTML eval viewer for human review
  • Benchmarks with variance analysis across iterations
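To make the trigger-accuracy idea concrete, here's a minimal TypeScript sketch of how an eval like this can be scored: run each query from the should-trigger and should-not-trigger sets, record whether the skill fired, and compare against the expected label. The names (`EvalCase`, `triggerAccuracy`) and the toy keyword check are illustrative assumptions, not the tool's actual API.

```typescript
// Hypothetical eval-scoring sketch; names and types are assumptions,
// not the real opencode-skill-creator API.

interface EvalCase {
  query: string;
  shouldTrigger: boolean; // drawn from the generated should/should-not-trigger sets
}

// Stand-in for invoking the agent: in the real tool this would run the
// query through OpenCode and detect whether the skill was activated.
type TriggerCheck = (query: string) => boolean;

function triggerAccuracy(cases: EvalCase[], didTrigger: TriggerCheck): number {
  let correct = 0;
  for (const c of cases) {
    // A case is correct when observed triggering matches the expected label.
    if (didTrigger(c.query) === c.shouldTrigger) correct++;
  }
  return correct / cases.length;
}

// Toy example with a naive keyword-based trigger check:
const cases: EvalCase[] = [
  { query: "create a new skill for PDF parsing", shouldTrigger: true },
  { query: "what's the weather today?", shouldTrigger: false },
];
console.log(triggerAccuracy(cases, (q) => q.includes("skill"))); // → 1
```

Running the same scoring with and without the skill installed is what lets you attribute the accuracy difference to the skill's description rather than the base model.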

The most interesting part for this community: it works with any of OpenCode's supported models. If you're running local models through OpenCode, you can use this tool with them.

One-command install:

npx opencode-skill-creator install --global 

Apache 2.0 license. Based on Anthropic's skill-creator with attribution.

GitHub: https://github.com/antongulin/opencode-skill-creator

npm: https://www.npmjs.com/package/opencode-skill-creator

Happy to answer questions about the eval methodology, local model support, or architecture.

submitted by /u/antonusaca