Authors: Ranit Karmakar, Jayita Chatterjee

What Single-Prompt Accuracy Misses: A Multi-Variant Reliability Audit of Language Models

Ranit Karmakar, Jayita Chatterjee / May 5, 2026

arXiv:2605.02038v1 Announce Type: new
Abstract: Single-prompt accuracy is the dominant way to benchmark language models, but it can miss reliability failures that matter. We evaluate a 15-model open-weight corpus, with the main reliability analyses fo…

cs.AI, cs.CL

AgentFloor: How Far Up the Tool Use Ladder Can Small Open-Weight Models Go?

Ranit Karmakar, Jayita Chatterjee / May 4, 2026

arXiv:2605.00334v1 Announce Type: new
Abstract: Production agentic systems make many model calls per user request, and most of those calls are short, structured, and routine. This raises a practical routing question that existing evaluations do not di…