Ask HN: How are you evaluating AI apps and CLIs?

I'm sure many of you work for companies where various AI tools are being made available and IT departments are asking for feedback on them. In some cases IT is allocating what is effectively an unlimited budget, in the hope that one of these tools eventually emerges as the winner and sticks...

For example, the models from Anthropic, OpenAI, Google, etc. can be accessed via:

- IDE integrations, e.g. VS Code, JetBrains
- Dedicated apps and CLIs, e.g. Codex, Claude, Copilot CLI

It's already hard enough for SWE orgs to quantify the strengths and weaknesses of the models themselves; now we also have their integrations and entry points to test, and I'm not sure how we can even begin to evaluate these tools systematically...
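One way to make this concrete is to treat every entry point as a black box: hand each tool the same small, self-contained coding task in a fresh checkout, then score every run with the same oracle (does the test suite go green, how long did it take). A rough Python sketch of that idea follows; note the tool command strings are placeholders, not anyone's verified flags, so you'd swap in whatever non-interactive invocation each tool actually supports:

    import shutil
    import subprocess
    import tempfile
    import time
    from pathlib import Path

    # Hypothetical commands -- replace with each tool's real non-interactive
    # invocation. These strings are illustrative placeholders, not verified flags.
    TOOLS = {
        "tool-a": ["tool-a-cli", "--prompt-file", "task.md"],
        "tool-b": ["tool-b-cli", "run", "task.md"],
    }

    FIXTURE = Path("fixture-repo")  # small repo with a failing test the task should fix
    TASK = Path("task.md")          # identical task description handed to every tool

    def run_trial(name: str, cmd: list[str]) -> dict:
        """Run one tool against a fresh copy of the fixture repo, then score it."""
        repo = Path(tempfile.mkdtemp(prefix=f"eval-{name}-")) / "repo"
        shutil.copytree(FIXTURE, repo)
        shutil.copy(TASK, repo / "task.md")

        start = time.monotonic()
        try:
            proc = subprocess.run(cmd, cwd=repo, capture_output=True,
                                  text=True, timeout=600)
            exit_code = proc.returncode
        except subprocess.TimeoutExpired:
            exit_code = None  # tool exceeded the 10-minute budget

        # Same pass/fail oracle for every tool: does the test suite go green?
        tests = subprocess.run(["pytest", "-q"], cwd=repo,
                               capture_output=True, text=True)
        return {"tool": name, "exit": exit_code,
                "tests_passed": tests.returncode == 0,
                "seconds": round(time.monotonic() - start, 1)}

    if __name__ == "__main__":
        for name, cmd in TOOLS.items():
            print(run_trial(name, cmd))

Same fixture, same oracle, same timeout for every tool, so the comparison at least controls for the task. What this doesn't solve is the hard part: choosing tasks that are actually representative of your org's day-to-day work.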

How are you approaching this? What's worked for you and what's not?


Comments URL: https://news.ycombinator.com/item?id=47893745

