Noodler

You can't improve what you can't see.

Teams are spending more than ever on tokens, infrastructure, and experimentation – but progress is often slow, opaque, and frustrating. Without clear feedback loops, it's hard to know what's working, what isn't, or how to improve reliably.

A recent MIT study found that 95% of organizations report little to no measurable ROI from their GenAI initiatives.

The problem isn't ambition. It's visibility.

AI applications behave differently from traditional software. LLM outputs are probabilistic, context-dependent, and difficult to evaluate with conventional metrics.

We've built and shipped AI systems ourselves – and we've felt how easy it is to optimize the wrong thing, or rely on intuition instead of evidence.

Noodler is an open-source suite of evaluation tools for AI applications.

Our first product tackles observability: you can't improve what you can't see.

See what we're building here.