Exclusive: Grafana Labs' Jen Villa on targeting the AI observability gap
Grafana Labs has introduced a suite of tools to address what it describes as a growing gap in the monitoring and management of artificial intelligence in production environments, as organisations move beyond experimentation and into operational deployment.
Announced today at GrafanaCON 2026 in Barcelona, the updates centre on improving visibility into AI-driven systems, expanding automation capabilities, and integrating observability more directly into developer workflows.
AI Observability in Grafana Cloud
At the core of the release is AI Observability, now available in public preview. The offering is designed to provide real-time monitoring and evaluation of large language model (LLM)-based applications and autonomous agents.
The new tooling aims to surface issues such as inconsistent outputs and degraded performance earlier by treating AI interactions, including prompts, responses, and execution paths, as observable signals alongside metrics, logs and traces.
This includes continuous evaluation of outputs, with alerts for anomalies such as policy violations or low-quality responses, as well as mechanisms to identify potential risks, including unintended data exposure or irregular usage patterns.
"Think of AI Observability as understand[ing] the performance of your AI application," said Jen Villa, Senior Director of Product at Grafana Labs. "What changes with AI applications is there's this additional dimension that you want to be aware of, and that, in the case of AI, is somewhat related to it being non-deterministic. In addition to, 'Does my application have enough compute resources, and is it responding quickly enough?' you also want to be aware of the quality of those responses."
o11y-bench
Grafana Labs has also announced the launch of o11y-bench, an open-source benchmarking framework for evaluating the performance of AI agents in observability contexts.
Built to run against real Grafana environments, the benchmark measures how agents perform on practical tasks such as querying telemetry data, investigating incidents, and modifying dashboards. Rather than focusing solely on generated outputs, it assesses actions taken within the system, reflecting the operational nature of observability work.
"We are building an agent to help with the observability use cases to help root cause problems," said Villa. "[Internally], if we make a change to the Grafana Assistant, we can do a before and after, and we can say, 'Hey, are we doing better on this consistent set of tests?' So it's just another signal for us."
Grafana Assistant
Grafana has also expanded Grafana Assistant, its AI-powered observability agent, beyond its original cloud-only deployment model. The assistant will now be available in additional environments, including on-premises Grafana Enterprise installations, to address the requirements of organisations with stricter data governance or residency constraints.
The expansion to open-source Grafana users also introduces new capabilities designed to embed the assistant more deeply into operational workflows. These include a dedicated workspace for simultaneous chat and visualisation, an API for integration into external systems, and automation features that allow routine tasks to be scheduled and executed without manual intervention.
"You have this extremely patient buddy to help you with onboarding that can also do a lot of the certain acts around, like creating dashboards that you used to have to do yourself. So, combine that with the ability to come up with a lot of really high-quality hypotheses for why something is going wrong in your software system," said Villa. "We're taking that value that, right now, only cloud users have been able to get, and bringing it to our open source community."
Additional integrations extend the assistant into collaboration platforms such as Microsoft Teams, while support for more than 50 external tools and multiple native data sources aims to reduce fragmentation across observability stacks.
Grafana Cloud CLI (GCX)
As developers increasingly rely on tools such as AI coding assistants and agent-based workflows, Grafana has positioned GCX to bring observability data into those environments. It allows engineers to query telemetry, configure systems, and invoke assistant capabilities directly from their development interface.
Villa said this reduces the need to switch between dashboards and alerting systems, enabling a more continuous feedback loop between development and production. The tool also supports workflows in which AI agents can correlate live system data with recent code changes and suggest remediation steps.
"Many developers say they're spending the majority of their time in ... a developer environment. So this lets [you] experience an interface with Grafana Assistant, right in the same place that you're looking at your code," said Villa.