UK listeners spot AI voices faster and trust them less
Vocal Image has published a large-scale study of public reactions to synthetic speech. It found that UK listeners are more likely than Americans to detect AI-generated voices, and that recognition tends to reduce approval.
The month-long evaluation drew on feedback and listening behaviour from more than 10,000 participants. It compared 20 text-to-speech models from a range of suppliers, including ElevenLabs, Cartesia, WellSaid, Google and Amazon.
Alongside overall rankings, the study assessed models across 18 perceptual attributes, including "warm", "monotonous" and "mumbled". Participants could like, dislike, skip and evaluate voices while listening. Vocal Image said users were not told they were hearing synthetic speech.
Trust effect
A central result was what Vocal Image described as a "passive rejection" pattern: approval rates dropped as soon as listeners identified a voice as machine-generated. The study reported a -0.802 correlation between AI detection and user approval.
It also reported a wide gap in perceived quality between providers, with a threefold difference between the highest- and lowest-performing models in user perception and engagement.
The findings come as synthetic voice tools spread across customer service, media production and social platforms. Companies are also experimenting with natural-sounding speech for IVR systems and virtual agents, amid increased consumer awareness of AI-generated content.
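For readers unfamiliar with the statistic, a correlation of -0.802 indicates a strong inverse relationship: the more often a voice was detected as AI, the lower its approval tended to be. Vocal Image has not said which correlation measure it used or whether it was computed per model or per listener, so the sketch below simply assumes a Pearson coefficient over per-model rates. The numbers are invented for illustration and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-model figures (NOT from the study):
# share of listeners who flagged the voice as AI, and share who liked it.
detection_rate = [0.20, 0.35, 0.50, 0.65, 0.80]
approval_rate = [0.70, 0.66, 0.50, 0.45, 0.30]

r = pearson(detection_rate, approval_rate)
print(f"correlation: {r:.3f}")
```

A value close to -1 means approval falls almost in lockstep as detection rises; a value near 0 would mean detection says little about approval.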
UK ears
Native English speakers from the UK were most likely to recognise synthetic speech. UK listeners were 13% more likely than US listeners to spot an AI voice.
The research also suggested regional differences in acceptance. EU listeners were most likely to say they liked AI voices overall, according to results shared by Vocal Image.
The study pointed to a framing effect: people tended to respond positively to voices until they recognised the audio as AI-generated, after which trust fell.
Provider rankings
In the overall leaderboard, all top 10 positions went to newer AI platforms and specialist text-to-speech suppliers, while large cloud and technology groups trailed. Vocal Image highlighted MiniMax and Deepgram among emerging platforms, and WellSaid Labs and LovoAI among specialist providers.
MiniMax, a Chinese supplier, ranked number one with both UK and US listeners.
The research also found exceptions where detection did not automatically lead to dislike. According to Vocal Image, ElevenLabs and Descript "managed to break the pattern", producing outputs that listeners continued to like despite identifying them as robotic.
The company said the results did not mirror which providers attract the most attention in the AI market, claiming that "every second user" in its dataset dislikes OpenAI's voice. It also said UK users prefer Chinese AI voices over US and EU alternatives.
Use-case fit
Vocal Image positioned the work as a guide to selecting voice models for specific applications, noting that a voice that works well for an audiobook may not suit short-form video formats such as TikTok.
Rather than relying on simple A/B preference tests, it said its approach combined direct feedback with observed listening behaviour, including skips. The study used a ranking system derived from its Voice Arena tool, where app users evaluate voices in exercises focused on vocal confidence and related soft skills.
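Vocal Image has not published its scoring formula, so the following is only a rough sketch of how explicit feedback and skips might be folded into a single ranking. The `VoiceStats` type, the `engagement_score` function, the `skip_weight` of 0.5 and the sample counts are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class VoiceStats:
    """Hypothetical per-voice tallies of listener actions."""
    name: str
    likes: int
    dislikes: int
    skips: int


def engagement_score(stats, skip_weight=0.5):
    """Net approval per interaction, treating a skip as a partial dislike.

    skip_weight is an assumed parameter, not a value from the study.
    """
    total = stats.likes + stats.dislikes + stats.skips
    if total == 0:
        return 0.0
    penalty = stats.dislikes + skip_weight * stats.skips
    return (stats.likes - penalty) / total


def rank(voices, skip_weight=0.5):
    """Return voices ordered from highest to lowest score."""
    return sorted(
        voices,
        key=lambda v: engagement_score(v, skip_weight),
        reverse=True,
    )


# Invented sample data, not taken from the study.
voices = [
    VoiceStats("voice_b", likes=40, dislikes=30, skips=30),
    VoiceStats("voice_a", likes=80, dislikes=10, skips=10),
]

for voice in rank(voices):
    print(voice.name, round(engagement_score(voice), 2))
```

Counting skips as a weaker negative signal than an explicit dislike is one way to capture the passive rejection the study describes, which a plain A/B preference test would not record.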
Vocal Image operates a soft-skills learning platform and said it uses a proprietary dataset of more than two million unique human voices. It also said it is operating at a USD 16 million annual recurring revenue scale and recently raised a USD 3.6 million seed round led by Educapital.
Nick Lahoika, chief executive officer and founder, said the issue poses a commercial risk for organisations deploying synthetic voices at scale.
"While switching to a specialized TTS takes resources, choosing the wrong provider is becoming a critical brand liability - especially for products built on trust. The reality is simple: people still don't trust bad AI voices," said Lahoika.
Vocal Image said enterprises often stick with large cloud providers for operational reasons, but expects more scrutiny of voice choices as synthetic speech becomes more common in customer-facing products.