Inworld AI Elevates Real-Time Voice Agents with Groundbreaking TTS-1.5 Release

Inworld AI has launched TTS-1.5, a significant upgrade to its text-to-speech (TTS) technology designed specifically for real-time voice agents. This release in the TTS-1 series targets three demanding requirements: low latency, high audio fidelity, and cost-effectiveness. The system is recognized as a top-tier text-to-speech solution by Artificial Analysis, and it offers greater expressiveness and stability than previous versions, making it suitable for large-scale consumer applications.

Optimized for Real-Time Responsiveness

A primary focus of TTS-1.5 is P90 time-to-first-audio latency, the delay within which 90 percent of requests begin returning audio and a critical indicator of user-perceived speed. The TTS-1.5 Max model achieves a P90 latency under 250 milliseconds, while TTS-1.5 Mini goes further, dropping below 130 milliseconds. These figures represent roughly a fourfold speed improvement over Inworld's previous TTS generation.
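To make the P90 metric concrete, here is a minimal sketch of how a 90th-percentile latency could be computed from a set of measurements using the nearest-rank method. The function name and the sample values are illustrative assumptions, not Inworld data or Inworld's measurement procedure.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical time-to-first-audio measurements in milliseconds (made up for illustration).
ttfa_ms = [112, 118, 121, 125, 127, 131, 140, 155, 190, 420]

p90 = percentile_nearest_rank(ttfa_ms, 90)
p50 = percentile_nearest_rank(ttfa_ms, 50)
print(f"P50 = {p50} ms, P90 = {p90} ms")
```

A P90 figure is stricter than a median because it reflects the slower requests that users still encounter regularly, while ignoring the rarest outliers.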

The architecture supports streaming via WebSocket, so playback can begin as soon as the first audio chunk is generated rather than after the full utterance is synthesized. This keeps overall interaction latency in line with typical real-time language model responses, which is crucial for integrated agent pipelines. Inworld recommends TTS-1.5 Max for most uses, as it balances latency of around 200 ms with superior stability and audio fidelity. TTS-1.5 Mini is tailored for extremely latency-sensitive scenarios, such as interactive gaming or ultra-responsive conversational AI, where every millisecond counts.

Enhanced Expression and System Stability

Compared with its predecessor, TTS-1.5 delivers approximately 30 percent more expressive range and around 40 percent better stability. Expressiveness covers prosody, emphasis, and emotional nuance, allowing for more natural and engaging conversations. Stability metrics, including word error rate and consistency of output across varied and long prompts, have also improved substantially. A lower word error rate reduces common issues such as truncated sentences, unintended word substitutions, and audio artifacts, which is particularly valuable when the text being spoken comes directly from a language model.
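For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a reference text and a hypothesis, divided by the number of reference words. The sketch below assumes the synthesized audio has already been transcribed back to text by some speech recognizer (not shown). The source does not describe how Inworld measures it, so the function and example strings are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# A truncated sentence counts as one deleted word out of four.
truncated = word_error_rate("the quick brown fox", "the quick brown")
# A substituted word also counts as one error out of four.
substituted = word_error_rate("the quick brown fox", "the quack brown fox")
```

Both failure modes named above, truncation and substitution, contribute to the same score, which is why a single figure can summarize stability.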

Cost-Effective for Mass Deployment

TTS-1.5 is priced for consumer-scale applications and comes in two configurations. TTS-1.5 Mini costs $5 per one million characters, roughly $0.005 per minute of spoken audio. TTS-1.5 Max costs $10 per one million characters, or approximately $0.01 per minute. At these rates, text-to-speech remains economically viable for high-usage products such as AI companions, educational platforms, and customer support lines.
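The per-minute figures follow from the per-character prices if speech runs at about 1,000 characters per minute, which is the rate the two sets of numbers imply. A small sketch of the arithmetic, where the dictionary keys, function names, and the 1,000 characters-per-minute default are assumptions made for illustration:

```python
# Prices quoted in the article, in US dollars per one million characters.
PRICE_PER_MILLION_CHARS_USD = {"mini": 5.0, "max": 10.0}

# Assumed speaking rate implied by the quoted per-minute figures.
CHARS_PER_MINUTE = 1000


def cost_usd(model, characters):
    """Cost of synthesizing the given number of characters."""
    return PRICE_PER_MILLION_CHARS_USD[model] * characters / 1_000_000


def cost_per_minute_usd(model, chars_per_minute=CHARS_PER_MINUTE):
    """Approximate cost of one minute of speech at the assumed speaking rate."""
    return cost_usd(model, chars_per_minute)


for model in ("mini", "max"):
    print(f"{model}: ${cost_per_minute_usd(model):.3f} per minute")
```

Actual per-minute cost will vary with language and speaking rate, since billing is by character rather than by audio duration.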

Extensive Language Support and Voice Cloning

TTS-1.5 provides robust multilingual capabilities, supporting 15 languages. This comprehensive list includes English, Spanish, French, Korean, Dutch, Chinese, German, Italian, Japanese, Polish, Portuguese, Russian, Hindi, Arabic, and Hebrew, enabling a single TTS pipeline to serve diverse global markets.

The system also features both instant and professional voice cloning. Instant voice cloning can generate a custom voice from merely 15 seconds of audio, accessible directly through Inworld’s portal and API. For branded voices or less common accents, professional voice cloning requires a minimum of 30 minutes of clean audio, with 20 minutes or more recommended for optimal results.

Flexible Deployment and Integration

TTS-1.5 is available both as a cloud API and as an on-premise solution. The on-premise option runs the full model within a customer's own infrastructure, addressing data sovereignty and compliance requirements. Both deployment options offer the same quality profile. The models also integrate with partner platforms such as LiveKit, Pipecat, and Vapi, supporting end-to-end voice agent stacks.
