Kyutai Unveils Hibiki-Zero: A Breakthrough in Real-Time AI Speech Translation Without Aligned Data
Saturday, February 14, 2026 | 4 min read

Kyutai has released Hibiki-Zero, an AI system for real-time simultaneous speech-to-speech translation (S2ST) and speech-to-text translation (S2TT). The model handles non-monotonic word dependencies, where word order differs between the source and target languages, and it is trained without word-level aligned data.

Supervised training for simultaneous translation has traditionally depended on precise word-level alignments. Collecting such alignments at scale is slow and resource-intensive, and it often requires synthetic data or language-specific heuristics. Hibiki-Zero avoids this requirement by using a reinforcement learning (RL) strategy designed to optimize translation latency, which simplifies adding support for more languages.

A Multistream Architectural Approach

Hibiki-Zero is a decoder-only model with a multistream architecture that models several token sequences jointly. The design uses three streams:

  • Source Stream: Captures audio tokens from the incoming speech.
  • Target Stream: Generates audio tokens corresponding to the translated output.
  • Inner Monologue: Comprises padded text tokens that align with the target audio.
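
As a rough illustration of the three streams listed above, the sketch below lines them up timestep by timestep. It is a simplification under stated assumptions, not Kyutai's implementation: the `Frame` class, the `build_frames` helper, and the `PAD` symbol are hypothetical names, and each stream is reduced to one token per timestep even though the real audio streams carry several codebook levels.

```python
from dataclasses import dataclass
from typing import List

PAD = "<pad>"  # hypothetical padding symbol for the text stream


@dataclass
class Frame:
    source_audio: int  # audio token from the incoming speech
    target_audio: int  # audio token for the translated speech
    inner_text: str    # text token aligned with the target audio, or PAD


def build_frames(source: List[int], target: List[int], text: List[str]) -> List[Frame]:
    """Zip three equal-length streams into per-timestep frames."""
    if not (len(source) == len(target) == len(text)):
        raise ValueError("all streams must cover the same number of timesteps")
    return [Frame(s, t, w) for s, t, w in zip(source, target, text)]


# Toy example: the target stays silent (token 0 here) for two steps,
# then speaks while the text stream carries the matching word.
frames = build_frames(
    source=[11, 12, 13, 14],
    target=[0, 0, 21, 22],
    text=[PAD, PAD, "hello", PAD],
)
```

Most positions in the text stream hold padding because a word is spoken over several audio frames, which is why the source describes the inner monologue as padded text tokens.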

The system utilizes the Mimi neural audio codec, a causal and streaming solution that converts waveforms into discrete tokens at a rate of 12.5 Hz. These audio streams are then modeled by an integrated RQ-Transformer.

Key architectural specifications for Hibiki-Zero include:

  • A total of 3 billion parameters.
  • A Temporal Transformer with 28 layers and a 2048-dimensional latent space.
  • A Depth Transformer featuring 6 layers per codebook and a 1024-dimensional latent space.
  • A context window spanning 4 minutes.
  • 16 levels of audio codebooks for superior speech quality.
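
The figures above imply a simple token budget. The arithmetic below is a back-of-the-envelope sketch; it assumes that every audio frame in a stream carries all 16 codebook levels, which the article does not state explicitly, and the constant names are illustrative.

```python
FRAME_RATE_HZ = 12.5        # Mimi token rate quoted in the article
CONTEXT_SECONDS = 4 * 60    # 4-minute context window
NUM_CODEBOOKS = 16          # audio codebook levels (assumed per frame, per stream)

# Duration covered by a single audio frame.
frame_duration_ms = 1000 / FRAME_RATE_HZ

# Number of timesteps that fit in the context window.
frames_in_context = int(FRAME_RATE_HZ * CONTEXT_SECONDS)

# Audio tokens for one stream over the full window, under the assumption above.
audio_tokens_per_stream = frames_in_context * NUM_CODEBOOKS

print(frame_duration_ms, frames_in_context, audio_tokens_per_stream)
```

Under these assumptions, one frame covers 80 ms, the window holds 3,000 timesteps, and a single audio stream spans 48,000 codebook tokens.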

Training Without Granular Interpretation Data

The training regimen for Hibiki-Zero unfolds in two distinct phases:

  • Coarse Alignment Training: The model initially trains using sentence-level aligned data, ensuring that target sentences accurately translate their corresponding source sentences. Researchers implemented a technique involving artificial silence insertion in the target speech to strategically delay its content relative to the source.
  • Reinforcement Learning (RL): The system then employs Group Relative Policy Optimization (GRPO) to refine its translation policy. This stage is crucial for minimizing translation latency while simultaneously maintaining high translation quality.
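
The silence-insertion idea in the first phase can be sketched as a scheduling problem. The rule used here is an assumption made for illustration only: each target sentence is placed at least `min_lag_s` seconds after its source sentence begins and never overlaps the previous target sentence. The article does not describe the actual rule, and `delay_target` is a hypothetical helper.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s)


def delay_target(
    source: List[Segment],
    target_durations: List[float],
    min_lag_s: float = 2.0,
) -> List[Segment]:
    """Place target sentences on a timeline using sentence-level alignment only.

    Gaps between consecutive placed segments correspond to inserted silence.
    """
    placed: List[Segment] = []
    cursor = 0.0  # end time of the previously placed target sentence
    for (src_start, _src_end), duration in zip(source, target_durations):
        start = max(cursor, src_start + min_lag_s)
        placed.append((start, start + duration))
        cursor = start + duration
    return placed


source = [(0.0, 3.0), (5.0, 8.0)]
placed = delay_target(source, target_durations=[2.5, 3.0], min_lag_s=2.0)
print(placed)  # [(2.0, 4.5), (7.0, 10.0)]
```

In this toy case, 2.0 seconds of silence precede the first target sentence and 2.5 seconds separate the two, so the target never runs ahead of the source content it translates.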

The RL process is driven by reward signals based solely on the BLEU score, calculating intermediate rewards at various points during translation. A configurable hyperparameter, α, allows for balancing the inherent trade-off between translation speed and accuracy; a lower α setting prioritizes reduced latency, potentially with a slight impact on quality.
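
For readers unfamiliar with GRPO, its core step is to score each sampled output relative to the other samples in its group. The sketch below shows the commonly used standardization (reward minus group mean, divided by group standard deviation). The reward values are made up, `group_relative_advantages` is an illustrative name, and the way Hibiki-Zero combines BLEU-based intermediate rewards with α is not reproduced here because the article does not give the formula.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of samples for the same input."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when all rewards in the group are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four translations sampled from the same source audio.
rewards = [0.20, 0.35, 0.35, 0.50]
advantages = group_relative_advantages(rewards)
```

Samples that beat the group average receive positive advantages and are reinforced, while below-average samples are discouraged, so no separate value network is needed to provide a baseline.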

Demonstrated Scalability and Performance

Kyutai's researchers also showed that Hibiki-Zero can be adapted to new languages. The model was extended to support Italian as an input language with fewer than 1,000 hours of speech data, using supervised fine-tuning followed by the GRPO optimization stage.

In comparative benchmarks, Hibiki-Zero achieved a quality and latency profile comparable to Meta’s Seamless model. Notably, it surpassed Seamless in speaker similarity by over 30 points.

Hibiki-Zero has established state-of-the-art results across five X-to-English translation tasks, including evaluations on the Audio-NTREX-4L long-form benchmark, which incorporates 15 hours of speech per Text-to-Speech system.

On French language tasks, Hibiki-Zero demonstrated superior performance:

  • ASR-BLEU (Higher is better): 28.7 (Hibiki-Zero) vs. 23.9 (Seamless)
  • Speaker Similarity (Higher is better): 61.3 (Hibiki-Zero) vs. 44.4 (Seamless)
  • Average Lag (LAAL) (Lower is better): 2.3 (Hibiki-Zero) vs. 6.2 (Seamless)

For short-form tasks, specifically Europarl-ST, the model recorded an ASR-BLEU score of 34.6 with an average lag of 2.8 seconds. Human evaluators consistently rated Hibiki-Zero significantly higher than baseline systems for both speech naturalness and voice transfer fidelity.

With Hibiki-Zero, Kyutai removes the need for word-level aligned data in simultaneous translation while reporting strong results on quality, latency, and adaptation to new input languages.

This article is a rewritten summary based on publicly available reporting. For the original story, visit the source.

Source: MarkTechPost