
New open-source voice model listens nonstop and decides every 0.4 seconds whether to speak or stay silent
A new open-source voice model processes audio continuously and decides every 0.4 seconds whether to speak or stay silent, rather than waiting for a recording to end as GPT-4o and Qwen3.5-Omni do. It handles transcription, translation, chat, and ambient sound detection in a single inference stream. Full weights, code, and training data are available under Apache 2.0, which lowers the barrier for researchers and developers building voice-first applications and could accelerate the shift toward always-on conversational AI systems.
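
To make the 0.4-second cadence concrete, here is a minimal Python sketch of a fixed-interval decision loop. It is not the model's actual code: the 16 kHz sample rate, the function names, and the energy-based `toy_policy` are all assumptions made for illustration, with the toy policy standing in for the real model's learned speak/silence decision.

```python
from typing import Callable, Iterator, List, Sequence

SAMPLE_RATE_HZ = 16_000     # assumption: the source does not state a sample rate
DECISION_INTERVAL_S = 0.4   # from the source: one decision every 0.4 seconds
CHUNK_SAMPLES = round(SAMPLE_RATE_HZ * DECISION_INTERVAL_S)  # 6400 samples per decision


def chunk_stream(samples: Sequence[float],
                 chunk_size: int = CHUNK_SAMPLES) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-size chunks; a trailing partial chunk is dropped."""
    for start in range(0, len(samples) - chunk_size + 1, chunk_size):
        yield samples[start:start + chunk_size]


def toy_policy(chunk: Sequence[float], threshold: float = 0.01) -> bool:
    """Hypothetical stand-in for the model: speak only when the chunk is quiet."""
    mean_energy = sum(x * x for x in chunk) / len(chunk)
    return mean_energy < threshold


def run_decision_loop(samples: Sequence[float],
                      decide: Callable[[Sequence[float]], bool]) -> List[str]:
    """Make one speak/silent decision per 0.4-second chunk of audio."""
    return ["speak" if decide(chunk) else "silent"
            for chunk in chunk_stream(samples)]


if __name__ == "__main__":
    # 0.4 s of loud audio, 0.4 s of silence, then an incomplete chunk.
    audio = [0.5] * 6400 + [0.0] * 6400 + [0.5] * 3000
    print(run_decision_loop(audio, toy_policy))  # ['silent', 'speak']
```

The point of the sketch is only the control flow: a decision is made at every interval, with no need to wait for the end of a recording.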