Should we still use Text for Speech-to-Speech Translation? Promise meets Practice



In this talk, I describe the current benefits and limitations of techniques for direct speech-to-speech translation (S2ST). I discuss the work I did at Roblox on speaker-preserving cascaded speech-to-text translation systems and the adaptations made to such systems to allow for simultaneous inference. I finish by outlining methods for preserving prosody through text and discussing the steps needed to develop robust direct S2ST systems.


This talk was given on May 5th, 2023 at the Human Language Technology Center of Excellence (HLTCOE) at Johns Hopkins University in Baltimore, Maryland, as part of their bi-weekly work-in-progress talk seminar.

Paper, Code, Poster