Speaker diarization often complements automatic speech recognition (ASR) by answering the question "Who spoke when?" One notable advancement in the field is the adoption of self-supervised learning (SSL). By pre-training on vast amounts of unlabelled audio, SSL yields a single pre-trained model that can improve multiple downstream tasks, including ASR and diarization. As we explore in this blog post, combining SSL with traditional methods not only boosts ASR accuracy but also improves speaker diarization results.
Improving Speaker Diarization with Self-supervised Learning