A Privacy-Preserving Unsupervised Speaker Disentanglement Method for Depression Detection from Speech.

Title: A Privacy-Preserving Unsupervised Speaker Disentanglement Method for Depression Detection from Speech.
Publication Type: Journal Article
Year of Publication: 2024
Authors: Ravi V, Wang J, Flint J, Alwan A
Journal: CEUR Workshop Proc
Volume: 3649
Pagination: 57-63
Date Published: 2024 Feb
ISSN: 1613-0073
Abstract

The proposed method focuses on speaker disentanglement in the context of depression detection from speech signals. Previous approaches require patient/speaker labels, encounter instability due to loss maximization, and introduce unnecessary parameters for adversarial domain prediction. In contrast, the proposed unsupervised approach reduces the cosine similarity between the latent space of the depression model and that of a pre-trained speaker classification model. This method outperforms baseline models and matches or exceeds adversarial methods in performance, without relying on speaker labels or introducing additional model parameters, thereby reducing model complexity. The higher the speaker de-identification score (DeID), the better the depression detection system is at masking a patient's identity, thereby enhancing the privacy attributes of depression detection systems. On the DAIC-WOZ dataset with ComParE16 features and an LSTM-only model, our method achieves an F1-Score of 0.776 and a DeID score of 92.87%, outperforming its adversarial counterpart, which attains an F1-Score of 0.762 and a DeID score of 68.37%, respectively. Furthermore, we demonstrate that speaker-disentanglement methods are complementary to text-based approaches, and a score-level fusion with a Word2vec-based depression detection model further enhances the overall performance to an F1-Score of 0.830.

Alternate Journal: CEUR Workshop Proc
PubMed ID: 38650610
PubMed Central ID: PMC11034881
Grant List: R01 MH122569 / MH / NIMH NIH HHS / United States