Presentation of our humming sound recognition at INTERSPEECH 2023

A photo of me in front of a poster about our research at Interspeech 2023.

I presented our deep-learning-based humming sound recognition at the prestigious INTERSPEECH conference in Dublin.

At Semanux, we make a big promise: to develop highly innovative technologies that improve access to the digital world for everyone. Last week, I was able to show that we live up to this claim by presenting our humming sound recognition at the internationally respected INTERSPEECH conference in Dublin, Ireland. INTERSPEECH is the world's largest and most comprehensive conference on the science and technology of spoken language processing and is sponsored by Apple and Google Research.

With our technology, a computer can distinguish which humming sound a microphone has just heard. Is it an affirmative "Uh-huh" or a negative "Uh-uh"? Our technology recognizes six different types of humming sounds with an accuracy of 96.6%. This makes it possible to perform actions on the computer reliably with humming sounds - from clicking to launching programs to entering text.
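To illustrate the idea, recognized humming classes could be dispatched to computer actions with a simple lookup table. This is a minimal sketch, not the Semanux implementation; the class labels and actions below are hypothetical:

```python
# Hypothetical sketch: dispatching recognized humming-sound classes to
# computer actions. The six labels are illustrative, not the actual
# labels used in the CNVVE dataset.

HUM_ACTIONS = {
    "uh-huh": "confirm",       # affirmative hum -> confirm dialog
    "uh-uh": "cancel",         # negative hum -> cancel dialog
    "hum-short": "click",
    "hum-long": "double_click",
    "hum-rising": "scroll_up",
    "hum-falling": "scroll_down",
}

def dispatch(predicted_class: str) -> str:
    """Return the action for a recognized humming class, or 'ignore' if unknown."""
    return HUM_ACTIONS.get(predicted_class, "ignore")

print(dispatch("uh-huh"))  # confirm
print(dispatch("cough"))   # ignore
```

In a real system, `predicted_class` would come from the deep learning model's output, and sounds below a confidence threshold would fall through to "ignore".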

Our contribution to INTERSPEECH was published by the “International Speech Communication Association” and can be cited as follows:

Hedeshy, R., Menges, R., Staab, S. (2023) CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice Expressions. Proc. INTERSPEECH 2023, 1553-1557, doi: 10.21437/Interspeech.2023-201

The contribution comprises our dataset of 950 humming sound recordings from 42 participants, the source code to process the dataset and train a deep learning model for humming sound recognition, and a research paper that explains and discusses the dataset and model in detail. By publishing the dataset and source code openly, we enable researchers around the world to improve the recognition of humming sounds.

Ramin Hedeshy
PhD Candidate

My research interests include human-computer interaction, accessibility, and artificial intelligence.