A Cross-modal Audio Search Engine based on Joint Audio-Text Embeddings
Ad-hoc audio clips, such as those from smart speakers, social media apps, security cameras and podcasts, are recorded and shared online every day. For a variety of applications, it is important to…