Distributed AI driven Audio Sample Manager for Drum Sound Tagging, Clustering and Recommendations

Project Description

Audio industry professionals tend to collect enormous amounts of audio data within their production environment. Foley artists use different kinds of recorded sounds when working on film audio, music producers have libraries of different musical instruments to choose from and game sound designers also have a vast amount of audio files to choose from when designing soundscapes for games.
The process of assembling and manually tagging different audio files takes a huge amount of effort and is extremely time consuming. Audio professionals have to manually listen to all sounds within their library and then introduce meaningful tags to make the vast amount of files accessible for later usage in the production process. Furthermore, the process of finding the right sound in production also can cost a lot of time for the engineer.
To overcome these shortcomings, this project investigates the utilization of artificial intelligence for automatic audio tagging, clustering and recommendation and then the created models are used to implement a distributed audio sample manager. The required data for model training is extracted from the open source online sound library Freesound. In order to curate a representative data set and to refine the scope of the project, solely drum sounds will be used to train the model.
Audio sample tagging models are already well studied and thus a Transfer Learning Approach is chosen where a pre-trained MUSICNN model is chosen and further modified to adapt to the specific use case of drum sound tagging.
The implemented recommendation system and custom audio tagger are both based on k-Nearest Neighbor classifiers (k-NN), working with normalized Mel Frequency Cepstral Coefficients (MFCC) to allow for high accuracy audio recommendation and user customized tagging.
The system is implemented and deployed, using web technologies to allow for user access via web browsers. The backend is built by using the Python Framework Flask to implement a REST-API which exposes all required model computations to the frontend which will use Angular to implement the UI.

Project Goals

Compilation of a Drum Sound Dataset based on sounds from https://freesound.org/
Implementation of a Drumsample Tagger based on musicnn (https://github.com/jordipons/musicnn)
Implementation of a Recommendation System based on user history
Distributed Sample Manager where the backend is written with Python Flask and the frontend with the Angular framework

Project Demonstration

Current Limitations

The only supported audio format is wave
Users can apply only one custom tag to a given sound
JWT authentication and Flask Backend are not ready for deployment, as some security best practices have not been applied because this was beyond the scope for this work
User recommendations are not saved to the database, and thus have to be recomputed for every request
If a library is fed with new sounds or sounds are renamed, the system has currently no chance in reacting to those circumstances
When a dataset becomes large, tensorflow has memory allocation problems

Documentation

Download

Author

Niklas Wantrupp 2021

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.