After a first experimental Common Voice contest, Mozilla Italia is back with a new competition to collect voice recordings for the project. The goal is to promote Mozilla's initiative for the Italian language as well, enriching an open-source, collaborative and public database. Many fields would benefit, including accessibility, which still receives too little attention and development.
Common Voice: the Mozilla project
Common Voice is part of Mozilla's effort to improve speech recognition technology. The project provides a public database of voice recordings that can be downloaded and used in machine learning systems for speech recognition. Technologies that understand natural language bring numerous benefits, speeding up and automating many processes.
Launched in 2017, the initiative is a crowdsourcing project to build a free, public database of recordings. Developing speech recognition software requires access to a large and varied body of speech data to train the learning algorithms. One of the main obstacles in finding recordings, however, is the lack of a well-stocked database: most of the available datasets are proprietary and therefore must be paid for, hence Mozilla's open-source idea.
The project asks you to "provide your voice" to build a voice database, so that developers can create ever more accurate systems. There are two ways to contribute (for free) to the project: speaking and listening. In the first case, you record a short text provided by Mozilla and the recording is sent to a listening queue, where other users listen to the clip and judge the accuracy of the reading. If at least two users validate the clip, it is added to the shared dataset. If, on the other hand, the recording is rejected by two contributors, it is placed in the so-called "graveyard" of recordings, which remains publicly accessible. In the second case, you contribute by listening: you act as a validator of the sentences that were read, approving or rejecting the audio clips.
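The validation rule described above can be sketched in a few lines of Python. This is only an illustration of the two-vote threshold mentioned in the text, not Mozilla's actual implementation: the `Clip` class, its fields and the status names are hypothetical.

```python
from dataclasses import dataclass

# Threshold taken from the description above: two votes decide a clip.
VOTES_NEEDED = 2


@dataclass
class Clip:
    """Hypothetical model of a recorded clip waiting in the listening queue."""
    sentence: str
    up_votes: int = 0
    down_votes: int = 0

    def status(self) -> str:
        """Return 'validated', 'graveyard' or 'pending'."""
        if self.up_votes >= VOTES_NEEDED:
            return "validated"
        if self.down_votes >= VOTES_NEEDED:
            return "graveyard"
        return "pending"

    def vote(self, is_valid: bool) -> None:
        """Register one listener's verdict; decided clips ignore further votes."""
        if self.status() != "pending":
            return
        if is_valid:
            self.up_votes += 1
        else:
            self.down_votes += 1


clip = Clip("Una frase di esempio")
clip.vote(True)
clip.vote(False)
clip.vote(True)
print(clip.status())
```

In this sketch a clip leaves the queue as soon as either counter reaches two, so one approval and one rejection leave it pending until a third listener decides.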
The Mozilla Italia contest
While the databases for languages such as English, French and German are very well stocked, the same cannot be said for Italian. Common Voice Corpus 6.1, the latest version of the dataset, contains a full 56 GB of data for English, against only 5 GB for Italian. Beyond the sheer quantity of data, the quality of a dataset also depends on the diversity of its recordings. In addition to gender and age, it is important to train on speakers' different accents, especially in a country like Italy, where dialectal inflections are very marked and varied.
For this reason Mozilla Italia has launched a competition to promote the Common Voice project and enrich the Italian database. The contest started a week ago and has already collected 20 hours of recordings from 35 registered contributors. Taking part is very simple: after downloading the "Donate your voice: CV Project" app (available only on the Play Store, for Android 6+) and creating an account, go to Settings -> Advanced and tap "Show me the line that identifies me in the app", then take a screenshot of your user ID and send it to the organizers. The same procedure must be repeated at the end of the contest. For the entire duration of the event, each validation earns 1 point and each recording earns 2 points. The 20 participants with the most points will receive gadgets and t-shirts.
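As a rough illustration of the scoring, the point values above (1 per validation, 2 per recording, prizes for the top 20) can be expressed as follows. The function names, the input format and the tie-breaking rule (alphabetical by user ID) are assumptions made for this sketch; the organizers' actual procedure is not described in the article.

```python
# Point values and number of prizes as stated in the contest rules above.
VALIDATION_POINTS = 1
RECORDING_POINTS = 2
PRIZE_SLOTS = 20


def score(validations: int, recordings: int) -> int:
    """Total points for one participant."""
    return validations * VALIDATION_POINTS + recordings * RECORDING_POINTS


def winners(activity: dict[str, tuple[int, int]], slots: int = PRIZE_SLOTS) -> list[str]:
    """Rank user IDs by score, highest first.

    `activity` maps a (hypothetical) user ID to (validations, recordings).
    Ties are broken alphabetically by user ID, which is an assumption.
    """
    ranked = sorted(activity, key=lambda user: (-score(*activity[user]), user))
    return ranked[:slots]


example = {"user_a": (10, 5), "user_b": (0, 15), "user_c": (3, 1)}
print(score(10, 5))                # 20
print(winners(example, slots=2))   # ['user_b', 'user_a']
```

Because a recording is worth twice as much as a validation, `user_b` with 15 recordings (30 points) outranks `user_a` with 10 validations and 5 recordings (20 points).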
Free access to a large, multilingual dataset is essential for improving speech recognition technology. The tools that support innovation, however, must be both high-quality and easily accessible, so that everyone can contribute and speed up new developments. Enriching the dataset means making work easier for researchers, students and anyone who wants to help improve speech recognition technologies.