Toward Speech Recognition of the Italian Language Based on Detection of Landmarks and Other Acoustic Cues to Features

Engineering Sciences

Creating an automatic speech recognition system that imitates the process by which a listener derives the words intended by a speaker requires understanding how lexical items are stored in memory. The Lexical Access model proposed by Stevens (2002) at the Massachusetts Institute of Technology (MIT) postulates that lexical items are stored in memory as sets of distinctive features, and that these features are hierarchically organized. In particular, the model highlights the importance in the perception process of abrupt acoustic events, named landmarks. The detection of landmarks is primary in human perception and corresponds to the first phase of recognition by a listener, who then further processes the temporal region around each landmark. Based on Stevens’ Lexical Access model, the Speech Communication Group of MIT developed a speech recognition system for spoken English over a span of more than 20 years. The system consists of a complex association of modules dedicated to the different system functions: detection of landmarks, detection of specific features, and ultimately word recognition. This project proposes to apply the Lexical Access model to the Italian language. Exploring a new language should provide insight into whether Stevens’ approach applies universally across languages, with relevant implications for the understanding of how the human brain recognizes speech. The project also proposes the principled introduction into the model of the concept of inference, complemented by deep learning mathematical tools.
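To make the notion of a landmark as an abrupt acoustic event more concrete, the following is a minimal sketch in Python. It is not the detector developed by the MIT Speech Communication Group. It only flags points where short-time energy changes sharply between adjacent frames. The function names, the frame length of 160 samples, and the 9 dB threshold are illustrative assumptions, not values taken from the project.

```python
import math


def frame_energies_db(samples, frame_len):
    """Short-time energy in dB for consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on silent frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def landmark_candidates(samples, frame_len=160, threshold_db=9.0):
    """Return (frame_index, change_db) pairs where energy jumps abruptly.

    frame_len and threshold_db are hypothetical defaults chosen for
    illustration only; a real detector would be tuned on speech data.
    """
    energies = frame_energies_db(samples, frame_len)
    candidates = []
    for i in range(1, len(energies)):
        change = energies[i] - energies[i - 1]
        if abs(change) >= threshold_db:
            candidates.append((i, change))
    return candidates


# Synthetic signal: quiet, then loud, then quiet again (5 frames each).
signal = [0.001] * 800 + [0.5] * 800 + [0.001] * 800
found = landmark_candidates(signal)
```

In this synthetic example, `found` holds one candidate with a positive change (an abrupt rise in energy) and one with a negative change (an abrupt fall). In a fuller system, the region around each candidate would then be passed to later modules that estimate distinctive features.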

The field of my project is speech science, a multidisciplinary area that spans engineering, physics, physiology, linguistics, phonetics, and phonology. Students will benefit from addressing problems related to this variety of fields, and participation in the project will also offer them an opportunity to develop skills in speech science in general and speech recognition in particular. Students who know Italian or who are studying it may find the project an interesting context in which to further develop their language skills. Knowing Italian is, however, not a requirement, and students who do not know Italian are very welcome to participate as well. The modular project structure is ideal for teamwork, and the project will therefore benefit greatly from the participation of students in the development of its various modules and their interfaces.