Exploring Cross-Linguistic Patterns in Grammar and Vocabulary through Data: Body Partonomy

Statistics/Linguistics/Anthropology/Cognitive Science/Computational Linguistics/Computer Science

Regularities in the structure of unrelated languages have been used to argue for common underlying causes. This follows naturally from the fact that most languages emerge in human groups with similar needs (for expression, communication, and thought) and a shared biological basis (which determines our species’ capacity for certain behaviors and cognitive operations). Conversely, substantial variation across languages, or highly idiosyncratic linguistic systems, is often attributed to the stochastic nature of human history and the role of culture in shaping language.

A particularly interesting case is the way languages structure the vocabulary referring to body parts. The human body and its parts are universally accessible referents across human populations, yet there are large differences in the ways human societies value, represent, or conceptualize them. This project aims to explore body partonomy systems in a large sample of diverse languages, with the goal of assessing candidate explanations for their differences and commonalities. The project comprises four tasks, which could be taken up by the same or by different students:

  • Unsupervised learning of discrete hierarchical data: the student will assist me in reviewing state-of-the-art methods for describing and inferring network representations of data with hierarchical structure, with the final goal of deploying a method applicable to the type of data gathered for this project. Expected/preferred background: statistics, scientific computing (preferably R and/or Python), and network science.
  • Elicitation of body part terms: the student will be responsible for running simple elicitation tasks on body part terms with informants in Cambridge who are native speakers of diverse languages (pending approval by the ethics committee). Expected/preferred background: linguistics, cognitive science, psychology, or anthropology.
  • Literature review: a crucial aspect of the project is to delimit very clearly the competing hypotheses that have been proposed to explain the regularities and differences across body partonomies. The sources of these hypotheses are very diverse, ranging from the study of human perception and cognition to the neuroscience of action and lexical semantics. The selected student will assist me in reviewing this literature with the goal of producing a comprehensive and testable set of competing hypotheses. Expected/preferred background: linguistics, cognitive science, psychology, or anthropology.
  • Computational linguistics resources and corpora: most of the data available for the present project come from vocabulary lists of a large number of languages, yet relevant data on how body part terms are actually used are lacking. The student will help me explore available methods and resources for leveraging multilingual corpora, with the aim of complementing the evidence from dictionaries and word lists. Expected/preferred background: computational linguistics.

In general, I am looking for highly motivated students with an interest in language and languages, and with the drive to work on a complex topic where project goals are expected to remain flexible. In return, the students will take part in a highly interdisciplinary project and will be exposed to the whole process of scientific production, from the delimitation of the hypotheses to the publication of the results. If a publication results from the project, they will have the opportunity to become co-authors.
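
To give a flavour of the kind of data the first task deals with, here is a minimal Python sketch of how a body partonomy might be encoded as a small hierarchy and compared across languages. Everything in it is an assumption made for illustration: the two toy "languages" are hypothetical (they are not project data), nodes are labelled with English glosses rather than real terms, and Jaccard overlap of part-of relations is just one simple similarity measure, not the method the project will adopt.

```python
from typing import Dict, List, Optional, Set, Tuple

# A partonomy maps each term (here an English gloss) to the term naming
# the whole it is a part of; None marks the root of the hierarchy.
Partonomy = Dict[str, Optional[str]]

# Hypothetical toy data, for illustration only.
# Language A lexicalises arm and hand separately.
LANG_A: Partonomy = {"body": None, "arm": "body", "hand": "arm", "finger": "hand"}
# Language B uses a single term for arm+hand, so "hand" is not a separate node.
LANG_B: Partonomy = {"body": None, "arm": "body", "finger": "arm"}


def ancestors(p: Partonomy, term: str) -> List[str]:
    """Return all wholes that `term` is (directly or indirectly) a part of."""
    chain = []
    parent = p[term]
    while parent is not None:
        chain.append(parent)
        parent = p[parent]
    return chain


def depth(p: Partonomy, term: str) -> int:
    """Number of part-of steps from `term` up to the root."""
    return len(ancestors(p, term))


def part_of_pairs(p: Partonomy) -> Set[Tuple[str, str]]:
    """All (part, whole) pairs, including indirect ones (transitive closure)."""
    return {(term, whole) for term in p for whole in ancestors(p, term)}


def jaccard(a: Set[Tuple[str, str]], b: Set[Tuple[str, str]]) -> float:
    """Share of part-of relations that two partonomies have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    print(depth(LANG_A, "finger"), depth(LANG_B, "finger"))
    print(jaccard(part_of_pairs(LANG_A), part_of_pairs(LANG_B)))
```

Comparing transitive part-of relations rather than only direct parent links means that a language which merges two levels (as the hypothetical Language B does) still counts as partially similar to one that keeps them apart.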