Gouskova and Stanton 2019

Learning complex segments. Manuscript, New York University.

Languages differ in the status of sequences such as [mb, kp, ts]: they can pattern as complex segments or as clusters of simple consonants. We ask what evidence learners use to figure out which representations their languages motivate. We present an implemented computational model that starts with simple consonants only, and builds more complex representations by tracking statistical distributions of consonant sequences. We demonstrate that this strategy is successful in a wide range of cases, both in languages that supply clear phonotactic arguments for complex segments and in languages where the evidence is less clear. We then turn to the typological parallels between complex segments and consonant clusters: both tend to be limited in size and composition. We suggest that our approach allows the parallels to be reconciled. Finally, we compare our model with alternatives: learning complex segments from phonotactics and from phonetics.
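To make the idea of "tracking statistical distributions of consonant sequences" concrete, here is a minimal sketch in Python. It is not the authors' model: the abstract does not specify the statistic, so the `cohesion` score, the `threshold` value, the toy `CONSONANTS` inventory, and all function names are assumptions chosen for illustration. The sketch starts from simple consonants only and flags a two-consonant sequence as a candidate complex segment when its parts rarely occur apart.

```python
from collections import Counter

# Hypothetical toy inventory of simple consonants (an assumption for illustration).
CONSONANTS = set("mbkptsn")


def sequence_stats(words):
    """Count single consonants and adjacent consonant pairs in a word list.

    Each word is a string in which every character is one segment.
    """
    singles = Counter()
    pairs = Counter()
    for word in words:
        for i, seg in enumerate(word):
            if seg in CONSONANTS:
                singles[seg] += 1
                if i + 1 < len(word) and word[i + 1] in CONSONANTS:
                    pairs[(seg, word[i + 1])] += 1
    return singles, pairs


def cohesion(pair, singles, pairs):
    """Illustrative score (an assumption, not the paper's measure):
    the share of C1 tokens followed by C2, times the share of C2 tokens
    preceded by C1. It is 1.0 when the two consonants never occur apart.
    """
    c1, c2 = pair
    n = pairs[pair]
    return (n / singles[c1]) * (n / singles[c2])


def candidate_complex_segments(words, threshold=0.5):
    """Return the consonant pairs whose cohesion meets the threshold, sorted."""
    singles, pairs = sequence_stats(words)
    return sorted(p for p in pairs if cohesion(p, singles, pairs) >= threshold)
```

On the toy lexicon `["mba", "amba", "tamba", "kata", "sata", "asta"]`, `candidate_complex_segments` returns `[("m", "b")]`: [m] and [b] always occur together, so the sequence scores 1.0, whereas [st] scores 0.125 because [s] and [t] mostly occur independently. A fuller learner would presumably add accepted sequences to the inventory and re-run the procedure on the updated representations; that iteration is left out here.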

The learner is implemented and accompanied by over a dozen case studies on various languages; the code and the data will be released when the paper is published.