Languages differ in the status of sequences such as [mb, kp, ts]: they can pattern as complex segments or as clusters of simple consonants. We ask what evidence learners use to figure out which representations their languages motivate. We present an implemented computational model that starts with simple consonants only, and builds more complex representations by tracking statistical distributions of consonant sequences. We demonstrate that this strategy is successful in a wide range of cases, both in languages that supply clear phonotactic arguments for complex segments and in languages where the evidence is less clear. We then turn to the typological parallels between complex segments and consonant clusters: both tend to be limited in size and composition. We suggest that our approach allows the parallels to be reconciled. Finally, we compare our model with alternatives: learning complex segments from phonotactics and from phonetics.
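The general idea of tracking the statistical distribution of consonant sequences can be illustrated with a minimal sketch. It is not the learner's actual procedure: the abstract does not specify the measure used, so the association score below (how often two consonants occur together, relative to how often each occurs at all) is an assumption made for illustration. The names `CONSONANTS`, `TOY_WORDS`, `sequence_scores`, and `candidates`, the threshold value, and the toy word list are likewise hypothetical.

```python
from collections import Counter

# Hypothetical inventory and toy word list, invented for illustration only.
# Words are space-separated strings of simple segments.
CONSONANTS = {"m", "b", "k", "p", "t", "s"}
TOY_WORDS = ["m b a", "a m b a", "m b i", "t a", "s a", "a t s a", "k a p a"]


def sequence_scores(words):
    """Score each adjacent consonant pair by how tightly its members co-occur.

    Assumed score (not taken from the paper):
        (count(C1C2) / count(C1)) * (count(C1C2) / count(C2))
    It reaches 1.0 when the two consonants never occur without each other.
    """
    unigrams = Counter()
    bigrams = Counter()
    for word in words:
        segs = word.split()
        for i, seg in enumerate(segs):
            if seg not in CONSONANTS:
                continue
            unigrams[seg] += 1
            # Count the pair only when the next segment is also a consonant.
            if i + 1 < len(segs) and segs[i + 1] in CONSONANTS:
                bigrams[(seg, segs[i + 1])] += 1
    return {
        (c1, c2): (n / unigrams[c1]) * (n / unigrams[c2])
        for (c1, c2), n in bigrams.items()
    }


def candidates(scores, threshold):
    """Return the pairs whose score meets an (arbitrary) threshold, sorted."""
    return sorted(pair for pair, score in scores.items() if score >= threshold)


scores = sequence_scores(TOY_WORDS)
complex_candidates = candidates(scores, threshold=0.5)
```

In this toy data, [m] and [b] always occur together, so the pair is flagged as a candidate complex segment, while [t] and [s] also occur independently and are left as a cluster of simple consonants.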
In addition to the paper describing the learner, the project website hosts about two dozen case studies and a simple-to-use version of the learner. If you would like to cite any of the case studies or use data from the site, please cite the site as an online resource:
Gouskova, Maria and Juliet Stanton. 2020. The CompSeg Learner. [URL: http://compseg.lingexp.org/…]. Accessed on [DATE].