- 1 Some resources for computational work on Russian morphology and phonology
- 2 Other languages: corpora and word lists
- 3 Linguistic document preparation
- 4 Praat
- 5 Misc
Some resources for computational work on Russian morphology and phonology
Russians digitize everything and put it online. This makes corpus work on the language easy.
Probably the most useful starter resource for anyone interested in Russian morphophonology is Zaliznjak’s classic 1977 dictionary (Grammatičeskij slovar’ russkogo jazyka [A grammatical dictionary of the Russian language], Moscow: Russkij Jazyk). It is available online in a number of formats:
- A reverse list of forms. Sort of like a rhyming dictionary.
- A full list of Zaliznjak’s Paradigms (TXT in RAR file): Close to 90,000 inflected forms of Russian words, with stress marked. Automatically generated from Zaliznjak’s dictionary by Andrei Usachev. Contains a few errors (some ungrammatical short forms of adjectives are given, and paradigm gaps are sometimes filled incorrectly) but otherwise quite useful.
- Online database version: Enter a word, and this database returns all of the grammatical codes of the 1977 original, including stress type, declension class, etc.
- Downloadable version: This is a Windows .exe file, but you can extract a .dbf file from it that contains all of the information in the online version. The DBF file can then be imported into R on any OS.
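A reverse list like the one mentioned above is easy to approximate from any word list: sort the forms by their spelling read right to left, so that words with the same ending land next to each other. The following is a minimal sketch in Python, not the method used to build the online list. The function name `reverse_sorted` and the toy word list are made up for illustration, and the sort uses plain Unicode code point order, which is only an approximation of Russian alphabetical order (ё, for example, sorts after я).

```python
def reverse_sorted(words):
    """Sort words by their spelling read right to left, as in a reverse dictionary."""
    # The key is the reversed string, so words are grouped by their endings.
    # Ordering is by Unicode code point, an approximation of Russian alphabetical order.
    return sorted(words, key=lambda w: w[::-1])


# Hypothetical toy word list (not taken from Zaliznjak's data).
words = ["стол", "вол", "кот", "рот", "дом"]

for w in reverse_sorted(words):
    print(w)
```

With this toy list, вол and стол end up adjacent, as do кот and рот, which is what makes a reverse list feel like a rhyming dictionary.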
Frequency and corpus searches
- Ruscorpora: A searchable online corpus of written and spoken Russian.
- Serge Sharoff’s frequency and lemma lists: These include things like bi-, tri- and tetragram lists (orthographic strings of various lengths, ordered by frequency of occurrence). You probably cannot use these unless you can read Russian or are at least comfortable with Cyrillic.
- Frequency Dictionaries (Russian-language page / English translation): Some small frequency dictionaries of Russian, including a lemma frequency list for the 5000 most frequent words and some information about average word length and so on.
- Yandex. The dominant Russian search engine. By default, it searches for Russian words in all case forms, so you get estimated lemma counts.
- Academic Dictionaries: Online dictionaries including Ozhegov, Efremova, Vasmer’s etymological dictionary, Dahl, and many, many others (did you know philatelists had their own dictionary?). Comprehensive, accurate, and UTF-8 encoded.
- Downloadable Dictionaries: A collection of links to downloadable dictionaries. Not all are accurate or complete (for example, Zaliznjak’s dictionary does not appear in its complete form, but you can reconstruct the information from the parts that are there).
- Lyokhin & Petrov’s Dictionary of Loanwords: Searchable online version.
- Rosenthall’s Spelling Reference: Everything you need to know about the quirks of Russian orthography and punctuation. Some useful discussions of vowel and consonant alternations that are represented in the orthography but not in speech.
- Akhmanova’s Dictionary of Linguistic Terminology. In Russian.
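The bi-, tri- and tetragram lists mentioned above are counts of orthographic substrings of a fixed length. If you need such counts for your own word list, they can be computed along the following lines. This is a sketch under simple assumptions, not a reconstruction of how Sharoff's lists were built: the function names and the toy word list are hypothetical, n-grams are counted over word types rather than corpus tokens, and no word-boundary symbols are added.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return all substrings of length n in word, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_frequencies(words, n):
    """Count character n-grams over a list of words, most frequent first."""
    counts = Counter()
    for w in words:
        # Lowercase so that capitalized forms do not produce separate n-grams.
        counts.update(char_ngrams(w.lower(), n))
    return counts.most_common()


# Hypothetical toy word list; replace it with forms read from your own file.
words = ["молоко", "город", "голова"]

for ngram, freq in ngram_frequencies(words, 2):
    print(ngram, freq)
```

Python 3 strings are sequences of Unicode characters, so Cyrillic needs no special handling here as long as the input file is read with the right encoding.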
Other languages: corpora and word lists
- An Crúbadán: Corpus building for minority languages: A collection of downloadable orthographic corpora in hundreds of languages, including texts from Wikipedia, Bible translations, the Universal Declaration of Human Rights, blogs, and tweets.
- Kai Schott’s IPA word lists [a-l] [l-y]: These have orthography-derived IPA transcriptions for the word lists that form the basis of the freely available OpenOffice spellchecker dictionaries. The dictionaries were created as part of Schott’s Simon project, whose website is now defunct. You will want to double-check the IPA transcriptions against language descriptions before using these for phonological corpus work.
Linguistic document preparation
LyX and LaTeX
- Switching to LyX for Linguists: Some step-by-step instructions for installing LaTeX and LyX.
- LyX for Linguists: Wiki (a help/how-to page that I have contributed to).
Bibliography and texmf files
- My texmf directory with all the LaTeX packages and styles I use for linguistics work.
- My .bib file: mostly phonology, Russian, and morphology; 4800+ entries. Zipped.
Working with the International Phonetic Alphabet
- IPA_SIL: An IPA keyboard layout for Mac OS that I find to be the most intuitive and user-friendly. The .zip file includes documentation. This layout was originally distributed by sil.org but seems to have been retired.
- Using IPA fonts and keyboard layouts: Meant to be pragmatic, not comprehensive. If you want to learn all about Unicode or legacy fonts, your best bet is to search the internet superhighway on your own.
- How to create syntactic and prosodic structure trees, with a focus on word processors.
Praat
- An introduction to Praat’s basic functions, for the Sound & Language course.
Misc
- A very short regex handout I wrote for my Kazakh Field Methods course. It explains the basics of regular expressions for linguists, with phonological examples.
- How to Use MS Word Like a Pro: This is a very old handout, but if you are an MS Word stalwart, you might still find it useful. It describes some useful and well-hidden functions of Microsoft Word that linguists should know about, such as semi-intelligent automatic sequential numbering and cross-referencing.
- How to run R on more than one CPU core on Mac OS using screen: Normally, R.app runs on just a single core in Mac OS. This tip helps you do computationally intensive things in R simultaneously, using Terminal and the screen utility.
- Git cheat sheet
- How to add a picture of your signature to a PDF in Linux using Okular
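To give a flavor of what regular expressions with phonological examples look like, here is a small sketch in Python that picks out word-final consonant clusters. It is not taken from the regex handout. The vowel inventory, the function name `final_cluster`, and the toy word list are all assumptions made for illustration, and the pattern treats every character that is not one of the listed vowels as a consonant, so it only makes sense for lowercase input in a simple romanization.

```python
import re

# Assumed vowel inventory for a simple lowercase romanization.
VOWELS = "aeiou"

# Two or more non-vowel characters at the very end of the word.
# Note: [^aeiou] matches any non-vowel character, not only consonant letters.
FINAL_CLUSTER = re.compile(rf"[^{VOWELS}]{{2,}}$")


def final_cluster(word):
    """Return the word-final consonant cluster, or None if there is none."""
    match = FINAL_CLUSTER.search(word)
    return match.group(0) if match else None


# Hypothetical toy word list.
words = ["tekst", "dom", "most", "voda"]

for w in words:
    print(w, final_cluster(w))
```

The same pattern with `^` in place of `$` (and the anchor moved to the front) would find word-initial clusters instead.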