Clues to language evolution from a massive dataset with typology, phonology and vocabulary from many languages
Dalarna University, Not School affiliated.
2018 (English). In: Evolution of Language. Proceedings of Evolang XII / [ed] Cuskley, et al., Singapore: Nicolaus Copernicus University, 2018. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

1. Introduction

A major component in the evolution of language is the evolution of the human language capacity, that is, whatever biological endowments humans have that make us language-ready. But the language capacity is not well understood and is difficult to study directly. Clues may come from biases displayed by humans in language acquisition and language change. Even weak underlying biases can lead to strong patterns in the resulting languages (Smith, 2011). Biases can be studied at the individual level in learning experiments (e.g. Culbertson, 2012; Tamariz et al., [...]) [...] of natural languages (e.g. Dediu & Ladd, 2007). Biases can be seen either in the synchronic patterns of language features today, or in the diachronic patterns of transition probabilities between features as languages culturally evolve (e.g. Dunn et al., 2011).

Patterns that reveal biases may be found in any aspect of language, e.g. syntax, morphology, phonology, or lexicon, and may be subtle enough to be discernible only in large samples of languages. This work is an exploratory study across the widest possible set of languages, combining typological, phonological, lexical and phylogenetic data on a significant fraction of the languages of the world, with the goal of mapping any biases that may be present. Both synchronic and diachronic patterns are studied, with the emphasis on the latter.

2. Data set

The following data sources are used:

• Phylogeny and geography: Ethnologue (Simons & Fennig 2017); ~7,500 languages.

• Phonological inventories: PHOIBLE (Moran, McCloy & Wright 2014); ~1,800 languages.

• Typology: WALS (Dryer & Haspelmath 2013); ~2,500 languages.

• Lexicon (Swadesh lists): Rosetta Project Digital Language Archive (2009); ~1,300 languages.

All four types of data are available for ~300 languages. At least three types are available for ~1,600 languages from 132 different stocks. In order to keep the data set as homogeneous as possible, each type of data has been imported from a single source only. Languages are identified between data sources by their ISO codes.

3. Methods
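The linking of sources by ISO code, and the resulting coverage counts, can be illustrated with a minimal sketch. It assumes each source has already been reduced to a set of ISO language codes; the function name `coverage_by_iso`, the source labels, and the toy codes are hypothetical and are not taken from the paper.

```python
from collections import Counter

def coverage_by_iso(sources):
    """Count, for each ISO code, how many data sources contain it."""
    counts = Counter()
    for codes in sources.values():
        counts.update(set(codes))  # each source counts at most once per code
    return counts

# Toy stand-ins for the four data types (illustrative codes only).
sources = {
    "phylogeny": {"swe", "fin", "deu", "eng", "nob"},
    "phonology": {"swe", "fin", "deu"},
    "typology": {"swe", "fin", "eng"},
    "lexicon": {"swe", "deu"},
}

counts = coverage_by_iso(sources)
all_four = sorted(code for code, n in counts.items() if n == 4)
at_least_three = sorted(code for code, n in counts.items() if n >= 3)
print(all_four)
print(at_least_three)
```

With real data, the same counting would yield the numbers of languages covered by all four, or by at least three, of the data types.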

The language phylogeny from Ethnologue is taken as given in the analysis. For the synchronic analysis, the phylogeny is taken into account in the character statistics by down-weighting multiple “hits” in the same family, in order to control for phylogenetic bias and lineage-specific patterns. Geographic data are also available to control for areal effects. Cross-correlations between different types of characters are analysed for possible patterns. For the diachronic analysis, the phylogeny and modern-day character data are used in a bootstrapping process to infer both the ancestral character states up the language tree, for phonological and typological characters, and the transition probabilities between states (including the probability of characters appearing and disappearing).

4. Some preliminary results

Well-known typological patterns are reproduced. But correlations between features are observed that go beyond those normally discussed in typology, or those observed by Dunn et al. (2011). Interestingly, there are also some modest cross-correlations between grammatical features and phonemes. For example, the presence of aspirated consonants and nasal vowels correlates with certain word-order features, even after controlling for phylogeny. In the diachronic analysis, there are hints of patterns beyond the obvious one that transition probabilities into common features are larger, but much work remains to be done in the interpretation of these patterns.
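To illustrate what controlling for phylogeny by down-weighting can mean in practice, here is a minimal sketch. It assumes one simple scheme, where each language is weighted by the inverse of its family's size in the sample so that every family contributes equally. This is not necessarily the weighting used in the study, and the family labels and binary feature values are invented toy data.

```python
from collections import Counter
from math import sqrt

def family_weights(families):
    """Weight each language by 1 / (number of sampled languages in its family)."""
    sizes = Counter(families)
    return [1.0 / sizes[f] for f in families]

def weighted_correlation(x, y, w):
    """Weighted Pearson correlation of two equally long numeric sequences."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / total
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / total
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / total
    return cov / sqrt(vx * vy)

# Toy sample: family "A" contributes four closely related languages.
families = ["A", "A", "A", "A", "B", "C", "D"]
has_phoneme = [1, 1, 1, 1, 0, 1, 0]   # hypothetical phonological feature
has_order = [1, 1, 1, 1, 0, 0, 1]     # hypothetical word-order feature

unweighted = weighted_correlation(has_phoneme, has_order, [1.0] * len(families))
weighted = weighted_correlation(has_phoneme, has_order, family_weights(families))
print(unweighted, weighted)
```

In this toy sample the unweighted correlation is driven entirely by the four languages of family "A" and vanishes once each family counts once. A correlation that survives such weighting is less likely to be a lineage-specific artefact.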

Place, publisher, year, edition, pages
Singapore: Nicolaus Copernicus University, 2018.
Keywords [en]
typology, phonology, comparative linguistics
National Category
General Language Studies and Linguistics
Research subject
Research Profiles 2009-2020, Intercultural Studies
Identifiers
URN: urn:nbn:se:du-32571
ISBN: 978-83-231-3991-1 (print)
OAI: oai:DiVA.org:du-32571
DiVA, id: diva2:1426902
Conference
Evolang XII
Available from: 2020-04-28 Created: 2020-04-28 Last updated: 2021-11-12 Bibliographically approved

Other links

http://evolang.org/torun/proceedings/paperpdfs/Evolang_12_paper_80.pdf