Simulating the spread and development of protolanguages
Dalarna University, Verksamhetsstödet. ORCID iD: 0000-0002-8555-148X
2023 (English). In: Protolang 8: Book of Abstracts, 2023, p. 41-42. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

Languages change over time, due to various processes that have likely been operative since the dawn of language. But our understanding of the relative importance of different processes in the distant past remains limited. Methods for reconstructing language change are hampered by a shortage of training data.

Simulating language change in software can help, testing processes and producing simulated language data as input for reconstruction tests. In simulation, the processes are known and controllable, and the true diversification path is known. Tuning process strength in simulation until the results resemble real language diversity may inform theories of language dynamics. 

But simulated data will only be helpful if the simulation reproduces relevant aspects of reality closely enough. Several items in List's (2019) Open problems in computational historical linguistics concern simulation issues. Extant simulations are mainly of two types:

  • Detailed short-term simulations of within-language dynamics, often agent-based (e.g. Nolfi & Mirolli, 2010).
  • Macro-scale long-term simulations, but with linguistic and/or geographical details abstracted away (e.g. Wichmann, 2017; Kapur & Rogers, 2020).

Neither type covers the middle ground where within-language and between-language dynamics meet. This work aims to fill that gap, with a simulation that has sufficient linguistic, geographic and anthropological detail to produce realistic data, and sufficient scope to cover macro-scale dynamics over millennia.

The basic simulation unit is a speech community with typically 100-1000 speakers, speaking a common language. Their language has an explicit vocabulary with word-forms and meanings. Real languages from CLICS3 (Rzymski et al., 2019) are used as seed languages, which then evolve through regular sound change, word gain and loss, semantic shift, language contact, and areal effects. All processes are adjustable and can be disabled.
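
As an illustration of the data model described above, the following is a minimal sketch in Python. It is not taken from the simulator itself: the class name `SpeechCommunity`, its fields, the toy word-forms and the `p > f` change are all assumptions made for this example. It shows only two of the listed processes, a regular sound change applied across a whole vocabulary and a split after which the daughter language evolves on its own.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechCommunity:
    """Hypothetical stand-in for one simulation unit (not the simulator's own class)."""
    name: str
    speakers: int
    # Maps a meaning (concept gloss) to its current word-form.
    vocabulary: dict[str, str] = field(default_factory=dict)

    def apply_sound_change(self, old: str, new: str) -> None:
        """Regular sound change: replace `old` with `new` in every word-form."""
        self.vocabulary = {
            meaning: form.replace(old, new)
            for meaning, form in self.vocabulary.items()
        }

    def split(self, name: str, migrants: int) -> "SpeechCommunity":
        """Found a daughter community that starts with a copy of the vocabulary."""
        self.speakers -= migrants
        return SpeechCommunity(name, migrants, dict(self.vocabulary))


# Toy seed vocabulary (invented forms, not CLICS3 data).
parent = SpeechCommunity("A", 800, {"water": "pata", "stone": "kapa", "fire": "tupi"})
daughter = parent.split("B", 200)
daughter.apply_sound_change("p", "f")  # p > f everywhere, in B only

print(parent.vocabulary)
print(daughter.vocabulary)
```

Because the daughter receives a copy of the vocabulary, later changes in one community leave the other untouched, which is the property that makes the true diversification path recoverable from the simulation log.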

The geography of the real world is used, with topography from De Ferranti (2015), rivers from Kelso (2016) and climate/ecology from NASA (2016). Each speech community lives in a 50x50 km grid square, which may be shared with other communities up to a carrying capacity. Population may increase or decrease depending on food availability, and surplus population may migrate to greener pastures, forming a new community whose language then evolves independently. Travel depends on real terrain and available technology (innovations occur occasionally, starting from paleolithic level).
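
The population mechanics in this paragraph can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the simulator's algorithm: the function names, the fixed growth rate, the four-neighbour grid and the "most free capacity" rule are all hypothetical, and terrain, rivers, climate and technology are ignored entirely.

```python
def surplus_migrants(population: int, capacity: int) -> int:
    """Number of people above the cell's carrying capacity (never negative)."""
    return max(0, population - capacity)


def step_cell(population: int, capacity: int, growth_rate: float) -> tuple[int, int]:
    """Grow the population for one time step, then split off any surplus.

    Returns (population staying in the cell, population migrating out).
    """
    grown = round(population * (1 + growth_rate))
    leaving = surplus_migrants(grown, capacity)
    return grown - leaving, leaving


def best_neighbour(cell: tuple[int, int], free_capacity: dict[tuple[int, int], int]):
    """Pick the adjacent grid cell (4-neighbourhood) with the most free capacity.

    Returns None when no neighbouring cell has room.
    """
    x, y = cell
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    reachable = [c for c in candidates if free_capacity.get(c, 0) > 0]
    return max(reachable, key=lambda c: free_capacity[c]) if reachable else None


staying, leaving = step_cell(population=950, capacity=1000, growth_rate=0.10)
target = best_neighbour((0, 0), {(1, 0): 300, (0, 1): 700})
print(staying, leaving, target)
```

In a fuller model the growth rate would depend on food availability and the choice of target cell on travel cost; both are reduced to constants here to keep the sketch short.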

Simulation results are available as Swadesh matrices, or in formats suitable for automated reconstruction such as CLDF or NEXUS. True trees and true cognate sets are saved separately.
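
For readers unfamiliar with NEXUS, the sketch below writes a small binary matrix as a NEXUS DATA block. It is an assumption-laden illustration rather than the simulator's exporter: the function name `to_nexus`, the language names and the presence/absence coding (one column per cognate set, `?` for missing data) are invented for this example.

```python
def to_nexus(matrix: dict[str, str]) -> str:
    """Render {language name: string of 0/1/? characters} as a NEXUS DATA block.

    Language names are assumed to contain no whitespace.
    """
    lengths = {len(row) for row in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same number of characters")
    nchar = lengths.pop()
    width = max(len(name) for name in matrix)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=? GAP=-;',
        "  MATRIX",
    ]
    # One row per language, names padded so the character columns line up.
    lines += [f"    {name.ljust(width)}  {row}" for name, row in matrix.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


# Hypothetical cognate-presence matrix for three simulated languages.
text = to_nexus({"LangA": "1010", "LangB": "1100", "LangC": "0?10"})
print(text)
```

A reconstruction tool can be run on such a file and its inferred tree compared against the separately saved true tree.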

Software and sample output available at https://github.com/[ANONYMIZED]/LangChangeSimulator/tree/master 

 

De Ferranti, J. (2015) Viewfinder Panoramas Digital Elevation Model. http://www.viewfinderpanoramas.org/dem3.html 

Kapur, R & Rogers, P (2020) Modeling language evolution and feature dynamics in a realistic geographic environment. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona.

Kelso, N V (2016) Natural Earth Data. https://www.naturalearthdata.com/downloads/ 

List, J.-M. (2019) Open problems in computational historical linguistics. Invited talk presented at the 24th International Conference of Historical Linguistics (2019-07-01/05, Canberra, Australian National University).

NASA (2016) NASA Earth Observations. https://neo.gsfc.nasa.gov/ 

Nolfi, S & Mirolli, M (2010) Evolution of Communication and Language in Embodied Agents. Springer.

Rzymski, C., Tresoldi, T. et al. (2019) The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies. DOI: 10.1038/s41597-019-0341-x

Wichmann, S. (2017) Modeling language family expansions. Diachronica 34:1, 79-101.

Place, publisher, year, edition, pages
2023. p. 41-42
Series
Ways to (proto)language conference series
Keywords [en]
language change, simulation, protolanguage, historical linguistics
National Category
General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:du-47919
OAI: oai:DiVA.org:du-47919
DiVA, id: diva2:1831500
Conference
Protolang 8, Rome, September 27-28, 2023
Note

Ways to (proto)language conference series. Department of Philosophy, Communication and Performing Arts. Roma Tre University, Rome (IT), September 27-28, 2023

Available from: 2024-01-25. Created: 2024-01-25. Last updated: 2024-01-26. Bibliographically approved.


Authority records

Johansson, Sverker
