Estonian (#740)
parent 67dd940376
commit cad27907c7
4 changed files with 1260700 additions and 0 deletions
12
docs/dictionaries/etWordlistReadme.txt
Normal file
@@ -0,0 +1,12 @@
Estonian word list by: Ekilex
Version: January 2025
Source: https://ekilex.ee/
License: (Creative Commons BY 4.0) https://creativecommons.org/licenses/by/4.0/deed.en
Estonian word list and frequencies by: CC-100
Version: 2020
Source: https://data.statmt.org/cc-100/
References (PDF links are available in the source URL):
- Unsupervised Cross-lingual Representation Learning at Scale, Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), p. 8440-8451, July 2020.
- CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data, Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave, Proceedings of the 12th Language Resources and Evaluation Conference (LREC), p. 4003-4012, May 2020.
Remark: Only the words that appear at least 3 times were used.