Gujarati
parent 240e5c444a
commit e3d0bac90f
13 changed files with 1380245 additions and 18 deletions
16
docs/dictionaries/guWordlistReadme.txt
Normal file
@@ -0,0 +1,16 @@
Gujarati word list 1 from Stardict, adapted by Docbroke
Source: https://github.com/sspanak/tt9/issues/577#issuecomment-2515314462
License: Public Domain; permission to use in the link

Conjunct consonants list obtained from Wikipedia
Version: 2024-12-30
Sources: https://en.wikipedia.org/wiki/Gujarati_script
License: Creative Commons Attribution-ShareAlike 4.0 License

Gujarati word list and frequencies by: CC-100
Version: 2020
Source: https://data.statmt.org/cc-100/
References (PDF links are available in the source URL):
- Unsupervised Cross-lingual Representation Learning at Scale, Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), p. 8440-8451, July 2020.
- CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data, Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave, Proceedings of the 12th Language Resources and Evaluation Conference (LREC), p. 4003-4012, May 2020.
Remark: Used all words that appear at least twice, and the words that appear once and are shorter than 10 characters.
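
The selection rule in the Remark could be sketched roughly as follows. This is only an illustration, not the script actually used for this commit: the function name `select_words` and the parameter `max_singleton_len` are hypothetical, and "characters" is assumed to mean Unicode code points (what Python's `len()` counts), which for Gujarati may differ from the number of visible letters.

```python
from collections import Counter


def select_words(counts, max_singleton_len=10):
    """Keep words seen at least twice, plus words seen once
    that are shorter than max_singleton_len code points.

    Hypothetical helper; the length measure is an assumption.
    """
    return {
        word: n
        for word, n in counts.items()
        if n >= 2 or (n == 1 and len(word) < max_singleton_len)
    }


# Toy input: one repeated word, one short singleton, one long singleton.
tokens = ["ગુજરાત", "ગુજરાત", "ભાષા", "abcdefghijkl"]
counts = Counter(tokens)
selected = select_words(counts)
```

With this toy input, the repeated word and the short singleton are kept, and the 12-character singleton is dropped. If the original filter counted grapheme clusters rather than code points, the boundary would fall differently for words with combining vowel signs.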