* Serbian language: dictionary + icons

* updated icon generation script
Dimo Karaivanov 2025-07-01 12:12:28 +03:00 committed by sspanak
parent d2e09195d6
commit b114370e91
11 changed files with 1502349 additions and 3 deletions

@ -0,0 +1,21 @@
Serbian word list by: Jovan Turanjanin
Version: aa2d0308ee633d22b4b663d90507bf2747a6399c (2022-06-29)
Source: https://github.com/turanjanin/spisak-srpskih-reci
License: (CC0 1.0 Universal) https://github.com/turanjanin/spisak-srpskih-reci/blob/master/LICENSE.md
Serbian word frequencies by: CC-100
Version: 2020
Source: https://data.statmt.org/cc-100/
References (PDF links are available in the source URL):
- Unsupervised Cross-lingual Representation Learning at Scale, Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 8440-8451, July 2020.
- CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data, Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave, Proceedings of the 12th Language Resources and Evaluation Conference (LREC), pp. 4003-4012, May 2020.
More Serbian word frequencies from The Leipzig Corpora Collection:
Source: https://wortschatz.uni-leipzig.de/en/download/Serbian
Version: Web/2016/1M, Wikipedia/2021/1M
License: https://creativecommons.org/licenses/by-nc/4.0/
Yet even more word frequencies obtained from LatinIME dictionaries:
Source: https://android.googlesource.com/platform/packages/inputmethods/LatinIME
Version: 66093bf509ea92fa31d796326d5f30a8d9582ffe (2023-12-21)
License: https://android.googlesource.com/platform/packages/inputmethods/LatinIME/+/refs/heads/main/NOTICE