tt9/docs/dictionaries/bgWordlistReadme.txt
2024-01-31 12:34:50 +02:00


Bulgarian wordlist 1 by Miglen Georgiev
Version: f46eff1 (2022-04-26)
Source: https://github.com/miglen/bulgarian-wordlists/blob/master/wordlists/bg-words-validated-cyrillic.txt
License: https://github.com/miglen/bulgarian-wordlists/blob/master/LICENSE
Bulgarian wordlist 2 by michmech
Version: 9c91fe4
Source: https://github.com/michmech/lemmatization-lists/blob/master/lemmatization-bg.txt
License: https://github.com/michmech/lemmatization-lists/blob/master/LICENCE
Bulgarian wordlist 3 by chitanka
Source: https://rechnik.chitanka.info/about
GitHub: https://github.com/chitanka/rechko
License: Not stated. The site only says "free download", so public domain is assumed.
Also used wooorm's Hunspell-compatible dictionary to determine which words must start with a capital letter.
Link: https://github.com/wooorm/dictionaries/tree/main/dictionaries/bg
Git commit: 13 Apr 2022 [0c78cc810c8aafb2e6f5140bb6dcd4026b247eb8]
Additionally, removed duplicate words and manually added some missing ones.
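
The merge, duplicate removal and capitalization steps described above could be sketched roughly as follows. This is only an illustration, not the script actually used to build the dictionary. The function names are hypothetical, and it assumes the Hunspell .dic file has an entry count on its first line and optional affix flags after a "/" on each entry. A word is capitalized only when the Hunspell dictionary lists it exclusively in capitalized form.

```python
def capitalized_only(dic_lines):
    """Map lowercase -> capitalized form for words that Hunspell lists
    only with an initial capital (hypothetical helper)."""
    lower = set()
    upper = {}
    for line in dic_lines[1:]:  # assumption: the first line is the entry count
        word = line.strip().split("/", 1)[0]  # drop affix flags after "/"
        if not word:
            continue
        if word[0].isupper():
            upper[word.lower()] = word
        else:
            lower.add(word)
    # Skip words that also appear in lowercase, since they are not always proper nouns.
    return {key: form for key, form in upper.items() if key not in lower}


def merge_wordlists(wordlists, capitalized):
    """Merge several word lists, drop case-insensitive duplicates, and
    capitalize the words found in the `capitalized` map."""
    seen = set()
    merged = []
    for words in wordlists:
        for raw in words:
            key = raw.strip().lower()
            if not key or key in seen:
                continue
            seen.add(key)
            merged.append(capitalized.get(key, key))
    return merged


# Tiny made-up sample, not taken from the real word lists.
dic = ["4", "София/A", "роза", "Роза", "Иван"]
caps = capitalized_only(dic)
result = merge_wordlists([["софия", "роза", "иван"], ["Роза", "дом", "софия"]], caps)
print(result)
```

The first occurrence of each word wins, so the order of the input lists decides which source a word is taken from.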
Word frequencies obtained from the "General" word frequency dictionary by the Department of Computational Linguistics of the Bulgarian Academy of Sciences.
Link: https://dcl.bas.bg/frequency.html
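
As a rough illustration of how frequencies might be attached to the merged word list, here is a minimal sketch. It assumes the frequency data has been exported to "word<TAB>count" lines, which is a hypothetical format and not necessarily the one published at the link above. The function name and the default frequency of 0 for missing words are assumptions as well.

```python
def attach_frequencies(words, freq_lines):
    """Pair each word with its frequency; words without an entry get 0.
    Assumes "word<TAB>count" lines (hypothetical format)."""
    freqs = {}
    for line in freq_lines:
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed format
        word = parts[0].strip().lower()
        count = parts[1].strip()
        if count.isdigit():
            # Keep the highest count if a word appears more than once.
            freqs[word] = max(freqs.get(word, 0), int(count))
    # Lookup is case-insensitive so capitalized words still match.
    return [(word, freqs.get(word.lower(), 0)) for word in words]


# Made-up sample values for illustration only.
sample = ["дом\t120", "софия\t45", "bad line", "роза\tx"]
pairs = attach_frequencies(["София", "дом", "роза"], sample)
print(pairs)
```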