tt9/docs/dictionaries/bgWordlistReadme.txt
Dimo Karaivanov 0aa934cebd
Bulgarian update (#268)
* fixed Bulgarian layout: moved 'ь' to 8-key 

* added a migration for removing all Bulgarian words, since the digit sequences are no longer compatible with the new layout

* fixed incorrect text case of some words

* removed some nonsense words

* added new Bulgarian words
2023-07-13 14:33:54 +03:00


Bulgarian wordlist 1 by Miglen Georgiev
Version: f46eff1 (2022-04-26)
Source: https://github.com/miglen/bulgarian-wordlists/blob/master/wordlists/bg-words-validated-cyrillic.txt
License: https://github.com/miglen/bulgarian-wordlists/blob/master/LICENSE
Bulgarian wordlist 2 by michmech
Version: 9c91fe4
Source: https://github.com/michmech/lemmatization-lists/blob/master/lemmatization-bg.txt
License: https://github.com/michmech/lemmatization-lists/blob/master/LICENCE
Also used wooorm's Hunspell-compatible dictionary to determine which words must start with a capital letter.
Link: https://github.com/wooorm/dictionaries/tree/main/dictionaries/bg
Git commit: 13 Apr 2022 [0c78cc810c8aafb2e6f5140bb6dcd4026b247eb8]
Additionally, removed duplicate words and manually added some missing ones.
Word frequencies were obtained from the "General" word frequency dictionary by the Department of Computational Linguistics of the Bulgarian Academy of Sciences.
Link: https://dcl.bas.bg/frequency.html