* fixed Bulgarian layout: moved 'ь' to 8-key
* added a migration for removing all Bulgarian words, since the digit sequences are no longer compatible with the new layout
* fixed incorrect text case of some words
* removed some nonsense words
* added new Bulgarian words
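To illustrate why moving a single letter invalidates stored digit sequences, here is a minimal sketch. The key tables `OLD_LAYOUT` and `NEW_LAYOUT` and the helper `digit_sequence` are hypothetical and chosen only for illustration; they are not taken from the project's actual layout definition. The only detail taken from the change notes is that 'ь' now sits on the 8-key.

```python
# Hypothetical key layouts, for illustration only (not the project's real tables).
# The only difference between them: 'ь' moves from the 9-key to the 8-key.
OLD_LAYOUT = {
    "2": "абвг", "3": "дежз", "4": "ийкл", "5": "мноп",
    "6": "рсту", "7": "фхцч", "8": "шщъ", "9": "ьюя",
}
NEW_LAYOUT = {**OLD_LAYOUT, "8": "шщъь", "9": "юя"}


def digit_sequence(word, layout):
    """Map each letter of `word` to the digit key that carries it."""
    letter_to_digit = {ch: digit for digit, letters in layout.items() for ch in letters}
    return "".join(letter_to_digit[ch] for ch in word.lower())


# A word containing 'ь' gets a different sequence under the new layout,
# so a sequence stored under the old layout would no longer find it.
old_seq = digit_sequence("синьо", OLD_LAYOUT)
new_seq = digit_sequence("синьо", NEW_LAYOUT)
print(old_seq, new_seq)

# A word without 'ь' is unaffected by the change.
print(digit_sequence("град", OLD_LAYOUT) == digit_sequence("град", NEW_LAYOUT))
```

Under these assumptions only words containing 'ь' change, but a dictionary keyed by precomputed sequences cannot tell which entries are stale without recomputing them, which is consistent with the choice to remove all Bulgarian words and reload them.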
Bulgarian wordlist 1 by Miglen Georgiev
Version: f46eff1 (2022-04-26)
Source: https://github.com/miglen/bulgarian-wordlists/blob/master/wordlists/bg-words-validated-cyrillic.txt
License: https://github.com/miglen/bulgarian-wordlists/blob/master/LICENSE

Bulgarian wordlist 2 by michmech
Version: 9c91fe4
Source: https://github.com/michmech/lemmatization-lists/blob/master/lemmatization-bg.txt
License: https://github.com/michmech/lemmatization-lists/blob/master/LICENCE

Also used wooorm's Hunspell-compatible dictionary to determine which words need to start with a capital letter.
Link: https://github.com/wooorm/dictionaries/tree/main/dictionaries/bg
Git commit: 13 Apr 2022 [0c78cc810c8aafb2e6f5140bb6dcd4026b247eb8]

Additionally, removed duplicate words and added some missing ones manually.

Word frequencies obtained from the "General" word frequency dictionary by the Department of Computational Linguistics of the Bulgarian Academy of Sciences.
Link: https://dcl.bas.bg/frequency.html