* added Japanese (Hiragana, Katakana, Kanji)

* improved dictionary validation: it is now possible to have the same ideogram with two different transcriptions

* fixed frequency updating sometimes not working (this affected Chinese too)
Dimo Karaivanov 2025-04-12 11:59:13 +03:00 committed by GitHub
parent efa1fb4d79
commit 0ec912f9c9
GPG key ID: B5690EEEBB952194
33 changed files with 1603029 additions and 89 deletions

@@ -0,0 +1,25 @@
Japanese wordlists by: EDICT Project
Source: https://www.edrdg.org
Dictionaries used: JMDICT, ENAMDICT
Version: 2025-04-01
License: https://www.edrdg.org/edrdg/licence.html (Creative Commons Attribution-ShareAlike Licence V4.0)
Verb conjugations generated using: Japanese Verb Conjugator V2
Source: https://pypi.org/project/japanese-verb-conjugator-v2/
Version: 2025-01-13
Verb conjugations converted to Hiragana using: WanaKana-py
Source: https://github.com/Starwort/wanakana-py
Version: fa43884 (2019-07-13)
Japanese frequency list by: Wortschatz Leipzig @ Uni Leipzig
Source: https://wortschatz.uni-leipzig.de/en/download/
Version: 2025-04-04
License: CC-BY
Reference:
> D. Goldhahn, T. Eckart & U. Quasthoff: Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages.
> In: Proceedings of the 8th International Language Resources and Evaluation (LREC'12), 2012
> http://www.lrec-conf.org/proceedings/lrec2012/pdf/327_Paper.pdf
Additional remarks:
Hiragana and Katakana for the respective modes were added manually. All words converted to Romaji manually.