
Dictionaries update (#80)

* added missing words to the Bulgarian dictionary

* English dictionary update

* removed repeating words from the Italian and Bulgarian dictionaries

* fixed incorrectly broken words and removed repeating ones from the Ukrainian dictionary

* Russian dictionary update

* documentation update

* made it possible to type words with apostrophes (Dutch, English and Ukrainian)
Dimo Karaivanov 2022-10-24 13:32:31 +03:00 committed by GitHub
parent 6c19edc8a3
commit 8b67929a07
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
23 changed files with 187613 additions and 57933 deletions


@ -50,7 +50,9 @@ To support a new language one needs to:
  - The text must be white and the background must be transparent as per the [official Android guide](https://android-doc.github.io/guide/practices/ui_guidelines/icon_design_status_bar.html).
  - To simplify the process, you could use Android Studio. It has a built-in icon generator accessible by right-clicking on the "drawable" folder -> New -> Image Asset. Then choose "Icon Type": "Notification Icons", "Asset Type": Text, "Trim": No, "Padding": 0%.
- Find a suitable dictionary and add it to the `assets` folder.
- Create a new language class in `languages/definitions/`. Make sure to set all properties.
  - `ID` must be the next available number. Currently, the range is limited to 1 through 31, so there can be 31 languages in total.
  - Set `isPunctuationPartOfWords` to `true` if words in the language can contain punctuation typed with the 1-key, such as `it's`, `a'tje` or `п'ят`. Otherwise, it is not possible to type such words, and they do not appear as suggestions. `false` is recommended when apostrophes or other punctuation are not part of the words, because it allows faster typing.
- Add the new language to the list in `LanguageCollection.java`. You only need to add it in one place, in the constructor. Please be nice and maintain the alphabetical order.
- Add a new entry in `res/values/const.xml`. Make sure the new ID matches the one in the language class.
- Add new entries in `res/values/arrays.xml`.
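
The two language-class properties described above can be sanity-checked with a small script before a new language is added. This is only a sketch: `next_language_id` and `needs_punctuation_in_words` are hypothetical helpers that do not exist in the project, the 1 to 31 range is taken from the text above, and "next available" is assumed to mean one more than the highest ID already in use.

```python
# Hypothetical helpers for illustration; they are not part of the project.
MIN_ID, MAX_ID = 1, 31  # ID range stated in the guide above


def next_language_id(used_ids):
    """Return the ID following the highest one in use (assumed meaning of 'next available')."""
    candidate = max(used_ids, default=MIN_ID - 1) + 1
    if candidate > MAX_ID:
        raise ValueError("all language IDs between 1 and 31 are taken")
    return candidate


def needs_punctuation_in_words(words, punctuation="'’"):
    """Suggest a value for isPunctuationPartOfWords by scanning a word list."""
    return any(mark in word for word in words for mark in punctuation)
```

If `needs_punctuation_in_words` returns `True` for a dictionary, setting `isPunctuationPartOfWords` to `true` is probably the right choice; otherwise `false` keeps typing faster, as noted above.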


@ -44,6 +44,7 @@
ок ок
ос ос
от от
оф
ох ох
па па
пи пи
@ -84,8 +85,8 @@
яз яз
ял ял
ям ям
де
аба аба
абе
аби аби
абу абу
ага ага
@ -97,6 +98,7 @@
але але
ало ало
алт алт
алф
ама ама
ами ами
ана ана
@ -440,6 +442,7 @@
мри мри
мря мря
мсе мсе
мхм
мъж мъж
мър мър
мъх мъх
@ -650,8 +653,9 @@
тих тих
тиф тиф
тия тия
ток тоз
той той
ток
том том
тон тон
топ топ
@ -1005,6 +1009,7 @@
блей блей
блея блея
блок блок
блус
блъф блъф
блян блян
боаз боаз
@ -2291,6 +2296,7 @@
маят маят
мвае мвае
мваи мваи
мега
меди меди
мезе мезе
мека мека
@ -3132,6 +3138,8 @@
сена сена
сено сено
сент сент
сера
сере
сери сери
серт серт
сета сета
@ -3662,6 +3670,7 @@
хари хари
харч харч
хасе хасе
хаха
хвощ хвощ
хека хека
херц херц
@ -4027,7 +4036,6 @@
ячуа ячуа
ящен ящен
току-що току-що
ли
току-виж току-виж
току-тъй току-тъй
горе-долу горе-долу
@ -6167,6 +6175,7 @@
дюйма дюйма
дюкян дюкян
дюлев дюлев
дюнер
дюшек дюшек
дявол дявол
дявам дявам
@ -8301,6 +8310,7 @@
метра метра
метро метро
метри метри
метъл
метър метър
метял метял
мехме мехме
@ -10336,6 +10346,8 @@
първа първа
първи първи
първо първо
пърди
пърдя
пържа пържа
пържи пържи
пърли пърли
@ -10993,7 +11005,10 @@
сепия сепия
сепна сепна
сепне сепне
серат
серем
серен серен
сереш
серив серив
серии серии
серия серия
@ -12095,6 +12110,7 @@
тупне тупне
тупти тупти
туптя туптя
турбо
турен турен
турел турел
турим турим
@ -17334,6 +17350,8 @@
дюлите дюлите
дюлята дюлята
дюната дюната
дюнера
дюнери
дюните дюните
дюшеме дюшеме
дюшека дюшека
@ -18178,6 +18196,7 @@
зарека зарека
зареша зареша
зарече зарече
зариби
зарибя зарибя
зарива зарива
зариеш зариеш
@ -18382,6 +18401,7 @@
звънът звънът
звънял звънял
звънят звънят
звънях
звярът звярът
здание здание
здания здания
@ -23138,6 +23158,7 @@
обточа обточа
обтяга обтяга
обувай обувай
обувал
обуват обуват
обувка обувка
обувам обувам
@ -25889,6 +25910,7 @@
пруски пруски
пруско пруско
пръдла пръдла
пръдна
пръдни пръдни
пръдня пръдня
пръжка пръжка
@ -26055,6 +26077,9 @@
първак първак
първия първия
пъргав пъргав
пърдим
пърдиш
пърдят
пържат пържат
пържел пържел
пържен пържен
@ -27158,6 +27183,7 @@
сергей сергей
сергии сергии
сергия сергия
серете
сериен сериен
серист серист
серите серите
@ -27738,6 +27764,7 @@
смукач смукач
смукна смукна
смутен смутен
смутих
смутни смутни
смутня смутня
смучат смучат
@ -29941,6 +29968,7 @@
усещах усещах
усилен усилен
усилие усилие
усилих
усилва усилва
усещаш усещаш
усилил усилил
@ -36767,6 +36795,7 @@
дюдюкам дюдюкам
дюкянче дюкянче
дюкянът дюкянът
дюнерът
дюлгери дюлгери
дяволит дяволит
дяволии дяволии
@ -38642,6 +38671,7 @@
звънчев звънчев
звъняла звъняла
звъняло звъняло
звъняха
звъняща звъняща
звънящи звънящи
здравей здравей
@ -44209,6 +44239,8 @@
накацат накацат
наквася наквася
накваси накваси
накефен
накефил
накисна накисна
накипря накипря
накисва накисва
@ -50893,6 +50925,7 @@
присвия присвия
присвои присвои
присвоя присвоя
присети
присипя присипя
присипи присипи
прислон прислон
@ -51275,6 +51308,7 @@
прусаци прусаци
пруския пруския
пръдльо пръдльо
пръднах
пръкват пръкват
пръсвам пръсвам
пръскам пръскам
@ -51505,6 +51539,7 @@
пъргава пъргава
пъргави пъргави
пъргаво пъргаво
пърдите
пържели пържели
пържена пържена
пържене пържене
@ -54103,6 +54138,7 @@
смутено смутено
смутили смутили
смутило смутило
смутиха
смутове смутове
смучене смучене
смучещи смучещи
@ -55645,6 +55681,7 @@
сюртука сюртука
сюрприз сюрприз
сядайки сядайки
сядайте
сяклата сяклата
сяклото сяклото
сякохте сякохте
@ -57216,6 +57253,7 @@
усилила усилила
усилили усилили
усилило усилило
усилиха
усилния усилния
ускорен ускорен
ускорим ускорим
@ -65170,6 +65208,7 @@
дюлгеров дюлгеров
дюлевото дюлевото
дюлевата дюлевата
дюнерите
дюшемето дюшемето
дюшеклък дюшеклък
дюшеците дюшеците
@ -67829,6 +67868,8 @@
звънчеви звънчеви
звънчето звънчето
звънчета звънчета
звъняхме
звъняхте
зданието зданието
зданията зданията
здравата здравата
@ -74384,6 +74425,12 @@
накацане накацане
наквасям наквасям
накачуля накачуля
накефена
накефени
накефено
накефила
накефило
накефили
накисвам накисвам
накипели накипели
накипрям накипрям
@ -79725,6 +79772,7 @@
петролно петролно
петролът петролът
петромир петромир
петрохан
петрунов петрунов
петрушев петрушев
петстаен петстаен
@ -84110,6 +84158,8 @@
присвоят присвоят
присегна присегна
приседна приседна
присетил
присетих
прислони прислони
прислоня прислоня
прислуга прислуга
@ -84897,6 +84947,7 @@
пруските пруските
пруският пруският
пруското пруското
пръднаха
пръднята пръднята
пръжките пръжките
пръкване пръкване
@ -88121,6 +88172,8 @@
смукване смукване
смутения смутения
смутител смутител
смутихме
смутихте
смутната смутната
смутните смутните
смутното смутното
@ -91605,6 +91658,8 @@
усилващо усилващо
усиления усиления
усилието усилието
усилихме
усилихте
усилията усилията
усилната усилната
усилните усилните
@ -99610,6 +99665,7 @@
еднородно еднородно
едноръкия едноръкия
едноселец едноселец
еднослоен
едностаен едностаен
еднотипен еднотипен
еднотипна еднотипна
@ -101711,6 +101767,7 @@
зарибявам зарибявам
зарибяван зарибяван
зарибяват зарибяват
зарибяващ
заридавам заридавам
заричания заричания
заробване заробване
@ -109152,6 +109209,8 @@
наквасвам наквасвам
наквасено наквасено
наквасяне наквасяне
накефиния
накефилия
накипряне накипряне
накипявам накипявам
накисване накисване
@ -120438,6 +120497,10 @@
присвоили присвоили
присвоиха присвоиха
присвоява присвоява
присетила
присетили
присетило
присетиха
присипвам присипвам
присламча присламча
присламчи присламчи
@ -121514,6 +121577,8 @@
пружинира пружинира
пружините пружините
прусаците прусаците
пръднахме
пръднахте
пръждосам пръждосам
пръскалка пръскалка
пръскания пръскания
@ -124593,7 +124658,9 @@
сканиращи сканиращи
сканиращо сканиращо
скапалата скапалата
скапаната
скапаните скапаните
скапаният
скапаното скапаното
скапулите скапулите
скараните скараните
@ -134829,6 +134896,7 @@
доносничка доносничка
дооглаждам дооглаждам
дооздравея дооздравея
дооправяне
дооформена дооформена
дооформени дооформени
дооформяне дооформяне
@ -135391,6 +135459,9 @@
еднорелсов еднорелсов
еднородния еднородния
едносложен едносложен
еднослойна
еднослойни
еднослойно
едносменен едносменен
едносменно едносменно
едносричен едносричен
@ -137062,6 +137133,10 @@
зарежещото зарежещото
зарибяване зарибяване
зарибявани зарибявани
зарибяваме
зарибявате
зарибяваща
зарибяващи
зарибяващо зарибяващо
заридаване заридаване
заробените заробените
@ -143285,6 +143360,14 @@
накачилите накачилите
накачулвам накачулвам
наквасване наквасване
накефената
накефените
накефеният
некефеното
накефилата
накефилите
накефилият
накефилото
накипяване накипяване
накирливям накирливям
накичената накичената
@ -153937,6 +154020,8 @@
присвояващ присвояващ
приседнала приседнала
приседнали приседнали
присетихме
присетихте
присипване присипване
прискърбен прискърбен
прискърбие прискърбие
@ -166423,6 +166508,7 @@
дообработва дообработва
дообяснения дообяснения
дообяснявам дообяснявам
дооправяйки
дооформявам дооформявам
дооценяване дооценяване
допечатване допечатване
@ -166833,6 +166919,7 @@
едносеменен едносеменен
едносеменна едносеменна
едносеменно едносеменно
еднослойния
едносмислен едносмислен
едносричния едносричния
едностайния едностайния
@ -168236,6 +168323,7 @@
зарежданите зарежданите
зареждащата зареждащата
зареждащото зареждащото
зарибяващия
зарзаватчия зарзаватчия
заробването заробването
зародишната зародишната
@ -191054,6 +191142,7 @@
дообясняваме дообясняваме
дообясняване дообясняване
дооздравявам дооздравявам
дооправянето
допирателния допирателния
допитванията допитванията
допринасяйки допринасяйки
@ -191368,6 +191457,10 @@
едноседмична едноседмична
едноседмични едноседмични
едноседмично едноседмично
еднослойната
еднослойните
еднослойният
еднослойното
едносмислено едносмислено
едносричните едносричните
едностайната едностайната
@ -192104,6 +192197,10 @@
зарежданията зарежданията
зарибяването зарибяването
зарибяваните зарибяваните
зарибяващата
зарибяващите
зарибяващият
зарибяващото
заробителите заробителите
зарозовяване зарозовяване
заруменяване заруменяване

File diff suppressed because it is too large


@ -4670,7 +4670,6 @@ acutezza
acuti acuti
acutissimo acutissimo
acuto acuto
ad
adagerai adagerai
adageranno adageranno
adagerebbe adagerebbe
@ -7568,7 +7567,6 @@ aguzzò
agì agì
ahi ahi
ahimè ahimè
ai
aitante aitante
aitanti aitanti
aiuola aiuola
@ -7634,7 +7632,6 @@ aizzasse
aizzata aizzata
aizzatori aizzatori
aizzava aizzava
al
ala ala
alabardieri alabardieri
alabastro alabastro
@ -16070,9 +16067,7 @@ beffarlo
beffato beffato
beffatore beffatore
beffe beffe
beh
bei bei
bel
belano belano
belare belare
belati belati
@ -16100,7 +16095,6 @@ beltà
belva belva
belve belve
bemolle bemolle
ben
benché benché
benda benda
bendai bendai
@ -16467,7 +16461,6 @@ birra
birre birre
birreria birreria
birrerie birrerie
bis
bisacce bisacce
bisaccia bisaccia
bisava bisava
@ -16626,7 +16619,6 @@ blocchiate
blocchino blocchino
blocco blocco
bloccò bloccò
blu
bobina bobina
bocca bocca
boccacce boccacce
@ -17631,7 +17623,6 @@ buttino
butto butto
buttò buttò
buzzurri buzzurri
c
cabala cabala
cabalistiche cabalistiche
cabina cabina
@ -19331,7 +19322,6 @@ cavò
cazzata cazzata
cazzo cazzo
cazzotti cazzotti
ce
cecino cecino
cecità cecità
ceco ceco
@ -20152,7 +20142,6 @@ chiuso
chiusura chiusura
chiusure chiusure
ché ché
ci
ciabattino ciabattino
cialda cialda
cialde cialde
@ -20800,7 +20789,6 @@ coinvolgimento
coinvolgono coinvolgono
coinvolse coinvolse
coinvolti coinvolti
col
cola cola
colai colai
colammo colammo
@ -21222,7 +21210,6 @@ coltura
colui colui
colà colà
colò colò
com
comanda comanda
comandai comandai
comandamenti comandamenti
@ -22413,7 +22400,6 @@ comunisti
comunitari comunitari
comunità comunità
comunque comunque
con
conca conca
concatena concatena
concatenai concatenai
@ -25871,7 +25857,6 @@ corto
corvaccio corvaccio
corvi corvi
corvo corvo
cos
cosa cosa
cosce cosce
coscia coscia
@ -27181,8 +27166,6 @@ custodivi
custodivo custodivo
custodì custodì
cute cute
d
da
dabbene dabbene
daccapo daccapo
dacceli dacceli
@ -27196,7 +27179,6 @@ daglielo
dai dai
daini daini
daino daino
dal
dall dall
dalla dalla
dalle dalle
@ -27211,7 +27193,6 @@ dammele
dammelo dammelo
dammene dammene
dammi dammi
dan
danaro danaro
danarose danarose
dando dando
@ -27373,7 +27354,6 @@ dappocaggine
dappoco dappoco
dappresso dappresso
dapprima dapprima
dar
darai darai
daranno daranno
darcelo darcelo
@ -28319,13 +28299,11 @@ degradiate
degradino degradino
degrado degrado
degradò degradò
deh
dei dei
deiezione deiezione
deificare deificare
deificati deificati
deità deità
del
delato delato
delatore delatore
delatori delatori
@ -29838,7 +29816,6 @@ deturpiate
deturpino deturpino
deturpo deturpo
deturpò deturpò
dev
devasta devasta
devastai devastai
devastammo devastammo
@ -29906,7 +29883,6 @@ devote
devoti devoti
devoto devoto
devozione devozione
di
dia dia
diabete diabete
diabolica diabolica
@ -31280,7 +31256,6 @@ diplomazie
diplomi diplomi
dipolo dipolo
diporto diporto
dir
dirada dirada
diradai diradai
diradammo diradammo
@ -33482,7 +33457,6 @@ divulgò
dizionari dizionari
dizionario dizionario
dizione dizione
do
dobbiamo dobbiamo
docce docce
doccia doccia
@ -33699,7 +33673,6 @@ domino
dominò dominò
domo domo
domò domò
don
dona dona
donaci donaci
donai donai
@ -34012,7 +33985,6 @@ dottrinale
dottrinalmente dottrinalmente
dottrine dottrine
dotò dotò
dov
dove dove
dovendo dovendo
dovendosi dovendosi
@ -34500,7 +34472,6 @@ economizza
economizzi economizzi
economizzo economizzo
ecumenica ecumenica
ed
edera edera
edere edere
edicola edicola
@ -34744,10 +34715,8 @@ eguagliò
eguale eguale
eguali eguali
egualmente egualmente
eh
ehi ehi
ehm ehm
ei
elabora elabora
elaborai elaborai
elaborammo elaborammo
@ -37304,7 +37273,6 @@ evolverà
evviva evviva
extraterrestri extraterrestri
eziandio eziandio
fa
fabbri fabbri
fabbrica fabbrica
fabbricai fabbricai
@ -37585,7 +37553,6 @@ famosissima
famosissimi famosissimi
famosissimo famosissimo
famoso famoso
fan
fanale fanale
fanali fanali
fanatica fanatica
@ -37629,7 +37596,6 @@ fantine
fantini fantini
fantino fantino
fantocciata fantocciata
far
fara fara
farabutti farabutti
farai farai
@ -38564,7 +38530,6 @@ filtri
filtro filtro
filza filza
filò filò
fin
fina fina
finale finale
finali finali
@ -39553,7 +39518,6 @@ fotografo
fotogramma fotogramma
fotolitografica fotolitografica
fottuto fottuto
fra
fracassa fracassa
fracassai fracassai
fracassammo fracassammo
@ -40228,7 +40192,6 @@ fruttiere
fruttino fruttino
frutto frutto
fruttò fruttò
fu
fucila fucila
fucilai fucilai
fucilammo fucilammo
@ -40787,7 +40750,6 @@ garze
garzoncello garzoncello
garzone garzone
garzoni garzoni
gas
gasati gasati
gassose gassose
gatta gatta
@ -41679,7 +41641,6 @@ glaciali
gladiatore gladiatore
gladiatori gladiatori
gleba gleba
gli
gliceridi gliceridi
glicerina glicerina
gliel gliel
@ -42506,7 +42467,6 @@ grossolano
grotta grotta
grotte grotte
groviera groviera
gru
grucce grucce
gruccia gruccia
grugnire grugnire
@ -42951,14 +42911,11 @@ gustose
gustosi gustosi
gustoso gustoso
gustò gustò
ha
hacker hacker
hai hai
hamiltoniana hamiltoniana
han
hanno hanno
hardware hardware
ho
iattura iattura
ibrida ibrida
ibridi ibridi
@ -43173,7 +43130,6 @@ ignuda
ignude ignude
ignudi ignudi
igroscopica igroscopica
il
ilari ilari
ilarità ilarità
illanguidire illanguidire
@ -45026,7 +44982,6 @@ imputino
imputo imputo
imputridiscono imputridiscono
imputò imputò
in
inabile inabile
inabili inabili
inabilità inabilità
@ -50363,7 +50318,6 @@ inzuppiate
inzuppino inzuppino
inzuppo inzuppo
inzuppò inzuppò
io
iodio iodio
ione ione
ionizzato ionizzato
@ -50911,8 +50865,6 @@ itinerario
itterizia itterizia
ivi ivi
kg kg
l
la
labbra labbra
labbreggiava labbreggiava
labbro labbro
@ -51612,7 +51564,6 @@ lavorò
lavò lavò
lazzaretto lazzaretto
lazzi lazzi
le
leale leale
leali leali
lealtà lealtà
@ -52101,7 +52052,6 @@ lezioni
leziosaggine leziosaggine
lezioso lezioso
lezzo lezzo
li
liana liana
libazioni libazioni
libbra libbra
@ -52606,7 +52556,6 @@ livornese
livornesi livornesi
livrea livrea
livree livree
lo
lobbia lobbia
locale locale
locali locali
@ -53041,8 +52990,6 @@ luttuose
luttuosissima luttuosissima
m
ma
maccherone maccherone
maccheroni maccheroni
macchia macchia
@ -53222,7 +53169,6 @@ mais
maiuscola maiuscola
maiuscole maiuscole
maiuscolo maiuscolo
mal
mala mala
malagevole malagevole
malagevoli malagevoli
@ -54270,7 +54216,6 @@ mazzetti
mazzetto mazzetto
mazzi mazzi
mazzo mazzo
me
meccanica meccanica
meccanicamente meccanicamente
meccaniche meccaniche
@ -54437,7 +54382,6 @@ memorizziate
memorizzino memorizzino
memorizzo memorizzo
memorizzò memorizzò
men
mena mena
menadito menadito
menai menai
@ -54909,7 +54853,6 @@ mezzo
mezzodì mezzodì
mezzogiorno mezzogiorno
mezzosangue mezzosangue
mi
mia mia
miagola miagola
miagolai miagolai
@ -56528,7 +56471,6 @@ mutualmente
mutuamente mutuamente
mutui mutui
mutò mutò
n
nacque nacque
nacquero nacquero
nacqui nacqui
@ -56821,7 +56763,6 @@ nazionalizzate
nazionalizzazione nazionalizzazione
nazione nazione
nazioni nazioni
ne
neanchio neanchio
neanche neanche
nebbia nebbia
@ -56959,7 +56900,6 @@ negri
negro negro
negò negò
nei nei
nel
nell nell
nella nella
nelle nelle
@ -57089,7 +57029,6 @@ nipote
nipoti nipoti
nitrati nitrati
nitrato nitrato
no
nobile nobile
nobili nobili
nobiliare nobiliare
@ -57264,7 +57203,6 @@ nominiate
nominino nominino
nomino nomino
nominò nominò
non
nona nona
nonché nonché
noncurante noncurante
@ -58053,7 +57991,6 @@ oculati
oculato oculato
oculista oculista
oculisti oculisti
od
ode ode
odi odi
odia odia
@ -60412,7 +60349,6 @@ pappagalli
pappagallo pappagallo
pappagorgia pappagorgia
pappe pappe
par
para para
parabola parabola
parabole parabole
@ -61853,7 +61789,6 @@ penò
pepe pepe
peperone peperone
peperoni peperoni
per
pera pera
peraltro peraltro
perbacco perbacco
@ -62811,7 +62746,6 @@ pezzettini
pezzetto pezzetto
pezzi pezzi
pezzo pezzo
pi
pia pia
piaccia piaccia
piacciamo piacciamo
@ -66165,7 +66099,6 @@ privilegiati
privilegiato privilegiato
privilegio privilegio
privo privo
pro
probabile probabile
probabili probabili
probabilistici probabilistici
@ -68489,7 +68422,6 @@ pupulliate
pupullino pupullino
pupullo pupullo
pupullò pupullò
pur
pura pura
puramente puramente
purché purché
@ -70851,7 +70783,6 @@ razzoliate
razzolino razzolino
razzolo razzolo
razzolò razzolò
re
rea rea
reagente reagente
reagenti reagenti
@ -78790,8 +78721,6 @@ ruzzoliate
ruzzolino ruzzolino
ruzzolo ruzzolo
ruzzolò ruzzolò
s
sa
sabatico sabatico
sabato sabato
sabbia sabbia
@ -79322,7 +79251,6 @@ salvo
salvò salvò
salì salì
salò salò
san
sana sana
sanai sanai
sanammo sanammo
@ -84367,7 +84295,6 @@ sdrucciolare
sdrucciolava sdrucciolava
sdrucciolò sdrucciolò
sdrucito sdrucito
se
sebbene sebbene
secca secca
seccaggine seccaggine
@ -87379,7 +87306,6 @@ sguinzaglino
sguinzaglio sguinzaglio
sguinzagliò sguinzagliò
sgusciavano sgusciavano
si
sia sia
siamo siamo
siano siano
@ -87806,7 +87732,6 @@ simultanee
simultanei simultanei
simultaneo simultaneo
simulò simulò
sin
sinagoga sinagoga
sinagoghe sinagoghe
sincera sincera
@ -89128,7 +89053,6 @@ snodiate
snodino snodino
snodo snodo
snodò snodò
so
soave soave
soavemente soavemente
soavi soavi
@ -89601,7 +89525,6 @@ sognino
sogno sogno
sognò sognò
soia soia
sol
sola sola
solai solai
solaio solaio
@ -89980,7 +89903,6 @@ sommo
sommossa sommossa
sommosse sommosse
sommò sommò
son
sonagli sonagli
sonaglio sonaglio
sonanti sonanti
@ -91031,7 +90953,6 @@ sotto
sottoargomenti sottoargomenti
sottocapitoli sottocapitoli
sottocapitolo sottocapitolo
sottocchio
sottocoppa sottocoppa
sottocosto sottocosto
sottocutanei sottocutanei
@ -97608,11 +97529,9 @@ stuzzichiate
stuzzichino stuzzichino
stuzzico stuzzico
stuzzicò stuzzicò
su
sua sua
suadente suadente
suadenti suadenti
sub
subacquea subacquea
subacquee subacquee
subacquei subacquei
@ -97832,7 +97751,6 @@ succulenti
succulento succulento
succursale succursale
succursali succursali
sud
suda suda
sudai sudai
sudammo sudammo
@ -98116,7 +98034,6 @@ suicidio
suindicato suindicato
suini suini
suino suino
sul
sulfurea sulfurea
sulfuree sulfuree
sulfurei sulfurei
@ -99634,7 +99551,6 @@ svuoto
svuotò svuotò
t
tabaccai tabaccai
tabaccaio tabaccaio
tabacchi tabacchi
@ -99830,7 +99746,6 @@ tagliuzzino
tagliuzzo tagliuzzo
tagliuzzò tagliuzzò
tagliò tagliò
tal
talamo talamo
talari talari
talché talché
@ -100395,7 +100310,6 @@ tavolozza
tavolozze tavolozze
tazza tazza
tazze tazze
te
teatrale teatrale
teatrali teatrali
teatri teatri
@ -101470,7 +101384,6 @@ tetto
tettoia tettoia
tettoie tettoie
tettuccio tettuccio
ti
tibia tibia
tibie tibie
ticchettii ticchettii
@ -102532,7 +102445,6 @@ tozza
tozze tozze
tozzi tozzi
tozzo tozzo
tra
traballa traballa
traballai traballai
traballammo traballammo
@ -104841,7 +104753,6 @@ travolti
travolto travolto
trazione trazione
trazioni trazioni
tre
trebbi trebbi
trebbia trebbia
trebbiai trebbiai
@ -105797,7 +105708,6 @@ truffo
truffò truffò
truppa truppa
truppe truppe
tu
tua tua
tuba tuba
tubai tubai
@ -106199,7 +106109,6 @@ tuttavia
tutte tutte
tutti tutti
tutto tutto
tuttora
ubbidiente ubbidiente
ubbidienti ubbidienti
@ -106630,7 +106539,6 @@ umori
umorismi umorismi
umorismo umorismo
umoristico umoristico
un
una una
unanime unanime
unanimi unanimi
@ -107251,8 +107159,6 @@ uve
uxoricida uxoricida
uxoricidi uxoricidi
uxoricidio uxoricidio
v
va
vacante vacante
vacanti vacanti
vacanza vacanza
@ -107535,7 +107441,6 @@ vagò
vai vai
vaioli vaioli
vaiolo vaiolo
val
valanga valanga
valanghe valanghe
vale vale
@ -107545,7 +107450,6 @@ valente
valenti valenti
valentissimo valentissimo
valentuomini valentuomini
valentuomo
valenza valenza
valere valere
valersi valersi
@ -107696,7 +107600,6 @@ vammi
vampa vampa
vampiri vampiri
vampiro vampiro
van
vana vana
vanagloria vanagloria
vanamente vanamente
@ -107997,7 +107900,6 @@ vaticano
vaticinare vaticinare
vattene vattene
vatti vatti
ve
vecchi vecchi
vecchia vecchia
vecchiacci vecchiacci
@ -108931,7 +108833,6 @@ vezzosa
vezzose vezzose
vezzosi vezzosi
vezzoso vezzoso
vi
via via
viadotti viadotti
viadotto viadotto
@ -110178,7 +110079,6 @@ xenofobia
xilofono xilofono
zabaione zabaione
zabaioni zabaioni
zac
zacchere zacchere
zaffata zaffata
zaffate zaffate
@ -110271,7 +110171,6 @@ zero
zeta zeta
zia zia
zibellino zibellino
zic
zie zie
zigomo zigomo
zii zii

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,5 +1,6 @@
Bulgarian wordlist by: Miglen Georgiev Bulgarian wordlist by: Miglen Georgiev
Version: f46eff1 (2022-04-26) Version: f46eff1 (2022-04-26)
Words Count: 234114
Source: https://github.com/miglen/bulgarian-wordlists/blob/master/wordlists/bg-words-validated-cyrillic.txt Source: https://github.com/miglen/bulgarian-wordlists/blob/master/wordlists/bg-words-validated-cyrillic.txt
License: https://github.com/miglen/bulgarian-wordlists/blob/master/LICENSE License: https://github.com/miglen/bulgarian-wordlists/blob/master/LICENSE
Additionally cleaned up repeating words and added some missing ones.
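
The clean-up of repeating words mentioned above could be done with a few lines of Python. This is an assumed approach, not necessarily how the dictionary was actually processed. The function name is made up for illustration, and a file with one word per line is assumed.

```python
def remove_repeating_words(lines):
    """Drop blank lines and repeated words, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for line in lines:
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            unique.append(word)
    return unique


# Possible usage (the file name is a placeholder):
# with open("dictionary.txt", encoding="utf-8") as source:
#     words = remove_repeating_words(source)
```

Keeping the first occurrence preserves the existing order of the file, so the resulting diff only shows the removed lines.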


@ -1,15 +1,33 @@
// Source for English dictionary: http://wordlist.sourceforge.net/ Custom wordlist generated from http://app.aspell.net/create using SCOWL
with parameters (words with 2-3 letters):
diacritic: strip
max_size: 50
max_variant: 0
special: <none>
spelling: US
with parameters (words with 4 or more letters):
diacritic: strip
max_size: 70
max_variant: 2
special: hacker
spelling: US GBz
Using Git Commit From: Mon Dec 7 20:14:35 2020 -0500 [5ef55f9]
=====
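
One reading of the two parameter sets above is that the final list combines the 2-3 letter words from the first export with the words of 4 or more letters from the second. The sketch below shows that combination step under this assumption; `combine_by_length` is a hypothetical name, and the inputs are assumed to be plain iterables of words.

```python
def combine_by_length(short_source, long_source):
    """Take 2-3 letter words from one list and 4+ letter words from another."""
    short_words = {word for word in short_source if 2 <= len(word) <= 3}
    long_words = {word for word in long_source if len(word) >= 4}
    # The two sets cannot overlap, so the union is simply sorted.
    return sorted(short_words | long_words)
```

Filtering the short words from the smaller list keeps rare 2-3 letter entries out of the result, which matters most for short key sequences.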
Spell Checking Oriented Word Lists (SCOWL) Spell Checking Oriented Word Lists (SCOWL)
Revision 7.1 (SVN Revision 161)
January 6, 2011 Mon Dec 7 20:14:35 2020 -0500 [5ef55f9]
by Kevin Atkinson (kevina@gnu.org) by Kevin Atkinson (kevina@gnu.org)
The SCOWL is a collection of word lists split up in various sizes, and The SCOWL is a collection of word lists split up in various sizes, and
other categories, intended to be suitable for use in spell checkers. other categories, intended to be suitable for use in spell checkers.
However, I am sure it will have numerous other uses as well. However, I am sure it will have numerous other uses as well.
The latest version can be found at http://wordlist.sourceforge.net/. The latest version can be found at http://wordlist.aspell.net/.
The directory final/ contains the actual word lists broken up into The directory final/ contains the actual word lists broken up into
various sizes and categories. The r/ directory contains Readmes from various sizes and categories. The r/ directory contains Readmes from
@ -29,10 +47,11 @@ Except for the special word lists the files follow the following
naming convention: naming convention:
<spelling category>-<sub-category>.<size> <spelling category>-<sub-category>.<size>
Where the spelling category is one of Where the spelling category is one of
english, american, british, british_z, canadian, english, american, british, british_z, canadian, australian
variant_0, varaint_1, variant_2, variant_1, variant_2, variant_3,
british_variant_0, british_variant_1, british_variant_1, british_variant_2,
canadian_variant_0, canadian_variant_1, canadian_variant_1, canadian_variant_2,
australian_variant_1, australian_variant_2
Sub-category is one of Sub-category is one of
abbreviations, contractions, proper-names, upper, words abbreviations, contractions, proper-names, upper, words
And size is one of And size is one of
@ -44,131 +63,273 @@ Where description is one of:
roman-numerals, hacker roman-numerals, hacker
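
The naming convention above can be illustrated with a small parser. This is a sketch, not part of SCOWL: `parse_scowl_name` is a hypothetical helper, and the example names are simply built from the convention. Because sub-categories such as "proper-names" contain a hyphen, the name is split at the first hyphen and the last dot.

```python
def parse_scowl_name(filename):
    """Split '<spelling category>-<sub-category>.<size>' into its three parts."""
    stem, dot, size = filename.rpartition(".")
    category, dash, sub_category = stem.partition("-")
    if not (dot and dash and size.isdigit()):
        raise ValueError(f"not a regular word list name: {filename!r}")
    return category, sub_category, int(size)


# Example: parse_scowl_name("british_variant_1-proper-names.70")
# gives the category "british_variant_1", the sub-category "proper-names" and the size 70.
```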
The perl script "mk-list" can be used to create a word list of the The perl script "mk-list" can be used to create a word list of the
desired size, it usage is: desired size, its usage is:
./mk-list [-f] [-v#] <spelling categories> <size> ./mk-list [-f] [-v#] <spelling categories> <size>
where <spelling categories> is one of the above spelling categories where <spelling categories> is one of the above spelling categories
(the english and special categories are automatically included as well (the english and special categories are automatically included as well
as all sub-categories) and <size> is the desired desired size. The as all sub-categories) and <size> is the desired size. The
"-v" option can be used to used to also include the appropriate "-v" option can be used to also include the appropriate
variants file up to level '#'. The normal output will be a sorted variants file up to level '#'. The normal output will be a sorted
word list. If you rather see what files will be included, use the word list. If you rather see what files will be included, use the
"-f" option. "-f" option.
When manually combining the words lists the "english" spelling When manually combining the words lists the "english" spelling
category should be used as well as one of "american", "british", category should be used as well as one of "american", "british",
"british_z" (british with ize spelling), or "canadian". Great care "british_z" (british with ize spelling), "canadian" or "australian".
has been taken so that that only one spelling for any particular word Great care has been taken so that only one spelling for any particular
is included in the main list (with some minor exceptions). When two word is included in the main list (with some minor exceptions). When
variants were considered equal I randomly picked one for inclusion in two variants were considered equal I randomly picked one for inclusion
the main word list. Unfortunately this means that my choice in how to in the main word list. Unfortunately this means that my choice in how
spell a word may not match your choice. If this is the case you can to spell a word may not match your choice. If this is the case you
try including one of the "variant_0" spelling categories which can try including one of the "variant_1" spelling categories which
includes most variants which are considered almost equal. The includes most variants which are considered almost equal. The
"variant_0" spelling category corresponds mostly to American variants, "variant_1" spelling category corresponds mostly to American variants,
while the "british_variant_0" and "canadian_variant_0" are for British while the "british_variant_1", "canadian_variant_1" and
and Canadian variants, respectively. The "variant_1" spelling "australian_variant_1" are for British, Canadian and Australian
categories include variants which are also generally considered variants, respectively. The "variant_2" spelling categories include
acceptable, and "variant_2" contains variants which are seldom used variants which are also generally considered acceptable, and
and may now even be considered correct. There is no "variant_3" contains variants which are seldom used and may not even
"british_variant_2" or "canadian_variant_2" spelling category since be considered correct. There is no "british_variant_3",
"canadian_variant_3" or "australian_variant_3" spelling category since
the distinction would be almost meaningless. the distinction would be almost meaningless.
The "abbreviation" category includes abbreviations and acronyms which The "abbreviation" category includes abbreviations and acronyms which
are not also normal words. The "contractions" category should be self are not also normal words. The "contractions" category should be self
explanatory. The "upper" category includes upper case words and proper explanatory. The "upper" category includes upper case words and proper
names which are common enough to appear in a typical dictionary. The names which are common enough to appear in a typical dictionary. The
"proper-names" category included all the additional uppercase words. "proper-names" category includes all the additional uppercase words.
Final the "words" category contains all the normal English words. Finally the "words" category contains all the normal English words.
To give you an idea of what the words in the various sizes look like To give you an idea of what the words in the various sizes look like
here is a sample of 25 random words found only in that size: here is a sample of 25 random words found only in that size:
10: advertised agreeing artificial bucket changes closest currently finding 10: blow convert delete enables flow hot individual job maintains occurred
implications learning liable obvious partial peace planet preparing pointless political population provided quits recovering results settles
produced regulations shortly tries under unnecessary vacations vast wind simultaneous situation source tickets uncertain uses why
20: accomplishes addict baffles blink chapel corrections depresses dripping 20: additions advertisement akin applicants appoints celebrated contracts
erased infant interfere launch nicking novels paranoid passport pursued crime degradation discriminate enforcing escapes fabric funeral
recruitment rectifying relaxed sixteen sundry tab undergone withdraws genetically inconsistencies initialized innovative lodge lurking
photographic punches tiring trumpet wary
35: adores affixes brisks caking conciliates decimates discretionary 35: bagel brewed bushel charting commutative consigning dabbed displacements
dispatches forensics glorify gridiron healed hurling kelp massacring fatties flotillas flung gunshots harrow hull hungriest kangaroos math
necks pits placarding pyramids ratting recreates renovated sandals shirks memoirs negatives nonresident rampages ranchers submissive subtractions
subtract tipped
40: demoed dichotomy dilapidation disheveled ebullience estimable finagling 40: astrologers bedraggles buzzword cupcakes eyeglass gridlock grungy
hemorrhoid lazily medalists mintiest motherboards ostracism pornographers hairpiece hallucinates hotcakes inebriated leakier nymphomania papergirls
predilections remarries southbound steamrolled sympathizers tads tampons patchier patrolman predisposed reshuffled sasses snowmobiling
tattletale upchucked vainly viscous southeasterly teargas testiest topographer wimpy
50: bootless brawler bulkhead canoeist declassifying farthings hake hectors 50: apiaries besmirching boozier caducei communicant drainpipe ductile
helpmate hermitage humanoid kitsch mercerize pawnshops pleasingly exigencies gammas grouted harbinger hyphenations licentiate lynxes
retrorockets scurrilously solemnizes superficiality symbiosis tangelo maidenly malingerer palmettos pinwheeled prepackage propellant scrimmaged
timetabling unenviable unmoral unreconstructed sculleries senselessly unscrambled viburnums
55: beachfront bicarbonate caff campanologists execrably fab fightback 55: bloodstock bodge bruiting bumbag carthorse clumpy dandifying etiolated
firebricks insipidity laboriousness megawatts mirthlessly misnames fleabite guestrooms marge moi overdeveloped owlishly perisher plebes
nymphos photocell potholed psychoactive psychoanalytically schoolmarmish pseudy pukka putzes sangria splodges stocktaking subspecies tiebreaks
simulacra subeditors supremo sweated turbocharges yogic touchpapers
60: assayer banteringly besmeared brazer chromatin cremes deciliters 60: autobiographic cytologist fellowman footraces gypsters hardihood
doubtfulness enshrinement ephemerally fibular globalist gypper headshrinker homo interfile nonoperational nonsupporting outdraw
legitimatized mensch mopers oversea pantyliner paratyphoid redivide profligately readopted revetments semanticist stagnantly tapper thanes
rehabilitative salesladies sensualists superposition univalves thetas uncloaking uncross versifiers wasabi xylene
70: adactylous anticapitalist bezant bister boraginaceous civically cossacks 70: biltongs bookcraft bouilli bouse bronchiole cirrostrati coenurus
cousinly curricle dekaliter grippingly grugrus gurging hermaphroditism desorption feculence hackbuts heterolysis hylophagous ichthyosaur
levanted magnetizer nonapplicable panegyrists parametrize radomes iguanodon jillion lapidated mistranslating pullulating redd shylock skink
refilter ruinations teths truistic uts storaxes thalluses vermiculations voiture
80: bodikin buhrs covetiveness diarch disaccharidases drumbeater empusas 80: cellulolytic chomper costrels ditheistic doddard dwarfest fellwalkers
flyings hyperexcitability hyperpolarizations janizaries overwash fernless gammoners gasolinic introductive labrets macaber
physiocrats postform postsecondary preambulate puzzlehead remixer perspicaciousness pharmacodynamics pitchwomen pleuritical protore
snoutier tetrathlons toothdrawing triff unaffectionate wearish yawy repurifies ristras rolamite rumping sedimenting smithereening tolans
95: actinophone aerobious anadenia biochemics chromatopathia ciclatouns 95: amherstite appropinquations arsefoot assur commodate craspedia cutitis
gaspiest guapinol hagigah interdorsal melanotekite minnicking disciferous endeavourments endocondensation glyoxalase hatherlite
nonretrenchment overloftily oystriges peltandra retromaxillary interreticular interspicular lipothymy prieved reconvergence rousette
subterraqueous transphysically unconfidential unvalidating upspew septerium superdonation tenaim topepo trachelitis transgeneses
verminlike vetiveria yerth ultraenthusiastic
And here is a count on the number of in each spelling category
And here is a count on the number of words in each spelling category
(american + english spelling category): (american + english spelling category):
Size Words Names Running Total % Size Words Names Running Total %
10 4,427 15 4,442 0.7 10 4,425 13 4,438 0.7
20 8,122 0 12,564 1.9 20 8,126 0 12,564 1.9
35 37,251 224 50,039 7.7 35 37,260 220 50,044 7.6
40 6,802 503 57,344 8.8 40 6,858 489 57,391 8.7
50 24,505 15,455 97,304 14.9 50 25,289 18,683 101,363 15.4
55 6,555 0 103,859 15.9 55 6,487 0 107,850 16.4
60 13,633 775 118,267 18.1 60 14,551 850 123,251 18.7
70 35,507 7,747 161,521 24.8 70 35,294 7,897 166,442 25.3
80 143,791 33,293 338,605 51.9 80 144,158 33,368 343,968 52.3
95 227,056 86,814 652,475 100.0 95 227,633 86,630 658,231 100.0
(The "Words" column does not include the name count.) (The "Words" column does not include the name count.)
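The "Running Total" and "%" columns can be rechecked from the per-size counts. The following is a minimal sketch, assuming the updated (right-hand) figures of the table above; the class and method names are illustrative only and are not part of SCOWL.

```java
import java.util.Locale;

class ScowlRunningTotals {
    // Rows copied from the updated table: {size, words, names}.
    static final int[][] ROWS = {
        {10, 4425, 13}, {20, 8126, 0}, {35, 37260, 220}, {40, 6858, 489},
        {50, 25289, 18683}, {55, 6487, 0}, {60, 14551, 850}, {70, 35294, 7897},
        {80, 144158, 33368}, {95, 227633, 86630}
    };

    /** Cumulative count of words plus names up to and including the given size. */
    static int runningTotal(int size) {
        int total = 0;
        for (int[] row : ROWS) {
            if (row[0] <= size) {
                total += row[1] + row[2];
            }
        }
        return total;
    }

    /** Share of the full (size 95) list covered at the given size, in percent. */
    static double percentOfAll(int size) {
        return 100.0 * runningTotal(size) / runningTotal(95);
    }

    public static void main(String[] args) {
        for (int[] row : ROWS) {
            System.out.printf(Locale.ROOT, "%2d %,9d %5.1f%n",
                    row[0], runningTotal(row[0]), percentOfAll(row[0]));
        }
    }
}
```

Under these figures the recommended spell-checking size (60) covers 123,251 entries, a little under a fifth of the full list.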
Size 35 is the recommended small size, 50 the medium and 70 the large. Size 35 is the recommended small size, 50 the medium and 70 the large.
For spell checking I recommend using 60. Sizes 70 and below contain Sizes 70 and below contain words found in most dictionaries while the
words found in most dictionaries while the 80 size contains all the 80 size contains all the strange and unusual words people like to use
strange and unusual words people like to use in word games such as in word games such as Scrabble (TM). While a lot of the words in the
Scrabble (TM). While a lot of the the words in the 80 size are not 80 size are not used very often, they are all generally considered
used very often, they are all generally considered valid words in the valid words in the English language. The 95 contains just about every
English language. The 95 contains just about every English word in English word in existence and then some. Many of the words at the 95
existence and then some. Many of the words at the 95 level will level will probably not be considered valid English words by most
probably not be considered valid English words by most people. I use people.
the 60 size for the English dictionary for Aspell, and I don't
recommend anyone use levels above 70 for spell checking. Levels above
70 contain rarely used words which can hide misspellings of similar
more commonly used words. For example the word "ort" can hide a
common typo of "or". No one should need to use a size larger than 80,
the 95 size is labeled insane for a reason.
Accents are present on certain words such as café in iso8859-1 format. For spell checking I recommend using size 60. This size is the
largest size that I am fairly confident does not contain any
misspellings or invalid words. In addition an effort is made to
exclude valid yet problematic words (such as "calender") from the 60
size that are likely to be a misspelling of a more common word. The
70 size is reasonable for those wanting a larger list who don't mind a
few errors. The 80 or larger sizes are not reasonable for spell
checking.
Accents are present on certain words such as café in iso8859-1 format.
CHANGES: CHANGES:
From Version 2019.10.06 to 2020.12.07
Various new words.
Variant cleanups.
Bump irregardless, froward (+ derivatives) and perpend to level 70.
From Version 2018.04.16 to 2019.10.06
Various new words.
Remove compare's and fail's.
From Version 2017.08.24 to 2018.04.16
Various new words.
Fix build problems on macOS.
From Version 2017.01.22 to 2017.08.24
Various new words.
From Version 2016.11.20 to 2017.01.22
Various new words.
From Version 2016.06.26 to 2016.11.20
New Australian spelling category thanks to the work of Benjamin
Titze (btitze@protonmail.ch)
Various new words.
From Version 2016.01.19 to 2016.06.26
Various new words.
Updated to Version 6.0.2 of 12dicts
Other minor changes.
From Version 2015.08.24 to 2016.01.19
Various new words.
Clarified README to indicate why the 60 size is the preferred size
for spell checking.
Remove some very uncommon possessive forms.
Change "SET UTF8" to "SET UTF-8" in hunspell affix file.
From Version 2015.05.18 to 2015.08.24 (Aug 24, 2015)
Various new words.
From Version 2015.04.24 to 2015.05.18 (May 18, 2015)
Added some new words found to have a high frequency in the COCA
corpus. (http://corpus.byu.edu/coca/).
Fix en spelling suggestions for 'alot' and 'exersize' in hunspell
dictionary (upstreamed from the changes made in Firefox).
From Version 2015.02.15 to 2015.04.24 (April 24, 2015)
Added some new words.
Convert hunspell dictionary to UTF-8 in order to handle smart
quotes correctly.
From Version 2015.01.28 to 2015.02.15 (February 15, 2015)
Added a large number of neologisms (newly invented words)
such as "selfie" and "smartwatch" thanks to Alan Beale.
Various other new words.
Clean up the special-hacker category by removing some words that
didn't exist in the Google Book's Corpus (1980 - 2008) and
originated from the "Unofficial Jargon File Word Lists".
From Version 2014.11.17 to 2015.01.28 (January 28, 2015)
Various new words, many from analyzing the Google Book's Corpus
(1980 - 2008). See http://app.aspell.net/lookup-freq.
Moved some uncommon words that can easily hide a misspelling of a
more common word to level 70. (calender, adrenalin and Joesph)
Removed several -er and -est forms from adjectives that were so
uncommon that they were not found anywhere in the Google Book's
Corpus (1980 - 2008).
From Version 2014.08.11.1 to 2014.11.17 (November 17, 2014)
Various new words.
Fix typo in Hunspell readme.
From Version 2014.08.11 to 2014.08.11.1 (August 13, 2014)
Forgot to mention this important change from 7.1 to 2014.08.11:
Shifted the variant levels up by one: variant_0 is now variant_1,
variant_1 is now variant_2, and variant_2 is now variant_3.
Other minor fixes in this README.
No changes to the contents of the lists.
From Revision 7.1 to Version 2014.08.11 (August 11, 2014)
Added some missing possessive forms.
Added some new words and proper names.
Clean up the categories (words, upper, proper-names etc) so that they
are more accurate.
Convert documentation to UTF-8. For now, the word lists are still in
ISO-8859-1 to prevent compatibility problems.
Add schema and scripts for creating a SQLite database from SCOWL.
Add some utility and library functions using them. This database is
used by the new web apps (http://app.aspell.net/lookup & create).
Enhance speller/make-hunspell-dict. The biggest improvement is that
it now generates several more dictionaries in addition to
the official ones. These additional dictionaries are ones for
British English and larger dictionaries that include up to SCOWL
size 70.
From Revision 7 to 7.1 (January 6, 2011) From Revision 7 to 7.1 (January 6, 2011)
Updated to revision 5.1 of Varcon which corrected several errors. Updated to revision 5.1 of Varcon which corrected several errors.
@ -179,7 +340,7 @@ From Revision 7 to 7.1 (January 6, 2011)
Added several now common proper names and some other words now Added several now common proper names and some other words now
in common use. in common use.
Include misc/ and speller/ directory which where in SVN but left Include misc/ and speller/ directory which were in SVN but left
out of the release tarball. out of the release tarball.
Other minor fixes, including some fixes to the taboo word lists. Other minor fixes, including some fixes to the taboo word lists.
@ -216,7 +377,7 @@ From Revision 5 to 6 (August 10, 2004)
Updated to version 4.1 of VarCon. Updated to version 4.1 of VarCon.
Added the "british_z" spelling category which it British using the Added the "british_z" spelling category which is British using the
"ize" spelling. "ize" spelling.
From Revision 4a to 5 (January 3, 2002) From Revision 4a to 5 (January 3, 2002)
@ -254,7 +415,7 @@ From Revision 3 to 4 (January 28, 2001)
Added words in the Ispell word list at the 65 level. Added words in the Ispell word list at the 65 level.
Other changes due to using more recent versions of various sources Other changes due to using more recent versions of various sources
included a more accurate version of AGID thanks to the word of included a more accurate version of AGID thanks to the work of
Alan Beale Alan Beale
From Revision 2 to 3 (August 18, 2000) From Revision 2 to 3 (August 18, 2000)
@ -285,10 +446,10 @@ From Revision 1 to 2 (August 5, 2000)
COPYRIGHT, SOURCES, and CREDITS: COPYRIGHT, SOURCES, and CREDITS:
The collective work is Copyright 2000-2011 by Kevin Atkinson as well The collective work is Copyright 2000-2018 by Kevin Atkinson as well
as any of the copyrights mentioned below: as any of the copyrights mentioned below:
Copyright 2000-2011 by Kevin Atkinson Copyright 2000-2018 by Kevin Atkinson
Permission to use, copy, modify, distribute and sell these word Permission to use, copy, modify, distribute and sell these word
lists, the associated scripts, the output created from the scripts, lists, the associated scripts, the output created from the scripts,
@ -399,7 +560,7 @@ The 40 level includes words from Alan's 3esl list found in version 4.0
of his 12dicts package. Like his other stuff the 3esl list is also in the of his 12dicts package. Like his other stuff the 3esl list is also in the
public domain. public domain.
The 50 level includes Brian's frequency class 1, words words appearing The 50 level includes Brian's frequency class 1, words appearing
in at least 5 of 12 of the dictionaries as indicated in the 12Dicts in at least 5 of 12 of the dictionaries as indicated in the 12Dicts
package, and uppercase words in at least 4 of the previous 12 package, and uppercase words in at least 4 of the previous 12
dictionaries. A decent number of proper names is also included: The dictionaries. A decent number of proper names is also included: The
@ -428,11 +589,11 @@ The 70 level includes Brian's frequency class 0 and the 74,550 common
dictionary words from the MWords package. The common dictionary words, dictionary words from the MWords package. The common dictionary words,
like those from the 12Dicts package, have had all likely inflections like those from the 12Dicts package, have had all likely inflections
added. The 70 level also included the 5desk list from version 4.0 of added. The 70 level also included the 5desk list from version 4.0 of
the 12Dics package which is the public domain. the 12Dics package which is in the public domain.
The 80 level includes the ENABLE word list, all the lists in the The 80 level includes the ENABLE word list, all the lists in the
ENABLE supplement package (except for ABLE), the "UK Advanced Cryptics ENABLE supplement package (except for ABLE), the "UK Advanced Cryptics
Dictionary" (UKACD), the list of signature words in from YAWL package, Dictionary" (UKACD), the list of signature words from the YAWL package,
and the 10,196 places list from the MWords package. and the 10,196 places list from the MWords package.
The ENABLE package, maintained by M\Cooper <thegrendel@theriver.com>, The ENABLE package, maintained by M\Cooper <thegrendel@theriver.com>,
@ -476,11 +637,30 @@ found anywhere else.
Accent information was taken from UKACD. Accent information was taken from UKACD.
My VARCON package was used to create the American, British, and The VarCon package was used to create the American, British, Canadian,
Canadian word list. and Australian word list. It is under the following copyright:
Since the original word lists used used in the VARCON package came Copyright 2000-2016 by Kevin Atkinson
from the Ispell distribution they are under the Ispell copyright:
Permission to use, copy, modify, distribute and sell this array, the
associated software, and its documentation for any purpose is hereby
granted without fee, provided that the above copyright notice appears
in all copies and that both that copyright notice and this permission
notice appear in supporting documentation. Kevin Atkinson makes no
representations about the suitability of this array for any
purpose. It is provided "as is" without express or implied warranty.
Copyright 2016 by Benjamin Titze
Permission to use, copy, modify, distribute and sell this array, the
associated software, and its documentation for any purpose is hereby
granted without fee, provided that the above copyright notice appears
in all copies and that both that copyright notice and this permission
notice appear in supporting documentation. Benjamin Titze makes no
representations about the suitability of this array for any
purpose. It is provided "as is" without express or implied warranty.
Since the original word lists come from the Ispell distribution:
Copyright 1993, Geoff Kuenning, Granada Hills, CA Copyright 1993, Geoff Kuenning, Granada Hills, CA
All rights reserved. All rights reserved.
@ -503,18 +683,18 @@ from the Ispell distribution they are under the Ispell copyright:
products derived from this software without specific prior products derived from this software without specific prior
written permission. written permission.
THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS IS'' AND
IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GEOFF ARE DISCLAIMED. IN NO EVENT SHALL GEOFF KUENNING OR CONTRIBUTORS BE LIABLE
KUENNING OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE SUCH DAMAGE.
POSSIBILITY OF SUCH DAMAGE.
The variant word lists were created from a list of variants found in The variant word lists were created from a list of variants found in
the 12dicts supplement package as well as a list of variants I created the 12dicts supplement package as well as a list of variants I created
@ -536,7 +716,7 @@ giant perl script. With the amount of memory available these days (at
least 2 GB, often 4 GB or more) this should not really be a problem. least 2 GB, often 4 GB or more) this should not really be a problem.
In addition, there is a very nice frequency analysis of the BNC corpus In addition, there is a very nice frequency analysis of the BNC corpus
done by Adam Kilgarriff. Unlike Brain's word lists the BNC lists done by Adam Kilgarriff. Unlike Brian's word lists the BNC lists
include part of speech information. I plan on somehow using these include part of speech information. I plan on somehow using these
lists as Adam Kilgarriff has given me the OK to use it in SCOWL. lists as Adam Kilgarriff has given me the OK to use it in SCOWL.
These lists will greatly reduce the problem of inflected forms of a These lists will greatly reduce the problem of inflected forms of a
@ -545,7 +725,7 @@ information.
There is frequency information for some other corpora such as COCA There is frequency information for some other corpora such as COCA
(Corpus of Contemporary American English) and ANC (American National (Corpus of Contemporary American English) and ANC (American National
Corpus) which I might also be able to use. The formal will require Corpus) which I might also be able to use. The former will require
permission, and the latter is of questionable quality. permission, and the latter is of questionable quality.
RECREATING THE WORD LISTS: RECREATING THE WORD LISTS:
@ -553,17 +733,17 @@ RECREATING THE WORD LISTS:
In order to recreate the word lists you need a modern version of Perl, In order to recreate the word lists you need a modern version of Perl,
bash, the traditional set of shell utilities, a system that supports bash, the traditional set of shell utilities, a system that supports
symbolic links, and quite possibly GNU Make. The easiest way to symbolic links, and quite possibly GNU Make. The easiest way to
recreate the word lists is to checkout SVN revision 161 (or tag recreate the word lists is to checkout the corresponding Git version
scowl-7.1) and simply type "make" (see http://wordlist.sourceforge.net). (see the version string at the start of the file) and simply type
You can try to download all the pieces manually, but you may not get "make" (see http://wordlist.aspell.net). You can try to download all
the same result since the latest version of some parts used to create the pieces manually, but this method is no longer tested or
SCOWL may not have been released yet. supported.
The src/ directory contains the numerous scripts used in the creation The src/ directory contains the numerous scripts used in the creation
of the final product. of the final product.
The r/ directory contains the raw data used to create the final The r/ directory contains the raw data used to create the final
product. If you checkout from SVN this directory should be populated product. If you checkout from Git this directory should be populated
automatically for you. If you insist on doing it the hard way see the automatically for you. If you insist on doing it the hard way see the
README file in the r/ directory for more information. README file in the r/ directory for more information.


@ -1 +1,6 @@
Source for Russian dictionary: Various sources from Russian user Russian wordlist by: William Hingston
Version: 5481cb8 (2018-09-13)
Source: https://github.com/hingston/russian/blob/master/100000-russian-words.txt
License: https://github.com/hingston/russian/blob/master/LICENSE.md
Additionally cleaned up repeating and nonsense words.


@ -0,0 +1,60 @@
const { basename } = require('path');
const { createReadStream, existsSync } = require('fs');
function printHelp() {
console.log(`Usage: ${basename(process.argv[1])} LOCALE FILENAME.txt`);
console.log('Removes repeating words from a word list');
console.log('\nLocale could be any valid JS locale, for example: en, en-US, etc...');
}
function validateInput() {
if (process.argv.length < 4) {
printHelp();
process.exit(1);
}
if (!existsSync(process.argv[3])) {
console.error(`Failure! Could not find file "${process.argv[3]}".`);
process.exit(2);
}
return { fileName: process.argv[3], locale: process.argv[2] };
}
async function removeRepeatingWords({ fileName, locale }) {
const lineReader = require('readline').createInterface({
input: createReadStream(fileName)
});
const geographicalName = /[A-Z]\w+\-[^\n]+/;
const wordMap = {};
for await (const line of lineReader) {
const wordKey = geographicalName.test(line) ? line : line.toLocaleLowerCase(locale);
wordMap[wordKey] = true;
}
return Object.keys(wordMap);
}
function printWords(wordList) {
if (!Array.isArray(wordList)) {
return;
}
wordList.forEach(w => console.log(w));
}
/** main **/
removeRepeatingWords(validateInput()).then(words => printWords(words));
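For readers more at home in the app's Java code, the same de-duplication idea can be sketched as below. This is an illustrative port under stated assumptions, not part of the commit: the class and method names are hypothetical, and a `LinkedHashSet` stands in for the script's object map so that first-seen order is always preserved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

class RepeatingWordsSketch {
    // Same heuristic as the script: a capitalized, hyphenated entry is treated
    // as a geographical name and kept verbatim instead of being lowercased.
    private static final Pattern GEOGRAPHICAL_NAME = Pattern.compile("[A-Z]\\w+-[^\\n]+");

    /** Returns the words with case-insensitive repeats removed, in first-seen order. */
    static List<String> removeRepeatingWords(List<String> lines, Locale locale) {
        Set<String> unique = new LinkedHashSet<>();
        for (String line : lines) {
            boolean keepCase = GEOGRAPHICAL_NAME.matcher(line).find();
            unique.add(keepCase ? line : line.toLowerCase(locale));
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Apple", "apple", "Saint-Denis", "saint-denis", "apple");
        for (String word : removeRepeatingWords(words, Locale.ENGLISH)) {
            System.out.println(word);
        }
    }
}
```

Passing the locale explicitly matters for languages whose case mapping differs from the default, which is presumably why the script takes a LOCALE argument.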


@ -183,7 +183,7 @@ public class TraditionalT9 extends KeyPadHandler {
* @return boolean * @return boolean
*/ */
protected boolean onNumber(int key, boolean hold, boolean repeat) { protected boolean onNumber(int key, boolean hold, boolean repeat) {
if (mInputMode.shouldAcceptCurrentSuggestion(key, hold, repeat)) { if (mInputMode.shouldAcceptCurrentSuggestion(mLanguage, key, hold, repeat)) {
mInputMode.onAcceptSuggestion(mLanguage, getComposingText()); mInputMode.onAcceptSuggestion(mLanguage, getComposingText());
commitCurrentSuggestion(false); commitCurrentSuggestion(false);
determineNextTextCase(); determineNextTextCase();


@ -83,6 +83,6 @@ abstract public class InputMode {
public boolean shouldTrackNumPress() { return true; } public boolean shouldTrackNumPress() { return true; }
public boolean shouldTrackUpDown() { return false; } public boolean shouldTrackUpDown() { return false; }
public boolean shouldTrackLeftRight() { return false; } public boolean shouldTrackLeftRight() { return false; }
public boolean shouldAcceptCurrentSuggestion(int key, boolean hold, boolean repeat) { return false; } public boolean shouldAcceptCurrentSuggestion(Language language, int key, boolean hold, boolean repeat) { return false; }
public boolean shouldSelectNextSuggestion() { return false; } public boolean shouldSelectNextSuggestion() { return false; }
} }


@ -33,7 +33,7 @@ public class ModeABC extends InputMode {
final public boolean isABC() { return true; } final public boolean isABC() { return true; }
public int getSequenceLength() { return 1; } public int getSequenceLength() { return 1; }
public boolean shouldAcceptCurrentSuggestion(int key, boolean hold, boolean repeat) { return hold || !repeat; } public boolean shouldAcceptCurrentSuggestion(Language l, int key, boolean hold, boolean repeat) { return hold || !repeat; }
public boolean shouldTrackUpDown() { return true; } public boolean shouldTrackUpDown() { return true; }
public boolean shouldTrackLeftRight() { return true; } public boolean shouldTrackLeftRight() { return true; }
public boolean shouldSelectNextSuggestion() { public boolean shouldSelectNextSuggestion() {


@ -105,16 +105,16 @@ public class ModePredictive extends InputMode {
* In this mode, in addition to confirming the suggestion in the input field, * In this mode, in addition to confirming the suggestion in the input field,
* we also increase its priority. This function determines whether we want to do all this or not. * we also increase its priority. This function determines whether we want to do all this or not.
*/ */
public boolean shouldAcceptCurrentSuggestion(int key, boolean hold, boolean repeat) { public boolean shouldAcceptCurrentSuggestion(Language language, int key, boolean hold, boolean repeat) {
return return
hold hold
// Quickly accept suggestions using "space" instead of pressing "ok" then "space" // Quickly accept suggestions using "space" instead of pressing "ok" then "space"
|| key == 0 || key == 0
// Punctuation is considered "a word", so that we can increase the priority as needed // Punctuation is considered "a word", so that we can increase the priority as needed
// Also, it must break the current word. // Also, it must break the current word.
|| (key == 1 && digitSequence.length() > 0 && !digitSequence.endsWith("1")) || (!language.isPunctuationPartOfWords() && key == 1 && digitSequence.length() > 0 && !digitSequence.endsWith("1"))
// On the other hand, letters also "break" punctuation. // On the other hand, letters also "break" punctuation.
|| (key != 1 && digitSequence.endsWith("1")); || (!language.isPunctuationPartOfWords() && key != 1 && digitSequence.endsWith("1"));
} }
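The effect of the new `isPunctuationPartOfWords()` check can be seen in isolation with a small standalone sketch. It assumes only what the updated condition shows; the class name, the method name and the plain boolean parameter (standing in for the `Language` object) are hypothetical, and the `repeat` argument is left out because the condition does not use it.

```java
class AcceptSuggestionSketch {
    /**
     * Mirrors the updated condition: holding a key or pressing 0 (space) always
     * accepts. Otherwise the 1-key (punctuation) only breaks the current word
     * when the language does not treat punctuation as part of its words.
     */
    static boolean shouldAccept(boolean punctuationPartOfWords, String digitSequence, int key, boolean hold) {
        if (hold || key == 0) {
            return true;
        }
        if (punctuationPartOfWords) {
            // e.g. English: "it" + 1-key keeps composing towards "it's"
            return false;
        }
        boolean endsWithPunctuation = digitSequence.endsWith("1");
        if (key == 1) {
            // punctuation breaks a non-empty word that does not already end with punctuation
            return digitSequence.length() > 0 && !endsWithPunctuation;
        }
        // letters, in turn, break a sequence that ends with punctuation
        return endsWithPunctuation;
    }

    public static void main(String[] args) {
        // "48" is the digit sequence for "it" on a standard keypad
        System.out.println(shouldAccept(true, "48", 1, false));
        System.out.println(shouldAccept(false, "48", 1, false));
    }
}
```

With punctuation treated as part of words, pressing 1 after "48" does not accept the suggestion, so typing can continue towards "it's". With the flag off, the same key press accepts "it" first.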


@ -8,6 +8,7 @@ public class Language {
protected int id; protected int id;
protected String name; protected String name;
protected Locale locale; protected Locale locale;
protected boolean isPunctuationPartOfWords; // see the getter for more info
protected int icon; protected int icon;
protected String dictionaryFile; protected String dictionaryFile;
protected int abcLowerCaseIcon; protected int abcLowerCaseIcon;
@ -30,6 +31,24 @@ public class Language {
return icon; return icon;
} }
/**
* isPunctuationPartOfWords
* This plays a role in Predictive mode only.
*
* Return "true", if you need to use the 1-key for typing words, such as:
* "it's" (English), "a'tje" (Dutch), "п'ят" (Ukrainian).
*
* Returning "false" will also:
* - hide words like the above from the suggestions.
* - make the 1-key commit the current word, then display the punctuation list.
* For example, pressing 1-key after "it" would accept "it" as a separate word,
* then display only: | , | . | ! | ? | ...
*
* "false" is recommended when apostrophes or other punctuation are not part of the words,
* because it would allow faster typing.
*/
final public boolean isPunctuationPartOfWords() { return isPunctuationPartOfWords; }
final public String getDictionaryFile() { final public String getDictionaryFile() {
return dictionaryFile; return dictionaryFile;
} }


@ -14,6 +14,7 @@ public class Bulgarian extends Language {
name = "български"; name = "български";
locale = new Locale("bg","BG"); locale = new Locale("bg","BG");
dictionaryFile = "bg-utf8.txt"; dictionaryFile = "bg-utf8.txt";
isPunctuationPartOfWords = false;
icon = R.drawable.ime_lang_bg; icon = R.drawable.ime_lang_bg;
abcLowerCaseIcon = R.drawable.ime_lang_cyrillic_lower; abcLowerCaseIcon = R.drawable.ime_lang_cyrillic_lower;
abcUpperCaseIcon = R.drawable.ime_lang_cyrillic_upper; abcUpperCaseIcon = R.drawable.ime_lang_cyrillic_upper;


@ -12,13 +12,14 @@ public class Dutch extends English {
id = 8; id = 8;
name = "Nederlands"; name = "Nederlands";
locale = new Locale("nl","NL"); locale = new Locale("nl","NL");
isPunctuationPartOfWords = true;
dictionaryFile = "nl-utf8.txt"; dictionaryFile = "nl-utf8.txt";
icon = R.drawable.ime_lang_nl; icon = R.drawable.ime_lang_nl;
characterMap.get(2).addAll(Arrays.asList("à", "ä", "ç")); characterMap.get(2).addAll(Arrays.asList("à", "ä", "ç"));
characterMap.get(3).addAll(Arrays.asList("é", "è", "ê", "ë")); characterMap.get(3).addAll(Arrays.asList("é", "è", "ê", "ë"));
characterMap.get(4).addAll(Arrays.asList("î", "ï")); characterMap.get(4).addAll(Arrays.asList("î", "ï"));
characterMap.get(6).addAll(Arrays.asList("ö")); characterMap.get(6).add("ö");
characterMap.get(8).addAll(Arrays.asList("û", "ü")); characterMap.get(8).addAll(Arrays.asList("û", "ü"));
} }
} }


@ -14,6 +14,7 @@ public class English extends Language {
name = "English"; name = "English";
locale = Locale.ENGLISH; locale = Locale.ENGLISH;
dictionaryFile = "en-utf8.txt"; dictionaryFile = "en-utf8.txt";
isPunctuationPartOfWords = true;
icon = R.drawable.ime_lang_en; icon = R.drawable.ime_lang_en;
abcLowerCaseIcon = R.drawable.ime_lang_latin_lower; abcLowerCaseIcon = R.drawable.ime_lang_latin_lower;
abcUpperCaseIcon = R.drawable.ime_lang_latin_upper; abcUpperCaseIcon = R.drawable.ime_lang_latin_upper;


@ -14,6 +14,7 @@ public class French extends English {
locale = Locale.FRENCH; locale = Locale.FRENCH;
dictionaryFile = "fr-utf8.txt"; dictionaryFile = "fr-utf8.txt";
icon = R.drawable.ime_lang_fr; icon = R.drawable.ime_lang_fr;
isPunctuationPartOfWords = false;
characterMap.get(2).addAll(Arrays.asList("à", "â", "æ", "ç")); characterMap.get(2).addAll(Arrays.asList("à", "â", "æ", "ç"));
characterMap.get(3).addAll(Arrays.asList("é", "è", "ê", "ë")); characterMap.get(3).addAll(Arrays.asList("é", "è", "ê", "ë"));


@ -13,6 +13,7 @@ public class German extends English {
locale = Locale.GERMAN; locale = Locale.GERMAN;
dictionaryFile = "de-utf8.txt"; dictionaryFile = "de-utf8.txt";
icon = R.drawable.ime_lang_de; icon = R.drawable.ime_lang_de;
isPunctuationPartOfWords = false;
characterMap.get(2).add("ä"); characterMap.get(2).add("ä");
characterMap.get(6).add("ö"); characterMap.get(6).add("ö");


@ -14,6 +14,7 @@ public class Italian extends English {
locale = Locale.ITALIAN; locale = Locale.ITALIAN;
dictionaryFile = "it-utf8.txt"; dictionaryFile = "it-utf8.txt";
icon = R.drawable.ime_lang_it; icon = R.drawable.ime_lang_it;
isPunctuationPartOfWords = false;
characterMap.get(2).add("à"); characterMap.get(2).add("à");
characterMap.get(3).addAll(Arrays.asList("é", "è")); characterMap.get(3).addAll(Arrays.asList("é", "è"));


@ -14,6 +14,7 @@ public class Russian extends Language {
name = "русский"; name = "русский";
locale = new Locale("ru","RU"); locale = new Locale("ru","RU");
dictionaryFile = "ru-utf8.txt"; dictionaryFile = "ru-utf8.txt";
isPunctuationPartOfWords = false;
icon = R.drawable.ime_lang_ru; icon = R.drawable.ime_lang_ru;
abcLowerCaseIcon = R.drawable.ime_lang_cyrillic_lower; abcLowerCaseIcon = R.drawable.ime_lang_cyrillic_lower;
abcUpperCaseIcon = R.drawable.ime_lang_cyrillic_upper; abcUpperCaseIcon = R.drawable.ime_lang_cyrillic_upper;


@ -14,6 +14,7 @@ public class Ukrainian extends Language {
name = "українська"; name = "українська";
locale = new Locale("uk","UA"); locale = new Locale("uk","UA");
dictionaryFile = "uk-utf8.txt"; dictionaryFile = "uk-utf8.txt";
isPunctuationPartOfWords = true;
icon = R.drawable.ime_lang_uk; icon = R.drawable.ime_lang_uk;
abcLowerCaseIcon = R.drawable.ime_lang_cyrillic_lower; abcLowerCaseIcon = R.drawable.ime_lang_cyrillic_lower;
abcUpperCaseIcon = R.drawable.ime_lang_cyrillic_upper; abcUpperCaseIcon = R.drawable.ime_lang_cyrillic_upper;