Toward Optimal Arabic Keyboard Layout Using Geneti
Toward Optimal Arabic Keyboard Layout Using Geneti
net/publication/228921828
CITATIONS READS
19 12,348
3 authors:
Gheith A. Abandah
University of Jordan
62 PUBLICATIONS 938 CITATIONS
SEE PROFILE
All content following this page was uploaded by Sinan Taifour on 31 May 2014.
KEYWORDS keyboard standard supports the ASMO standard for the 7-bit
Genetic Algorithm, Optimization, Arabic Keyboard Layout. Arabic characters code (ASMO 1985). However, this
keyboard standard was not used by the computer industry
ABSTRACT and the market adopted instead the currently used Arabic
keyboard layout shown in Figure 2. This keyboard layout
The common Arabic keyboard layout is derived from the gained wide acceptance in PCs and servers over other
Arabic typewriter keyboard’s layout. This layout is not layouts when Microsoft adopted it for its Arabized products.
optimized for typing speed and has several problems. We Another Arabic keyboard layout currently in use is the
use a genetic algorithm to optimize the design of the Arabic layout shown in Figure 3. This layout is used in Apple MAC
keyboard layout for typing speed. In the genetic algorithm, computers.
we use a fitness function that includes into consideration key
distance, finger used, and hand alternation. We also use this
fitness function to evaluate the optimized layout against
other layouts including the common layout. The optimized
layout is relatively 35% faster than the common layout and
has additional advantages.
INTRODUCTION
As the keyboard is the main device for text entry in Figures 1: ASMO 663 Arabic Keyboard Layout
computers, it is very important to have a comfortable
keyboard layout which also promotes fast text entry. In a
look at the history of the English keyboard layout, we see
that the standard layout (QWERTY) is not optimized for
efficient typing, but is actually quite arbitrary (Noyes 1998).
A more optimized keyboard layout known as the Dvorak
Simplified Keyboard (DSK) has been developed(Campbell-
Kelly 2005). But the QWERTY layout remains most widely
used because many people were already trained on using the
QWERTY, and were reluctant to change.
Figures 2: Common Arabic Keyboard Layout
Arabic Keyboard History
combination Lam-Alef " "ﻻhas one key dedicated for it from almost all fields. So the related character frequencies
despite the fact that it is not as frequent as other letter are not biased to any particular field. This site contains more
combinations. than 10,000 articles, mostly well-written. These articles can
be conveniently and freely downloaded in a single XML file
Related Work (http://en.wikipedia.org/wiki/Wikipedia:Database_download). The
Arabic Wikipedia was downloaded, parsed, and analyzed to
Many researchers used various algorithms to design better find frequency information.
keyboard layouts for other languages. A simulated annealing
algorithm was used by (Light and Anderson 1993) to The single XML file, around 200 MB in size, contains all the
optimize the English keyboard. They used the travel time articles structured with some meta information. The first step
and the frequency of letter pairs to optimize for typing was to extract the articles and save them in separate files for
speed. An evolutionary algorithm was used by (Walker analysis. The meta information was kept in an accessible
2003) for the English keyboard to evolve it for a layout format for latter usage. Due to the large size of this XML
faster than the QWERTY layout by 30%. Also a genetic file, it is not possible to use a parsing tool that loads the
algorithm was used by (Goettl et al. 2005) to rearrange the entire file into the memory, such as the Document Object
English letters and some punctuation marks. Model (DOM). Instead, a serial event-driven method must
be used. We use the Simple API for XML (SAX), which is
Ant Colony algorithm was used by (Wagner et al. 2003) to available in the Python programming language.
optimize keyboard layouts for English, French, and German
text. They used ergonomic criterion to develop the desired The next step was to remove all wikitext. Wikipedia articles
keyboard layouts. are formatted using a special syntax called wikitext
(http://en.wikipedia.org/wiki/Wikitext). For removing the
A new design for Hindi keyboard layout was done by wikitext, we used the Regular Expressions (RegExp) tool,
(Deshwal and Deb 2003) using a genetic algorithm. They which is also available in Python. The regular expressions
used an ergonomic criterion to optimize the layout for typing describe patterns of text that can be matched against longer
convenience. We are unaware of similar optimization work strings of text or used for substitution. Most of the wikitext
done for the Arabic keyboard layout. can be fully described using rather simple RegExp patterns.
Approach Used The last step was to count occurrences of characters and
character pairs and generate result files. In this step, the user
A keyboard layout can be optimized for several is prompted every time a new unknown character is
characteristics such as typing speed, comfort, low error rate, encountered. The user is given three options: taking the
and trainability. In this paper, we use a genetic algorithm to character into consideration (for Arabic characters),
design a layout optimized for typing speed. We also evaluate considering it as a special character (for special characters
the typing speed of important Arabic keyboard layouts. and white space), and ignoring it (for non-Arabic
Although our focus is on optimizing the typing speed, the characters). The special characters are given special status to
designed layout is also more comfortable than the common enable finding word boundaries. This allows finding the
keyboard layout. probabilities of words starting or ending with particular
Arabic characters.
The rest of this paper is organized as follows. The next
section describes the procedure used in this research. Genetic Algorithm for Design Optimization
Section 3 presents the results obtained and analyzes these
results. Finally, Section 4 presents the paper conclusions and A genetic Algorithm is a guided random search technique
future work. inspired by evolutionary Biology (Goldberg 1989). It
involves a population of candidate solutions to a problem,
PROCEDURE called individuals, which live, reproduce, and die based on
their fitness relative to the rest of the population. The genetic
Our procedure consists of two stages. First we collect data algorithm consists of five phases: initialization, evaluation,
and find the letters’ frequencies, and then we use these selection, reproduction, and competition. A typical genetic
frequencies in a genetic algorithm to optimize the design of algorithm requires two definitions:
the keyboard layout for speed. The keyboard layout is 1. Genetic representation of the solution domain
divided into two sets: the main set and the shift set. The main 2. Fitness function to evaluate the solution domain
set contains all keys that can be accessed without using the
shift key, while the shift set contains the same keys when the The optimization performed by a genetic algorithm has the
shift key is pressed. ability to search a large space for a solution which is close to
the optimum with relatively short time. The space of
Finding Frequencies of the Arabic Characters possible keyboard layouts is the factorial of 33 keys
(8.68×1036). Evaluating all these layouts is infeasible as it
The used optimization requires information about the Arabic takes very long time.
characters and character pairs’ frequencies. The data we
used to find these frequencies was taken from the Arabic The genetic algorithm is applied on the main set of the
Wikipedia (http://ar.wikipedia.org), which contains articles keyboard, where we allocated the most frequent Arabic
In Proc. 9th Int'l Middle Eastern Multiconference on Simulation and Modeling (MESM 2008), Aug 26-28, Amman, Jordan.
2%
Initialization
Figures 4: Frequency of the Arabic Letters
Initially 2000 individuals (layouts) were randomly generated
to form an initial population. The population was generated
Table 2 shows the frequencies of letter pairs. Columns are
randomly so that the algorithm is not biased. This helps to
for the first letter and rows are for the second letter. This
avoid convergence to a local minimum. The randomness
table shows that the frequency of the frequent pair Alef
gives disperse layouts that cover wide regions of the space.
followed by Lam “ ”الis 8.3%, Lam-Meem “ ”ﻟﻢis 1.6%,
Yeh-Teh Marbuta “ ”ﻳﺔis 1.3%, and Lam-Alef “ ”ﻻis 1.3%.
In Proc. 9th Int'l Middle Eastern Multiconference on Simulation and Modeling (MESM 2008), Aug 26-28, Amman, Jordan.
Table 2: Frequency of Arabic Letter Pairs (First letter on the column, second letter on the row)
ا ل ي م و ن ر ت ب ع ة د ﻩ ف س ك ق ح أ ج ش ص . ط ى خ إ ز ث ذ ض غ ئ
ا 0.0 1.2 0.7 1.0 1.1 0.6 0.8 0.3 0.7 0.6 0.0 0.4 0.7 0.4 0.2 0.5 0.4 0.3 0.0 0.3 0.2 0.2 0.0 0.2 0.0 0.1 0.0 0.1 0.1 0.2 0.1 0.1 0.0
ل8.3 0.5 0.4 0.3 0.7 0.0 0.0 0.2 0.2 0.7 0.0 0.1 0.1 0.2 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.1 0.0 0.2 0.3 0.0 0.1 0.1 0.0 0.0 0.1
ي0.3 0.9 0.1 0.5 0.4 0.6 0.8 0.3 0.6 0.2 0.0 0.6 0.2 0.4 1.3 0.2 0.2 0.3 0.1 0.2 0.1 0.1 0.0 0.1 0.0 0.1 0.0 0.2 0.1 0.1 0.1 0.1 0.2
م0.8 1.6 0.3 0.1 0.4 0.1 0.1 0.3 0.1 0.2 0.0 0.2 0.3 0.2 0.0 0.2 0.0 0.2 0.2 0.2 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0
و 0.2 0.4 0.3 0.4 0.0 0.2 0.3 0.3 0.2 0.1 0.0 0.3 0.2 0.2 0.1 0.2 0.2 0.1 0.3 0.1 0.1 0.0 0.0 0.1 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0
ن1.2 0.3 0.9 1.1 0.5 0.0 0.1 0.1 0.3 0.3 0.0 0.1 0.1 0.1 0.0 0.2 0.0 0.0 0.3 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0
ر 0.8 0.2 0.6 0.3 0.5 0.0 0.0 0.3 0.4 0.3 0.0 0.2 0.1 0.1 0.2 0.2 0.2 0.2 0.1 0.1 0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.0 0.1 0.0 0.0 0.1 0.0
ت0.9 0.7 0.3 0.2 0.2 0.3 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.4 0.1 0.2 0.1 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ب0.4 0.4 0.2 0.1 0.2 0.1 0.3 0.2 0.0 0.2 0.0 0.0 0.0 0.2 0.0 0.1 0.1 0.0 0.1 0.1 0.1 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ع0.3 0.6 0.2 0.4 0.2 0.0 0.1 0.2 0.3 0.0 0.0 0.0 0.0 0.1 0.1 0.0 0.1 0.0 0.1 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ة 0.1 0.3 1.3 0.2 0.0 0.2 0.4 0.0 0.1 0.2 0.0 0.3 0.0 0.1 0.1 0.1 0.1 0.1 0.0 0.1 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
د 0.5 0.3 0.4 0.3 0.2 0.2 0.1 0.1 0.2 0.3 0.0 0.1 0.1 0.0 0.0 0.0 0.3 0.3 0.0 0.1 0.1 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ﻩ 0.2 0.4 0.3 0.1 0.1 0.3 0.1 0.3 0.2 0.1 0.0 0.1 0.0 0.1 0.1 0.0 0.0 0.0 0.1 0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0
س0.4 0.4 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.0 0.1 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0
ف0.2 0.3 0.1 0.0 0.2 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ك0.2 0.5 0.2 0.1 0.1 0.0 0.2 0.1 0.1 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.1 0.1 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ق0.2 0.4 0.2 0.1 0.2 0.1 0.1 0.2 0.1 0.1 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ح0.2 0.4 0.1 0.2 0.1 0.0 0.1 0.2 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
أ 0.0 0.6 0.0 0.0 0.1 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ج0.2 0.3 0.1 0.2 0.2 0.1 0.2 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ص0.1 0.2 0.0 0.1 0.1 0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ش0.1 0.3 0.1 0.1 0.0 0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
. 0.1 0.0 0.1 0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ط0.1 0.1 0.1 0.0 0.1 0.1 0.0 0.1 0.1 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ى0.0 0.7 0.0 0.0 0.0 0.0 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
خ0.1 0.2 0.1 0.1 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
إ 0.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ز 0.1 0.1 0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ث0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ذ 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ض0.1 0.0 0.1 0.0 0.1 0.0 0.1 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
غ0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ئ0.3 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Figure 5 shows the convergence curve of the genetic Figure 6 shows the optimized keyboard layout as found by
algorithm. This curve shows the best fitness value in a the genetic algorithm. This layout has the common letters
generation. Note the log scale of the generation number. located in the comfortable home row and gives good hand
This curve shows that the algorithm stabilizes fast after alternation. The following can be noticed from the main set
around 1000 generations (10 minutes of execution). layout:
1. The most frequent key-pair Alef-Lam “ ”الare
1,800,000 under different hands, on the middle row, and under
the strongest fingers (the index and the middle
fingers).
1,750,000
2. Most of the frequent letters are placed in the middle
Fitness Value
similar to the common layout. However, some modifications letter pair times to adjust the model used in Equation (2).
were necessary to accommodate for the additional letters not Moreover, we intend to try to take more factors in the
included in the main set. The letters which were moved to optimization such as typist comfort and error rate.
the shift set are put on keys of similar letters in the main set;
where users expect to find them. For example the pair Tah REFERENCES
" "طand Zah ""ظ, and the pair Alef with Hamza Above ""أ
Arab Standardization and Metrology Organization (ASMO). 1985.
and Hamza " "ءare put on same keys. Standard 449: 7-Bit Arabic Code. Arab League, Cairo, Egypt.
Arab Standardization and Metrology Organization (ASMO). 1987.
Standard 663: Arabic Keyboard Layout. Arab League, Cairo,
Egypt.
Campbell-Kelly M. 2005. The User-friendly Typewriter. The
Rutherford Journal. Retrieved Jul 20, 2008 from
http://www.rutherfordjournal.org/article010105.html
Deshwal P.; and Deb K. 2003. “Design of an Optimal Hindi
Keyboard for Convenient and Efficient Use.” Technical report
KanGAL 2003005, Indian Institute of Technology, Kanpur.
Goettl J.; Brugh A.; and Julstrom B. 2005. “Call Me E-Mail:
Arranging the Keyboard with a Permutation-Coded Genetic
Figures 6: Arabic Keyboard Layout Optimized for Speed Algorithm.” In Proceedings of the ACM Symposium on Applied
Computing (Santa Fe, New Mexico, USA, Mar), 947-951.
A comparison is made between the optimized keyboard Goldberg E. 1989. Genetic Algorithms in Search, Optimization and
Machine Learning. Addison-Wesley Longman, Boston, MA.
layout and some other layouts by applying the fitness
Hiraga Y.; Ono Y.; and Yamada-Hisao. 1980. “An Analysis of the
function. Table 3 presents the results of this comparison. The Standard English Keyboard.” In Proc. 8th Conf. on
worst layout was found by running the genetic algorithm Computational Linguistics (Tokyo, Japan). 242-248.
with the fitness function complemented. The random layout Jordan Institution for Standards and Metrology (JISM). 1994.
was built by generating a random layout. We note that the Minutes of Meeting of the Arabic Letter Committee. Mar 10,
common layout is better than the ASMO standard layout. 1994, Amman, Jordan.
The main reason for this difference is mapping the Light L.; and Anderson P. 1993. “Typewriter Keyboards via
moderately used letter Teh Marbuta “ ”ةto the shift set in the Simulated Annealing.” AI Expert, (Sep).
ASMO layout. Taking the interval between the worst and Noyes, J. 1998. “QWERTY-The Immortal Keyboard.” Computing
optimized layouts as a reference base, the common layout is & Control Engineering Journal 9, No.3 (Jun), 117-122.
Wagner M.; Yannou B.; Kehl S.; Feillet D.; and Eggers J. 2003.
6% better than the ASMO standard and the optimized layout
“Ergonomic Modeling and Optimization of the Keyboard
is 35% better than the common layout. Arrangement with an Ant Colony Algorithm.” European
Journal of Operations Research 148, No.3 (Aug), 672–686
Table 3: Comparison for Arabic Keyboard Layouts Walker C. 2003. “Evolving a More Optimal Keyboard.” Technical
Report (Dec).
Fitness Relative
Layout AUTHOR BIOGRAPHY
Value Performance
Worst 1,942,000 0%
Random 1,773,000 53% Tareq Malas is a bachelor student in the Computer
ASMO 663 1,756,100 59% Engineering Department at the University of Jordan. He is
Apple MAC 1,740,600 64% expected to graduate in Sep 2008. He has won a full
Common 1,735,000 65% scholarship for pursuing postgraduate studies in King
Optimized 1,637,000 100% Abdullah University of Science and Technology (KAUST).