An Intelligent Web-Based Voice Chat Bot: June 2009

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/224564336
An intelligent web-based voice chat bot
Conference Paper · June 2009

DOI: 10.1109/EURCON.2009.5167660 · Source: IEEE Xplore
All content following this page was uploaded by Manoj Lall on 23 January 2018.

AN INTELLIGENT WEB-BASED VOICE CHAT BOT
S. J. du Preez1, Student Member, IEEE, M. Lall2, S. Sinha3, MIEEE, MSAIEE
Abstract: This paper presents the design and development of an intelligent voice recognition chat bot. The paper presents a technology demonstrator to verify a proposed framework required to support such a bot (a web service). While a black box approach is used, by controlling the communication structure to and from the web service, the web service allows all types of clients to communicate with the server from any platform. The service provided is accessible through a generated interface which allows for seamless XML processing, whereby the extensibility improves the lifespan of such a service. By introducing an artificial brain, the web-based bot generates customized user responses, aligned to the desired character. Questions asked of the bot that are not understood are further processed using a third-party expert system (an online intelligent research assistant), and the response is archived, improving the artificial brain's capabilities for future generation of responses.

Index Terms: AI, XML, JAVA, AIML, ALICE.

I. INTRODUCTION

Conventionally web-bots exist; web-bots were created as text-based web-friends, an entertainer for a user [1]. Furthermore, and separately, there already exist enhanced rich site summary (RSS) feeds and expert content processing systems that are accessible to web users. Text-based web-bots can be linked to function beyond an entertainer, as an informer [2], if linked with, amongst others, RSS feeds and/or expert systems. Such a friendly bot could, hence, also function as a trainer providing realistic and up-to-date responses.

The convenience could be improved if the system is not only text-based but also voice-based and voice-trained. This is the problem addressed by this paper.

A conversation is an assimilation of information where one creates differences and similarities during the duration of a conversation. Depending on the level of intelligence, the experience would be enjoyable and a true emulation of a virtual entity. The gradient of intelligence is not the number of correct and incorrect statements but the ability to learn and add to its knowledge base. To create a more user-accessible chat system, a simpler input method using voice is introduced, creating and catering for a more personal and convenient experience.

The process of an online chat system would follow a client-server approach which acquires the signal and streams it to a server. The input voice is then processed and a response is generated. This process places a large processing requirement on the server’s processor and memory resources. This limitation is even more evident when a large number of users are to be simultaneously accommodated on the system.

Voice recognition requires a two-part process of capturing and analysis of an input signal [3]. While the client utilizes the operating system for an input mechanism to acquire a signal, it is for the client to interpret the signal. This process can alleviate processing from the server and allow the server to generate responses faster than when it has more voice processing requirements.

Server response generation can be broken down into two categories: data retrieval and information output. The core focus of this paper is to improve the information output by generating a response that is relevant to the request, factual and personal. This requires aspects of news and an intelligent algorithm to generate informative and user-specific responses.

The paper is divided into the following sections: II. System Architecture, III. System Specifications, IV. Open Source Approach, V. System Implementation, VI. Results, and VII. Conclusion.

II. SYSTEM ARCHITECTURE

The system consists of the following three components: client, server, and content acquisition. The server is a simple object access protocol (SOAP) aware internet application (web service) based on a black box approach. A black box approach isolates the client from interacting with the inner workings of the web service, as opposed to a white box approach, where the inner workings are essential and allow the client to interact with a distributed environment. As shown in Fig. 1, all messages are formatted in an extensible markup language (XML) and encapsulated as a SOAP message pack. The packs are text-based, allowing for a greater diversity of clients and platforms. The client contains the voice recognition processing module, which allows the client to only send and receive plain text.
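As an illustration of the text-only exchange described above, the following JAVA sketch wraps a line of chat text in an XML payload inside a SOAP 1.1 envelope and extracts it again on the receiving side. The class name SoapTextMessage and the chat and text element names are assumptions made for this example only; the actual message schema of the demonstrator is not reproduced here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Sketch only: class and element names are assumptions, not the demonstrator's schema. */
class SoapTextMessage {
    /** Namespace of the SOAP 1.1 envelope. */
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Wraps plain chat text in a hypothetical chat/text payload inside a SOAP body. */
    static String wrap(String chatText) {
        return "<soap:Envelope xmlns:soap=\"" + SOAP_NS + "\">"
                + "<soap:Body><chat><text>" + escape(chatText) + "</text></chat></soap:Body>"
                + "</soap:Envelope>";
    }

    /** Parses the envelope and returns the text carried in the SOAP body. */
    static String unwrap(String envelopeXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(envelopeXml)));
            Node body = doc.getElementsByTagNameNS(SOAP_NS, "Body").item(0);
            return body.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable SOAP message", e);
        }
    }

    /** Escapes the characters that may not appear verbatim in XML text. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String message = wrap("Who is Albert Einstein?");
        System.out.println(message);
        System.out.println(unwrap(message));
    }
}
```

Because both directions carry nothing but text, any client able to build and parse such a string can take part in the exchange, regardless of platform.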
1 S.J. du Preez is a BTech student at the Dept.: Enterprise Application Development, Tshwane University of Technology (TUT), Staatsartillerie Road, Pretoria West, 0001, South Africa (corresponding author phone: +27-83-289-5142; e-mail: sjdupreez@ieee.org).
2 M. Lall is a Senior Lecturer at the Dept.: Enterprise Application Development, TUT, Staatsartillerie Road, Pretoria West, South Africa.
3 S. Sinha is a Senior Lecturer at the Dept.: Electrical, Electronic & Computer Engineering, Carl & Emily Fuchs Institute for Microelectronics (CEFIM), University of Pretoria, Corner of University Road and Lynnwood Road, Pretoria, 0002, South Africa.
404 978-1-4244-3861-7/09/$25.00 ©2009 IEEE

Fig. 1 – System architecture

The web service processes all received queries using the response generation module (based on the Artificial Linguistic Internet Computer Entity (ALICE) [10] system), which makes use of a data repository. The data repository is updated by the content retrieval module to increase the intelligence autonomously (based on the Artificial Intelligence Markup Language (AIML)). With this approach, an external administrator is only required to verify the quality of the self-training function. For a future query, the content is re-processed and incremental updates are made.

III. SYSTEM SPECIFICATIONS

The system presented in this paper meets the following requirements.
• The client application is easily accessible through the client browser.
• All communications to and from the server are text and XML formatted.
• XML messages conform to a schema which describes the format.
• Communication with the server is black box oriented, and the server parses incoming XML messages seamlessly.
• The user is allowed to register and log in to the system, allowing for authenticated, personalized and controlled communication with the server.
• The client applet provides the user two options: text input or voice input.
• The self-training AI module prevents a service bottleneck, and therefore prevents modules from competing for resources.

IV. OPEN SOURCE APPROACH

There are a number of available libraries and open source technologies to implement the specifications presented in this paper. This approach follows the paradigm of building new, custom implementations from existing open source libraries and technologies.

V. SYSTEM IMPLEMENTATION

The main language used to develop the demonstrator of this paper is JAVA [4], and the applet is embedded using HTML. This approach of hiding the JAVA component from the end user creates an illusion of simplicity, as shown in Fig. 2. The website contains the embedded applet and is hosted by the Apache web server [5]. The database and website are managed using open source database management software, MySQL [6].

Fig. 2 – Website with integrated JAVA based applet

The applet requires a number of libraries enabling the processing of voice inputs. The signed libraries were kept on the hosting website, in the same location as the applet. Before the applet launches inside a browser, a loading sequence streamlines the launch of the libraries. The code segment (edited to save space) below illustrates such a loading sequence.

<applet code="client.ICB" archive="Client.jar" name="IWBV">
<param name="cache_archive" value="Client.jar,lib/cmu_time_awb.jar,lib/cmu_us_kal.jar,lib/cmudict04.jar">
<PARAM NAME="cache_archive_ex" VALUE="Client.jar,lib/cmu_time_awb.jar;preload,lib/cmu_us_kal.jar;preload,lib/cmudict04.jar;preload">
<property name="freetts.voices" value="com.sun.speech.freetts.en.us.cmu_time_awb.AlanVoiceDirectory"/>
<PARAM name="freetts.voices" value="com.sun.speech.freetts.en.us.cmu_time_awb.AlanVoiceDirectory"/>
</applet>
The pre-loading of libraries allows for streamlined operation (i.e. the libraries are not called on an ad hoc basis, halting the sequence of activities).
The user is prompted to accept a signature; upon acceptance, the applet can securely communicate with the web service. By using the open source development environment NetBeans [7], the applet and the libraries can be digitally signed. This allows the applet to communicate with a web service not located on its source web server.
The chat client is interrupt driven and activates upon interaction from a user. This is shown in Fig. 3, where the control unit decides what component to launch next and what function to process.

Fig. 3 – Applet management process

The action listener is bound to the buttons (login, register and chat) which are included in the graphical user interface (GUI). Once an event is triggered, either the corresponding SOAP communication module or the voice recognition module or interface is activated. The input and output to these modules rely on user input captured via the GUI.

Voice recognition is accomplished using an open source library called Sphinx 4 [8]. Speech recognition can be broken into two groups with five subsets in total: speakers and speech styles. The speakers group comprises single-speaker (speaker-dependent) and speaker-independent recognition, whereas speech styles include isolated word recognition, connected word recognition and continuous speech recognition.

Isolated word recognition requires long breaks between all words to successfully interpret words. Connected word recognition also requires breaks, but considerably shorter ones. The last subset of speech styles is continuous speech recognition, where one can speak fluently without stopping or pausing between words. This style has problems interpreting similar vowels at the beginning of words.

A vital part of the speech system is contained within a configuration file. As shown in Fig. 4, the configuration file is used to build the recognizer and (or) decoder using the application programming interface (API), including the dictionary. Grammar files, which are used in generating the message, are an integral part of the acoustic model located in the API.

Fig. 4 – Speech recognition framework

The main components of the speech framework consist of the front-end application, decoder, and language model. The front-end acquires and attributes the voice input. The language and acoustic models are used to translate from a standard language (as input to the system) using a dictionary and construction of words located in a look-up table (LUT). A search manager located in the decoder then uses the attributes and LUT to decode the input voice into a result set. From the front-end, the user can activate the configuration manager, loading all components stored in XML format.

When running a speech enabled application, the application requires more than 256 MB in heap size, thus placing excessive memory demands on the system. Although a more toned-down API (Pocketsphinx [9]), focusing on mobile platforms, is available, this paper focuses on Sphinx 4.

Communication with the web service is text based and XML-formatted using SOAP. The interface was generated using the JAVA Architecture for XML Binding (JAXB). An XML schema based API can also be published on a website for other developers to develop their own clients. This process of generating the interface is shown in Fig. 5.

Fig. 5 – Object binding process
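To illustrate the object binding idea of Fig. 5 without reproducing the generated classes, the sketch below is a hand-written stand-in for a bound class: it converts an object to XML text and back. The ChatMessage class and its user and text fields are hypothetical; a JAXB-generated class would instead be derived from the published XML schema, and only standard JAVA XML parsing is used here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hand-written stand-in for a schema-derived binding class (all names are hypothetical). */
class ChatMessage {
    final String user;
    final String text;

    ChatMessage(String user, String text) {
        this.user = user;
        this.text = text;
    }

    /** Object to XML text. */
    String toXml() {
        return "<message><user>" + escape(user) + "</user><text>"
                + escape(text) + "</text></message>";
    }

    /** XML text to object; rejects input that lacks the expected elements. */
    static ChatMessage fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String user = doc.getElementsByTagName("user").item(0).getTextContent();
            String text = doc.getElementsByTagName("text").item(0).getTextContent();
            return new ChatMessage(user, text);
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid message", e);
        }
    }

    /** Escapes the characters that may not appear verbatim in XML text. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        ChatMessage sent = new ChatMessage("guest", "How old are you");
        ChatMessage received = fromXml(sent.toXml());
        System.out.println(received.user + ": " + received.text);
    }
}
```

The benefit of generating such classes from a schema, rather than writing them by hand as done here, is that a change in the message format only requires regenerating the interface.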
All messages are parsed through this interface, which creates structure and validates the XML at the same time. The process of parsing the XML into a usable object structure is termed unmarshalling.
When an object is created, the XML content is handled using the object extraction through the interface class created on startup, as indicated in the next code segment.
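import java.nio.charset.StandardCharsets;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.Unmarshaller;

// Create the JAXB context for the generated binding package "icbxml".
JAXBContext jaxbContext = JAXBContext.newInstance("icbxml");
Unmarshaller u = jaxbContext.createUnmarshaller();
// Route validation problems to the custom event handler.
u.setEventHandler(new ICBXMLValidationEventHandler());
// Feed the received XML text into the connected pipe pair.
// (Messages larger than the pipe buffer must be written from another thread.)
outputPipe.connect(inputPipe);
outputPipe.write(xmlData.getBytes(StandardCharsets.UTF_8));
outputPipe.close(); // end of input, so that unmarshal() can return
// Unmarshal once: the stream is consumed by the first call.
JAXBElement<IcbXml> mElement = (JAXBElement<IcbXml>) u.unmarshal(inputPipe);
IcbXml po = mElement.getValue();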
The interface also provides fault response generation if an XML message is invalid. The error location is then encapsulated in XML using the bound object and sent using SOAP. The interface can be regenerated on demand if the requirements or the message format change.
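The fault response described above can be sketched with the standard javax.xml.validation API: a message is checked against a schema, and when validation fails the reported line and column are encapsulated in a small XML fault. The schema, the FaultSketch class and the fault, line and column element names are assumptions for illustration; the demonstrator builds its fault through the JAXB bound object instead.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXParseException;

/** Sketch only: schema and fault format are hypothetical. */
class FaultSketch {
    /** Hypothetical schema: a message holding a user and a text element, in that order. */
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='message'><xs:complexType><xs:sequence>"
            + "<xs:element name='user' type='xs:string'/>"
            + "<xs:element name='text' type='xs:string'/>"
            + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Returns null for a valid message, otherwise an XML fault naming the error location. */
    static String validate(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no error handler set, the first validation error is thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "<fault><line>" + e.getLineNumber() + "</line><column>"
                    + e.getColumnNumber() + "</column></fault>";
        } catch (Exception e) {
            throw new IllegalStateException("validation could not be run", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("<message><user>guest</user><text>Hello</text></message>"));
        System.out.println(validate("<message><user>guest</user></message>"));
    }
}
```

In a full service the fault string would be sent back to the client inside a SOAP message, so that the client can point the user or developer at the offending part of the request.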
The server maintains a large number of users through the process of synchronized threads. The most basic hypertext transfer protocol (HTTP) connection or request for content is done through sockets. When a client connects through a basic connection request (socket connection) the socket creates a new thread pair (incoming, outgoing and processes) to handle the client's messages. All messages pulled from the socket are then processed through the generated interface object.

The thread pair queues all messages through a synchronized push and pop principle. Messages are pulled from the queue for processing in a controlled fashion, as shown in Fig. 6. This optimizes processing time.

Fig. 6 – Processing thread

At the end of a queue, only the relevant thread would go to sleep for a pre-determined time period. This still allows the other threads to continue parsing messages in the queue and transmitting them out.

When a user logs on, the system will by default greet the user, and then prepare to receive a question or a statement. ALICE [10] is an open source foundation developing AI chat systems. An implementation of the ALICE-bot engine, Chatterbean, uses its algorithm to generate a response using its library for pattern assimilation (targeting), namely AIML files.

The bot is centered on supervised human assistance for a learning process. This helps to remove or indicate correct answers as the system learns from conversations with users and updates the AIML files. AIML files are structured to consist of XML formatting, with context descriptions as indicated in the next code segment.

<category>
  <pattern>How old are you</pattern>
  <template>
    <think><set name="topic">Me</set></think>
    I am as old as the mountains.
  </template>
</category>

Although a large set of AIML files is there to guide the bot, it does not dictate a response. These sets of categories make up groups of questions and answers. Thus the intelligence is limited. If a question appears where no response can be generated (which will happen often in the infancy of an AI system), the system will try to change the topic or send a general statement. This is considered a trick for inadequate knowledge but is a first step for any chat bot.

To assist in the training, a training module was created to process content acquired from the internet related to statements or questions which the system cannot understand. This would require a complex algorithm to not only search for a related content source but also provide concise content. Thus a third-party expert system, “Ultimate Research Assistant” [11], was used to generate a detailed report relating to such a statement. This process is shown in Fig. 7.
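The fallback behaviour and the hand-over to the training module can be sketched as follows. This is a minimal illustration under stated assumptions, not the Chatterbean algorithm: a simplified exact-match lookup on normalized input stands in for AIML pattern matching (real AIML interpreters also support wildcards), and the FallbackBot class, its method names and the fallback sentence are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Queue;

/** Sketch only: exact-match categories with a general fallback and a training queue. */
class FallbackBot {
    /** Hypothetical general statement used when no category matches. */
    static final String FALLBACK = "Let us talk about something else.";

    private final Map<String, String> categories = new LinkedHashMap<>();
    private final Queue<String> unanswered = new ArrayDeque<>();

    /** Registers one pattern/template pair, comparable to one AIML category. */
    void addCategory(String pattern, String template) {
        categories.put(normalize(pattern), template);
    }

    /** Upper-cases the input, replaces punctuation by spaces and collapses whitespace. */
    static String normalize(String input) {
        return input.toUpperCase(Locale.ROOT)
                .replaceAll("[^A-Z0-9 ]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    /** Answers from the categories, or falls back and queues the input for training. */
    String respond(String input) {
        String template = categories.get(normalize(input));
        if (template != null) {
            return template;
        }
        unanswered.add(input);
        return FALLBACK;
    }

    /** Inputs that could not be answered and await the training module. */
    Queue<String> pendingForTraining() {
        return unanswered;
    }

    public static void main(String[] args) {
        FallbackBot bot = new FallbackBot();
        bot.addCategory("How old are you", "I am as old as the mountains.");
        System.out.println(bot.respond("How old are you?"));
        System.out.println(bot.respond("Who is Albert Einstein?"));
        System.out.println(bot.pendingForTraining());
    }
}
```

Queuing the unanswered input, rather than querying the expert system while the user waits, keeps the chat responsive and lets the content acquisition run independently of the conversation.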
Fig. 7 – Content request utilizing the Ultimate Research Assistant

To get the most relevant response from the expert system, the query sent to it needs some refinement. To refine the query, self processing using AIML-assisted categories is used. The process is methodic (Fig. 8): for instance, no result may be generated from the entered query “Who is Albert Einstein?”; the processing then proceeds iteratively to secondary file(s), where the same query is posed. The result of this query may be “a human icon”; the processing proceeds, and the resultant response is a combination of the output sought from the secondary file and a standard response (“I am sorry… provide the right questions.”).

Fig. 8 – ALICE-based processing of statement/question using AIML-assisted categories

In this particular instance, “Albert Einstein” is not understood, but also becomes the refined query. The refined query is then queued as an input to the expert system. As the queue is processed, reports are generated, which further populate the AIML files. This process is shown in Fig. 9.

Fig. 9 – AIML file population process

The process of Fig. 9 is autonomous and needs to be moderated minimally to control the AIML file expansion. This process results in improving the intelligence of the system.

VI. RESULTS

The combination of voice input and voice output allows for a simpler experience which allows a client to run on many types of platforms. Since the client is internet based, the next step would be mobile or even thin client systems. A thin client system is considered an embedded computer or platform with limited processing, where the processing is done by a controlling server. Examples include a mini computer on a fridge, a GPS unit or a mobile phone. Although the aim of this paper was to implement an intelligent virtual online friend, with voice, this integration of technologies can also be used in other applications, particularly when dealing with a need for simple input control accessibility and/or limited processing capabilities.

The system resulted in a distributed environment to allow for resource management and stability between modules. This is shown in Fig. 10, handling the hosting of the site, processing responses using the ALICE-bot engine, content acquisition and processing using an expert system to increase the intelligence of the chat bot autonomously.

The use of the distributed framework allows for an increase in throughput and the number of users it can handle. The lifetime of the expert system can be a limitation to the age of the technology demonstrator presented in this paper.

VII. CONCLUSION

Using modular design for all its components, a distributed environment facilitating transparent and high performance of the overall system has been created. The performance is relative to the processing capacity of the systems involved. Since all the modules are not running off one system, the possible load has been decreased, and further decreased by delegating the voice processing to the chat client communicating with the service.
Fig. 10 – Distributed system framework

The use of an expert system (Ultimate Research Assistant) allows unlimited and autonomous intelligence improvements. Conventional implementations of the ALICE-bot engine required an administrator to update the AIML files manually to increase the intelligence. All content received back from the expert system is processed minimally since the information has already been processed for its relevance. This can be somewhat subjective considering that the intelligence reliance was shifted to another party and when such a third-party system is decommissioned this system would also fail.

The core use of threads allowed multiple processing of incoming and outgoing messages to occur without having to create a waiting scenario or unavailable server due to over use or possible congestion. All new connections to the server spawned a new pair of threads without impacting on other threads related to other users.

The use of such a framework is not limited to chat applications only. The true potential lies with information systems that could be built into the existing framework due to the distributed nature. Depending on the requirements of the integration, the necessary additional components can be introduced on the artificial brain level or the AIML file functionality.

REFERENCES

[1]. Augello A., Saccone G., Gaglio S., Pilato G., Humorist Bot: Bringing Computational Humour in a Chat-Bot System. Proceedings of the International Conference on “Complex, Intelligent and Software Intensive Systems (CISIS)”, 4-7 March 2008, Barcelona, Spain, pp. 703-708.
[2]. Gambino O., Augello A., Caronia A., Pilato G., Pirrone R., Gaglio S., Virtual conversation with a real talking head. Proceedings of the Conference on “Human System Interactions”, 25-27 May 2008, Krakow, Poland, pp. 263-268.
[3]. Vojtko J., Kacur J., Rozinaj G., The training of Slovak speech recognition system based on Sphinx 4 for GSM networks. Proceedings of International Symposium “ELMAR (Electronics in Marine) focused on Mobile Multimedia”, 12-14 Sept. 2007, Zadar, Croatia, pp. 147-150.
[4]. Sun Microsystems, Developer resources for JAVA technology. [Online] http://java.sun.com (Accessed: 30 Oct. 2008)
[5]. The Apache Software Foundation, The Apache HTTP Server Project. [Online] http://www.apache.org (Accessed: 30 Oct. 2008)
[6]. Sun Microsystems, MySQL: The world's most popular open source database. [Online] http://www.mysql.com (Accessed: 30 Oct. 2008)
[7]. Sun Microsystems and CollabNet, NetBeans. [Online] http://www.netbeans.org (Accessed: 30 Oct. 2008)
[8]. Carnegie Mellon University, Sun Microsystems, Mitsubishi Electric Research Laboratories, Sphinx-4 - A speech recognizer written entirely in the JAVA™ programming language, 2004. [Online] http://research.sun.com (Accessed: 30 Oct. 2008)
[9]. Carnegie Mellon University (CMU). Speech at CMU. [Online] http://www.speech.cs.cmu.edu (Accessed: 30 Oct. 2008)
[10]. ALICE AI Foundation, Inc. ALICE. the artificial linguistic internet computer entity. [Online] http://www.alicebot.org (Accessed: 30 Oct. 2008)
[11]. A. Hoskinson, Ultimate Research Assistant. [Online] http://ultimate-research-assistant.com (Accessed: 30 Oct. 2008)

Salomon Jakobus du Preez completed his National Diploma in Technical Applications from the Tshwane University of Technology (TUT), South Africa in 2007. Currently, on a part-time basis, Mr du Preez is nearing the end of his B.Tech programme. On a full-time basis, Mr du Preez serves Prospero SA (Pty) Ltd as a Senior software developer. Mr du Preez has been a graduate student member of the IEEE since 2006.

Manoj Lall completed his B.Eng (Mechanical) and M.Sc. (2005) in Computer Science from the University of South Africa. On a part-time basis, Mr Lall is also pursuing a PhD programme with the University of South Africa (UNISA). He is currently employed as a Senior lecturer by TUT. His research interests include Mobile Agent Systems and Service Oriented Architecture.

Saurabh Sinha (M’02) completed his B.Eng degree (cum laude), M.Eng degree (cum laude) and PhD(Eng) degree from the University of Pretoria, South Africa, in 2001, 2005, 2008 respectively. He is currently employed by the University of Pretoria, South Africa where his main activities include research, undergraduate and postgraduate training. Dr Sinha also serves as a consultant for Business Enterprises at University of Pretoria (Pty) Ltd. Dr Sinha is the Chair of the IEEE South Africa Section. In 2008, Dr Sinha was invited to serve on the IEEE Membership and Geographic Activities Board (MGAB), as a representative to the Educational Activities Board (EAB). Dr Sinha is also a full-voting member of the EAB Operation Committee, as well as the EAB Finance Committee. Dr Sinha also received the South African Institute of Electrical Engineers (SAIEE) Engineer of the year award in 2007.
