An Intelligent Web-Based Voice Chat Bot: June 2009
3 authors, including:
Manoj Lall
Tshwane University of Technology
All content following this page was uploaded by Manoj Lall on 23 January 2018.
Abstract: This paper presents the design and development of an intelligent voice recognition chat bot. The paper presents a technology demonstrator to verify a proposed framework required to support such a bot (a web service). While a black box approach is used, by controlling the communication structure to and from the web service, the web service allows all types of clients to communicate with the server from any platform. The service provided is accessible through a generated interface which allows for seamless XML processing, whereby the extensibility improves the lifespan of such a service. By introducing an artificial brain, the web-based bot generates customized user responses, aligned to the desired character. Questions asked of the bot that are not understood are further processed using a third-party expert system (an online intelligent research assistant), and the response is archived, improving the artificial brain's capabilities for future generation of responses.

Index Terms: AI, XML, JAVA, AIML, ALICE.

I. INTRODUCTION

Web-bots already exist; they were created as text-based web-friends, entertainers for a user [1]. Furthermore, and separately, there already exist enhanced rich site summary (RSS) feeds and expert content processing systems that are accessible to web users. Text-based web-bots can function beyond an entertainer, as an informer [2], if linked with, amongst others, RSS feeds and/or expert systems. Such a friendly bot could, hence, also function as a trainer providing realistic and up-to-date responses.

The convenience could be improved if the system is not only text-based but also voice-based and voice-trained. This is the problem addressed by this paper.

A conversation is an assimilation of information where one creates differences and similarities during the duration of a conversation. Depending on the level of intelligence, the experience would be enjoyable and a true emulation of a virtual entity. The gradient of intelligence is not the number of correct and incorrect statements but the ability to learn and add to its knowledge base. To create a more user-accessible chat system, a simpler input method using voice is introduced, creating and catering for a more personal and convenient experience.

The process of an online chat system would follow a client-server approach in which the client acquires the signal and streams it to a server. The input voice is then processed and a response is generated. This process places a large processing requirement on the server’s processor and memory resources. This limitation is even more evident when a large number of users are to be simultaneously accommodated on the system.

Voice recognition requires a two-part process of capturing and analysing an input signal [3]. While the client utilizes the operating system for an input mechanism to acquire a signal, it is for the client to interpret the signal. This process can alleviate processing from the server and allow the server to generate responses faster than when it has more voice processing requirements.

Server response generation can be broken down into two categories: data retrieval and information output. The core focus of this paper is to improve the information output by generating a response that is relevant to the request, factual and personal. This requires aspects of news and an intelligent algorithm to generate informative and user-specific responses.

The paper is divided into the following sections: II. System Architecture, III. System Specifications, IV. Open Source Approach, V. System Implementation, VI. Results, and VII. Conclusion.

II. SYSTEM ARCHITECTURE

The system consists of the following three components: client, server, and content acquisition. The server is a Simple Object Access Protocol (SOAP) aware internet application (web service) based on a black box approach. A black box approach isolates the client from interacting with the inner workings of the web service, as opposed to a white box approach, where the inner workings are essential and the client is allowed to interact with a distributed environment. As shown in Fig. 1, all messages are formatted in extensible markup language (XML) and encapsulated as a SOAP message pack. The packs are text based, allowing for a greater diversity of clients and platforms. The client contains the voice recognition processing module, which allows the client to only send and receive plain text.
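
The text-only message format described above can be illustrated with a minimal sketch that wraps a recognised phrase in a SOAP 1.1 envelope using only the JDK's built-in XML classes. This is not the message schema used by the system: the payload element names chatRequest and text, and the class name SoapEnvelopeSketch, are hypothetical and chosen purely for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class SoapEnvelopeSketch {
    // Standard SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Wraps plain text in a SOAP envelope; chatRequest and text are hypothetical element names. */
    static String wrap(String recognisedText) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        // Envelope and Body live in the SOAP namespace; the payload has no namespace.
        Element envelope = doc.createElementNS(SOAP_NS, "soap:Envelope");
        Element body = doc.createElementNS(SOAP_NS, "soap:Body");
        Element request = doc.createElementNS(null, "chatRequest");
        Element text = doc.createElementNS(null, "text");
        text.setTextContent(recognisedText); // special characters are escaped on output

        doc.appendChild(envelope);
        envelope.appendChild(body);
        body.appendChild(request);
        request.appendChild(text);

        // Serialise the DOM tree to a string, i.e. the plain text that travels to the server.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(wrap("What is the weather in Pretoria?"));
    }
}
```

Because the envelope is ordinary text, any client able to produce such a string can talk to the web service, which is the platform independence the black box approach aims for.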
1 S.J. du Preez is a BTech student at the Dept.: Enterprise Application Development, Tshwane University of Technology (TUT), Staatsartillerie Road, Pretoria West, 0001, South Africa (corresponding author phone: +27-83-289-5142; e-mail: sjdupreez@ieee.org).
2 M. Lall is a Senior Lecturer at the Dept.: Enterprise Application Development, TUT, Staatsartillerie Road, Pretoria West, South Africa.
3 S. Sinha is a Senior Lecturer at the Dept.: Electrical, Electronic & Computer Engineering, Carl & Emily Fuchs Institute for Microelectronics (CEFIM), University of Pretoria, Corner of University Road and Lynnwood Road, Pretoria, 0002, South Africa.
V. SYSTEM IMPLEMENTATION
The pre-loading of libraries allows for streamlined
operation (i.e. the libraries are not called on an ad hoc
basis, halting the sequence of activities).
The user is prompted to accept a signature; upon
such an acceptance, the applet can securely
communicate with the web service. By using the open
source development environment NetBeans [7], the
applet and the libraries can be digitally signed. This
allows the applet to communicate with a web service
not located on its source web server.
The chat client is interrupt driven and activates
upon interaction from a user. This is shown in Fig. 3,
where the control unit decides what component to
launch next and what function to process.

Fig. 4 – Speech recognition framework
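
The interrupt-driven behaviour of the chat client, in which a control unit picks the next component to run only when an interaction occurs, can be sketched as a small event dispatcher. This is an illustrative assumption rather than the structure shown in Fig. 3: the event names, the handler behaviour and the class name ControlUnitSketch are all hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

final class ControlUnitSketch {
    /** Hypothetical interactions that can wake the otherwise idle client. */
    enum Event { MICROPHONE_INPUT, TEXT_INPUT, SERVER_REPLY }

    // One handler per event; each stands in for a client component.
    private final Map<Event, Function<String, String>> handlers = new EnumMap<>(Event.class);

    ControlUnitSketch() {
        // Placeholder components: each returns a label describing what it would do.
        handlers.put(Event.MICROPHONE_INPUT, payload -> "recognise:" + payload);
        handlers.put(Event.TEXT_INPUT, payload -> "send:" + payload);
        handlers.put(Event.SERVER_REPLY, payload -> "display:" + payload);
    }

    /** Invoked only when an interaction occurs; selects and runs the matching component. */
    String dispatch(Event event, String payload) {
        Function<String, String> handler = handlers.get(event);
        if (handler == null) {
            throw new IllegalStateException("No component registered for " + event);
        }
        return handler.apply(payload);
    }

    public static void main(String[] args) {
        ControlUnitSketch controlUnit = new ControlUnitSketch();
        System.out.println(controlUnit.dispatch(Event.TEXT_INPUT, "hello"));
    }
}
```

Nothing runs between interactions in this sketch, so the client consumes no processing time while the user is idle.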
All messages are parsed through this interface,
which creates structure and validates the XML at the
same time. The process of parsing the XML into a
usable object structure is termed unmarshalling.
When an object is created, the XML content is
handled using the object extraction through the
interface class created on startup, as indicated in the
next code segment.
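import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.Unmarshaller;

// Build the JAXB context for the generated "icbxml" package.
JAXBContext jaxbContext = JAXBContext.newInstance( "icbxml" );
Unmarshaller u = jaxbContext.createUnmarshaller();
// Report validation problems while the XML is being read.
u.setEventHandler(new ICBXMLValidationEventHandler());
// Feed the XML text to the unmarshaller through the connected pipes.
// Writing and reading on one thread assumes the message fits in the pipe's buffer.
outputPipe.connect(inputPipe);
byte[] bytes = xmlData.getBytes();
outputPipe.write(bytes);
outputPipe.close(); // signal end of input so that unmarshal() can return
// Unmarshal once only: a second unmarshal() would find the stream already consumed.
@SuppressWarnings("unchecked")
JAXBElement<IcbXml> mElement =
    (JAXBElement<IcbXml>) u.unmarshal(inputPipe);
IcbXml po = mElement.getValue();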
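
The pipe-based hand-over in the code segment above can be made independent of message size by writing on a separate thread. The following is a minimal sketch of that idea and not the implementation used in the system: it swaps JAXB for the JDK's built-in DOM parser so that it runs without extra libraries, and the class name PipedXmlSketch and the sample root element icbxml are assumptions for illustration.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class PipedXmlSketch {
    /** Parses XML text that is streamed through a pair of connected pipes. */
    static Document parseThroughPipe(String xmlData) throws Exception {
        PipedInputStream inputPipe = new PipedInputStream();
        PipedOutputStream outputPipe = new PipedOutputStream(inputPipe);

        // Write on a separate thread so that a message larger than the pipe's
        // internal buffer cannot block the thread that reads it.
        Thread writer = new Thread(() -> {
            try (PipedOutputStream out = outputPipe) {
                out.write(xmlData.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                // The reader closed the pipe early; the parser reports the underlying error.
            }
        });
        writer.start();

        try (PipedInputStream in = inputPipe) {
            // Closing the output pipe in the writer marks end of input for the parser.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } finally {
            writer.join();
        }
    }

    public static void main(String[] args) throws Exception {
        // "icbxml" as a root element name is a hypothetical example.
        Document doc = parseThroughPipe("<icbxml><text>hello</text></icbxml>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The same pattern would apply with a JAXB Unmarshaller in place of the DOM parser, since both read from an InputStream until the writing side is closed.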
source but provide for concise content. Thus a third-party
expert system, “Ultimate Research Assistant”
[11], was used to generate a detailed report relating to
such a statement. This process is shown in Fig. 7.