
Available online at www.sciencedirect.com

Computers & Education 51 (2008) 403–422


www.elsevier.com/locate/compedu

Comprehensive evaluation criteria for English learning websites using expert validity surveys

Ya-Ting C. Yang *, Chia-Ying Chan
Institute of Education & Centre for Teacher Education, National Cheng Kung University, Taiwan, ROC

Received 21 November 2006; received in revised form 14 May 2007; accepted 25 May 2007

Abstract

This study aimed to develop a set of evaluation criteria for English learning websites. These criteria can assist English
teachers/web designers in designing effective websites for their English courses and can also guide English learners in
screening for appropriate and reliable websites to use in increasing their English ability. To fulfill our objective, we
employed a three-phase research procedure: (a) establishing a preliminary set of criteria from a thorough review of the
literature, (b) evaluating and refining the preliminary criteria by conducting interviews with in-service teachers and learn-
ers, and (c) validating and finalizing the criteria according to expert validity surveys. The established criteria have 46 items,
classified into 6 categories (the number of items within the category) – general information (12), integrated English learning
(13), listening (4), speaking (6), reading (5), and writing (6). The general information evaluates the authority, accuracy, and
format of the learning websites. The integrated English learning evaluates the overall information relevant to English
learning materials as well as the common features of the four language skills. The criteria for listening, speaking, reading,
and writing, for example, examine the suitable intonation, skills of discourse, classification of reading articles by their attri-
butes, and the proper use of discussion boards for students when practicing their writing skills. Based on qualitative and
quantitative analysis of the interviews and expert validity surveys, we confirmed the effectiveness of the developed evalu-
ation criteria with satisfactory indexes of inter-rater reliability, content validity, and factorial validity.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Teaching/learning strategies; Distributed learning environments; Human–computer interface; Interactive learning environments

1. Introduction

Nowadays, the world is a global village in which people communicate through a common language:
English, widely recognized as the universal language of the international community. Crandall
(2003) identified 1.5 billion speakers of English, 350–450 million of whom speak English as a first language,
and an equivalent number speak English as a second language. Internet World Stats (2007) found that the

*
Corresponding author. Tel.: +886 6275 7575x56230; fax: +886 6276 6493.
E-mail address: yangyt@mail.ncku.edu.tw (Y.-T.C. Yang).

0360-1315/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compedu.2007.05.011

major language used on the Internet is English, at 30%. Obviously, the Internet has provided a unique learning
channel and great opportunity for English learners to practice and learn English. Warschauer, Shetzer, and
Meloni (2000) remarked, ‘‘Learners of the English language could practice the language 24 hours a day with
native speakers or other learners around the world.’’ Vogel (2001) stated, ‘‘The web serves as a platform for
communicative exchanges between learners and native speakers or between different groups of learners in dif-
ferent countries’’.
From a review of 246 research articles published in 1990–2000, Liu, Moore, Graham, and Lee (2003) found
that the use of the computer as a tool to assist language learning is growing dramatically; hence, English-relevant
learning websites are expected to expand exponentially under this trend. Along with this trend, however,
the large number of English as Second Language (ESL) websites has made it extremely difficult for users to
make the right choices. That is, it is doubtful that the quality of these websites has increased with their quan-
tity. Warschauer et al. (2000) defined an unstructured website as a simple place for users to surf aimlessly with
little direction. However, a good website not only provides users with a place to surf, but also helps users cre-
ate some ‘‘waves’’ from surfing. Recently, some language educators have started to integrate the content of
second language learning into technology and also indicate specific directions for designing online resources
(Hemard & Cushion, 2000; Lonfils & Vanpary, 2001; Peterson, 2000a, 2000b; Plass, 1998; Susser, 2001; Ter-
gan, 1998). Holliday (1999) organized six guidelines for design of the medium in computer-assisted language
learning (CALL) environments. Susser and Robb (2004) developed five modules as a checklist of Web-based
CALL for ESL. Lasagabaster and Sierra (2003) researched the evaluation of a CALL program from students’
viewpoints. From the research, they found that presenting abundant information in text, images, music, and
sounds to enhance language learning involvement is important on language learning websites. Plass (1998)
emphasized that applying cognitive approaches is the appropriate way to design and evaluate Second Language
Acquisition (SLA) resources in hypermedia. These approaches can influence both linguistic and
pragmatic competencies, which in turn inform the cognitive design of the interface. Proper criteria will guide educators,
learners, and other Internet users toward valuable insights from a website and save them the time otherwise
spent searching through unreliable information and websites (Harris, 1997; Laura, 1999; Wilkinson, Bennett, & Oliver,
1997).
There is a significant amount of literature on the evaluation of information on general websites. To name a
few, Sonoma State University (2005) developed a set of criteria to evaluate the content of general websites.
Nielsen Norman Group (2006) designed important principles to evaluate the interface design of websites.
Besides these, there are other organizations or authors who designed sets of criteria for evaluating overall web-
site information, such as the content, objectivity, currency, navigation, and authority (American Association
of Law Libraries, 2005; Laura, 1999; Susan, 2006; University of Maine System, 2004; Wilkinson et al., 1997).
These general criteria are the cornerstone for creating a good website. However, general criteria alone are
insufficient for evaluating educational websites and their information. A good learning website should further
consider the aspects of learners’ attributes, learning motivation, presentation of online resources, and interac-
tion among users (Furner & Daigle, 2004; Lee, Choi, & Byun, 1996). Learning websites generally aim to be
teaching aids for instruction or self-learning materials for learners. Thus, the principles for developing the cri-
teria should be based on learning theories and the characteristics of the content (Najjar, 1998; Reeves, 2001).
For instance, learning websites should include clear statements of instructional objectives, theory-based activ-
ities, and approaches to motivate learners’ involvement in the process. Clayton (2006) developed a set of cri-
teria for evaluating the quality of online courses. Each criterion was selected to identify specific course
components, qualities, or procedures proven to be helpful to learners and/or instructors.
In addition, as Williams and Burden (1997) noted, English learning is distinct from other subjects because
of the attributes of language acquisition. There is no question that learning a foreign language is different from
learning other subjects, mainly because of the social nature of such a venture. Language, after all, belongs to a
person’s whole social being; it is a part of one’s identity, and is used to convey this identity to other people.
The learning of a foreign language involves far more than simple skill learning; it involves the adoption of new social and
cultural behaviors and ways of being, and therefore has a significant impact on the social nature of the learner.
Language learning is not just a subject matter, but also acquisition of a culture, social rules, linguistic func-
tions, and psychological reactions (Abbott, McKeating, Greenwood, & Wingard, 1981; Davies, 2001; Dör-
nyei, 1996; Williams & Burden, 1997). In other words, good speech is produced in the appropriate

situation with the proper communicative manner. The meaning of a language differs when people say the
same words in different social settings and from different psychological states. Because language learning is
unique, the criteria to evaluate English learning websites may have to be designed specifically for English
learning, and be distinctly different from other subjects. The critical points to evaluate English learning web-
sites are emphasized in communication and situational settings (Davies, 2001) since the language generates
different meanings in different situations with various people.
After our review of the literature, we came up with five main reasons that explain the need for the devel-
opment of evaluation criteria for English learning websites:

1. Many studies (Bradin, 1999; Comer & Geissler, 1998) focused on developing general guidelines for evaluation,
but they were not specific enough to address the particular aspects and concerns of English
learning.
2. The existing evaluation criteria for evaluating English learning websites, courseware, or resources are sum-
marized in Table 1. These criteria are significant points to consider when creating CALL-related websites.
However, they are not specific enough for English learning websites due to a lack of emphasis on the eval-
uation of the transmission of the four language skills (listening, speaking, reading, and writing) to learners.
The website should organize the learning objectives based on the interaction of the four language skills and
also provide learning opportunities for the semantic, pragmatic, and sociolinguistic usages.

Table 1
Existing evaluation criteria for evaluating English learning Websites
Author Evaluation criteria
Hubbard (1988) 1. Giving meaningful rather than mechanical practice, contextualized in a coherent discourse larger than a
single sentence
2. Providing hints of various types to lead students to correct answers
3. Accepting appropriate alternative correct answers within a given context
4. Offering the option of explanations for why correct answers are correct
5. Anticipating incorrect answers and offering explanations for why they are incorrect
Chapelle (1998) 1. Making key linguistic characteristics salient
2. Offering modifications of linguistic input
3. Providing opportunities for ‘‘comprehensible output’’
4. Providing opportunities for learners to notice their errors
5. Providing opportunities for learners to correct their linguistic output
6. Supporting modified interaction between the learner and the computer
7. Acting as a participant in L2 tasks
Comer and Geissler 1. Content: (a) Quality; (b) Depth; (c) Tests
(1998) 2. Interface: (a) Ease of use; (b) Navigation; (c) Text quality; (d) Graphics; (e) Sound; (f) Technical
3. Interactivity: (a) Feedback
4. Sequence: (a) Questions
5. Classroom related issues: (a) Entry level/Technical requirements; (b) Motivation; (c) Backwash;
(d) Management
6. Support: (a) Online help; (b) Off-line help
Bradin (1999) 1. Feasibility: (a) Will the software run on your computer and platform? (b) Will the software run on your
network? (c) Can the software be made available to many students? (d) Does the software require Internet
access? (e) Can you afford the software?
2. Quality
2.1 Content: (a) What is the goal of the software? (b) Is the level appropriate? (c) Is the content accurate and
up-to-date? (d) Is the material culturally appropriate? (e) Does the software accommodate the students’
learning styles and preferences? (f) Is the software interesting? (g) How flexible is the software?
2.2 Format: (a) Is the interface consistent? (b) Is the screen display effective? (c) Are the motivational
devices effective?
2.3 Operation: (a) Is the software easy to use? (b) Can the text and graphics be printed? (c) How much con-
trol are the learners allowed? (d) How interactive is the software? (e) Are the quality and degree of feedback
adequate? (f) What kinds of records does the software keep?

3. Most research has only developed some guidelines, not a complete set, in one of the language learning
aspects such as listening or speaking skills (Chapelle, 1998; Hubbard, 1988; Susser & Robb, 2004). That
is, relatively few research studies have been conducted on the exploration of a complete set of criteria in
one or all of the language learning aspects (listening, speaking, reading, and writing). Thus, this situation
places a heavy burden on instructional designers, teachers, and learners interested in designing, selecting,
and using English learning websites; as a result, they make use of different discrete criteria and a variety
of resources and evaluation methods to select and design English learning websites.
4. The existing studies on evaluation criteria for language learning were almost entirely based on theoretical
concepts (Najjar, 1998). Although good research should be based on theory, in the process of evaluating
learning materials, the opinions and needs of teachers and learners must be included (as they are also
experts in their teaching and learning as well as users of well-developed materials). Thus, there is a demand
to collect teachers’ opinions and suggestions to clarify their preferences for websites to use in instruction as
well as to organize learners’ insights and impressions so as to identify their needs for learning English.
5. Most of the developed criteria (Chapelle, 1998; Najjar, 1998; Lonfils & Vanpary, 2001) were not validated
by empirical research. This lays a weak foundation for making design and selection decisions and slows
progress in making English learning websites more effective. A content validity study should be conducted
to substantiate the validity of complex constructs with valid and reliable measures (Rubio, Berg-Weger,
Tebb, Lee, & Ruch, 2003).

According to the above reasons and needs, the goal of this study is to develop a complete, specific, and
validated set of evaluation criteria for English learning websites which can assist English teachers/course
designers in designing effective websites for their English courses and can also guide English learners in screen-
ing for appropriate and reliable websites to help them increase their English ability.

2. Methods

To develop an effective set of evaluation criteria, we proposed a three-phase research procedure that
consists of (Phase I) synthesizing and establishing a preliminary set of criteria from a thorough review of the
literature, (Phase II) evaluating and refining the preliminary criteria by conducting interviews with in-service
teachers and learners, and (Phase III) validating and finalizing the refined criteria according to expert validity
surveys. In Phase I, we reviewed the literature that discusses the relevant theories and research on the evalu-
ation of English learning websites. We established a preliminary set of evaluation criteria by synthesizing the
important themes collected from our literature review and practical experience on popular English learning
websites and also by adding new criteria that had not been developed by previous studies.
In Phase II, we first evaluated the preliminary criteria by interviewing teachers and students who had expe-
rience using English learning websites to gather more ideas/suggestions regarding the evaluation criteria. Dur-
ing the interview, the experienced teachers and students were asked to review some English learning websites
that they were familiar with, as well as those recommended by us. After reviewing the above websites, they
provided their insights and impressions on what the most important elements for a successful English learning
website are. Based on the interviews, we refined the preliminary criteria by adding new criteria, getting rid of
the insignificant ones, and retaining the good ones. Finally, in Phase III, we validated the refined criteria
through expert validity surveys and finalized our evaluation criteria based on the viewpoints of a panel of
experts/scholars in the field of CALL.
In the above research procedure, we employed two research methods – interviews and expert validity sur-
veys, which have been used to collect both qualitative and quantitative data. The details on the participants,
data collection instruments, and data analysis for each research method are presented in the following
subsections.

2.1. Interviews

The purpose of the interviews was to obtain more specific ideas about the evaluation criteria from experi-
enced in-service teachers’ and learners’ points of view. Based on the different viewpoints received from the

interviews, we confirmed the significance of the preliminary criteria established from a review of the literature
and refined the scope of the themes of the criteria.

2.1.1. Participants
There were a total of 16 participants, including 8 students and 8 English teachers selected from junior high
schools across different demographic areas of Taiwan. The scale of the schools included small-sized (12 or fewer
classes), medium-sized (13–48 classes), and large-sized (49 or more classes). A purposive sampling was used in this
study based on the interviewees’ experience in using websites. The selected students were from both 1st and
2nd years of junior high school and had studied English for 3–5 years. They spent more than 2 h every day
surfing various websites, such as entertainment, search engines, and English learning websites recommended
by their teachers or parents. The teachers were chosen because of their experience (at least 1 year) in surfing or
designing English learning websites. They had surfed English learning websites for more than 5 years and
used computers to assist their English teaching for at least 1 year. The multimedia they used for computer-
assisted teaching included websites, PowerPoint, Flash, CD players, and movies.

2.1.2. Data instruments


In order to attain comprehensive concepts from the participants, the format of the interview questionnaire
was structured to widen participants’ perspectives. The interview included an introduction, eight open-ended
questions (see Appendix A), and participants’ background information on surfing websites and learning Eng-
lish, as well as their opinions about English learning websites. We provided the interview questions in advance
to the interviewees to improve the quality of the interviews and obtained their consent to record the whole
conversation. Interviews lasted approximately 45 min per student and 75 min per teacher on average.

2.1.3. Data analysis


During the interviews, the researcher took notes which were relevant to the evaluation criteria of English
learning website design, and audiotaped the conversation in the interview. The procedure for the data analysis
was as follows: (1) collecting and evaluating interviewees’ opinions on the items of the preliminary criteria
according to the recorded information, (2) adding and revising the preliminary criteria based on an analysis
of the interviewees’ opinions, and (3) finalizing the complete items of the evaluation criteria (refined criteria)
for the expert validity survey.

2.2. Expert validity survey

This survey was used to validate the refined criteria through a panel of experts’ evaluation. Based on the
validation, the final version of the evaluation criteria was produced for the screening and design of quality English
learning websites. The detailed information is explained as follows.

2.2.1. Participants
The panel of experts consisted of professors in the area of CALL programs; all the selected experts had a
background in educational technology and language teaching fields. The population was stratified into three
geographic areas, Northern, Central, and Southern Taiwan, from national and private universities, science
and technology universities, and teachers colleges in Taiwan. All the experts had taught language courses
for over 5 years. Thus, they had rich experience in teaching as well as in research. Rubio et al. (2003) stated
that 6–20 participants is an adequate number of panel experts. In general, more experts generate more
information about the measure. Thus, we invited 17 experts to validate the refined criteria in this study.

2.2.2. Data instruments


An expert validity survey was designed for the invited experts to evaluate the refined criteria by providing
their suggestions and revisions. Through content validity, it is believed that a panel of experts can further
enhance the quality of the refined criteria. The survey included three parts: directions, review content, and
comprehensiveness (Grant & Davis, 1997; Rubio et al., 2003). An exemplification developed by revising Rubio
et al.’s model is shown in Table 2.

Table 2

The detailed information about the three parts in this survey is as follows:

(a) Directions: First of all, we clarified the purpose for the invited experts to validate the items in the
questionnaire. Clear and concise instructions were given to help the experts to state their opinions
efficiently.
(b) Review content: The review content is the main section in the survey. The experts were asked to revise
and rate each item of the refined criteria in this phase. There were four indexes for experts to use in eval-
uating each item: factors, representativeness, importance, and clarity:
1. Factors: We assigned each item of the evaluation criteria with a factor beforehand. The experts were
asked to provide their opinions on our initial factor assignment by checking one of the following
options: (1) Yes, it belongs to factor #, or (2) No, it should be assigned to factor #, with comments
to further clarify their judgments.
2. Representativeness (Rep.): Representativeness determines whether the statement of an item
represents its assigned factor, rated on a four-point scale. A rating of four is the most representative of the factor.
Experts can also provide a revision or comments under the rating. That is, if the expert thinks the
item is not representative of the corresponding factor, they can state reasons as to why the item might
be more appropriate for another factor.
3. Importance (Imp.): Importance represents whether the item is a crucial statement for the
factor, rated on a four-point scale. A rating of four indicates that the item is the most important.
4. Clarity (Cla.): Clarity represents whether the wording or sentences are clear enough for users to follow,
rated on a four-point scale. A rating of four represents the highest clarity. The experts can offer some rec-
ommendations about unclear statements. Through this step, the researcher can gather useful and
important opinions towards revising the statement of the criteria.
(c) Comprehensiveness: The experts are asked to write down their opinions, suggestions, or recommenda-
tions about the comprehensiveness of the expert validity survey (the entire measurement). This can
improve the quality of the instrument.

2.2.3. Data analysis


Determining the content validity of an instrument is a necessary task when conducting any form of
social science research using a new survey instrument (Rubio et al., 2003). The purpose of conducting
a content validity study is to analyze the measure and determine whether the expert validity survey
used in this study is valid. According to Rubio et al., three analyses need to be performed – Inter-rater Reli-
ability (IR), Content Validity Index (CVI), and Factorial Validity Index (FVI). Furthermore, the averages
of Rep., Imp., and Cla. on a four-point Likert-type scale in the refined and finalized criteria were
calculated.
First, the IR determines how well the participants in the content validity study agree with each other
based on their responses to Rep. Some investigators recommend using Intraclass Correlation Coefficient
(ICC) for calculating IR when there are more than two raters (Garson, 2001; Wuensch, 2003). Because
the panel of experts consisted of 17 professors, ICC was adopted to compute the reliability among the raters
in this study. Given the purposes and participants of this study, a two-way mixed-effects model with measures
of consistency was selected for computing the statistics. This model was selected because
the experts were not chosen at random to participate in the study. Furthermore, the purpose
of the IR is to measure whether the raters’ scores are highly correlated. Barrett (2001) stated that inter-rater/
intra-class r > .74 is excellent, .60–.74 is good, .40–.59 is fair, and <.40 is poor. The IR for the evaluation
criteria of English learning websites is .72, which indicates good agreement. The results and discussions are
provided in Section 3.
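
The average-measures consistency ICC for a two-way mixed model (commonly denoted ICC(3,k)) can be sketched from a ratings matrix using the standard ANOVA mean squares. This is an illustrative implementation only, not the authors' actual computation (they presumably used statistical software):

```python
import numpy as np

def icc_consistency(ratings):
    """Average-measures consistency ICC under a two-way mixed model,
    ICC(3,k). ratings: (n_items x n_raters) array of scores."""
    r = np.asarray(ratings, dtype=float)
    n, k = r.shape
    grand = r.mean()
    ss_total = ((r - grand) ** 2).sum()
    ss_items = k * ((r.mean(axis=1) - grand) ** 2).sum()   # between items
    ss_raters = n * ((r.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_error = ss_total - ss_items - ss_raters             # residual
    ms_items = ss_items / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_items - ms_error) / ms_items
```

When every rater is a constant offset from the others, the residual vanishes and the statistic is 1.0; disagreement pulls it toward 0.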
Second, the CVI determines whether each item and the instrument as a whole are valid. This analysis is based
on the responses in Rep. The CVI is calculated by counting the number of experts who rated the item as three
or four and dividing that number by the total number of experts. The appropriate CVI should have a
minimum of .80 (Davis, 1992). The CVI for the measurement is calculated by estimating the average CVI

across the items. The CVI for the evaluation criteria of English learning websites is .94, which is reported in
Section 3.
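
The CVI counting rule described above can be sketched directly; the ratings matrix below is hypothetical, with rows as items and columns as experts on the four-point Rep. scale:

```python
import numpy as np

def item_cvi(rep_ratings):
    """Item CVI: proportion of experts who rated the item 3 or 4 on Rep."""
    r = np.asarray(rep_ratings)
    return float((r >= 3).mean())

def scale_cvi(ratings_matrix):
    """Scale CVI: average of the item CVIs across all items."""
    m = np.asarray(ratings_matrix)
    return float(np.mean([(row >= 3).mean() for row in m]))

# Hypothetical example: 3 items rated by 4 experts
ratings = [[4, 4, 3, 2],
           [3, 4, 4, 4],
           [4, 3, 3, 3]]
```

Under the .80 minimum (Davis, 1992), an item scoring below that threshold would be a candidate for revision or removal.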
The final analysis is the FVI, which determines how each item and the instrument as a whole are associated
with the appropriate Factors included in the content validity survey. The FVI is calculated by counting the
number of experts who agreed with the item in the assigned factor and dividing that number by the total num-
ber of experts. Each item has an FVI from individual expert responses. Again, the FVI of the measurement is
the average taken across the items. Rubio et al. (2003) recommended that an acceptable FVI should be at least
.80. The FVI for the evaluation criteria of English learning websites is .97, which is above the requirement. The
detailed results are presented in Section 3.
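
The FVI follows the same counting pattern, substituting factor agreement for the 3-or-4 rating; a minimal sketch with hypothetical agreement data:

```python
def item_fvi(agreements):
    """Item FVI: fraction of experts who agreed with the assigned factor.
    agreements: list of booleans, one per expert."""
    return sum(agreements) / len(agreements)

def scale_fvi(per_item_agreements):
    """Scale FVI: average of the item FVIs across all items."""
    fvis = [item_fvi(a) for a in per_item_agreements]
    return sum(fvis) / len(fvis)
```

As with the CVI, Rubio et al.'s (2003) .80 floor would apply to both the item and scale values.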

3. Results and discussion

3.1. Phase I: preliminary criteria

In Phase I of our research procedure, we reviewed the literature that focused on the following three themes:

• Evaluation criteria for website resources: We investigated and summarized the criteria for general websites,
compiling general guidelines to be used in designing a sound website.
• Evaluation criteria for learning websites: We strengthened the theoretical concept of designing websites
based on web-relevant learning theories and approaches, including programmed instruction, semantic networks,
situated cognition, and discovery learning. In addition, the existing research about evaluation criteria
for learning websites has been reviewed and summarized.
• Evaluation criteria for English learning websites: We (1) observed the current English learning websites cre-
ated by educational organizations, commercial organizations, and individuals, such as teachers and learn-
ers, (2) reviewed the CALL-related literature and then categorized the criteria for English learning websites
into integrated English learning, listening, speaking, reading, and writing, and finally, (3) studied four cru-
cial elements of language learning – sociolinguistics, pragmatics, psycholinguistics, and culture, as well as
language teaching approaches/methods – communicative language teaching, situational language teaching
and task-based language teaching.

According to a review of the above literature, we then established a preliminary set of evaluation criteria by
synthesizing and adding the new criteria that had not been developed by the previous research. The prelimin-
ary criteria consisted of 6 thematic categories with a total number of 74 items: (I) general information (20
items), (II) integrated English learning (23 items), (III) listening (9 items), (IV) speaking (6 items), (V) reading
(8 items), and (VI) writing (8 items). The general information criteria are used to evaluate
the authority, navigation, and accuracy of an English learning website, while the integrated English
learning criteria are used to evaluate the quality of integrated English learning as well as the common instructional
approaches of teaching language skills on the website. The listening category is used in evaluating the listening
part of the website design, such as the appropriate intonation, stress, and interactive designs. The speaking
category helps determine whether the functionality of the speaking sections is well-designed. The evaluation
items include the record device, feedback design, and opportunities to communicate with other people. The
criteria for reading can be used in checking the new vocabulary, reading skills, and reading materials. Finally,
the criteria for writing mainly aim to help in evaluating the discussion topics, writing samples, and multimedia
to enhance learners’ ability to write.

3.2. Phase II: refined criteria

In Phase II, interviews were applied to obtain both teachers’ and learners’ opinions about the standards for
a good English learning website as well as their comments/suggestions on the preliminary criteria. The follow-
ing steps were performed to conduct the interviews: e-mailing the interview questions in advance, recording
the information from the interviews, inquiring about the English learning websites they were accustomed to
browsing, and discussing specifically how to design a good English website. After examining the data collected

from the interviews, we found that there were some similarities and differences between teachers’ and students’
opinions, as follows:

(a) Similarities: Both groups proposed similar ideas about the general information criteria, for instance, the
currency of links and information, explicit design of the format, and clear objectives. In addition, they
considered that training in speaking skills might be limited by current technology. Reading sections
should go along with listening materials to enhance both language skills.
(b) Differences: Students focused more on the application of multimedia, such as context with pictures,
background music, and English learning with animation/cartoons. However, the teachers tended to
place little emphasis on the format of the website, but focused more on educational psychology and lin-
guistic knowledge skills. In addition, most of the students expressed much interest in daily experience or
stories as the learning materials while the teachers believed that diversified materials are helpful in Eng-
lish learning.

We refined the preliminary criteria and included some new evaluation items based on the suggestions/com-
ments obtained from the interviews. The refined criteria, which contain 6 categories with 92 evaluation items
(shown in Appendix B), were the main content of the expert validity survey.

3.3. Phase III: finalized criteria

In research Phase III, we performed an expert validity survey to improve the quality of the refined criteria
based on the experts’ suggestions and revisions. The analyses included examining the Rep., Imp., Cla., CVI,
and FVI of each item of the refined criteria and of the overall survey, as well as the IR among the experts.
Items were retained for the finalized criteria if their Rep. and Imp. ratings reached 3.50 on a four-point
Likert-type scale. For an item rated below 3.50, we carefully considered the experts’ opinions and either
amalgamated it with an existing item or removed it from the finalized criteria. Some items with ratings above
3.50 were still revised when their original statements did not describe the concepts/ideas clearly enough. In
addition, based on the experts’ suggestions and our analysis of the Rep. and Imp. of the criteria, items
recommended by the experts were added to the finalized categories. Then, we recalculated the Rep., Imp.,
Cla., CVI, and FVI of the remaining criteria; after these criteria were revised according to the Cla. ratings and
the experts’ suggestions, they became the finalized criteria. In the following subsections, we present the
descriptive statistics, the analyses of the items in each category, and the analyses of the overall survey design.
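The retention rule and per-item indices described above can be sketched in code. This is an illustrative sketch only: the study does not restate its formulas, so the definitions of CVI (proportion of experts rating representativeness 3 or 4) and FVI (proportion of experts sorting an item into its intended category) are assumed from the cited Rubio et al. (2003), and all ratings and category names below are invented for illustration.

```python
from statistics import mean

def item_stats(rep, imp, cla, sorted_into, intended_category):
    """Per-item validity indices, assuming the definitions of Rubio et al. (2003):
    CVI is the proportion of experts rating representativeness 3 or 4 on the
    four-point scale; FVI is the proportion of experts who sort the item into
    its intended category. rep/imp/cla are lists of expert ratings."""
    return {
        "Rep": mean(rep),  # mean representativeness rating
        "Imp": mean(imp),  # mean importance rating
        "Cla": mean(cla),  # mean clarity rating
        "CVI": sum(r >= 3 for r in rep) / len(rep),
        "FVI": sum(c == intended_category for c in sorted_into) / len(sorted_into),
    }

def retained(stats, threshold=3.50):
    """Retention rule from Phase III: keep an item only if both its
    Rep. and Imp. ratings reach 3.50 on the four-point scale."""
    return stats["Rep"] >= threshold and stats["Imp"] >= threshold

# Illustrative ratings from four hypothetical experts for one item:
s = item_stats(rep=[4, 4, 3, 4], imp=[4, 3, 4, 4], cla=[4, 4, 4, 3],
               sorted_into=["listening"] * 4, intended_category="listening")
print(retained(s))  # True: mean Rep. = 3.75 and mean Imp. = 3.75
```

An item rated, say, rep = [4, 3, 3, 3] (mean 3.25) would fail the rule and be either amalgamated with an existing item or removed, mirroring the procedure described above.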

3.3.1. Criteria for general information


The criteria of this category were used to evaluate the authority, navigation, accessibility, accuracy, state-
ment of ownership and use, as well as content information of the website. This category plays a role as the
gateway for users in identifying whether the website is worth surfing more. The ratings of Rep., Imp., Cla.,
CVI, and FVI for the refined criteria were 3.51, 3.47, 3.72, .85, and .97, respectively. According to an analysis
of the expert survey on this category, 4 new items have been included in the finalized criteria. These items are
accuracy of web content, adaptive design, identification of copyright, and the best browser resolution. Based
on the ratings of the experts, we removed 7 less representative items, such as providing a site map and search
engines on the website. Moreover, we amalgamated 6 items into 2 to make the criteria
more concise. For example, the 4 similar items – surfing the website easily, surfing in plain designs, suitable
linkage from screen to screen, and classifying data by their characteristics, were used to evaluate whether
the resources were placed in an organized format for users to browse efficiently. Thus, these items were com-
bined into 1 item – The format design is clear and easy to browse (e.g. the icons, graphs, and words on hyper-
links are explicit, and the site map and help information are available). As a result, there are a total of 12 items
in this category. The ratings of Rep., Imp., Cla., CVI, and FVI in the finalized criteria were 3.60, 3.55, 3.70,
.89, and .98, respectively. Most of the ratings improved based on the experts’ suggestions; specifically, Rep.
increased by .09 and Imp. by .08. The rating of Cla. appeared to drop by .02, but this figure is conservative:
the Cla. value reported for the finalized criteria is actually that of the remaining refined criteria, computed
one step before the wording of those items was revised. That is, based on the Rep. and Imp. threshold of 3.50
on a four-point Likert-type scale, we retained a subset of the refined criteria and recalculated the Rep., Imp.,
Cla., CVI, and FVI of the retained items; we then revised their wording according to the Cla. ratings and the
experts’ suggestions. After this revision, the actual Cla. of the finalized criteria should be considerably higher
than 3.70. We present the 12 finalized criteria as follows:

I. General information
1. The website includes the background information of the developer.
2. The website includes the contact information of the developer.
3. The resource provided by the website fulfills the stated purpose.
4. The format design is clear and easy to browse (e.g. the icons, graphs, and words on hyperlinks are expli-
cit, and the site map and help information are available).
5. There are no failed links (e.g. when the webpage is temporarily inaccessible, a local mirror site is
available).
6. When downloading files, the format and estimated duration are provided.
7. The information is objective without stereotypical or obvious discrimination.
8. The wording is accurate and fluent.
9. The website includes the date of the last revision, or informs users how often the website is updated.
10. The website specifies the sources of the data.
11. The website states the copyright and limits of authority of the data.
12. The best resolution is specified.

3.3.2. Criteria for integrated English learning


This category may be used in evaluating the overall information relevant to English learning materials as
well as the common features of the four language skills. The criteria in this category identify English learning
content, approaches, relevant materials, and evaluation. For the refined criteria, the ratings of Rep., Imp.,
Cla., CVI, and FVI were 3.35, 3.27, 3.54, .83, and .88, respectively. In this category, 4 new items were added.
These include (a) the content design should be based on the learning topic, (b) progressive approaches
should be applied to language learning skills, (c) comprehension tests should be included, and (d) the tests
should be related to the topics. Eleven items, such as designs stimulating users’ multiple senses and
topics based on daily experiences, were removed due to their vagueness or insufficient Rep. Moreover, we
amalgamated 6 similar items into 2 according to the experts’ suggestions. For example, 4 similar items stated
that the website should include enough text, pictures, and video, and that multimedia is meaningful when
designed to fit in with the content; these were combined into the integrated item: The content is appropriately
enhanced by multimedia effects, such as the uses of font type/size, images, animation, audio, and video (e.g.
when teaching conversation appropriate for restaurants, the multimedia effects for the conversation are presented
by audio, video, and subtitles to help students learn through virtual role playing). The finalized criteria for
integrated English learning turned out to be 13 items, and the ratings of Rep., Imp., Cla., CVI, and FVI
were 3.75, 3.51, 3.78, .95, and .97, respectively. All of the ratings greatly increased from the refined
criteria. Specifically, there was an increase of .40 in Rep. and .24 in Imp. The 13 finalized criteria are as
follows:

II. Integrated English learning


13. The content is designed to be closely related to English learning topics (e.g. the topic of Christmas
includes relevant cultural information).
14. Various learning methods and opportunities are provided (e.g. online tests, discussion forums, listening
practice/exercises).

15. The learning content is rich and diverse (e.g. the content includes a wide variety of learning themes
related to the learning topic).
16. The content is appropriately enhanced by multimedia effects, such as the uses of font type/size, images,
animations, audio, and video. (e.g. When teaching conversation appropriate for restaurants, the multi-
media effects for the conversation are presented by audio, video, and subtitles to help students learn
through virtual reality role playing.)
17. The learning activities blend well together (e.g. the topic of cooking: listening to cooking vocabulary
(such as beat, blend, chop, and stir), seeing English cooking videos that exemplify vocabulary, phrases,
and conversation used in a cooking scheme, and having evaluation exercises about the vocabulary and
conversation learned).
18. The learning methods follow a gradual progress (e.g. the reading activities involve pre-, during- and
post-reading activities).
19. The level of learning materials/activities is specified (e.g. for novice learners or for advanced learners).
20. The learning materials are relevant to the objective of the website (e.g. advanced websites contain
advanced learning materials such as international news; introductory websites contain basic/easier con-
tent such as events that occur in our everyday life).
21. Frequently asked questions and answers are posted on the website.
22. Learning strategies are provided (e.g. the meaning of a new word can be inferred from the reading
context).
23. Online dictionary or other related website links are provided.
24. To help learners understand their learning achievement, the website includes comprehension exercises/
tests.
25. The comprehension exercises/tests are designed based on the content of the website.

3.3.3. Criteria for listening


Listening input is acquired more efficiently when multimedia aids are available. For example, visual
context can enhance learners’ comprehension, and a precise playback function enables users to easily
locate a specific segment. Thus, listening resources, intonation relevant to a natural semantic situation, as
well as appropriate multimedia applications are considered in this study. In this category, 4 less important/rep-
resentative items were taken out, 3 relevant items integrated, and no new ideas added. As a result, the finalized
criteria for listening consist of 4 items. The ratings of Rep., Imp., Cla., CVI, and FVI were 3.64, 3.48, 3.68, .92,
and .96, respectively, in the refined criteria and 3.72, 3.59, 3.73, .97, and .96, respectively, in the finalized
criteria. Most of the ratings increased from the refined criteria. Specifically, there was an increase of .08 in Rep.
and .11 in Imp. We list the 4 finalized criteria as follows:

III. Listening
26. The content of the listening comprehension is in an authentic context (e.g. real-life conversation, news,
story, or academic speeches).
27. In listening comprehension, the speech intonation is contextually appropriate to the specific area that the
website developer is from (note: intonation is different between the British/Irish and American/Canadian
versions of speech).
28. In listening comprehension, the pronunciation is recognized by most English speakers in the specific area
that the website developer is from (note: pronunciation is different between the British/Irish and Amer-
ican/Canadian versions of speech).
29. Multimedia-aided listening materials are provided (e.g. pictures, flash, or videos).

3.3.4. Criteria for speaking


The focus of the speaking criteria is to judge the description of hardware requirements, links, and intona-
tion usages, as well as opportunities to observe other people’s spoken words. Speaking instruction is hard to
implement well on a website due to limited technology support. In order to promote learners’ oral skills, the

criteria thus include items on authentic sound links as well as online feedback for evaluating the
speaking materials. Allowing users to observe other people’s speaking tasks is a form of knowledge
modeling, because learners can become aware of the merits and shortcomings of others’ speaking performance.
Hence, in this category, we included 2 new items: specifying needed hardware requirements, and observing
other people’s speaking tasks. Five insignificant items were removed from the criteria. Consequently, this cat-
egory contains 6 evaluation items. The ratings of Rep., Imp., Cla., CVI, and FVI were 3.56, 3.53, 3.72, .89, and
.96 in the refined criteria and 3.62, 3.55, 3.73, .92, and .96 in the finalized criteria, respectively. Most of the
ratings increased from the refined criteria. Specifically, there was an increase of .06 in Rep. and .02 in Imp.
We present the 6 finalized criteria as follows:

IV. Speaking
30. In connection to speaking, the website specifies the needed hardware and software requirements.
31. Authentic examples of sound links are adequately provided.
32. The strategy of appropriate usage of tone is provided.
33. Examples of interactive conversation are provided.
34. In the speech design, online learners can communicate with each other in English.
35. Online feedback is given based on the recorded work of the learners (e.g. learners’ frequent mistakes in
pronunciation are pointed out).

3.3.5. Criteria for reading


The reading category includes an evaluation of vocabulary presentation, classification of different articles,
various assessments, reading resources, and self-evaluation designs. In this category, 6 items were deleted, 3
items integrated, and no items added. The deleted items included synonyms for new vocabulary, reading skills,
and summary of the articles. Experts have suggested that providing an example sentence for a word is more
important than giving a synonym. Moreover, the definition of synonym may be ambiguous since two kinds of
synonym exist: morphological and semantic. For these reasons, this item was not included. Whether the
website should contain reading skills or article summaries depends on the length or level of the articles; thus, the
experts thought these items should not be required in every situation and should depend on the purposes
of the website. For integration, the experts believed that relevant resources, pictures, and background designs
for the reading are integral parts to increase learners’ comprehension and thus should be combined into 1 item.
As a result, the finalized reading criteria contain 5 items. The ratings of Rep., Imp., Cla., CVI, and FVI were
3.47, 3.37, 3.78, .85, and .99 in the refined criteria and 3.69, 3.66, 3.86, .92, and 1.00 in the finalized criteria,
respectively. All of the ratings increased from the refined criteria. Among them, the increases of .22 in
Rep. and .29 in Imp. were the most significant. The 5 finalized criteria are as follows:

V. Reading
36. New vocabulary in an article is highlighted using special effects (e.g. different colors or italics are used for
new words).
37. Articles are categorized based on their characteristics (e.g. the topics or styles of the articles).
38. Through various interesting tasks, learners can self-evaluate their reading abilities (e.g. through a cloze
test, multiple-choice test, or crossword).
39. For new vocabulary, definitions and explanations are provided.
40. Additional reading resources are provided for learners (e.g. related reading vocabulary, articles, hyper-
links, or websites).

3.3.6. Criteria for writing


Writing is an output process in language learning. The website should not only include opportunities for
writing, but also present writing skills. In addition, it is important to make revision easy and to provide
relevant writing resources and peers’ writings as scaffolding for improving learners’ writing skills. This cat-
egory contains an evaluation of the written aspects of the website, as follows: discussion boards, activities for

writing, samples of writing types, and relevant writing resources. Based on experts’ suggestions, 7 criteria were
removed. For example, the experts doubted that providing different fonts for typing is pertinent to the concept
of writing: flexible fonts might only increase learners’ interest in arranging different fonts rather than improve
their writing quality. This function would be more suitable as an option in writing courses and, thus, was
removed from the list of evaluation criteria. Moreover, 3 similar items (e.g. vocabulary, phrases, sentence
structures, or related topics) were integrated into 1 item. Because the ideas were regarded as information rel-
evant to the topics, the refined item was revised to: Writing resources related to the writing topic are provided
(e.g. vocabulary, phrases, sentence structures, or related topics). The finalized criteria for writing contained 6
items. The ratings of Rep., Imp., Cla., CVI, and FVI were 3.52, 3.39, 3.77, .88, and .96 in the refined criteria
and 3.81, 3.61, 3.90, .98, and .97 in the finalized criteria. All of the ratings increased from the refined criteria.
Specifically, there was an increase of .29 in Rep. and .22 in Imp. The 6 finalized criteria are listed as follows:

VI. Writing
41. Users are encouraged to communicate in English on the discussion board.
42. Users can discuss the composition with online advisers (e.g. through email or discussion forums).
43. A guided composition activity is provided (e.g. writing with pictures or filling in the dialogue of a comic).
44. Examples of various literary genres are provided (e.g., novel, short story, drama, poetry, and biography).
45. Writing resources related to the writing topic are provided (e.g. vocabulary, phrases, sentence structures,
or related topics).
46. Users can view writing from peers, famous writers, journalists, or authors of magazine articles.

3.3.7. The complete set of finalized criteria


As mentioned above, there were a total of 46 finalized criteria in 6 categories. To improve the quality of the
items in each category, we removed the items whose ratings in Rep. or Imp. were below 3.50, we revised the
unclear concepts of the items based on the rating of Cla. and on the experts’ suggestions, and we included
items that were recommended by at least three of the experts. On average, the ratings in each category
increased after the revisions; the ratings of Rep., Imp., Cla., CVI, and FVI were 3.51, 3.42, 3.70, .87, and
.95 in the refined criteria and 3.70, 3.58, 3.78, .94, and .97 in the finalized criteria, with average increases of
.19 in Rep. and .16 in Imp. From these results, we can confirm that both the quality and the conciseness of
the criteria were greatly improved. In addition, the IR
for the finalized criteria was .72.

4. Conclusions

Users of the Internet were initially impressed that they could find useful information of any kind. However,
the problem has become one of sifting through a mass of advertising material and vanity publications in order
to find information of high quality (Smith, 1997). Further, courseware, especially web-based material, in com-
parison to textbooks, is less likely to have gone through any editorial review. This situation places a heavy
burden on teachers and learners interested in using web-based materials; as a result, they make use of a variety
of resources and evaluation methods to select websites (Robb & Susser, 2000). Research in educational
technology (Harris, 1997; Laura, 1999; Wilkinson et al., 1997; Warschauer et al., 2000) has indicated that
developing evaluation criteria for web-based materials is imperative for guiding users in screening
resources. Thus, in order to provide a tool for selecting and evaluating the quality of English learning websites,
this study has attempted to develop a complete set of criteria which cover important aspects of English learn-
ing information. The criteria have been developed based on related CALL theories, CALL evaluation studies,
and language learning approaches/methods. In addition, the development of the criteria is based not only on
experts’ opinions, but also on viewpoints and suggestions of learners and teachers in practice to clarify their
preferences and needs for websites for instructional use. Finally, researchers in the social sciences study com-
plex constructs for which valid and reliable measures are needed. If the researchers develop a new measure for
a particular construct, a content validity study should be conducted (Rubio et al., 2003). Thus, in this study,

the suggestions of experts in the area of CALL programs about validating and revising the refined criteria
greatly improved the validity and reliability of the finalized criteria.
There were initially 74 items in the draft version of the criteria. Later, a total of 92 items in the refined cri-
teria were generated after the interviews with English teachers and students. Based on the expert validity sur-
vey, we further ameliorated the refined criteria to develop the finalized criteria, which contained 46 items.
These finalized criteria were classified into 6 categories (the number of items within the category) – general
information (12), integrated English learning (13), listening (4), speaking (6), reading (5), and writing (6).
The complete set of criteria evaluates English learning websites on two levels – macro-level in evaluating gen-
eral information and overall English learning materials on the websites, and micro-level in evaluating the four
language skills on the websites. More specifically, the general information evaluates the authority, accuracy,
and format of the learning websites. The integrated English learning evaluates the overall information relevant
to English learning materials as well as the common features of the four language skills. The criteria for lis-
tening, speaking, reading, and writing, for example, examine the suitable intonation, skills of discourse, clas-
sification of reading articles by their attributes, and the proper use of discussion boards for students when
practicing their writing skills. After the expert validity survey, the ratings of our finalized criteria received
Rep. of 3.70, Imp. of 3.58, Cla. of 3.78, CVI of .94, FVI of .97, and IR of .72. With these criteria, we hope
that teachers and web designers can develop quality English learning websites on a more theoretical and
specific basis and can judge the quality and appropriateness of English learning information, thereby helping
learners increase their English ability more effectively and conveniently.

5. Implications for teachers and instructional/courseware designers

As to pedagogical implications, this study gives English teachers and learners evidence of the importance of
applying criteria based on multiple principles when evaluating or creating English learning websites. Additional criteria will be
required based on the different needs of target users. In addition, the findings of this study can be applied
to programs and seminars which aim to implement computer technology in English learning curricula. The
practice of incorporating computer technology into English learning courses is now widespread in elementary,
junior high, and high schools. Furthermore, many countries are endeavoring to promote technologically
integrated English learning. Although technology equipment is sufficient in many schools, guidance
on how to apply the technology to relevant English subjects is insufficient. Due to the
aforementioned considerations, explicit criteria for selection of English learning resources can guide teachers
in preparing and implementing their curricula using computer technology.
In addition, the interviews indicated that learners were enthusiastic about using computers in learning. Teachers
play an important role in guiding students toward suitable websites so that they are not overwhelmed by the
flood of information they will encounter. With the criteria as a guide, teachers can develop methods to direct
learners to choose appropriate websites. With teachers’ direction, students will learn with more efficiency
and with more enjoyment on the Internet. Finally, the developed criteria can guide web designers to develop
English learning websites by reminding them that a good website should be based not only on graphic design,
ease in browsing, and organization, but also on the theoretical perspectives of instructional design and the spe-
cial attributes of each subject, such as language learning.

6. Suggestions for future study

We have proposed the following recommendations for further study:

1. The development of criteria for evaluating English learning websites in this study has taken a first step
toward providing a set of theory-based and specific criteria for teachers and course designers to use in
designing reliable and well-designed websites and for learners to use in identifying good websites. With
the rapid changes in the technological world, this study can serve other researchers in language learning
fields as a reference for updating or developing more comprehensive evaluation criteria for English learning
websites, or for related studies aiming to construct better learning environments.

2. The criteria developed in this study were based on opinions from interviews with English teachers and
learners and on the expert validity survey conducted with the CALL experts. Although the criteria devel-
oped were validated and significant, further study can investigate the effectiveness of learners’ academic per-
formance and the changes in learners’ motivation when teachers and learners actually use the criteria
developed as a guide in English learning.
3. The population of the expert validity survey validating the finalized criteria could be more diverse.
Although the participants in the interviews for developing the refined criteria were English teachers
and learners, the population of the expert validity survey was limited to professors in the field of CALL
in Taiwan. Thus, the results of this study may not be generalized to the whole population of English teach-
ers and learners, software designers, or CALL experts in other countries. However, with this study serving
as a cornerstone, future studies can be conducted with most of the above groups to gain more diverse opin-
ions and suggestions for revising or updating the criteria for English learning websites.
4. Although this study has provided an initial list of evaluation criteria for use in evaluating English learning
websites, the criteria developed do not produce rankings of websites for ease of selection. It is
recommended that further studies continue this effort and develop a checklist through which users can see
explicit rankings among the evaluated websites. Also, further studies can attempt to assign weights to thematic
categories or items within each category for generating more accurate scores for the evaluated websites.
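A weighted checklist score of the kind suggested above could take a shape like the following. Since the study itself assigns no weights, the category weights and mean ratings here are purely illustrative assumptions, and the function name is hypothetical.

```python
def website_score(category_ratings, category_weights):
    """Weighted overall score for an evaluated website: each category's
    mean item rating (four-point scale) is multiplied by its normalized
    category weight. All names and numbers are illustrative, not from
    the study."""
    total = sum(category_weights.values())
    return sum(category_ratings[c] * w / total
               for c, w in category_weights.items())

# Hypothetical weighting: integrated English learning counts double.
weights = {"general": 1, "integrated": 2, "listening": 1,
           "speaking": 1, "reading": 1, "writing": 1}
ratings = {"general": 3.2, "integrated": 3.6, "listening": 3.0,
           "speaking": 2.8, "reading": 3.4, "writing": 3.1}
print(round(website_score(ratings, weights), 2))  # 3.24
```

Such a score would let users rank evaluated websites directly, addressing the checklist extension proposed above.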

Appendix A

Interview questions

1. What is your experience in browsing or using general websites or instructional websites?


The interviewees were provided with three websites for reference:

 Sesame Workshop: http://www.sesameworkshop.org/sesamestreet/


 StoryPlace: http://www.storyplace.org/
 Starfall: http://www.starfall.com/

(Leading questions: What kinds of websites do you usually browse? Among these websites, how do you feel
when you browse them? What are their advantages and disadvantages? For example, does the resource pro-
vided by the website fulfill the stated purpose? Which functions are useful or convenient to use? Can the
resource be accessed reliably? Is it convenient for you to search information on these websites? Are their
format designs or navigation functions clear and easy to follow?)
2. In your online experience, how do you feel about the English learning or instructional websites you have
used?
(Leading questions: Would you briefly describe the websites you have browsed? How do you feel
about these websites? For example, do they provide a varied and rich learning content, and arouse
the learner’s motivation by using pictures, flash, video clips, or special multimedia effects? Among
the English websites you have browsed or used for instructional purposes, which one is most impres-
sive? Why?)
3. Based on the advantages of the English websites you mentioned before, what are the most impor-
tant functions and elements in English learning, including listening, speaking, reading, and writing
skills?
4. What are the differences between traditional English teaching/learning and teaching/learning English with
the aid of websites? What kinds of functions or elements provided by the websites can be used to support and
help learners to learn English?
5. In the English listening part, what kinds of functions and elements should a good English learning website
have? How is it possible to integrate technology and important elements of training listening skills into the
websites to improve learners’ listening skills?

6. In the English speaking part, what kinds of functions and elements should a good English learning website
have? How is it possible to integrate technology and important elements of training speaking skills into
websites to improve learners’ speaking skills?
7. In the English reading part, what kinds of functions and elements should a good English learning website
have? How is it possible to integrate technology and important elements of training reading skills into
websites to improve learners’ reading skills?
8. In the English writing part, what kinds of functions and elements should a good English learning website
have? How is it possible to integrate technology and important elements of training writing skills into web-
sites to improve learners’ writing skills?

Appendix B

Refined criteria for the evaluation of English learning websites

Categories Criteria
General 1.Users can link to information about the sponsor and author
information 2.The author is an expert or is qualified to deliver reputable resources
3.Users can contact authors for clarification
4.The resource is designed to meet users’ needs*
5.The information is presented objectively (e.g. no racial or sexual bias)
6.Users can see what institution published the documents
7.The website includes site map for users to easily understand its frame*
8.The webpage is easy to surf and has a simple design
9.Users can distinctly understand how to surf the website (e.g. the sources on the
website are arranged distinctly)
10. The website provides the latest revised information*
11. The updated information is within the past 6 months
12. The webpage has consistent resources from screen to screen (e.g. within the
connection of ‘‘About Website,’’ the information on each webpage is related to the
introduction of the website)*
13. The website is always available (e.g. each webpage can be surfed; relevant
connection is available to surf stably)
14. The website includes search engines to find relevant information
15. The data can be downloaded successfully within 10 s.
16. The data can be obtained immediately (e.g. the lesson plan presented can be
downloaded by other teachers)
17. The website includes the relevant and free-download software (e.g. providing PDF
software for users to easily download and read on the site)
18. It is easy to search the website via different search engines (e.g.Google and Yahoo)
19. The website is available to non-members*
20. The data is placed in different categories based on its characteristics (e.g. similar data
is put together logically)*
Integrated English 21. The items on the website are decorated using the same theme (e.g. the topic of
learning Christmas with relevant pictures on the site)
22. The topic is related to daily experiences (e.g. dining conversation)
23. Link names are given in English (e.g. ‘‘home’’ means back to the
main page)
24. The website includes appropriate situational illustrations (e.g. illustrations for Christmas)
25. In the picture-based instruction, there are meaningful directions to instruct learners
to click (e.g. online directions can guide users to interact with Flash documents)
26. Each picture has meaningful symbols on the page (e.g. a symbol meaning back to main page)
27. The content is constructed and arranged in a series of relevant activities (e.g. topic of Disney:
vocabulary → reading story → evaluation)
28. The content is organized to create a stimulating learning experience (e.g. listening, pictures,
and articles)
29. Music is presented along with text or pictures to enhance learning rhythm
30. The input is focused on providing learners with opportunities to interact with different kinds of
learning content (e.g. online test, discussion board, and listening comprehension)
31. The website has sufficient text-based input
32. The website has sufficient picture-based input
33. The website has sufficient animation-based input
34. Text, image, and sound are appropriately coordinated (e.g. in learning dining language, it is
presented with suitable conversation, relevant pictures, and movies)
35. The material is classified by level (e.g. materials for beginning or intermediate level)*
36. There is a ‘‘help’’ system to answer users’ questions about learning (e.g. users can type in a
problem they have encountered, and the help function assists them in figuring it out)*
37. There are abundant and multiple learning resources on the website (e.g. different English
learning topics)
38. Questions of different difficulty levels are mixed and arranged in the evaluation*
39. The website includes relevant English learning skills (e.g. in reading, the key concept shows up
in the first sentence of each paragraph)
40. The website includes links to online dictionaries (e.g. a link to Cambridge Dictionaries Online)*
41. The material is accompanied with meaningful statements based on daily situations (e.g. the
topic is about the ceremonies of New Year)
42. The task occurs in situational learning
43. The content of the learning task is relevant and consistent (e.g. Christmas topic → evaluating
users on how to spell relevant vocabulary → creating a Christmas card)*
44. Users can clearly understand their abilities from the online test*
Listening 45. There are text-aids after the listening comprehension (e.g. presenting the whole content after
the listening)
46. The content is combined with a series of pictures that help learners comprehend the
listening material
47. The content is presented with situational animation (e.g. on the topic of traveling zoo, users
can simultaneously watch an animation about walking in the zoo)
48. The intonation is appropriate
49. The pronunciation is correct*
50. The listening comprehension takes place in authentic situations (e.g. conversation in the
restaurant)
51. The listening comprehension is designed to guide users to answer the questions (e.g. the
questions and listening content are relevant)*
52. The division of listening materials is based on users’ attention span (e.g. if users’ attention span is
1 min, the material should not exceed 1 min, according to teachers’ experience)*
53. There are different levels of listening materials for users to select*
54. The listening comprehension includes introspective questions*
Speaking 55. The website provides feedback towards recorded information (e.g. feedback can be based on
users’ pronunciation or intonation)
56. The design is available for learners to communicate with native or non-native speakers
57. The content takes place in real life settings
58. The quality of the speech adequately conveys meaning between speakers
59. There are examples of conversation on the specific topics (e.g. commercial conversation or
talks about traveling)
60. The website includes examples of speech*
61. The website includes colloquial usages (e.g. local communicative speech)*
62. The website presents pronunciation skills (e.g. the mouth shape for pronouncing /e/)*
63. The website presents skills of using intonation correctly*
64. The website presents skills of recognizing relationships between two vocabulary words*
Reading 65. New vocabulary is presented using special forms (e.g. using different colors or bold to identify
the new vocabulary)
66. There is a section to instruct about the new vocabulary (e.g. link to another webpage to learn
the new words)*
67. The website has relevant stories that include the new words*
68. New vocabulary is accompanied with resources of its synonyms*
69. The website presents reading skills
70. An abundance of reading resources is supplied for learners to select from (e.g. for reading
Harry Potter, the website can link to other online information that introduces the author)
71. The website includes a summary of the article*
72. There are pictures to enhance users’ comprehension in reading (e.g. the story of a caterpillar’s
metamorphosis with its picture)
73. Suitable backgrounds are displayed
74. The website includes self-evaluation functions*
75. The reading materials are classified by level*
76. The articles are relevant to daily experiences*
77. The website includes the major sentence structure of the article*
Writing 78. Discussion boards are provided for students to exchange opinions
79. Discussion topics are related to users’ ages (e.g. news for adults; summer vacation planning for
students)
80. It is easy to change fonts, colors, and sizes in the discussion boards
81. Users can use programming language to enrich their written article (e.g. using html to post the
pictures or make a table)
82. The website involves different writing modes (e.g. how to write the first paragraph)
83. The website includes relevant resources for different articles*
84. The website includes different types of writing samples (e.g. biographic sketches or prose)*
85. There are many pictures to increase learners’ motivation to write
86. Users can discuss the writing content with instructors online (e.g. via email or discussion
boards)
87. The website includes relevant writing skills*
88. Users can observe other people’s written works online*
89. The writing topic is related to daily situations and is meaningful (e.g. topic about my
aspirations)*
90. Multiple approaches are used to promote users’ writing ability (e.g. writing based on
pictures or filling in the blanks of comics)
91. The website includes relevant vocabulary for the writing topic*
92. The website includes relevant phrases for the writing topic*
Note: * represents the new or revised items to the preliminary version of criteria.
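The 92 items above fall into six category blocks by item number. As an illustrative aid only (the scoring scheme below is our assumption for demonstration, not part of the published instrument), an evaluator's tally of criteria met can be summarized per category; the function name `category_scores` is hypothetical:

```python
# Hypothetical sketch: summarizing an evaluator's checklist responses per category.
# Item-number ranges follow Appendix B; the per-category fraction-of-criteria-met
# scoring is an illustrative assumption, not the paper's method.

CATEGORIES = {
    "General information": range(1, 21),          # items 1-20
    "Integrated English learning": range(21, 45),  # items 21-44
    "Listening": range(45, 55),                    # items 45-54
    "Speaking": range(55, 65),                     # items 55-64
    "Reading": range(65, 78),                      # items 65-77
    "Writing": range(78, 93),                      # items 78-92
}

def category_scores(items_met):
    """Return the fraction of criteria met in each category."""
    met = set(items_met)
    return {
        name: len(met & set(rng)) / len(rng)
        for name, rng in CATEGORIES.items()
    }

# Example: an evaluator judged items 1-3 and 45-48 to be satisfied.
scores = category_scores({1, 2, 3, 45, 46, 47, 48})
print(round(scores["Listening"], 2))  # → 0.4 (4 of 10 listening items met)
```

Such a tally makes uneven coverage across the six categories immediately visible, which is how checklist instruments of this kind are typically applied in practice.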
References

Abbott, G., McKeating, D., Greenwood, J., & Wingard, P. (1981). The teaching of English as an international language. A practical guide.
London: Collins.
American Association of Law Libraries. (2005). General website evaluation criteria. Retrieved March 10, 2007 from http://
www.aallnet.org/committee/aelic/criteria.html.
Barrett, P. (2001). Assessing the reliability of rating data. Retrieved March 10, 2007 from http://www.pbarrett.net/rater.pdf.
Bradin, C. (1999). CALL issues: Instructional aspects of software evaluation. In J. Egbert & E. Hanson-Smith (Eds.), CALL environments:
Research, practice and critical issues (pp. 159–175). Alexandria, VA: TESOL.
Chapelle, C. A. (1998). Multimedia CALL: Lessons to be learned from research on instructed SLA. Language Learning and Technology, 2,
22–34.
Clayton, R. W. (2006). Instructional media & design. Criteria for evaluating the quality of online courses. Retrieved March 10, 2007 from
http://www.imd.macewan.ca/imd/content.php?contentid=36.
Comer, P., & Geissler, C. (1998). A methodology for software evaluation. In SITE 98: Society for information technology & teacher
education.
Crandall, J. (2003). They do speak English: World Englishes in US schools. Retrieved March 10, 2007 from http://www.cal.org/resources/
archive/news/2003summer/englishes.html.
Davies, G. (2001). New technologies and language learning: A suitable subject for research? In A. Chambers & G. Davies (Eds.), ICT and
language learning: A European perspective (pp. 13–27). Lisse: Swets & Zeitlinger.
Davis, L. (1992). Instrument review: Getting the most from your panel of experts. Applied Nursing Research, 5, 194–197.
Dörnyei, Z. (1996). Moving language learning motivation to a larger platform for theory and practice. In R. L. Oxford (Ed.), Language
learning motivation: Pathways to the New Century (pp. 71–80). Honolulu: University of Hawaii Press.
Furner, J. M., & Daigle, D. (2004). The educational software/website effectiveness survey. International Journal of Instructional Media,
31(1), 61–77.
Garson, W. G. (2001). Reliability. PA765: Quantitative research in public administration. Retrieved March 10, 2007 from http://
www2.chass.ncsu.edu/garson/pa765/reliab.htm#intraclass.
Grant, J. S., & Davis, L. L. (1997). Focus on quantitative methods selection and use of content experts for instrument development.
Research in Nursing & Health, 20, 269–274.
Harris, R. (1997). Evaluating Internet research sources. Media awareness network. Retrieved March 10, 2007 from http://www.virtualsalt.com/evalu8it.htm.
Hemard, D., & Cushion, S. (2000). From access to acceptability: Exploiting the web to design a new CALL environment. Computer
Assisted Language Learning, 13, 103–118.
Holliday, L. (1999). Theory and research: Input, interaction, and CALL. In J. Egbert & E. Hanson-Smith (Eds.), CALL environments:
Research, practice and critical issues (pp. 181–188). Alexandria, VA: TESOL.
Hubbard, P. (1988). An integrated framework for CALL courseware evaluation. CALICO Journal, 6, 51–71.
Internet World Stats. (2007). Top ten languages used in the web. Internet world users by language. Retrieved March 10, 2007 from http://www.internetworldstats.com/stats7.htm.
Lasagabaster, D., & Sierra, J. M. (2003). Students’ evaluation of CALL software programs. Educational Media International, 3–4, 293–304.
Laura, G. (1999). Evaluating net evaluators. Searcher, 7(2), 57–66.
Lee, S. H., Choi, W., & Byun, H. (1996). Criteria for evaluating and selecting multimedia software for instruction. In Proceedings of
selected research and development presentations at the 1996 national convention of the association for education communications and
technology (pp. 413–422). Indianapolis, IN: AECT (ERIC Document Reproduction Service No. ED 397 812).
Liu, M., Moore, Z., Graham, L., & Lee, S. (2003). A look at the research on computer-based technology use in second language learning:
A review of the literature from 1990–2000. Journal of Research on Technology in Education, 34(3), 250–273.
Lonfils, C., & Vanpary, J. (2001). How to design user-friendly CALL interfaces. Computer Assisted Language Learning, 14, 405–417.
Najjar, L. J. (1998). Principles of educational multimedia user interface design. Human Factors, 40(2), 311–323.
Nielsen Norman Group. (2006). First principles of interaction design. Retrieved March 10, 2007 from http://www.asktog.com/basics/
firstPrinciples.html.
Peterson, M. (2000a). Creating hypermedia learning environments: Guidelines for designers. Computer Assisted Language Learning, 11,
115–124.
Peterson, M. (2000b). Direction for development in hypermedia design. Computer Assisted Language Learning, 13, 253–269.
Plass, J. L. (1998). Design and evaluation of the user interface of foreign language multimedia software: A cognitive approach. Language
Learning and Technology, 2, 35–45.
Reeves, T. C. (2001). A model of the effective dimensions of interactive learning on the World Wide Web. Retrieved March 10, 2007 from
http://it.coe.uga.edu/~treeves/WebPaper.pdf.
Robb, T. N., & Susser, B. (2000). The life and death of software: Examining the selection process. CALICO Journal, 18(1), 41–52.
Rubio, D. M., Berg-Weger, M., Tebb, S. S., Lee, E. S., & Ruch, S. (2003). Objectifying content validity: Conducting a content validity
study in social work research. Social Work Research, 27(2), 94–104.
Smith, A. G. (1997). Testing the surf: Criteria for evaluating Internet information resources. Public-Access Computer Systems Review, 8(3),
1–14.
Sonoma State University. (2005). Evaluating web resources. Retrieved March 10, 2007 from http://library.sonoma.edu/research/subject/
eval.html.
Susan, E. B. (2006). Evaluation criteria. Retrieved March 10, 2007 from http://lib.nmsu.edu/instruction/evalcrit.html.
Susser, B. (2001). A defense of checklists for courseware evaluation. ReCALL, 13, 261–276.
Susser, B., & Robb, T. N. (2004). Evaluation of ESL/EFL instructional web sites. In S. Fotos & C. M. Browne (Eds.), New perspectives on
CALL for second language classrooms (pp. 279–295). Mahwah, NJ: Lawrence Erlbaum Associates.
Tergan, S. (1998). Checklists for the evaluation of educational software: Critical review and prospects. Innovations in Education and
Training International, 35(1), 9–20.
University of Maine System. (2004). Checklist for evaluating web resources. Retrieved March 10, 2007 from http://library.usm.maine.edu/research/researchguides/webevaluating.html.
Vogel, T. (2001). Learning out of control: Some thoughts on the World Wide Web in learning and teaching foreign languages. In A.
Chambers & G. D. Davies (Eds.), Information and communications technology: A European perspective. Lisse: Swets & Zeitlinger.
Warschauer, M., Shetzer, H., & Meloni, C. (2000). Internet for English teaching. Alexandria, VA: TESOL Publications.
Wilkinson, G. L., Bennett, L. T., & Oliver, K. M. (1997). Evaluation criteria and indicators of quality for Internet resources. Educational
Technology, 37(3), 52–59.
Williams, M., & Burden, R. (1997). Psychology for language teachers: A social constructivist approach. Cambridge: Cambridge University
Press.
Wuensch, K. L. (2003). Inter-rater agreement. Retrieved March 10, 2007 from http://core.ecu.edu/psyc/wuenschk/docs30/InterRater.doc.