of language in
Technology-Enhanced
Learning
ORGANIZING COMMITTEE
Charles Alderson
Lancaster University
1. Introduction
Computer-adaptive testing (CAT) involves presenting learners with items thought to
be most suitable for them, and adjusting the selection of items in light of the learners’
responses to previous items. The classic case of CAT involves the construction of a
bank (collection) of test items which have been calibrated in terms of their empirical
difficulty. Learners are typically initially presented with an item of medium difficulty.
If their response is correct, they will then be presented with a more difficult item. If
their response is incorrect, they are then presented with an easier item. If their response
to the second item is correct they are given a more difficult item and if incorrect, an
easier item. The computer calculates the learner’s ability level (or score) on the fly as
well as the reliability of the test as administered up to that point. Items from the bank
are presented to test-takers according to specially developed algorithms that select
the initial test item and subsequent test items, together with a stopping rule, i.e. the
criteria to be met for the test to be terminated. Typically, the test is terminated when a
given level of reliability has been reached, or when a predetermined number of items
has been delivered.
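The procedure described above can be illustrated with a minimal sketch. It assumes a one-parameter (Rasch) IRT model, a maximum-likelihood ability estimate and a "closest difficulty" selection rule; these choices, the function names and the thresholds are illustrative assumptions, not a description of any operational CAT.

```python
import math

def p_correct(ability, difficulty):
    """Rasch (one-parameter IRT) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses, start=0.0, steps=25):
    """Maximum-likelihood ability estimate via Newton-Raphson.

    responses: list of (difficulty, correct) pairs.
    Returns (ability, standard_error). The estimate is clamped to [-4, 4]
    because the likelihood has no finite maximum while all answers so far
    are correct (or all incorrect).
    """
    if not responses:
        return start, float("inf")
    theta = start
    for _ in range(steps):
        probs = [p_correct(theta, b) for b, _ in responses]
        gradient = sum((1.0 if ok else 0.0) - p
                       for (_, ok), p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        if information < 1e-9:
            break
        theta = max(-4.0, min(4.0, theta + gradient / information))
    information = sum(p_correct(theta, b) * (1.0 - p_correct(theta, b))
                      for b, _ in responses)
    return theta, 1.0 / math.sqrt(information)

def run_cat(bank, answer, max_items=10, target_se=0.6):
    """Administer items adaptively.

    bank: dict mapping item id -> calibrated difficulty (hypothetical data).
    answer: callable(item_id) -> bool, the learner's response.
    Stops when the standard error falls to target_se, when max_items
    have been delivered, or when the bank is exhausted.
    """
    remaining = dict(bank)
    responses, administered = [], []
    ability, se = 0.0, float("inf")   # start at medium difficulty
    while remaining and len(administered) < max_items and se > target_se:
        # Select the unused item whose difficulty is closest to the estimate.
        item = min(remaining, key=lambda i: abs(remaining[i] - ability))
        difficulty = remaining.pop(item)
        responses.append((difficulty, answer(item)))
        administered.append(item)
        ability, se = estimate_ability(responses)
    return ability, se, administered
```

In this sketch the standard error of the ability estimate stands in for "a given level of reliability"; an operational system would typically also control item exposure and content balance.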
The advantages are that tests can be tailored to a learner’s ability level rather than
wasting time and effort by presenting them with items that are far too easy or far too
difficult. As a consequence, tests can be markedly shorter than traditional linear tests,
and thus more efficient. In addition, since each learner takes a different test than his or
her fellow test-takers, cheating is made much more difficult. There are, however,
major disadvantages. First, in order to predict a learner’s ability level, items need to be
pre-tested and analysed using an Item Response Theory (IRT) model, which allows the
estimation of a learner’s ability level independent of the difficulty of the items, but
IRT requires relatively large numbers of pilot test candidates for reliable ability
estimates. Secondly, in high-stakes testing situations (like the TOEFL) learners are
often schooled in remembering which items they have taken, and the item bank can be
reconstructed if sufficient numbers of candidates recall the items (this has happened in
China, for example, where CAT versions of TOEFL were compromised). In addition,
truth-in-testing laws in some states of the USA mean that the test items constituting
the basis of the test-taker’s score have to be made available to test-takers on request.
This inevitably compromises the test bank. As a result, ETS (the developers of
TOEFL) have ceased developing CATs since, as they put it, “feeding the CAT is too
expensive”.
mastery of items in a given skill or language use domain at each particular level could
be explored in some depth.
Another adaptation of the principle of adaptivity could take account of learner
characteristics (age, mother tongue, years of learning, gender, topics of interest, area of
academic study, etc). The learner would select from a menu of possible characteristics
those that applied to them, and the computer would only present items known to be
suitable for learners with such a profile – or, indeed, items known to be a challenge for
such learners.
Finally, instead of the computer making the decision on which next item to select, the
learner could be allowed to do so (by, for example, requesting a more difficult item, or
one on a different linguistic feature or another topic or academic discipline).
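The two adaptations just described, profile-based item selection and learner-driven item requests, can be sketched as follows. The `Item` fields, the profile keys and the request labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    item_id: str
    difficulty: float
    feature: str                                   # linguistic feature targeted
    suited_to: dict = field(default_factory=dict)  # e.g. {"mother_tongue": {"de"}}

def matches_profile(item, profile):
    """An item matches if every constraint it declares is met by the profile."""
    return all(profile.get(key) in allowed
               for key, allowed in item.suited_to.items())

def next_item(bank, profile, last_difficulty, request="same", feature=None):
    """Learner-driven selection; request is 'harder', 'easier' or 'same'."""
    pool = [i for i in bank if matches_profile(i, profile)]
    if feature is not None:
        pool = [i for i in pool if i.feature == feature]
    if request == "harder":
        pool = [i for i in pool if i.difficulty > last_difficulty]
    elif request == "easier":
        pool = [i for i in pool if i.difficulty < last_difficulty]
    if not pool:
        return None                                # nothing suitable left
    # Among the eligible items, take the one nearest the previous difficulty.
    return min(pool, key=lambda i: abs(i.difficulty - last_difficulty))
```

Items that declare no constraints are treated as suitable for every learner, so a bank can be annotated incrementally.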
4. References
CHALHOUB-DEVILLE M. (ed) (2000), Computer-adaptive tests of reading, Cambridge, CUP.
CHALHOUB-DEVILLE M. and DEVILLE C. (1999), “Computer Adaptive Testing in Second
Language Contexts” in Annual Review of Applied Linguistics, 19: 273-299.
DUNKEL P.A. (1999), Considerations in Developing and Using Computer-Adaptive Tests to
Assess Second Language Proficiency, http://www.cal.org/resources/Digest/cat.html (last
accessed 7.8.07).
WAINER H. (ed.) (2000), Computer-Adaptive Testing: A Primer, Mahwah, NJ, USA,
Lawrence Erlbaum Associates.
VISL: A cross-language approach to NLP-
and games-based grammar teaching
Eckhard Bick
University of Southern Denmark
1. Introduction
VISL (Visual Interactive Syntax Learning) is an integrated interactive user interface
for teaching grammatical analysis on the Internet, developed at the University of
Southern Denmark, offering a unified system of analysis for 25 different languages, 8
of which are supported by live grammatical analysis of running text. For reasons of
robustness, efficiency and correctness, the system’s internal tools are based on the
Constraint Grammar formalism (Karlsson 1990), but users are free to choose from a
variety of notational filters, supporting different descriptive paradigms, with a
current teaching focus on syntactic tree structures, language-independent grammatical
categories and the form-function dichotomy. VISL’s core NLP programs use the
author’s hybrid multi-level parsers (http://beta.visl.sdu.dk), while teaching applications
(http://visl.sdu.dk) and corpus searching tools (http://corp.hum.sdu.dk) are
implemented as platform-independent Java programs and Perl CGI scripts. Though lexica
and parsing rules are developed individually for each language, a common CG and
treebank data format facilitates source data transfer into grammar teaching games,
structural or colour-based visualisation, and linguistic revision of corpus data.
across languages, built on a clear distinction between function and form, and tied to
visually stable clues, such as iconic abbreviations, symbols and colour coding.
The presentation/demo will demonstrate how these principles can be implemented in
the form of internet-based grammar games such as WordFall, Labyrinth, Syntris etc.,
as well as tree structures and corpus tools.
4. References
BICK E. (2005-1), “Grammar for Fun: IT-based Grammar Learning with VISL”, in
P.J. Henriksen (ed.), CALL for the Nordic Languages, København, Samfundslitteratur:
49-64 (Copenhagen Studies in Language).
BICK E. (2005-2), “Live use of Corpus data and Corpus annotation tools in CALL: Some new
developments in VISL”, in H. Holmboe (red.), Nordic Language Technology, Årbog for
Nordisk Sprogteknologisk Forskningsprogram 2000-2004 (Yearbook 2004), Copenhagen,
Museum Tusculanum: 171-186.
DAVIES G. (ed.) (2007), Information and Communications Technology for Language
Teachers (ICT4LT), Slough, Thames Valley University (online: http://www.ict4lt.org/).
EUROCALL bibliography: http://www.eurocall-languages.org/resources/bibliography/books.html
FITZPATRICK A. and DAVIES G. (eds) (2003), The Impact of Information and Communications
Technologies on the Teaching of Foreign Languages and on the Role of Teachers of
Foreign Languages.
KARLSSON et al. (1995), Constraint Grammar – A Language-Independent System for Parsing
Unrestricted Text, Mouton de Gruyter.
WARSCHAUER M. and HEALEY D. (1998), “Computers and language learning: An overview”,
in Language Teaching, 31: 57-71.
WARSCHAUER M. (1996), “Computer-assisted language learning: an introduction”, in
S. FOTOS (ed), Multimedia Language Teaching, Tokyo, Logos International.
CALL software design principles
and the integration of NLP
Jozef Colpaert
University of Antwerp
1. Introduction
The role and shape of solutions for language learning should not be based on a
technology-driven (not even NLP-driven) approach, but on an accurate specification of
what is needed for a particular language learning situation. We therefore have to create
a language learning environment first, defined as an architecture of actors and
components and their mutual interactions, before deciding on the language method,
media, systems, technologies and NLP routines needed.
Our current research focuses on the implementation of Distributed Language Learning
(DLL), a conceptual and methodological framework for designing language learning
solutions in distributed environments.
4. References
COLPAERT J. (2007), “Distributed Language Learning”, editorial in Computer Assisted
Language Learning, Vol. 20, No. 1, February 2007: 1-3.
COLPAERT J. (2007). “Pedagogy-driven design for online language teaching and learning”, in
CALICO Journal 23:3: 477-497.
COLPAERT J. (2007). “Toward an ontological approach in goal-oriented language courseware
design and its implications for technology-independent content structuring”, in Computer
Assisted Language Learning, vol. 19, 2&3: 109-127.
COLPAERT J. (2004). Design of Online Interactive Language Courseware: Conceptualization,
Specification and Prototyping. Research into the impact of linguistic-didactic functionality
on software architecture. (Doctoral dissertation). University of Antwerp, 2004, 342 p. UMI
micropublication number 3141560. Also available on www.didascalia.be/doc-design.pdf.
HEIFT T. and SCHULZE M. (2007), Errors and Intelligence in Computer-Assisted Language
Learning. Parsers and Pedagogues, Milton Park (Routledge Studies in Computer-Assisted
Language Learning, ed. C. Chapelle).
CorpusCALL: Challenges and opportunities
1. Introduction
This talk is situated in the field of corpusCALL, the use of corpora within CALL
(Computer Assisted Language Learning), which has gained growing importance within
the CALL research community, as can be seen from recent publications (Sinclair 2004,
Gavioli 2005, Braun et al. 2006, Chambers 2007) and the introduction of SIG
communities based on this theme within EUROCALL and CALICO.
Our research group is quite active within the field of corpusCALL: we currently have
two projects running in this domain. This work is part of our general
interest in CALL, which has recently led to the foundation of ALT, Research Center on
CALL. Our current projects involve research topics such as harnessing collective
intelligence in e-learning environments, effectiveness of electronic learning platforms,
authoring systems for the creation of half-open and open supported tasks and
electronic language testing.
Corpora have been created and explored for a long time for different purposes
including language technology and linguistics. Only in the last ten years has corpus
exploration moved to other domains, including foreign language learning and CALL.
Moving away from language specialists to
4. References
BRAUN S., KOHN K. and MUKHERJEE J. (2006), “Corpus technology and language pedagogy”,
in English Corpus Linguistics, Vol. 3, Frankfurt am Main, Peter Lang.
CHAMBERS A. (2007), Integrating Corpora in Language Learning and Teaching. Special
Issue of ReCALL, Volume 17(3).
DESMET P. and HÉROGUEL A. (2005), “Les enjeux de la création d’un environnement
d’apprentissage électronique axé sur la compréhension orale à l’aide du système auteur
IDIOMA-TIC”, in ALSIC (Apprentissage des Langues et Systèmes d’Information et de
Communication), 8. http://alsic.u-strasbg.fr/v08/desmet/alsic_v08_12-poi4.htm
Sylviane Granger
Université catholique de Louvain
1. Introduction
Learner corpus research is a fairly young but highly dynamic research field that
emerged in the late 1980s. It focuses on the collection, annotation and computer-aided
analysis of vast electronic collections of authentic written and spoken data produced
by foreign language learners. The Centre for English Corpus Linguistics of the
University of Louvain (UCL) has played a key role in shaping the field and
demonstrating its tremendous pedagogical potential. In my presentation I will briefly
describe the work carried out at Louvain and sketch the numerous possibilities it offers
for Technology-Enhanced Language Learning.
4. References
BELZ J.A. and VYATKINA, N. (2005), “Learner Corpus Research and the Development of L2
Pragmatic Competence in Networked Intercultural Language Study: The Case of German
Modal Particles”, in Canadian Modern Language Review 62.1: 17-48.
GRANGER S. (ed.) (1998), Learner English on Computer, Addison Wesley Longman, London
and New York.
THE CONTRIBUTION OF LEARNER CORPUS RESEARCH TO TELL 15
Thomas Hansen
University of Southern Denmark
1. Introduction
The field of Computer Assisted Pronunciation Training (CAPT) has seen an explosion
in the use of Automatic Speech Recognition (ASR) technology within the past two
decades. Contemporary applications come equipped with commercial battle cries of
success that leave one wondering why the use of such applications is not more
widespread than it is and also why second language acquisition (SLA) so often still
fails.
The main question in this connection is whether the feedback strategies which are
presently employed work or how they should be structured to maximize learner
benefit.
2. Outline of presentation/demo
Contemporary CAPT applications employ a variety of feedback methods, whose
pedagogical value will be discussed. A potential strategy, or roadmap, for improving
the effectiveness of ASR in CAPT applications will be outlined for discussion.
4. References
HANSEN Th. (2006), “The Four K’s of feedback?”, in Proceedings of the 4th International
Conference on Multimedia and Information and Communication Technologies in
Education (m-ICTE2006), Seville: 342-346.
BERNSEN N.O., HANSEN Th., KIILERICH S., MADSEN T. (2006), “Field Evaluation of a Single-
Word Pronunciation Training System”, in Proceedings of The Fifth International
Conference on Language Resources and Evaluation, LREC 2006, Genova, May, 2006:
2068-2073.
HINCKS R. (2005), Computer Support for Learners of Spoken English, Doctoral dissertation,
School of Computer Science and Communication.
HINCKS R. (2003), “Speech technologies for pronunciation feedback and evaluation”, in
ReCALL: 3-20.
NERI A., CUCCHIARINI C. and STRIK H. (2006), “ASR corrective feedback on pronunciation:
Does it really work?”, in Proceedings of ICSLP2006, Pittsburgh: 1982-1985.
Language technology projects at IDM
Holger Hvelplund
IDM, Paris
1. Introduction
Based on experience with the production of electronic versions of monolingual and
bilingual dictionaries, the presentation will emphasise how content for cross-media
products can be produced efficiently; how and why different types of content
can and should be integrated; and how content can be accessed and adapted in different
ways depending on the user, the medium, and the context in which the content is used.
4. References
Recent productions from IDM:
• DPS – Dictionary Production System.
• Production of dictionaries with XDCC for publishers like
- Oxford University Press (including CD-ROM version of Oxford Advanced Learner’s
Dictionary).
- Pearson Education (including CD-ROM and online version of Longman Dictionary of
Contemporary English).
- Macmillan Dictionaries (including CD-ROM version of Macmillan English Dictionary).
- Cambridge University Press (including CD-ROM and online version of Cambridge
Advanced Learner’s Dictionary and CD-ROM version of Cambridge Grammar of
English).
- + several others.
Using corpora in language learning:
the Sketch Engine
Adam Kilgarriff
Lexical Computing Ltd
1. Introduction
I am writing an essay about my career plans, and I want to talk about goals. How does
the word work? What sorts of sentences might I construct around it, with what
collocates?
Current EFL dictionaries aim to help, and are well-designed, sophisticated
tools which specify grammatical patterns and collocates, and show the user a range of
example sentences. Often that will be enough. But they are limited to a couple of
column inches for a word like goal (in which they must cover all of its meanings) and
sometimes they just do not cover the case the student is interested in. When that
happens, where should they go next?
It is tempting to say that they should go and look in the corpus: after all, that is where
the people who wrote the dictionary went. The idea has been discussed at length in the
“Teaching and Language Corpora” community. The problem is that reading
concordance lines is a skill requiring advanced language competence, and is simply
off-putting to most learners. The issue may be presented as follows: the dictionary is a
highly condensed summary of the word’s behaviour. The corpus is the raw data
for such a summary, not at all condensed or summarized. The user would like a point
in between: not as short and minimal as the dictionary, but with a level of abstraction
and generalization.
Using techniques from computational linguistics, we have applied just this logic to
produce ‘word sketches’ – one-page accounts of the grammatical and collocational
behaviour of a word, as in the figure below.
and/or         1112    0.8 | object_of      3430    3.1 | subject_of      557    1.0 | a_modifier     2546    1.8
objective        57  32.86 | score           797  75.31 | come             78   28.4 | ultimate         83  42.22
try              30  32.67 | achieve         363  48.14 | give             34  14.57 | away             25  32.56
goal             32  23.39 | concede         126  47.79 | win              13  14.32 | winning          31  32.56
penalty          20  22.75 | disallow         26  34.87 | help             10  10.69 | compact          34  31.79
target           22   20.1 | pursue           75  33.13 |                            | stated           17  27.88
value            33  19.36 | attain           34  29.34 | adj_subject_of  149    1.4 | late             53  27.33
conversion       12  18.92 | net              18   26.7 | important        10  15.32 | dropped          11  26.98
aim              15   17.6 | kick             36   26.2 |                            | organisational   22  26.83
mission          11  16.29 | grab             30  24.43 |                            | long-term        34   25.7
priority         10  14.13 | reach            78  23.81 |                            | common           56  24.62
strategy         11  12.28 | set              97  23.53 |                            | headed           11  24.48
point            19  12.21 | notch            10  22.81 |                            | organizational   18  24.45
Goals occur, of course, in sport as well as life. The word sketch highlights the
ambiguity. Scanning the ‘object-of’ list, if we score, concede, disallow, net or kick
goals, we are talking sport; if we achieve, pursue, attain or reach them, life. England
football fans will be glad to see England standing alone in the ‘possessor’ relation to
goals!
Word sketches can be explored at http://www.sketchengine.co.uk where papers and
bibliographical references are also available.
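The idea behind a word sketch, collocates grouped by grammatical relation and ranked by an association score, can be illustrated with a small sketch. It assumes the corpus has already been parsed into (relation, headword, collocate) triples, and it uses a Dice-based score (logDice) purely as a stand-in: the statistic behind the scores shown for goal is not specified in this abstract, and the function name and data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def word_sketch(triples, headword, top_n=3):
    """Build a toy word sketch for one headword.

    triples: iterable of (relation, head, collocate) tuples taken from a
             parsed corpus (assumed input format).
    Returns {relation: [(collocate, frequency, score), ...]}, with each
    list sorted by descending score and cut to top_n rows.
    """
    pair_freq = Counter()   # (relation, head, collocate) -> count
    head_freq = Counter()   # (relation, head) -> count
    coll_freq = Counter()   # (relation, collocate) -> count
    for rel, head, coll in triples:
        pair_freq[(rel, head, coll)] += 1
        head_freq[(rel, head)] += 1
        coll_freq[(rel, coll)] += 1

    sketch = defaultdict(list)
    for (rel, head, coll), f_xy in pair_freq.items():
        if head != headword:
            continue
        # Dice coefficient of the pair within this relation, on a log scale.
        dice = 2 * f_xy / (head_freq[(rel, head)] + coll_freq[(rel, coll)])
        sketch[rel].append((coll, f_xy, round(14 + math.log2(dice), 2)))
    return {rel: sorted(rows, key=lambda r: -r[2])[:top_n]
            for rel, rows in sketch.items()}
```

A collocate that is frequent with the headword but also frequent everywhere else (such as a very common verb) receives a lower score than one that occurs mainly with the headword, which is what lets a sketch surface items like score or concede rather than have.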
4. Reference
Kilgarriff A., Rychly P., Smrz P. and Tugwell D. (2004), “The Sketch Engine”, in
G. Williams and S. Vessier (eds), Proceedings of the Eleventh EURALEX International
Congress, EURALEX 2004, Lorient, France, July 6-10, 2004. Lorient: Faculté des Lettres
et des Sciences Humaines, Université de Bretagne Sud (Proc EURALEX 2004), Lorient.
A plurilingual ICALL System for Romance
languages
Thomas Koller
University of Nottingham
1. Introduction
The (completed Ph.D.) research described in this abstract deals with the design,
development, implementation and evaluation of an interactive plurilingual ICALL
(Intelligent Computer-Assisted Language Learning) software system (ESPRIT) for
contrastive learning of French, Spanish and Italian. ESPRIT targets learners who are
already at an advanced level in at least one of the Romance languages involved. These
learners are expected to be familiar with general lexical and grammatical properties of
this language. Equivalent properties of the other languages are taught through
comparison.
The research questions addressed build upon general research findings in
plurilingual teaching and learning of Romance languages, CALL (Computer-Assisted
Language Learning) and ICALL, and the use of animation in language teaching.
Formative and summative evaluation processes provided learner assessment data on
different components of ESPRIT.
Plurilingual means that grammatical and lexical properties of the languages involved
are tightly linked to each other, showing a high degree of similarity in form and
function. Plurilingual teaching and learning of Romance languages exploits the
similarities between these languages to teach them contrastively and to raise the
language awareness of the learner.
The ESPRIT toolset comprises dictionary tools, a concordancer, an input analysis and
feedback module, custom-made animated grammar presentations and an authoring tool
for animated text. ESPRIT represents a fully functional web-based language learning
platform which is designed for autonomous learning. ESPRIT uses a TV metaphor to
present language learning materials to the learner. The contents can easily be expanded
at any time.
In ESPRIT, learners are free to explore the activities offered and to choose the
activities which are of most interest to them. Guided tours, however, provide
information and help about which activities form a logical unit, and can be used to
suggest in which sequence to work on materials.
Tools and language data of ESPRIT have already been reused in current projects. Due
to their modular character, the tools and language data can easily be integrated in any
other project, in which they can be applied to other languages or even language
families. Slavic languages, for example, also share a high number of grammatical and
lexical properties.
Several ESPRIT tools can also be adapted and provided as Firefox browser plug-ins.
As a Firefox extension, an ESPRIT tool would be instantly accessible from any other
web page (for example for dictionary look-up). The dictionary tools, lexicon interface
components and the concordancer could be adapted as Firefox extensions to provide a
wide range of plug-in resources for plurilingual learning of Romance or other
languages.
(and teachers). The evaluation for ESPRIT also showed that adult learners varied
considerably in the number and type of languages learned already and the degree of
fluency therein. Therefore the development of materials for plurilingual teaching and
learning posed a number of issues and challenges which differ from second language
acquisition and the creation of monolingual language learning materials.
Foreign language teaching in secondary schools and at universities has been largely
unaffected by plurilingual research. Language students at both levels only occasionally
get the opportunity to learn similar languages simultaneously in a plurilingual setting.
As a consequence, it is challenging to identify target learners and to conduct standard
institutionalised testing and evaluation of developed plurilingual materials.
The development of plurilingual materials in general has in many cases not been
directly connected to research in third language acquisition. Additionally, the majority
of existing plurilingual materials tend to be descriptive rather than didactic.
Therefore, in my opinion, it would be beneficial for future research in plurilingual
teaching and learning to be more tightly linked to research findings in third language
acquisition.
4. References
BLANCHE-BENVENISTE C. (ed.) (1997), EuRom 4: Metodo de ensino simultâneo das línguas
românicas – Metodo para la enseñanza simultánea de las lenguas románicas – Metodo di
insegnamento simultaneo delle lingue romanze – Méthode d’enseignement simultané des
langues romanes, Florence, La Nuova Italia Editrice.
DEGACHE Christian (ed.) (2003), Intercompréhension en langues romanes. Du développement
des compétences de compréhension aux interactions plurilingues, de Galatea à Galanet,
volume 28. Grenoble, LIDILEM, Université Stendhal Grenoble 3.
KOLLER Th. (2007), Design, Development, Implementation and Evaluation of a Plurilingual
ICALL System for Romance Languages Aimed at Advanced Learners, PhD thesis, Dublin
City University, Dublin.
MCCANN W.J., KLEIN H.G. and STEGMANN T.D. (2002), EuroComRom – The Seven Sieves,
volume 5 of Editiones EuroCom. Aachen, Shaker.
SCHMIDELY J., ALVAR EZQUERRA M. and HERNÁNDEZ GONZÁLEZ C. (2001), De una a cuatro
lenguas. Intercomprensión románica: del español al portugués, al italiano y al francés.
Madrid, Arco Libros.
Like stars in the firmament:
language learning on mobile devices
Agnes Kukulska-Hulme
The Open University, UK
1. Introduction
Mobile learning can be studied as one instance of the ongoing adoption of innovative
technologies in the field of education, particularly with a view to understanding learner
experience and the potential of the new technologies to transform current practices.
Educational uses of mobile technologies offer a rich and complex field of investigation
which allows me personally to combine my expertise in e-learning pedagogy with my
background in linguistics, language learning, dictionary design and terminology
studies (areas I was actively involved in during the 1980s/90s). I’m particularly
interested in how mobile devices are changing foreign language learning and how new
forms and motivations for language learning might in turn have an effect on attitudes
and approaches to multilingual knowledge seeking, global communication and
knowledge representation on the web.
My research in mobile learning has been fairly wide-ranging, encompassing studies of
how learners read course materials on mobile devices (Waycott and Kukulska-Hulme
2003), surveys of learner-driven mobile innovation (Kukulska-Hulme and Pettit 2006;
Pettit and Kukulska-Hulme 2007), critical reviews of evaluation in mobile learning
(Traxler and Kukulska-Hulme 2006), reflections on what has been learnt with regard
to mobile device usability (Kukulska-Hulme 2007), and issues of collaboration and
privacy in contextual learning (Kukulska-Hulme et al. 2007). Together with my
colleague John Traxler I co-edited the first book on mobile learning to give a coherent
account of the field, incorporating a dozen international case studies (Kukulska-Hulme
and Traxler 2005). My externally funded projects have also led to the publication of a
guide to innovative e-learning with mobile technologies, distributed widely within UK
higher and further education. I have tried to make sense of how the field is evolving by
studying the possibilities of both formally-designed and user-driven mobile learning
(Kukulska-Hulme, Traxler and Pettit 2007). I have also attempted to imagine how
mobile language learning will develop (Kukulska-Hulme 2006; Kukulska-Hulme
forthcoming).
4. References
CHINNERY G.M. (2006), “Going to the MALL: Mobile Assisted Language Learning”, in
Language Learning & Technology, 10(1): 9-16. Available online:
http://llt.msu.edu/vol10num1/emerging/default.html
KUKULSKA-HULME A. and TRAXLER J. (eds) (2005), Mobile Learning: A Handbook for
Educators and Trainers, Routledge, London.
KUKULSKA-HULME A. (2006), “Mobile Language Learning Now and in the Future”, in
Svensson, P. (ed.), Från vision till praktik: Språkutbildning och Informationsteknik (From
vision to practice: language learning and IT), Swedish Net University (Nätuniversitetet):
295-310.
KUKULSKA-HULME A. (2007), “Mobile usability in educational contexts: what have we
learnt?”, Special issue of The International Review of Research in Open and Distance
Learning, 8(2): 1-16. Available online: http://www.irrodl.org/index.php/irrodl
KUKULSKA-HULME A., TRAXLER J. and PETTIT J. (2007), “Designed and User-generated Activity
in the Mobile Age”, in Journal of Learning Design, 2 (1): 52-65. Available online:
http://www.jld.qut.edu.au/
PETTIT J. and KUKULSKA-HULME A. (2007), “Going with the Grain: Mobile Devices in
Practice”, in Australasian Journal of Educational Technology (AJET), 23 (1): 17-33.
Available online: http://www.ascilite.org.au/ajet/ajet23/ajet23.html
SHARPLES M. (2006), Big Issues in Mobile Learning. Kaleidoscope report, Available online:
http://mlearning.noe-kaleidoscope.org/repository/BigIssues.pdf
WAYCOTT J. and KUKULSKA-HULME A. (2003), “Students’ Experiences with PDAs for
Reading Course Materials”, in Personal and Ubiquitous Computing, 7 (1): 30-43.
Computer-mediated communication
for language learning
Marie-Noëlle LAMY
The Open University, UK
1. Introduction
The field of activity captured by the phrase “computer-mediated communication for
language learning” has recently reached a critical mass in terms of the number of teaching
projects and published papers devoted to it. In her overview of recent research, based
on evidence from two major US journals on technology-mediated language learning,
Chun (2007) suggests that ‘communication’, as used in the phrase ‘computer-mediated
communication’ (CMC), receives the most coverage of all topic categories, and also
comes top of a list of ‘hits’ tracked by one of the two
journals in her corpus.
Although caveats are needed due to the exclusively US-oriented nature of these results,
similar trends are observed in other research cultures, signalling that CMC for
language learning (henceforth CMCL) as a field is no longer immature and can be held
up to scrutiny. In a volume to be published in November 2007, Lamy and Hampel
offer such scrutiny. The current presentation gives a preview of their findings, and
outlines research directions suggested not only by the gaps identified in their study but
also by the emergence of new questions raised within neighbouring areas such as
multiliteracies research.
Finally, I show how the nature of challenges 2 and 3 points to the importance of
multiliteracies research for the future of CMCL.
4. References
CHUN D.M. (2007), “Come Ride the Wave: But Where is it Taking Us?”, in The CALICO
Journal 24(2): 239-252.
HASSAN X., HAUGER D., NYE G. and SMITH P. (2005), “The Use and Effectiveness of
Synchronous Audiographic Conferencing in Modern Language Teaching and Learning
(Online Language Tuition): A Systematic Review of Available Research”, in Research
Evidence in Education Library, London: EPPI-Centre, Social Science Research Unit,
Institute of Education, University of London.
HUBBARD P. (2005), “A Review of Characteristics in CALL Research”, in Computer Assisted
Language Learning 18(5): 351-368.
JUNG U. (2005), “CALL – Past, Present and Future: A Bibliometric Approach”, in ReCALL
17(1): 4-17.
KERN R. G. (2006), « La Communication médiatisée par ordinateur en langues: recherches et
applications récentes aux USA », in F. Mangenot and C. Dejean-Thircuir (eds), Les
Echanges en ligne dans l’apprentissage et la formation, le français dans le monde,
recherches et applications 40: 17-29.
LEVY M. (2000), “Scope, Goals and Methods in CALL Research: Questions of Coherence
and Autonomy”, in ReCALL 12(2): 170-195.
LIU M., MOORE Z., GRAHAM and LEE, S. (2002), “A Look at the Research on Computer-
based Technology Use in Second-language Learning: A Review of the Literature from
1990-2000”, in Journal of Research on Technology in Education 34(3): 250-273.
LAMY M.-N. and HAMPEL R. (2007, forthcoming), Online Communication for Language
Teaching and Learning, Basingstoke, Palgrave Macmillan.
WARSCHAUER M. (1995), Virtual Connections: Online Activities and Projects for Networking
Language Learners, Honolulu: Second Language Teaching and Curriculum Centre,
University of Hawaii.
WARSCHAUER M. and KERN R. (eds) (2000), Network-based Language Teaching: Concepts
and Practice, Cambridge, Cambridge University Press.
ZHAO Y. (2003), “Recent Developments in Technology and Language Learning: A Literature
Review and Meta-analysis”, in The CALICO Journal 21(1): 7-27.
Writing English as a second language:
A proofreading tool
Claudia Leacock
The Butler Hill Group
1. Introduction
This work combines Natural Language Processing (NLP) and Machine Learning (ML)
techniques for detecting and correcting grammatical errors in the writing of English
Language Learners (ELL).
The Writing English as a Second Language tool being developed at Microsoft
Research (for which the presenter is a consultant) focuses on those areas of grammar
that pose special challenges for English language learners. This presentation focuses
on those problems that are hardest for ELLs – the use of determiners and of
prepositions – although the system also identifies gerund/infinitive confusion,
auxiliary verb presence and choice, over-regularized verb inflection (writed vs. wrote),
adjective/noun confusion (China book vs. Chinese book), word order errors, and mass
vs. count noun errors (much knowledge vs. many knowledges).
2. Outline of presentation
I will describe the system’s major components:
1. The Suggestion Provider consists of individual error identification modules that
identify potential errors. These modules are flexible and can identify errors using
ML techniques, rules, regular expressions or a combination of the three.
For the preposition and determiner correction modules, a classifier is used that is
trained on edited native English. For each potential insertion point of a determiner
or preposition in that training data, a vector of features is extracted from the
context.
2. The Language Model, which selects the most likely suggestion(s), is a 5-gram
model trained on the English Gigaword corpus.
3. The Example Provider retrieves relevant example sentences from the web to help
the user select the most appropriate rewrite. This innovative component generates
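The way a language model can choose among candidate corrections may be illustrated with a minimal sketch. It is not the system described above: it assumes a bigram model with add-one smoothing in place of the 5-gram model, a four-sentence toy corpus in place of Gigaword, and hypothetical function names (train_bigram_lm, log_prob, rank_suggestions). Each candidate preposition is inserted into the learner's sentence and the resulting sentences are ranked by their log-probability.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    vocab_size = len(unigrams)
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rank_suggestions(template, candidates, unigrams, bigrams):
    """Fill the slot with each candidate and sort by LM score, best first."""
    scored = [(log_prob(template.format(c), unigrams, bigrams), c) for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]


# Toy stand-in for a large corpus of edited native English (assumption).
TOY_CORPUS = [
    "she is interested in music",
    "he is interested in history",
    "they arrived at the station",
    "we depend on the weather",
]

unigrams, bigrams = train_bigram_lm(TOY_CORPUS)
ranking = rank_suggestions("i am interested {} art", ["on", "at", "in"], unigrams, bigrams)
```

In this toy setting the candidate "in" is ranked first because the bigram "interested in" is attested in the training sentences. A realistic system would use a much larger corpus and longer n-grams.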
4. References
BURSTEIN J. and LEACOCK C. (eds) (2006), Natural Language Engineering: Special Issue on
Using NLP in Educational Applications, 12:2.
BURSTEIN J., CHODOROW M. and LEACOCK C. (2004), “Automated Essay Evaluation: The
Criterion Online Writing Service”, in AI Magazine 25:3: 27-36.
BURSTEIN J., CHODOROW M. and LEACOCK C. (2003), “Criterion Online Essay Evaluation:
An Application for Automated Evaluation of Student Essays”, in Proceedings of the
Fifteenth Annual Conference on Innovative Applications of Artificial Intelligence
(IAAI-03), Acapulco.
CHODOROW M., TETREAULT J.R. and HAN N.-R. (2007), “Detection of Grammatical Errors
Involving Prepositions”, in Proceedings of the 4th ACL-SIGSEM Workshop on Prepositions:
25-30.
CHODOROW M. and LEACOCK C. (2000), “An Unsupervised Method for Detecting
Grammatical Errors”, in Proceedings of the 1st Annual Meeting of the North American
Chapter of the Association for Computational Linguistics, Seattle, WA.
HAN N.-R., CHODOROW M. and LEACOCK C. (2006), “Detecting Errors in English Article
Usage by Non-Native Speakers”, in Natural Language Engineering 12:2.
HAN N.-R., CHODOROW M. and LEACOCK C. (2004). “Detecting Errors in English Article
Usage with a Maximum Entropy Classifier Trained on a Large, Diverse Corpus”, in
Proceedings of the 4th International Conference on Language Resources and Evaluation.
Lisbon.
IZUMI E., UCHIMOTO K., SAIGA T., SUPNITHI T. and ISAHARA H. (2003), “Automatic Error
Detection in the Japanese Learners’ English spoken Data”, in The Companion Volume to
the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics,
July 2003: 145-148.
LEACOCK C., and CHODOROW M. (2003), “Automated Grammatical Error Detection”, in M.D.
Shermis and J. Burstein (eds), Automated Essay Scoring: A Cross-Disciplinary
Perspective. Hillsdale, NJ, Lawrence Erlbaum : 195-208.
LEACOCK C., and CHODOROW M. (2001), “A Corpus-Based Approach to Diagnosing
Grammatical Errors”, in Corpus Linguistics and Language Teaching Conference, Boston,
MA.
LIU T., ZHOU M., GAO J., XUN E. and HUANG C. (2000), “PENS: A Machine-Aided English
Writing System for Chinese Users”, in Proceedings of ACL 2000: 529-536.
SHERMIS M.D., BURSTEIN J. and LEACOCK C. (2006), “Applications of Computers in
Assessment and Analysis of Writing”, in C.A. MacArthur, S. Graham and J. Fitzgerald
(eds), Handbook of Writing Research, New York, Guilford Press.
TURNER J. and CHARNIAK E. (2007), “Language Modeling for Determiner Selection”, in
Human Language Technologies 2007: The Conference of the NAACL; Companion
Volume, Short Papers: 177-180.
Second language acquisition theory and TELL
Fanny Meunier
Université Catholique de Louvain
1. Introduction
As a researcher in Second Language Acquisition (and more specifically instructed
second language acquisition) and a teacher of English as a foreign language, my aim in
this presentation is twofold: first, to stress the importance of the two ‘L’s in
technology-enhanced language learning (TELL); and second, to address the convergences
and divergences that exist between big issues in second language acquisition and TELL.
learning’ as one of its future challenges. Learners in TELL environments should have
access to observational, descriptive or explanatory options, together with opportunities
for immediate feedback. Third, learnability issues (defined here as the input/output
efficiency of some method or approach) should become more central in order to
validate the efficiency of TELL.
4. References
CHAPELLE C. (1998), “Multimedia CALL: Lessons to be Learned from Research on
Instructed SLA”, in Language Learning and Technology, 2(1): 22-34.
CHAPELLE C. (2003), English Language Learning and Technology, Benjamins, Amsterdam
and Philadelphia.
KASPER L. (2000), “New technologies, new literacies: focus discipline research and ESL
learning communities”, in Language Learning & Technology, 4(2): 105-128.
LIGHTBOWN P. and SPADA N. (2003) Factors Affecting Second Language Learning. How
Languages Are Learned, Revised edition, Oxford University Press, Oxford.
MEUNIER, F. (forthcoming) “Corpora, cognition and pedagogical grammars: An account of
convergences and divergences”, in S. De Knop and T. De Rycker (eds), Cognitive
Approaches to Pedagogical Grammar, Mouton de Gruyter, Berlin.
WIBLE D. (2005), Language Learning and Language Technology: Toward Foundations for
Interdisciplinary Collaboration, Crane, Taipei.
WIBLE D. (in press), “Multiword Expressions and the Digital Turn”, in F. Meunier and
S. Granger (eds), Phraseology in Foreign Language Learning and Teaching, John
Benjamins, Amsterdam.
Detecting syntactic interference
John Nerbonne
University of Groningen
1. Introduction
This presentation involves joint work with Wybo Wiersma (Groningen) and Timo
Lauttamus (Oulu). It applies techniques from quantitative computational linguistics to
the problem of detecting frequent effects of first language interference in second
language learning. We focus on production interference in syntax.
Second language learners typically differ syntactically from native speakers not only
in making outright errors, but also in overusing and under-using some constructions –
all of which we subsume under INTERFERENCE. In approaching the phenomenon of
interference computationally, we were motivated both to attempt to identify
interference effects more systematically, and also to attempt to quantify a level of
aggregate interference, a goal Weinreich (1953: 63) found worthwhile, but which he
speculated to be unreachable:
No easy way of measuring or characterizing the total impact of one language on another in
the speech of bilinguals has been, or probably can be devised. The only possible
procedure is to describe the various forms of interference and to tabulate their
frequency.
2. Outline
Following a suggestion by Aarts and Granger (1998), we model the syntax of the
second-language learners of English via the parts of speech (POS) they use and the
sequences in which the POS appear. The idea is to compare the POS sequences used
by second-language learners to those used by natives. Concretely, we examine the
distribution of triplets of POS in a large corpus of English as used by adult Finnish
immigrants to Australia, and compare this distribution to that of their children, who
immigrated as children and speak English at a near-native level. To assay the “total
impact” as Weinreich wished, we examine the differences between the two
distributions via a permutation test, which is implemented in a Monte Carlo fashion.
To identify systematically the areas of difference, we examine the frequent POS triples
that contribute most to the overall differences in the two distributions.
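The comparison outlined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the distance statistic (the sum of absolute differences in relative trigram frequency), the sentence-level shuffling, the toy tag sequences and all function names are choices made for this sketch and are not taken from the presentation.

```python
import random
from collections import Counter


def relative_freqs(sentences):
    """Relative frequency of each POS trigram in a list of tag sequences."""
    counts = Counter()
    for tags in sentences:
        counts.update(zip(tags, tags[1:], tags[2:]))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def trigram_differences(group_a, group_b):
    """Absolute difference in relative frequency for every observed trigram."""
    fa, fb = relative_freqs(group_a), relative_freqs(group_b)
    return {t: abs(fa.get(t, 0.0) - fb.get(t, 0.0)) for t in set(fa) | set(fb)}


def distance(group_a, group_b):
    """Aggregate difference between the two trigram distributions."""
    return sum(trigram_differences(group_a, group_b).values())


def permutation_test(group_a, group_b, n_iter=1000, seed=0):
    """Monte Carlo permutation test: shuffle sentences across the two groups."""
    rng = random.Random(seed)
    observed = distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if distance(pooled[:len(group_a)], pooled[len(group_a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_iter + 1)


def top_contributors(group_a, group_b, k=3):
    """Trigrams contributing most to the overall difference."""
    diffs = trigram_differences(group_a, group_b)
    return sorted(diffs, key=diffs.get, reverse=True)[:k]


# Invented tag sequences standing in for the two speaker groups.
LEARNERS = [
    ["PRON", "VERB", "NOUN", "ADP", "NOUN"],
    ["PRON", "VERB", "NOUN"],
    ["NOUN", "VERB", "ADP", "NOUN"],
]
NATIVES = [
    ["PRON", "VERB", "DET", "NOUN", "ADP", "DET", "NOUN"],
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["PRON", "VERB", "DET", "NOUN"],
]

observed, p_value = permutation_test(LEARNERS, NATIVES)
```

With real data each group would contain thousands of tagged sentences. The p-value estimates how often a random split of the pooled sentences yields a difference at least as large as the observed one, and top_contributors points to the constructions worth inspecting by hand.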
4. References
AARTS J. and GRANGER S. (1998), “Tag sequences in learner corpora: A Key to Interlanguage
Grammar and Discourse”, in S. Granger (ed), Learner English on Computer, London,
Longman: 132-141.
MOBERG J., GOOSKENS C., NERBONNE J. and VAILLETTE N. (ca. 2007), “Conditional Entropy
Measures Intelligibility among Related Languages”, accepted to appear in F. Van Eynde,
P. Dirix, I. Schuurman and V. Vandeghinste (eds.) Proceedings of Computational
Linguistics in the Netherlands 2006, Amsterdam, Rodopi.
NERBONNE J. and WIERSMA W. (2006), “A Measure of Aggregate Syntactic Distance”, in
J. Nerbonne and E. Hinrichs (eds), Linguistic Distances, Workshop at the joint conference
of International Committee on Computational Linguistics and the Association for
Computational Linguistics, Sydney, July, 2006: 82-90.
SANDERS N. (2007) “Measuring Syntactic Difference in British English”, in Proceedings of
the ACL 2007 Student Research Workshop, Prague: ACL: 1-6.
WEINREICH, Uriel ([1953], 1968, 1974), Languages in Contact. The Hague, Mouton.
Planning a smart phone system to support
self-directed L2 vocabulary learning
Richard Pemberton
University of Nottingham
1. Introduction
There are two major problems with vocabulary learning that almost every language
learner will be familiar with:
• learning enough frequent vocabulary to be able to read and listen fluently;
• retaining the vocabulary that we have learned.
The first problem involves a considerable amount of time. To take English for
example, in order to be able to understand unsimplified texts, you need to know some
3,000–4,000 of the most common English word families (Nation & Waring 1997;
Nation 2001). The figure is likely to be upwards of 5,000 word families if fluent
reading for pleasure is the aim.
Equally, if vocabulary is to be retained, the learner needs to spend a lot of time in
conscious processing or repeating of the target items (explicit learning) and/or in
extensive language use (implicit learning). These problems of time are of course even
worse for the busy adult learner living outside the target country.
One type of technology which has the potential to save time on the go and to support
both implicit and explicit learning is the mobile phone. However, recent mobile phone
systems supporting vocabulary learning have tended to use one medium only – e.g.
text messages (Pincas 2004; Song & Fox 2005), e-mail (Thornton & Houser 2005) or
images (Joseph et al 2005) – and to have involved designed rather than learner-located,
learner-generated and learner-shared materials and activities.
2. Outline of presentation
In this presentation I will first propose a smart phone system that could use the
phone’s full capabilities (see e.g. Kukulska-Hulme & Shield 2007: 20) to support self-
directed vocabulary learning.
I will then exemplify and discuss the desirability of various potential features (both
‘existing’ and ‘to-be-created’), including:
4. References
ELLIS N.C. (1995), “The psychology of foreign language vocabulary acquisition: implications
for CALL”, in Computer Assisted Language Learning 8(2-3): 103-128.
FALLAHKHAIR S., PEMBERTON L. and GRIFFITHS R. (2007), “Development of a cross-platform
ubiquitous language learning service via mobile phone and interactive television”, in
Journal of Computer Assisted Learning 23: 312-325.
JOSEPH S., BINSTED K. and SUTHERS D. (2005), “PhotoStudy: vocabulary learning and
collaboration on fixed and mobile devices”, in H. Ogata, M. Sharples, G. Kinshuk and Y.
Yano (eds), Proceedings of the third IEEE International Workshop on Wireless and
Mobile Technologies in Education 2005, Los Alamitos, CA, IEEE: 206-210.
KUKULSKA-HULME, A. and SHIELD L. (2007), “An overview of mobile assisted language
learning: can mobile devices support collaborative practice in speaking and listening?”, in
EuroCALL 2007. http://vsportal2007.googlepages.com/collaborativepractice [Accessed 15
September 2007]
KUKULSKA-HULME A., TRAXLER J. and PETTIT J. (2007), “Designed and user-generated
activity in the mobile age”, in Journal of Learning Design 2(1): 52-65.
http://www.jld.qut.edu.au [Accessed 15 September 2007.]
NATION I.S.P. (2001), Learning Vocabulary in Another Language, Cambridge, Cambridge
University Press.
NATION P. and WARING R. (1997), “Vocabulary size, text coverage and word lists”, in
N. Schmitt and M. McCarthy (eds), Vocabulary: description, acquisition and pedagogy,
Cambridge, Cambridge University Press: 6-19.
PINCAS A. (2004), “Using mobile phone support for use of Greek during the Olympic Games
2004”, in International Journal of Instructional Technology & Distance Learning
http://www.itdl.org/Journal/Jun_04/article01.htm [Accessed 19 September 2007.]
SONG Y. and FOX R. (2005), “Integrating web-based ESL vocabulary learning for working
adult learners”, in H. Ogata, M. Sharples, G. Kinshuk and Y. Yano (eds), Proceedings of
the third IEEE International Workshop on Wireless and Mobile Technologies in Education
2005, Los Alamitos, CA, IEEE: 154-158.
THORNTON P. and HOUSER C. (2005), “Using mobile phones in English education in Japan”,
in Journal of Computer Assisted Learning 21: 217-228.
Michael Rundell
Lexicography MasterClass Ltd and Macmillan Dictionaries
1. Introduction
Donald Rumsfeld’s famous reflections on “what we know we don’t know and what we
don’t know we don’t know” apply to most forms of futurology, and certainly to any
attempt to predict what might happen in the world of reference materials. This talk is
at the interface of language-learning, lexicography, NLP, and delivery media, and will
outline some possible future directions for dictionaries aimed at learners of English.
Though the content and accessibility of these dictionaries have steadily improved, the
basic model hasn’t fundamentally altered. But like any other kind of reference resource,
the MLD (monolingual learner’s dictionary) can’t fail to be affected by the biggest
change of all – the arrival of the Web.
I will look first at signs that the old model is beginning to break down (for example,
the fact that electronic versions of MLDs have begun to include content not present in
the print editions); then consider current challenges and opportunities; and finally
suggest what the MLD might look like ten years from now.
4. References
DE SCHRYVER G.-M. (2003), “Lexicographers’ dreams in the electronic dictionary age”, in
International Journal of Lexicography, 16.2: 143-199.
GILQUIN G., GRANGER S. and PAQUOT M. (forthcoming), “Learner corpora: the missing link
in EAP pedagogy”, in Journal of English for Academic Purposes.
THE DICTIONARY OF THE FUTURE 51
Emma Shercliff
Macmillan English Campus
1. Introduction
Macmillan English Campus is an online practice environment designed for the
learning and teaching of English as a Foreign Language. It was developed in
conjunction with one of the world’s leading language schools, Cultura Inglesa, São
Paulo, and is today being used by over 90,000 students worldwide.
Macmillan English Campus consists of two components:
1. A flexible database of over 3,000 highly interactive language activities, developed
by Macmillan's leading ELT authors. These activities include interactive language
exercises, listening tasks, pronunciation exercises, vocabulary exercises, progress
tests, exam preparation exercises, language games, web projects and weekly news
items. All users also have access to an online version of the Macmillan English
Dictionary.
2. Sophisticated content management software, allowing institutions to manage their
users and chart our online resources to their own courses and course materials. The
Macmillan English Campus platform includes an electronic mark book and
personalisation tools for each user.
The concept behind the Macmillan English Campus is that language learning can be
greatly enhanced by an effective combination of face-to-face teaching and customized
online support materials. It is this blended learning solution that makes the Macmillan
English Campus unique. It ensures that our users continue to receive face-to-face
tuition and contact with their teachers whilst remaining free to study online within a
controlled learning environment.
This presentation will comprise a case study of the Macmillan English Campus, which
was launched in 2003. I will outline the concept behind the ‘blended learning’
pedagogy of the Macmillan English Campus and will then address the issues and
challenges we have faced over the past four years, with specific reference to the
experience of language teachers wishing to integrate technology-enhanced learning
into their teaching programmes for the first time. I will outline the enhancements we
have incorporated into the Macmillan English Campus learning platform as a result of
user feedback and outline future developments we have planned for 2008 and beyond.
In the light of our extensive experience developing online learning applications, I will
also highlight what we believe to be the limitations of technology in a language
learning context.
I will give specific examples of TEL methods and tools and demonstrate some of the
new functionality, such as teacher-to-student messaging, recently incorporated into the
Macmillan English Campus.
The Macmillan English Campus has been adopted by a number of prestigious teaching
institutions, schools and universities worldwide, including the International House
World Organisation, the British Council and the Bell Schools network. The
presentation will draw on Macmillan English Campus’s widespread experience in the
field and is intended to focus on practice rather than theory. Much of our publishing is
driven by user responses to our learning platform and we have therefore developed
sophisticated mechanisms for gathering and evaluating feedback from users across five
continents.
By sharing the experiences of Macmillan English Campus, I will offer a practical
insight into the challenges of developing materials for technology-enhanced language
learning, which I hope will stimulate comment and debate.
Commercial
• Investment: enormous cost of developing online platform
• Perception amongst certain customers that digital product should be free
• Initial assumption amongst certain customers that teachers could author their
own material at lower cost
MACMILLAN ENGLISH CAMPUS: A CASE STUDY 55
Serge Verlinde
Katholieke Universiteit Leuven
1. Introduction
This presentation deals with recent developments in web-based electronic learner’s
dictionaries and their use in CALL (computer assisted language learning) applications.
My research interests are the lexicon and its structure, corpus analysis and CALL. I am
coauthor of the Dictionnaire d’apprentissage du français des affaires (DAFA) and
have developed, together with Thierry Selva, the Base lexicale du français
(www.kuleuven.be/ilt/blf), a freely accessible learning environment (online dictionary
and exercises) for French vocabulary.
to a series of characters contained in a cell of the database (e.g. a query concerning all
definitions encompassing the noun action or all verbs used with a prepositional group
introduced by the preposition à). In the BLF, we have tried to reach these goals within
the didactic perspective of teaching/learning French as a foreign language, as well as
exploit these resources for research purposes.
The corpus is used to provide both examples of the use of multiword units (word
combinations) and sentences for the exercises in the CALL application (ALFALEX)
by using NLP-tools.
ALFALEX offers about ten different types of exercises relating to the words’ most
important features:
- formal features (morphology, verb conjugation, derivation);
- intrinsic features (gender);
- combinatorial features (use of prepositions after verbs, nouns and adjectives,
multiword units);
- lexical relations (synonyms, schémas actanciels or words encountered in the same
communicative situation: e.g. how do we designate the act of killing (assassiner) a
person? un assassinat; what do we call the person who killed another person? un
assassin; and the person who was killed? la victime);
- translation (decoding: French > Dutch, encoding: Dutch > French).
The exercises listed above are semi-automatically generated through direct use of the
information in the lexical database and the corpus (for the contextual exercises).
Directional and constructive feedback is provided: by means of hyperlinks, the user
can access the lexicographical description, which is available for almost every item in
the exercises. Twice a year, ALFALEX also automatically generates a qualitative
report for every user of the environment.
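As a rough illustration of how a combinatorial exercise might be derived from a database entry and a corpus sentence, the following sketch builds a gap-fill item for a verb and its preposition and returns feedback that points back to the dictionary entry. The data structure, the entry identifier and the function names are hypothetical and do not reflect the actual ALFALEX implementation.

```python
from dataclasses import dataclass


@dataclass
class LexicalEntry:
    lemma: str        # headword, e.g. a verb
    preposition: str  # preposition the headword combines with
    entry_id: str     # hypothetical key used to link back to the dictionary


def make_gap_exercise(entry, sentence):
    """Blank out the first occurrence of the entry's preposition in a corpus sentence."""
    tokens = sentence.split()
    for i, token in enumerate(tokens):
        if token.lower() == entry.preposition:
            gapped = tokens[:i] + ["___"] + tokens[i + 1:]
            return {
                "prompt": " ".join(gapped),
                "answer": entry.preposition,
                "entry_id": entry.entry_id,
            }
    return None  # sentence does not illustrate this construction


def give_feedback(exercise, response):
    """Check the learner's response and refer to the dictionary entry when it is wrong."""
    if response.strip().lower() == exercise["answer"]:
        return "Correct."
    return "Not quite. See dictionary entry " + exercise["entry_id"] + " for this construction."


# Invented entry and sentence for illustration only.
ENTRY = LexicalEntry(lemma="penser", preposition="à", entry_id="penser-1")
exercise = make_gap_exercise(ENTRY, "Elle pense à son avenir")
```

Here exercise["prompt"] is the sentence with the preposition blanked out, and a wrong answer produces a message naming the entry, which a web interface could render as a hyperlink.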
Unfortunately, non-commercial dictionaries such as the DAFLES only cover a part of
the lexicon. Therefore, if a user submits a word which is not listed, he will be
redirected to other lexical resources available on the internet. He also has access to
other free web resources (e.g. corpora, semantic networks) for French.
units, …). Could NLP applications (e. g. the use of a parser), combined with a
dictionary/lexical database, be helpful?
-- Encoding is even more complicated. How can we develop a real writing assistant?
4. References
ABEL A. and WEBER V. (2000), “ELDIT – A prototype of an innovative dictionary”, in U. Heid
et al. (eds), Proceedings EURALEX, The Ninth EURALEX International Congress,
Stuttgart: 807-818.
ALDABE I., ARRIETA B., DÍAZ DE ILARRAZA A., MARITXALAR M., NIEBLA I., ORONOZ M. and
URIA L. (2006), “The use of NLP tools for Basque in a multiple user CALL environment
and its feedback”, in P. Mertens, C. Fairon, A. Dister and P. Watrin (eds), Verbum ex
machina. Actes de la 13e conférence sur le Traitement automatique des langues naturelles,
Louvain-la-Neuve: 815-824 (Cahiers du Cental 2).
ANTONIADIS G., ECHINARD S., KRAIF O., LEBARBÉ T. and PONTON C. (2005), “Modélisation
de l'intégration de ressources TAL pour l'apprentissage des langues : la plateforme
MIRTO”, in Alsic.org, vol. 8: 65-79.
ANTONIADIS G., ECHINARD S., KRAIF O., LEBARBÉ T., LOISEAU M. and PONTON C. (2004),
“CALL: from current problems to NLP solutions MIRTO: a user-oriented NLP based
teaching platform”, in Proceedings of EuroCALL Conference 2004, Vienna.
BLUMENTHAL P. (2006), Wortprofil im Französischen, Tübingen.
BLUMENTHAL P. and HAUSMANN F.J. (eds.) (2006), Collocations, corpus, dictionnaires, in
Langue française, 150.
DE SCHRYVER G.-M. (2003), “Lexicographers’ Dreams in the Electronic-dictionary Age”, in
International Journal of Lexicography, 16.2: 143-199.
GAUME B. (2004), “Balades aléatoires dans les Petits Mondes Lexicaux”, in I3, Information
Interaction Intelligence, 4.2: 1-59 (w3.univ-tlse2.fr/erss/textes/pagespersos/gaume/
resources/I3.impression.5.pdf).
GROSSMANN F. and TUTIN A. (eds.) (2003), “Les collocations. Analyse et traitement”, in
Travaux et recherches en linguistique appliquée, série E, n° 1.
HERBST T. and POPP K. (1999), The Perfect Learners’ Dictionary (?), Tübingen.
RUNDELL M. (1998), “Recent trends in pedagogical lexicography”, in International Journal
of Lexicography, 11.4: 315-342.
VERLINDE S., SELVA T. and BINON J. (2005), “Dictionnaires électroniques et environnement
d'apprentissage du lexique”, in Revue française de linguistique appliquée, X.2: 19-30.
VERLINDE S., SELVA T. and BINON J. (2006), “The Base lexicale du français: a
Multifunctional Online Database for Learners of French”, in Corino E., Marello C., Onesti
C. (eds.), Proceedings XII Euralex International Congress. Torino, Italia, September 6th-
9th 2006, Torino, vol. II: 471-481.
VERLINDE S., BINON J., OSTYN S. and BERTELS A. (to appear), “La Base lexicale du français
(BLF): un portail pour l’apprentissage du lexique français”, in Cahiers de Lexicologie.
Linguistic anomaly
Carl Vogel
Trinity College Dublin
1. Introduction
My work in computational linguistics is influenced by the course of my education: a
liberal arts undergraduate degree in computing, literature, philosophy and psychology;
an MSc by research in artificial intelligence focussed on inheritance reasoning for
constraint based syntax; a PhD in cognitive science on models of default reasoning and
their relation to human reasoning.
ambiguous language, yet the impressive facility humans have for reasoning with only
partial resolution of ambiguity. However, I have also considered “overspecification”,
the process by which accepted linguistic expressions have their senses extended to
new meanings, and the constraints that exist both theoretically, and in human behavior,
with respect to sense extension. This is intimately linked to metaphoricity.
I have also studied logics of human reasoning with defaults: chains of
statements that express regularities confronted with exceptions. Here there are
concerns with the formal properties of the logics themselves, and with the degree to
which they serve as adequate models of human reasoning with generalizations that have
exceptions.
Thus, my interests in linguistic anomaly span from orthographical well-formedness,
through appropriateness of lexical meaning, formal syntactic description and degrees
of grammaticality, to semantic well-formedness and sense extension for
representations, and reasoning with incomplete and inconsistent information.
Currently, my funded research is in techniques for text classification, looking
particularly at linguistic change over time, towards establishing milestones of normal
language development and decline, particularly addressing the Iris Murdoch corpus as
a source of data that may reveal features that correlate with progression of Alzheimer’s
disease.
The uniting theme in all of these sorts of linguistic anomaly that I study is the tension
between linguistic convention and linguistic creativity: ill-formedness versus
creativity.
David Wible
National Central University
1. Introduction
The task of designing systems and tools that support language learning on the Web is
changing due to evolving demands on such technologies from various sources. The
design of Learning Activity Management Systems (LAMS) dedicated to language
learning, for example, is facing a countervailing trend toward the use of all-purpose
platforms such as Blackboard and Moodle. Parallel trends toward the consolidation of
systems and content creation include standards specification movements such as
SCORM. This overall convergence of content standards and a few all-purpose
platforms does not represent an unqualified positive benefit for language pedagogy.
The advantages are limited by the unique nature of language learning among learning
domains and by the growing availability of digital language tools that ignore both
SCORM conformity and portability into larger platforms. Finally, the communicative
turn in language pedagogy highlights the need for individualized learning experiences
suited to each learner’s communicative needs and interests. This sort of
individualization is traditionally the forte of ITS (Intelligent Tutoring Systems), yet
few of these focus on language learning; those that do are stand-alone systems having
virtually no interoperability with other platforms; they treat narrowly-defined
dimensions of language learning; and they do not extend or scale up easily or at all.
In this talk, I describe some ongoing work by our team in Taiwan that addresses these
current challenges in the design of Web-supported language learning technologies.
4. References
IKEDA M., ASHLEY K.D. and CHAN T.-W. (eds) (2006), Intelligent Tutoring Systems: Eighth
International Conference, ITS 2006, Lecture Notes in Computer Science 4053, Springer,
Berlin.
WIBLE D. (in press), “Multiword Expressions and the Digital Turn”, in F. Meunier and
S. Granger (eds), Phraseology in Foreign Language Learning and Teaching, John Benjamins,
Amsterdam.
WIBLE D. (2005), Language Learning and Language Technology: Toward Foundations for
Interdisciplinary Collaboration, Crane, Taipei.
WIBLE D., KUO C.-H., CHEN M.C., TSAO N.-L. and HUNG T.-F. (2006), “A Ubiquitous Agent
for Unrestricted Vocabulary Learning in Noisy Digital Environments”, in Lecture Notes in
Computer Science, 4053: 503-512.
WIBLE D., KUO C.-H. and TSAO N.-L. (2004), “Contextualizing Language Learning in the
Digital Wild: Tools and a Framework”, in Proceedings of IEEE International Conference
on Advanced Learning Technologies, Joensuu.
EVOLVING APPROACHES TO WEB-SUPPORTED LANGUAGE LEARNING 67
WIBLE D., KUO C.-H., TSAO N.-L., HSIU-LING LIN A.L. (2003), “Bootstrapping in a
Language Learning Environment”, in Journal of Computer Assisted Learning 19(1):
90-102.
The SACODEYL project – Corpus exploitation
for language learning purposes
Johannes Widmann
University of Tübingen
1. Introduction
SACODEYL is situated in the field of computer-assisted language learning with the
help of recent developments in corpus research.
SACODEYL is a project within the SOCRATES-MINERVA initiative whose main
aim is to develop an ICT-based system for the assisted compilation and open
distribution of European teen talk. This scheme encompasses two groups: group 1,
youngsters between 13 and 15, and group 2, those between 16 and 18.
The main aim of the project is for young Europeans to use corpora for the learning of
languages. The pedagogical rationale of the project rests upon notions of autonomous
learning and meaningful interaction. SACODEYL users will come into close contact
with the real voices of peer young Europeans from other countries, their feelings,
opinions and speech, without the mediation, otherwise natural and necessary, of third
parties such as publishing houses. These peer group voices will make it easier for
young people to identify with the language and the contents being taught.
The SACODEYL project aims inter alia at developing a pedagogically-driven search
tool for querying corpora in a way that allows language teachers to access
corpora from their teaching perspectives and experiences. The basis of the corpus
annotation will be the SACODEYL annotation and corpus enrichment scheme that
will be used with the raw transcripts.
4. References
BRAUN S. (2005), “From pedagogically relevant corpora to authentic language learning
contents”, in ReCALL 17:1: 47-64.