Examining the Validity of Cross-Lingual Word Sense Disambiguation
Els Lefever1,2 and Veronique Hoste1,2
1 LT3, University College Ghent, Groot-Brittanniëlaan 45, Ghent, Belgium
2 Dpt. of Applied Mathematics and Computer Science, Ghent University, Krijgslaan 281(S9), Ghent, Belgium
{Els.Lefever,Veronique.Hoste}@hogent.be
Abstract. This paper describes a set of experiments investigating the viability of a classification-based Word Sense Disambiguation system that
uses evidence from multiple languages. Instead of using
a predefined monolingual sense inventory such as WordNet, we use a
language-independent framework and start from a manually constructed
gold standard in which the word senses consist of the translations
that result from word alignments on a parallel corpus. To train and test
the classifier, we used English as the input language and incorporated
the translations of our target words in five languages (viz. Spanish, Italian, French, Dutch and German) as features in the feature vectors. Our
results show that the multilingual approach outperforms the classification experiments in which no additional evidence from other languages is
used. These results confirm our initial hypothesis that each language
adds evidence to further refine the senses of a given word. This allows us
to develop a proof of concept for a multilingual approach to Word Sense
Disambiguation.
Key words: WSD, Word Sense Disambiguation, multilingual, cross-lingual
1 Introduction
Word Sense Disambiguation (WSD) is the NLP task that consists of selecting the
correct sense of a polysemous word in a given context. For a detailed overview
of the main WSD approaches we refer to Agirre and Edmonds [1] and Navigli [2].
State-of-the-art WSD systems are mainly supervised systems, trained on large
sense-tagged corpora in which human annotators have labeled each instance of the
target word with a label from a predefined sense inventory such as WordNet [3].
Two important problems arise with this approach. Firstly, large sense-tagged
corpora and sense inventories are very time-consuming and expensive to build.
As a result, they are extremely scarce for languages other than English. Secondly,
there is a growing conviction within the WSD community that WSD should not be
tested as a stand-alone NLP task, but should be integrated in real applications
such as Machine Translation and cross-lingual information retrieval [4].
In this paper, we describe the construction of a multilingual WSD system
that takes an ambiguous English word and its context as input, and outputs
a correct translation of this word in a given focus language. For our
experiments we trained a classifier for five focus languages (viz. Italian, German,
Dutch, Spanish and French). In addition to a set of local context features, we
included the translations in the four other languages (which four depends on the
focus language of the classifier) in the feature vector. All translations are retrieved
from the parallel corpus Europarl [5].
Using a parallel corpus such as Europarl instead of human-defined sense labels offers several advantages: (1) for most languages we do not have
large sense-annotated corpora or sense inventories, (2) using corpus translations
should make it easier to integrate the WSD module into real multilingual applications, and (3) this approach implicitly deals with the granularity problem, as
fine sense distinctions (which are often listed in electronic sense inventories) are
only relevant if they are lexicalized in the target translations.
The idea of using translations from parallel corpora to distinguish between word
senses is based on the hypothesis that different meanings of a polysemous word
are often lexicalized differently across languages [6, 7]. Many WSD studies have
already confirmed the validity of this idea. Most of these studies have focused
on bilingual WSD (e.g. [8–10]) or on the combination of existing WordNets with
multilingual evidence (e.g. [11]).
In order to use parallel texts to train a WSD classifier, most systems lump
different senses of the ambiguous target word together if they are translated in
the same way (e.g. Chan and Ng [12]), which reflects the problem of assigning
unique translations to each sense of a noun. The English word mouse, for instance,
is translated in French as souris, both for the animal and the
computer sense of the word. In order to construct and refine a multilingual sense
inventory reflecting the different senses of a given word, more translations are
required to increase the chance that the different word senses are lexicalized
differently across languages. To our knowledge, however, it has
not been shown experimentally whether, and to what extent, multilingual evidence from
a parallel corpus helps to perform classification-based WSD for a given
target language. In the experiments reported in this paper, we included evidence
from up to four languages in the feature vectors of a multilingual lexical sample
WSD classifier.
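The lumping problem can be illustrated with a toy sketch in which each distinct combination of translations serves as a candidate sense label: a single language may merge senses that a second language keeps apart. The instances are hand-made rather than taken from Europarl, we assume for illustration that Italian renders the animal as topo and the computer device as mouse, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# Toy, hand-made instances of English "mouse" (illustrative only, not Europarl data).
# Each instance carries its (normally unknown) sense plus its translation per language.
INSTANCES: List[Dict[str, str]] = [
    {"sense": "animal", "fr": "souris", "it": "topo"},
    {"sense": "device", "fr": "souris", "it": "mouse"},
    {"sense": "animal", "fr": "souris", "it": "topo"},
    {"sense": "device", "fr": "souris", "it": "mouse"},
]


def group_by_translations(
    instances: List[Dict[str, str]], langs: Sequence[str]
) -> Dict[Tuple[str, ...], Set[str]]:
    """Use the tuple of translations in `langs` as a candidate sense label and
    record which underlying senses end up under each label."""
    groups: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for inst in instances:
        key = tuple(inst[lang] for lang in langs)
        groups[key].add(inst["sense"])
    return dict(groups)


def lumps_senses(groups: Dict[Tuple[str, ...], Set[str]]) -> bool:
    """True if at least one translation label covers more than one sense."""
    return any(len(senses) > 1 for senses in groups.values())


if __name__ == "__main__":
    french_only = group_by_translations(INSTANCES, ["fr"])
    french_italian = group_by_translations(INSTANCES, ["fr", "it"])
    print(len(french_only), lumps_senses(french_only))        # 1 True
    print(len(french_italian), lumps_senses(french_italian))  # 2 False
```

With French alone, both senses fall under the single label souris; adding the Italian translation splits them into two labels that each cover exactly one sense.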
The remainder of this paper is organized as follows. Section 2 describes the
data set we used for the experiments. Section 3 presents the construction of the
feature vectors, and gives more insight into the classifier that was built. Section 4
gives an overview of the experiments, and we finally draw conclusions and present
some future research in Section 5.
2 Data
In order to construct our sense inventory, we extracted the translations of our
ambiguous target words from the parallel corpus Europarl [5]. We selected six
languages from the eleven European languages represented in the corpus, viz. English (our input language), Dutch, French, German, Italian and Spanish. As our
approach is both language- and corpus-independent, and all steps can be run
automatically, we can easily add other languages and extend or replace the
corpus that was used.
The Europarl data had already been sentence-aligned with a tool based on the
Gale and Church algorithm [13] that is distributed with the corpus. We only considered the intersected 1-1 sentence alignments between English and the five
other languages (see also [11] for a similar strategy). The experiments were performed on a lexical sample of five ambiguous words, namely bank, plant, movement,
occupation and passage, which were collected in the framework of the SemEval-2 Cross-Lingual Word Sense Disambiguation task. The six-language sentence-aligned
corpus, as well as the test set and corresponding gold standard, can be
downloaded from the task website³.
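A minimal sketch of the intersection step, assuming a hypothetical representation in which each sentence alignment is a list of beads pairing English sentence ids with foreign sentence ids (the format actually distributed with Europarl differs): an English sentence is kept only if it takes part in a 1-1 bead in every language.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a sentence alignment: a list of "beads", each pairing
# a tuple of English sentence ids with a tuple of foreign sentence ids.
Bead = Tuple[Tuple[int, ...], Tuple[int, ...]]


def one_to_one(beads: List[Bead]) -> Dict[int, int]:
    """Keep only 1-1 beads and map English sentence id -> foreign sentence id."""
    return {en[0]: fo[0] for en, fo in beads if len(en) == 1 and len(fo) == 1}


def intersect_alignments(beads_per_lang: Dict[str, List[Bead]]) -> Dict[int, Dict[str, int]]:
    """Return the English sentences that are 1-1 aligned in every language, together
    with the id of the aligned sentence in each language (needs >= 1 language)."""
    maps = {lang: one_to_one(beads) for lang, beads in beads_per_lang.items()}
    common = set.intersection(*(set(m) for m in maps.values()))
    return {en: {lang: m[en] for lang, m in maps.items()} for en in sorted(common)}


if __name__ == "__main__":
    beads = {
        "nl": [((0,), (0,)), ((1, 2), (1,)), ((3,), (2,))],
        "fr": [((0,), (0,)), ((1,), (1,)), ((2,), (2,)), ((3,), (3, 4))],
    }
    print(intersect_alignments(beads))  # {0: {'nl': 0, 'fr': 0}}
```

In the toy input, English sentences 1 and 2 are dropped because Dutch merges them into one sentence, and sentence 3 because French splits it in two.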
After selecting all English sentences containing these target nouns and
the aligned sentences in the five target languages, we ran GIZA++ [14] word
alignment on the selected sentences to retrieve the set of possible translations for
our ambiguous target words. All alignments were manually checked afterwards.
In cases where a single target word (e.g. occupation) led to a multiword
translation (e.g. actividad profesional in Spanish) or to a compound (e.g. beroepsbezigheden in Dutch and Berufstätigkeit in German), we kept the multi-part
translation as a valid translation suggestion.
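The translation retrieval step can be sketched as below, assuming the word alignment of each sentence pair has already been converted into a set of (English index, foreign index) links; GIZA++ does not output this format directly, and the function name is hypothetical. Links to several foreign tokens are kept together as one multi-part translation, mirroring the treatment of actividad profesional described above.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (English token index, foreign token index); assumed format


def aligned_translation(foreign_tokens: List[str], links: Set[Link], target_index: int) -> str:
    """Collect all foreign tokens linked to the English target word.

    Several linked tokens are joined into one multi-part translation, in the order
    they occur in the foreign sentence; no link at all yields the label "NULL".
    """
    positions = sorted(j for i, j in links if i == target_index)
    if not positions:
        return "NULL"
    return " ".join(foreign_tokens[j] for j in positions)


if __name__ == "__main__":
    spanish = ["su", "actividad", "profesional"]
    links = {(0, 0), (1, 1), (1, 2)}  # toy links for English "their occupation"
    print(aligned_translation(spanish, links, target_index=1))     # actividad profesional
    print(aligned_translation(spanish, {(0, 0)}, target_index=1))  # NULL
```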
All sentences containing the target words were preprocessed by means of a
memory-based shallow parser (MBSP) [15], which performs tokenization, Part-of-Speech tagging and text chunking. On the basis of these preprocessed data,
we built a feature vector which contains information related to the target word
itself as well as local patterns around the target word. Table 1 shows the size of
the instance base for each of the ambiguous words, whereas Figure 1 shows the
number of classes per ambiguous target word in the five focus languages.
Table 1. Size of the instance base per ambiguous target word
Word        Number of instances
bank        4029
movement    4222
occupation  634
passage     238
plant       1631
³ http://lt3.hogent.be/semeval/
Figure 1 also suggests that the classification task will be especially challenging for Dutch and German, as their compounding strategies lead to a high number of unique translations.
Fig. 1. Number of unique translations per language and per ambiguous word.
As Figure 1 shows, the polysemy of the target words is considerably high in
all five target languages. Even for the Romance languages, where the number of
compound translations is rather low, the classifier has to choose from a substantial number of possible classes. Example 1 illustrates this by listing the French
translations that were retrieved for the English word plant (NULL refers to a
null link from the word alignment):
(1) centrale, installation, plante, usine, végétal, NULL, phytosanitaire, entreprise,
incinérateur, station, pesticide, site, flore, unité, atelier, plant, phytopharmaceutique, établissement, culture, réacteur, protéagineux, centre, implantation, oléoprotéagineux, équipement, horticulture, phytogénétique, exploitation,
végétation, outil, plantation, sucrerie, société, fabrique, four, immobilisation,
céréale, espèce, séchoir, production, claque, arsenal, ceps, poêle, récolte, plateforme, artémisinine, fabrication, phytogénéticien, oléagineux, glacière, espèce végétale, chou, tranche, Plante, installation incinérateur.
3 Experimental set-up
We treat WSD as a classification task: given a feature vector containing the ambiguous word and its context as features, a classifier predicts the
correct sense (or translation, in our case) for this specific instance.
3.1 Feature vectors
For our initial feature set we started off with the traditional features that have
been shown to be useful for WSD [1]:
– features related to the target word itself, namely its word form, lemma,
Part-of-Speech tag and chunk information;
– local context features related to a window of three words preceding and
following the target word, containing for each of these words its full form,
lemma, Part-of-Speech tag and syntactic dependencies.
In addition to these well-known WSD features, we integrated the translations
of the target word in the other languages (Spanish, German, Italian, Dutch or
French, depending on the desired classification output) as separate features into
the feature vector. Example 2 lists the feature vector for one of the instances
in the training base of the Dutch classifier. The first features contain the word
form, lemma, PoS tag and chunk information for the three words preceding the target
word, the target word itself and the three words following the target word.
In addition, we added the aligned translations for the target word in the four
additional languages (German, Spanish, Italian and French for the Dutch
classifier). The last field contains the classification label, which is the aligned
Dutch translation in this case.
(2) English input sentence for the word bank:
This is why the Commission resolved on raising a complaint against these two banks at its last meeting, and I hope that Parliament approves this step.
Feature vector:
against these two against these two IN DT CD I-PP I-NP I-NP banks bank NNS I-NP at its last at its last IN PRP JJ I-PP I-NP I-NP Bank banco banca banque bank
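The layout of Example 2 can be reproduced with a short sketch. We assume that the repeated context words in the vector are the lemmas (as listed in the feature description), that window positions beyond the sentence boundary are filled with a padding symbol, and that the translations appear in the order German, Spanish, Italian, French, as in the Dutch example; the Token type, the `_` padding value and the function names are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Token(NamedTuple):
    form: str
    lemma: str
    pos: str
    chunk: str


# Hypothetical filler for window positions that fall outside the sentence.
PAD = Token("_", "_", "_", "_")


def grouped(tokens: Sequence[Token]) -> List[str]:
    """Forms first, then lemmas, then PoS tags, then chunk tags, as in Example 2."""
    return ([t.form for t in tokens] + [t.lemma for t in tokens]
            + [t.pos for t in tokens] + [t.chunk for t in tokens])


def feature_vector(sentence: Sequence[Token], i: int, translations: Sequence[str],
                   label: str, size: int = 3) -> List[str]:
    """Context window around token i, the target word, the translation features
    and finally the class label (the translation in the focus language)."""
    left = list(sentence[max(0, i - size):i])
    right = list(sentence[i + 1:i + 1 + size])
    left = [PAD] * (size - len(left)) + left      # pad at the start of the sentence
    right = right + [PAD] * (size - len(right))   # pad at the end of the sentence
    t = sentence[i]
    return (grouped(left) + [t.form, t.lemma, t.pos, t.chunk] + grouped(right)
            + list(translations) + [label])


if __name__ == "__main__":
    # Window of Example 2 with the tags shown there; lemmas assumed equal to the forms.
    window = [Token("against", "against", "IN", "I-PP"), Token("these", "these", "DT", "I-NP"),
              Token("two", "two", "CD", "I-NP"), Token("banks", "bank", "NNS", "I-NP"),
              Token("at", "at", "IN", "I-PP"), Token("its", "its", "PRP", "I-NP"),
              Token("last", "last", "JJ", "I-NP")]
    # German, Spanish, Italian and French translations; Dutch label.
    vec = feature_vector(window, 3, ["Bank", "banco", "banca", "banque"], "bank")
    print(" ".join(vec))
```

Grouping the window by attribute rather than by word keeps every feature at a fixed position, which a memory-based learner that compares vectors position by position requires.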
Incorporating the translations in our feature vector allows us to develop a
proof of concept for a multilingual approach to Word Sense Disambiguation.
This multilingual approach will consist of two steps: (1) we first examine whether
evidence from different languages can lead to better sense discrimination (which
is the scope of this paper) and (2) in a following step we will introduce
additional cross-lingual evidence (bag-of-words features containing all content
words from the aligned translations) in the feature vectors of our WSD classifier.
An automatic sense discrimination step can then be applied to the training
feature base.
Unsupervised approaches to sense discrimination have a long research history. The idea of using distributional methods to cluster words that appear in
similar contexts has been successfully applied to monolingual corpora
(e.g. [16, 17]) as well as to parallel corpora. Previous research on parallel corpora [18, 7] confirmed the usefulness of cross-lingual lexicalization as a criterion for
performing sense discrimination. Whereas in previous research on cross-lingual
WSD the evidence from the aligned sentences was mainly used to enrich WordNet information, our approach does not require any external resources. With our
experiments we want to examine to what extent evidence from other languages,
without additional information from external lexical resources, helps to detect
correct sense distinctions that result in a better WSD classification output (or
translation, in our case).
3.2 Classification
To train our WSD classifier, we used the memory-based learning (MBL) algorithms implemented in timbl [19], which have been shown to perform well on
WSD [20]. We performed heuristic experiments to define the parameter settings
for the classifier, leading to the selection of the Jeffrey Divergence distance metric, Gain Ratio [21] feature weighting and k = 7 as number of nearest distances.
In future work, we plan to use a genetic algorithm to perform joint feature
selection and parameter optimization per ambiguous word [22].
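To make the three parameter choices concrete, here is a small pure-Python sketch of a memory-based (k-nearest-neighbour) classifier that combines Gain Ratio feature weights with the Jeffrey divergence between the class distributions of two feature values, and that lets all instances at the k nearest distances vote. It is not timbl and does not reproduce its exact behaviour; the fallback to an overlap distance for unseen values, the log base and the tie-breaking are our own assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Instance = Tuple[str, ...]


def entropy(counts: Counter) -> float:
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


def gain_ratio(X: Sequence[Instance], y: Sequence[str], f: int) -> float:
    """Information gain of feature f, normalised by its split info (Quinlan)."""
    by_value: Dict[str, List[str]] = defaultdict(list)
    for x, label in zip(X, y):
        by_value[x[f]].append(label)
    n = len(y)
    remainder = sum(len(ls) / n * entropy(Counter(ls)) for ls in by_value.values())
    split_info = entropy(Counter(x[f] for x in X))
    return (entropy(Counter(y)) - remainder) / split_info if split_info > 0 else 0.0


def class_distributions(X: Sequence[Instance], y: Sequence[str], f: int) -> Dict[str, Dict[str, float]]:
    """P(class | value) for every value of feature f seen in training."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for x, label in zip(X, y):
        counts[x[f]][label] += 1
    return {v: {c: k / sum(cnt.values()) for c, k in cnt.items()} for v, cnt in counts.items()}


def jeffrey(p: Dict[str, float], q: Dict[str, float], classes: Sequence[str]) -> float:
    """Jeffrey divergence between two class distributions (log base 2)."""
    d = 0.0
    for c in classes:
        pc, qc = p.get(c, 0.0), q.get(c, 0.0)
        m = (pc + qc) / 2
        if pc > 0:
            d += pc * math.log2(pc / m)
        if qc > 0:
            d += qc * math.log2(qc / m)
    return d


class MemoryBasedClassifier:
    def __init__(self, k: int = 7):
        self.k = k  # number of nearest *distances*, not neighbours

    def fit(self, X: Sequence[Instance], y: Sequence[str]) -> "MemoryBasedClassifier":
        self.X, self.y = list(X), list(y)
        self.classes = sorted(set(self.y))
        n_features = len(self.X[0])
        self.weights = [gain_ratio(self.X, self.y, f) for f in range(n_features)]
        self.dists = [class_distributions(self.X, self.y, f) for f in range(n_features)]
        return self

    def _value_distance(self, f: int, a: str, b: str) -> float:
        if a == b:
            return 0.0
        pa, pb = self.dists[f].get(a), self.dists[f].get(b)
        if pa is None or pb is None:
            return 1.0  # unseen value: fall back to the overlap metric (our assumption)
        return jeffrey(pa, pb, self.classes)

    def _distance(self, x1: Instance, x2: Instance) -> float:
        return sum(w * self._value_distance(f, a, b)
                   for f, (w, a, b) in enumerate(zip(self.weights, x1, x2)))

    def predict(self, x: Instance) -> str:
        scored = [(self._distance(x, xi), yi) for xi, yi in zip(self.X, self.y)]
        cutoff = sorted({d for d, _ in scored})[: self.k][-1]
        votes = Counter(yi for d, yi in scored if d <= cutoff)
        return votes.most_common(1)[0][0]  # plain majority vote; ties: first class seen


if __name__ == "__main__":
    X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    y = ["P", "P", "Q", "Q"]
    clf = MemoryBasedClassifier(k=1).fit(X, y)
    print(clf.weights)                                    # [1.0, 0.0]
    print(clf.predict(("a", "y")), clf.predict(("b", "x")))  # P Q
```

Because k counts distances rather than neighbours, every training instance that ties at one of the k smallest distances takes part in the vote, so the voting set can be larger than k.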
4 Evaluation
For the evaluation, we performed 10-fold cross-validation on the instance bases.
As a baseline, we selected the most frequent translation that was given by the
automatic word alignment. We added the translations in the other languages
that resulted from the word alignment as features to our feature vector and built
classifiers for each target word for all five supported languages. Since we aim to
investigate the impact of cross-lingual evidence on WSD, we deliberately chose
to use the manually verified gold standard word alignments. Our classification
results can thus be considered an upper bound for this task, as automatic
word alignments will presumably lead to lower performance figures.
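The evaluation protocol can be sketched in a few lines. The sketch assumes contiguous, unshuffled folds and computes the most frequent translation from the training folds of each split; the paper does not specify these details, and the toy labels and function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def kfold_indices(n: int, k: int = 10) -> List[List[int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mft_baseline(labels: Sequence[str], k: int = 10) -> float:
    """Mean accuracy (%) of always predicting the most frequent training translation.
    Assumes at least two instances, so that every training part is non-empty."""
    scores = []
    for test_idx in kfold_indices(len(labels), k):
        if not test_idx:
            continue  # more folds than instances
        held_out = set(test_idx)
        train = [lab for i, lab in enumerate(labels) if i not in held_out]
        majority = Counter(train).most_common(1)[0][0]
        scores.append(sum(labels[i] == majority for i in test_idx) / len(test_idx))
    return 100 * sum(scores) / len(scores)


if __name__ == "__main__":
    toy = ["banque"] * 8 + ["rive"] * 2       # invented labels
    print(round(mft_baseline(toy), 1))        # 80.0
    print(sorted(len(f) for f in kfold_indices(4029)))
```

For the 4029 bank instances of Table 1, `kfold_indices` yields nine folds of 403 instances and one of 402.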
An overview of the classification results for the Romance languages (French,
Italian, Spanish) can be found in Table 2, whereas the classification results for
Dutch and German can be found in Table 3. Figure 2 illustrates the classification results per language for two ambiguous words, viz. “bank” and “plant”, when
averaging over the translations in the feature vector.
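The averaging behind Figure 2 is not fully specified in the text; one plausible reading, sketched below, averages the accuracy over all configurations that use the same number of translation features. The numbers are the French bank column of Table 2; the dictionary layout and function name are ours.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# French "bank" scores from Table 2, keyed by the translation features used.
FRENCH_BANK: Dict[Tuple[str, ...], float] = {
    ("I", "E", "D", "N"): 84.9,
    ("I", "E", "D"): 84.5, ("E", "D", "N"): 84.0, ("I", "D", "N"): 83.9, ("I", "E", "N"): 84.6,
    ("E", "D"): 83.2, ("I", "D"): 83.1, ("D", "N"): 82.8,
    ("I", "E"): 84.3, ("E", "N"): 83.2, ("I", "N"): 83.2,
    ("D",): 81.4, ("E",): 83.0, ("I",): 82.4, ("N",): 82.0,
    (): 83.5,  # no translation features
}


def average_by_size(scores: Dict[Tuple[str, ...], float]) -> Dict[int, float]:
    """Average accuracy over all configurations with the same number of translation features."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for langs, score in scores.items():
        buckets[len(langs)].append(score)
    return {n: round(mean(v), 2) for n, v in sorted(buckets.items(), reverse=True)}


if __name__ == "__main__":
    print(average_by_size(FRENCH_BANK))  # {4: 84.9, 3: 84.25, 2: 83.3, 1: 82.2, 0: 83.5}
```

Reading the result from 4 down to 0 features shows how the averaged score changes as translation features are removed for this word.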
The results show that even the simple classifier which does not incorporate
translation features beats the most frequent translation baseline for all languages (except for occupation in Spanish and Italian), although there is still considerable
room for improvement in the feature set (e.g. by adding bag-of-words features for a
broader context).
The scores support our hypothesis: the experiments using all four translations as
features generally outperform the ones using fewer or no translation features.
This holds for all five classifiers, with a few exceptions for individual words
(e.g. for passage in Italian, several combinations of two or three translation
features score higher than the combination of all four). In addition, the scores
tend to decrease as fewer translation features are used: the more translations (in
different languages) are incorporated in the feature vector, the better the
classification results get. This allows us to conclude that incorporating
multilingual information in the feature vectors helps the classifier to make more
reliable and finer sense distinctions, which results in better translations in our
case. Another striking observation is that the classifier that relies solely on
translation features (Only translation features) often beats the classifier that
incorporates all context and translation features. There are, however, two
limitations to our experimental framework. First, we have not experimented with a
higher number of languages, and as a consequence we cannot estimate from which
number of languages the performance would start to degrade. Second, all languages
used are Germanic or Romance; another interesting line of research would be to
include languages belonging to more distant language families.
Fig. 2. Classification results for “bank” and “plant” for each of the target languages.
The languages are, from top to bottom: Dutch, French, Italian, Spanish and German.
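The translation-feature configurations reported in Tables 2 and 3 can be enumerated mechanically, which also quantifies the first limitation: with n other languages, the context-plus-translations runs cover all 2^n subsets, so each additional language doubles the number of experiments. In the sketch, the language codes follow the table captions; the function name and run labels are ours.

```python
from itertools import combinations
from typing import List, Tuple

CODES = "IEDNF"  # Italian, Spanish, German, Dutch, French (codes used in Tables 2 and 3)


def feature_configurations(focus: str) -> List[Tuple[str, Tuple[str, ...]]]:
    """Enumerate the runs reported per focus language: context features combined
    with every subset of the other languages' translations (largest first),
    plus one run with translation features only."""
    others = tuple(c for c in CODES if c != focus)
    runs = [("context", combo)
            for r in range(len(others), -1, -1)
            for combo in combinations(others, r)]
    runs.append(("translations only", others))
    return runs


if __name__ == "__main__":
    runs = feature_configurations("F")
    print(len(runs))  # 17 = 2**4 subsets + 1 translations-only run
    print(runs[0])    # ('context', ('I', 'E', 'D', 'N'))
```

The empty subset corresponds to the "No translation features" row and the last run to the "Only translation features" row.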
The experimental results also reveal remarkable differences between the languages.
These can probably be explained by the difference in morphological structure
between the two language branches. Dutch and German tend to write the parts of a
compound as a single orthographic unit, whereas the Romance languages (French,
Italian, Spanish) keep these parts separated by spaces. As a result, many German and
Dutch translations are compounds, and the number of different classes the
classifier has to choose from is much larger (as already shown in Figure 1). This
difference is also reflected in the baselines: the French, Italian and Spanish
baselines are clearly higher than the Dutch or German ones for most words.
Another interesting observation is that languages from the same
language branch seem to contribute more to a correct classification result. The
results show for instance that for the Spanish classifier, the use of Italian and
French translations in the feature vector results in better classification scores,
whereas for German, the incorporation of the Dutch translations in the feature
vector seems to contribute most to choosing a correct translation. More experiments with other words and languages will allow us to examine whether this
trend can be confirmed. Previous research on this topic has yielded contradictory results: Ide [18] showed that there was no relationship between sense discrimination and language distance, whereas Resnik and Yarowsky [6] found that
languages from other language families tend to lexicalize more sense distinctions.
Our results clearly show that adding more multilingual evidence to the feature vector helps the WSD classifier to predict more accurate translations. The
logical next step is to integrate this multilingual information into a real WSD
application. In order to do so we will use the multilingual evidence from the
parallel corpus to enrich our training vectors. Instead of only incorporating the
aligned translations from the other languages, we will add all content words from
the aligned translations as bag-of-words features to the feature vector. We will
also develop a strategy to generate the corresponding translation features for the
test instances. Both the local context features of the English target word and
the cross-lingual evidence will be taken into account for computing the similarity scores between the test input and the training instance base. The expected
outcome, based on the results presented in this paper, is that each language can
contribute to finer sense distinctions and thus to more contextually
accurate translations for the ambiguous target words.
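The planned bag-of-words features could look roughly like the sketch below. Which words count as content words is not specified here; the stopword filter is a stand-in for a PoS-based filter, and the toy French sentences and function names are hypothetical.

```python
from typing import Iterable, List, Set


def content_words(sentence: str, stopwords: Set[str]) -> Set[str]:
    """Lower-cased alphabetic tokens that are not stopwords (a crude stand-in
    for PoS-based content-word selection)."""
    return {tok.lower() for tok in sentence.split()
            if tok.isalpha() and tok.lower() not in stopwords}


def build_vocabulary(training_sentences: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Sorted list of all content words seen in the aligned training translations."""
    vocab: Set[str] = set()
    for sentence in training_sentences:
        vocab |= content_words(sentence, stopwords)
    return sorted(vocab)


def bow_features(aligned_sentences: Iterable[str], vocabulary: List[str],
                 stopwords: Set[str]) -> List[int]:
    """Binary bag-of-words vector over the content words of all aligned translations."""
    present: Set[str] = set()
    for sentence in aligned_sentences:
        present |= content_words(sentence, stopwords)
    return [1 if word in present else 0 for word in vocabulary]


if __name__ == "__main__":
    stop = {"la", "a", "ses", "le", "de"}
    vocab = build_vocabulary(["la banque a relevé ses taux", "le bord de la rivière"], stop)
    print(vocab)  # ['banque', 'bord', 'relevé', 'rivière', 'taux']
    print(bow_features(["la banque baisse ses taux"], vocab, stop))  # [1, 0, 0, 0, 1]
```

When translations from several languages are pooled, prefixing each token with a language code (an option we assume, not one described above) would keep identical spellings in different languages, such as Dutch and English bank, apart.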
5 Conclusion and future work
We presented preliminary results for a multilingual Word Sense Disambiguation system which does not use labels from a predefined sense inventory, but
translations that are retrieved by running word alignment on a parallel corpus.
Although there is still a lot of room for improvement in the feature base, the
scores of all five WSD systems consistently beat the most frequent translation
baseline. The results allow us to develop a proof of concept that multilingual evidence in the feature vector helps the classifier to make more reliable and finer
sense distinctions, which results in better translations. We also observed that
adding translations from the same language branch seems to help the classifier
most in predicting a correct translation in the focus language.
In future work, we want to run additional experiments with different classifiers on a larger sample of ambiguous words. We also wish to improve the classification results by performing joint feature selection and parameter optimization
per ambiguous target word (e.g. by using a genetic algorithm approach). In addition, we plan to include more multilingual evidence in a real WSD set-up.
To this end, we will store the bag-of-words translation features resulting from the
aligned translations in the training feature vectors, and add the automatically
generated corresponding translation features for the test sentences to the test
feature vectors.
References
1. Agirre, E., Edmonds, P., eds.: Word Sense Disambiguation. Text, Speech and
Language Technology. Springer, Dordrecht (2006)
2. Navigli, R.: Word sense disambiguation: a survey. In: ACM Computing Surveys.
Volume 41. (2009) 1–69
3. Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press (1998)
4. Otegi, A., Agirre, E., Rigau, G.: IXA at CLEF 2008 Robust-WSD task: Using word sense
disambiguation for (cross-lingual) information retrieval. In: Evaluating Systems
for Multilingual and Multimodal Information Access, 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, Aarhus, Denmark, September 17-19,
2008. (2009)
5. Koehn, P.: Europarl: A parallel corpus for statistical machine translation. In:
Proceedings of the MT Summit. (2005)
6. Resnik, P., Yarowsky, D.: Distinguishing systems and distinguishing senses: New
evaluation methods for word sense disambiguation. Natural Language Engineering
5 (2000) 113–133
7. Ide, N., Erjavec, T., Tufis, D.: Sense discrimination with parallel corpora. In:
Proceedings of ACL Workshop on Word Sense Disambiguation: Recent Successes
and Future Directions. (2002) 54–60
8. Gale, W., Church, K., Yarowsky, D.: A method for disambiguating word senses in
a large corpus. In: Computers and the Humanities. Volume 26. (1993) 415–439
9. Ng, H., Wang, B., Chan, Y.: Exploiting parallel texts for word sense disambiguation: An empirical study. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo (2003) 455–462
10. Diab, M., Resnik, P.: An unsupervised method for word sense tagging using parallel
corpora. In: Proceedings of ACL. (2002) 255–262
11. Tufiş, D., Ion, R., Ide, N.: Fine-Grained Word Sense Disambiguation Based on
Parallel Corpora, Word Alignment, Word Clustering and Aligned Wordnets. In:
Proceedings of the 20th International Conference on Computational Linguistics
(COLING 2004), Geneva, Switzerland, Association for Computational Linguistics
(2004) 1312–1318
12. Chan, Y., Ng, H.: Scaling up word sense disambiguation via parallel texts. In:
AAAI’05: Proceedings of the 20th national conference on Artificial intelligence,
AAAI Press (2005) 1037–1042
13. Gale, W., Church, K.: A program for aligning sentences in bilingual corpora. In:
Computational Linguistics. (1991) 177–184
14. Och, F., Ney, H.: A systematic comparison of various statistical alignment models.
Computational Linguistics 29 (2003) 19–51
15. Daelemans, W., van den Bosch, A.: Memory-Based Language Processing. Cambridge University Press (2005)
16. Schütze, H.: Automatic word sense discrimination. Computational Linguistics 24
(1998) 97–123
17. Purandare, A., Pedersen, T.: Word sense discrimination by clustering contexts in
vector and similarity spaces. In: Proceedings of the Conference on Computational
Natural Language Learning. (2004) 41–48
18. Ide, N.: Parallel translations as sense discriminators. In: SIGLEX Workshop On
Standardizing Lexical Resources. (1999)
19. Daelemans, W., Zavrel, J., van der Sloot, K., van den Bosch, A.: TiMBL: Tilburg memory-based
learner, version 4.3, reference guide. ILK Technical Report 02-10, Tilburg University (2002)
20. Hoste, V., Hendrickx, I., Daelemans, W., van den Bosch, A.: Parameter optimization for machine-learning of word sense disambiguation. Natural Language
Engineering, Special Issue on Word Sense Disambiguation Systems 8 (2002) 311–
325
21. Quinlan, J.: C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo,
CA (1993)
22. Daelemans, W., Hoste, V., De Meulder, F., Naudts, B.: Combined optimization
of feature selection and algorithm parameter interaction in machine learning of
language. In: Proceedings of the 14th European Conference on Machine Learning
(ECML-2003). (2003) 84–95
Table 2. French, Italian and Spanish results for a varying number of translation
features including the other four languages viz. Italian (I), Spanish (E), German (D),
Dutch (N) and French (F).
French
Features  bank  movement  occupation  passage  plant
Baseline  55.8  44.7      75.5        50.0     20.7
All four translation features
I,E,D,N   84.9  71.7      82.8        60.3     65.4
Three translation features
I,E,D     84.5  70.9      80.8        59.5     63.7
E,D,N     84.0  70.7      81.6        59.1     63.7
I,D,N     83.9  70.7      82.0        59.1     61.3
I,E,N     84.6  71.3      81.2        57.4     64.3
Two translation features
E,D       83.2  69.2      80.0        59.9     60.8
I,D       83.1  69.8      80.1        58.7     58.8
D,N       82.8  69.1      80.9        57.4     58.6
I,E       84.3  69.8      80.0        57.8     61.0
E,N       83.2  69.8      80.5        57.4     61.0
I,N       83.2  70.1      81.1        57.8     59.4
One translation feature
D         81.4  67.5      78.9        58.7     54.0
E         83.0  67.7      79.2        56.5     56.4
I         82.4  68.4      79.5        57.4     56.1
N         82.0  68.0      80.5        57.4     55.4
No translation features
none      83.5  65.6      76.5        55.3     47.6
Only translation features
only      85.8  73.3      82.8        62.9     69.0
Italian
Features  bank  movement  occupation  passage  plant
Baseline  54.6  51.9      78.7        37.1     32.8
All four translation features
E,F,D,N   83.1  80.2      81.1        40.1     66.1
Three translation features
E,F,D     82.7  79.6      81.1        40.1     65.1
F,D,N     82.8  79.7      79.2        40.9     64.2
E,D,N     82.6  79.2      81.0        40.5     64.6
E,F,N     82.8  80.0      81.0        40.5     65.3
Two translation features
F,D       82.0  78.6      79.3        40.5     63.4
E,D       81.8  78.5      80.9        40.5     62.1
D,N       81.4  77.8      78.5        40.9     62.4
E,F       82.3  79.5      80.9        40.1     64.3
F,N       82.4  79.0      79.2        41.4     63.2
E,N       82.1  78.7      80.1        40.1     62.7
One translation feature
D         80.0  76.8      77.9        40.5     59.4
F         81.4  78.0      79.2        40.9     61.1
E         81.4  77.5      80.6        38.4     58.1
N         80.9  77.2      78.1        39.7     59.4
No translation features
none      79.5  75.2      78.1        38.0     53.0
Only translation features
only      83.9  81.4      81.6        42.6     67.3
Spanish
Features  bank  movement  occupation  passage  plant
Baseline  58.8  51.0      81.6        24.1     30.1
All four translation features
I,F,D,N   90.0  80.8      83.0        38.0     59.0
Three translation features
I,F,D     89.6  80.6      82.8        35.9     58.6
F,D,N     89.1  79.6      82.7        37.6     57.1
I,D,N     89.4  79.4      82.4        37.6     55.9
I,F,N     89.8  80.3      82.7        35.4     58.7
Two translation features
F,D       88.9  79.1      82.7        35.9     55.9
I,D       88.7  79.0      82.4        36.3     54.3
D,N       88.0  78.0      82.0        38.0     53.7
I,F       89.4  79.9      82.5        34.2     57.8
F,N       89.0  79.2      82.2        35.4     57.3
I,N       89.3  78.6      82.4        34.2     54.9
One translation feature
D         87.2  77.3      82.2        37.1     50.8
F         88.7  78.3      82.7        34.2     55.1
I         88.7  78.3      81.6        32.5     53.6
N         87.7  77.1      81.9        34.6     52.6
No translation features
none      86.5  75.8      80.6        32.9     48.5
Only translation features
only      89.9  82.0      83.0        40.9     63.4
Table 3. Dutch and German results for a varying number of translation
features including the other four languages viz. Italian (I), Spanish (E), German (D),
Dutch (N) and French (F).
Dutch
Features  bank  movement  occupation  passage  plant
Baseline  33.4  46.7      60.6        26.7     12.0
All four translation features
I,E,D,F   80.3  65.8      69.3        36.3     47.3
Three translation features
I,E,D     80.0  65.1      68.9        35.0     44.2
E,D,F     79.4  65.2      69.0        34.6     45.8
I,D,F     79.4  65.5      69.2        36.3     45.2
I,E,F     79.1  63.7      68.2        35.4     44.5
Two translation features
E,D       79.2  64.4      67.6        35.0     45.2
I,D       79.0  64.3      68.5        34.6     42.7
D,F       78.8  64.9      68.8        35.0     43.8
I,E       79.0  62.9      66.3        34.6     41.2
E,F       78.4  63.3      67.7        34.6     42.7
I,F       78.0  63.1      68.2        35.0     42.2
One translation feature
D         77.8  63.5      67.6        35.0     40.4
E         78.1  62.1      65.3        33.3     37.1
I         77.7  62.1      66.3        33.8     38.9
F         77.3  62.1      67.6        33.8     39.8
No translation features
none      76.6  60.8      65.2        31.7     34.4
Only translation features
only      80.0  64.1      69.6        34.6     47.3
German
Features  bank  movement  occupation  passage  plant
Baseline  36.7  32.3      39.0        20.3     14.0
All four translation features
I,E,F,N   82.8  57.1      48.3        32.9     45.2
Three translation features
I,E,N     82.5  57.0      47.9        31.2     44.0
E,F,N     82.5  57.2      47.7        32.1     43.9
I,E,F     81.7  55.8      47.5        31.6     42.9
F,I,N     82.6  57.2      48.3        31.6     44.5
Two translation features
E,F       81.6  55.6      45.5        31.2     41.1
I,F       81.6  55.5      46.9        31.2     41.6
F,N       82.3  56.9      47.2        30.4     42.9
I,E       81.6  55.3      46.4        29.5     41.1
E,N       82.2  56.6      46.7        30.0     41.6
I,N       82.2  57.1      48.0        30.0     42.5
One translation feature
F         81.1  54.8      45.5        30.0     39.2
E         81.1  54.7      43.6        28.7     36.6
I         81.3  55.1      45.0        29.5     39.1
N         81.9  56.1      46.7        28.3     40.4
No translation features
none      80.5  53.5      42.1        27.8     34.0
Only translation features
only      73.1  51.1      50.4        32.5     43.8