🌐 [i18n-KO] Translated `multilingual.mdx` to Korean (#23008) · githubhjs/transformers@e28fff1 · GitHub

Commit e28fff1

HanNayeoniee, 0525hhgus, gabrielwithappy, jungnerd, and sim-so authored

🌐 [i18n-KO] Translated multilingual.mdx to Korean (huggingface#23008)

docs: ko: `multilingual.mdx`

Co-authored-by: Hyeonseo Yun <0525_hhgus@naver.com>
Co-authored-by: Gabriel Yang <gabrielwithhappy@gmail.com>
Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
1 parent 9435cc6 commit e28fff1

File tree

2 files changed: +190 -2 lines changed

docs/source/ko/_toctree.yml

Lines changed: 2 additions & 2 deletions

```diff
@@ -42,8 +42,8 @@
 - sections:
   - local: in_translation
     title: (In translation) Use tokenizers from 🤗 Tokenizers
-  - local: in_translation
-    title: (In translation) Inference for multilingual models
+  - local: multilingual
+    title: Inference with multilingual models
   - local: in_translation
     title: (In translation) Text generation strategies
 - sections:
```

docs/source/ko/multilingual.mdx

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Multilingual models for inference[[multilingual-models-for-inference]]

[[open-in-colab]]

There are several kinds of multilingual models in 🤗 Transformers, and their inference usage differs from monolingual models.
Not *all* multilingual model usage is different, though.

Some models, like [bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased), can be used just like a monolingual model.
This guide will show you how to use multilingual models for inference.

## XLM[[xlm]]

XLM has ten different checkpoints, only one of which is monolingual.
The nine remaining checkpoints can be split into two categories: checkpoints that use language embeddings and those that don't.
### XLM with language embeddings[[xlm-with-language-embeddings]]

The following XLM models use language embeddings at inference time:

- `xlm-mlm-ende-1024` (masked language modeling, English-German)
- `xlm-mlm-enfr-1024` (masked language modeling, English-French)
- `xlm-mlm-enro-1024` (masked language modeling, English-Romanian)
- `xlm-mlm-xnli15-1024` (masked language modeling, XNLI's 15 languages)
- `xlm-mlm-tlm-xnli15-1024` (masked language modeling + translation, XNLI's 15 languages)
- `xlm-clm-enfr-1024` (causal language modeling, English-French)
- `xlm-clm-ende-1024` (causal language modeling, English-German)

Language embeddings are represented as a tensor of the same shape as the `input_ids` passed to the model.
The values in these tensors depend on the language used, and are identified by the tokenizer's `lang2id` and `id2lang` attributes.

In this example, load the `xlm-clm-enfr-1024` checkpoint (causal language modeling, English-French):
```py
>>> import torch
>>> from transformers import XLMTokenizer, XLMWithLMHeadModel

>>> tokenizer = XLMTokenizer.from_pretrained("xlm-clm-enfr-1024")
>>> model = XLMWithLMHeadModel.from_pretrained("xlm-clm-enfr-1024")
```
The `lang2id` attribute of the tokenizer displays this model's languages and their ids:

```py
>>> print(tokenizer.lang2id)
{'en': 0, 'fr': 1}
```
Next, create an example input:

```py
>>> input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")])  # batch size of 1
```
Set the language id to `"en"` and use it to define the language embedding.
The language embedding is a tensor filled with `0` since that is the language id for English.
This tensor should be the same size as `input_ids`.

```py
>>> language_id = tokenizer.lang2id["en"]  # 0
>>> langs = torch.tensor([language_id] * input_ids.shape[1])  # torch.tensor([0, 0, 0, ..., 0])

>>> # Reshape it to a tensor of shape (batch_size, sequence_length)
>>> langs = langs.view(1, -1)  # is now of shape [1, sequence_length] (we have a batch size of 1)
```
Now pass the `input_ids` and the language embedding to the model:

```py
>>> outputs = model(input_ids, langs=langs)
```
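If you want a quick look at what the model predicts, here is a minimal sketch (not part of the original guide; it assumes the standard `logits` field on the model output) that greedily decodes the single most likely next token:

```py
>>> # The LM head returns logits of shape (batch_size, sequence_length, vocab_size);
>>> # take the scores for the last position and pick the highest-scoring token.
>>> next_token_logits = outputs.logits[0, -1]
>>> next_token_id = int(next_token_logits.argmax())
>>> print(tokenizer.decode([next_token_id]))
```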
The [run_generation.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-generation/run_generation.py) script can generate text with language embeddings using the `xlm-clm` checkpoints.
### XLM without language embeddings[[xlm-without-language-embeddings]]

The following XLM models do not require language embeddings during inference:

- `xlm-mlm-17-1280` (masked language modeling, 17 languages)
- `xlm-mlm-100-1280` (masked language modeling, 100 languages)

Unlike the previous XLM checkpoints, these models are used for generic sentence representations.
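The guide does not prescribe how to extract those representations; as a minimal sketch (mean-pooling the last hidden state is an assumption here, one common choice among several), you could do:

```py
>>> from transformers import AutoTokenizer, AutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-17-1280")
>>> model = AutoModel.from_pretrained("xlm-mlm-17-1280")

>>> inputs = tokenizer("Wikipedia was used to", return_tensors="pt")
>>> hidden = model(**inputs).last_hidden_state  # (batch_size, sequence_length, hidden_size)
>>> sentence_embedding = hidden.mean(dim=1)  # mean-pool over the token dimension
```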
## BERT[[bert]]

The following BERT models can be used for multilingual tasks:

- `bert-base-multilingual-uncased` (masked language modeling + next sentence prediction, 102 languages)
- `bert-base-multilingual-cased` (masked language modeling + next sentence prediction, 104 languages)

These models do not require language embeddings during inference.
They identify the language from the context and infer accordingly.
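To illustrate (a hypothetical usage sketch, not part of the original guide), you can feed masked text in different languages to the same checkpoint without any language hint:

```py
>>> from transformers import pipeline

>>> fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")
>>> fill_mask("Paris is the [MASK] of France.")  # English input
>>> fill_mask("Paris est la [MASK] de la France.")  # French input, same model
```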
## XLM-RoBERTa[[xlmroberta]]

The following XLM-RoBERTa models can also be used for multilingual tasks:

- `xlm-roberta-base` (masked language modeling, 100 languages)
- `xlm-roberta-large` (masked language modeling, 100 languages)

XLM-RoBERTa was trained on 2.5TB of newly created and cleaned CommonCrawl data in 100 languages.
It provides strong gains over previously released multilingual models like mBERT or XLM on downstream tasks like classification, sequence labeling, and question answering.
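Usage mirrors the BERT sketch above (again an assumed example, not from the original guide); note that XLM-RoBERTa's mask token is `<mask>` rather than `[MASK]`:

```py
>>> from transformers import pipeline

>>> fill_mask = pipeline("fill-mask", model="xlm-roberta-base")
>>> fill_mask("Berlin ist die <mask> von Deutschland.")  # German input, no language id needed
```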
## M2M100[[m2m100]]

The following M2M100 models can also be used for multilingual translation:

- `facebook/m2m100_418M` (translation)
- `facebook/m2m100_1.2B` (translation)

In this example, load the `facebook/m2m100_418M` checkpoint to translate from Chinese to English.
You can set the source language in the tokenizer:
```py
>>> from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

>>> en_text = "Do not meddle in the affairs of wizards, for they are subtle and quick to anger."
>>> chinese_text = "不要插手巫師的事務, 因為他們是微妙的, 很快就會發怒."

>>> tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M", src_lang="zh")
>>> model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
```
Tokenize the text:

```py
>>> encoded_zh = tokenizer(chinese_text, return_tensors="pt")
```
M2M100 forces the target language id as the first generated token in order to translate to the target language.
Set `forced_bos_token_id` to `en` in the `generate` method to translate to English:

```py
>>> generated_tokens = model.generate(**encoded_zh, forced_bos_token_id=tokenizer.get_lang_id("en"))
>>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
'Do not interfere with the matters of the witches, because they are delicate and will soon be angry.'
```
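The same checkpoint can translate in the opposite direction. As a minimal sketch (an assumed usage pattern; the guide itself only shows Chinese to English), switch the tokenizer's source language and force the Chinese language id instead:

```py
>>> tokenizer.src_lang = "en"  # reuse the same tokenizer with a new source language
>>> encoded_en = tokenizer(en_text, return_tensors="pt")
>>> generated_tokens = model.generate(**encoded_en, forced_bos_token_id=tokenizer.get_lang_id("zh"))
>>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
```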
## MBart[[mbart]]

The following MBart models can also be used for multilingual translation:

- `facebook/mbart-large-50-one-to-many-mmt` (one-to-many multilingual machine translation, 50 languages)
- `facebook/mbart-large-50-many-to-many-mmt` (many-to-many multilingual machine translation, 50 languages)
- `facebook/mbart-large-50-many-to-one-mmt` (many-to-one multilingual machine translation, 50 languages)
- `facebook/mbart-large-50` (multilingual translation, 50 languages)
- `facebook/mbart-large-cc25`

In this example, load the `facebook/mbart-large-50-many-to-many-mmt` checkpoint to translate from Finnish to English.
You can set the source language in the tokenizer:
```py
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

>>> en_text = "Do not meddle in the affairs of wizards, for they are subtle and quick to anger."
>>> fi_text = "Älä sekaannu velhojen asioihin, sillä ne ovat hienovaraisia ja nopeasti vihaisia."

>>> tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50-many-to-many-mmt", src_lang="fi_FI")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
```
Tokenize the text:

```py
>>> encoded_fi = tokenizer(fi_text, return_tensors="pt")  # encode the Finnish source text
```
MBart forces the target language id as the first generated token in order to translate to the target language.
Set `forced_bos_token_id` to `en_XX` in the `generate` method to translate to English:

```py
>>> generated_tokens = model.generate(**encoded_fi, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
>>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
"Don't interfere with the wizard's affairs, because they are subtle, will soon get angry."
```
If you are using the `facebook/mbart-large-50-many-to-one-mmt` checkpoint, you don't need to force the target language id as the first generated token; otherwise, the usage is the same, as in the sketch below.
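For instance, a minimal sketch under that assumption (reusing `fi_text` from above; the many-to-one checkpoint always translates into English on its own):

```py
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50-many-to-one-mmt", src_lang="fi_FI")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-50-many-to-one-mmt")

>>> encoded_fi = tokenizer(fi_text, return_tensors="pt")
>>> generated_tokens = model.generate(**encoded_fi)  # no forced_bos_token_id needed
>>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
```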
