While Natural Language Processing (NLP) models have gained substantial attention, only in recent years has research opened new paths for tackling Human-Computer Design (HCD) from the perspective of natural language. We focus on developing a human-centered corpus, more specifically, a persona-based corpus in a particular healthcare domain (diabetes mellitus self-care). To follow an HCD approach, we created personas to model interpersonal interaction (expert and non-expert users) in that specific domain. We show that an HCD approach benefits language generation from different perspectives, from machines to humans, contributing new directions for low-resource contexts (languages other than English and sensitive domains) where promoting effective communication is essential.
ABSTRACT
Dissertation (master's) - Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Programa de Pós-Graduação em Estudos de Tradução, Florianópolis, 2016. The main objective of this work is to evaluate a corpus-based translation system, called COPA-TRAD, from the perspective of the user (researcher, translator, or student in the field of translation), considering characteristics of usability and cognitive ergonomics. The intention is to understand how users interact with the software under investigation, given the growing complexity and diversity of corpus-based translation technologies and the little attention paid to recommendations from the field of human-computer interaction (HCI). The research was divided into distinct stages: first, informal conversations with the research participants, followed by the application of a usability questionnaire. In addition, a heuristic evaluation was conducted, as well as an ergonomic inspection by means of lists of v...
The continuous development of translation technologies has fundamentally changed the way users in this area interact with computers. The need to understand and measure how a vast number of resources and software applications can impact users and other stakeholders has led to recommendations related to human-computer interaction (HCI), presented as guidelines and best practices. Unfortunately, when translation tools are developed, limited attention is still paid to usability and ergonomics, be it during the design, implementation, or deployment phases. Meanwhile, corpus-based translation tools have grown in complexity and diversity, yet this evolution does not take HCI recommendations into consideration. The goal of this study is to bridge the gap between corpus-based tools, ergonomics, and usability by presenting the results of a user-oriented methodology. With this in mind, a corpus analysis software, called COPA-TRAD, was used as the basis for applying some existing methods from the usability and ergonomics area. The proposed study was composed of three main stages: (i) a usability questionnaire, administered to participants from this knowledge area; (ii) a heuristic analysis, performed by five usability experts; and (iii) an ergonomics checklist inspection, to analyze general elements. The results indicated that, despite the concern with providing a "user-friendly" interface, the analyzed system had not made use of known usability and ergonomics methods, only the guidelines of the third-party software used as part of COPA-TRAD. The study points out directions in which a corpus-based tool can be adapted to user needs and further indicates some important criteria that require improvement. After the necessary changes are applied, a complementary analysis needs to be carried out to verify whether the identified issues were accurately addressed.
We believe translation technology should be concerned with building adequate interfaces, allowing humans to interact effectively with tools and data and facilitating the process of retrieving information.
Data-to-text Natural Language Generation (NLG) is the computational process of generating natural language in the form of text or voice from non-linguistic data. A core micro-planning task within NLG is referring expression generation (REG), which aims to automatically generate noun phrases to refer to entities mentioned as discourse unfolds. A limitation of novel REG models is not being able to generate referring expressions to entities not encountered during the training process. To solve this problem, we propose two extensions to NeuralREG, a state-of-the-art encoder-decoder REG model. The first is a copy mechanism, whereas the second consists of representing the gender and type of the referent as inputs to the model. Drawing on the results of automatic and human evaluation as well as an ablation study using the WebNLG corpus, we contend that our proposal contributes to the generation of more meaningful referring expressions to unseen entities than the original system and related...
This demo paper introduces DaMata, a robot-journalist covering deforestation in the Brazilian Amazon. The robot-journalist is based on a pipeline architecture of Natural Language Generation, which yields multilingual daily and monthly reports based on the public data provided by DETER, a real-time deforestation satellite monitor developed and maintained by the Brazilian National Institute for Space Research (INPE). DaMata automatically generates reports in Brazilian Portuguese and English and publishes them on the Twitter platform. Corpus and code are publicly available.
This paper reports on a study that analyses the impact of two different machine translation (MT) outputs on the cognitive effort required to post-edit machine-translated metaphors by means of eye tracking and think-aloud protocols. We hypothesise that the statistical MT output would have a positive effect on reducing cognitive effort. In order to test this hypothesis, a post-editing experiment was conducted with two different groups of participants. Each experimental group had two post-editing tasks using the language pair English into Brazilian Portuguese. On Task 1 (T1), participants were asked to post-edit a Google machine-translated output, whereas on Task 2 (T2) the same participants were assigned to post-edit a Systran machine-translated output. Data collection was conducted under the experimental paradigm of data triangulation in translation process research. Data analysis focuses on eye tracking data related to fixation duration and pupil dilation as well as think-aloud protocols. This analysis shows that the cognitive effort required to post-edit the pure statistical MT output might be lower in comparison to the hybrid output when conventional metaphors are machine translated.
Com o advento da internet e dos constantes avanços tecnológicos, os corpora se tornaram essenciais no crescimento dos Estudos da Tradução Baseados em Corpus (ETBC), assim como no desenvolvimento de sistemas de informação e técnicas que fazem uso destes. Este artigo apresenta uma breve revisão de sistemas web baseados em corpus no par linguístico inglês-português, a partir de uma perspectiva de aplicação ao ensino, à pesquisa e à prática tradutória. Para tanto, buscamos proporcionar uma significação no âmbito tecnológico por meio de (i) uma breve contextualização teórica sobre o uso de corpora, (ii) as suas principais características e (iii) as aplicações mais conhecidas. Posteriormente, apresenta-se uma síntese das ferramentas web gratuitas: COMPARA (2000), CorTrad (2009), COPA-TRAD (2011), OPUS-CORPUS (2012) e VVV (2013). Em seguida, elencamos os usos e benefícios mais comuns de sistemas de compilação, análise, classificação e exploração de corpora. Por fim, a análise revela o momento vivenciado nos ETBC por meio de um resumo do aparato tecnológico existente na área. Desta maneira, almejamos que a presente discussão venha a proporcionar o desenvolvimento de pesquisas relacionadas aos sistemas baseados em corpus, haja vista a constante evolução tecnológica e a variedade de aplicações que podem se beneficiar do uso de corpora, seja no contexto prático ou profissional.

Abstract: With the advent of the Internet and continuous technological advances, corpora have become essential to the growth of Corpus-Based Translation Studies (CTS), as well as to the development of information systems and techniques that make use of them. This paper presents a brief review of corpus-based web systems for the English-Portuguese language pair, from the perspective of their application to translation teaching, research, and practice. To this end, we aim to situate the topic technologically through (i) a brief theoretical contextualization of the use of corpora, (ii) their key features, and (iii) their best-known applications. Afterwards, a summary of the free web-based tools is presented: COMPARA (2000), CorTrad (2009), COPA-TRAD (2011), OPUS-CORPUS (2012) and VVV (2013). Next, we list the most common uses and benefits of systems for compiling, analyzing, classifying, and exploring corpora. Finally, the analysis reveals the current state of CTS through a synthesis of the technological apparatus available in the area. In this way, we aim to encourage research on corpus-based systems, given the constant technological evolution and the variety of applications that can benefit from the use of corpora, whether in a practical or a professional context.
O objetivo deste trabalho é apresentar uma análise comparativa, centrada no usuário, que foi realizada em duas ferramentas de apoio à tradução disponíveis on-line: Google Translator Toolkit e Wordfast Anywhere. Como principal método de investigação, temos a avaliação exploratória e aplicação de lista de verificação de usabilidade e ergonomia cognitiva. Os resultados iniciais mostram que o emprego de métodos de avaliação deste tipo pode ter baixo custo e ser de fácil acesso, necessitando-se apenas de uma disseminação maior sobre como aplicá-los às ferramentas de tradução.

Abstract: The aim of this paper is to present a user-centered comparative analysis of two CAT tools available online: Google Translator Toolkit and Wordfast Anywhere. As the central research method, we applied an exploratory evaluation and a usability and cognitive ergonomics checklist. Initial results show that evaluation methods of this kind can be low-cost and easily accessible, requiring only wider dissemination of how to apply them to translation tools.
Translation is a profession highly connected to technology, and for this reason, most of today's translators are in contact with a variety of tools, services, and programs, such as word processors, e-mail, and electronic dictionaries, among others. In this paper, we argue that while translation and technology have a strong relationship, there is little research in the Corpus-based Translation Studies area on analyzing and evaluating translation software. The corpus-based information system called COPA-TRAD is analyzed in terms of ergonomics and software usability, so that those involved in the area can have access to a more familiar system that can be used for translation research, teaching, and practice. Given the nature of this still little-explored subject, we intend to provide the Translation Studies area, and more specifically those familiar with Corpus-based Translation Studies, with features/characteristics that can lead to further studies on the subject, resulting in possible improvements to and/or development of translation-based tools.