Fabio Kepler

Universidade Federal do Pampa, Computer Science, Faculty Member

Papers

IST-Unbabel 2021 Submission for the Quality Estimation Shared Task

We present the joint contribution of IST and Unbabel to the WMT 2021 Shared Task on Quality Estimation. Our team participated in two tasks, Direct Assessment and Post-Editing Effort, encompassing a total of 35 submissions. For all submissions, our efforts focused on training multilingual models on top of the OpenKiwi predictor-estimator architecture, using pre-trained multilingual encoders combined with adapters. We further experiment with uncertainty-related objectives and features, as well as training on out-of-domain direct assessment data.

Proceedings of the XIV Brazilian Symposium on Information Systems - SBSI'18, 2018

Software size estimation may be costly due to the time spent on estimation, even when using established approaches such as Functional Size Measurement methods. The objective of this research is to reduce estimation cost in the very early stages of the software development life cycle, based on user stories and the COSMIC method. We propose automated size estimation in order to reduce estimation cost and increase accuracy. We conducted a quasi-experiment in which the control group is composed of certified professionals and the experimental group is the proposed automated estimation tool. Results show promising evidence of success in terms of estimation precision. Based on these preliminary results, we conclude that user story size automation is valuable and may be more objective and precise than manual estimation.

Fusion of Simple Models for Native Language Identification

Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 2017

IST-Unbabel Participation in the WMT20 Quality Estimation Shared Task

We present the joint contribution of IST and Unbabel to the WMT 2020 Shared Task on Quality Estimation. Our team participated in all tracks (Direct Assessment, Post-Editing Effort, Document-Level), encompassing a total of 14 submissions. Our submitted systems were developed by extending the OpenKiwi framework to a transformer-based predictor-estimator architecture, and to cope with glass-box, uncertainty-based features coming from neural machine translation systems.

A Web Tool for Building Parallel Corpora of Spoken and Sign Languages

In this paper we describe our work in building an online tool for manually annotating texts in any spoken language with SignWriting in any sign language. The existence of such a tool will allow the creation of parallel corpora between spoken and sign languages that can be used to bootstrap the creation of efficient tools for the Deaf community. As an example, a parallel corpus between English and American Sign Language could be used for training Machine Learning models for automatic translation between the two languages. Clearly, this kind of tool must be designed so that it eases the task of human annotators, not only by being easy to use, but also by giving smart suggestions as the annotation progresses, in order to save time and effort. By building a collaborative, online, easy-to-use annotation tool for building parallel corpora between spoken and sign languages we aim at helping the development of proper resources for sign languages that can then be used in state-of-the-art...

Uma Análise Da Interdisciplinaridade De Pesquisadores Baseada Em Indicadores Qualis/Capes

Interdisciplinarity is a concept increasingly present in universities and of great importance to academia. This work proposes to quantify the interdisciplinarity of researchers based on Qualis/CAPES stratum information. To this end, the Lattes database was used to identify each researcher's publications, as well as the Qualis rating and knowledge area of each publication. An interdisciplinarity coefficient was also created to allow a more precise evaluation of these data. As this is a work in progress, the partial result obtained is the interdisciplinarity coefficient. The creation of this measure will help identify patterns among researchers and their interdisciplinary variation.

Cosmic User Story Standard

Lecture Notes in Business Information Processing, 2018

Uso de Aprendizado de Máquina para a Classificação de Documentos do Exército Brasileiro

Anais do IX Simpósio Brasileiro de Sistemas de Informação (SBSI 2013), 2013

Every semester the Brazilian Army generates summarized reports about each service member and their activities. This requires finding references relevant to each service member within a set of documents produced periodically over a six-month interval. This work proposes ways to perform this classification automatically, using the Naive Bayes probabilistic learning method. To do so, it is also necessary to identify which sentences in a document relate to each service member, so that only those sentences are used when training the classifier. Accordingly, this work proposes two sentence-selection heuristics that pick text spans appearing near each service member's name. The experiments show that it is possible to reach an F-measure of 76.7% in retrieving relevant documents, and that sentence selection and the size of the training set play important roles in the task.

Unbabel's Participation in the WMT17 Translation Quality Estimation Shared Task

Proceedings of the Second Conference on Machine Translation, 2017

Urnas Eletrônicas no Brasil: linha do tempo, evolução e falhas e desafios de segurança

Revista Brasileira de Computação Aplicada, 2019

Even after years of deployment and evolution of electronic voting, electronic voting machines remain a growing target of criticism, both from information security experts and from society. The main challenges in using this kind of machine are guaranteeing the transparency, auditability, and reliability of the voting system while also guaranteeing the integrity, confidentiality, and privacy of the votes. In the Brazilian system, however, the main points of criticism are precisely the limited transparency and restricted auditability of the machines; in the few cases where these were granted to civil society outside election periods, security flaws were discovered. It is no surprise, therefore, that this, combined with the current impossibility of auditing election results, calls the reliability of the system into question. In this survey, we present and analyze the evolution of voting systems...

OpenKiwi: An Open Source Framework for Quality Estimation

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Unbabel’s Participation in the WMT19 Translation Quality Estimation Shared Task

Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

Pushing the Limits of Translation Quality Estimation

Transactions of the Association for Computational Linguistics

Translation quality estimation is a task of growing importance in NLP, due to its potential to reduce post-editing human effort in disruptive ways. However, this potential is currently limited by the relatively low accuracy of existing systems. In this paper, we achieve remarkable improvements by exploiting synergies between the related tasks of word-level quality estimation and automatic post-editing. First, we stack a new, carefully engineered, neural model into a rich feature-based word-level quality estimation system. Then, we use the output of an automatic post-editing system as an extra feature, obtaining striking results on WMT16: a word-level F1-MULT score of 57.47% (an absolute gain of +7.95% over the current state of the art), and a Pearson correlation score of 65.56% for sentence-level HTER prediction (an absolute gain of +13.36%).

Unbabel's Participation in the WMT16 Word-Level Translation Quality Estimation Shared Task

Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, 2016

Sentiment Analysis for Brazilian Portuguese over a Skewed Class Corpora

Lecture Notes in Computer Science, 2016

Um etiquetador morfo-sintático baseado em cadeias de Markov de tamanho variável

ABSTRACT. Given a text, we want to assign a morphosyntactic category to each word in context. To this end, we implemented a part-of-speech tagger that uses Markov chains of fixed order two, and we show how it works and some of the results obtained. We present the most common problems and challenges to be solved, and then propose the implementation of a part-of-speech tagger using a new approach, based on variable-length Markov chains. We show the theory ...

Planejamento, Objetivos e Definição de Padrões na Descrição dos Metadados da Coleção da Biblioteca Brasiliana Digital

Seminario Nacional De Bibliotecas Universitarias Xvi Seminario Internacional De Bibliotecas Digitais Brasil Ii, Oct 1, 2010

Um Novo Sistema de Gerenciamento de Aprendizagem

Anais Do Salao Internacional De Ensino Pesquisa E Extensao, Nov 1, 2013

Using PDCA as a General Framework for Teaching and Evaluating the Learning of Software Engineering Disciplines

Isys Revista Brasileira De Sistemas De Informacao, Nov 15, 2014

Classifying Ontologies
