Eloi Araujo

Ancestral reconstruction is a classic task in comparative genomics. Here, we study the genome median problem, a related computational problem which, given a set of three or more genomes, asks to find a new genome that minimizes the sum of pairwise distances between it and the given genomes. The distance stands for the amount of evolution observed at the genome level, for which we determine the minimum number of rearrangement operations necessary to transform one genome into the other. For almost all rearrangement operations the median problem is NP-hard, with the exception of the breakpoint median that can be constructed efficiently for multichromosomal circular and mixed genomes. In this work, we study the median problem under a restricted rearrangement measure called c4-distance, which is closely related to the breakpoint and the DCJ distance. We identify tight bounds and decomposers of the c4-median and develop algorithms for its construction, one exact ILP-based and three combin...

Publisher: EDP Sciences

Publication Name: RAIRO - Operations Research

Research Interests:
Information Systems, Mathematics, Applied Mathematics, Heuristics, Combinatorics, Genome, and Breakpoint

Publisher: Elsevier BV

Publication Name: Discrete Applied Mathematics

Research Interests:
Mathematics, Applied Mathematics, Edit Distance, Discrete Applied Mathematics, and Similarity Geometry

In computational biology, mapping a sequence s onto a sequence graph G is a significant challenge. One possible approach to addressing this problem is to identify a walk p in G that spells a sequence which is most similar to s. This problem is known as the Graph Sequence Mapping Problem (GSMP). In this paper, we study an alternative problem formulation, namely the De Bruijn Graph Sequence Mapping Problem (BSMP), which can be stated as follows: given a sequence s and a De Bruijn graph G_k (where k ≥ 2), find a walk p in G_k that spells a sequence which is most similar to s according to a distance metric. We present both exact algorithms and approximate distance heuristics for solving this problem, using edit distance as a criterion for measuring similarity.
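
As a rough illustration of the objects involved, the following Python sketch builds a De Bruijn graph of order k from a set of sequences and spells the sequence of a walk. It assumes the common convention that vertices are the k-mers of the input sequences and that an arc joins u to v when the last k-1 symbols of u equal the first k-1 symbols of v. The function names are hypothetical and not taken from the paper, and the sketch does not solve the BSMP itself.

```python
def de_bruijn_graph(sequences, k):
    """Return an adjacency dict {k-mer: [successor k-mers]} (assumed convention)."""
    # Vertices: every substring of length k occurring in some input sequence.
    kmers = {seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)}
    # Arc u -> v when the (k-1)-suffix of u equals the (k-1)-prefix of v.
    return {u: sorted(v for v in kmers if u[1:] == v[:-1]) for u in kmers}


def is_walk(graph, walk):
    """Check that consecutive vertices of `walk` are joined by arcs of `graph`."""
    return all(v in graph.get(u, []) for u, v in zip(walk, walk[1:]))


def spell(walk):
    """Sequence spelled by a non-empty walk: first k-mer plus last symbol of each later one."""
    return walk[0] + "".join(v[-1] for v in walk[1:])


graph = de_bruijn_graph(["ACGT"], 2)
walk = ["AC", "CG", "GT"]
```

A mapping procedure would then compare `spell(walk)` against s, for instance with edit distance, over candidate walks.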

Publisher: Cold Spring Harbor Laboratory

Research Interests:
Heuristics, Combinatorics, Graph, and De Bruijn Graph

Publisher: IEEE

Publication Name: 2022 IEEE 22nd International Conference on Bioinformatics and Bioengineering (BIBE)

Research Interests:
Computer Science, Boolean network, Biological Network, and attractor

In the median problem we are given a set of three or more genomes and want to find a new genome minimizing the sum of pairwise distances between it and the given genomes. For almost all rearrangement operations the median problem is NP-hard. We study the median problem under a restricted rearrangement measure called c4-distance, which is closely related to breakpoint and DCJ distances. We propose two algorithms for its construction, one exact ILP-based and a combinatorial heuristic, and perform experiments on simulated data.

Publisher: Sociedade Brasileira de Computação - SBC

Publication Name: Anais do VII Encontro de Teoria da Computação (ETC 2022)

Research Interests:
Computer Science, Combinatorics, Genome, Heuristic, and Breakpoint

Boolean networks are discrete-time dynamic systems that have been used as a model for a wide range of applications in different areas, especially in Systems Biology. The analysis of Boolean networks includes the search for attractors, which may represent important biological conditions such as gene expression patterns in models of gene regulatory networks, among others. Attractors can be found by exploring the network paths, which amounts to solving an instance of the SAT problem, known to be NP-complete. In this paper, we propose an approach to find all attractors by first transforming the corresponding instance of the SAT problem into a Hitting Set instance in linear time through a new direct linear reduction. The instance of the Hitting Set problem is then solved by applying a fast parallel algorithm implemented on a GPU. As a proof of principle, we tested the method on Boolean networks with 3 and 4 variables, which returned results in about 3 seconds and 9 hours, respectively. For larger networks, the execution time grows substantially because of the algorithm used in the Hitting Set solver. Nevertheless, the results achieved for networks with 3 and 4 variables encourage improvements in the method for dealing with large-scale Boolean networks, especially by incorporating parameter restrictions based on prior information about the structure of the state transition graphs and by optimizing the method by means of dynamic programming and parallelism.
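
To make the notion of an attractor concrete, here is a minimal Python sketch that enumerates all attractors of a very small Boolean network by exhaustive simulation. It is not the SAT to Hitting Set method described in the abstract: it is a brute-force illustration that assumes synchronous updates (all variables change at once), and the function name and the example network are hypothetical.

```python
from itertools import product


def attractors(update_fns):
    """Return the set of attractors (as frozensets of states) of a synchronous network.

    update_fns[i] maps the current state (a tuple of 0/1) to the next value of variable i.
    """
    n = len(update_fns)
    found = set()
    for state in product((0, 1), repeat=n):
        seen = {}  # state -> position along the trajectory
        while state not in seen:
            seen[state] = len(seen)
            state = tuple(f(state) for f in update_fns)
        # `state` is the first repeated state; the cycle starts at its position.
        start = seen[state]
        found.add(frozenset(s for s, pos in seen.items() if pos >= start))
    return found


# Hypothetical two-variable network: x' = y, y' = x.
swap_network = [lambda s: s[1], lambda s: s[0]]
result = attractors(swap_network)
```

The loop visits all 2^n states, so this approach is only usable for a handful of variables, which is the limitation that motivates reductions such as the one studied in the paper.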

Publisher: IEEE

Publication Date: 2019

Publication Name: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE)

Research Interests:
Computer Science, Solver, Boolean network, Biological Network, and attractor

An important problem in Computational Biology is to determine genetic markers, substrings of a set of sequences that do not occur in sequences of other sets. Applications of this problem include finding small specific regions for primer design and finding specific organisms or sequences in metagenomes. Genetic markers can be addressed by the Specific Substring Problem (SSP), which consists of finding all minimal substrings in a given set of sequences with at least k differences from all the substrings in another sequence set. Since solving this problem takes quadratic time when Hamming distance is considered and there is, in general, a large volume of data to be processed, this solution becomes impractical. With this in mind, the main focus of this work is to propose and investigate heuristic and parallel approaches for the SSP, whose effectiveness was verified in experiments with artificial and real data.

Publisher: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE)

Publication Date: 2019

Publication Name: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE)

Research Interests:
Computer Science, Heuristics, Substring, and Hamming Distance

One of the most important concepts in biological network analysis is that of network motifs, which are patterns of interconnections that occur in a given network at a frequency higher than expected in a random network. In this work we are interested in searching and inferring network motifs in a class of biological networks that can be represented by vertex-colored graphs. We show the computational complexity for many problems related to colorful topological motifs and present efficient algorithms for special cases. We also present a probabilistic strategy to detect highly frequent motifs in vertex-colored graphs. Experiments on real data sets show that our algorithms are very competitive both in efficiency and in quality of the solutions.

Publisher: ArXiv

Publication Date: 2020

Publication Name: ArXiv

Research Interests:
Mathematics, Computer Science, Probabilistic Logic, F, G, and arXiv

Sequence alignment supports numerous tasks in bioinformatics, natural language processing, pattern recognition, social sciences, and other fields. While the alignment of two sequences may be performed swiftly in many applications, the simultaneous alignment of multiple sequences has proved to be naturally more intricate. Although most multiple sequence alignment (MSA) formulations are NP-hard, several approaches have been developed, as they can outperform pairwise alignment methods or are necessary for some applications. Taking into account not only similarities but also the lengths of the compared sequences (i.e. normalization) can provide better alignment results than either unnormalized or post-normalized approaches. While some normalized methods have been developed for pairwise sequence alignment, none have been proposed for MSA. This work is a first effort towards the development of normalized methods for MSA. We discuss multiple aspects of normalized multiple sequence alignment (NM...

Publisher: ArXiv

Publication Date: 2021

Publication Name: ArXiv

Research Interests:
Computer Science and arXiv

Publisher: IEEE

Publication Name: 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE)

Publisher: Springer International Publishing

Publication Name: Computational Science and Its Applications – ICCSA 2021

Publisher: Springer Science and Business Media LLC

Publication Name: Journal of Combinatorial Optimization

Research Interests:
Applied Mathematics, Combinatorial Optimization, Pure Mathematics, and Numerical Analysis and Computational Mathematics

Given two sets of sequences A and B, the Substring Specific problem is to find all minimum substrings in A having distance at least k for each subsequence in B. This work addresses three new implementations for the Maaß algorithm when the Hamming distance is considered: a naive cubic-time algorithm and two quadratic-time algorithms. We run tests to compare the running time of these implementations and another recently described algorithm implementation that uses the edit distance. In addition, we conducted preliminary testing on a large Tara Ocean database, looking for efficient and effective strategies for finding unique sequences in a set of sequences comparing with the other

Publisher: Sociedade Brasileira de Computação - SBC

Publication Name: Anais Estendidos do Simpósio Brasileiro de Bioinformática (BSB)

Publisher: Springer Nature

Publication Name: BMC Bioinformatics

Research Interests:
Algorithms, Molecular Evolution, Biological Sciences, Mathematical Sciences, Gene Order, BMC Bioinformatics, Molecular Phylogenetics and Evolution, Genome, Computer User Interface Design, and Internet

Publication Date: 2016

Publication Name: Lecture Notes in Computer Science

Research Interests:
Computational Biology, Biological Sciences, and Mathematical Sciences

Scoring matrices are widely used in sequence comparisons. A scoring matrix γ is indexed by symbols of an alphabet. The entry in γ in row a and column b measures the cost of the edit operation of replacing symbol a by symbol b. For a given scoring matrix and sequences s and t, we consider two kinds of induced scoring functions. The first function, known as weighted edit distance, is defined as the sum of costs of the edit operations required to transform s into t. The second, known as normalized edit distance, is defined as the minimum quotient between the sum of costs of edit operations to transform s into t and the number of the corresponding edit operations. In this work we characterize the class of scoring matrices for which the induced weighted edit distance is actually a metric. We do the same for the normalized edit distance.
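
The two induced functions can be sketched with dynamic programming. The Python code below is an illustration under stated assumptions, not the paper's formulation: the scoring matrix is passed as a function `cost(a, b)`, the symbol "-" is assumed to stand for the empty position in insertions and deletions, and the number of edit operations is assumed to count every alignment column, including replacements of a symbol by itself. All names are hypothetical.

```python
from math import inf


def weighted_edit_distance(s, t, cost):
    """Minimum total cost of edit operations transforming s into t."""
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + cost(s[i - 1], "-")      # deletions only
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + cost("-", t[j - 1])      # insertions only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + cost(s[i - 1], t[j - 1]),  # replacement
                d[i - 1][j] + cost(s[i - 1], "-"),           # deletion
                d[i][j - 1] + cost("-", t[j - 1]),           # insertion
            )
    return d[n][m]


def normalized_edit_distance(s, t, cost):
    """Minimum of (total cost / number of operations) over all edit sequences."""
    n, m = len(s), len(t)
    if n == 0 and m == 0:
        return 0.0
    max_ops = n + m
    # best[k][i][j]: minimum cost of turning s[:i] into t[:j] with exactly k operations.
    best = [[[inf] * (m + 1) for _ in range(n + 1)] for _ in range(max_ops + 1)]
    best[0][0][0] = 0.0
    for k in range(1, max_ops + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                c = inf
                if i and j:
                    c = min(c, best[k - 1][i - 1][j - 1] + cost(s[i - 1], t[j - 1]))
                if i:
                    c = min(c, best[k - 1][i - 1][j] + cost(s[i - 1], "-"))
                if j:
                    c = min(c, best[k - 1][i][j - 1] + cost("-", t[j - 1]))
                best[k][i][j] = c
    return min(best[k][n][m] / k for k in range(1, max_ops + 1) if best[k][n][m] < inf)


def unit_cost(a, b):
    """Example scoring matrix: 0 on the diagonal, 1 elsewhere."""
    return 0.0 if a == b else 1.0
```

Tracking the number of operations as an extra table dimension is the simplest way to minimize a quotient, at the price of a cubic running time.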

Publication Date: 2006

Publication Name: Proceedings of the 7th Latin American Conference on Theoretical Informatics

Research Interests:
Distance, String Matching, Indexation, Edit Distance, Metric, and Score Function

Publication Date: 2015

Publication Name: 2015 IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE)

Publisher: Universidade de Sao Paulo Sistema Integrado de Bibliotecas - SIBiUSP

Publication Date: 2000

Master's dissertation, Instituto de Matemática e Estatística da Universidade de São Paulo, 14/08/98.

Publication Date: 2015

Publication Name: Lecture Notes in Computer Science

Publication Date: 2013

Publication Name: 13th IEEE International Conference on BioInformatics and BioEngineering

Research Interests:
Molecular Biophysics, Hardness, Data Structures, and Polynomials

The mechanism of alternative splicing in the transcriptome may increase the proteome diversity in eukaryotes. In proteomics, several studies aim to use protein sequence repositories to annotate MS experiments or to detect differentially expressed proteins. However, the available protein sequence repositories are not designed to fully detect protein isoforms derived from mRNA splice variants. To foster knowledge for the field, here we introduce SpliceProt, a new protein sequence repository of transcriptome experimental data used to investigate putative splice variants in human proteomes. The current version of SpliceProt contains 159 719 non-redundant putative polypeptide sequences. The potential of SpliceProt for detecting new protein isoforms resulting from alternative splicing was assessed using publicly available proteomics data. We detected 173 peptides hypothetically derived from splice variants, 54 of which are not present in UniprotKB/TrEMBL sequence...

Publication Date: 2014

Publication Name: Proteomics

Research Interests:
Bioinformatics, Proteomics, Biological Sciences, Humans, Computer Simulation, Animals, Alternative splicing, Peptides, Protein isoforms, Amino Acid Sequence, and Medical and Health Sciences

Publisher: EDP Sciences

Publication Date: May 1, 2023

Publication Name: RAIRO - Operations Research

Publisher: Cold Spring Harbor Laboratory

Publication Date: Feb 7, 2023

Publication Name: bioRxiv (Cold Spring Harbor Laboratory)

Research Interests: Heuristics, Combinatorics, Graph, and De Bruijn Graph

Publisher: Elsevier BV

Publication Date: Jun 1, 2023

Publication Name: Discrete Applied Mathematics

Research Interests: Mathematics, Applied Mathematics, Edit Distance, Discrete Applied Mathematics, and Similarity Geometry

Publication Date: Jul 31, 2022

Research Interests: Computer Science, Combinatorics, Genome, Heuristic, and Breakpoint

Publisher: Mary Ann Liebert, Inc.

Publication Date: Jun 1, 2017

Publication Name: Journal of Computational Biology

Publisher: Springer Science+Business Media

Publication Date: 2023

Publication Name: Lecture Notes in Computer Science

Research Interests: Computer Science, Heuristics, Graph, and De Bruijn Graph

Publisher: Springer Science+Business Media

Publication Date: 2023

Publication Name: Lecture Notes in Computer Science

Research Interests: Computer Science

Publisher: Inderscience Publishers

Publication Date: 2016

Publication Name: International Journal of Innovative Computing and Applications

Research Interests: Computer Science, Heuristic, Greedy Algorithm, and Similarity Geometry

Publisher: IEEE

Publication Name: 2022 International Conference on Computational Science and Computational Intelligence (CSCI)
