C. Cobos

    • PhD in Computer Science from Universidad Nacional de Colombia (Bogotá, Colombia), 2014. Master in Computer Science...
    This research proposes a macro-level model for detecting various probable crimes or anomalies related to corruption in public procurement processes in Colombia. The proposed model consists of five main components: 1) specialized services aimed at identifying specific situations of probable corruption (three services were proposed: detection of similarity between documents, detection of bid rigging, and detection of cartels); 2) cross-cutting services that support turning the model into a software tool; 3) additional services that address general situations of probable corruption, specifically the citizen alert service; 4) explicit relationships between the services; and 5) the global output of the model. In the practical experimentation, two of the services proposed in this research were tested in a variety of scenarios. With the results produced by some of the ...
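    The abstract does not give implementation details for the document-similarity service. Purely as a hedged illustration of that kind of check (not the paper's actual method), the following Python sketch flags pairs of bid documents whose TF-IDF cosine similarity exceeds a threshold; the corpus, threshold and function name are invented for the example.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        def suspicious_pairs(documents, threshold=0.8):
            """Return index pairs of bid documents that are nearly identical."""
            tfidf = TfidfVectorizer().fit_transform(documents)  # documents x terms
            sim = cosine_similarity(tfidf)                      # pairwise cosine matrix
            return [(i, j, float(sim[i, j]))
                    for i in range(len(documents))
                    for j in range(i + 1, len(documents))
                    if sim[i, j] >= threshold]

        # Two almost identical offers from supposedly independent bidders
        offers = [
            "Suministro de equipos de computo para la alcaldia, garantia de 12 meses",
            "Suministro de equipos de computo para la alcaldia, con garantia de 12 meses",
            "Construccion de un puente peatonal en la zona rural del municipio",
        ]
        print(suspicious_pairs(offers))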
    This article presents CMIN, an integrated CASE (Computer Aided Software Engineering) tool (one that supports every phase of a process) based on CRISP-DM 1.0 (Cross-Industry Standard Process for Data Mining) to support the development of data mining projects. First, the general functionality of CMIN is presented, covering the management of processes, templates and projects, and highlighting CMIN's ability to track projects in an easy, intuitive way and the way CMIN lets users deepen their knowledge of CRISP-DM, or of any other process defined in the tool, through the help and information offered at each step of the process. The article then details how CMIN can bind new data mining algorithms at run time (without recompiling the tool) to support the modeling work (based on a workflow) in a data mining ...
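    The abstract does not describe the run-time binding mechanism at code level. As an assumed sketch of the general idea (loading a mining algorithm as a plug-in without recompiling the host tool), a Python version could look like this; the module and class names are hypothetical.

        import importlib

        def load_algorithm(module_name, class_name):
            """Dynamically load a data mining algorithm shipped as a plug-in module."""
            module = importlib.import_module(module_name)  # e.g. "plugins.kmeans_plugin"
            algorithm_cls = getattr(module, class_name)    # e.g. "KMeansAlgorithm"
            return algorithm_cls()                         # instantiated without recompiling the host

        # Hypothetical usage inside a workflow step:
        # algo = load_algorithm("plugins.kmeans_plugin", "KMeansAlgorithm")
        # model = algo.fit(training_data)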
    In this paper, we present a web service model to support snapshot replication across multiple database engines. The model is open and is composed of a set of web services, agents, and a management console that allows administrative tasks to be performed independently of the application and the database engines. The model uses a replication master catalog (CMR) to store the replication data. We show the architecture and behavior of our replication model to provide a better understanding of data replication, and we also discuss several related papers on replication services that are independent of the database engine.
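    The structure of the replication master catalog is not defined in the abstract. Purely as an assumed illustration, a minimal catalog recording which tables are replicated to which targets, and when the last snapshot was applied, might look like this (the table and column names are hypothetical):

        import sqlite3
        from datetime import datetime, timezone

        con = sqlite3.connect("cmr.db")
        con.execute("""
            CREATE TABLE IF NOT EXISTS replication_catalog (
                source_table  TEXT NOT NULL,  -- table being replicated
                target_engine TEXT NOT NULL,  -- e.g. 'postgres', 'mysql', 'oracle'
                target_dsn    TEXT NOT NULL,  -- connection string of the replica
                last_snapshot TEXT            -- timestamp of the last snapshot applied
            )
        """)

        def register_snapshot(source_table, target_engine, target_dsn):
            """Record that a snapshot of source_table was pushed to a target engine."""
            con.execute(
                "INSERT INTO replication_catalog VALUES (?, ?, ?, ?)",
                (source_table, target_engine, target_dsn,
                 datetime.now(timezone.utc).isoformat()),
            )
            con.commit()

        register_snapshot("customers", "postgres", "host=replica1 dbname=sales")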
    In recent decades, the number of violent homicides in Latin America has grown considerably due to the expansion and rise of organized criminal groups in rural and urban areas of the main cities of countries such as Mexico, Colombia and Venezuela. Given the high homicide rates of these countries, a consequence of high criminality, they have been ranked among the most violent in the world. According to data reported by the Observatorio del Delito of the Policía Nacional and the Fiscalía General de la Nación of Colombia, 1,032 murders occurred in Bogotá in 2019, a rate of 14.3 homicides per 100,000 inhabitants. On this basis, it is estimated that around 226,215 homicides occurred between 1960 and 2019, some 3,834 deaths per year on average. This work presents a machine learning model based on random forest that predicts violent homicide (HV) trends in Colombia for ...
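    The abstract names random forest as the underlying technique but does not give the feature set or library. As a hedged sketch of the general approach (an autoregressive random forest fitted on yearly counts and used for a one-step-ahead forecast), with made-up data:

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        # Illustrative yearly homicide counts (not real data)
        counts = np.array([4200, 4100, 4050, 3900, 3850, 3700, 3600, 3550,
                           3400, 3300, 3250, 3100, 3000, 2950, 2800, 2700])

        # Autoregressive design: predict year t from the previous 3 years
        lags = 3
        X = np.array([counts[i:i + lags] for i in range(len(counts) - lags)])
        y = counts[lags:]

        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(X, y)

        # One-step-ahead forecast from the last observed window
        next_year = model.predict(counts[-lags:].reshape(1, -1))
        print(round(float(next_year[0])))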
    Very small software development companies, or Very Small Entities (VSEs), with fewer than 25 employees, are forced to guarantee process quality in order to reduce rework and increase their profits. This means complying with good practices recommended in various international standards that offer guidelines for improving software processes and products. However, the requirements of these standards are hard for VSEs to meet, given their capacity and the implementation costs. This paper describes the process of implementing the ISO 29110 standard for four companies in the specific context of the Valle del Cauca region of Colombia, presenting the good practices, tools and techniques used; the results are promising.
    Despite the continuous development of modern Web search engines, they have not fully met user needs, and the relevance of retrieved documents remains one of the main issues affecting search quality. The proposed web meta-search model integrates item-based collaborative filtering for Web search with Massimo Melucci's proposal, which is based on projectors onto planes derived from the user's context information. The resulting model was implemented in a meta-search site that retrieves documents from traditional search engines such as Google and Bing and presents them to the user as a list of documents sorted by relevance, based on information from the user's context and the collaborative feedback of the community. The proposed model constitutes a contribution to the field of information retrieval, since it shows promising results in tests on both closed and open collections.
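    The abstract does not detail how the collaborative and contextual signals are combined. As an assumed illustration of item-based collaborative filtering used to re-rank results returned by the underlying engines (the blending weights and the click matrix are invented):

        import numpy as np

        # Illustrative click matrix: rows = community users, cols = retrieved documents
        clicks = np.array([
            [1, 0, 1, 0],
            [1, 1, 0, 0],
            [0, 1, 1, 1],
        ], dtype=float)

        # Item-based collaborative filtering: cosine similarity between document columns
        norms = np.linalg.norm(clicks, axis=0) + 1e-12
        item_sim = (clicks.T @ clicks) / np.outer(norms, norms)

        def rerank(base_scores, clicked_by_user):
            """Blend the engines' relevance scores with a collaborative score."""
            collab = item_sim[:, clicked_by_user].mean(axis=1)  # similarity to docs the user liked
            return 0.7 * np.asarray(base_scores) + 0.3 * collab

        base = [0.9, 0.8, 0.5, 0.4]               # scores from the underlying search engines
        print(rerank(base, clicked_by_user=[0]))  # documents similar to doc 0 move up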
    Some Latin American countries are developing programs that allow their educational communities to have greater access to Information and Communication Technology (ICT). To verify the effectiveness of these programs, it is necessary to measure the ICT skills acquired by the members of these communities. This paper presents a dimensional model for a data warehouse that enables the analysis of information related to the appropriation of ICT skills by teachers and students of educational institutions. The design cases are based on Ralph Kimball's proposals: many-to-many relationships, heterogeneous products, subdimensions, and organizational hierarchy. There is also a new design case called "Dimension with measures", which, together with a fact table and the calculation of measures based on MDX functions, allows a weighted analysis of the competences of the actors in the educational institutions. This new design case can be used in other contexts that require recursive calculations of weighted ...
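    The weighted analysis in the paper is computed with MDX measures inside the cube. As a rough, assumed illustration of the kind of roll-up the "Dimension with measures" case enables, here is a pandas equivalent with invented institutions, competences and weights:

        import pandas as pd

        # Illustrative fact rows: one score per actor, competence and institution
        facts = pd.DataFrame({
            "institution": ["A", "A", "A", "B", "B"],
            "competence":  ["technological", "pedagogical", "communicative",
                            "technological", "pedagogical"],
            "score":       [3.5, 4.0, 2.5, 4.5, 3.0],
        })

        # "Dimension with measures": each competence carries its own weight
        weights = pd.DataFrame({
            "competence": ["technological", "pedagogical", "communicative"],
            "weight":     [0.5, 0.3, 0.2],
        })

        merged = facts.merge(weights, on="competence")
        merged["weighted"] = merged["score"] * merged["weight"]

        # Weighted score per institution (what an MDX calculated measure would return)
        grouped = merged.groupby("institution")[["weighted", "weight"]].sum()
        print(grouped["weighted"] / grouped["weight"])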
    During requirements elicitation, problems may arise in communication between analysts and stakeholders; such misunderstandings may cause the final product to fall short of customer expectations. This proposal aims to improve understanding between stakeholders and the software development team during the requirements phase of the XP (eXtreme Programming) methodology. To do so, our approach replaces user stories with business process (BP) models. To analyze the effectiveness of the approach, user stories and BP models were used during the requirements phase of eleven projects, and the quantity and quality of the data collected were compared. The experiments show that using BPMN to elicit requirements, compared with the user stories of the XP methodology, improves the quality and quantity of the information collected and helps users state their needs and business goals to analysts more clearly.
    This paper presents the data model of two multidimensional data marts that are part of a Decision Support System in the genomics area, based on data warehousing and OLAP technologies. The first data mart, "Information units analysis", stores and retrieves information about the information units (exons or introns) in a gene structure, their order, and the start and end position of each unit. The second data mart, "Fractal analysis", stores and retrieves information about genes, for example the number of information units, the length of the gene, and measurements obtained in previous fractal analysis research. Finally, the paper presents the problems encountered during the extraction, transformation and loading process and during data modeling, together with the proposed solutions and some interfaces of the developed tool.
    Composite convergent services integrate a set of functionalities from the Web and Telecommunication domains. Because of the large number of available functionalities, automation of the composition process is required in many fields. However, automated composition is not feasible in practice if reconfiguration mechanisms are not considered. This paper presents a novel approach for dynamic reconfiguration of convergent services that replaces malfunctioning regions of composite convergent services while considering user preferences. To replace the regions of services, a multimodal search is performed. Our contributions are a model for representing composite convergent services and a region-based algorithm for reconfiguring services supported by multimodal search.
    The Internet has become the largest repository of human knowledge, and the amount of stored information grows every day. This is reflected in the low precision that Web search systems report with respect to the documents retrieved for the user. One of the strategies used to face this problem is personalized resource retrieval. Several current projects propose semantic methods for increasing the relevance of searches through the use of ontologies, natural language processing, knowledge-based systems, query specification languages and user profiles, among others. The results are generally better than those obtained by search engines that do not use these techniques. However, the price paid for these improvements in precision is the use of algorithms that are more complex to implement and that consume more computational resources. This article describes a ...
    The Internet has become the largest repository of human knowledge, and the amount of stored information increases day by day. This increase affects the precision reported by Web search engines regarding the documents retrieved for the user. One strategy used to address this problem is personalized resource retrieval. Several projects currently offer semantic methods for improving the relevance of search results through the use of ontologies, natural language processing, knowledge-based systems, query specification languages and user profiles, among others. The results are generally better than those of web search engines that do not use these techniques. However, the cost of these improvements in precision is the use of algorithms that are more complex to implement and that consume more computational resources. This article describes a semantic query expansion model called MSEC, which is based mostly on the concept of semantic similarity ...
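    The MSEC model itself is cut off in this abstract, so only the general idea can be illustrated: semantic query expansion enriches the query with related terms before it is sent to the search engine. The toy dictionary below is invented; a real system would use an ontology or a semantic similarity measure as the abstract suggests.

        # Toy semantic neighbourhood (stand-in for an ontology or embedding model)
        related_terms = {
            "car":    ["automobile", "vehicle"],
            "cheap":  ["inexpensive", "affordable"],
            "repair": ["fix", "maintenance"],
        }

        def expand_query(query, max_per_term=2):
            """Add semantically related terms to each query word."""
            expanded = []
            for term in query.lower().split():
                expanded.append(term)
                expanded.extend(related_terms.get(term, [])[:max_per_term])
            return " ".join(expanded)

        print(expand_query("cheap car repair"))
        # -> "cheap inexpensive affordable car automobile vehicle repair fix maintenance"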
    Data warehouses and On-Line Analytical Processing (OLAP) tools together permit a multidimensional analysis of structured information. However, as business systems are increasingly required to handle substantial quantities of unstructured textual information, the need arises for a similarly effective means of analysis. To manage unstructured text data stored in data warehouses, a new multidimensional analysis model is proposed that includes textual measures as well as a topic hierarchy. In this model, the textual measures that associate topics with text documents are generated by Probabilistic Latent Semantic Analysis (PLSA), while the hierarchy is created automatically using a clustering algorithm. Documents can then be queried using OLAP tools. The model was evaluated from two viewpoints: query execution time and user satisfaction. Evaluation of execution time was carried out on scientific articles using two query types, and user satisfaction (with query time and ease of use) was evaluated using statistical frequency and multivariate analyses. Encouraging observations included that as the number of documents increases, query time grows linearly rather than exponentially. In addition, the model gained increasing acceptance with use, and its visualization was also well received by users.
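    The abstract names PLSA as the mechanism that produces the document-topic textual measures but gives no formulation. As a sketch under the standard PLSA model (not the paper's code), the EM updates over a small term-document count matrix look like this:

        import numpy as np

        rng = np.random.default_rng(0)

        # Toy count matrix: rows = documents, cols = terms
        counts = np.array([
            [4, 2, 0, 0],
            [3, 3, 1, 0],
            [0, 1, 5, 4],
            [0, 0, 4, 3],
        ], dtype=float)

        n_docs, n_terms = counts.shape
        n_topics = 2

        # Random initialisation of P(z|d) and P(w|z)
        p_z_d = rng.random((n_docs, n_topics)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)
        p_w_z = rng.random((n_topics, n_terms)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)

        for _ in range(50):
            # E-step: P(z|d,w) proportional to P(z|d) * P(w|z)
            resp = p_z_d[:, :, None] * p_w_z[None, :, :]      # shape (docs, topics, terms)
            resp /= resp.sum(axis=1, keepdims=True) + 1e-12
            # M-step: re-estimate P(w|z) and P(z|d) from expected counts n(d,w) * P(z|d,w)
            expected = counts[:, None, :] * resp
            p_w_z = expected.sum(axis=0)
            p_w_z /= p_w_z.sum(axis=1, keepdims=True)
            p_z_d = expected.sum(axis=2)
            p_z_d /= p_z_d.sum(axis=1, keepdims=True)

        print(np.round(p_z_d, 2))  # document-topic "textual measures"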
    The clustering of web search results, or web document clustering, has become a very interesting research area among academic and scientific communities involved in information retrieval. Web search result clustering systems, also called Web Clustering Engines, seek to increase the coverage of documents presented for the user to review while reducing the time spent reviewing them. Several algorithms for clustering web results already exist, but the results show there is room for improvement. This paper introduces a new description-centric algorithm for the clustering of web results, called WDC-CSK, which is based on the cuckoo search meta-heuristic, the k-means algorithm, the Balanced Bayesian Information Criterion, split and merge operations on clusters, and a frequent-phrases approach to cluster labeling. The cuckoo search meta-heuristic provides a combined global and local search strategy in the solution space. Split and merge operations replace the original Lévy flights operation and try to improve existing solutions (nests), so they can be considered local search methods. WDC-CSK includes an abandon operation that provides diversity and prevents the population of nests from converging too quickly. The Balanced Bayesian Information Criterion is used as a fitness function and allows the number of clusters to be defined automatically. WDC-CSK was tested with four data sets (DMOZ-50, AMBIENT, MORESQUE and ODP-239) over 447 queries. The algorithm was also compared against other established web document clustering algorithms, including Suffix Tree Clustering (STC), Lingo, and Bisecting k-means. The results show a considerable improvement upon the other algorithms as measured by recall, F-measure, fall-out, accuracy and SSLk.
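    The Balanced Bayesian Information Criterion used as the fitness function is not reproduced here. As an assumed, simplified illustration of how an information criterion lets the number of clusters be chosen automatically over k-means solutions (not the paper's exact formulation):

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_blobs

        # Toy data standing in for TF-IDF vectors of web snippets
        X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
        n = len(X)

        def bic_like(inertia, k):
            """Crude BIC-style score: data-fit term plus a penalty on the number of clusters."""
            return n * np.log(inertia / n) + k * np.log(n)

        best_k, best_score = None, np.inf
        for k in range(2, 10):
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
            score = bic_like(km.inertia_, k)
            if score < best_score:
                best_k, best_score = k, score

        print("chosen number of clusters:", best_k)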
    This paper introduces a new description-centric algorithm for web document clustering based on Memetic Algorithms with Niching Methods, a Term-Document Matrix and the Bayesian Information Criterion. The algorithm defines the number of clusters automatically. The Memetic Algorithm provides a combined global and local search strategy over the solution space, and the niching methods (restricted competition replacement and restrictive mating) promote diversity in the population and prevent it from converging too quickly. The Memetic Algorithm uses the k-means algorithm to find the optimum in a local search space. The Bayesian Information Criterion is used as a fitness function, while FP-Growth is used to reduce the high dimensionality of the vocabulary. The resulting algorithm, called WDC-NMA, was tested with data sets based on Reuters-21578 and DMOZ, obtaining promising results (better precision than a Singular Value Decomposition algorithm). It was then also initially evaluated by a group of users.
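    The paper uses FP-Growth to shrink the vocabulary before clustering. As a hedged sketch of that idea, assuming the mlxtend library as a stand-in for whatever implementation the authors used, frequent term sets can be mined from a boolean term-document matrix and only the terms that participate in them kept:

        import pandas as pd
        from mlxtend.frequent_patterns import fpgrowth

        # Toy boolean term-document matrix: rows = documents, cols = terms
        docs = pd.DataFrame({
            "clustering": [True, True, True, False, False],
            "web":        [True, True, False, True, False],
            "genome":     [False, False, False, True, True],
            "fractal":    [False, False, False, True, True],
            "search":     [True, False, True, True, False],
        })

        # Frequent term sets: terms co-occurring in at least 60% of the documents
        frequent = fpgrowth(docs, min_support=0.6, use_colnames=True)

        # Reduced vocabulary: only terms that appear in some frequent itemset
        reduced_vocab = sorted({t for itemset in frequent["itemsets"] for t in itemset})
        print(reduced_vocab)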
