Vijay Raghavan
  • CACS, P. O. Box 43694
    Univ. of Louisiana, Lafayette, LA 70504
    USA
  • 337-482-6603


Development of protein 3-D structural comparison methods is important in understanding protein functions. At the same time, developing such a method is very challenging. In the last 40 years, ever since the development of the first automated structural comparison method, ~200 papers have been published using different representations of structures. The existing methods can be divided into five categories: sequence-, distance-, secondary-structure-, geometry-, and network-based structural comparisons. Each has its uniqueness, but also limitations. We have developed a novel method in which the 3-D structure of a protein is modeled using the concept of the Triangular Spatial Relationship (TSR), where triangles are constructed with the Cα atoms of a protein as vertices. Every triangle is represented using an integer, which we denote as a "key." A key is computed from the length, angle, and vertex labels using a rule-based formula, which ensures assignment of the same key to identical TSRs across proteins.
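The key construction lends itself to a short sketch. Below is a minimal, hypothetical Python illustration of the TSR idea: each Cα triangle is reduced to an integer by discretizing a side length and an angle and packing them together with the vertex labels. The bin counts, bin widths, and packing rule here are assumptions for illustration, not the paper's actual formula.

```python
import math
from itertools import combinations

LEN_BINS, ANG_BINS, N_LABELS = 30, 18, 20   # assumed bin counts and label alphabet size

def tsr_key(pts, labels):
    """Map one Calpha triangle to an integer key (illustrative packing rule)."""
    # sort vertices by residue label so identical TSRs yield identical keys
    (l1, v1), (l2, v2), (l3, v3) = sorted(zip(labels, pts), key=lambda t: t[0])
    ab, ac, bc = math.dist(v1, v2), math.dist(v1, v3), math.dist(v2, v3)
    length_bin = min(int(ab), LEN_BINS - 1)              # 1-angstrom bins (assumed)
    cos_a = max(-1.0, min(1.0, (ab**2 + ac**2 - bc**2) / (2 * ab * ac)))
    angle_bin = min(int(math.degrees(math.acos(cos_a)) / (180 / ANG_BINS)),
                    ANG_BINS - 1)
    label_code = (l1 * N_LABELS + l2) * N_LABELS + l3    # packed vertex labels
    return (label_code * LEN_BINS + length_bin) * ANG_BINS + angle_bin

def protein_keys(coords, labels):
    """Keys of all Calpha triangles of one protein."""
    return [tsr_key((coords[i], coords[j], coords[k]),
                    (labels[i], labels[j], labels[k]))
            for i, j, k in combinations(range(len(coords)), 3)]
```

Because vertices are ordered by label before binning, two structurally identical triangles produce the same key regardless of the order in which their atoms are listed.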
Lectures: Monday 14-16 in E-523
Course materials: http://isweb.uni-koblenz.de (teaching)
Examination: Oral exam at the end of the semester
Outline: Motivation and Overview; Text Processing and Analysis; Link Analysis and Authority Ranking; Top-K Query Processing and Indexing; Advanced IR Models; Multimedia Retrieval; Automatic Classification; Clustering and Graph Mining; Peer-to-Peer Technologies; Information Extraction; Data Warehouses and OLAP; Ontologies and Semantic Web
Cognitive Computing: Theory and Applications, written by internationally renowned experts, focuses on cognitive computing and its theory and applications, including the use of cognitive computing to manage renewable energy, the environment, and other scarce resources; machine learning models and algorithms; biometrics; kernel-based models for transductive learning; neural networks; graph analytics in cyber security; data-driven speech recognition; and analytical platforms to study the brain-computer interface. The book comprehensively presents the various aspects of statistical methodology, discusses a wide variety of diverse applications and recent developments, and brings together contributors who are internationally renowned experts in their respective areas.
Traditional association mining algorithms, like Apriori, generate all frequent itemsets existing within a dataset. However, only a very small fraction of this massive volume of frequent itemsets is interesting to the user; such algorithms therefore waste a great deal of time and resources uncovering itemsets that are insignificant. The objective of this dissertation is to introduce a data structure that can support selective association mining, that is, association mining that generates only itemsets containing items of user interest. The first data structure introduced for selective association mining is the itemset tree, whose performance can be improved by reordering the items. Five different distributions are used to determine which performs best; two of them are extracted from the structure of a clustering algorithm known as UNIMEM. Evaluating these distributions shows that UNIMEM has potential for selective association mining, but the algorithm cannot be used directly, so we propose a modified version of UNIMEM's conceptual tree called the ISE-Tree. Experiments show that the ISE-Tree performs better than the itemset tree, successfully introducing a new, improved data structure for selective association mining.
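As a rough illustration of the targeted-querying idea behind such tree structures (a generic prefix-tree sketch, not the ISE-Tree itself): transactions are stored in a trie whose items are ordered by a chosen distribution, and the support of a user-specified itemset is counted by traversing only the relevant branches.

```python
class Node:
    def __init__(self):
        self.end = 0          # transactions terminating at this node
        self.children = {}

class ItemsetTrie:
    """Prefix tree over ordered transactions with targeted support queries."""

    def __init__(self, order):
        self.root = Node()
        self.order = order    # item -> rank, e.g., from a frequency distribution

    def insert(self, transaction):
        node = self.root
        for item in sorted(transaction, key=self.order.__getitem__):
            node = node.children.setdefault(item, Node())
        node.end += 1

    def support(self, itemset):
        target = sorted(itemset, key=self.order.__getitem__)

        def subtree(node):    # transactions stored at or below this node
            return node.end + sum(subtree(c) for c in node.children.values())

        def dfs(node, rest):
            if not rest:      # all target items matched on this path
                return subtree(node)
            total = 0
            for item, child in node.children.items():
                if self.order[item] > self.order[rest[0]]:
                    continue  # ordered paths cannot contain rest[0] further down
                total += dfs(child, rest[1:] if item == rest[0] else rest)
            return total

        return dfs(self.root, target)

order = {"bread": 0, "milk": 1, "eggs": 2, "jam": 3}
t = ItemsetTrie(order)
for tx in [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"milk", "jam"}]:
    t.insert(tx)
print(t.support({"bread", "milk"}))   # -> 2
```

The item ordering matters: a good ordering keeps items of user interest near the root, so targeted queries prune most of the tree, which is exactly what reordering experiments aim to exploit.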
Weighted graphs can be used to model any data set composed of entities and relationships. Social networks, concept networks, and document networks are among the types of data that can be abstracted as weighted graphs. Identifying minimum-sized influential vertices (MIV) in a weighted graph is an important task in graph mining with valuable commercial applications. Although different algorithms for this task have been proposed, processing web-scale weighted graphs remains challenging. In this chapter, we propose a highly scalable algorithm for identifying MIV in large-scale weighted graphs using the MapReduce framework. The proposed algorithm starts by identifying an individual zone for every vertex in the graph using an α-cut fuzzy set. This approximation allows the whole graph to be divided into multiple subgraphs that can be processed independently. Then, for each subgraph, a MapReduce-based greedy algorithm can be designed to identify the minimum-sized influential vertices for the whole graph.
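A minimal single-machine sketch of the two stages (the real algorithm runs them as MapReduce jobs) might look as follows: the α-cut treats normalized edge weights as fuzzy memberships, and the greedy step repeatedly picks the vertex whose zone covers the most uncovered vertices. The data layout and threshold are illustrative assumptions.

```python
def alpha_zones(adj, alpha):
    """adj: {u: {v: weight}}; zone of u = u plus neighbors with membership >= alpha."""
    zones = {}
    for u, nbrs in adj.items():
        wmax = max(nbrs.values(), default=1.0)
        zones[u] = {u} | {v for v, w in nbrs.items() if w / wmax >= alpha}
    return zones

def greedy_miv(adj, alpha=0.5):
    """Greedily pick vertices whose alpha-cut zones cover the whole graph."""
    zones = alpha_zones(adj, alpha)
    uncovered, chosen = set(adj), []
    while uncovered:
        u = max(uncovered, key=lambda v: len(zones[v] & uncovered))
        chosen.append(u)
        uncovered -= zones[u]
    return chosen

adj = {"a": {"b": 0.9, "c": 0.2}, "b": {"a": 0.9, "d": 0.8},
       "c": {"a": 0.2}, "d": {"b": 0.8}}
print(greedy_miv(adj, alpha=0.5))   # -> ['b', 'c']
```

Because each zone is computed from a vertex's own neighborhood, the zone-building pass parallelizes naturally, which is what makes the MapReduce formulation attractive.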
All criminal networks are social networks with multiple channels of communication and collaboration between their members. In this paper, we analyze different types of criminal networks with respect to metrics commonly used in the social network analysis literature. We focus mostly on two types of networks: cocaine trading and terrorist activities. We also include a legal organization's network for comparison. Our findings reveal significant differences between the different types of criminal networks on some of these metrics. These differences, in turn, may help security forces identify unknown networks, or substructures within very large networks, as potentially criminal.
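For reference, the kinds of metrics meant here can be computed directly with networkx; the graph below is a toy stand-in, not one of the studied networks.

```python
import networkx as nx

# Toy stand-in for a covert network; the real inputs are the studied datasets.
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")])

print("density:           ", nx.density(G))
print("avg clustering:    ", nx.average_clustering(G))
print("degree centrality: ", nx.degree_centrality(G))
print("betweenness:       ", nx.betweenness_centrality(G))
print("avg shortest path: ", nx.average_shortest_path_length(G))
```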
Ontology-based approaches have been explored in several domains for knowledge representation and for improving accuracy. However, ontology-based approaches that assist a decision maker by delivering a concrete plan derived from the insights extracted from an ontology have not received much attention. Insights-as-a-service is a technology that aids a decision maker by providing a concrete action plan, involving a comparative analysis of patterns derived from the data and the extraction of insights from that analysis. In this paper, we propose an ontology-based architecture for mining insights within the Wireless Network Ontology (WNO), an ontology built for the wireless network domain to deliver better wireless network performance. We present and illustrate: (i) the major components of the architecture, together with the algorithms used for summarizing network performance profiles in the form of rank tables, and (ii) how the insight rules (the action plan) are extracted from these tables. With the proposed approach, an actionable plan for assisting the decision maker can be obtained because domain knowledge is incorporated in the system. Experimental results on a wireless network dataset show that the proposed model provides an optimal action plan for a wireless network to improve its performance by encoding data-driven rules into the ontology and suggesting changes to its current network configuration.
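As a toy sketch of the rank-table step (the actual algorithms and the WNO schema are in the paper; the metric and configuration names below are invented), performance profiles can be ranked per metric and a simple data-driven rule emitted for the best configuration.

```python
# Hypothetical performance profiles: configuration -> measured metrics.
profiles = {
    "cfg_A": {"throughput": 48.0, "latency_ms": 12.0},
    "cfg_B": {"throughput": 61.0, "latency_ms": 19.0},
    "cfg_C": {"throughput": 55.0, "latency_ms":  9.0},
}

def rank_table(profiles, metric, higher_is_better=True):
    """Rank configurations by one metric (1 = best)."""
    ordered = sorted(profiles, key=lambda c: profiles[c][metric],
                     reverse=higher_is_better)
    return {cfg: rank for rank, cfg in enumerate(ordered, 1)}

ranks = {m: rank_table(profiles, m, higher_is_better=(m == "throughput"))
         for m in ("throughput", "latency_ms")}

# Emit a simple insight rule: recommend the configuration with the best mean rank.
best = min(profiles, key=lambda c: sum(r[c] for r in ranks.values()))
print(f"IF current_config != {best} THEN switch_to({best})")
```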
Social media now plays an important role in communication between people, including the spread of information about news and events as they happen. Most research on event detection concentrates on identifying events from social media information. These models assume an event to be a single entity and treat it as such during the detection process. This assumption ignores that the composition of an event changes as new information becomes available on social media. To capture this change in information over time, we extend an existing Event Detection at Onset algorithm to study the evolution of an event over time. We introduce the concept of an event life cycle model that tracks the key stages in the evolution of an event. The proposed unsupervised sub-event detection method uses a threshold-based approach to identify relationships between sub-events over time. These related events are mapped onto the event life cycle to identify sub-events. We evaluate the proposed sub-event detection approach on a large-scale Twitter corpus.
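The threshold-based linking step can be illustrated with a small sketch: represent each time window's sub-event as a bag of terms, and connect consecutive sub-events whose cosine similarity exceeds a threshold. The representation and threshold value are assumptions for illustration.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_subevents(windows, threshold=0.3):
    """Chain sub-events across time windows when term overlap is high enough."""
    bags = [Counter(w) for w in windows]
    return [(i, i + 1) for i in range(len(bags) - 1)
            if cosine(bags[i], bags[i + 1]) >= threshold]

windows = [["flood", "rain", "city"], ["flood", "rescue", "city"],
           ["election", "vote"]]
print(link_subevents(windows))   # [(0, 1)] -- the flood windows are linked
```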
When retrieving images, users may find it easier to express the desired semantic content with keywords than visual features. Accurate keyword retrieval can only occur when images are completely and accurately described. This can be achieved either through laborious manual effort or ...
The k Nearest Neighbors (KNN) algorithm has been widely applied in various supervised learning tasks due to its simplicity and effectiveness. However, the quality of KNN decision making is directly affected by the quality of the neighborhoods in the modeling space. Efforts have been made to map data to a better feature space, either implicitly with kernel functions or explicitly through learning linear or nonlinear transformations. However, all these methods use pre-determined distance or similarity functions, which may limit their learning capacity. In this paper, we propose a novel deep learning architecture, called Deep Similarity-Enhanced K Nearest Neighbors (DSE-KNN), to learn an optimized similarity function of the data directly towards the goal of optimizing KNN decision making. In other words, the similarity function used in our method is not pre-determined but rather learned, mapping data to a high-dimensional feature space where the accuracy of KNN decision making is maximized. Experimental results show that DSE-KNN outperforms other common machine learning methods on classifying different types of disease datasets and predicting the daily price direction of different stock ETFs.
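A compact PyTorch sketch of the idea follows (a generic learned-similarity soft-KNN, not the authors' exact architecture): an embedding network defines the similarity, and neighbor weights are made differentiable with a softmax so the whole model trains against a classification loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftKNN(nn.Module):
    """Learn an embedding whose induced similarity improves KNN decisions."""
    def __init__(self, in_dim, emb_dim=16):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                   nn.Linear(32, emb_dim))

    def forward(self, queries, refs, ref_labels, n_classes, tau=0.5):
        sim = -torch.cdist(self.embed(queries), self.embed(refs))  # learned similarity
        w = F.softmax(sim / tau, dim=1)            # soft, differentiable neighbors
        return w @ F.one_hot(ref_labels, n_classes).float()   # class probabilities

# Tiny usage sketch with random data; queries and references are kept disjoint
# so no point can trivially match itself.
x, y = torch.randn(64, 8), torch.randint(0, 2, (64,))
q, qy, r, ry = x[:32], y[:32], x[32:], y[32:]
model = SoftKNN(8)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    probs = model(q, r, ry, n_classes=2)
    loss = F.nll_loss(torch.log(probs + 1e-9), qy)
    opt.zero_grad(); loss.backward(); opt.step()
```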
When analyzing streaming data, the results can depreciate in value faster than the analysis can be completed and results deployed. This is certainly the case in the area of anomaly detection, where detecting a potential problem as it is occurring (or in its early stages) can permit corrective behavior. However, most anomaly detection methods focus on point anomalies, whilst in practice many fraudulent behaviors can be detected only through collective analysis of sequences of data. Moreover, anomaly detection systems often stop at detecting anomalies; they typically do not provide information about how the features (attributes) of anomalies relate to each other or to those in normal states. The goal of this research is to create a distributed system that allows for the detection of collective anomalies from streaming data, and to provide a richer context of information about the anomalies besides their presence. To accomplish this, we (a) re-engineered an online sequence anomaly detection algorithm and (b) designed new algorithms for targeted association mining to run in a streaming, distributed environment. Our experiments, conducted on both synthetic and real-world data sets, demonstrated that the proposed framework is able to achieve near real-time response in detecting anomalies and extracting information pertaining to the anomalies.
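To make "collective" concrete: a single reading may be unremarkable while the sequence it sits in is not. A minimal sketch (a first-order Markov model over discretized values, not the re-engineered algorithm from the dissertation) flags windows whose transition likelihood is unusually low.

```python
from collections import defaultdict

def train_transitions(sequence):
    """First-order transition probabilities over a discretized stream."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(nbrs.values()) for b, n in nbrs.items()}
            for a, nbrs in counts.items()}

def window_score(window, probs, floor=1e-4):
    """Average transition probability of a window; low = collectively anomalous."""
    ps = [probs.get(a, {}).get(b, floor) for a, b in zip(window, window[1:])]
    return sum(ps) / len(ps)

normal = ["low", "mid", "high", "mid"] * 50
probs = train_transitions(normal)
print(window_score(["low", "mid", "high", "mid"], probs))   # high score: normal
print(window_score(["high", "high", "low", "low"], probs))  # low score: anomalous
```

Each value in the second window is individually common; only the sequence of transitions reveals the anomaly, which is the distinction between point and collective detection.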
Digitized resources are growing at a rapid pace. One of the challenges facing the computer science community is the development of techniques and tools to discover new and useful information from large collections of data. There are a number of basic issues associated with this challenge, and many are still unresolved. This situation has led to the emergence of a new area of study called "Knowledge Discovery in Databases" (KDD). KDD draws researchers from a variety of fields, including statistics, pattern recognition, artificial intelligence, machine learning, and databases. Recent efforts of KDD researchers have focused primarily on issues surrounding the individual steps of the discovery process; issues not directly related to the discovery process have received much less attention. One such issue is the impact of this new technology on database security, in particular the security threat presented by classification learning methods. Providing safeguards a...
Social media generates information about news and events in real time. Given the vast amount of data available and the rate of information propagation, reliably identifying events is a challenge. Most state-of-the-art techniques are post hoc, detecting an event after it has happened. Our goal is to detect the onset of an event as it is happening, using the user-generated information from Twitter streams. To achieve this goal, we use a discriminative model to identify changes in the pattern of conversations over time, and a topic evolution model to find credible events and eliminate the random noise that is prevalent in many event detection models. The simplicity of the proposed model allows events to be detected quickly and efficiently, permitting discovery of events within minutes of the start of conversation about them on Twitter. Our model is evaluated on a large-scale Twitter corpus to detect events in real time, and tested on other datasets to detect change over longer periods of time. The results indicate we can detect real events, within 3 to 8 minutes of their first appearance, with a lower degree of noise compared to other methods.
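The change-in-pattern signal can be illustrated with a simple z-score sketch over per-minute term counts (the paper's discriminative and topic-evolution models are more involved): a term whose current count deviates far from its recent mean suggests an emerging event.

```python
import statistics

def onset_terms(history, current, z_thresh=3.0):
    """history: {term: [counts per past minute]}; current: {term: count}.
    Flag terms whose current frequency spikes relative to their recent pattern."""
    flagged = []
    for term, count in current.items():
        past = history.get(term, [0])
        mu = statistics.mean(past)
        sigma = statistics.pstdev(past) or 1.0    # avoid division by zero
        if (count - mu) / sigma >= z_thresh:
            flagged.append(term)
    return flagged

history = {"earthquake": [1, 0, 2, 1, 0], "coffee": [40, 38, 41, 39, 40]}
print(onset_terms(history, {"earthquake": 25, "coffee": 42}))   # ['earthquake']
```

Note that the steadily chatty term ("coffee") is not flagged even though its absolute count is higher; onset detection is about deviation from the established conversational pattern, not raw volume.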
Interoperability of annotations across different domains is essential to facilitate the interchange of data between semantic applications. Foundational ontologies, such as SKOS (Simple Knowledge Organization System), play an important role in creating an interoperable layer for annotation. We propose a multi-layer ontology schema, named SKOS-Wiki, which extends SKOS to create an annotation model and relies on the semantic structure of Wikipedia. We also inherit the DBpedia definition of named entities. The main goal of our proposed extension is to fill the semantic gaps between these models and create a unified annotation schema.
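To make the layering concrete, here is a small hypothetical rdflib sketch of what an SKOS-Wiki-style annotation might look like: a concept declared in a custom namespace, typed against SKOS, and linked to its DBpedia entity. The `skw:` namespace and the property choices are invented for illustration and are not the paper's schema.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS

SKW = Namespace("http://example.org/skos-wiki#")   # hypothetical SKOS-Wiki namespace
g = Graph()
g.bind("skos", SKOS)
g.bind("skw", SKW)

concept = SKW["Jazz"]
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("Jazz", lang="en")))
g.add((concept, SKOS.broader, SKW["Music"]))        # Wikipedia-style category layer
# Link the concept to its DBpedia named entity.
g.add((concept, SKOS.exactMatch, URIRef("http://dbpedia.org/resource/Jazz")))

print(g.serialize(format="turtle"))
```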
Adverse drug events (ADEs) are among the leading causes of death in the United States. Although many ADEs are detected during pharmaceutical drug development and the FDA approval process, not all possible reactions can be identified during this period. Currently, post-consumer drug surveillance relies on voluntary reporting systems, such as the FDA's Adverse Event Reporting System (AERS). With an increase in the availability of medical resources and health-related data online, interest in medical data mining has grown rapidly. This information, coupled with people's online conversations about their health, provides a substantial resource for the identification of ADEs. In this work, we propose a method to identify adverse drug effects from tweets by modeling the problem as link classification in graphs. Drug and symptom mentions are extracted from the tweet history of each user and a drug-symptom graph is built, where nodes represent either drugs or symptoms and edges are labelled positive or negative for desired or adverse drug effects, respectively. A link classification model is then used to identify the negative edges, i.e., the adverse drug effects. We test our model on 864 users using 10-fold cross-validation with the SIDER dataset as ground truth. Our model achieves an F-score of 0.77, compared to 0.58 for the best baseline model.
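A schematic version of the graph construction and link classification (with made-up mention data and toy edge features; the paper's features and model are richer) could look like this, using networkx for the graph and scikit-learn for the classifier.

```python
import networkx as nx
from sklearn.linear_model import LogisticRegression

# Hypothetical mentions mined from user timelines: (drug, symptom, co-mention count).
mentions = [("drugX", "headache", 9), ("drugX", "relief", 14),
            ("drugY", "nausea", 7), ("drugY", "sleep", 11)]

G = nx.Graph()
for drug, symptom, n in mentions:
    G.add_edge(drug, symptom, count=n)

def edge_features(g, u, v):
    # Toy features: co-mention count plus the degrees of both endpoints.
    return [g[u][v]["count"], g.degree(u), g.degree(v)]

# Ground-truth edge labels (1 = adverse effect), e.g., taken from SIDER.
labeled = {("drugX", "headache"): 1, ("drugX", "relief"): 0,
           ("drugY", "nausea"): 1, ("drugY", "sleep"): 0}

X = [edge_features(G, u, v) for (u, v) in labeled]
y = list(labeled.values())
clf = LogisticRegression().fit(X, y)
print(clf.predict([edge_features(G, "drugY", "nausea")]))   # predicted edge label
```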
Though issues of data quality trace their origin back to the early days of computing, the recent emergence of Big Data has added more dimensions. Furthermore, given the range of Big Data applications, the potential consequences of bad data quality can be far more disastrous and widespread. This paper provides a perspective on data quality issues in the Big Data context. It also discusses data integration issues that arise in biological databases and the attendant data quality issues.
LBD (literature-based discovery) tools enable the establishment of relationships between concepts appearing in scientific articles in the biomedical field, and the generation of new hypotheses via the examination of these existing relationships. In this paper, we study the effectiveness of generally ...
Recently, with companies and government agencies amassing large repositories of time-stream/temporal data, there is a strong push to adapt association rule mining algorithms for dynamic, targeted querying. In addition, data processing latency and the depreciation of results with the passage of time create a need for swifter and more efficient processing. The aim of targeted association mining is to find potentially interesting implications in large repositories of data; using targeted association mining techniques, specific implications that contain items of user interest can be found faster, before they have depreciated in value beyond usefulness. In this paper, the DynTARM algorithm is proposed for the discovery of targeted and rare association rules. DynTARM has the flexibility to discover strong and rare association rules from data streams within the user's sphere of interest. A measure, called the Volatility Index, is introduced to assess the fluctuation in the confidence of rules, allowing rules that conform to different temporal patterns to be discovered.
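The Volatility Index idea can be sketched simply: track a rule's confidence across sliding windows and measure how much it fluctuates. The definition below (standard deviation of windowed confidences) is an assumed stand-in for the paper's measure, not its actual formula.

```python
import statistics

def windowed_confidence(windows, antecedent, consequent):
    """Confidence of antecedent -> consequent in each window of transactions."""
    confs = []
    for window in windows:
        has_a = [tx for tx in window if antecedent <= tx]
        if has_a:
            confs.append(sum(1 for tx in has_a if consequent <= tx) / len(has_a))
    return confs

def volatility(confs):
    """Assumed volatility measure: dispersion of rule confidence over time."""
    return statistics.pstdev(confs) if len(confs) > 1 else 0.0

windows = [
    [{"a", "b"}, {"a", "b"}, {"a"}],        # confidence 2/3
    [{"a", "b"}, {"a"}, {"a"}],             # confidence 1/3
    [{"a", "b"}, {"a", "b"}, {"a", "b"}],   # confidence 1
]
confs = windowed_confidence(windows, {"a"}, {"b"})
print(confs, volatility(confs))
```

A rule with stable confidence scores near zero on such a measure, while a rule whose confidence swings between windows scores high, which is what lets DynTARM-style systems separate steady patterns from temporally volatile ones.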
For the last decade, the automatic generation of hypotheses from the literature has been widely studied. One common approach is to model the biomedical literature as a concept network, to which a prediction model is applied to predict future relationships (links) between pairs of concepts. Typically, this link prediction task is cast in one of two forms: (a) predict the future links for a specific concept (node), or (b) predict the future links for the entire network. However, while being able to accurately forecast future relationships is vital, another, equally important question should be addressed: of the predicted links, which will be most important and/or most relevant? Past attempts to answer these questions have generally been domain specific. In this paper, we propose a domain-independent, supervised method that predicts the rank of future links utilizing objective interestingness measures. The results, based on an analysis of thirteen common interestingness measures, indicate that, while predicting specific future interestingness values is difficult, our approach captures the relative ordering of the links with low error.
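As a concrete example of "objective interestingness measures": given co-occurrence counts for a concept pair, measures such as lift and leverage can be computed and used to order candidate links. The formulas are standard; the pair names and counts below are hypothetical, and the supervised ranking model itself is omitted.

```python
def lift(n_ab, n_a, n_b, n):
    """lift = P(a,b) / (P(a) * P(b)); > 1 means positive association."""
    return (n_ab / n) / ((n_a / n) * (n_b / n))

def leverage(n_ab, n_a, n_b, n):
    """leverage = P(a,b) - P(a) * P(b)."""
    return n_ab / n - (n_a / n) * (n_b / n)

# Hypothetical candidate concept pairs: (name, co-count, count_a, count_b).
pairs = [("fish_oil/raynaud", 12, 40, 30), ("aspirin/headache", 50, 400, 300)]
N = 10_000   # assumed corpus size

ranked = sorted(pairs, key=lambda p: lift(*p[1:], N), reverse=True)
for name, *counts in ranked:
    print(name, round(lift(*counts, N), 1), round(leverage(*counts, N), 5))
```

The rarer pair ranks first on lift despite its lower raw count, illustrating why the relative ordering of links, rather than raw co-occurrence, is the quantity worth predicting.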
Due to the inherent complexity of natural languages, many natural language tasks are ill-posed for mathematically precise algorithmic solutions. To circumvent this problem, statistical machine learning approaches are used for NLP tasks. The emergence of Big Data enables a new paradigm for solving NLP problems: managing the complexity of the problem domain by harnessing the power of data to build high-quality models. This chapter first provides an introduction to various core NLP tasks and highlights their data-driven solutions. Second, a few representative NLP applications that use the underlying infrastructure consisting of the core NLP tasks are described. Third, various sources of Big Data for NLP research are discussed. Fourth, Big Data-driven NLP research and applications are outlined. Finally, the chapter concludes by indicating trends and future research directions.
Rather than finding new association-mining types one at a time, in this paper we propose a framework, called Generalization of Association Mining via Information Granulation (GAMInG), with which new association-mining types, capable of discovering new patterns hidden in data, can be systematically defined.
The k Nearest Neighbor (KNN) algorithm has been widely applied in various supervised learning tasks due to its simplicity and effectiveness. However, the quality of KNN decision making is directly affected by the quality of the neighborhoods in the modeling space. Efforts have been made to map data to a better feature space, either implicitly with kernel functions or explicitly through learning linear or nonlinear transformations. However, all these methods use pre-determined distance or similarity functions, which may limit their learning capacity. In this paper, we present two loss functions, namely KNN Loss and Fuzzy KNN Loss, to quantify the quality of the neighborhoods formed by KNN with respect to supervised learning, such that minimizing the loss function on the training data leads to maximizing KNN decision accuracy on the training data. We further present a deep learning strategy that learns, by minimizing KNN loss, pairwise similarities of the data, implicitly mapping the data to a feature space where the quality of the KNN neighborhoods is optimized. Experimental results show that this deep learning strategy (denoted Deep KNN) outperforms state-of-the-art supervised learning methods on multiple benchmark data sets.
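The notion of a KNN loss can be sketched as follows (an assumed formulation in the spirit of the abstract, akin to a soft-nearest-neighbor loss, not necessarily the paper's exact definition): given pairwise similarities, the loss is the negative log of the softmax-weighted mass that falls on same-class neighbors, so minimizing it pushes each point's neighborhood toward its own class.

```python
import torch
import torch.nn.functional as F

def knn_loss(sim, labels, tau=0.5):
    """sim: (n, n) pairwise similarities; labels: (n,) class ids.
    Assumed formulation: -log of the soft-neighbor mass on same-class points."""
    n = sim.size(0)
    sim = sim.masked_fill(torch.eye(n, dtype=torch.bool), float("-inf"))  # no self-match
    w = F.softmax(sim / tau, dim=1)                    # soft neighbor weights
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    correct_mass = (w * same).sum(dim=1).clamp_min(1e-9)
    return -correct_mass.log().mean()

# Usage: similarities can come from any (learnable) embedding.
x = torch.randn(32, 8, requires_grad=True)
labels = torch.randint(0, 3, (32,))
loss = knn_loss(-torch.cdist(x, x), labels)
loss.backward()
```

Because the loss is differentiable in the similarities, any network producing them can be trained end to end, which is the link between the loss functions and the deep learning strategy described above.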
In this dissertation, we propose a novel integrated information retrieval approach that provides a unified solution to challenging problems faced by existing, popular information retrieval models. The first problem relates to the vector space model. We found that different information needs in fact require different vector spaces to represent documents. However, the question of how to dynamically build optimal vector spaces that are tailored to users' different information needs remains unexplored. The second problem relates to the language modeling approach. It is conceptually difficult for the language modeling approach to utilize the advantages of machine learning techniques. In order to solve these problems, we designed a kernel function called the language-modeling kernel. This kernel function retains all the modeling benefits provided by the language modeling approach. Meanwhile, for each information need, it dynamically determines an optimal vector space, based on which machine learning algorithms, such as the support vector machine (SVM), can be applied to find an optimal decision boundary that separates the relevant documents from the non-relevant ones. Furthermore, an effective double learning strategy is proposed based on the language-modeling kernel. Large-scale experiments on standard test-beds show that our approach makes significant improvements over state-of-the-art information retrieval methods.
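As a rough illustration of combining language models with kernel machines (a simplified stand-in, not the dissertation's language-modeling kernel): documents can be mapped to Dirichlet-smoothed log-probability vectors over the query terms, giving a query-specific vector space, and a kernel matrix over those vectors can then be fed to an SVM.

```python
import math
from collections import Counter

def lm_vector(doc_tokens, query_terms, collection, mu=2000):
    """Dirichlet-smoothed query-term log-probabilities for one document.
    A simplified stand-in for the language-modeling kernel's feature map."""
    tf, dl = Counter(doc_tokens), len(doc_tokens)
    return [math.log((tf[t] + mu * collection[t]) / (dl + mu))
            for t in query_terms]

# Toy collection model P(t | collection) and two documents.
collection = {"jaguar": 0.001, "car": 0.01}
docs = [["jaguar", "car", "engine"], ["jaguar", "rainforest", "cat"]]
vectors = [lm_vector(d, ["jaguar", "car"], collection) for d in docs]

# Linear kernel over the LM feature space; an SVM with a precomputed
# kernel can consume this matrix directly.
kernel = [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]
print(kernel)
```

The point of the construction is that the feature space depends on the query: each information need induces its own vector space, while the smoothing keeps the probabilistic benefits of the language-modeling view.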
Concept Based Retrieval by Minimal Term Sets (with Ali H. Alsaffar and Jitender S. Deogun).
Big data requirements are motivating new database-management models that can process billions of data requests per second, and established relational models are changing to keep pace. The authors provide practical tools for navigating this shifting product landscape and finding candidate systems that best fit a data manager's application needs.
