Paul Jones


    • Background in streaming analytics and big data algorithms for government and industry. Currently researching the use ...
    • Dr Nagiza F Samatova
    Missing data is a severe problem for algorithms that operate over knowledge graphs (KGs). Most previous research in KG completion has focused on the problem of inferring missing entities and missing relation types between entities. However, in addition to these, many KGs also suffer from missing entity types (i.e. the category labels for entities, such as /music/artist). Entity types are a critical enabler for many NLP tasks that use KGs as a reference source, and inferring missing entity types remains an important outstanding obstacle in the field of KG completion. Inspired by recent work to build a contextual KG embedding model, we propose a novel approach to address the entity type prediction problem. We compare the performance of our method with several state-of-the-art KG embedding methods, and show that our approach gives higher prediction accuracy compared to baseline algorithms on two real-world datasets. Our approach also produces consistently high accuracy when inferring entities and relation types, as well as the primary task of inferring entity types. This is in contrast to many of the baseline methods that specialize in one prediction task or another. We achieve this while preserving linear scalability with the number of entity types. Source code and datasets from this paper can be found at https://github.ncsu.edu/cmoon2/kg.
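    The abstract does not spell out the scoring function of the contextual embedding model, so the following is only a hedged, generic sketch of entity-type prediction: rank candidate type embeddings by cosine similarity to a pre-trained entity embedding. All names and vectors below are invented toy values, not the paper's model.

```python
import numpy as np

def predict_entity_types(entity_vec, type_vecs, top_n=1):
    """Rank candidate types by cosine similarity to an entity embedding."""
    names = list(type_vecs)
    mat = np.array([type_vecs[n] for n in names], dtype=float)
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)   # unit-normalize types
    e = entity_vec / np.linalg.norm(entity_vec)              # unit-normalize entity
    scores = mat @ e                                         # cosine similarities
    order = np.argsort(-scores)
    return [(names[i], float(scores[i])) for i in order[:top_n]]

# Toy 2-D embeddings; in practice these come from a trained KG embedding model.
types = {"/music/artist": np.array([1.0, 0.1]),
         "/film/actor":   np.array([0.1, 1.0])}
entity = np.array([0.9, 0.2])   # embedding of some musician entity
print(predict_entity_types(entity, types))
```

    A real KG completion system would score types with the trained model's own energy function rather than raw cosine similarity, but the ranking-by-score structure is the same.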
    Online reviews often contain users' specific opinions on aspects (features) of items. These opinions are very useful to merchants and customers, but manually extracting them is time-consuming. Several topic models have been proposed to simultaneously extract item aspects and users' opinions on the aspects, as well as to detect the sentiment associated with the opinions. However, existing models tend to find poor aspect-opinion associations when limited examples of the required word co-occurrences are available in the corpus. These models often also assign incorrect sentiment to words. In this paper, we propose a Latent embedding structured Opinion mining Topic model, called LOT, which can simultaneously discover relevant aspect-level specific opinions from small or large numbers of reviews and assign accurate sentiment to words. Experimental results for topic coherence, document sentiment classification, and a human evaluation all show that our proposed model achieves significant improvements over several state-of-the-art baselines.
    Knowledge workers are frequently subject to information overload. As a result, when looking to make analytic judgements, they may only have time to search for evidence that already matches their existing viewpoint, leading to confirmation bias. New computer systems are needed that can help users overcome this and other cognitive biases. As an enabling step towards such systems, the research community has developed instrumentation software that captures data to help better understand sensemaking processes and workflows. However, existing instrumentation approaches are limited by the need to write operating system-specific (and often application-specific) code to 'see' what the user is doing inside different applications on their computer. This source code quickly becomes complex and brittle. Furthermore, this approach does not provide a holistic view of how the user is gleaning information from multiple applications at once. We propose an alternative approach to instrumentation based on automated analysis of desktop screenshots, and demonstrate this in the context of extraction of 'claims' from reports that users are writing, and association of these claims with 'evidence' obtained from web browsing. We evaluate our approach on a corpus of 121,000 screenshots obtained from a study of 150 participants carrying out a controlled analysis task. The topic of the task was previously unfamiliar to them (hence the need to search for evidence on the web). We report results from several variants of our approach using a human evaluation of extracted claim/evidence pairs, and find that a simple word matching metric (based on Jaccard similarity) can outperform more complex sentence similarity metrics. We also describe many of the difficulties inherent to screenshot analysis and our approaches to overcome them.
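    The simple word matching metric mentioned above can be sketched as plain Jaccard similarity over word sets, used to pick the browsed sentence that best matches a written claim. The claim and evidence strings below are invented examples, not data from the study.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def best_evidence(claim, evidence_sentences):
    """Return the candidate evidence sentence with highest overlap with the claim."""
    return max(evidence_sentences, key=lambda s: jaccard(claim, s))

claim = "coffee consumption lowers heart disease risk"
evidence = [
    "the stock market fell sharply today",
    "a study links coffee consumption to lower heart disease risk",
]
print(best_evidence(claim, evidence))  # the coffee sentence wins
```

    A production pipeline would first need OCR on the screenshots plus sentence segmentation; the metric itself is this simple.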
    The success of sentence classification often depends on understanding both the syntactic and semantic properties of word-phrases. Recent progress on this task has been based on exploiting the grammatical structure of sentences, but often this structure is difficult to parse and noisy. In this paper, we propose a structure-independent ‘Gated Representation Alignment’ (GRA) model that blends a phrase-focused Convolutional Neural Network (CNN) approach with a sequence-oriented Recurrent Neural Network (RNN). Our novel alignment mechanism allows the RNN to selectively include phrase information in a word-by-word sentence representation, and to do this without awareness of the syntactic structure. An empirical evaluation of GRA shows higher prediction accuracy (up to 4.6%) on fine-grained sentiment ratings when compared to other structure-independent baselines. We also show comparable results to several structure-dependent methods. Finally, we analyze the effect of our alignment mechanism and find that it is critical to the effectiveness of the CNN-RNN hybrid.
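    The abstract does not give the GRA equations, so the snippet below is only a generic sketch of one common gating pattern for blending an RNN hidden state with a CNN phrase vector: a sigmoid gate decides, per dimension, how much phrase information to mix in. The weights W and b, the dimensions, and the random values are all assumptions standing in for trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_align(h_word, v_phrase, W, b):
    """Blend an RNN word state with a CNN phrase vector via a learned gate:
        g  = sigmoid(W [h; v] + b)
        h' = g * v_phrase + (1 - g) * h_word
    Each output dimension is a convex combination of the two inputs."""
    g = sigmoid(W @ np.concatenate([h_word, v_phrase]) + b)
    return g * v_phrase + (1.0 - g) * h_word

d = 4
rng = np.random.default_rng(0)
h = rng.standard_normal(d)           # RNN hidden state for the current word
v = rng.standard_normal(d)           # CNN feature for the covering phrase
W = rng.standard_normal((d, 2 * d))  # gate parameters (trained in practice)
b = np.zeros(d)
print(gated_align(h, v, W, b).shape)  # (4,)
```

    Because the gate output lies in (0, 1), the blended state can never leave the elementwise range spanned by the word state and the phrase vector, which keeps the mixing stable during training.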
    Knowledge workers are being exposed to more information than ever before, as well as having to work in multi-tasking and collaborative environments. There is an increasing need for interfaces and algorithms to help automatically keep track of documents that are associated with both individual and team tasks. Previous approaches to the problem of automatically applying task labels to documents have been limited to small feature spaces or have not taken into account multi-user environments. Many different clues to potential task associations are available through user, task and document similarity metrics, as well as through temporal patterns in individual and team workflows. We present a network-fusion algorithm for automatic task-centric document curation, and show how this can guide a recent-work dashboard interface, which organizes users' documents and gathers feedback from them. Our approach efficiently computes representations of users, tasks and documents in a common vector space, and can easily take into account many different types of associations through the creation of edges in a multi-layer graph. We have demonstrated the effectiveness of this approach using labelled document corpora from three empirical studies with students and intelligence analysts. We have also shown how to leverage relationships between different entity types to increase classification accuracy by up to 20% over a simpler baseline, and with as little as 10% labelled data.
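    As a hedged illustration only (this is not the authors' fusion algorithm), the idea of spreading task labels through a heterogeneous user/task/document graph can be sketched with simple label propagation over a shared adjacency matrix. The node names and edges below are invented.

```python
import numpy as np

# Hypothetical graph: 2 users, 2 tasks, 3 documents, all nodes in one space.
nodes = ["u1", "u2", "t1", "t2", "d1", "d2", "d3"]
idx = {n: i for i, n in enumerate(nodes)}
edges = [("u1", "t1"), ("u2", "t2"),   # user worked on task
         ("u1", "d1"), ("u1", "d2"),   # user touched document
         ("u2", "d3"),
         ("d1", "d2")]                 # document-document similarity edge

A = np.zeros((len(nodes), len(nodes)))
for a, b in edges:
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1.0

# Seed labels: one column per task; propagate a few steps over the graph,
# always mixing back in the seed labels so they are never washed out.
Y = np.zeros((len(nodes), 2))
Y[idx["t1"], 0] = Y[idx["t2"], 1] = 1.0
F = Y.copy()
D_inv = np.diag(1.0 / np.maximum(A.sum(axis=1), 1))
for _ in range(10):
    F = 0.5 * (D_inv @ A @ F) + 0.5 * Y

for doc in ("d1", "d2", "d3"):
    print(doc, "-> task", ["t1", "t2"][int(np.argmax(F[idx[doc]]))])
```

    Documents reached through u1 inherit t1's label while d3, reached only through u2, inherits t2's; adding more edge types (temporal co-access, content similarity) just adds entries to A.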
    Assisting teams of knowledge workers in achieving common strategic and tactical goals is becoming an increasing priority as information analysis tasks become more complex. Tools to monitor and support individual workers, such as TaskTracer, have demonstrated the potential for assisting individuals, but there is a lack of tools for analyzing workflows and information needs at a collaborative level within enterprises. Providing assistance for collaboration is a current priority for the new generation of ‘smart digital assistants’ and presents unique challenges, in terms of associating collective goals with user activities in a way that minimizes disruption to the user's workflow, and in generating useful summaries. To address these challenges, we have developed ‘Journaling’ interfaces to: capture a user or team's tasks and goals, associate them with information artifacts, assist in information recall, and display aggregate visualizations. These interfaces are supported by a passive instrumentation platform that aims to monitor users' consented activities in a minimally intrusive manner, including interactions with various applications, documents and URLs as part of collaborative workflows. We have deployed these interfaces in a production environment using heterogeneous workflows performed by groups of students and intelligence analysts. Evaluations based on user interviews and engagement metrics suggest that our approach is useful for understanding and supporting user collaboration, as well as being easy to work with and suitable for continuous use. In addition, the data gathered provides situational awareness for individuals, teams, educators and managers. Through our platform, we are enabling continuous collection of labelled user activity logs and document corpora that are enabling further research in this nascent field.
    Stream processing applications are used to ingest, process, and analyze continuous data streams from heterogeneous sources of live and stored data, generating streams of output results. These applications are, in many cases, complex, large-scale, low-latency, and distributed in nature. In this paper, we describe the design principles and architectural underpinnings for stream processing applications. These principles are distilled from our experience in building real-world applications both for internal use as well as with customers from several industrial and academic domains. We provide principles, guidelines, as well as appropriate implementation examples to highlight the different aspects of stream processing application design and development.
    The history of interactions between a user and a piece of technology can be represented as a sequence of actions. The ability to predict a user's next action is useful to many applications. For example, a user-interface that can anticipate the actions of a user is able to provide a more positive experience through just-in-time recommendations and pro-actively allocating or caching resources. Existing sequence prediction techniques have failed to address some of the challenges associated with this task, such as predicting an action that has never appeared for a given context. Techniques for an analogous task in the field of Natural Language Processing (NLP) avoid this issue; however, applying these NLP techniques directly to user action prediction would result in the loss of action frequency and action order, both of which are critically important.

    Therefore, we propose a method that unifies ideas from NLP with the task of sequence prediction. Our method, Frequency Vector (FVEC) prediction, is an online algorithm that predicts the top-N most likely next actions by combining scores from two models: a frequency analysis model and a vector representation model. In the frequency model, the score of an action is calculated based on the frequency that the action has occurred right after a given context. In the vector representation model, a vector for each action is learned, and a score for an action is calculated based on the similarity of its vector and the mean of the vectors for each action in a given context. Evaluations of FVEC on three real-world datasets resulted in a consistently higher prediction accuracy (and lower standard deviation) than all tested sequence prediction algorithms.
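    A minimal sketch of the two-model combination described above: a frequency score (how often each action followed this exact context) blended with a vector score (cosine similarity between a candidate action's vector and the mean of the context vectors). The embeddings, the alpha weight, and the action names are invented; a real FVEC-style system would learn the vectors online.

```python
from collections import Counter, defaultdict
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

class FvecSketch:
    """Toy blend of frequency and vector scores for next-action prediction."""

    def __init__(self, vecs, alpha=0.5):
        self.vecs = vecs                  # action -> embedding (assumed pre-learned)
        self.alpha = alpha                # weight on the frequency model
        self.freq = defaultdict(Counter)  # context tuple -> next-action counts

    def observe(self, context, nxt):
        self.freq[tuple(context)][nxt] += 1

    def predict(self, context, top_n=1):
        counts = self.freq[tuple(context)]
        total = sum(counts.values()) or 1
        mean = np.mean([self.vecs[a] for a in context], axis=0)

        def score(a):
            # frequency evidence for seen continuations, vector similarity
            # lets never-seen-in-this-context actions still receive a score
            return (self.alpha * counts[a] / total
                    + (1 - self.alpha) * cosine(self.vecs[a], mean))

        return sorted(self.vecs, key=score, reverse=True)[:top_n]

# Hypothetical 2-D action embeddings and a single observed history.
vecs = {"open": np.array([1.0, 0.0]),
        "edit": np.array([0.9, 0.2]),
        "save": np.array([0.0, 1.0])}
m = FvecSketch(vecs)
m.observe(["open", "edit"], "save")
print(m.predict(["open", "edit"]))   # frequency evidence tips the score to 'save'
```

    Note how the vector term alone would favor 'open' or 'edit' (their vectors sit near the context mean), so the frequency term is what recovers the actually observed continuation; conversely, for a context never seen before, the frequency term is zero and the vector term still ranks candidates.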
    Information analysis tasks are becoming increasingly complex as the growth in volume and variety of available data continues to outpace methods to automatically analyze it. The result is an increasing burden on knowledge workers. Industry and academia are currently lacking platforms and tools that can help provide enterprise-wide understanding of how humans analyze information. Through instrumentation of knowledge workers' activities during analysis tasks, we can enable research into techniques that address this need. Several research and design challenges need to be addressed, however, to develop scalable, robust and efficient instrumentation methods for collecting streaming data on relevant human-machine interactions. This data is typically high volume and multi-modal. We present an instrumentation platform to meet these and future challenges in better understanding and improving the craft of information analysis. Our platform provides an extensible framework to instrument analysts' workstations during information processing tasks, and includes a streaming data processing pipeline that supports real-time analysis of large volumes of event data. We have built the instrumentation platform using the latest open-source scalable and flexible web components and hardware infrastructure. Researchers from science, engineering, and the humanities are using our platform to gain insight into tool usage, analytical workflows, and collaboration patterns. Several use cases from these groups are described. Our platform provides a unique, pragmatic and holistic foundation to understand the behavior of knowledge workers, and to support applications that assist with information analysis.