  • Tim Finin is a Professor of Computer Science and Electrical Engineering at the University of Maryland, Baltimore Coun...
We present a family of novel methods for embedding knowledge graphs into real-valued tensors. These tensor-based embeddings capture the ordered relations that are typical in the knowledge graphs represented by semantic web languages like RDF. Unlike many previous models, our methods can easily use prior background knowledge provided by users or extracted automatically from existing knowledge graphs. In addition to providing more robust methods for knowledge graph embedding, we provide a provably-convergent, linear tensor factorization algorithm. We demonstrate the efficacy of our models for the task of predicting new facts across eight different knowledge graphs, achieving between 5% and 50% relative improvement over existing state-of-the-art knowledge graph embedding techniques. Our empirical evaluation shows that all of the tensor decomposition models perform well when the average degree of an entity in a graph is high, with constraint-based models doing better on graphs with a small number of highly similar relations and regularization-based models dominating for graphs with relations of varying degrees of similarity.
We present CASIE, a system that extracts information about cybersecurity events from text and populates a semantic model, with the ultimate goal of integration into a knowledge graph of cybersecurity data. It was trained on a new corpus of 1,000 English news articles from 2017-2019 that are labeled with rich, event-based annotations and that cover both cyberattack and vulnerability-related events. Our model defines five event subtypes along with their semantic roles and 20 event-relevant argument types (e.g., file, device, software, money). CASIE uses different deep neural network approaches with attention and can incorporate rich linguistic features and word embeddings. We have conducted experiments on each component in the event detection pipeline and the results show that each subsystem performs well.
We present a way to generate gazetteers from the Wikidata knowledge graph and use the lists to improve a neural NER system by adding an input feature indicating that a word is part of a name in the gazetteer. We empirically show that the approach yields performance gains in two distinct languages: a high-resource, word-based language, English, and a high-resource, character-based language, Chinese. We apply the approach to a low-resource language, Russian, using a new annotated Russian NER corpus from Reddit tagged with four core and eleven extended types, and show a baseline score.
The goal of this work is to improve the performance of a neural named entity recognition system by adding input features that indicate a word is part of a name included in a gazetteer. This article describes how to generate gazetteers from the Wikidata knowledge graph as well as how to integrate the information into a neural NER system. Experiments reveal that the approach yields performance gains in two distinct languages: a high-resource, word-based language, English, and a high-resource, character-based language, Chinese. Experiments were also performed in a low-resource language, Russian, on a newly annotated Russian NER corpus from Reddit tagged with four core types and twelve extended types. This article reports a baseline score. It is a longer version of a paper in the 33rd FLAIRS conference (Song et al. 2020).
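The gazetteer input feature described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gazetteer_features`, the lowercase tuple representation of names, and the `max_len` span limit are all assumptions made for illustration, and the tiny gazetteer is hand-written rather than generated from Wikidata.

```python
from typing import List, Set, Tuple


def gazetteer_features(
    tokens: List[str],
    gazetteer: Set[Tuple[str, ...]],
    max_len: int = 5,
) -> List[int]:
    """Return one binary flag per token: 1 if the token lies inside any
    span that matches a gazetteer name, otherwise 0.

    Names are stored as tuples of lowercased tokens (an assumption of
    this sketch), and spans longer than max_len tokens are not checked.
    """
    flags = [0] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(tokens):
                break
            if tuple(lowered[start:end]) in gazetteer:
                # Mark every token covered by the matching name.
                for i in range(start, end):
                    flags[i] = 1
    return flags


# Hypothetical, hand-written gazetteer for illustration only.
gazetteer = {("new", "york"), ("baltimore",)}
tokens = "She moved from Baltimore to New York".split()
print(gazetteer_features(tokens, gazetteer))
```

In a neural tagger, a flag vector like this would typically be concatenated with each token's embedding before the sequence encoder; how the paper integrates it is described in the article itself.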
As cybersecurity-related threats continue to increase, understanding how the field is changing over time can give insight into combating new threats and understanding historical events. We show how to apply dynamic topic models to a set of cybersecurity documents to understand how the concepts found in them are changing over time. We correlate two different data sets: the first relates to specific exploits and the second relates to cybersecurity research. We use Wikipedia concepts to provide a basis for performing concept phrase extraction and show how using concepts to provide context improves the quality of the topic model. We represent the results of the dynamic topic model as a knowledge graph that could be used for inference or information discovery.
We introduce the notion of a reinforcement quantum annealing (RQA) scheme in which an intelligent agent searches in the space of Hamiltonians and interacts with a quantum annealer that plays the stochastic environment role of learning automata. At each iteration of RQA, after analyzing results (samples) from the previous iteration, the agent adjusts the penalty of unsatisfied constraints and re-casts the given problem to a new Ising Hamiltonian. As a proof-of-concept, we propose a novel approach for casting the problem of Boolean satisfiability (SAT) to Ising Hamiltonians and show how to apply the RQA for increasing the probability of finding the global optimum. Our experimental results on two different benchmark SAT problems (namely factoring pseudo-prime numbers and random SAT with phase transitions), using a D-Wave 2000Q quantum processor, demonstrated that RQA finds notably better solutions with fewer samples, compared to the best-known techniques in the realm of quantum annealing.
Building new knowledge-based systems today usually entails constructing new knowledge bases from scratch. It could instead be done by assembling reusable components. System developers would then only need to worry about creating the specialized knowledge and reasoners new to the specific task of their system. This new system would interoperate with existing systems, using them to perform some of its reasoning. In this way, declarative knowledge, problem-solving techniques, and ...
This paper describes current work at the University of Pennsylvania centered around providing intelligent help and advice to users of interactive task oriented systems. This work focuses on three general themes: (1) help systems should be active rather than passive; (2) help systems should contain explicit models of the user, the task and the system utility being used; and (3) the help system should engage in an interactive dialogue with the user in order to identify the information he really needs. An experimental Unix system, WIZARD, has been implemented for the VAX/VMS operating system to explore some of these issues.
Current information extraction systems can do a good job of discovering entities, relations and events in natural language text. The traditional output of such systems is XML, with the ACE Pilot Format (APF) schema as a common target. We are developing a system that will take the output of an information extraction system as APF documents and
We describe the initial results of a project aimed at adapting the Cyc system for use in an agent architecture. Two Cyc systems that share a large common core of knowledge but differ in additional knowledge they possess were able to reason together to solve problems that neither could solve on its own. A rudimentary interface was constructed for Cyc
In this paper, we present a unified framework for discovering and querying hybrid linked data. We describe our approach to developing a natural language query interface for a hybrid knowledge base Wikitology, and present that as a case study for accessing hybrid information sources with structured and unstructured data through natural language queries. We evaluate our system on a publicly available dataset and demonstrate improvements over a baseline system. We describe limitations of our approach and also discuss cases where our system can complement other structured data querying systems by retrieving additional answers not available in structured sources.
This book constitutes the refereed proceedings of the 7th International Semantic Web Conference, ISWC 2008, held in Karlsruhe, Germany, during October 26-30, 2008. The volume contains 43 revised full research papers selected from a total of 261 submissions, of which an additional 3 papers were referred to the semantic Web in-use track; 11 papers out of 26 submissions to the semantic Web in-use track, and 7 papers and 12 posters accepted out of 39 submissions to the doctoral consortium. The topics covered in the research track ...
Complex systems are becoming more pervasive, yet in order for these systems to be used effectively, machine-based assistance is needed. With the advent of powerful personal systems it is anticipated that experts in various disciplines will become increasingly dependent on computational environments provided they are given a means of exploiting system capabilities. Traditional help systems have made use of canned
In the near future, we will see dramatic changes in computing and networking hardware. A large number of devices (e.g., phones, PDAs, even small household appliances) will become computationally enabled. Micro/nano sensors will be widely embedded in most engineered artifacts, from the clothes we wear to the roads we drive on. All of these devices will be (wirelessly) networked using Bluetooth, IEEE 802.15 or IEEE 802.11 for short range connectivity creating pervasive environments. In this age, where a large number of wirelessly networked appliances and devices are becoming commonplace, there is a necessity for providing a standard interface to them that is easily accessible by any user. This paper outlines the design of Centaurus, an infrastructure for presenting services to heterogeneous mobile clients in a physical space via some short range wireless links. The infrastructure is communication medium independent; we have implemented the system over Bluetooth, CDPD and Infrared, three well-known wireless technologies. All the components in our model use a language based on Extensible Markup Language (XML) for communication, giving the system a uniform and easily adaptable interface. Centaurus defines a uniform infrastructure for heterogeneous services, both hardware and software, to be made available to diverse mobile users within a confined space.
In this paper we describe the Unified Cybersecurity Ontology (UCO) that is intended to support information integration and cyber situational awareness in cybersecurity systems. The ontology incorporates and integrates heterogeneous data and knowledge schemas from different cybersecurity systems and most commonly used cybersecurity standards for information sharing and exchange. The UCO ontology has also been mapped to a number of existing cybersecurity ontologies as well as concepts in the Linked Open Data cloud. Similar to DBpedia which serves as the core for general knowledge in Linked Open Data cloud, we envision UCO to serve as the core for cybersecurity domain, which would evolve and grow with the passage of time with additional cybersecurity data sets as they become available. We also present a prototype system and concrete use cases supported by the UCO ontology. To the best of our knowledge, this is the first cybersecurity ontology that has been mapped to general world ontologies to support broader and diverse security use cases. We compare the resulting ontology with previous efforts, discuss its strengths and limitations, and describe potential future work directions.
Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines Latent Semantic Analysis and machine learning augmented with data from several linguistic resources. We used a simple term alignment algorithm to handle longer pieces of text. Additional wrappers and resources were used to handle task specific challenges that include processing Spanish text, comparing text sequences of different lengths, handling informal words and phrases, and matching words with sense definitions. In the *SEM 2013 task on Semantic Textual Similarity, our best performing system ranked first among the 89 submitted runs. In the SemEval-2014 task on Multilingual Semantic Textual Similarity, we ranked a close second in both the English and Spanish subtasks. In the SemEval-2014 task on Cross-Level Semantic Similarity, we ranked first in Sentence-Phrase, Phrase-Word, and Word-Sense subtasks and second in the Paragraph-Sentence subtask.
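To make the idea of scoring longer texts through term alignment concrete, here is a minimal sketch of one common form of it: each term is aligned to its most similar term in the other text and the alignment scores are averaged in both directions. This is an illustration under stated assumptions, not the SemSim algorithm. In particular, `toy_word_sim` is a hypothetical character-overlap stand-in for the LSA-based word similarity component the abstract describes, and the function names and whitespace tokenization are invented for this example.

```python
from typing import Callable, List


def toy_word_sim(a: str, b: str) -> float:
    """Hypothetical stand-in for a real word similarity model:
    Jaccard overlap of the two words' character sets."""
    if a == b:
        return 1.0
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def align_score(
    src: List[str], tgt: List[str], sim: Callable[[str, str], float]
) -> float:
    """Align each source term to its best-matching target term and
    return the mean of those best-match similarities."""
    if not src or not tgt:
        return 0.0
    return sum(max(sim(s, t) for t in tgt) for s in src) / len(src)


def text_similarity(
    text1: str, text2: str, sim: Callable[[str, str], float] = toy_word_sim
) -> float:
    """Average the two directional alignment scores so the result
    does not depend on argument order."""
    a = text1.lower().split()
    b = text2.lower().split()
    return 0.5 * (align_score(a, b, sim) + align_score(b, a, sim))


print(text_similarity("A cat sat on the mat", "The cat is on a mat"))
```

Swapping `toy_word_sim` for a distributional similarity function is the only change needed to try the same alignment scheme with a real word model.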
n about the exemplars associated with each concept in that ontology. Then, the similarity score################# from concept### in ontology B to concept in ontology A can be obtained by comparing the exemplars of against the model of ontology A. In essence, # # ### # ### # # measures similarity between exemplars associated with and those with . Bayesian subsumption. may (partially) match more than one concept in A, each with a different similarity score. Also since a non-leaf node is a superclass of its children, its exemplars should include both those associated with it and those with all of its descendants in the hierarchy. Therefore, non-leaf nodes need to synthesize scores from their descendants before the final mapping can be selected. This is accomplished by a Bayesian extension of the subsumption operation of description logics. In this approach, we assume that all leaves in a hierarchy form a mutually exclusive and exhaustive set, and take the score######
The Semantic Web is a vision to simplify and improve knowledge reuse on the Web. It is all set to alter the way humans benefit from the web from active interaction to somewhat passive utilization through the proliferation of software agents and in particular personal assistants that can better function and thrive on the Semantic Web than the conventional web. Agents can parse, understand and reason about information available on Semantic Web pages in an attempt to use it to meet users' needs. Such personal assistants will be driven by rules, axioms and the internal model or profile that the agents have inside them for the user. An intrinsic and important pre-requisite for a personal assistant or rather any agent is to manipulate information available on the Semantic Web in the form of ontologies, axioms, and rules written in various semantic markup languages. In this paper, a model architecture for such a personal assistant, which we call Sidekick, that deals with real-world se...
We compared image features with a distance metric and support vector machine to identify the critical view of a laparoscopic cholecystectomy. Our accuracy was up to 91%. We are currently experimenting with particle analysis, edge analysis, and feature clustering to create a more robust image classifier.
A discussion on the topic of ontologies in agent-based systems had Drs. Sidney Bailin, Gary Berg-Cross, and Tim Finin as panel members. Bailin, who chaired the session, led off with a series of questions for consideration: 1. Can we properly speak of an ontology (an artifact, as opposed to just ontology as a field of study)? Can we properly speak
A network of infostations offering high bandwidth islands of data connectivity has often been suggested as a viable alternative to cellular WAN for providing network connectivity for mobile devices. Current data management models proposed for such networks treat these infostations mainly as passive entities that offer devices in range, access to needed information. In this paper,
Introduction This is a brief, informal and, because of Christmas holidays, regrettably partial survey of current research in computational linguistics at the University of Pennsylvania. Any inaccuracies can be blamed on the departmental egg nog. 2. Extending the Range of Interactive Behavior Perhaps the most activity here in computational linguistics is aimed at extending the kinds of behavior that can be supported in interactions with data base and expert systems. Elsewhere we have argued that such systems have to do more than retrieve and present appropriate facts and conclusions if they are to satisfy their users' real needs (Pollack, Hirschberg, and Webber 1982). Out of this conviction have already come systems able to recognize and respond to two types of presupposition failures (Kaplan 1982, Mays 1980) and a system able to describe in Natural Language what it knows about various entities (McKeown 1982). The following snippets indicate our current efforts in this area. 2.1. Re
We consider the problem of how agents should be named and what kind of software infrastructure is necessary in order to locate an agent given only its name. We assume an agent environment which (1) is dynamic with agents being created and destroyed frequently; (2) undergoes re-organizations with agent groups and sub-groups forming and disbanding; and (3) supports agent communication
