Lens: A Faceted Browser for Research Networking Platforms

Richard Whaling

Ian Foster

Tanu Malik

visibility

…

description

8 pages

link

1 file

Research networking platforms, such as VIVO and Profiles Networking provide an information infrastructure for scholarship, representing information about research and researchers—their scholarly works, research interests, and organizational relationships. These platforms are open information infrastructures for scholarship, consisting of linked open data and open- source software tools for managing and visualizing scholarly information. Being RDF based, faceted browsing is a natural technique for navigating such data, partitioning the scholarly information space into orthogonal conceptual dimensions. However, this technique has so far been explored through limited queries in research networking platforms--not allowing for instance full graph based navigation on RDF data. In this paper we present Lens a client-side user interface for faceted navigation of scholarly RDF data. Lens is based on Exhibit, which is a lightweight structured data-publishing framework, but extends Exhibit for expressive SPARQL- like queries and scales it up for navigating amounts of RDF data. Lens consumes data in VIVO ontology, the de facto schema for researcher networking systems. We show how Lens provides better usability over current faceted browsers for research networking platforms.

Lens: A Faceted Browser for Research Networking Platforms Richard Whaling, Tanu Malik, Ian Foster Computation Institute, University of Chicago and Argonne National Laboratory Chicago, USA {rwhaling, tanum1, foster} @ci.uchicago.edu Abstract— Research networking platforms, such as VIVO and Profiles Networking provide an information infrastructure for scholarship, representing information about research and researchers—their scholarly works, research interests, and organizational relationships. These platforms are open information infrastructures for scholarship, consisting of linked open data and opensource software tools for managing and visualizing scholarly information. Being RDF based, faceted browsing is a natural technique for navigating such data, partitioning the scholarly information space into orthogonal conceptual dimensions. However, this technique has so far been explored through limited queries in research networking platforms--not allowing for instance full graph based navigation on RDF data. In this paper we present Lens a client-side user interface for faceted navigation of scholarly RDF data. Lens is based on Exhibit, which is a lightweight structured data-publishing framework, but extends Exhibit for expressive SPARQLlike queries and scales it up for navigating amounts of RDF data. Lens consumes data in VIVO ontology, the de facto schema for researcher networking systems. We show how Lens provides better usability over current faceted browsers for research networking platforms. Keywords—researcher profiling systems, RDF databases; faceted search; ontologies; user interfaces (key words) I. INTRODUCTION Scientific activity for several decades was viewed as a purely intellectual exercise. Over the last decade, it is percieved as a complex adaptive system that requires various inputs elements, such as monetary resources and human resources, and output elements that focus on knowledge creation and its economic, social and human impact [2]. The understanding of this complex adaptive system is essential for policy makers, funding agencies, and researchers---Policy makers want to empirically quantify the impact of science dollars on job creation; funding agencies want to comprehend the science programs to support, and researchers want to identify and engage experts whose scholarly work is of value to one’s own. This work is supported by NSF Grant SBE-1160899 and seed funding through the University of Chicago. 1 : Contact Author Research networking platforms such as VIVO [2] and Profiles Networking Software [10] and others [19] have emerged in response to this need of understanding complex research networks. The platforms provide an information infrastructure for scholarship, represent information about research and researchers, inlcuding but not limited to their scholarly works, research interests, and organizational relationships. To enable such diverse interests and research works, often obtained from heterogenous data sources, most platforms adopt Semantic Web principles to model the data and use tools for reasoning about researcher relationships and networks. To provide connections between people, organizations, works, and funding over time and place, the platforms provide a myriad of visualization interfaces. In our evaluation of profiling platforms, we have identified three kinds of interfaces: (1) a simple keyword search, (2) explicit SPARQL queries, and (3) graph visualisation. However, none of these interfaces provide the power of faceted search that makes available a number of co-existing dimensions that can be simultaeously browsed by the user. In keyword search, a user has to guess the right ‘search’ terms; SPARQL queries can lead to inconclusive searches; and graph visualization does not provide an intuitive feel for the data as a whole just by looking at the available or most relevant facets. This intuitive feel is important for policy makers and researchers to get an overall understanding of research investment of say a department, an institute or university, and explore areas of deep investment. Several faceted browsers [5,6,12,13,14] have been investigated in the recent past, some of them are RDF-based. Amongst the RDF-based facet browsers, the degree of data exploration is based on the extent of graph navigation made possible by the user interface. Simple RDF-based browsers assume predicates as facets and determine elements to show based on basic facet selection and/or conjunctions or disjunctions of the basic selections. More advanced RDF browsers allow for exploring relationships between subjects based on properties of nodes that they are connected to. For instance, “determine all subjects who know somebody, who in turn knows somebody named ‘Whaling’. However, exploring a large number of pathways or facets of facets (indirect facet chains of arbitrary length [17]) in an efficient manner is still a challenge for RDF-based facted browsers especially since paths may be of arbitrary length and relationships can be defined on arbitrary predicates. Full graph navigation can, however, aid in answering complex questions. Profiling systems typically use a standard ontology (VIVO ontology [10]) for describing RDF data, thus making nodes and properties known apriori. This can help us in optimizing for efficient graph navigation. However, none of the current opensource RDF-based faceted browsers index data based on an underlying ontology thus preventing the user from visualizing the end-result with natural categorizations of the content. In this paper, we present Lens, an open-source faceted browser for research profiling networks that uses RDF data based on the VIVO ontology. Lens can be used to answer a variety of questions about a research organizations, such as determining “who published in what area?”, “which researchers received most funding from a federal program?”, or “show all the faculty in areas of informatics”. Internally, Lens is based on Exhibit, which is a lightweight data publishing plaform and supports RDF data. Our choice of Exhibit is based on two current practices: First research profile servers are often based in IT departments and for security reasons do not support SPARQL end points, which make it hard to support a faceted browser based on server-side computation. Secondly, Exhbit offers a very intuitive user interface that can be used for basic RDF navigation. Exhbit, however, does not support graph navigation or provides support for ontologies. We have extended Exhibit in the following ways: • We improve expressivity of faceted queries in Exhibit to support graph navigation, making it possible to support join queries, inverse join selection or a combination thereof. • We improve the ability of Exhibit to accept ontology-based data and use generic methods to improve the efficiency of graph traversal. In particular, facet metrics, such as predicate and join frequency are used to optimize traversal. • We make Exhibit scalable for large amounts of data by bypassing nodes and predicates in RDF data that are not used in faceted search. Our experiments are conducted on publicly available data about researchers at the University of Chicago. This data currently includes their directory, publication, grant and patent information. The experiments show how computing statistics on the data optimizes graph navigation and these statistics further help us in improving loading and querying time, thus improving the scalability of the data. Lens is being built as a prototype for the university profiling system (http://profiles.uchicago.edu) to help researchers find collaborators and find and explore departmental and institutional strengths. The remainder of the paper is organized as follows. Section 2 presents related work in the areas of researcher profiling platforms and faceted browsing. Section 3 presents Lens, our faceted browser that supports graph navigation but is optimized for graph traversal. We show a prototype that can help a user explore research strengths in an organization with minimal effort. In Section 4, we experimentally evaluate the performance of Lens. We conclude in Section 5. II. RELATED WORK A. Researcher Profiling System Interfaces Exploring relationships is fundamental to research profiling platforms. Towards this, visualization interfaces such as temporal trends, geospatial maps, expertise profiles, and scholarly networks have been explored. These visualizations are useful for aggregating vast amounts of information, but not for classifying information as in faceted search. VIVO supports a faceted search interface that allows a user to classify research products and affiliations of a given person using facets such as organizations, papers, grants, courses, and events. Their implementation constructs a syntactic Lucene text search on available research products and associates one or more products with a given facet. For e.g., if the word “NSF” appears in the same document as the word “Ian Foster” it is associated with the “Grant” facet, even if the document mentions that Ian Foster participated in a NSF workshop, i.e., the word “NSF” appeared in a different context. Thus they do not fully represent relationships of the person with the facet values. B. Faceted Browsers Several faceted browsers have been proposed based on the faceted browsing technique first proposed by Hearst et. al [11]. In the classic faceted model described by Hearst et. al., the items to be searched constitute a single table in a relational database, and every user interaction can be mapped to adding or removing a condition from the WHERE clause of a SQL query. This conjunctive/disjunctive model is usually sufficient for homogeneous data sets, such as online merchandise, or a library catalog in which the tacit assumption is that only one set of items has facets viz. the books or products. However, semi-structured data containing inter-related items of heterogeneous types presents challenges in which facets themselves may be composed of facets. Several RDF-based faceted browsers [5,6,12,13,14] have also been proposed varying on aspects such as browsing, filtering or query expressivity. Oren et. al developed a faceted browser that explores graph-based relationships by expressing joins over source and target RDF nodes [17]. Their implementation however, is based on non-standard ActiveRDF [16], an objectoriented API for arbitrary RDF data. More recently, W3C standard SPARQL-based faceted browsers have been proposed [15][20]. However, the expressivity of join realized in a faceted UI is limited [15] or unknown [20]. None of these RDF browsers are open-source for us to rapidly determine if they can work with scholarly data. Even if they were available, it is not clear that they can build facets by reading typed information from ontology-based RDF data. The ontology aspect is mostly considered outside the scope of the RDF browsers [17]. Exhibit, from MIT, is an open-source, lightweight client-side data-publishing framework that creates Web pages with rich, dynamic visualization of structured data, and with faceted browsing [13]. Exhibit supports RDF data, and being opensource is great for experimentation. It however has limited support for exploring graph-based relationships. In our work, we have used Exhibit but made our joins as expressive as proposed in [5], while exploiting the native VIVO ontology for RDF processing. III. LENS: A FACETED BROWSER FOR RESEARCHER NETWORKING PLATFORMS The Lens faceted browser presents categories of the scholarly domain such as title, organization, publication journal, publication year, and grants program to a user. The browser assists the user in filtering information about the domain by selecting, combining, or joining categories and seeing the resulting information. By interacting with these categories users can query and manipulate scholarly information in an intuitive manner without having to construct logically sophisticated join queries. Thus users can relate topics, publications or grants to people, or vice versa, without a priori knowing the people in a particular area. In this section, we describe how queries are internally constructed, and relationships inferred based on user interactions. We first describe two background topics: (i) the nature of our data resulting from the underlying scholarly ontology, and (ii) basic faceted browsing as available through the open-source Exhibit system. We then describe graph operations on RDF data and show how they are efficiently implemented in Lens, using knowledge of the data. Finally, we show interaction with our prototype system. A. Background Scholarly Data: RDF schema describes classes or resources and their relationships. It defines the type of a resource and all relationship edges have a domain and range constrained to one or more type of resources. The VIVO Ontology [3], which includes classes and properties from external ontologies, such as BIBO [21], SKOS [22], FOAF [8], Event [18], GeoPolitical [1], and Eagle-I [23], is further customized to suit the incoming data from a variety of resources. For instance, we have expanded our own experimental data based on the VIVO ontology with additional data from a variety of sources, including NSF/NIH feeds and web scraping from our own institution. As a result, our data is a “mash-up” of a variety of semantic spaces, without and verifiable formal constraints. So we must empirically determine the domains and ranges of each class in our actual data. This type checking for domains and ranges is necessary for efficient graph processing. Exhibit [13] is a popular interface, which provides a faceted filtering view on graph-based data. Exhibit’s interface presents different facets for a given set of resources. To ‘exhibit’, we need to create the dataset that could be consumed by Exhibit, and the HTML page to present these data. Exhibit supports RDF data by harvesting RDF metadata locally and then translating into JSON format [4] using Babel web service [7]. The simplest realization of a faceted browser in Exhibit is by using resources as RDF subjects, facets as RDF predicates and restriction-values as RDF objects. The key user interaction is faceted filtering. For example, a list of persons might be filtered by selecting ‘Director’ in an ‘Organizational Position’ facet. This action will filter down the list of persons who hold the position of director in an organization. However, if another facet, such as ‘Journal’ is selected along with ‘Organizational Position’, an Exhibit based faceted browser still returns only the list of persons that satisfy both the criteria. In particular, it does not return the list of publications, which satisfied the ‘Journal’ criteria and are related to each of the Director’s obtained through earlier selection criteria. This loss of relationship information in Exhibit is due to its limited declarative language, which only allows for a single collection of items, in this case “persons”, to be rendered. To render publications belonging to persons, Exhibit’s query language needs to support graph navigation on RDF data. The minimalist graph navigation that is possible in Exhibit is if the resource “Publication” is a priori joined to the resource “Person”. However, this is manual work and the underlying RDF schema rarely shows a direct join relationship--several intermediate nodes belonging to different semantic spaces such as BIBO, Event, may connect the Person and Publication resources, thus making it difficult to describe in Exhibit as to how to traverse the graph a priori. B. Graph Operations for Faceted Browsing In this subsection, we explain how graph browsing can be achieved for arbitrary RDF data obtained from scholarly systems. In the next subsection, we demonstrate how this browsing can be efficiently implemented in Exhibit to give a usable, relationship-preserving faceted browser. Given a graph G= (V,E), and source S and target T node, graph navigation implies determining the path that leads from S to T. In facet-based graph navigation, S represents the root node or primary information resource about which information is being sought. For instance, in scholarly domains, nodes of type “Person” form the root nodes---all information desired is in the context of root nodes. T represents the facet nodes, which the user wishes to select and constrain and determine what relevant root nodes are linked to the selected T nodes. Given the RDF schema graph shown in Figure 1(a) and its instantiation in Figure 1(b), central nodes of type “Person” represents the root nodes and other nodes represent possible facets and their facet values. There are two challenges in performing navigation on such graphs: (1) The RDF schema is not known a priori from the RDF data but must be inferred; and (2) It is not evident immediately which target facet nodes are more useful and more important than others in navigating the information space. We address the first challenge by performing type checking on the RDF data to determine some structure of the underlying schema. We perform analytics on nodes to determine which nodes are more useful than others. We explain these operations and then describe our algorithms for graph navigation. other hand, we found that the predicates that are informative for the user are those in which relationship edges are strongly correlated with a specific type of node. We can identify more Type Checking helps to determine if there is a path between source and facet nodes using the domain and range constraints present in RDF schema. For example, if the user selects a Figure 1(b): An instance of the RDF schema useful facets, then, by computing predicate frequency not on individual types, but for each distinct relationship type. Figure 1(a): RDF schema showing types and their relationships value V of a facet node with the name "Publication-Journal" then we know the type of this selection. However, the relationship of the publication item to the primary records of type Person is still unknown. A naive implementation would find “any Person reachable by any path in the graph from a Publication with Journal value V”. This would be inefficient and may probably return the entire graph for most queries. Instead, we can use properties on source and target nodes to determine the most efficient path. For any selection, we can infer the type and the set of objects resulting from the selection. We can then search for a reliable path, i.e., a sequence of non-recurring edge labels that always contain exactly one item of each type in the set (this is only feasible to compute for edge labels in which both source and target nodes are typed). Analytics help us determine relevant target nodes. One of the most predictable analytic is frequency of a potential facet node. A suitable facet occurs frequently inside the collection: the more distinct resources covered by the facet, the more useful it is in dividing the information space [6]. If a facet occurs infrequently, selecting a restriction value for that predicate would result in too many items being returned or affect a small subset of the resources. In [17], Oren, et. al. attempt to compute a metric for facet frequency, to aid in selecting facets that divide the information space in the most useful form. However, we have observed that it is simply not the facet frequency that is crucial in determining the relevance of a facet, but how many of those facets are reliably linked to the root node. In other words, it is the aggregate join frequency of the path from the facet to the root, than simply the frequency of the facet. Our analysis shows that the most frequently occurring predicates are either unique descriptors, which are unsuitable for faceting, or internal data that is uninformative for the general user, such as types and numeric identifiers. On the Having determined the underlying schema to an extent and the target facet nodes, and reliable paths, we are ready to describe the graph navigation algorithms. We define a facet F as <name, <path, property>> tuple, where name is the name of the facet and a resource class in the RDF schema, and <path, property> tuple represents the path from the root node to the set of property nodes whose type is literal. The literal values of property nodes are available for user selection in the faceted UI. An empty path implies the property on the root node directly. Thus the facet “Department” in the graph of Figure 1(a) can be expressed as: <“Department”,<personInPosition, positionInDepartment>,label> Given this definition of a facet we define three kinds of operations on facets: select, exists and join, which are also described in abstract in [17]. Select operation selects nodes that have a direct restriction value. The basic selection allows for example in Figure 1(b) to select CI. Exists operation selects all nodes when a property is true or false for those nodes. However, its exact value is not important. Join operation selects resources based on the properties of the nodes that they are connected to. A typical join will select source resources and not target resources. Target resources are obtained by performing an inverse-join. Path-Join By recursively composing join and select operations, we can traverse the path, and either lists all extant values, or applies a filter corresponding to the user’s selection in the facet, and return the root node. path-join is a recursive function that traverses a path, and a set of target nodes, and returns a set of matching source nodes; i.e., the source nodes which can reach any target node via the path. The target nodes are the list of values selected by the user in the facet’s UI. The path is the path between source and target, obtained a priori through type checking. //P[n]: path of length n; T: set of target nodes i=1 path-join(P[n], T) li = P[i] if (i == n) then return T else join(li, path-join(P[i+1], T) Inverse-Path Join Because our join operation returns the source node, and not the target. To reconstruct the graph fragments corresponding to the structure of the users query, we must construct one or more inverse joins, beginning with the root node and terminating with the property selection, for each constrained type, i.e., each unique path in F. We also suppress return of items of types with no constraints at all, so as to eliminate unnecessary nodes, but this is an optional design detail. //P[n]: path of length n; S: set of source nodes i=1 inverse-path-join(P[n], S) li = P[i] if (i == n) then return S else join(li, inverse-path-join(P[i+1], S) C. Implementation We now describe how basic UI interactions such as facet values, facet selection and facet search are enabled through path joins and inverse path joins. Facet Values builds a list of potential facet selections that are reachable from the root-type. For this, we traverse the path over the entire collection as follows: facet-values(path, property) = path-join(path, exists(property)) Facet Selection If the user makes a selection, we can then use the same path traversal function to apply a suitable filter. An inverse-selection can similarly be defined: facet-selection(path, property, selection) = {v ∈ V| ∃ s ∈ selection: v ∈ path-join(path, exists(select(property, s)) } inverse-selection(item, path, property, selection) = {v ∈ V | ∃ s ∈ selection: v ∈ intersect(inverse-path-join(path, item), select(property, s)) } Facet Search To execute the search in its entirety, we intersect the results of each facet, which has a selection (a facet without selections is understood to be unconstrained). If no facets whatsoever are constrained, then the search returns all elements of the root type. Here, n is the number of facet values. facet-search(F, root-type) = if (F = []) select(“rdf:type”, root-type) else intersect( facet-selection(path(F[n]), property(F[n]), select(F[n])), facet-search( (F[1] ... F[n - 1]), root-type) ) Inverse join can only be implemented after faceted search. This is done through an inverse query function that performs inverse-selection and inverse-path-join. inverse-query(F, item, unique-path) = {v ∈ V | ∀ f ∈ F: v ∈ inverse-selection(source, path(f), property(f), selection(f)) iff |selection(f)| > 0 and path(f) = unique-path} Query Implementation We initiated the project by using Exhibit’s existing query functionality and UI widgets. Each of the facet selection, facet values and facet search were implemented via a pair of Exhibit callback API’s in JavaScript. The onItemsChanged event fires after a user has made a facet selection and the current search results change, and the itemOnShow callback fires anytime a single result item is displayed to the user. By catching both of these, and interrogating the defined path for each facet, we can reconstruct the information necessary for our inverse-pathjoin() function. The implementation consists of two major components: approximately 200 lines of Ruby to perform graph analytics and optimization, and about 400 lines of JavaScript to enhance Exhibit’s functionality to display inverse path queries. Because Exhibit actually stores the unique label for each outgoing link in an object, we evaluate join’s in much the same manner as a relational database, maintaining a list of results as we evaluate each edge of the path. Unlike a relational database, property lookup in JavaScript is a constant-time operation in Google Chrome. We can then generate a representation of the result set in any form; our implementation provides HTML and SVG output. Our prototype is available from the following URL http://profiles.uchicago.edu/facetedbrowser. In order to implement these interactions efficiently, we had to further make modifications to the RDF semantics and make some optimizations on internal nodes. Semantics We found two major mismatches between RDF’s semantic model and Exhibit’s data model. RDF requires the rdf:about URI of an item to be globally unique and identifying, while the rdfs:label property is merely descriptive. However, Exhibit merges items with identical values of rdfs:label, whether they have unique URI’s or not. Thus, for several item types, such as Position, we experienced label collisions--which resulted, in this case, in all identical titles, such as “Professor”, to be merged, even though they belonged to different Departments. We corrected this behavior by generating globally unique numeric labels for each node. Figure 2(a): Lens before graph navigation. The selected facets of ‘Title:Professor’ and ‘Journal:Science’ retrieves only the list of persons that qualify the facet characteristics. It does not retrieve the associated publications or allow the user to classify publications further based on available facets such as “Publications:Year” or “Grants:Program”. Figure 2(b): Lens after graph navigation shows the list of persons with the list of publications that match the selected facets Optimization Beyond purely corrective and analytical processing, we also found several ways to optimize our graph for performance. Since it is very difficult to mechanically distinguish descriptive properties from genuinely uninformative nodes; however, our derived graph of type relationships allowed us to identify a large variety of node types that were never used in a query or a result description: these could be subdivided into two subsets, (1) those nodes that are present for ontological coherence but otherwise contain no data, such as Class and DatatypeProperty, which we simply omitted, and (2) nodes that only occurred in the intermediate steps of a multi-step path, such as Authorship, and could safely be consolidated into a single edge. These optimizations reduced the total number of nodes in our graph by approximately 50%. D. LENS Faceted Browser Figure 2(a) shows the default Lens faceted browser for scholarly data. The browser returns only the list of persons, which is not meaningful for a user even though both person and journal facet is selected. Figure 2(b) shows the faceted browser with graph navigation, and shows how by selecting person details (Professor) and publication (in “Science” journal) details, one can render the relationships of publications and positions of persons on the same UI. In addition users can browse the dataset by constraining one or several of these facets. IV. EXPERIMENTS Profiles Research Networking Tool: Our experiments are conducted on publicly available data about researchers at the University of Chicago (UoC). This data currently includes their directory, publication, grant and patent information. The data contains approximately 6000 nodes, containing 121 Fellows, Senior Fellows, and Faculty of a research institute at UoC. The data is resident in the Profiles Catalyst system, which provides an RDF export function. Even though our dataset is moderate in size it is sufficient for performance characterization. Experimental setup: We implemented this design in Simile Exhibit, client-side faceted browsing framework written in Javascript, due to the fact that it implemented the join, selection, and existential functions we need, while supporting a wide variety of data formats, including RDF. We used Exhibit’s implementations of the facet-search and facet-values functions without modification; the inverse query mechanism was implemented as a Javascript function and injected via Exhibit’s runtime callback API. We performed all testing on a 8-core 3GHx Intel Xeon Mac Pro with 16 GB of RAM. We used version 26.0.1410.65 of Google Chrome for all tests due to its strong JavaScript VM as well as the availability of performance profiling tools. We used a modified version of Exhibit 2.0 -- Exhibit 3.0 was deemed insufficiently stable for these modifications at the time the work began. Exhibit uses property lookup on a single large database object to perform joins, so we expected Chrome’s constant-time property lookup to speed up performance substantially. duplicate types--since both the source and the target of a single edge can have more than one type. Table 2: Type Relationship Analytics C. Loading Figure 3 demonstrates the start-up performance of Lens, not including graph analytics and optimization. Exhibit has RDF import capability built-in; however, it actually performs the conversion by a remote HTTP request to the Simile Babel service, with round-trip latency of a few hundred milliseconds--since this must be performed once per RDF file, it slowed our loading time down by a factor of ten. However, by consolidating the RDF files into a single JSON file in advance, we were able to avoid performing the conversion at load time altogether, taking our total load time from about 20 seconds to 2. A. Data Types Table 1 contains the total frequency of all the RDF types in our data set. Because of Profiles’s subclassing, each node may have one or more types. We've omitted the standard RDF Ontology, DataType, and ObjectProperty classes-although vital for semantic clarity, they are never accessed during query execution. Table 1: Frequency of Types Figure 3: Load Times D. Query Performance B. Edge-Type Correlations Table 2 shows a snapshot of our measurements of the type relationships of each edge label in our graph--for clarity, we’ve omitted literal datatypes and the edges that point at them. The major complicating factor in this table is the We inserted timestamps at three critical places in our code -(1) right before Lens begins processing the query, (2) at the beginning of the inverse-query function, called immediately after the main search routine returns, (3) at the end of the last rendering method after the inverse-query returns, when processing is completed--this allowed us to separate the execution time of Exhibit's Query mechanism from our Inverse-Query function.. We devised two sample queries to compare performance on: a) selective query, a sparse query with very few results, and b) scan query, a very large set of publications. We initially set out to compare performance on data sets of different size--however, our preliminary results showed a minuscule correlation between dataset size and performance--query time was effectively constant; we believe that the inverse-query was slowed somewhat because of the increased number of nodes it traversed while computing the inverse-joins. However, close examination of the performance profiler indicated that HTML rendering and garbage collection far outweighed the impact of doubled collection size. Based on these results, we devised an additional set of measurements to study the correlation of performance to query complexity. Rather than measuring the total number of nodes in the database, in Figure 4 we inserted a counter in the join function to track the number of nodes traversed throughout the inverse-query method, then tested it on 14 queries of varying sizes. As the figure demonstrates, the relative performance of the query and inverse-query functions was somewhat unpredictable; however, a trend is clearly visible in the overall times. Close examination of the logs showed that Chrome's garbage collector was firing at unpredictable times, and taking up to 70 ms to retrieve up to 5MB of discarded object per pass--most likely a consequence of the large arrays generated and discarded at each step in our inverse-join function. REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] Figure 4: Query Performance V. [20] CONCLUSION Universities and organizations are increasingly adopting researcher-networking platforms to inform policy decisions. Visualization is key to explore and absorb dense information. This paper demonstrated an open source faceted browser for researcher profiling systems that enables deep exploration and classification of relationships of researchers with their research products. Developed with data of our own institution, the browser is open source and applicable to all scholarly data. We also plan to demonstrate our system in the upcoming VIVO conference for research profiling systems. [21] [22] [23] [24] Bonura Jr, C. J., Situating political culture within the construction of geopolitical ontologies. Rethinking Geopolitics, 86, 1998. Conlon, M., Scholarly Networking Needs and Desires, In VIVO: A Semantic Approach to Scholarly Networking and Discovery: Synthesis Lectures on the Semantic Web: Theory and Technology, Morgan Claypool, 2:1, 2012. Corson-Rikert, J., Mitchell, S., et. al., The VIVO Ontology, In VIVO: A Semantic Approach to Scholarly Networking and Discovery: Synthesis Lectures on the Semantic Web: Theory and Technology, Morgan Claypool, 2:1, 2012. Crockford, D. The application/json media type for javascript object notation (JSON), 2006. Dadzie, A. S., & Rowe, M., Approaches to visualising linked data: A survey. Semantic Web, 2(2), 89-124, 2011. W. Dakka, P. Ipeirotis, and K. Wood. Automatic construction of multifaceted browsing interfaces. In CIKM. 2005. Epperly, T. G., Kumfert, G., Dahlgren, T., Ebner, D., Leek, J., Prantl, A., & Kohn, S., High-performance language interoperability for scientific computing through Babel. International Journal of High Performance Computing Applications, 26(3), 260-274, 2012. FOAF Vocabulary Specification, http://xmlns.com/foaf/spec/ Gewin, V., Collaboration: Social networking seeks critical mass. Nature, 468(7326), 993-994, 2010. Harvard Catalyst Profiles. http://profiles.catalyst.harvard.edu/. Hearst, M.. Clustering versus faceted categories for information exploration. Comm. of the ACM, 46(4), 2006. Heim, P., Ertl, T., & Ziegler, J., Facet graphs: Complex semantic querying made easy. In The Semantic Web: Research and Applications, 288-302. Springer Berlin Heidelberg, 2010. Huynh, D.F., Karger, D.R., Miller, R.C., Exhibit: Lightweight structured data publishing. In: Proc. of the 16th International Conference on World Wide Web, Banff, Canada, 737–746, 2007. Kobilarov, G., & Dickinson, I. Humboldt: Exploring linked data. context, 6, 7, 2008. Maali, F., & Loutas, N., SPARQL 1.1 AND RDF Faceted Browsing, 2012. E. Oren and R. Delbru. ActiveRDF: Object-oriented RDF in Ruby. In Scripting for Semantic Web (ESWC), 2006. Oren, E., Delbru, R., & Decker, S.. Extending faceted navigation for RDF data. In The Semantic Web-ISWC 2006, 559-572, Springer Berlin Heidelberg, 2006. Raimond, Y., & Abdallah, S., The event ontology. Technical Report, 2007. http://motools. sourceforge. net/event, 2007. Research Profiling Systems. http://en.wikipedia.org/wiki/Comparison_of_Research_Networking_Too ls_and_Research_Profiling_Systems. Rozell, E., Fox, P., Zheng, J., & Hendler, J. S2S Architecture and Faceted Browsing Applications. In Proceedings of the 21st international conference companion on World Wide Web, 413-416, 2012. Shotton, D. Cito: The citation typing ontology. Journal of Biomedical Semantics, 1(Suppl 1), S6, 2010. SKOS Core Guide, http://www.w3.org/TR/2005/WD-swbp-skos-coreguide-20051102. Tenenbaum, J. D., et. al.. The Biomedical Resource Ontology (BRO) to enable resource discovery in clinical and translational research. Journal of biomedical informatics, 44(1), 137-145, 2011. Weber, G. M., Barnett, W., Conlon, M., Eichmann, D., Kibbe, W., FalkKrzesinski, H. and Kahlon, M. Direct2Experts: A pilot national network to demonstrate interoperability among research-networking platforms. Journal of the American Medical Informatics Association, 18 (Suppl 1), 157-160, 2011.

Log In

Lens: A Faceted Browser for Research Networking Platforms

Sign up for access to the world's latest research

Related papers

Related topics