Xiufeng Liu
  • Building 426, room 125, 2800 Kgs. Lyngby, Denmark
  • I am a researcher in the Department of Management Engineering at Technical University of Denmark (DTU). I am a member...
Recently, organizations and governments have shifted their focus towards the digitization of academic and technical documents, adding a new facet to the concept of digital libraries. The volume, variety, and velocity of the generated data satisfy the big data definition, and this scholarly reserve is therefore popularly referred to as big scholarly data. To facilitate data analytics for big scholarly data, suitable architectures and services need to be developed. The evolving nature of research problems has made them essentially interdisciplinary, creating a growing demand for scholarly applications such as collaborator discovery, expert finding, and research recommendation systems, among others. This paper investigates current trends, identifies the existing challenges in developing a big scholarly data platform with a specific focus on directions for future research, and maps them to the different phases of the big data lifecycle.
Smart city data come from heterogeneous sources, including various types of Internet of Things devices such as traffic, weather, pollution, noise, and portable devices. They are characterized by diverse quality issues and different types of sensitive information, which makes data processing and publishing challenging. In this paper, we propose a framework to streamline smart city data management, covering data collection, cleansing, anonymization, and publishing. The paper classifies smart city data into sensitive, quasi-sensitive, and open/public levels and then suggests different strategies to process and publish the data within these categories. The paper evaluates the framework using a real-world smart city data set, and the results verify its effectiveness and efficiency. The framework can serve as a generic solution for managing smart city data.
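The classification-driven processing described above could, in its simplest form, look like the sketch below. The three sensitivity levels come from the paper, but the field names, the anonymization rules, and the publishing targets are illustrative assumptions, not the framework's actual implementation.

```python
# Sketch: route smart city records by sensitivity level before publishing.
# The three levels (sensitive, quasi-sensitive, open/public) follow the paper;
# field names, masking rules, and targets are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Record:
    source: str          # e.g. "traffic", "noise", "weather"
    device_id: str
    location: str        # "lat,lon" string in this toy example
    value: float
    sensitivity: str     # "sensitive" | "quasi-sensitive" | "open"

def anonymize(rec: Record) -> Record:
    """Drop or coarsen identifying attributes for non-public data."""
    if rec.sensitivity == "sensitive":
        rec.device_id = "REDACTED"
        rec.location = rec.location.split(",")[0]      # keep only a coarse area
    elif rec.sensitivity == "quasi-sensitive":
        rec.device_id = f"pseudo-{hash(rec.device_id) & 0xFFFF:04x}"
    return rec

def publish(records):
    """Cleanse, anonymize, and route records to a per-level output."""
    targets = {"sensitive": "internal_store",
               "quasi-sensitive": "restricted_portal",
               "open": "open_data_portal"}
    for rec in records:
        if rec.value is None:                          # trivial cleansing step
            continue
        rec = anonymize(rec)
        print(f"-> {targets[rec.sensitivity]}: {rec}")

publish([Record("traffic", "cam-17", "55.78,12.52", 42.0, "quasi-sensitive"),
         Record("noise", "mic-03", "55.79,12.51", 61.5, "open")])
```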
With the prevalence of cloud computing and the Internet of Things (IoT), smart meters have become one of the main components of smart city strategies. Smart meters generate large amounts of fine-grained data that is used to provide useful information to consumers and utility companies for decision-making. Nowadays, smart meter analytics systems consist of analytical algorithms that process massive amounts of data. These analytics algorithms require ample amounts of realistic data for testing and verification purposes. However, it is usually difficult to obtain adequate amounts of realistic data, mainly due to privacy issues. This paper proposes a smart meter data generator that can generate realistic energy consumption data by using a small real-world data set as a seed. The generator uses a prediction-based method that relies on historical energy consumption patterns combined with Gaussian white noise. In this paper, we comprehensively evaluate the efficiency and effectiveness of the proposed method on a real-world energy data set.
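As a rough illustration of the prediction-plus-noise idea, the sketch below extends a small seed of hourly readings by predicting from the seed's per-hour consumption pattern and adding Gaussian white noise. The hour-of-day averaging used as the "prediction" and the noise level are assumptions for illustration; the paper's actual model may differ.

```python
# Sketch: generate synthetic hourly consumption from a small real seed by
# predicting from the seed's hour-of-day pattern and adding Gaussian noise.
import numpy as np

def generate(seed_readings, n_hours, noise_std=0.05, rng=None):
    """seed_readings: 1-D array of real hourly consumption used as the seed."""
    rng = rng or np.random.default_rng(42)
    seed = np.asarray(seed_readings, dtype=float)
    hours = np.arange(len(seed)) % 24
    # Historical pattern: mean consumption for each hour of the day in the seed.
    pattern = np.array([seed[hours == h].mean() for h in range(24)])
    out = []
    for t in range(n_hours):
        predicted = pattern[t % 24]                                  # pattern-based prediction
        noisy = predicted + rng.normal(0.0, noise_std * predicted)   # Gaussian white noise
        out.append(max(noisy, 0.0))                                  # consumption is non-negative
    return np.array(out)

seed = np.abs(np.sin(np.linspace(0, 8 * np.pi, 96))) + 0.2   # 4 days of toy seed data
synthetic = generate(seed, n_hours=24 * 30)                  # one month of synthetic data
print(synthetic[:5])
```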
The eGovMon Data Warehouse (eGovMon DW) is built as a data repository for eGovernment services benchmarking results. We propose a DW architecture based on open source business intelligence technologies for eGovernment. The architecture uses PostgreSQL as the DBMS, the eGovernment operational system as the data source, and a right-time ETL tool to populate the data. Based on this proposal, we outline potential research interests and issues for our future work.
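The abstract does not show the actual eGovMon schema or ETL tool, but a right-time load into a PostgreSQL-based DW could, in its simplest form, look like the polling sketch below; table and column names are invented for illustration.

```python
# Minimal right-time ETL sketch: periodically poll the operational source and
# append new benchmarking results to a PostgreSQL fact table.
# Table and column names are illustrative assumptions, not the eGovMon schema.
import time
import psycopg2

SRC_DSN = "dbname=egovmon_ops"   # operational eGovernment system (assumed DSN)
DW_DSN = "dbname=egovmon_dw"     # the data warehouse (assumed DSN)

def load_new_results(last_seen_id):
    with psycopg2.connect(SRC_DSN) as src, psycopg2.connect(DW_DSN) as dw:
        with src.cursor() as s, dw.cursor() as d:
            s.execute("SELECT id, service_id, tested_at, score "
                      "FROM benchmark_results WHERE id > %s ORDER BY id",
                      (last_seen_id,))
            rows = s.fetchall()
            d.executemany("INSERT INTO fact_benchmark "
                          "(source_id, service_key, date_key, score) "
                          "VALUES (%s, %s, %s, %s)", rows)
    return rows[-1][0] if rows else last_seen_id

last_id = 0
while True:                      # "right-time": load shortly after data arrives
    last_id = load_new_results(last_id)
    time.sleep(60)
```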
An increasing number of (semantic) web applications store a very large number of (subject, predicate, object) triples in specialized storage engines called triple-stores. Often, triple-stores are used mainly as plain data stores, i.e., for inserting and retrieving large amounts of triples, without using more advanced features such as logical inference. However, current triple-stores are not optimized for such bulk operations and/or do not support OWL Lite. Further, triple-stores can be inflexible when the data has to be integrated with other kinds of data in non-triple form, e.g., standard relational data. This paper presents 3XL, a triple-store that efficiently supports operations on very large amounts of OWL Lite triples. 3XL also provides the user with high flexibility, as it stores data in an object-relational database in a schema that is easy to use and understand. It is thus easy to integrate 3XL data with data from other sources. The distinguishing features of 3XL include (a) flexibility, as the data is stored in a database, allowing easy integration with other data, and can be queried by means of both triple queries and SQL; (b) a specialized data-dependent schema (with intelligent partitioning) that is intuitive and efficient to use; (c) use of object-relational DBMS features such as inheritance; (d) efficient loading through extensive use of bulk loading and caching; and (e) efficient triple query operations, especially in the important case when the subject and/or predicate is known. Extensive experiments with a PostgreSQL-based implementation show that 3XL performs very well for such operations and that its performance is comparable to state-of-the-art triple-stores.
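To make the schema idea concrete, the sketch below shows how a tiny class hierarchy could be mapped to PostgreSQL tables with inheritance, in the spirit of 3XL's data-dependent schema. The example ontology (Person/Student with a few data properties) and the TEXT-only column types are invented for illustration and are not taken from the paper.

```python
# Sketch: map each OWL class to a table, let subclass tables INHERIT from the
# superclass table, and turn data properties into columns (types simplified).
# The ontology below is an invented example, not 3XL's actual mapping rules.
ontology = {
    "Person":  {"parent": None,     "properties": ["name", "email"]},
    "Student": {"parent": "Person", "properties": ["studies_at"]},
}

def generate_schema(ontology):
    """Emit PostgreSQL DDL that uses object-relational inheritance."""
    ddl = []
    for cls, spec in ontology.items():           # parents listed before children
        cols = ["uri TEXT PRIMARY KEY"] if spec["parent"] is None else []
        cols += [f"{p} TEXT" for p in spec["properties"]]
        inherits = f" INHERITS ({spec['parent'].lower()})" if spec["parent"] else ""
        ddl.append(f"CREATE TABLE {cls.lower()} ({', '.join(cols)}){inherits};")
    return "\n".join(ddl)

print(generate_schema(ontology))
# CREATE TABLE person (uri TEXT PRIMARY KEY, name TEXT, email TEXT);
# CREATE TABLE student (studies_at TEXT) INHERITS (person);
```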
This paper demonstrates the use of 3XL, a DBMS-based triple-store for OWL Lite data. 3XL is characterized by its use of a database schema specialized for the data to be represented. The specialized database schema uses object-relational features, particularly inheritance, and partitions the data such that it is fast to locate the needed data when it is queried. Further, the generated database schema is very intuitive, making it easy to integrate the OWL data with other kinds of data. 3XL offers performance comparable to the leading file-based triple-stores. We will demonstrate 1) how a specialized database schema is generated by 3XL based on an OWL ontology; 2) how triples are loaded, including how they pass through the 3XL system and how 3XL can be configured to fine-tune performance; and 3) how (simple and complex) queries can be expressed and how they are executed by 3XL.
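For the query part of the demonstration, the sketch below illustrates why the known-subject/known-predicate case is cheap in such a schema: the triple pattern maps directly to a lookup in the table holding that property. The property-to-table mapping and names are illustrative assumptions.

```python
# Sketch: translate a (subject, predicate, ?object) pattern into a direct SQL
# lookup on the generated schema. The mapping below is an invented example.
property_to_table = {"email": "person", "studies_at": "student"}

def triple_query(subject, predicate):
    """Return parameterized SQL answering (subject, predicate, ?o)."""
    table = property_to_table[predicate]
    return f"SELECT {predicate} FROM {table} WHERE uri = %s", (subject,)

sql, params = triple_query("http://example.org/alice", "email")
print(sql, params)   # SELECT email FROM person WHERE uri = %s ('http://example.org/alice',)
```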
Extract-Transform-Load (ETL) programs process data into data warehouses (DWs). Rapidly growing data volumes demand systems that scale out. Recently, much attention has been given to MapReduce for parallel handling of massive data sets in cloud environments. Hive is the most widely used RDBMS-like system for DWs on MapReduce and provides scalable analytics. It is, however, challenging to do proper dimensional ETL processing with Hive; e.g., the concept of slowly changing dimensions (SCDs) is not supported (and because UPDATEs are not supported, SCDs are complex to handle manually). The powerful Pig platform for data processing on MapReduce does not support such dimensional ETL processing either. To remedy this, we present the ETL framework CloudETL, which uses Hadoop to parallelize ETL execution and to process data into Hive. The user defines the ETL process by means of high-level constructs and transformations and does not have to worry about technical MapReduce details. CloudETL supports different dimensional concepts such as star schemas and SCDs. We present how CloudETL works and how it uses different performance optimizations, including a purpose-specific data placement policy to co-locate data. Further, we present a performance study and compare with other cloud-enabled systems. The results show that CloudETL scales very well and outperforms the dimensional ETL capabilities of Hive with respect to both performance and programmer productivity. For example, in one experiment Hive takes 3.9 times as long as CloudETL to load an SCD and needs 112 statements, while CloudETL needs only 4.
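The kind of dimensional logic CloudETL automates can be illustrated with a small type-2 SCD merge. The sketch below is a plain in-memory illustration of that logic, not CloudETL's API; the field names and the single tracked attribute are assumptions.

```python
# Sketch of type-2 slowly changing dimension (SCD) handling: expire the current
# dimension row and insert a new version when a tracked attribute changes.
# Field names and the in-memory representation are illustrative assumptions.
from datetime import date

def scd2_merge(dimension, incoming, key="customer_id", today=None):
    today = today or date.today()
    current = {r[key]: r for r in dimension if r["valid_to"] is None}
    for row in incoming:
        existing = current.get(row[key])
        if existing and existing["city"] != row["city"]:   # tracked attribute changed
            existing["valid_to"] = today                   # expire the old version
        if existing is None or existing["valid_to"] is not None:
            dimension.append({**row, "valid_from": today, "valid_to": None})
    return dimension

dim = [{"customer_id": 1, "city": "Aalborg",
        "valid_from": date(2010, 1, 1), "valid_to": None}]
dim = scd2_merge(dim, [{"customer_id": 1, "city": "Lyngby"}], today=date(2012, 6, 1))
for row in dim:
    print(row)
```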
Smart electricity meters have been replacing conventional meters worldwide, enabling automated collection of fine-grained (e.g., every 15 minutes or hourly) consumption data. A variety of smart meter analytics algorithms and applications have been proposed, mainly in the smart grid literature. However, the focus has been on what can be done with the data rather than how to do it efficiently. In this article, we examine smart meter analytics from a software performance perspective. First, we design a performance benchmark that includes common smart meter analytics tasks. These include offline feature extraction and model building as well as a framework for online anomaly detection that we propose. Second, since obtaining real smart meter data is difficult due to privacy issues, we present an algorithm for generating large realistic datasets from a small seed of real data. Third, we implement the proposed benchmark using five representative platforms: a traditional numeric computing platform (Matlab), a relational DBMS with a built-in machine learning toolkit (PostgreSQL/MADlib), a main-memory column store ("System C"), and two distributed data processing platforms (Hive and Spark/Spark Streaming). We compare the five platforms in terms of application development effort and performance on a multicore machine as well as a cluster of 16 commodity servers.
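As a flavor of the online part of the benchmark, the sketch below flags a reading that deviates strongly from the running statistics for its hour of the day, using Welford's online mean/variance update. It illustrates the kind of streaming anomaly-detection task the benchmark contains; it is not the paper's proposed detector, and the threshold and warm-up period are assumptions.

```python
# Sketch of a streaming anomaly check over smart meter readings: keep running
# per-hour statistics (Welford's algorithm) and flag large deviations.
# This is an illustrative stand-in, not the paper's proposed detector.
import math
import random
from collections import defaultdict

class HourlyAnomalyDetector:
    def __init__(self, threshold=4.0):
        self.threshold = threshold
        self.stats = defaultdict(lambda: [0, 0.0, 0.0])   # hour -> [n, mean, M2]

    def observe(self, hour_of_day, kwh):
        n, mean, m2 = self.stats[hour_of_day]
        n += 1                                             # Welford's online update
        delta = kwh - mean
        mean += delta / n
        m2 += delta * (kwh - mean)
        self.stats[hour_of_day] = [n, mean, m2]
        std = math.sqrt(m2 / n) if n > 1 else 0.0
        return n > 24 and std > 0 and abs(kwh - mean) > self.threshold * std

random.seed(0)
detector = HourlyAnomalyDetector()
for day in range(30):
    for hour in range(24):
        kwh = 1.0 + 0.5 * math.sin(hour / 24 * 2 * math.pi) + random.gauss(0, 0.05)
        if (day, hour) == (29, 12):
            kwh += 5.0                                     # inject an obvious spike
        if detector.observe(hour, kwh):
            print(f"anomaly on day {day}, hour {hour}: {kwh:.2f} kWh")
```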