Latent Semantic Indexing (LSI) is a widely used text mining technique owing to its effectiveness in dealing with the problems of synonymy and polysemy within a proper matrix scale. However, LSI is computationally intensive, especially when processing large-scale data. An effective solution is to increase the computational power available to LSI by using multiple computing nodes. In this paper we propose a novel MapReduce-based distributed LSI that uses the Hadoop distributed computing architecture to implement the K-means algorithm to cluster the documents, and then applies LSI to the clustered results. We evaluated the performance of the proposed MapReduce-based LSI and compared it with standalone LSI. The results show a great improvement in LSI's performance in terms of speed.
The past decade has seen rapid development in content-based image retrieval (CBIR). CBIR is the retrieval of images based on their low-level features such as color, texture, and shape. To improve retrieval accuracy, the research focus has shifted from designing sophisticated low-level feature extraction algorithms to reducing the 'semantic gap' between the visual features and the richness of human semantics. Image annotation techniques have been proposed to facilitate CBIR. This paper evaluates 7 representative machine learning techniques for automatic image annotation using 5000 images. An image annotation prototype is implemented, and the evaluation results are presented and analyzed.
Healthcare data and medical information need to be seamlessly accessible and available at all times to the various healthcare stakeholders. The inability to share, integrate and access critical healthcare information is a challenge for healthcare IT. Moreover, semantic interoperability of health-related heterogeneous data sources is a challenging issue, and healthGrids are expected to address this challenge in a systematic manner. This paper proposes a new architecture, ASIDS (architecture for semantic integration of data sources), that could be a potential candidate for solving the challenge of semantic interoperability of geographically distributed heterogeneous data sources. ASIDS has three main components that are loosely coupled (through interfaces) in a distributed manner. This architecture sets the basis for future research on implementing a healthGrid application in real environments.
A Problem Solving Environment (PSE) should aim to hide implementation and systems details from application developers, to enable a scientist or engineer to concentrate on the science. A PSE is, by definition, problem domain specific, but the infrastructure for a PSE can be problem domain independent. A domain independent infrastructure for a PSE is described, followed by two application dependent PSEs for Molecular Dynamics and Boundary Element codes that make use of our generic PSE infrastructure.
The architecture of a component-based environment for constructing scientific applications, generally referred to as a Problem Solving Environment (PSE), is described. Each component is a self-contained program, and may be a sequential code developed in C, Fortran or Java, or may contain internal parallelism using MPI or PVM libraries. A user visually constructs an application by combining components from a local or remote repository as a data flow graph. Components are self-documenting, with their interfaces defined in XML, which enables a user to search for components suitable for a particular application, enables a component to be configured when instantiated, enables each component to register with an event listener, and facilitates the sharing of components between repositories. The data flow graph is also encoded in XML, and sent to a resource manager for executing the application on a workstation cluster, or a heterogeneous environment made of workstations and high performance parallel machines. Components in the PSE can also wrap legacy codes. We also describe the architecture and implementation of a molecular dynamics application based on the Lennard-Jones code [18], containing MPI calls, executed on a cluster of workstations, and based on our generic component model. A user can submit simulation data to the application remotely using a Java-based user interface. Users need not download any software for the simulation and do not need to know the exact implementation.
Information services play a crucial role in grid environments in that the state information can be used to facilitate the discovery of resources and the services available to meet user requirements, and also to help tune the performance of a grid system. However, the large size and dynamic nature of the grid bring forth a number of challenges for information services. This paper presents PIndex, a grouped peer-to-peer network that can be used for scalable grid information services. PIndex builds on Globus MDS4, but introduces peer groups to dynamically split the large grid information search space into many small sections to enhance its scalability and resilience. PIndex is subsequently modeled with Colored Petri Nets for performance evaluation. The simulation results show that PIndex is scalable and resilient in dealing with a large number of peer nodes.
A Grid is a computer-based infrastructure that provides dependable, consistent, pervasive access to distributed resources. Built on top of a Grid, a Semantic Grid is a service-oriented infrastructure that provides a range of computation, information and knowledge services. A purpose of a Grid portal is to provide easy and seamless access to heterogeneous Grid resources and services through a Web-based user interface. This paper presents PortalLab, a Web Services oriented toolkit for designing, integrating and building ...
Papers by Maozhen Li