Abstract: In this paper we propose several significant advances in online employment services. We address issues such as privacy, interaction, and scalability to worldwide services. The details of each user are managed in completely relevant-to-service formats, and both jobseekers and employers have considerable control over how many of their own personal details are visible at any time.
Abstract The rapid advance of computer technologies in data processing, collection, and storage has provided unparalleled opportunities to expand capabilities in production, services, communications, and research. However, immense quantities of high-dimensional data renew the challenges to the state-of-the-art data mining techniques. Feature selection is an effective technique for dimension reduction and an essential step in successful data mining applications.
[1] Erwin D H. The Great Paleozoic Crisis: Life and Death in the Permian [M]. New York: Columbia Univ Press, 1993. 1-327. [2] Vogt P R. Evidence for global synchronism in mantle plume convection and possible significance for geology [J]. Nature, 1972, 240: 338-342. [3] Lamb H H. Volcanic dust in the atmosphere; with a chronology and assessment of its meteorological significance [J]. Philosophical Transactions of the Royal Society of London, 1970, A266: 425-533. [4] Newell R E. Introduction [J].
Abstract The successful execution of grasping by a robot hand requires translation of visual information into control signals to the hand, which produce the desired spatial orientation and preshape for grasping an arbitrary object. An approach to this problem that is based on separation of the task into two modules is presented. A vision module is used to transform an image into a volumetric shape description using generalized cones.
Abstract Discretization can turn numeric attributes into discrete ones. Feature selection can eliminate some irrelevant and/or redundant attributes. Chi2 is a simple and general algorithm that uses the χ² statistic to discretize numeric attributes repeatedly until some inconsistencies are found in the data. It achieves feature selection via discretization. It can handle mixed attributes, work with multiclass data, and remove irrelevant and redundant attributes.
Abstract This paper introduces concepts and algorithms of feature selection, surveys existing feature selection algorithms for classification and clustering, groups and compares different algorithms with a categorizing framework based on search strategies, evaluation criteria, and data mining tasks, reveals unattempted combinations, and provides guidelines in selecting feature selection algorithms. With the categorizing framework, we continue our efforts toward building an integrated system for intelligent feature selection.
Abstract As competition in the Web search market increases, there is a high demand for personalized Web search that conducts retrieval incorporating Web users' information needs. This paper focuses on utilizing clickthrough data to improve Web search. Since millions of searches are conducted every day, a search engine accumulates a large volume of clickthrough data, which records who submits queries and which pages he/she clicks on.
Abstract: Due to the digitization of data and advances in technology, it has become extremely easy to obtain and store large quantities of data, particularly multimedia data. Fields ranging from commercial to military need to analyze these data in an efficient and fast manner. Presently, tools for mining images are few and require human intervention. Feature selection and extraction is the pre-processing step of image mining, and it is a critical step in the entire image mining process.
Abstract The blogosphere is expanding at an unprecedented speed. A better understanding of the blogosphere can greatly facilitate the development of the Social Web to serve the needs of users, service providers and advertisers. One important task in this process is clustering blog sites. Clustering blog sites presents new challenges. We propose to tap into collective wisdom in clustering blog sites, present statistical and visual results, report findings, and suggest future work extending to many real-world applications.
Abstract The study of collective behavior aims to understand how individuals behave in a social network environment. Oceans of data generated by social media like Facebook, Twitter, Flickr and YouTube present opportunities and challenges to studying collective behavior on a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network?
ABSTRACT Feature selection is defined as a problem to find a minimum set of M features for an inductive algorithm to achieve the highest predictive accuracy from the data described by the original N features, where M ≤ N. A probabilistic wrapper model is proposed as another method besides the exhaustive search and the heuristic approach. The aim of this model is to avoid local minima and exhaustive search. The highest predictive accuracy is the criterion in search of the smallest M.
The problem of inferring a user's intentions in Machine–Human Interaction has been the key research issue for providing personalized experiences and services. In this paper, we propose novel approaches to modeling and inferring a user's actions on a computer. Two linguistic features (keyword and concept features) are extracted from the semantic context for intention modeling. Concept features are the conceptual generalization of keywords. Association rule mining is used to find the proper concept for the corresponding keyword.
Abstract. Researchers from the same lab often spend a considerable amount of time searching for published articles relevant to their current project. Despite having similar interests, they conduct independent, time-consuming searches. While they may share the results afterwards, they are unable to leverage previous search results during the search process.
Feature selection, as a preprocessing step to machine learning, has been very effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. Traditional feature selection methods resort to random sampling in dealing with data sets with a huge number of instances. In this paper, we introduce the concept of active feature selection, and investigate a selective sampling approach to active feature selection in a filter model setting.
ABSTRACT A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project observed data to a low-dimensional manifold, where geometric relationships such as pairwise distances are preserved. It can be extended to the nonlinear case by applying the kernel trick, which embeds the data into a feature space by specifying the kernel function that computes the dot products between data points in the feature space.
Abstract The increasing popularity of social media is shortening the distance between people. Social activities, e.g., tagging in Flickr, bookmarking in Delicious, and twittering in Twitter, are reshaping people's social life and redefining their social roles. People with shared interests tend to form their groups in social media, and users within the same community likely exhibit similar social behavior (e.g., going for the same movies, having similar political viewpoints), which in turn reinforces the community structure.
Abstract The SocioDim framework demonstrates promising results toward predicting collective behavior. However, many challenges require further research. For example, networks in social media are continually evolving, with new members joining a network and new connections established between existing members each day. This dynamic nature of networks entails efficient update of the model for collective behavior prediction.
Abstract A topic taxonomy is an effective representation that describes salient features of virtual groups or online communities. A topic taxonomy consists of topic nodes. Each internal node is defined by its vertical path (i.e., ancestor and child nodes) and its horizontal list of attributes (or terms). In a text-dominant environment, a topic taxonomy can be used to flexibly describe a group's interests with varying granularity. However, the stagnant nature of a taxonomy may fail to capture the dynamic change of a group's interest in a timely manner.
Abstract The transverse flux permanent magnet machine (TFPMM) offers a higher power density than the conventional radial and axial ones. Based on the principle of TFPMM, a novel flux switching transverse flux PM generator (FS-TFPMG) with a unique structure is presented for low-speed wind power applications in this paper. The stator space utilization can be improved by arranging more stator cores, and the flux density in the air gap can be gathered up by adopting a novel topology.
Abstract Feature selection is often applied to high-dimensional data as a preprocessing step in text classification. When dealing with highly skewed data, we observe that typical feature selection metrics like information gain or chi-squared are biased toward selecting features for the minor class, and the metric of bi-normal separation can select features for both minor and major classes.
Papers by Huan Liu