Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020
Apache Flink is an open-source system for scalable processing of batch and streaming data. Flink does not natively support efficient processing of spatial data streams, which is a requirement of many applications dealing with spatial data. Besides Flink, other scalable spatial data processing platforms, such as GeoSpark and Spatial Hadoop, do not support streaming workloads and can only handle static/batch workloads. To fill this gap, we present GeoFlink, which extends Apache Flink to support spatial data types, indexes and continuous queries over spatial data streams. To enable efficient processing of continuous spatial queries and effective data distribution across Flink cluster nodes, a grid-based index is introduced. GeoFlink currently supports spatial range, spatial kNN and spatial join queries on the point data type. An experimental study on real spatial data streams shows that GeoFlink achieves significantly higher query throughput than ordinary Flink processing.
Due to the rapid growth of stream data generated by sensors, micro-blogs, e-businesses, etc., many organizations require online processing of their data for real-time analysis and actionable alerts. Such voluminous, high-velocity data cannot be processed in real time by traditional centralized stream processing engines. Hence, distributed stream processing has emerged to support such large-scale real-time processing. In this work, we present a smart distributed event-driven stream processing approach. In contrast to ordinary stream processing, event-driven stream processing generates query results only when specified events occur. In basic event-driven stream processing, query operators continuously process input tuples even when no event is raised, although these tuples produce no query result. This increases system load and wastes system resources. By contrast, in the smart event-driven stream processing scheme, in...
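The smart scheme itself is cut off in this abstract, so the sketch below only illustrates the basic event-driven behaviour it improves on. It assumes a hypothetical operator, `EventDrivenWindowAverage`, that keeps a count-based sliding window and reports the window average only when a user-supplied event predicate fires; the names, the window type and the predicate are illustrative, not the paper's design.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class EventDrivenWindowAverage:
    """Hypothetical operator: sliding-window average reported only when an event fires.

    It mirrors the basic event-driven scheme: every tuple updates the window,
    even though a result is produced only when `is_event` returns True.
    """

    def __init__(self, window_size: int, is_event: Callable[[float], bool]) -> None:
        self.window: Deque[float] = deque(maxlen=window_size)
        self.is_event = is_event
        self.tuples_processed = 0  # work done, whether or not a result is emitted

    def process(self, value: float) -> Optional[float]:
        self.window.append(value)  # state is maintained for every tuple
        self.tuples_processed += 1
        if self.is_event(value):   # a query result is produced only on the event
            return sum(self.window) / len(self.window)
        return None


def run(stream: List[float], op: EventDrivenWindowAverage) -> List[float]:
    """Feed a finite stream through the operator and collect the emitted results."""
    results: List[float] = []
    for value in stream:
        out = op.process(value)
        if out is not None:
            results.append(out)
    return results
```

For example, with a window of three tuples and the event `v > 100.0`, feeding `[10.0, 20.0, 30.0, 150.0, 40.0]` through `run` yields a single result (the average of 20, 30 and 150) while `tuples_processed` reaches 5: every tuple costs work, but only one produces output, which is the waste the abstract points to.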
Uncertain data management, querying and mining have become important because the majority of real-world data is nowadays accompanied by uncertainty. Uncertainty in data is often caused by deficiencies in the underlying data collection equipment, or is sometimes introduced manually to preserve data privacy. The uncertainty information in the data is useful and can be used to improve the quality of the underlying results. Therefore, this dissertation addresses three problems related to outlier detection on uncertain data. 1) Distance-based outlier detection on uncertain data: In this research, we give a novel definition of distance-based outliers on uncertain data. Since computing distance probabilities is expensive, a cell-based approach is proposed to index the dataset objects and speed up the outlier detection process. The cell-based approach identifies and prunes the cells containing only inliers, based on bounds on the outlier score (#D-neighbors). Similarly, it can ...
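The dissertation's method works on uncertain objects and probabilistic distances; the sketch below shows only the classic deterministic cell-based idea (in the style of Knorr and Ng) that such pruning builds on, assuming certain 2D points and treating a point as an outlier when it has fewer than `k` other points within distance `d`. With cell side d/(2√2), every point in the 3×3 block of cells around a cell lies within `d` of every point in that cell, and every point outside the 7×7 block lies farther than `d` away, so many cells can be labelled wholesale. The function `cell_based_outliers` and its parameters are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]


def _block(grid: Dict[Cell, List[int]], c: Cell, radius: int) -> List[int]:
    """Indices of all points in the (2*radius+1)^2 block of cells centred on c."""
    return [i for dx in range(-radius, radius + 1)
              for dy in range(-radius, radius + 1)
              for i in grid.get((c[0] + dx, c[1] + dy), [])]


def cell_based_outliers(points: Sequence[Tuple[float, float]], k: int, d: float) -> List[int]:
    """Indices of points with fewer than k other points within distance d (certain 2D data)."""
    side = d / (2 * math.sqrt(2))  # any two points in adjacent cells are at most d apart
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(math.floor(x / side), math.floor(y / side))].append(i)

    outliers: List[int] = []
    for c, members in grid.items():
        near = _block(grid, c, 1)       # 3x3 block: every point here is within d
        if len(near) - 1 >= k:
            continue                    # prune: the whole cell contains only inliers
        reach = _block(grid, c, 3)      # cells further out are more than d away
        if len(reach) - 1 < k:
            outliers.extend(members)    # prune: the whole cell contains only outliers
            continue
        for i in members:               # undecided cell: count neighbours exactly
            n = sum(1 for j in reach if j != i and math.dist(points[i], points[j]) <= d)
            if n < k:
                outliers.append(i)
    return sorted(outliers)
```

Only the undecided cells pay for pairwise distance computations; the inlier-only and outlier-only cells are resolved from cell counts alone, which is the kind of saving the abstract attributes to its bounds on the outlier score.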
Location has always been a primary factor in the success of business startups. Therefore, much research has focused on identifying an ideal site for a new business. Selecting an ideal business site is complex and depends on a number of criteria or factors. Since the ultimate goal of every business is to increase customer footfall and thereby sales, criteria including traffic accessibility, visibility, ease of access, vehicle parking and customer availability play important roles. In other words, optimal business site selection is a multi-criteria decision-making (MCDM) problem. MCDM identifies an optimal solution or decision among many alternatives by evaluating them against a number of criteria. A number of structured mathematical techniques exist for organizing and analyzing complex decisions, for instance AHP, ANP and TOPSIS. In this work, we present a hybrid of two such techniques to solve the M...
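The hybrid itself is cut off in this abstract, so as an illustration of one of the named techniques, here is a minimal TOPSIS sketch. It ranks candidate sites by their relative closeness to an ideal solution; the criteria, weights and site values in the example are hypothetical, and the weights are simply taken as given rather than derived by any particular method.

```python
import math
from typing import List, Sequence


def topsis(matrix: Sequence[Sequence[float]],
           weights: Sequence[float],
           benefit: Sequence[bool]) -> List[float]:
    """Closeness scores in [0, 1] for each alternative (row); higher is better.

    matrix[i][j] is alternative i's value on criterion j; benefit[j] is True when
    larger values are better (e.g. visibility) and False for cost criteria (e.g. rent).
    """
    n_alt, n_crit = len(matrix), len(weights)
    # 1. Vector-normalise each criterion column, then apply the criterion weights.
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_alt))) or 1.0
             for j in range(n_crit)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n_crit)]
         for i in range(n_alt)]
    # 2. Ideal (best) and anti-ideal (worst) value per criterion.
    cols = list(zip(*v))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(cols)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(cols)]
    # 3. Relative closeness to the ideal solution.
    scores: List[float] = []
    for row in v:
        d_best, d_worst = math.dist(row, ideal), math.dist(row, anti)
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0 else 0.5)
    return scores
```

For instance, with three hypothetical sites scored on accessibility and visibility (benefit criteria) and rent (a cost criterion), `topsis([[9, 8, 2], [5, 5, 5], [1, 2, 9]], [0.4, 0.3, 0.3], [True, True, False])` gives the first site a score of 1.0 and the last 0.0, because they coincide with the ideal and anti-ideal solutions respectively.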
Modern robotic exploration strategies assume multi-agent cooperation, which requires an effective exchange of acquired scans of the environment in the absence of a reliable global positioning system. In such situations, agents compare their scans of the outside world to determine whether they overlap in some region and, if they do, determine the correct matching between them. The process of matching multiple point-cloud scans is called point-cloud registration. With existing point-cloud registration approaches, a good match between two point clouds is achieved only when there is a large overlap between them; however, this limits the advantage of using multiple robots, for instance for time-effective 3D mapping. Hence, a point-cloud registration approach that works with low-overlap scans is highly desirable. This work proposes a novel solution to the point-cloud registration problem when the overlapping area between the two scans is very small. In doing s...
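The proposed low-overlap method is cut off in this abstract, so the sketch below only illustrates what registration outputs once matching is done: assuming point pairs are already known to correspond, it estimates the rigid rotation and translation in 2D with the standard least-squares (Procrustes) closed form. Real scans are 3D, and finding correspondences when the overlap is small is exactly the hard part this work targets; neither is handled here, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def rigid_align_2d(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[float, Point]:
    """Least-squares rotation angle and translation mapping src[i] onto dst[i].

    Assumes correspondences are already known and there are at least two distinct
    pairs; returns (theta, (tx, ty)) such that dst ~= R(theta) * src + t.
    """
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    s_cos = s_sin = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy  # centre both sets
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)  # maximises sum of b . R(theta) a
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)       # t = centroid(dst) - R * centroid(src)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)


def transform_point(theta: float, t: Point, p: Point) -> Point:
    """Apply the rotation by theta followed by the translation t to point p."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])
```

Generating `dst` by rotating a few points by π/6 and shifting them by (2, -1) with `transform_point`, then calling `rigid_align_2d(src, dst)`, recovers that angle and translation up to floating-point error.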
Apache Flink is an open-source system for the scalable processing of batch and streaming data. Flink does not natively support efficient processing of spatial data streams, which is a requirement of many applications dealing with spatial data. Besides Flink, other scalable spatial data processing platforms, including GeoSpark, Spatial Hadoop, GeoMesa and Parallel Secondo, do not support streaming workloads and can only handle static/batch workloads. Hence, this work presents GeoFlink, which extends Apache Flink to support spatial data types, indexes and continuous queries. To enable efficient processing of continuous spatial queries and effective data distribution among the Flink cluster nodes, a grid-based index is introduced. The grid index enables pruning of spatial objects that cannot be part of a spatial query result, and can thus guarantee efficient query processing. It also helps preserve spatial data proximity, resulting in effective data distributio...
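The abstract does not give the index's details, so the following is a minimal sketch, assuming a uniform grid over a known bounding box, of how a grid can prune a spatial range query. The class `UniformGrid`, its methods and the cell-classification rule are illustrative assumptions, not GeoFlink's API: a cell farther than the query radius is skipped, a cell lying entirely within the radius contributes all its points without distance tests, and only boundary cells are checked point by point.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
CellId = Tuple[int, int]


class UniformGrid:
    """Hypothetical uniform grid index over a fixed bounding box (not GeoFlink's own API)."""

    def __init__(self, min_x: float, min_y: float, max_x: float, max_y: float,
                 cells_per_side: int) -> None:
        self.min_x, self.min_y, self.max_x, self.max_y = min_x, min_y, max_x, max_y
        self.n = cells_per_side
        self.cell_w = (max_x - min_x) / cells_per_side
        self.cell_h = (max_y - min_y) / cells_per_side
        self.cells: Dict[CellId, List[Point]] = defaultdict(list)

    def cell_id(self, p: Point) -> CellId:
        # Points on the upper boundary are clamped into the last row/column.
        cx = min(int((p[0] - self.min_x) / self.cell_w), self.n - 1)
        cy = min(int((p[1] - self.min_y) / self.cell_h), self.n - 1)
        return (cx, cy)

    def insert(self, p: Point) -> bool:
        if not (self.min_x <= p[0] <= self.max_x and self.min_y <= p[1] <= self.max_y):
            return False  # this sketch simply ignores points outside the grid
        self.cells[self.cell_id(p)].append(p)
        return True

    def _cell_bounds(self, c: CellId) -> Tuple[float, float, float, float]:
        x0 = self.min_x + c[0] * self.cell_w
        y0 = self.min_y + c[1] * self.cell_h
        return x0, y0, x0 + self.cell_w, y0 + self.cell_h

    def range_query(self, q: Point, r: float) -> Tuple[List[Point], int]:
        """Points within distance r of q, plus the number of point-level distance checks."""
        result: List[Point] = []
        checks = 0
        for c, pts in self.cells.items():
            x0, y0, x1, y1 = self._cell_bounds(c)
            # Closest and farthest distance from q to the cell rectangle.
            min_d = math.hypot(max(x0 - q[0], 0.0, q[0] - x1), max(y0 - q[1], 0.0, q[1] - y1))
            max_d = math.hypot(max(abs(q[0] - x0), abs(q[0] - x1)),
                               max(abs(q[1] - y0), abs(q[1] - y1)))
            if min_d > r:
                continue            # pruned: no point in this cell can qualify
            if max_d <= r:
                result.extend(pts)  # whole cell qualifies without per-point checks
                continue
            for p in pts:           # boundary cell: fall back to exact distance tests
                checks += 1
                if math.dist(p, q) <= r:
                    result.append(p)
        return result, checks
```

In a distributed setting, the cell id returned by `cell_id` is a natural key for partitioning the stream across parallel workers, since it keeps nearby points together; how GeoFlink actually assigns cells to cluster nodes is not described in this abstract.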