
    In Kweon

    This paper describes a new methodology for estimating the relative pose between two 2-D lidars. Scanned points of 2-D lidars do not have enough feature information for correspondence matching. For this reason, additional image sensors or artificial landmarks at known locations have been used to find the relative pose. We propose a novel method of estimating the relative pose between 2-D lidars without any additional sensors or artificial landmarks. By scanning two orthogonal planes, we utilize the coplanarity of the scan points on each plane and the orthogonality of the plane normals. Even if we capture planes which are not exactly orthogonal, the method provides good results using nonlinear optimization. Experiments with both synthetic and real data show the validity of the proposed method. We also derive two degenerate cases: one related to plane poses, and the other caused by the relative pose. To the best of our knowledge, this study provides the first solution for the problem.
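    The two geometric constraints above can be sketched numerically. This is a minimal illustration, not the paper's actual formulation: the function names are hypothetical, the relative-pose parameterization is omitted, and each scan's coplanarity is scored as point-to-plane distance after a least-squares fit, with orthogonality scored as the dot product of the two fitted normals.

    ```python
    import numpy as np

    def fit_plane(points):
        """Least-squares plane fit: returns (unit normal, centroid)."""
        centroid = points.mean(axis=0)
        # The smallest right singular vector of the centered points is the normal.
        _, _, vt = np.linalg.svd(points - centroid)
        return vt[-1], centroid

    def calibration_cost(points_a, points_b):
        """Illustrative cost: coplanarity of each scan + orthogonality of normals."""
        n_a, c_a = fit_plane(points_a)
        n_b, c_b = fit_plane(points_b)
        coplanarity = (np.abs((points_a - c_a) @ n_a).sum()
                       + np.abs((points_b - c_b) @ n_b).sum())
        orthogonality = abs(n_a @ n_b)   # zero when the planes are orthogonal
        return coplanarity + orthogonality

    # Two perfectly orthogonal planes: z = 0 and x = 0.
    pts_a = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], float)
    pts_b = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1]], float)
    print(round(calibration_cost(pts_a, pts_b), 6))  # → 0.0
    ```

    In the paper's setting this cost would be evaluated as a function of the candidate relative pose and minimized by nonlinear optimization.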
    Abstract This paper presents a practical means of extrinsic calibration between a camera and a 2D laser sensor, without overlap. In previous calibration methods, the sensors must be able to see a common geometric structure such as a plane or a line. In order to calibrate a non-overlapping camera–laser system, it is necessary to attach an extra sensor, such as a camera or a 3D laser sensor, whose relative poses from both the camera and the 2D laser sensor can be calculated. In this paper, we propose two means of calibrating a non-overlapping camera–laser system directly without an extra sensor. For each method, the initial solution of the relative pose between the camera and the 2D laser sensor is computed by adopting a reasonable assumption about geometric structures. This is then refined via non-linear optimization, even if the assumption is not met perfectly. Both simulation results and experiments using actual data show that the proposed methods provide reliable results compared to the ground truth, as well as similar or better results than those provided by conventional methods.
    In this paper, we present a sensor fusion system of cameras and 2D laser sensors for 3D reconstruction. The proposed system is designed to capture data on a fast-moving ground vehicle. The system consists of six cameras and one 2D laser sensor. In order to capture data at high speed, we synchronized all sensors by detecting the laser ray at a specific angle and generating a trigger signal for the cameras. Reconstruction of 3D structures is done by estimating frame-by-frame motion and accumulating vertical laser scans. The difference between the proposed system and previous works using two 2D laser sensors is that we do not assume 2D motion. The motion of the system in 3D space (including absolute scale) is estimated accurately by data-level fusion of images and range data. The problem of error accumulation is solved by loop closing, not by GPS. Moving objects are detected by utilizing the depth information provided by the laser sensor. The experimental results show that the estimated path is successfully overlaid on the satellite images.
    We present the first large-scale synthetic dataset of egocentric viewpoints for disaster scenarios. We simulate pre- and post-disaster cases with drastic changes in appearance, such as buildings on fire and earthquakes. The dataset consists of more than 300K high-resolution stereo image pairs, all annotated with ground-truth data for the semantic label, depth in metric scale, optical flow with sub-pixel precision, and surface normal as well as their corresponding camera poses. We train various state-of-the-art methods to perform computer vision tasks using our dataset, evaluate how well these methods recognize the disaster situations, and produce reliable results on virtual scenes as well as real-world images. We also present a convolutional neural network-based egocentric localization method that is robust to drastic appearance changes, such as the texture changes in a fire, and layout changes from a collapse. To address these key challenges, we propose a new model that learns a shape-based representation by training on stylized images, and incorporates the dominant planes of query images as approximate scene coordinates. We evaluate the proposed method using various scenes including a simulated disaster dataset to demonstrate the effectiveness of our method when confronted with significant changes in scene layout.
    This paper describes a new methodology for estimating the relative pose of two 2D laser sensors. Two-dimensional laser scan points do not have enough feature information for motion tracking. For this reason, additional image sensors or artificial landmarks have been used to find a relative pose. We propose a method to estimate the relative pose of 2D laser sensors without any additional sensor or artificial landmark. By scanning two orthogonal planes, we utilize only the coplanarity of the scan points on each plane and the orthogonality of the plane normals. Experiments with both synthetic and real data show the validity of the proposed method. To the best of our knowledge, this work provides the first solution for the problem.
    In this paper, we present the first large-scale synthetic dataset for visual perception in disaster scenarios, and analyze state-of-the-art methods for multiple computer vision tasks with reference baselines. We simulated before and after disaster scenarios such as fire and building collapse for fifteen different locations in realistic virtual worlds. The dataset consists of more than 300K high-resolution stereo image pairs, all annotated with ground-truth data for semantic segmentation, depth, optical flow, surface normal estimation and camera pose estimation. To create realistic disaster scenes, we manually augmented the effects with 3D models using physically based graphics tools. We use our dataset to train state-of-the-art methods and evaluate how well these methods can recognize the disaster situations and produce reliable results on virtual scenes as well as real-world images. The results obtained from each task are then used as inputs to the proposed visual odometry network for generating 3D maps of buildings on fire. Finally, we discuss challenges for future research.
    Deep learning based recognition systems have shown high performance in various tasks. Most of them are single-modality based, using camera inputs only, and are thus vulnerable to look-alike fraud inputs. Fraud inputs may frequently be abused when rewards are given to users, such as in reverse vending machines. Joint use of multi-modal inputs can be a solution to fraud inputs, since modalities contain different information about the target task. In this work, we propose a deep neural network that utilizes multi-modal inputs with an attention mechanism and a correspondence learning scheme. With the attention mechanism, the network can learn better feature representations for multiple modalities; with the correspondence learning scheme, the network learns intermodal relationships and thus can detect fraud inputs where modalities do not correspond to each other. We investigate the proposed approach in a reverse vending machine system, where the task is to perform classification among 3 gi...
    ABSTRACT In recent years, much progress has been made in outdoor obstacle detection. However, for fast-moving robotic platforms, high-speed obstacle detection is still a daunting challenge. This paper describes a laser-based system for fast obstacle detection. To do this, we introduce how to configure laser range finders using a plane ruler for an outdoor robotic platform. For high-speed obstacle detection, we use the gradient of points. We evaluate the processing time and accuracy of our system by testing on a real drive track including an off-road course.
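    A gradient-of-points test of the kind described above can be sketched as follows. The threshold, point layout, and function name here are illustrative assumptions, not the paper's actual configuration: consecutive scan points are flagged as obstacles when the local slope (height change over horizontal step) exceeds a limit.

    ```python
    import math

    def detect_obstacles(points, slope_thresh=0.5):
        """Flag scan points whose local slope |dz| / horizontal step exceeds
        a threshold. points: list of (x, y, z) in the sensor frame."""
        flags = [False]  # the first point has no predecessor to compare with
        for (x0, y0, z0), (x1, y1, z1) in zip(points, points[1:]):
            d = math.hypot(x1 - x0, y1 - y0)
            slope = abs(z1 - z0) / d if d > 1e-6 else float("inf")
            flags.append(slope > slope_thresh)
        return flags

    # Flat ground, then a 40 cm step (e.g. a curb or rock) over a 20 cm gap.
    scan = [(i * 0.2, 0.0, 0.0) for i in range(5)] + [(1.0, 0.0, 0.4)]
    print(detect_obstacles(scan))  # → [False, False, False, False, False, True]
    ```

    Because each point needs only its neighbor, the test is a constant-time per-point operation, which is what makes gradient-style checks attractive for high-speed platforms.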
    OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects. According to Google Scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI.
    The booming interest in adversarial attacks stems from a misalignment between human vision and a deep neural network (DNN), i.e., a human-imperceptible perturbation fools the DNN. Moreover, a single perturbation, often called a universal adversarial perturbation (UAP), can be generated to fool the DNN for most images. A similar misalignment phenomenon has also been observed in the deep steganography task, where a decoder network can retrieve a secret image back from a slightly perturbed cover image. We attempt to explain the success of both in a unified manner from the Fourier perspective. We perform task-specific and joint analysis and reveal that (a) frequency is a key factor that influences their performance based on the proposed entropy metric for quantifying the frequency distribution; (b) their success can be attributed to a DNN being highly sensitive to high-frequency content. We also perform feature layer analysis to provide deep insight into model generalization and robustness...
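    One simple way to quantify "how spread the energy is across frequencies" is the Shannon entropy of the normalized FFT magnitude spectrum. This is an assumed proxy for illustration; the paper's exact entropy metric may be defined differently.

    ```python
    import numpy as np

    def spectral_entropy(image):
        """Shannon entropy (bits) of the normalized 2D FFT magnitude spectrum.
        Low entropy: energy concentrated in few frequencies (e.g. DC only).
        High entropy: energy spread across many frequencies."""
        mag = np.abs(np.fft.fft2(image))
        p = mag / mag.sum()          # treat the spectrum as a distribution
        p = p[p > 0]                 # drop exact zeros before taking the log
        return float(-(p * np.log2(p)).sum())

    rng = np.random.default_rng(0)
    smooth = np.ones((32, 32))               # energy at DC only
    noisy = rng.standard_normal((32, 32))    # energy spread over frequencies
    print(spectral_entropy(smooth) < spectral_entropy(noisy))  # → True
    ```

    Under this proxy, perturbations that concentrate in high frequencies score very differently from natural image content, which is the kind of frequency gap the analysis above exploits.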
    Abstract- In this paper, we propose a robust l1 tracking method based on a two-phase sparse representation, which consists of a patch tracker and a global appearance tracker. While recently proposed l1 trackers have shown impressive tracking accuracy, tracking dynamic appearance is not easy for them. To overcome dynamic appearance change and achieve robust visual tracking, we model the dynamic appearance of the object by a set of local rigid patches and enhance the distinctiveness of the global appearance tracker by positive/negative learning. The integration of the two approaches makes visual tracking robust to occlusion and illumination variation. We demonstrate the experiments with five challenging video sequences and compare with state-of-the-art trackers. We show that the proposed method successfully handles occlusion, noise, scale, illumination, and appearance change of the object.
    Convolutional Neural Networks (CNNs) have become the de facto gold standard in computer vision applications in the past years. Recently, however, new model architectures have been proposed challenging the status quo. The Vision Transformer (ViT) relies solely on attention modules, while the MLP-Mixer architecture substitutes the self-attention modules with Multi-Layer Perceptrons (MLPs). Despite their great success, CNNs have been widely known to be vulnerable to adversarial attacks, causing serious concerns for security-sensitive applications. Thus, it is critical for the community to know whether the newly proposed ViT and MLP-Mixer are also vulnerable to adversarial attacks. To this end, we empirically evaluate their adversarial robustness under several adversarial attack setups and benchmark them against the widely used CNNs. Overall, we find that the two architectures, especially ViT, are more robust than their CNN counterparts. Using a toy example, we also provide empirical evidence ...
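    The attacks used in such robustness evaluations can be illustrated with the Fast Gradient Sign Method (FGSM) on a toy logistic model. The weights, input, and epsilon below are made-up; the paper benchmarks full ViT, MLP-Mixer, and CNN networks, not this two-weight classifier.

    ```python
    import numpy as np

    def fgsm(x, w, b, y, eps):
        """FGSM on a toy logistic model p = sigmoid(w.x + b): perturb x by
        eps in the sign of the cross-entropy gradient w.r.t. the input."""
        z = w @ x + b
        p = 1.0 / (1.0 + np.exp(-z))
        grad_x = (p - y) * w          # d(cross-entropy)/dx for label y
        return x + eps * np.sign(grad_x)

    w = np.array([2.0, -1.0]); b = 0.0
    x = np.array([1.0, 0.5])          # score 1.5 -> classified positive
    x_adv = fgsm(x, w, b, y=1.0, eps=1.0)
    print(w @ x + b > 0, w @ x_adv + b > 0)  # → True False
    ```

    The clean input is classified correctly while the perturbed input flips the decision; robustness benchmarks measure how often this flip happens across a test set and across attack strengths.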
    Data hiding is one widely used approach for protecting authentication and ownership. Most multimedia content, like images and videos, is transmitted or saved in compressed form. This kind of lossy compression, such as JPEG, can destroy the hidden data, which raises the need for robust data hiding. It is still an open challenge to achieve data hiding that is robust against these compressions. Recently, deep learning has shown great success in data hiding, while the non-differentiability of JPEG makes it challenging to train a deep pipeline for improving robustness against lossy compression. The existing SOTA approaches replace the non-differentiable parts with differentiable modules that perform similar operations. Multiple limitations exist: (a) large engineering effort; (b) a requirement for white-box knowledge of the compression attack; (c) applicability only to simple compression like JPEG. In this work, we propose a simple yet effective approach to address all the above limitations at onc...
    In this paper, we propose an efficient algorithm for robust place recognition and loop detection using camera information only. Our pipeline purely relies on spatial localization and semantic information of road markings. The creation of the database of road markings sequences is performed online, which makes the method applicable for real-time loop closure for visual SLAM techniques. Furthermore, our algorithm is robust to various weather conditions, occlusions from vehicles, and shadows. We have performed an extensive number of experiments which highlight the effectiveness and scalability of the proposed method.
    Neural networks have been shown to be effective in deep steganography for hiding a full image in another. However, the reason for this success remains not fully clear. Under the existing cover (C) dependent deep hiding (DDH) pipeline, it is challenging to analyze how the secret (S) image is encoded, since the encoded message cannot be analyzed independently. We propose a novel universal deep hiding (UDH) meta-architecture to disentangle the encoding of S from C. We perform extensive analysis and demonstrate that the success of deep steganography can be attributed to a frequency discrepancy between C and the encoded secret image. Despite S being hidden in a cover-agnostic manner, strikingly, UDH achieves a performance comparable to the existing DDH. Beyond hiding one image, we push the limits of deep steganography. Exploiting its property of being universal, we propose universal watermarking as a timely solution to address the concern of the exponentially increasing number of images and vide...
    With the emerging interest in autonomous vehicles (AV), the performance and reliability of land vehicle navigation are also becoming important. Generally, navigation systems for passenger cars have relied heavily on the existing Global Navigation Satellite System (GNSS) in recent decades. However, there are many cases in real-world driving where the satellite signals are challenged: for example, urban streets with buildings, tunnels, or even underpasses. In this paper, we propose a novel method for simultaneous vehicle dead reckoning, based on a lane detection model, in GNSS-denied situations. The proposed method fuses an Inertial Navigation System (INS) with a learning-based lane detection model to estimate the global position of the vehicle, and effectively bounds the error drift compared to standalone INS. The integration of the INS and the lane model is accomplished by a UKF to minimize linearization errors and computing time. The proposed method is evaluated through the real-vehicl...
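    The drift-bounding effect of fusing drifting dead reckoning with an absolute lane measurement can be sketched with a scalar Kalman filter. This is a linear stand-in for the paper's UKF, and all noise values, drift rates, and variable names below are made-up for illustration.

    ```python
    def kalman_update(x, P, z, R):
        """Scalar Kalman measurement update: fuse predicted lateral offset x
        (variance P) with a lane-detector measurement z (variance R)."""
        K = P / (P + R)               # Kalman gain
        return x + K * (z - x), (1 - K) * P

    # INS-only dead reckoning drifts 0.1 m per step; the lane detector keeps
    # measuring the true lateral offset of 0.
    x, P = 0.0, 0.04                  # state estimate and its variance
    Q, R = 0.01, 0.25                 # process and measurement noise variances
    for _ in range(50):
        x, P = x + 0.1, P + Q         # predict: apply the drifting INS motion
        x, P = kalman_update(x, P, 0.0, R)
    print(abs(x) < 1.0)  # → True: bounded, instead of drifting to 5 m
    ```

    Each lane measurement pulls the estimate back toward the true offset, so the error settles at a steady-state value set by Q and R rather than growing linearly as it would with standalone INS.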
    The intriguing phenomenon of adversarial examples has attracted significant attention in machine learning and what might be more surprising to the community is the existence of universal adversarial perturbations (UAPs), i.e. a single perturbation to fool the target DNN for most images. With the focus on UAP against deep classifiers, this survey summarizes the recent progress on universal adversarial attacks, discussing the challenges from both the attack and defense sides, as well as the reason for the existence of UAP. We aim to extend this work as a dynamic survey that will regularly update its content to follow new works regarding UAP or universal attack in a wide range of domains, such as image, audio, video, text, etc. Relevant updates will be discussed at: https://bit.ly/2SbQlLG. We welcome authors of future works in this field to contact us for including your new findings.

    And 134 more