Abstract
The potential of video surveillance can be further explored by using mobile cameras. Drone-mounted cameras at high altitude provide top views of a scene from a global perspective, while cameras worn by people on the ground provide first-person views of the same scene with more local detail. To relate these two views for collaborative analysis, we propose to localize the field of view of the first-person-view cameras in the global top view. This is a very challenging problem due to the large view difference and indeterminate camera motions. In this work, we explore the use of the sunlight direction as a bridge between the two views. Specifically, we design a shadow-direction-aware network that simultaneously locates the shadow vanishing point in the first-person view and the shadow direction in the top view. We then apply multi-view geometry to estimate the yaw and pitch angles of the first-person-view camera in the top view. We build a new synthetic dataset consisting of top-view and first-person-view image pairs for performance evaluation. Quantitative results on this synthetic dataset show the superiority of our method over existing methods: it achieves view-angle estimation errors of 1.61\(^{\circ }\) for the pitch angle and 15.13\(^{\circ }\) for the yaw angle. Qualitative results on real images also show the effectiveness of the proposed method.
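To illustrate the geometric step described above, the sketch below is a minimal reconstruction of how a shadow vanishing point in the first-person view, together with the shadow azimuth read from the top view, can yield camera pitch and yaw. It assumes a pinhole camera with known intrinsics K, zero camera roll, and flat ground, and uses an illustrative axis convention; it is our own hedged sketch, not the authors' implementation. The key fact is that all horizontal world directions vanish on the horizon line, so with zero roll the vanishing point's normalized image height alone fixes the pitch, after which its horizontal position, compared against the top-view shadow azimuth, fixes the yaw.

```python
import numpy as np

def view_angles_from_shadow(vp_uv, K, alpha_top):
    """Recover (pitch, yaw) of a first-person camera from the shadow
    vanishing point and the top-view shadow azimuth.

    A minimal sketch under simplifying assumptions (pinhole camera,
    zero roll, flat ground); names and conventions are illustrative.

    vp_uv     : (u, v) shadow vanishing point in the first-person image.
    K         : 3x3 camera intrinsic matrix.
    alpha_top : shadow azimuth in the top view, in radians,
                measured counter-clockwise from the world x axis.
    """
    # Back-project the vanishing point into a ray in camera coordinates
    # (x right, y down, z forward) and normalize so that z = 1.
    r = np.linalg.inv(K) @ np.array([vp_uv[0], vp_uv[1], 1.0])
    x_n, y_n = r[0] / r[2], r[1] / r[2]

    # Every horizontal world direction vanishes on the horizon line; with
    # zero roll the horizon height gives the pitch: y_n = -tan(pitch),
    # where pitch > 0 means the camera is tilted down toward the ground.
    pitch = np.arctan(-y_n)

    # The horizontal position of the vanishing point then fixes the yaw
    # relative to the top-view shadow azimuth. Note a vanishing point is
    # shared by a direction and its opposite, so the result is determined
    # only up to pi; the shadow's orientation resolves that ambiguity.
    yaw = alpha_top - np.arctan2(1.0, x_n * np.cos(pitch))
    return pitch, yaw
```

For example, a vanishing point above the principal point (negative normalized height after back-projection) yields a positive, downward pitch, consistent with a ground-level wearer whose camera sees the shadow receding toward the horizon.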
Acknowledgements
This work was supported in part by the NSFC under Grants U1803264 and 62072334.
Additional information
Communicated by Stefano Mattoccia
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Han, R., Gan, Y., Wang, L. et al. Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow. Int J Comput Vis 131, 1106–1121 (2023). https://doi.org/10.1007/s11263-022-01744-z