On-Device, Real-Time Hand Tracking with MediaPipe
August 19, 2019
Posted by Valentin Bazarevsky and Fan Zhang, Research Engineers, Google Research

https://research.google/blog/on-device-real-time-hand-tracking-with-mediapipe/

The ability to perceive the shape and motion of hands can be a vital component in
improving the user experience across a variety of technological domains and
platforms. For example, it can form the basis for sign language understanding and
hand gesture control, and can also enable the overlay of digital content and
information on top of the physical world in augmented reality. While coming naturally
to people, robust real-time hand perception is a decidedly challenging computer
vision task, as hands often occlude themselves or each other (e.g. finger/palm
occlusions and hand shakes) and lack high contrast patterns.

Today we are announcing the release of a new approach to hand perception, which
we previewed at CVPR 2019 in June, implemented in MediaPipe, an open source
cross-platform framework for building pipelines to process perceptual data of different
modalities, such as video and audio. This approach provides high-fidelity hand and
finger tracking by employing machine learning (ML) to infer 21 3D keypoints of a hand
from just a single frame. Whereas current state-of-the-art approaches rely primarily
on powerful desktop environments for inference, our method achieves real-time
performance on a mobile phone, and even scales to multiple hands. We hope that
providing this hand perception functionality to the wider research and development
community will result in an emergence of creative use cases, stimulating new
applications and new research avenues.

3D hand perception in real-time on a mobile phone via MediaPipe. Our solution uses machine
learning to compute 21 3D keypoints of a hand from a video frame. Depth is indicated in
grayscale.

An ML Pipeline for Hand Tracking and Gesture Recognition
Our hand tracking solution utilizes an ML pipeline consisting of several models
working together:
- A palm detector model (called BlazePalm) that operates on the full image and
  returns an oriented hand bounding box.
- A hand landmark model that operates on the cropped image region defined by
  the palm detector and returns high fidelity 3D hand keypoints.
- A gesture recognizer that classifies the previously computed keypoint
  configuration into a discrete set of gestures.

This architecture is similar to that employed by our recently published face mesh ML
pipeline and that others have used for pose estimation. Providing the accurately
cropped palm image to the hand landmark model drastically reduces the need for
data augmentation (e.g. rotations, translation and scale) and instead allows the
network to dedicate most of its capacity towards coordinate prediction accuracy.
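
To make the role of the oriented bounding box concrete, here is a minimal Python sketch of how a rotated square crop could be described and how an image point could be mapped into crop-normalized coordinates. The function names and the (center, size, angle) parameterization are illustrative assumptions, not the representation used by the released models.

```python
import math


def oriented_box_corners(cx, cy, size, angle_rad):
    """Corners of a square box of side `size`, centered at (cx, cy), rotated by angle_rad."""
    half = size / 2.0
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    corners = []
    for dx, dy in [(-half, -half), (half, -half), (half, half), (-half, half)]:
        # Rotate the offset, then translate to the box center.
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return corners


def to_crop_coords(px, py, cx, cy, size, angle_rad):
    """Map an image point into the crop's own [0, 1] x [0, 1] frame (hypothetical helper)."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    dx, dy = px - cx, py - cy
    # Undo the box rotation, then scale so the box spans the unit square.
    u = (dx * cos_a + dy * sin_a) / size + 0.5
    v = (-dx * sin_a + dy * cos_a) / size + 0.5
    return u, v
```

Because the crop undoes the rotation and scale of the detected palm, the landmark model sees hands in a roughly canonical frame, which is the reason the text gives for needing less rotation, translation and scale augmentation.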

Hand perception pipeline overview.

BlazePalm: Realtime Hand/Palm Detection

To detect initial hand locations, we employ a single-shot detector model called
BlazePalm, optimized for mobile real-time uses in a manner similar to BlazeFace,
which is also available in MediaPipe. Detecting hands is a decidedly complex task: our
model has to work across a variety of hand sizes with a large scale span (~20x)
relative to the image frame and be able to detect occluded and self-occluded hands.
Whereas faces have high contrast patterns, e.g., in the eye and mouth region, the lack
of such features in hands makes it comparatively difficult to detect them reliably from
their visual features alone. Instead, providing additional context, like arm, body, or
person features, aids accurate hand localization.

Our solution addresses the above challenges using different strategies. First, we train
a palm detector instead of a hand detector, since estimating bounding boxes of rigid
objects like palms and fists is significantly simpler than detecting hands with
articulated fingers. In addition, as palms are smaller objects, the non-maximum
suppression algorithm works well even for two-hand self-occlusion cases, like
handshakes. Moreover, palms can be modelled using square bounding boxes (anchors
in ML terminology), ignoring other aspect ratios and thereby reducing the number of
anchors by a factor of 3-5. Second, an encoder-decoder feature extractor is used for
bigger scene context awareness even for small objects (similar to the RetinaNet
approach). Lastly, we minimize the focal loss during training to support the large
number of anchors resulting from the high scale variance.
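
The anchor saving is simple arithmetic. The sketch below counts anchors for a detector that places one anchor per aspect ratio at every cell of a few feature-map grids. The grid sizes and aspect-ratio sets are hypothetical and chosen only for illustration; they are not BlazePalm's actual configuration.

```python
def count_anchors(grid_sizes, aspect_ratios):
    """Total anchors when every grid cell gets one anchor per aspect ratio."""
    return sum(g * g * len(aspect_ratios) for g in grid_sizes)


# Hypothetical feature-map resolutions (cells per side).
GRID_SIZES = [32, 16, 8]

multi_ratio = count_anchors(GRID_SIZES, aspect_ratios=[0.5, 1.0, 2.0])
square_only = count_anchors(GRID_SIZES, aspect_ratios=[1.0])

print(multi_ratio, square_only, multi_ratio / square_only)  # 4032 1344 3.0
```

With three aspect ratios the square-only layout is 3x smaller, and with five it would be 5x smaller, which matches the 3-5 range quoted above.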

With the above techniques, we achieve an average precision of 95.7% in palm
detection. Using a regular cross entropy loss and no decoder gives a baseline of just
86.22%.
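
For readers unfamiliar with the focal loss mentioned above, its usual binary form is -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The following Python sketch compares it with plain cross entropy. The gamma and alpha defaults are the commonly cited values from the RetinaNet work, and are an assumption here since this post does not state the values used for BlazePalm.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Cross entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss: cross entropy down-weighted for well-classified examples."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of easy examples toward zero.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

An easy negative such as `binary_focal_loss(0.01, 0)` contributes far less than under cross entropy, so the many background anchors do not swamp the few anchors that actually cover a palm.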

Hand Landmark Model

After the palm detection over the whole image, our subsequent hand landmark model
performs precise keypoint localization of 21 3D hand-knuckle coordinates inside the
detected hand regions via regression, that is, direct coordinate prediction. The model
learns a consistent internal hand pose representation and is robust even to partially
visible hands and self-occlusions.

To obtain ground truth data, we have manually annotated ~30K real-world images
with 21 3D coordinates, as shown below (we take the Z-value from the image depth
map, if it exists for the corresponding coordinate). To better cover the possible hand
poses and provide additional supervision on the nature of hand geometry, we also
render a high-quality synthetic hand model over various backgrounds and map it to
the corresponding 3D coordinates.

Top: Aligned hand crops passed to the tracking network with ground truth annotation. Bottom:
Rendered synthetic hand images with ground truth annotation.

However, purely synthetic data generalizes poorly to the in-the-wild domain. To
overcome this problem, we utilize a mixed training schema. A high-level model training
diagram is presented in the following figure.

Mixed training schema for hand tracking network. Cropped real-world photos and rendered
synthetic images are used as input to predict 21 3D keypoints.

The table below summarizes regression accuracy depending on the nature of the
training data. Using both synthetic and real-world data results in a significant
performance boost.

Dataset                         Mean regression error normalized by palm size
Only real-world                 16.1%
Only rendered synthetic         25.7%
Mixed real-world + synthetic    13.4%
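
A metric of this kind could be computed roughly as follows. This is a sketch under assumptions: the post does not spell out the distance measure or how palm size is obtained, so the code uses the mean Euclidean distance over keypoints and takes `palm_size` as a caller-supplied value.

```python
import math


def normalized_mean_error(predicted, ground_truth, palm_size):
    """Mean Euclidean keypoint error divided by palm size (assumed definition)."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("keypoint lists must be non-empty and of equal length")
    if palm_size <= 0:
        raise ValueError("palm_size must be positive")
    total = sum(math.dist(p, t) for p, t in zip(predicted, ground_truth))
    return total / len(predicted) / palm_size


# Two toy keypoints: one exact, one off by 2 units along z, palm size 10.
error = normalized_mean_error([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)], 10.0)
print(f"{error:.1%}")  # 10.0%
```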

Gesture Recognition
On top of the predicted hand skeleton, we apply a simple algorithm to derive the
gestures. First, the state of each finger, e.g. bent or straight, is determined by the
accumulated angles of joints. Then we map the set of finger states to a set of pre-
defined gestures. This straightforward yet effective technique allows us to estimate
basic static gestures with reasonable quality. The existing pipeline supports counting
gestures from multiple cultures, e.g. American, European, and Chinese, and various
hand signs including “Thumb up”, closed fist, “OK”, “Rock”, and “Spiderman”.
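
The two steps above (finger states from accumulated joint angles, then a lookup from finger states to gestures) can be sketched in Python as shown below. The landmark index layout, the 60 degree threshold and the small gesture table are all assumptions made for illustration; the post does not give the values or the mapping used in the released pipeline.

```python
import math

# Assumed landmark indices per finger, ordered from base to tip.
FINGERS = {
    "thumb": (1, 2, 3, 4),
    "index": (5, 6, 7, 8),
    "middle": (9, 10, 11, 12),
    "ring": (13, 14, 15, 16),
    "pinky": (17, 18, 19, 20),
}
BENT_THRESHOLD_DEG = 60.0  # hypothetical cut-off, not from the post

# Hypothetical lookup: set of straight fingers -> gesture label.
GESTURES = {
    frozenset(): "closed fist",
    frozenset({"thumb"}): "Thumb up",
    frozenset(FINGERS): "open hand",
}


def angle_deg(u, v):
    """Angle in degrees between two 3D vectors (0 for zero-length input)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    cos = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def accumulated_bend(points):
    """Sum of the angles between consecutive bone vectors along a finger."""
    bones = [tuple(b - a for a, b in zip(p, q))
             for p, q in zip(points, points[1:])]
    return sum(angle_deg(u, v) for u, v in zip(bones, bones[1:]))


def finger_states(landmarks):
    """Map each finger name to True if bent, False if straight."""
    return {
        name: accumulated_bend([landmarks[i] for i in idx]) > BENT_THRESHOLD_DEG
        for name, idx in FINGERS.items()
    }


def classify_gesture(landmarks):
    """Look up the gesture for the current set of straight fingers."""
    states = finger_states(landmarks)
    straight = frozenset(name for name, bent in states.items() if not bent)
    return GESTURES.get(straight, "unknown")
```

`classify_gesture` expects a sequence of 21 (x, y, z) tuples. Because it only looks at angles between bones, the result does not depend on the position or scale of the hand in the image.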

Implementation via MediaPipe

With MediaPipe, this perception pipeline can be built as a directed graph of modular
components, called Calculators. MediaPipe comes with an extendable set of
Calculators to solve tasks like model inference, media processing algorithms, and
data transformations across a wide variety of devices and platforms. Individual
calculators like cropping, rendering and neural network computations can be
performed exclusively on the GPU. For example, we employ TFLite GPU inference on
most modern phones.
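
The idea of a pipeline as a directed graph of modular components can be illustrated with a toy Python example. This is not MediaPipe's API or graph format: the `run_graph` helper and the node names are hypothetical, and the sketch ignores streaming, timestamps and GPU execution entirely.

```python
from graphlib import TopologicalSorter


def run_graph(calculators, inputs):
    """Run a toy graph. calculators: name -> (function, [upstream names])."""
    dependencies = {name: set(srcs) for name, (_, srcs) in calculators.items()}
    values = dict(inputs)
    # Visit nodes so that every node runs after the nodes it depends on.
    for name in TopologicalSorter(dependencies).static_order():
        if name in values:  # graph input, nothing to compute
            continue
        function, sources = calculators[name]
        values[name] = function(*(values[s] for s in sources))
    return values


# Hypothetical three-node pipeline operating on a list instead of an image.
toy_graph = {
    "crop": (lambda frame: frame[1:3], ["frame"]),
    "landmarks": (lambda crop: [x * 2 for x in crop], ["crop"]),
    "render": (lambda frame, lms: (frame, lms), ["frame", "landmarks"]),
}
outputs = run_graph(toy_graph, {"frame": [1, 2, 3, 4]})
print(outputs["landmarks"])  # [4, 6]
```

Each node only declares what it consumes, so nodes can be swapped or reused in other graphs without touching the rest of the pipeline.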

Our MediaPipe graph for hand tracking is shown below. The graph consists of two
subgraphs—one for hand detection and one for hand keypoints (i.e., landmark)
computation. One key optimization MediaPipe provides is that the palm detector is
only run as necessary (fairly infrequently), saving significant computation time. We
achieve this by inferring the hand location in the subsequent video frames from the
computed hand keypoints in the current frame, eliminating the need to run the palm
detector over each frame. For robustness, the hand tracker model outputs an
additional scalar capturing the confidence that a hand is present and reasonably
aligned in the input crop. Only when the confidence falls below a certain threshold is
the hand detection model reapplied to the whole frame.
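
The control flow described above can be sketched as a plain Python loop. The three callables, the 0.5 threshold and the choice to re-run detection on the frame after a low-confidence result are assumptions made for illustration; in the actual solution this logic is expressed through the MediaPipe graph rather than a loop like this.

```python
def track_hand(frames, detect_palm, predict_landmarks, crop_from_landmarks,
               threshold=0.5):
    """Return per-frame landmarks and how many times the detector ran.

    detect_palm(frame) -> crop or None
    predict_landmarks(frame, crop) -> (landmarks, confidence)
    crop_from_landmarks(landmarks) -> crop for the next frame
    """
    crop = None
    detector_runs = 0
    results = []
    for frame in frames:
        if crop is None:
            # No usable crop: run the (expensive) detector on the full frame.
            crop = detect_palm(frame)
            detector_runs += 1
            if crop is None:
                results.append(None)
                continue
        landmarks, confidence = predict_landmarks(frame, crop)
        if confidence < threshold:
            # Hand lost or misaligned: drop the crop so detection runs again.
            crop = None
            results.append(None)
        else:
            results.append(landmarks)
            # Derive the next frame's crop from the current landmarks.
            crop = crop_from_landmarks(landmarks)
    return results, detector_runs
```

As long as the landmark model stays confident, the detector runs once and every later frame reuses a crop derived from the previous frame's keypoints.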

The hand landmark model’s output (REJECT_HAND_FLAG) controls when the hand detection
model is triggered. This behavior is achieved by MediaPipe’s powerful synchronization building
blocks, resulting in high performance and optimal throughput of the ML pipeline.

A highly efficient ML solution that runs in real-time and across a variety of different
platforms and form factors involves significantly more complexities than what the
above simplified description captures. To this end, we are open sourcing the above
hand tracking and gesture recognition pipeline in the MediaPipe framework,
accompanied by the relevant end-to-end usage scenario and source code, here.
This provides researchers and developers with a complete stack for experimentation
and prototyping of novel ideas based on our model.

Future Directions
We plan to extend this technology with more robust and stable tracking, increase the
number of gestures we can reliably detect, and support dynamic gestures unfolding
in time. We believe that publishing this technology can give an impulse to new
creative ideas and applications by the members of the research and developer
community at large. We are excited to see what you can build with it!

Acknowledgements
Special thanks to all our team members who worked on the tech with us: Andrey
Vakunov, Andrei Tkachenka, Yury Kartynnik, Artsiom Ablavatski, Ivan Grishchenko,
Kanstantsin Sokal, Buck Bourdon, Mogan Shieh, Ming Guang Yong, Anastasia Tkach,
Jonathan Taylor, Sean Fanello, Sofien Bouaziz, Juhyun Lee, Chris McClanahan,
Jiuqiang Tang, Esha Uboweja, Hadon Nash, Camillo Lugaresi, Michael Hays, Chuo-Ling
Chang, Matsvei Zhdanovich and Matthias Grundmann.

Labels:

Machine Intelligence
Machine Perception
Mobile Systems
