Real-time hand shape and motion trackers are an invaluable part of sign language recognition and gesture control systems, not to mention many augmented reality experiences. But they’re often hobbled by occlusion and a lack of contrast patterns, which keeps them from performing reliably.
Those challenges and others motivated scientists at Google to investigate a new computer vision approach to hand perception, one bolstered by machine learning. They say that in experiments, it managed to infer up to 21 3D points of a hand (or of multiple hands) on a mobile phone from just a single frame.
Google previewed the new technique at the 2019 Conference on Computer Vision and Pattern Recognition in June and recently implemented it in MediaPipe, a cross-platform framework for building applied machine learning pipelines that process perceptual data of different modalities (such as video and audio). Both the source code and an end-to-end usage scenario are available on GitHub.