HandOS: 3D Hand Reconstruction in One Stage



Peking University, IDEA Research, University of Chinese Academy of Sciences
This work was done during Xingyu Chen's academic visit at IDEA Research and while Zhuheng Song was an intern at IDEA Research.

🛠️ Application of HandOS in real-world scenes


🧬 What is HandOS

We present HandOS, a one-stage approach to hand reconstruction that substantially streamlines the conventional multi-stage paradigm. We also demonstrate that HandOS adapts well to diverse, complex scenarios, which makes it well suited to real-world use.

👣 A brief overview

Existing approaches to hand reconstruction predominantly follow a multi-stage framework comprising detection, left-right classification, and pose estimation. This paradigm introduces redundant computation and accumulates errors across stages. In this work, we propose HandOS, an end-to-end framework for 3D hand reconstruction. Our central idea is to use a frozen detector as the foundation and to add auxiliary modules for 2D and 3D keypoint estimation. In this way, we integrate pose estimation into the detection framework and remove the need for the left-right category as a prerequisite. Specifically, we propose an interactive 2D-3D decoder, in which 2D joint semantics are derived from detection cues and the 3D representation is lifted from that of the 2D joints. Furthermore, hierarchical attention is designed to model 2D joints, 3D vertices, and camera translation concurrently. As a result, hand detection, 2D pose estimation, and 3D mesh reconstruction are integrated end to end within a one-stage framework, which overcomes the drawbacks of the multi-stage paradigm described above. HandOS also achieves state-of-the-art performance on public benchmarks.
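To illustrate why camera translation is modeled together with 2D joints and 3D vertices, the sketch below shows how a translation vector ties root-relative 3D points to 2D image coordinates under a standard pinhole camera. This is a minimal sketch under stated assumptions, not the HandOS camera model: the function name `project_points`, the single focal length, and the principal point are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_points(
    points_3d: List[Point3D],
    translation: Point3D,
    focal_length: float,
    principal_point: Point2D,
) -> List[Point2D]:
    """Project root-relative 3D points to pixels with a pinhole camera.

    Assumption: one focal length for both axes and no lens distortion.
    """
    tx, ty, tz = translation
    cx, cy = principal_point
    projected = []
    for x, y, z in points_3d:
        depth = z + tz  # depth of the point in the camera frame
        if depth <= 0:
            raise ValueError("point lies on or behind the camera plane")
        u = focal_length * (x + tx) / depth + cx
        v = focal_length * (y + ty) / depth + cy
        projected.append((u, v))
    return projected


if __name__ == "__main__":
    # A point at the hand root, half a metre in front of the camera,
    # lands on the principal point.
    print(project_points([(0.0, 0.0, 0.0)], (0.0, 0.0, 0.5), 500.0, (100.0, 100.0)))
```

Under this model, a 2D joint estimate and a 3D joint estimate of the same hand constrain each other only through the translation, which is one plausible reading of why the three quantities benefit from being estimated jointly.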

Overview of the HandOS framework. Left: overall architecture. Right: interactive 2D-3D decoder. Given off-the-shelf features, bounding boxes, and category scores from a frozen detector, the interactive 2D-3D decoder (comprising query filtering, expansion, lifting, and interactive layers) infers hand pose and shape by estimating keypoints in both 2D and 3D space.
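The order of the filtering, expansion, and lifting steps named in the caption can be pictured with a toy data-flow sketch. Everything here is a hypothetical stand-in: in HandOS these steps are learned modules operating on detector features, whereas the code below only manipulates coordinates to show how one detection becomes a set of per-joint 2D entries and then 3D entries. The 21-joint count, the score threshold, the box-center initialization, and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_JOINTS = 21  # assumption: the common 21-keypoint hand skeleton


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    score: float  # hand-category confidence from the frozen detector


def filter_queries(dets: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Query filtering (toy): keep detections whose score passes a threshold."""
    return [d for d in dets if d.score >= threshold]


def expand_to_joint_anchors(det: Detection) -> List[Tuple[float, float]]:
    """Query expansion (toy): one detection becomes NUM_JOINTS 2D anchors.

    Every anchor starts at the box center; a learned decoder would refine them.
    """
    x0, y0, x1, y1 = det.box
    center = ((x0 + x1) / 2.0, (y0 + y1) / 2.0)
    return [center for _ in range(NUM_JOINTS)]


def lift_to_3d(joints_2d: List[Tuple[float, float]]) -> List[Tuple[float, float, float]]:
    """Query lifting (toy): attach a placeholder depth to each 2D entry."""
    return [(x, y, 0.0) for x, y in joints_2d]


if __name__ == "__main__":
    detections = [
        Detection(box=(0.0, 0.0, 10.0, 20.0), score=0.9),
        Detection(box=(5.0, 5.0, 15.0, 15.0), score=0.2),
    ]
    for det in filter_queries(detections):
        joints_3d = lift_to_3d(expand_to_joint_anchors(det))
        print(len(joints_3d), joints_3d[0])
```

Note that no left-right label appears anywhere in this flow, which mirrors the claim that the one-stage design does not need the left-right category as a prerequisite.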

💯 Performance

Evaluated on four benchmarks, HandOS achieves state-of-the-art performance (green and light green indicate the best and second-best results, respectively).
Even compared with methods trained on large-scale data (i.e., HaMeR and Hamba), HandOS remains competitive.

🗓 Future work and collaboration opportunities

1. End-to-end hand-object representation.

2. HandOS for robot data collection and annotation.

3. Hand skill learning from internet-scale data.

If you are interested in collaborating, feel free to contact us!

BibTeX


        @article{bib:handos,
          title={HandOS: 3D Hand Reconstruction in One Stage},
          author={Chen, Xingyu and Song, Zhuheng and Jiang, Xiaoke and Hu, Yaoqing and Yu, Junzhi and Zhang, Lei},
          journal={arXiv preprint},
          year={2024}
        }