AdaDexGrasp: Learning Adaptive Dexterous Grasping from Single Demonstrations

Abstract

How can robots learn dexterous grasping skills efficiently and apply them adaptively based on user instructions? This work tackles two key challenges: efficient skill acquisition from limited human demonstrations and context-driven skill selection. We introduce AdaDexGrasp, a framework that learns a library of grasping skills from a single human demonstration per skill and selects the most suitable one using a vision-language model (VLM). To improve sample efficiency, we propose a trajectory following reward that guides reinforcement learning (RL) toward states close to a human demonstration while allowing flexibility in exploration. To learn beyond the single demonstration, we employ curriculum learning, progressively increasing object pose variations to enhance robustness. At deployment, a VLM retrieves the appropriate skill based on user instructions, bridging low-level learned skills with high-level intent. We evaluate AdaDexGrasp in both simulation and real-world settings, showing that our approach significantly improves RL efficiency and enables learning human-like grasp strategies across varied object configurations. Finally, we demonstrate zero-shot transfer of our learned policies to a real-world PSYONIC Ability Hand, with a 90% success rate across objects, significantly outperforming the baseline.

Method Overview

AdaDexGrasp consists of three key components: (i) converting human videos into robot demonstrations via hand motion retargeting and a closed-loop inverse kinematics algorithm; (ii) learning a grasp policy through trajectory-guided reinforcement learning with a curriculum, using a reward that combines a trajectory following reward, a contact reward, and a height reward; (iii) selecting skills with a vision-language model to ground robot behavior in user preferences.
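To make the three-term reward in (ii) concrete, here is a minimal sketch of how such terms might be combined. This page does not give the exact formulas, state representation, or weights. The exponential distance kernel, the fingertip-count contact term, the 0.1 m lift target, and the names RewardWeights, trajectory_term, contact_term, height_term, and total_reward are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RewardWeights:
    # Hypothetical weights chosen for illustration only, not the authors' values.
    traj: float = 1.0
    contact: float = 0.5
    height: float = 2.0


def trajectory_term(hand_state: Sequence[float], ref_state: Sequence[float],
                    scale: float = 10.0) -> float:
    # Dense shaping in (0, 1]: equals 1 when the hand matches the reference state.
    return math.exp(-scale * math.dist(hand_state, ref_state))


def contact_term(num_contacts: int, num_fingertips: int = 5) -> float:
    # Fraction of fingertips touching the object (the Ability Hand has five digits).
    return min(num_contacts, num_fingertips) / num_fingertips


def height_term(object_z: float, initial_z: float, target_lift: float = 0.1) -> float:
    # Rewards lifting the object; saturates once it is target_lift metres above its start.
    lift = max(0.0, object_z - initial_z)
    return min(lift / target_lift, 1.0)


def total_reward(hand_state, ref_state, num_contacts, object_z, initial_z,
                 w: Optional[RewardWeights] = None) -> float:
    # Weighted sum of the three assumed terms.
    if w is None:
        w = RewardWeights()
    return (w.traj * trajectory_term(hand_state, ref_state)
            + w.contact * contact_term(num_contacts)
            + w.height * height_term(object_z, initial_z))
```

Setting one weight to zero, for example RewardWeights(contact=0.0), is a simple way to mimic removing a term, as in the ablation study of reward terms.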

Real-World Results

Diverse Grasp Poses: AdaDexGrasp can dynamically select the most suitable grasp pose for an object when faced with significantly different placement orientations of the same item. (Video played at 3× speed)

Diverse Object Pose Configurations: Even with small shifts or adjustments in object placement for similar items, AdaDexGrasp consistently succeeds in executing a reliable grasp. (Video played at 3× speed)

Experiments

Main Result

Quantitative results in simulation: We evaluate our policies, each trained on a different grasp pose from a distinct human demonstration, across objects with varying visual appearances, sizes, and physical properties. We also compare our method with ViViDex. ViViDex proves sensitive to the morphological gap between the human hand and the robot hand, whereas our reward function remains robust to it.

Analysis of Trajectory Following vs. Trajectory Mapping Reward

We compare the proposed trajectory following reward with a standard trajectory mapping reward.
Experiments on the sugar pose-2 scenario show that our reward guides the robot to achieve a pre-grasp pose closer to the human demonstration. Unlike strict trajectory mapping, our method allows flexibility in execution, enabling the robot to refine the grasp for better success.
Even if fingertip distances temporarily increase after grasping, the policy achieves a more effective grasp overall. This shows that strict adherence to reference trajectories is not always optimal, as human demonstrations are imperfect and motion retargeting introduces errors. By allowing controlled deviation, the trajectory following reward balances fidelity to demonstration with adaptability, resulting in more robust grasping across diverse objects.
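This page does not spell out either reward's formula, so the following is only a sketch of the distinction under stated assumptions. mapping_reward is our stand-in for a time-indexed "trajectory mapping" reward that compares step t with waypoint t. FollowingReward is a hypothetical progress-based variant in the spirit of "trajectory following": it chases the next unreached waypoint and advances only when the state comes within a tolerance tol. The waypoint pointer, tol, and the exponential kernel are our assumptions, not the authors' definition.

```python
import math
from typing import List, Sequence

State = Sequence[float]


def mapping_reward(state: State, reference: Sequence[State], t: int,
                   scale: float = 10.0) -> float:
    # Time-indexed tracking: step t is compared with waypoint t (clamped at the end).
    ref = reference[min(t, len(reference) - 1)]
    return math.exp(-scale * math.dist(state, ref))


class FollowingReward:
    """Progress-based tracking (hypothetical): chase the next unreached waypoint."""

    def __init__(self, reference: Sequence[State], tol: float = 0.02, scale: float = 10.0):
        self.reference: List[State] = list(reference)
        self.tol = tol
        self.scale = scale
        self.idx = 0  # index of the waypoint currently being chased

    def __call__(self, state: State) -> float:
        last = len(self.reference) - 1
        # Advance past consecutive waypoints that the state already lies within tol of.
        while self.idx < last and math.dist(state, self.reference[self.idx]) < self.tol:
            self.idx += 1
        closeness = math.exp(-self.scale * math.dist(state, self.reference[self.idx]))
        progress = self.idx / max(last, 1)  # fraction of the demonstration completed
        return progress + closeness
```

With this design, a policy that moves more slowly than the human is not penalized for lagging behind the clock: it keeps earning closeness reward toward the waypoint it has not yet reached. mapping_reward, by contrast, scores it against wherever the demonstration was at step t.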

Illustration of pre-grasp and final grasp states

Comparison between the proposed trajectory following reward and a standard trajectory mapping reward

Ablation Study of Reward Terms

Ablation study in simulation: We conduct ablation studies across multiple grasping environments to evaluate the contribution of each reward term. Removing the trajectory following reward drastically reduces success, omitting the contact reward impairs hand-object interaction, and removing the height reward weakens guidance toward the target poses. Overall, the full pipeline achieves the best performance, highlighting the importance of each component.

Effects of Curriculum Learning

Impact of Curriculum Learning: Curriculum learning improves training efficiency by gradually increasing object pose variability while maintaining high success rates. Policies trained with a structured curriculum converge faster and achieve higher success than direct training, demonstrating that progressive exposure to task difficulty enhances reinforcement learning performance and sample efficiency.
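One common way to implement "gradually increasing object pose variability" is a success-gated schedule: start from the demonstrated pose and widen the randomization range only once the policy succeeds often enough. The sketch below assumes that design; the class name PoseCurriculum, the number of levels, the 0.8 promotion threshold, the evaluation window, and the final position and yaw ranges are hypothetical, not values from the paper.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PoseCurriculum:
    """Hypothetical success-gated curriculum over object pose randomization."""
    max_pos_range: float = 0.10          # assumed final +/- xy offset, metres
    max_yaw_range: float = math.pi / 2   # assumed final +/- yaw, radians
    num_levels: int = 5                  # level 0 = demonstrated pose only
    promote_threshold: float = 0.8       # success rate needed to advance
    window: int = 100                    # episodes per evaluation window
    level: int = 0
    _results: List[bool] = field(default_factory=list, repr=False)

    def ranges(self) -> Tuple[float, float]:
        # Ranges grow linearly with the curriculum level.
        frac = self.level / max(self.num_levels - 1, 1)
        return frac * self.max_pos_range, frac * self.max_yaw_range

    def sample_pose(self, rng: random.Random) -> Tuple[float, float, float]:
        # (dx, dy, dyaw) offset applied to the demonstrated object pose.
        pos_r, yaw_r = self.ranges()
        return (rng.uniform(-pos_r, pos_r), rng.uniform(-pos_r, pos_r),
                rng.uniform(-yaw_r, yaw_r))

    def report(self, success: bool) -> None:
        # Record one episode; after each full window, promote if successful enough.
        self._results.append(success)
        if len(self._results) >= self.window:
            rate = sum(self._results) / len(self._results)
            if rate >= self.promote_threshold and self.level < self.num_levels - 1:
                self.level += 1
            self._results.clear()
```

Because level 0 reproduces the single demonstration exactly, the policy first learns the demonstrated grasp and only then faces harder pose variations.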

Evaluating Adaptive Skill Selection

The percentage of times each skill is selected for a given task.

Overall success rates of skill selection and execution.

We obtain three distinct skills:
Skill 1: Grasp the bottom of a standing bottle and lift it.
Skill 2: Grasp the upper middle of a standing bottle and lift it.
Skill 3: Grasp a lying bottle, rotate it upright, and lift it.

We define five tasks that vary based on the object's initial pose and the given human preference:
T1: The cleanser is initially standing, and the human asks the robot to grasp it with no preference.
T2: The cleanser is initially standing, and the human asks the robot to grasp it by the bottom.
T3: The cleanser is initially standing, and the human asks the robot to grasp it by the top.
T4: The cleanser is initially lying, and the human asks the robot to grasp it with no preference.
T5: The cleanser is initially lying, and the human asks the robot to grasp it by the bottom.
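The exact VLM, prompt, and image input used for skill selection are not described here. The sketch below only illustrates the surrounding plumbing under that caveat: building a text prompt that lists the skill library alongside the user instruction, parsing a skill id out of the model's free-form reply, and the random-selection baseline. The SKILLS dictionary, the prompt wording, and the helpers build_prompt, parse_skill, and random_skill are hypothetical; the actual VLM call is deliberately left out.

```python
import random
import re
from typing import Dict, Optional

# Hypothetical skill library mirroring the three skills listed above.
SKILLS: Dict[int, str] = {
    1: "Grasp the bottom of a standing bottle and lift it.",
    2: "Grasp the upper middle of a standing bottle and lift it.",
    3: "Grasp a lying bottle, rotate it upright, and lift it.",
}


def build_prompt(instruction: str, skills: Dict[int, str] = SKILLS) -> str:
    # Text part of the query; a scene image would be sent alongside it.
    lines = ["You control a dexterous robot hand. An image of the scene is attached.",
             f"User instruction: {instruction}",
             "Available skills:"]
    lines += [f"{k}. {desc}" for k, desc in sorted(skills.items())]
    lines.append("Reply with the number of the single most suitable skill.")
    return "\n".join(lines)


def parse_skill(reply: str, skills: Dict[int, str] = SKILLS) -> Optional[int]:
    # Return the first number in the reply that names a known skill, else None.
    for token in re.findall(r"\d+", reply):
        if int(token) in skills:
            return int(token)
    return None


def random_skill(rng: random.Random, skills: Dict[int, str] = SKILLS) -> int:
    # Baseline: pick a skill uniformly at random, ignoring instruction and scene.
    return rng.choice(sorted(skills))
```

Returning None on an unparseable reply lets the caller re-query the model instead of executing an arbitrary skill.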

Conclusion: The VLM reliably selects the appropriate skill according to human intent and environmental context, achieving high success across tasks. In contrast, random selection often fails to choose the correct skill.

BibTeX

@misc{shi2025learningadaptivedexterousgrasping,
    title={Learning Adaptive Dexterous Grasping from Single Demonstrations}, 
    author={Liangzhi Shi and Yulin Liu and Lingqi Zeng and Bo Ai and Zhengdong Hong and Hao Su},
    year={2025},
    eprint={2503.20208},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2503.20208}, 
}