Safe MPC Alignment with Human Directional Feedback

1Arizona State University, 2Northwestern University,
3University of Pennsylvania
The holistic framework of the proposed method.

The proposed method uses human directional feedback to align the robot's MPC controller with the human's implicit safety requirements.

Abstract

In safety-critical robot control or planning, specifying safety constraints manually or learning them from demonstrations can be challenging. In this paper, we propose an alignment method that enables a robot to learn a safety constraint in its model predictive control (MPC) policy from human directional feedback online. The method is based on the empirical observation that human directional feedback, when available, tends to guide the robot toward safer regions. The proposed method requires only the direction of human feedback to update the learning hypothesis space. It is certifiable: it either provides an upper bound on the total number of human corrections needed for successful learning of the safety constraint, or declares misspecification of the hypothesis space, i.e., that the true implicit safety constraint cannot be found within the specified hypothesis space. We evaluated the proposed method in numerical examples and in user studies on two developed simulation games. Additionally, we implemented and tested it on a real-world Franka robot arm performing a water-pouring task in a user study. The simulation and experimental results demonstrate the efficacy and efficiency of our method, showing that it enables a robot to successfully learn safety constraints from only tens of human directional corrections.
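To convey the intuition behind using only the direction of a correction to shrink the hypothesis space, here is a minimal cutting-plane-style sketch. It is illustrative only, not the paper's exact update rule: we assume a hypothetical constraint family g_theta(x) = theta @ x**2 <= 1 over a finite candidate set of parameters, and we assume each human correction points in a descent direction of the true (hidden) constraint, so every correction carves away the candidates it contradicts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical constraint family (an assumption for this sketch):
# g_theta(x) = theta @ x**2 <= 1, with theta in a finite candidate set.
candidates = rng.normal(size=(2000, 2))
theta_true = np.array([1.0, 0.5])  # hidden "true" constraint (assumed)

def grad_g(theta, x):
    # Gradient of g_theta at x: d/dx (theta @ x**2) = 2 * theta * x.
    return 2.0 * theta * x

def halfspace_cut(candidates, x, d):
    # A directional correction d at state x is assumed to point toward
    # safer states, i.e., it decreases the true constraint value.
    # Keep only hypotheses for which d is also a descent direction;
    # each correction thus removes a halfspace of candidates.
    keep = (grad_g(candidates, x) @ d) < 0
    return candidates[keep]

n_before = len(candidates)
for _ in range(10):
    x = rng.normal(size=2)              # a visited state
    d = -grad_g(theta_true, x)          # simulated human correction
    candidates = halfspace_cut(candidates, x, d)

# The candidate set shrinks with each correction while the true
# parameter always survives the cuts.
print(n_before, "->", len(candidates))
```

Because every correction only rules hypotheses in or out (no magnitude information is used), the surviving set shrinks monotonically, which is the kind of behavior that underlies a finite-correction certificate: if the set ever becomes empty, no hypothesis in the family is consistent with the feedback, signaling misspecification.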

Contribution

(I) We propose a method for safe MPC alignment with human directional feedback. The method enables a robot to learn a safe MPC policy online from human directional corrections. To our knowledge, this is the first work on online interactive learning of safety constraints from human feedback.

(II) We theoretically establish the human-data efficiency of the proposed safe MPC alignment method: it is guaranteed to converge exponentially within a finite number of human corrections. Furthermore, we prove that the algorithm can identify misspecification of the hypothesis space when learning fails.

(III) We conduct extensive user studies to validate the proposed method, including studies on two computer simulation games and a real-world experiment teaching a Franka robot arm to pour water. The results demonstrate the effectiveness and efficiency of the proposed method, showing that robots can successfully learn a safe MPC policy from only tens of human corrections.

Video

MuJoCo Demos

Drone Navigation Game

The user uses directional corrections to guide the UAV through a gate in MuJoCo.

Arm Reaching Game

The user uses directional corrections to guide the Franka robot arm's gripper between two bars in MuJoCo.

BibTeX

@misc{xie2024safempcalignmenthuman,
      title={Safe MPC Alignment with Human Directional Feedback}, 
      author={Zhixian Xie and Wenlong Zhang and Yi Ren and Zhaoran Wang and George J. Pappas and Wanxin Jin},
      year={2024},
      eprint={2407.04216},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2407.04216}, 
}