We evaluate HSBC on a diverse set of MPC tasks, including locomotion (Cartpole-Swingup, Walker-Walk, Humanoid-Standup, Go2-Standup) and dexterous manipulation (Allegro-Cube, Allegro-Bunny).
In noise-free settings, our method performs comparably to or better than state-of-the-art baselines such as PREF-BI and Disagreement Learning. When up to 30% of the human feedback is incorrect, HSBC significantly outperforms existing approaches in both reward accuracy and task performance.
These results demonstrate HSBC's strong robustness and sample efficiency in aligning with true reward functions, even in the presence of substantial label noise.