Critical Pitfall in Reward Learning from Human Feedback

On how dynamics misconceptions impact human feedback

In Shaheen et al. (2026), we show that human feedback in reward learning can reflect not only what people want, but also what they believe about the environment's dynamics. If a feedback provider misunderstands how the world works, standard reward learning methods can mistake that misconception for a true preference.
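To make the pitfall concrete, here is a minimal toy sketch. It is not the setup, data, or method from the paper. It assumes a hypothetical two-action world in which the person truly wants the "goal" outcome but wrongly believes that the action "right" leads there. A learner that fits a Bradley-Terry reward model while assuming the person knows the true dynamics ends up ranking the "hazard" outcome above "goal". All names (`TRUE_REWARD`, `BELIEVED_DYNAMICS`, `fit_rewards`, and so on) are illustrative assumptions.

```python
import math

# Hypothetical toy setup (not from the paper): two actions, two outcome states.
TRUE_REWARD = {"goal": 1.0, "hazard": -1.0}              # what the person actually wants
TRUE_DYNAMICS = {"left": "goal", "right": "hazard"}      # how the world really works
BELIEVED_DYNAMICS = {"left": "hazard", "right": "goal"}  # the person's misconception


def human_prefers(a, b, dynamics):
    """Return the action the person prefers, judged by the outcome they expect."""
    return a if TRUE_REWARD[dynamics[a]] >= TRUE_REWARD[dynamics[b]] else b


def fit_rewards(preferred, rejected, assumed_dynamics, steps=200, lr=0.5):
    """Fit per-state rewards with a Bradley-Terry model by gradient ascent.

    The learner maps each action to an outcome using `assumed_dynamics`.
    """
    r = {"goal": 0.0, "hazard": 0.0}
    s_win, s_lose = assumed_dynamics[preferred], assumed_dynamics[rejected]
    for _ in range(steps):
        # Probability the model assigns to the observed preference.
        p_win = 1.0 / (1.0 + math.exp(-(r[s_win] - r[s_lose])))
        grad = 1.0 - p_win  # gradient of the log-likelihood w.r.t. r[s_win]
        r[s_win] += lr * grad
        r[s_lose] -= lr * grad
    return r


def run_demo():
    # Feedback is generated under the person's (wrong) beliefs ...
    choice = human_prefers("left", "right", BELIEVED_DYNAMICS)
    other = "left" if choice == "right" else "right"
    # ... but the naive learner interprets it under the true dynamics.
    naive = fit_rewards(choice, other, TRUE_DYNAMICS)
    # Interpreting the same feedback under the person's beliefs recovers the ordering.
    belief_aware = fit_rewards(choice, other, BELIEVED_DYNAMICS)
    return choice, naive, belief_aware


if __name__ == "__main__":
    choice, naive, belief_aware = run_demo()
    print("preferred action:", choice)
    print("naive learner:", naive)
    print("belief-aware learner:", belief_aware)
```

In this sketch the only difference between the two fits is which dynamics the learner uses to interpret the same piece of feedback. The "belief-aware" variant assumes the learner is handed the person's beliefs, which is a simplification for illustration only.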

More details coming soon…

Acknowledgements

This research was supported in part by NSF grant 2047186 and the 2025 ASU Graduate Student Government JumpStart Grant. The study was approved by the Arizona State University Institutional Review Board.

References

2026

  1. IJCAI
    Empirical Evidence and Analysis of a Critical Pitfall in Reward Learning from Human Feedback
    Taha Shaheen, Stephen G. West, and Yu Zhang
    In Proceedings of the 35th International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2026), Aug 2026