MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

This paper introduces MoralityGym, a benchmark that tests how well AI agents handle moral tradeoffs in sequential decision-making tasks rather than in isolated prompts. It focuses on hierarchical moral alignment: agents act in ethical-dilemma environments where choices unfold over time, which can expose failures that static, single-turn evaluations may miss. The work is relevant to AI ethics because it offers a structured way to measure whether an agent's behavior stays aligned over longer decision horizons.