The paper critiques current evaluation methods for adaptive tutoring systems, arguing that classification accuracy metrics often fail to reflect how effective these systems are. It proposes a new paradigm, LEOPARD (Learner Effort-Outcomes Paradigm), which assesses both the effort required of learners and the outcomes they achieve, without requiring randomized controlled trials. The authors stress the need for evaluation metrics that correlate more meaningfully with real learning gains.