Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction

Predicting how hard an exam question will be for human test-takers, without running expensive human trials, could transform educational assessment. This paper proposes using the reasoning traces of large language models as a proxy for human cognitive effort. Rather than treating these traces as raw text, the proposed method, Epi2Diff, structures them into "cognitive episodes" (functional states such as planning, implementing, and verifying) and uses the dynamics between these states to predict difficulty. Tested on four real-world human difficulty datasets, including SAT-derived benchmarks, it consistently outperforms strong baselines. Applications include automated test construction, adaptive learning platforms, and AI-assisted item difficulty calibration for standardized assessments.

Authors: Chenguang Wang, Ming Li, Xinyue Zeng, Zhuochun Li, Hong Jiao, Tianyi Zhou, Dawei Zhou

Paper: https://arxiv.org/abs/2606.28186v1
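
To make the idea of "dynamics between states" concrete, here is a minimal Python sketch of one way such dynamics could be turned into numbers: counting how often one episode type follows another in a labeled trace. This is an illustration only, not the paper's implementation. The function name `transition_features`, the three-label set, and the example trace are all assumptions; the description names planning, implementing, and verifying as examples and does not say which features Epi2Diff actually uses.

```python
from collections import Counter
from itertools import product

# Hypothetical label set: the description names only these three states,
# and the real method may use more.
STATES = ("planning", "implementing", "verifying")


def transition_features(episodes):
    """Return the relative frequency of each state-to-state transition.

    `episodes` is a list of episode labels in the order they occur in a
    reasoning trace. Transitions involving labels outside STATES still
    count toward the total but get no feature of their own.
    """
    pairs = Counter(zip(episodes, episodes[1:]))
    total = sum(pairs.values())
    return {
        f"{src}->{dst}": (pairs[(src, dst)] / total if total else 0.0)
        for src, dst in product(STATES, repeat=2)
    }


# Illustrative trace, not taken from the paper's data.
trace = ["planning", "implementing", "verifying", "implementing", "verifying"]
features = transition_features(trace)
```

A feature dictionary like this could, in principle, be passed to any standard regression model along with human difficulty labels. A trace that keeps looping between implementing and verifying would then look different from one that moves straight through, which is the kind of signal the description suggests is informative.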
