Aligning AI with Human Intent: RLHF in Action

Summary

In this episode, we demystify how researchers teach AI models to behave helpfully and safely using Reinforcement Learning from Human Feedback (RLHF). We discuss why even very large models can generate undesired outputs and how RLHF addresses this by incorporating human preferences. You'll learn how models like InstructGPT were trained: first by fine-tuning on human-written demonstration responses, then by having humans rank model outputs to train a reward model, and finally by using reinforcement learning (e.g., with PPO) to fine-tune the model so that it better aligns with what users want. We also talk about improvements like Constitutional AI and why aligning AI with human values is an ongoing challenge.
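
To make the reward-model step concrete, here is a minimal, hypothetical sketch in PyTorch: a toy scalar reward model trained on pairs of responses where humans preferred one over the other, using the standard pairwise ranking loss. The class name, network size, and the random token tensors are illustrative assumptions for this sketch, not the InstructGPT implementation.

```python
import torch
import torch.nn as nn

# Toy scalar reward model: embeds tokens, mean-pools, and projects to one score.
# (Illustrative only; real RLHF reward models are initialized from a large LM.)
class RewardModel(nn.Module):
    def __init__(self, vocab_size=50257, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, 1)  # scalar reward per sequence

    def forward(self, token_ids):
        h = self.embed(token_ids).mean(dim=1)   # mean-pool over the sequence
        return self.head(h).squeeze(-1)          # one reward score per example

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Toy batch: token ids for the response humans preferred vs. the one they rejected.
chosen_ids = torch.randint(0, 50257, (4, 32))
rejected_ids = torch.randint(0, 50257, (4, 32))

r_chosen = reward_model(chosen_ids)
r_rejected = reward_model(rejected_ids)

# Pairwise ranking loss: push the preferred response's reward above the rejected one's.
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
```

The resulting reward model is what the final RL stage (e.g., PPO) optimizes against, with an additional penalty in practice to keep the fine-tuned model close to its starting point.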
