Why Nvidia builds open models with Bryan Catanzaro

Summary

One of the big stories of 2025 for me was how Nvidia massively stepped up their open model program — more releases, higher quality models, joining a small handful of companies releasing datasets, etc. In this interview, I sat down with Bryan Catanzaro, one of the three VPs leading the 500+ person technical effort, to discuss:

* Their very impressive Nemotron 3 Nano model released in Dec. 2025, and the bigger Super and Ultra variants coming soon,
* Why Nvidia’s business clearly benefits from them building open models,
* How the Nemotron team culture was crafted in pursuit of better models,
* Megatron-LM and the current state of open-source training software,
* Career reflections and paths into AI research,
* And other topics.

The biggest takeaway I had from this interview is how Nvidia understands their unique role as a company that can both build open language models and directly capture the value from doing so, giving them a uniquely sustainable advantage. Bryan has a beautiful analogy for open models this early in AI’s development: they are a process of creating “potential energy” for AI’s future applications.

I hope you enjoy it!

Guest: Bryan Catanzaro, VP Applied Deep Learning Research (ADLR), NVIDIA. X: @ctnzr, LinkedIn, Google Scholar.

Listen on Apple Podcasts, Spotify, YouTube, and wherever you get your podcasts. For other Interconnects interviews, go here.

Nemotron Model Timeline

2019–2022 — Foundational Work
* Megatron-LM (model parallelism framework that has become very popular again recently; alternatives: DeepSpeed, PyTorch FSDP).
* NeMo Framework (NVIDIA’s end-to-end LLM stack: training recipes, data pipelines, evaluation, deployment).

Nov 2023 — Nemotron-3 8B: Enterprise-ready NeMo models. Models: base, chat-sft, chat-rlhf, collection. Blog.

Feb 2024 — Nemotron-4 15B: Multilingual LLM trained to 8T tokens. Paper.

Jun 2024 — Nemotron-4 340B: Major open release detailing their synthetic data pipeline. Paper, blog. Models: Instruct, Reward.

Jul–Sep 2024 — Minitron / Nemotron-Mini: First of their pruned models, pruned from 15B. Minitron-4B (base model), Nemotron-Mini-4B-Instruct. Paper, code.

Oct 2024 — Llama-3.1-Nemotron-70B: Strong post-training on Llama 3.1 70B. Model, collection. Key dataset — HelpSteer2, paper.

Mar–Jun 2025 — Nemotron-H: First hybrid Mamba-Transformer models for inference efficiency. Paper, research page, blog. Models: 8B, 47B, 4B-128K.

May 2025 — Llama-Nemotron: Efficient reasoning models built on top of Llama (still!). Paper.

Sep 2025 — Nemotron Nano 2: 9B hybrid for reasoning, continuing to improve in performance. 12B base trained on 20T tokens (FP8 training), pruned to 9B for post-training. Report, V2 collection.

Nov 2025 — Nemotron Nano V2 VL: 12B VLM. Report.

Dec 2025 — Nemotron 3: Nano/Super/Ultra family, hybrid MoE, up to 1M context. Super/Ultra coming in H1 2026. Nano: trained on 25T tokens, 31.6B total / ~3.2B active parameters; the release includes recipes + code + datasets. Papers: White Paper, Technical Report. Models: Nano-30B-BF16, Base, FP8.

Nemotron’s Recent Datasets

NVIDIA began releasing substantially more data in 2025, including pretraining datasets — making them one of the few organizations releasing high-quality pretraining data at scale (which comes with non-negligible legal risk).

Pretraining Data
* Collection — CC-v2, CC-v2.1, CC-Code-v1, Code-v2, Specialized-v1, CC-Math-v1. Math paper: arXiv:2508.15096.

Post-Training Data

Core post-training dumps (SFT/RL blends):
* Llama Nemotron Post-Training v1.1 (Apr 2025)
* Nemotron Post-Training v1 (Jul 2025)
* Nemotron Post-Training v2 (Aug 2025)

2025 reasoning/code SFT corpora:
* OpenMathReasoning (Apr 2025)
* OpenCodeReasoning (Apr 2025), OpenCodeReasoning-2 (May 2025)
* AceReason-1.1-SFT (Jun 2025)
* Nemotron-Math-HumanReasoning (Jun 2025), Nemotron-PrismMath (Apr 2025)

NeMo Gym RLVR datasets: Collection

Nemotron v3 post-training (Dec 2025): Collection

HelpSteer (human feedback/preference):
* HelpSteer (Nov 2023)
* HelpSteer2 (Jun 2024)
* HelpSteer3 (Mar 2025)

And others, not linked here.
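Since the dataset releases are a central theme here, below is a minimal sketch of how one of these pretraining sets could be pulled down for a quick look with the Hugging Face `datasets` library. The repo id is an assumption for illustration (check the linked collections for the exact dataset names), and some of the sets may require accepting a license on the Hub or passing a config name.

```python
# Minimal sketch (not from the post): stream a few records from one of the
# Nemotron pretraining datasets on the Hugging Face Hub.
# NOTE: the repo id below is an assumption; substitute the exact dataset
# name from the collections linked above.
from datasets import load_dataset

# streaming=True iterates over remote shards lazily instead of downloading
# the full (multi-terabyte) dump to local disk first.
ds = load_dataset("nvidia/Nemotron-CC-v2", split="train", streaming=True)

for i, example in enumerate(ds):
    # Field names vary per dataset; inspect the keys to see what is available.
    print(sorted(example.keys()))
    if i >= 2:  # peek at the first few records only
        break
```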
Chapters

* 00:00:00 Intro & Why NVIDIA Releases Open Models
* 00:05:17 Nemotron’s two jobs: systems R&D + ecosystem support
* 00:15:23 Releasing datasets, not just models
* 00:22:25 Organizing 500+ people with “invitation, not control”
* 00:37:29 Scaling Nemotron & The Evolution of Megatron
* 00:48:26 Career Reflections: From SVMs to DLSS
* 00:54:12 Lessons from the Baidu Silicon Valley AI Lab
* 00:57:25 Building an Applied Research Lab with Jensen Huang
* 01:00:44 Advice for Researchers & Predictions for 2026

Transcript

00:00:06 Nathan Lambert: Okay. Hey, Bryan. I’m very excited to talk about Nemotron. I think, low-key, one of the biggest evolving stories in twenty-five of open models, outside the obvious things in China that everybody talks about, that gets a ton of attention. So th- thanks for coming on the pod.

00:00:22 Bryan Catanzaro: Oh, yeah, it’s my honor.

00:00:23 Nathan Lambert: So I wanted to start, and some of these questions are honestly ...