NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid — 2026-05-18

## Short Segments

Today, NVIDIA unveils a 4-bit pretraining methodology using NVFP4, validated on a 12-billion-parameter hybrid Mamba-Transformer model. If the approach holds up, it could make training large models substantially cheaper. Coming up, we'll explore how this could change the way large language models are trained.

## Feature Story

NVIDIA has introduced a new 4-bit pretraining methodology using NVFP4, a significant step forward for low-precision model training. The approach was validated on a 12-billion-parameter hybrid Mamba-Transformer model trained on 10 trillion tokens. NVFP4 is supported natively by Blackwell Tensor Cores and, compared with the current FP8 standard, can roughly halve the memory needed to store tensors while reducing computational cost.

Pretraining large language models (LLMs) in FP8 has become the norm, but moving to a 4-bit floating-point format has been difficult: the format's dynamic range is far narrower, and quantization error can accumulate over long training runs spanning trillions of tokens. NVFP4 addresses these issues with a microscaling design that improves precision and stability even at this reduced bit width.

NVFP4's advantages come from its structure. First, compared with the earlier MXFP4 format, it halves the block size from 32 to 16 elements, so each scale factor covers fewer values and can follow the local dynamic range more closely. Second, its block scale factors are stored in FP8 (E4M3) rather than as pure powers of two, trading exponent range for mantissa precision so that each block's largest value maps closely onto the largest representable FP4 value. Third, NVFP4 adds a second scaling level: a single FP32 scale per tensor keeps the block scales within FP8's representable range. As a result, at least 6.25% of the values in each block (one in every 16, namely the block's largest-magnitude element) are represented at near-FP8 precision, while the rest are stored in FP4.

The methodology was put to the test on the 12-billion-parameter hybrid Mamba-Transformer model, which scored 62.58% on the MMLU-Pro 5-shot benchmark, closely matching the FP8 baseline's 62.62%, a gap of only 0.04 percentage points.
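To make the two-level scaling concrete, here is a minimal Python emulation of NVFP4 quantization, assuming the scheme described above: 4-bit E2M1 elements, one FP8 E4M3 scale per 16-element block, and one per-tensor scale. The function names (`quantize_nvfp4`, `dequantize_nvfp4`, `round_to_e2m1`, `round_to_e4m3`) are hypothetical, and the rounding details (tie handling, saturation, plain Python floats standing in for FP32) are simplifying assumptions. It illustrates the idea; it is not NVIDIA's kernel or the Transformer Engine API.

```python
import math
from typing import List, Tuple

# Magnitudes an FP4 E2M1 element can hold; the sign is stored separately.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
E4M3_MAX = 448.0  # largest finite FP8 E4M3 value
BLOCK = 16        # NVFP4 block size (MXFP4 uses 32)


def round_to_e2m1(x: float) -> float:
    """Round to the nearest E2M1 value, saturating at +/-6.
    Ties go to the smaller magnitude here (a simplification)."""
    mag = min(abs(x), E2M1_MAX)
    best = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(best, x)


def round_to_e4m3(x: float) -> float:
    """Round a non-negative value to the nearest FP8 E4M3 value, saturating at 448."""
    if x <= 0.0:
        return 0.0
    x = min(x, E4M3_MAX)
    exp = max(math.floor(math.log2(x)), -6)  # -6 is E4M3's smallest normal exponent
    step = 2.0 ** (exp - 3)                  # 3 mantissa bits -> 8 steps per binade
    return min(round(x / step) * step, E4M3_MAX)  # round() breaks ties to even


def quantize_nvfp4(values: List[float]) -> Tuple[float, List[float], List[float]]:
    """Return (tensor_scale, block_scales, fp4_values) for a flat list of values."""
    amax = max((abs(v) for v in values), default=0.0)
    # Second level: one per-tensor scale (FP32 in NVFP4, a Python float here),
    # chosen so every block scale below fits within E4M3's range.
    tensor_scale = amax / (E2M1_MAX * E4M3_MAX) if amax > 0 else 1.0
    block_scales: List[float] = []
    fp4_values: List[float] = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        block_amax = max(abs(v) for v in block)
        # First level: an E4M3 scale that maps the block's amax onto 6.0.
        scale = round_to_e4m3(block_amax / E2M1_MAX / tensor_scale)
        block_scales.append(scale)
        effective = scale * tensor_scale
        for v in block:
            fp4_values.append(round_to_e2m1(v / effective) if effective > 0 else 0.0)
    return tensor_scale, block_scales, fp4_values


def dequantize_nvfp4(tensor_scale: float, block_scales: List[float],
                     fp4_values: List[float]) -> List[float]:
    """Undo both scaling levels: value ~ fp4 * block_scale * tensor_scale."""
    return [q * block_scales[i // BLOCK] * tensor_scale
            for i, q in enumerate(fp4_values)]


values = [0.01 * (i - 20) for i in range(32)]  # two 16-element blocks
ts, scales, q = quantize_nvfp4(values)
recon = dequantize_nvfp4(ts, scales, q)
print(scales)  # [448.0, 240.0]
```

Because each block's scale is chosen so that its largest-magnitude value lands on 6.0, the top FP4 code, that value's reconstruction error depends only on the E4M3 rounding of the scale (at most 6.25% relative here); the other 15 values in the block absorb the coarser FP4 rounding.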
The near-identical MMLU-Pro scores of the NVFP4 and FP8 models indicate that NVFP4 can maintain accuracy while significantly reducing resource requirements.

The implications are substantial. By enabling efficient training of large models at reduced precision, NVFP4 could lower the cost and time of AI model development. This is particularly relevant as demand grows for more capable AI systems, including models that handle dense technical problems and long-context analysis efficiently.

Moreover, because NVIDIA's Transformer Engine supports NVFP4, developers can integrate the format into existing workflows and benefit from reduced memory and compute usage with little or no loss in accuracy. This could accelerate the deployment of advanced AI models across industries, from natural language processing to autonomous systems.

Looking ahead, NVFP4's success in pretraining large models could pave the way for further advances in low-precision training. As researchers continue to explore 4-bit formats, we may see even more efficient AI systems that tackle increasingly complex tasks with fewer resources.

In summary, NVIDIA's NVFP4 represents a pivotal moment in AI model training, offering a path to more efficient and cost-effective development of large language models. As the technology gains traction, it could transform AI research and deployment by making advanced capabilities more accessible and sustainable.
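The claim that NVFP4 roughly halves memory relative to FP8 can be checked with back-of-the-envelope bit counting. The sketch below assumes 4-bit E2M1 elements and one 8-bit scale per block (E4M3 per 16 values for NVFP4, E8M0 per 32 values for MXFP4), and it ignores the single FP32 per-tensor scale. It counts only the stored tensor itself, not optimizer state or anything kept in higher precision. The helper names and the 4096 × 4096 matrix are hypothetical.

```python
def bits_per_element(element_bits: int, scale_bits: int, block_size: int) -> float:
    """Average storage cost of one value, amortizing a shared per-block scale."""
    return element_bits + scale_bits / block_size


NVFP4_BITS = bits_per_element(4, 8, 16)  # E2M1 values + one E4M3 scale per 16 values
MXFP4_BITS = bits_per_element(4, 8, 32)  # E2M1 values + one E8M0 scale per 32 values
FP8_BITS = 8.0                           # per-tensor scale overhead ignored


def tensor_mib(num_elements: int, bits: float) -> float:
    """Storage for a tensor in mebibytes at the given average bits per element."""
    return num_elements * bits / 8 / 2**20


n = 4096 * 4096  # hypothetical weight matrix
for name, bits in [("FP8", FP8_BITS), ("MXFP4", MXFP4_BITS), ("NVFP4", NVFP4_BITS)]:
    print(f"{name:6s} {bits:5.3f} bits/elem  {tensor_mib(n, bits):6.2f} MiB")
# 16.00, 8.50 and 9.00 MiB respectively
print(f"NVFP4 / FP8 = {NVFP4_BITS / FP8_BITS:.4f}")  # NVFP4 / FP8 = 0.5625
```

The 16-element blocks cost NVFP4 a quarter bit per value more than MXFP4, which is the price of its finer-grained scaling; relative to FP8, the stored tensor still shrinks to 56.25% of its size, hence "roughly half."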
