Episode 4: Skills and Smarts – What GR00T N1 Can Do

About this content

Welcome back to our GR00T N1 deep dive. So far, we’ve covered the “what” and the “how” – what GR00T N1 is made of and how it was trained. Now it’s time for the exciting part: what can GR00T N1 actually do? In this episode, we’ll explore the capabilities of this model and why they’re a big leap forward for robotics. We’ll talk about the tasks it can perform, how well it performs them, and how it stacks up against previous methods.

The ultimate goal of GR00T N1 is to give robots a broad set of generalized skills. Straight out of training, without heavy specialization, GR00T N1 can tackle a range of common manipulation tasks, including:

- Grasping objects: The model can control a robot to reach out and grasp items, whether it’s picking up a tool, a toy, or a package. Importantly, it can handle both single-handed grasps and two-handed grasps for larger objects.

- Moving and placing objects: Once it picks something up, it can move it to a desired location. This could be as simple as moving a box from the floor to a shelf, or as involved as rearranging objects on a table following instructions.

- Hand-to-hand transfers: GR00T N1 even learned behaviors like passing an object from one hand to the other. Imagine a humanoid robot that picks up a can with its left hand, then transfers it to the right hand to place it on a higher shelf – that kind of coordinated bimanual action is within the model’s repertoire.

- Multi-step tasks: Because of the “thinker” part of the model, it can plan multiple steps in sequence. So if you tell the robot, “open the cabinet, then take out the bowl and put it on the counter,” GR00T N1 can break that down: open the door (one action sequence), reach for the bowl (next sequence), place the bowl (next). It keeps the context of the overall goal so it can chain these skills together in the right order.

What’s truly impressive is that GR00T N1 can generalize these skills to new combinations and contexts.
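To make the “thinker plans, actor executes” idea concrete, here is a minimal, purely illustrative Python sketch. The function names (`think`, `act`, `run`), the hard-coded sub-task list, and the string outputs are all hypothetical stand-ins – this is not NVIDIA’s actual API, just the shape of the chaining described above.

```python
# Hypothetical sketch of a "thinker + actor" loop. In the real model,
# the thinker is a vision-language network and the actor emits motor
# commands; here both are trivial stand-ins to show the control flow.

def think(goal: str) -> list[str]:
    """Stand-in for the slow 'thinker': break a high-level instruction
    into an ordered list of sub-tasks."""
    if goal == "open the cabinet, then take out the bowl and put it on the counter":
        return ["open cabinet door", "reach for bowl", "place bowl on counter"]
    return [goal]  # simple goals pass through as a single step

def act(subtask: str) -> str:
    """Stand-in for the fast 'actor': execute one sub-task and report.
    A real actor would output a sequence of joint commands."""
    return f"executed: {subtask}"

def run(goal: str) -> list[str]:
    # The overall goal stays in context while sub-tasks run in order,
    # which is what lets the skills chain correctly.
    return [act(step) for step in think(goal)]

print(run("open the cabinet, then take out the bowl and put it on the counter"))
```

The point of the sketch is only the division of labor: one component decides *what* to do next, the other decides *how* to move, and the loop keeps them in sync.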
It wasn’t explicitly pre-programmed for each exact scenario. Instead, because it has seen so many variations during training, it can adapt on the fly. For example, it learned the concept of “grasping” in general, so it can apply it to objects it hasn’t seen before, within reason. It also understands the notion of left hand versus right hand, so it can decide to switch hands if a task would be easier that way.

Now, how well does it do these things? NVIDIA and researchers put GR00T N1 through a battery of tests, both in simulations and in some real-world trials. In simulation benchmarks, GR00T N1 outperformed previous state-of-the-art models that were trained for each specific task (like imitation learning models specialized to certain environments). For instance, on standard robotic test tasks (such as stacking blocks or navigating a simple obstacle course to reach an object), GR00T N1 achieved higher success rates than models that didn’t have the benefit of its broad training. This is remarkable because those specialized models had an advantage: they were tuned just for their task, whereas GR00T N1 was more of a generalist. Yet the foundation model’s massive training gave it an edge, demonstrating the power of breadth of knowledge.

One key area of evaluation was multiple robot embodiments. GR00T N1 was tested on controlling different kinds of robots – not just one specific humanoid. In simulations, they tried its brain on, say, a dexterous two-armed system and also on a different bipedal robot, and possibly even wheeled robots. The results showed that GR00T N1 could adapt with minimal adjustment, whereas traditionally you’d need to train a new model from scratch for each robot. This cross-embodiment skill is a game changer: it hints that we could have a single intelligent model that powers many kinds of robots in the future, much like one operating system can run on different hardware.

Beyond simulation, let’s talk about real-world demonstrations, because that’s the true test.
One milestone was deploying GR00T N1 on a real humanoid robot known as the Fourier GR-1. The GR-1 is a human-sized, bipedal robot developed by a company called Fourier Intelligence. Using GR00T N1 as its brain, the GR-1 was given language-conditioned bimanual manipulation tasks – essentially, verbal instructions to do something with both hands, such as “pick up the two objects and put them together.” The outcome? The robot performed impressively well, completing the tasks with high success rates.

Even more striking was the data efficiency observed: the team didn’t need to collect months of new data on the real robot to make this work. They did a light fine-tuning with a small amount of robot-specific data, and GR00T N1 was able to generalize its learned skills to this actual machine’s body. Achieving fluent bimanual action on a physical humanoid is a big step forward, since coordinating two arms, vision, and ...
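The “light fine-tuning” idea – start from broadly pretrained weights and nudge them with a small robot-specific dataset instead of training from scratch – can be shown with a deliberately tiny toy example. Everything below is illustrative: a one-parameter model, a made-up two-sample dataset, and plain gradient descent; GR00T N1’s real training stack is vastly larger, but the principle is the same.

```python
# Toy illustration of light fine-tuning: a pretrained weight that is
# already close to the target needs only a few gradient steps on a
# tiny dataset to adapt. Purely pedagogical, not the real pipeline.

def fine_tune(w: float, data: list[tuple[float, float]],
              lr: float = 0.05, steps: int = 100) -> float:
    """Minimize squared error (w*x - y)^2 over a small dataset."""
    for _ in range(steps):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of the squared error
            w -= lr * grad
    return w

pretrained_w = 1.8                     # "broad" prior: already near the target
robot_data = [(1.0, 2.0), (2.0, 4.0)]  # tiny robot-specific dataset (y = 2x)

w = fine_tune(pretrained_w, robot_data)
print(round(w, 3))  # → 2.0
```

Because the pretrained starting point is already close to what the new robot needs, a small dataset and a short training run suffice – which is the data-efficiency story described above, in miniature.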
