https://www.interconnects.ai/p/the-american-deepseek-project

While America has the best AI models in Gemini, Claude, o3, etc., and the best infrastructure with Nvidia, it's rapidly losing its influence over the future directions of AI that unfold in the open-source and academic communities. Chinese organizations are releasing the most notable open models and datasets across all modalities, from text to robotics to video, and at the same time it's common for researchers worldwide to read far more new research papers from Chinese organizations than from their Western counterparts.

This balance of power has been shifting rapidly over the last 12 months and reflects structural advantages that Chinese companies have with open-source AI: China has more AI researchers, more data, and an open-source default. On the other hand, America's open technological champions for AI, like Meta, are "reconsidering their open approach" after yet another expensive re-org, and the political environment is dramatically reducing the interest of the world's best scientists in coming to our country.

It's famous lore of the AI industry that much of the flourishing of progress around ChatGPT is downstream of the practice, at Google Research and across the industry writ large, of openly sharing the science of AI until approximately 2022. The end of this practice, and the resulting power shifts, make it likely that the next "Transformer"-style breakthrough will be built on or related to Chinese AI models, AI chips, ideas, or companies. Countless Chinese individuals are some of the best people I've worked with, both at a technical and personal level, but this direction for the ecosystem points to AI models being less accountable, auditable, and trustworthy due to inevitable ties to the Chinese government.

The goal for my next few years of work is what I'm calling The American DeepSeek Project: a fully open-source model at the scale and performance of current (publicly available) frontier models, within two years. A fully open model, as opposed to just an "open weights" model, comes with data, training code, logs, and decision making, on top of the weights to run inference, in order to fully distribute the knowledge of and access to training AI models.

This project serves two goals, where balancing the scales with the pace of the Chinese ecosystem is only one piece:

* Reclaim the default home of AI research being on top of American (or Western) technologies and tools, and
* Reduce the risk that the only viable AI ecosystem for cutting-edge products is built atop proprietary, closed, for-profit AI models.

More people should be focused on making this happen. A lot of people talk about how nice it would be to have "open-source AGI for all," but very few are investing in making it a reality. With the right focus, I estimate this will take ~$100M-500M over the next two years. Within the context of recent trends, this is a future with a diminishing, minute probability. I want to do this at Ai2, but it takes far more than just us to make it happen. We need advocates, peers, advisors, and compute.

The time to do this is now. If we wait, the future will be a balance of extremely powerful, closed American models counterbalancing a sea of strong, ubiquitous, open Chinese models. This is a world where the most available models are the hardest to trust. The West historically has better systems for creating AI models that are trustworthy and fair across society.
Consider how:

* Practically speaking, there will never be proof that Chinese models cannot leave vulnerabilities in code or execute tools in malicious ways, even though this is very unlikely in the near future.
* Chinese companies will not engage as completely with the U.S. legal system on topics from fair use to non-consensual deepfakes.
* Chinese models will, over time, shift to support a competing software ecosystem that weakens many of America's and the West's strongest companies, a consequence of the compute restrictions currently in place.

Many of these practical problems cannot be fixed by simply fine-tuning the model, as with Perplexity's R1-1776 model. These are deep, structural realities that can only be avoided with different incentives and pretrained models.

My goal is to make a fully open-source model at the scale of DeepSeek V3/R1 within the next two years. I've started championing this vision in multiple places as a summary of the next frontier for performance in open-source language models, so I needed this document to pin it down.

I use scale rather than performance as the reference point for the goal because the models we're collectively using as consumers of the AI industry haven't really been getting much bigger. This "frontier scale" is a ballpark for where you've crossed into a very serious model, and by the time a few years have gone by, the efficiency gains accumulated along the way will mean this model far outperforms DeepSeek V3. ...