Arcee AI goes all-in on open models built in the U.S.

Overview

Arcee AI is the startup I’ve found to be taking the most real approach to monetizing their open models. With a bunch of experience (and revenue) in post-training open models for specific customer domains, they realized they needed to both prove themselves and fill a niche by pretraining larger, higher-performance open models built in the U.S.A. They’re the group of people most eagerly answering my call to action for The ATOM Project, and I’ve quickly become friends with them.

Today, they’re releasing their flagship model, Trinity Large, as the culmination of this pivot. In anticipation of this release, I sat down with their CEO Mark McQuade, CTO Lucas Atkins, and pretraining lead Varun Singh for a wide-ranging conversation on:

* The state (and future) of open vs. closed models,
* The business of selling open models for on-prem deployments,
* The story of Arcee AI & going “all-in” on this training run,
* The ATOM Project,
* Building frontier model training teams in 6 months,
* and other great topics.

I really loved this one, and think you will too.

The blog post linked above and the technical report have many great details on training the model that I’m still digging into. One of the great things Arcee has been doing is releasing “true base models,” which don’t contain any SFT data or learning rate annealing. Trinity Large, an MoE with 400B total and 13B active parameters trained on 17 trillion tokens, is the first publicly shared training run at this scale on Nvidia B300 Blackwell machines. As a preview, they shared scores for the in-progress reasoning model relative to the who’s who of today’s open models. It’s a big step for open models built in the U.S. to scale up like this.

I won’t spoil all the details, so you should still listen to the podcast, but the section of their blog post on cost sets the tone well for the episode, which is a very frank discussion of how and why to build open models:

“When we started this run, we had never pretrained anything remotely like this before.
There was no guarantee this would work. Not the modeling, not the data, not the training itself, not the operational part where you wake up, and a job that costs real money is in a bad state, and you have to decide whether to restart or try to rescue it.
All in—compute, salaries, data, storage, ops—we pulled off this entire effort for $20 million. 4 models got us here in 6 months.
That number is big for us. It’s also small compared to what frontier labs spend just to keep the lights on. We don’t have infinite retries.”

Once I post this, I’m going to dive right into trying the model, and I’m curious what you find too.
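If you want to poke at the checkpoints yourself, here is a minimal sketch of loading one of the smaller Trinity models with Hugging Face transformers. The repo id is a placeholder of mine (check the Trinity collection for the actual names), it assumes an instruct checkpoint that ships a chat template, and Trinity Large itself will need a multi-GPU setup or a hosted endpoint rather than a single box.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- check Arcee's Hugging Face collection for the real names.
repo_id = "arcee-ai/Trinity-Mini"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # keep the MoE weights in bf16
    device_map="auto",           # shard across available GPUs / offload as needed
)

# Assumes an instruct checkpoint with a chat template; base models won't have one.
messages = [{"role": "user", "content": "Summarize the Trinity Manifesto in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```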
Listen on Apple Podcasts, Spotify, YouTube, and wherever you get your podcasts. For other Interconnects interviews, go here.

Guests

* Lucas Atkins (X, LinkedIn): CTO; leads pretraining/architecture, wrote the Trinity Manifesto.
* Mark McQuade (X, LinkedIn): Founder/CEO; previously at Hugging Face (monetization) and Roboflow. Focused on shipping enterprise-grade open-weight models + tooling.
* Varun Singh (LinkedIn): pretraining lead.

Most of this interview is conducted with Lucas, but Mark and Varun make great additions at the right times.

Links

Core:
* Trinity Large (400B total, 13B active) collection, blog post. Instruct model today, reasoning models soon.
* Trinity Mini, 26B total 3B active (base, including a pre-anneal checkpoint)
* Trinity Nano Preview, 6B total 1B active (base)
* Open Source Catalog: https://www.arcee.ai/open-source-catalog
* API Docs and Playground (demo)
* Socials: GitHub, Hugging Face, X, LinkedIn, YouTube

Trinity models:
* Trinity models page: https://www.arcee.ai/trinity
* The Trinity Manifesto (I recommend you read it): https://www.arcee.ai/blog/the-trinity-manifesto
* Trinity HF collection (Trinity Mini & Trinity Nano Preview)

Older models:
* AFM-4.5B (and base model), their first open, pretrained-in-house model (blog post).
* Five open-weights models (blog): three production models previously exclusive to their SaaS platform plus two research models, released as they shifted focus to AFM: Arcee-SuperNova-v1, Virtuoso-Large, Caller, GLM-4-32B-Base-32K, Homunculus

Open source tools:
* MergeKit: model merging toolkit (returned to an LGPL license)
* DistillKit: knowledge distillation library
* EvolKit: synthetic data generation via evolutionary methods

Related:
* Datology case study w/ Arcee

Chapters

* 00:00:00 Intro: Arcee AI, Trinity Models & Trinity Large
* 00:08:26 Transitioning a Company to Pre-training
* 00:13:00 Technical Decisions: Muon and MoE
* 00:18:41 Scaling and MoE Training Pain
* 00:23:14 Post-training and RL Strategies
* 00:28:09 Team Structure and Data Scaling
* 00:31:31 The Trinity Manifesto: US Open Weights
* 00:42:31 Specialized Models and Distillation
* 00:47:12 Infrastructure and Hosting 400B
* 00:50:53 Open Source as a Business Moat
* 00:56:31 Predictions: Best Model in 2026
* 01:02:29 Lightning Round & ...