Video AI Search Company Twelve Labs Raises $100M Series B Backed by Amazon

Claire Weston

Published today · About 9 min read

Video AI startup Twelve Labs closed a $100 million Series B, with Amazon signing a multi-year compute deal alongside its investment — the first time a cloud giant has backed a video-AI company with both capital and infrastructure.

Who put up the $100 million — and what does the investor mix signal?

The $100M Series B was co-led by NEA and South Korea's Naver Ventures, with Amazon participating. Nvidia was already on the cap table.

This means → three types of capital now sit on the same cap table: top-tier VC (NEA, Index Ventures), a chip giant (Nvidia, an existing investor), and a cloud platform (Amazon). Together they amount to a strong vote of confidence in video AI's infrastructure-level value.

Other participants include Radical Ventures, Index Ventures, and Korea Investment Partners, giving the round a US-Korea cross-border profile.

Why did Amazon attach a compute contract, not just a cheque?

AWS signed a multi-year deal to run Twelve Labs' workloads on its custom Trainium chips and to launch new models on the AWS platform for developers.

In plain terms = Amazon didn't just invest — it plugged Twelve Labs into its cloud ecosystem: Amazon's chips run the models, Amazon's marketplace sells them to developers.

This reflects a new playbook among cloud providers competing for the AI application layer: lock startups onto your infrastructure with compute contracts, not just equity.

What problem does Twelve Labs' technology actually solve?

Video is estimated to account for roughly 90% of global data, yet most of it sits in archives, unsearchable and unanalyzed.

The company's core products are two models: Marengo 3.0, which converts raw video into data searchable across sound, speech, and motion, and Pegasus 1.5, which structures that output into formats AI tools can parse.

In plain terms = Twelve Labs makes machines understand video. It does not generate video; it turns the vast stock of existing footage into searchable, analyzable assets.

How is this fundamentally different from GPT or Gemini?

CEO Jae Lee's thesis: video, not text, is the data closest to how humans perceive the world. That is the contrarian bet the company was founded on five years ago.

He explicitly noted that the latest frontier models (e.g. Fable 5, Mythos) "are still language models," fundamentally different from Twelve Labs' video-native approach.

This means → Twelve Labs is not bolting video onto a language model. It is building comprehension from video itself — a choice that puts its tech stack and business logic on an entirely separate track from mainstream LLMs.

Who is using this, and how far along is commercialization?

Current clients span sports, media, and nonprofits: Maple Leaf Sports & Entertainment (parent of the Toronto Raptors), Condé Nast International, and UNICEF.

The target market is broader — Hollywood studios, advertisers, social-media creators, sports clubs — with a core value proposition of monetizing dormant video assets.

The company is also developing a video agent product that can search, interpret, plan, and execute tasks via text commands.

This means → the product roadmap is extending from "search tool" toward "autonomous AI agent that acts on video."

Content is for reference only, not financial advice.