🔍

Research-Based Review

This review is based on documented features, verified pricing, and community sentiment — not hands-on testing. See how we research →

🜂

MiniMax M3

platform.minimax.io

MiniMax M3 Review 2026 — Open-Weight Frontier Coding, 1M Context, Native Multimodal

Name: MiniMax M3 Review 2026
Item: MiniMax M3
Rating: 7.9
Author: Marcus Veil

📅 June 2026 ⏱ 13 min read 📊 Research-based

7.9

Editor's Verdict: The Most Technically Ambitious Open-Weight Model of 2026 — With Caveats Attached

Frontier-level coding, a 1M context, and native multimodal in one open-weight model at $0.60/M — held back by vendor-only benchmarks, data-residency exposure, and new-model maturity.

MiniMax M3 launched June 1, 2026 as one of the first open-weight models to combine three things at once: frontier-level coding (59% SWE-Bench Pro, edging GPT-5.5's 58.6%), a 1-million-token context window, and native text/image/video input. The new MiniMax Sparse Attention (MSA) architecture delivers 15.6x faster decoding at 1M context, and at $0.60/M input it runs roughly 8x cheaper than Claude Opus 4.8. The constraints are equally specific: the benchmarks are vendor-reported and not yet independently verified, the open weights hadn't shipped at launch, and a Shanghai-based developer raises the same data-residency questions as DeepSeek and Kimi. Genuinely interesting — but a 60-to-90-day "wait and verify" tool for most production use.

Researched by Marcus Veil, AI Tools Analyst & Industry Writer · AIToolGrade Editorial Team · Last verified June 2026

⚠️ Editorial Disclosure

AIToolGrade uses Claude (Anthropic) for content production. Claude Opus 4.8 is a direct competitor to MiniMax M3. We have applied our standard research methodology — documented features, verified pricing, community sentiment — and have not received compensation from MiniMax. Where M3's vendor-reported benchmarks compare it favorably to Claude, we have flagged that those figures are not yet independently verified.

What is MiniMax M3?

MiniMax M3 is an open-weight frontier language model launched June 1, 2026 by MiniMax, a Shanghai-based AI lab. The pitch is unusually specific for a model release: M3 is positioned as the first open-weight model to do three things simultaneously — code at a frontier level, hold a million tokens of context, and accept native multimodal input across text, image, and video. Plenty of models do one or two of those. M3 claims all three in a single open-weight system, and that combination is what makes it worth a serious look rather than a passing one.

The headline coding number is 59.0% on SWE-Bench Pro, the hardest of the real-world coding benchmarks — narrowly ahead of GPT-5.5's 58.6%. Carry the obligatory caveat with that figure everywhere it appears: it's vendor-reported, measured under MiniMax's own evaluation setup, and not yet independently confirmed. The architecture underneath is the genuinely new part. MiniMax Sparse Attention (MSA) brings back sparse attention the company had deliberately stripped out of the previous M2 generation, this time with a lightweight index branch that scans tokens and decides which key-value blocks actually need attention. The result MiniMax reports is 15.6x faster decoding and 9.7x faster prefill at 1M context versus M2 — a direct attack on the cost of running long-context inference.

Then there's the price. At $0.60/M input tokens (with a launch-week promo at $0.30/M), M3 costs roughly 8x less than Claude Opus 4.8 at what MiniMax claims is comparable benchmark performance. The open weights — under an MIT-style license — are expected around June 10-11, 2026 — verify current availability on HuggingFace before planning self-hosted deployments. Once they land, full self-hosting is on the table for anyone with the infrastructure. If the benchmarks hold up to independent scrutiny, that is one of the strongest cost-performance propositions in the open-weight category. The conditional in that sentence is the whole review.

Who Is It For?

M3 is an API-first frontier model, and the fit is sharpest for developers who actually need its specific combination rather than any one piece. The clearest match is multimodal agentic work. If your workload involves reading screenshots, parsing diagrams, processing short video, and then writing or fixing code off that input, M3 is unusual: it does all of that natively in a single open-weight model, with no separate vision model bolted on. That combination — multimodal input plus frontier coding plus a million-token window — doesn't have a direct open-weight equivalent right now.

It also fits high-volume agentic pipelines where token cost is the binding constraint. At $0.60/M input against Claude Opus 4.8's $5, the per-call math changes what's economically sane to run — batch refactors, long-running agent loops, large-corpus analysis. And for teams that need full data sovereignty, the open weights (once they ship) make self-hosting a real option: every token stays inside your own environment, API cost drops to infrastructure-only. Researchers evaluating the MSA sparse-attention architecture are another natural audience — the technical contribution here is the kind of thing worth studying directly. So are teams already running DeepSeek or Kimi who want to compare their open-weight options head to head.

The misfits are worth naming just as plainly. Enterprise teams with strict US or EU data-residency requirements face the same Shanghai-hosting concerns covered below — the same gate that applies to DeepSeek V4 and Kimi Code. Anyone whose production decision depends on independently verified benchmarks should wait; at the time of this review, the numbers are MiniMax's own. Developers who want native IDE integration won't find it — M3 is an API and a set of weights, not a VS Code extension. And if you needed the weights at launch, they weren't there: the HuggingFace release was expected around June 10-11, 2026 — verify current availability on HuggingFace before planning self-hosted deployments. If you want a polished, turnkey agentic coding tool today, Claude Code or Cursor are the more practical starting points.

Pros and Cons

What works well

59% SWE-Bench Pro at $0.60/M input — if the benchmarks verify independently, the strongest cost-performance ratio in open-weight coding models

MSA architecture is a genuine technical contribution — 15.6x faster decoding at 1M context attacks a real inference-cost problem

Native multimodal (text, image, video) in a single open-weight model is unique in the category right now

Desktop computer operation extends M3 into agentic territory comparable to Claude Cowork and Google Antigravity

Open weights enable complete data sovereignty via self-hosting once they ship

OpenAI-compatible API means migration is usually a single endpoint change, not a rewrite

What to watch out for

Benchmarks are vendor-reported and not yet independently verified — the SWE-Bench Pro scores need third-party confirmation before any production commitment

Chinese company — China's 2017 National Intelligence Law raises data-access concerns for hosted use, requiring Chinese companies to cooperate with government intelligence requests; the same concern as DeepSeek V4 and Kimi Code

Open weights weren't shipped at review time (due ~June 11) — architecture and safety behavior are unverifiable until then

No native IDE extension — an API-first tool; you build or reuse integrations

New-model maturity — a June 1, 2026 launch means edge cases and production reliability are unproven

Promotional pricing ($0.30/M) reverts to $0.60/M — still competitive, but factor it into cost projections

Score Breakdown

Category scores — AIToolGrade methodology

Ease of Use

7.0

Features

9.5

Value for Money

9.5

Integration

7.5

Support & Docs

5.5

The shape is lopsided in an instructive way. Features and Value for Money both sit at 9.5 — the feature set (59% SWE-Bench Pro, 1M context, MSA, native multimodal, desktop operation) is among the most complete in the open-weight category, and at $0.60/M input with a self-host option, the price-performance is hard to argue with on paper. Ease of Use (7.0) and Integration (7.5) sit in the mid-7s: the OpenAI-compatible API and OpenRouter access ease onboarding, but it's API-first with no IDE extension, and self-hosting demands real infrastructure. Support & Documentation drags hardest at 5.5 — this is a ten-day-old model, the docs are still developing, support runs through a Chinese-company structure, and the community is only beginning to form. The 7.9 overall deliberately holds the line: the feature and value scores would push higher, but unverified benchmarks, data-residency exposure, and new-model maturity cap it. We expect the number to move once independent verification arrives — likely upward, in 60 to 90 days.

Benchmarks and the Verification Question

Every number in this section carries the same asterisk: the scores are vendor-reported, measured under each company's own evaluation setup, and may not be directly comparable. Independent verification is pending for M3 specifically. That isn't a throwaway disclaimer — it's the single most important fact about M3 at the time of this review. A 59% SWE-Bench Pro result that edges GPT-5.5 is a meaningful claim if it survives third-party testing, and a marketing line if it doesn't. We're reporting the claim and the uncertainty together because that's the honest state of the evidence.

Benchmark	MiniMax M3	Kimi K2.6	DeepSeek V4-Pro	Claude Opus 4.8
SWE-Bench Pro	59.0%	58.6%	—	—
SWE-Bench Verified	—	80.2%	80.6%	88.6%
Terminal-Bench 2.1	66.0%	—	—	—
BrowseComp	83.5	86.3 (swarms)	—	—
Context window	1M tokens	1M tokens	1M tokens	1M tokens
Input price / M	$0.60	$0.60	$0.435	$5.00
Open weight	✓	✓ Apache 2.0	✓ MIT	✗
Multimodal	✓ Text/Image/Video	✗	✗	✓ Text/Image

Read the table as a profile, not a leaderboard. M3 and Kimi K2.6 report SWE-Bench Pro within half a point of each other, but the benchmarks don't line up cleanly across models — M3 leans on Pro and Terminal-Bench, the others on Verified — so cross-model comparison is approximate at best. The two facts that do hold up regardless of evaluation noise are price and modality: M3 matches Kimi on input cost while undercutting Claude Opus 4.8 by roughly 8x, and it's the only model in this group accepting native video input. Its 66% on Terminal-Bench 2.1 and 83.5 on BrowseComp point to real agentic and web-research capability, but again — vendor-reported, awaiting confirmation. The TechTimes analysis is worth weighing here too: it argues M3 trails Claude Opus 4.8 by meaningful margins on directly comparable agent evaluations, a reminder that "beats GPT-5.5 on SWE-Bench Pro" is one slice of a larger picture.

Pricing

Pricing is verified June 2026 and splits into three paths: MiniMax's own standard API, OpenRouter access, and self-hosting the open weights. The launch week carried a 50%-off promotion that's worth understanding before you build a cost model on it — the promo rate reverts to standard, and your projections should use the standard figure.

Standard pricing: $0.60/M input, $2.40/M output. Launch promotional rate (first week): $0.30/M input, $1.20/M output — verify current rate at platform.minimax.io before committing to cost projections.

Access path	Input / M	Output / M	Notes
Standard API (platform.minimax.io)	$0.60	$2.40	Standard rate
Launch promo (first week)	$0.30	$1.20	50% off — reverts to standard
Via OpenRouter	$0.30–$0.60	$1.20–$2.40	Promo / standard; routing redundancy
Self-hosted (open weights)	Infrastructure only		Weights on HuggingFace ~June 11, 2026

The cost case is the cleanest part of the M3 story. At $0.60/M standard input, it's roughly 8x cheaper than Claude Opus 4.8's $5/M, and it matches Kimi K2.6 on input price while sitting just above DeepSeek V4-Pro's $0.435/M. For high-volume agentic workloads, that wide cost gap against the closed frontier alternatives is the number that drives adoption. The MiniMax Code subscription product — a coding-workflow tool built on M3 — gives a managed path for teams that don't want to wire up the raw API. And the self-hosting route, once the weights land, drops the per-token fee to zero entirely; you trade API spend for server and infrastructure cost. One planning note: the promotional $0.30/M is a launch lever, not the baseline. Model your spend on $0.60/M input and $2.40/M output, and treat anything cheaper as upside.

What Changed

MiniMax M3 launched June 1, 2026. It's the first MiniMax model to use the MSA sparse-attention architecture — the M2 generation had removed sparse attention, and M3 brings it back with a lightweight index branch (15.6x faster decoding, 9.7x faster prefill at 1M context vs M2). It's positioned as the first open-weight model to combine frontier coding, a 1M context, and native multimodal input. The open weights are expected around June 10-11, 2026 on HuggingFace. M3 reports 59% SWE-Bench Pro, edging GPT-5.5 — vendor-reported, independent verification pending.

Key Features

59% SWE-Bench Pro coding. M3 reports 59.0% on SWE-Bench Pro, the most demanding real-world coding benchmark, narrowly ahead of GPT-5.5's 58.6%. SWE-Bench Pro measures resolution of genuinely hard GitHub issues rather than synthetic puzzles, so a frontier-adjacent score there is the headline capability claim. The caveat travels with the number: vendor-reported, independent verification pending.

1M-token context window. A single call can hold an entire codebase, a full documentation set, or a long research corpus. For agents reasoning across large systems, that's the difference between whole-repo context and chunking compromises — and it's a capability M3 shares with Kimi, DeepSeek, and Claude at the top of the category.

MiniMax Sparse Attention (MSA). The architectural centerpiece. A lightweight index branch scans tokens and selects which key-value blocks actually need attention, rather than attending to everything. MiniMax reports 15.6x faster decoding and 9.7x faster prefill at 1M context versus M2. Notably, M2 had removed sparse attention; M3 brings it back in a more refined form. Independent developers have called this a genuine contribution rather than a marketing claim — it targets the real cost of long-context inference.

Native multimodal input. Text, image, and video in a single model, with no separate vision component. In the open-weight category this is currently unique — Kimi and DeepSeek are text-only, and even Claude Opus 4.8's multimodal stops at text and image. For coding-adjacent tasks that involve screenshots, diagrams, or short clips, the inputs aren't limited to text.

Desktop computer operation. M3 can operate a desktop computer for agentic workflows — comparable in scope to Claude Cowork and Google Antigravity. This pushes M3 beyond a pure API model into agentic territory, where it can take actions in a real environment rather than only returning text.

66% Terminal-Bench 2.1. Strong performance on realistic shell tasks — the kind of work an agentic coding tool actually does at the command line. Combined with the coding scores, it points to a model built for agent loops, not just chat completion. Vendor-reported, like the rest.

83.5 BrowseComp (vendor-reported). Autonomous browsing and web-research capability. For agents that need to gather information from the live web as part of a task, this is the relevant signal — though Kimi's swarm configuration reports a higher 86.3 on the same benchmark.

Open weights. Released under an MIT-style license, self-hostable, with weights expected around June 10-11, 2026 — verify current availability on HuggingFace before planning self-hosted deployments. Self-hosting carries no per-token fee and keeps every token inside your environment — the path to full data sovereignty. The catch at review time: the weights hadn't actually shipped yet.

OpenAI-compatible API. A drop-in replacement for most OpenAI API calls. Migrating an existing pipeline — whether it currently targets OpenAI, Anthropic, or anything OpenAI-shaped — is usually a single endpoint change rather than a rewrite.

MiniMax Code. A subscription product built on M3 for coding workflows, giving teams a managed path into the model without wiring up the raw API. It's the productized front door for developers who want the coding capability without the integration work.

Evaluate MiniMax M3

Open-weight, OpenAI-compatible, multimodal, and priced from $0.60/M input. Call the standard API, route through OpenRouter, or self-host the weights once they ship.

Visit MiniMax →

We may earn a commission at no extra cost to you

MiniMax M3 vs Kimi K2.6 vs DeepSeek V4

These are the three open-weight reference points the target audience actually weighs against each other. Kimi K2.6 leads on agent swarms and MCP compatibility; DeepSeek V4-Pro leads on raw cost efficiency; M3's distinguishing move is folding multimodal input and a 1M context into the same open-weight coding model. The table lays out where each one earns its place.

	MiniMax M3	Kimi K2.6	DeepSeek V4-Pro
Primary strength	Multimodal + coding + 1M context	Agent swarms + MCP	Cost efficiency + coding
SWE-Bench	59% Pro (vendor)	80.2% Verified	80.6% Verified
Multimodal	✓ Text/Image/Video	✗	✗
Context window	1M tokens	1M tokens	1M tokens
Architecture	MSA sparse attention	MoE 1T params	MoE
License	Open weight (MIT-style)	Apache 2.0	MIT
Input price / M	$0.60 ($0.30 promo)	$0.60	$0.435
Chinese company	✓	✓	✓
Desktop operation	✓	✗	✗
Independent verification	Pending	✓	✓
Best for	Multimodal agentic workflows	Parallel coding agents	High-volume API workloads

The pattern is clean. If you need multimodal input or desktop operation in an open-weight model, M3 is the only one of the three that offers it — that's its lane, and nothing here competes in it. If you want maximum parallelism and a frictionless migration off Claude Code, Kimi's swarms and MCP compatibility win. If you want the lowest raw token cost for high-volume API workloads, DeepSeek edges it on price. The one row that should give a cautious buyer pause is the last technical line: M3's benchmarks are still pending independent verification, while Kimi's and DeepSeek's have been third-party confirmed. For a production decision, that's not a tiebreaker — it's a reason to evaluate M3 in parallel and commit once the numbers are confirmed.

The Chinese Company Question

This is the part of the evaluation that has nothing to do with the benchmark and everything to do with whether you can deploy it. MiniMax is a Shanghai-based lab, and for US and EU organizations that raises the same concern that applies to DeepSeek and Kimi: China's 2017 National Intelligence Law raises data-access concerns for hosted use — the law requires Chinese companies to cooperate with government intelligence requests. For any deployment involving sensitive code, proprietary data, or regulated information, that's a hard gate, not a soft preference — the same framing the TechTimes analysis applied, and the same one we apply here.

These are legitimate questions, not a reason to dismiss the tool — and the distinction matters. A regulated enterprise handling customer data or proprietary source under EU residency rules has a genuine blocker on the hosted API. A solo developer, a startup, or a team working non-sensitive code faces a far lower bar. And the open-weight release changes the calculus for anyone with infrastructure: self-hosting keeps every token inside your own environment, which neutralizes the data-routing concern entirely — at the cost of running the model yourself, and contingent on the weights actually shipping as scheduled around June 11. OpenRouter and other third-party hosts also provide routing paths outside MiniMax's own servers, a middle option worth investigating when the tool fits but the default endpoint doesn't.

The honest framing is the same one we applied to DeepSeek V4 and Kimi Code: treat data residency as a hard gate to clear before the cost savings matter, not as a footnote. If your compliance posture rules out a Chinese-hosted API and you can't self-host, the 8x price advantage is irrelevant — the tool isn't deployable for you. If your posture allows it, or you can route around it via self-hosting or a non-China host, the cost case stands on its own. Apply the same compliance review you would to any Chinese-developed open-weight model.

Community Sentiment

What Users Are Saying

We track discussion across developer forums, independent Medium evaluations, TechTimes, and Hacker News to gauge how M3 lands beyond the launch announcement. The early response is technically excited about the MSA architecture and the multimodal combination, and appropriately cautious about the two open questions: benchmarks that haven't been independently verified, and the data-residency exposure that comes with a Chinese-hosted API.

59%

SWE-Bench Pro*

~8x

Cheaper / Token

15.6x

Faster Decode

Token Context

* Vendor-reported, independent verification pending

● What developers consistently praise

"The MSA architecture's efficiency at long context is a genuine contribution, not a marketing claim. The agentic demonstrations, particularly the CUDA kernel optimization, are the kind of results that should change how seriously people take this model. The cost advantage is real and significant."

Independent developer evaluation · Medium · June 2026

"The first open-weight model to pack frontier coding, 1M context, and native video understanding into one system. The MSA architecture solves a real inference cost problem — 15.6x faster decoding at scale is not incremental."

Developer community analysis · June 2026

● Common reservations

"MiniMax M3's benchmark scores are company-reported and not yet independently verified. M3 trails Claude Opus 4.8 by meaningful margins on directly comparable agent evaluations. The promised open weights had not shipped at launch, making the architecture and safety behavior unverifiable."

TechTimes independent analysis · June 2026

"China's 2017 National Intelligence Law raises data-access concerns for prompts processed through MiniMax's API — the law requires Chinese companies to cooperate with government intelligence requests. For any deployment involving sensitive code, proprietary data, or regulated information, this is a hard gate, not a soft preference."

TechTimes independent analysis · June 2026

AIToolGrade Take

MiniMax M3 is the most technically ambitious open-weight model launched in 2026 — the combination of MSA sparse attention, 59% SWE-Bench Pro (vendor-reported), native multimodal, and $0.60/M input pricing represents a real advance in what open-weight models can deliver per dollar. The honest assessment requires holding two things at once: the cost-performance proposition is real if the benchmarks verify independently, and the data-residency concern is equally real for any organization deploying sensitive workloads. The pattern is consistent with DeepSeek V4 and Kimi Code — Chinese company, open-weight, dramatically cheaper than Western alternatives, same 2017 National Intelligence Law exposure. Our recommendation: evaluate M3 seriously for multimodal agentic workflows, where the architecture is genuinely interesting and the pricing is compelling. Wait for independent benchmark verification before production commitment. Self-host the open weights if data sovereignty is required. And apply the same compliance review you would to DeepSeek V4 or Kimi Code.

The Bottom Line

MiniMax M3 is the clearest evidence yet that open-weight models are pushing into territory the closed frontier used to own outright. MiniMax took a frontier-adjacent coding model (59% SWE-Bench Pro, vendor-reported), wrapped it in a genuinely new sparse-attention architecture that runs 15.6x faster at 1M context, added native text/image/video input and desktop operation, and priced the underlying API at roughly an eighth of Claude Opus 4.8. On paper, the combination of capabilities per dollar is among the strongest the open-weight category has produced in 2026. The combination is the achievement — no other open-weight model folds multimodal input, a million-token context, and frontier coding into one system.

The reasons to hold back are real and specific, and none of them is about whether the architecture is interesting. The benchmarks are MiniMax's own, measured on MiniMax's setup, and not yet independently verified — and the SWE-Bench Pro lead over GPT-5.5 needs third-party confirmation before anyone bets production on it. The open weights weren't shipped at launch, which means the architecture and safety behavior were unverifiable at review time, pending the ~June 11 HuggingFace release. There's no native IDE extension. It's a ten-day-old model with unproven production reliability. And the data-residency question for a Shanghai-hosted API is a genuine gate for regulated organizations — clear it before the savings mean anything, because if you can't, they don't.

So the recommendation is conditional and specific. Best for: developers who need native multimodal processing in a single open-weight model; teams running high-volume agentic workflows where $0.60/M versus $5/M changes the math; organizations comfortable with self-hosting who want full data sovereignty once the weights ship; researchers evaluating the MSA architecture; and teams already on DeepSeek or Kimi comparing open-weight options. Not for: enterprise teams with strict US/EU data-residency requirements; production deployments that need independently verified benchmarks before commitment; developers who need native IDE integration; or anyone who needed the weights at launch. Evaluate M3 seriously now, wait for independent verification before you commit production to it, and self-host if data sovereignty is non-negotiable. This 7.9 reflects the June 2026 state — unverified benchmarks, data-residency exposure, new-model maturity — and we expect to revisit it, likely upward, in 60 to 90 days as third-party verification arrives. If you want absolute verified capability or a turnkey IDE assistant today, Claude Code and Cursor remain the more practical picks.