Back to today's topics

Verified · Jun 29, 2026

Zhipu's GLM-5.2 goes open, and Hugging Face shows how to deploy it in one command

3 sources

Z.AI released GLM-5.2 on the Hugging Face blog on June 17, 2026: 753B parameters, 1M-token context, MIT license. Self-reported scores: Terminal-Bench 2.1 = 81.0, SWE-bench Pro = 62.1, FrontierSWE = 74.4 (highest open-source). On June 26, 2026, Hugging Face published a tutorial showing how to launch a private OpenAI-compatible vLLM endpoint on HF Jobs in one `hf jobs run` command, billed per second.

Why now

The open-weights release and the one-command deploy guide landed within 10 days, so creators can pair them into a single 'run it today' workflow with no private API needed.

Why it is worth publishing

It is one of the rare 'trillion-parameter-class, 1M context, MIT, runnable today' combos in the open model space, and it can be demonstrated rather than just narrated.

Evidence basis

A 'Chinese open-source + one-command deploy + real benchmark' package has a track record of traveling in the English-speaking AI creator community, especially when the per-hour cost (a10g-large at $1.50/hr) is clearly stated.

I just spun up a 753B-parameter open-source model in the cloud with one command, and it has 1M-token context.

Angle

Pair 'a 1M-context open-source model from China' with 'a one-command deploy guide' and show the full workflow on camera.

Format

Short talking-head video

Demo idea

Show the full `hf jobs run` command, the returned `https://<job_id>--8000.hf.jobs` URL, one curl call against the OpenAI-compatible endpoint, and a 1M-context long-document QA pass — with the a10g-large $1.50/hr price on screen.

Platform notes

Always label the benchmark numbers as 'reported by Z.AI'; do not claim 'state of the art' on their behalf. The MIT license covers the model weights; the GLM Coding Plan's peak-3x / off-peak-2x quota rules are commercial terms and should be discussed separately. The HF Jobs endpoint requires an HF token and bills per second — never describe it as a free public API. Link the model card and the blog post separately, and link to the blog when quoting specific scores.

Usable claims

  • Z.AI released GLM-5.2 on June 17, 2026 via the Hugging Face blog: a 753B-parameter language model with 1M-token context, MIT license, and weights published on Hugging Face and ModelScope.
  • GLM-5.2 reports Terminal-Bench 2.1 = 81.0, SWE-bench Pro = 62.1, and FrontierSWE = 74.4, the highest open-source score on FrontierSWE.
  • Hugging Face published an official guide on June 26, 2026 showing how to launch a private OpenAI-compatible vLLM endpoint on HF Jobs in one command, with worked examples for Qwen3-4B and Qwen3.5-122B-A10B.

Evidence pipeline

Breakdown

This breakdown turns the GLM-5.2 release (753B / 1M context / MIT) and the June 26, 2026 Hugging Face vLLM-on-HF-Jobs tutorial into a single runnable workflow, including the per-second cost of an a10g-large ($1.50/hr) and the bearer-token requirement. It also separates the MIT-licensed model weights from the GLM Coding Plan's commercial peak-3x / off-peak-2x quota rules.

Risks

  • Always attribute the benchmark numbers to Z.AI's launch post, and pair the FrontierSWE claim with a note that it is the post's own ranking.
  • Separate the open-weights story (license, context length, parameter count) from the coding-plan pricing story; do not frame pricing as part of the license.
  • Show the exact cost line in the demo and the bearer-token requirement; never imply the endpoint is public or free.

Demo ideas

  • Record a 30-second terminal clip: one command, copy the returned URL, send one OpenAI-compatible curl request
  • Drop a public technical white paper (around 800K tokens) into the 1M context window and compare the result against the same prompt on a 200K model that truncates
  • Make a tag card showing 753B / 1M / MIT, and call out the Terminal-Bench 2.1 jump from 63.5 (GLM-5.1) to 81.0 (GLM-5.2)