GLM-5.2-FP8 Overview
Description:
GLM-5.2 is a flagship model for long-horizon tasks, offering a solid 1M-token context. It features advanced coding capabilities with flexible effort levels, improved architecture through IndexShare (reusing indexers across sparse attention layers to reduce FLOPs), and enhanced MTP layer for speculative decoding. The model is open-source under MIT license with no regional limits.
GLM-5.2-FP8 was developed by Zhipu AI as a part of GLM.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see link to Non-NVIDIA Zhipu AI Model Card.
License/Terms of Use:
Deployment Geography:
Global
Use Case:
Developers building AI agent systems, chatbots, and applications requiring advanced reasoning and long-context understanding.
Release Date:
HuggingFace: 06/16/2026 via https://huggingface.co/zai-org/GLM-5.2-FP8
Reference(s):
GLM-5: from Vibe Coding to Agentic Engineering
Model Architecture:
Architecture Type: Transformer
Network Architecture: Mixture of Experts (MoE)
This model was developed based on GLM-5.1.
Number of model parameters: 753B
Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Context length up to 1,048,576 tokens. Uses SentencePiece tokenizer with vocabulary size not specified.
Output:
Output Type(s): Text
Output Format: String
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: Maximum output length configurable via max_new_tokens parameter (default 4096, max 32768). Supports streaming output. UTF-8 encoded text.
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engine(s):
- Transformers
- vLLM
- TensorRT-LLM
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
- NVIDIA Hopper
Supported Operating System(s): Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
(Optional) This AI model can be embedded as an Application Programming Interface (API) call into the software environment described above.
Model Version(s):
GLM-5.2-FP8
GLM-5.2 supports deployment with frameworks like SGLang, vLLM, Transformers, KTransformers, and Unsloth. Each framework has specific version requirements and documentation links provided.
Training, Testing, and Evaluation Datasets:
Training Dataset:
Data Modality: Text
Text Training Data Size: More than 1 Trillion Tokens
Data Collection Method by dataset: Not Applicable
Labeling Method by dataset: Not Applicable
Properties (Quantity, Dataset Descriptions, Sensor(s)): Not Applicable (N/A)
Testing Dataset:
Data Collection Method by dataset: Not Applicable
Labeling Method by dataset: Not Applicable
Properties (Quantity, Dataset Descriptions, Sensor(s)): Not Applicable (N/A)
Evaluation Dataset:
Benchmark Score: HLE: 40.5, HLE (w/ Tools): 54.7, CritPt: 20.9, AIME 2026: 99.2, HMMT Nov. 2025: 94.4, HMMT Feb. 2026: 92.5, IMOAnswerBench: 91.0, GPQA-Diamond: 91.2, SWE-bench Pro: 62.1, NL2Repo: 48.9, DeepSWE: 46.2, ProgramBench: 63.7, Terminal Bench 2.1 (Terminus-2): 81.0, Terminal Bench 2.1 (Best Reported Harness): 82.7, FrontierSWE (Dominance): 74.4, PostTrainBench: 34.3, SWE-Marathon: 13.0, MCP-Atlas (Public Set): 76.8, Tool-Decathlon: 48.2
| Benchmark | GLM-5.2 | GLM-5.1 | Qwen3.7-Max | MiniMax M3 | DeepSeek-V4-Pro | Claude Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|---|---|---|---|
| Reasoning | ||||||||
| HLE | 40.5 | 31 | 41.4 | 37 | 37.7 | 49.8* | 41.4* | 45 |
| HLE (w/ Tools) | 54.7 | 52.3 | 53.5 | - | 48.2 | 57.9* | 52.2* | 51.4* |
| CritPt | 20.9 | 4.6 | 13.4 | 3.7 | 12.9 | 20.9 | 27.1 | 17.7 |
| AIME 2026 | 99.2 | 95.3 | 97 | - | 94.6 | 95.7 | 98.3 | 98.2 |
| HMMT Nov. 2025 | 94.4 | 94 | 95 | 84.4 | 94.4 | 96.5 | 96.5 | 94.8 |
| HMMT Feb. 2026 | 92.5 | 82.6 | 97.1 | 84.4 | 95.2 | 96.7 | 96.7 | 87.3 |
| IMOAnswerBench | 91.0 | 83.8 | 90 | - | 89.8 | 83.5 | - | 81 |
| GPQA-Diamond | 91.2 | 86.2 | 90 | 93 | 90.1 | 93.6 | 93.6 | 94.3 |
| Coding | ||||||||
| SWE-bench Pro | 62.1 | 58.4 | 60.6 | 59 | 55.4 | 69.2 | 58.6 | 54.2 |
| NL2Repo | 48.9 | 42.7 | 47.2 | 42.1 | 35.5 | 69.7 | 50.7 | 33.4 |
| DeepSWE | 46.2 | 18 | 18 | 20 | 8 | 58 | 70 | 10 |
| ProgramBench | 63.7 | 50.9 | - | - | 47.8 | 71.9 | 70.8 | 39.5 |
| Terminal Bench 2.1 (Terminus-2) | 81.0 | 63.5 | 75 | 65 | 64 | 85 | 84 | 74 |
| Terminal Bench 2.1 (Best Reported Harness) | 82.7 | 69 | - | - | - | 78.9 | 83.4 | 70.7 |
| FrontierSWE (Dominance) | 74.4 | 30.5 | - | - | 29.0 | 75.1 | 72.6 | 39.6 |
| PostTrainBench | 34.3 | 20.1 | - | - | - | 37.2 | 28.4 | 21.6 |
| SWE-Marathon | 13.0 | 1.0 | - | - | - | 26.0 | 12.0 | 4.0 |
| Agentic | ||||||||
| MCP-Atlas (Public Set) | 76.8 | 71.8 | 76.4 | 74.2 | 73.6 | 77.8 | 75.3 | 69.2 |
| Tool-Decathlon | 48.2 | 40.7 | - | - | 52.8 | 59.9 | 55.6 | 48.8 |
Data Collection Method by dataset: Hybrid: Automated/Human
Labeling Method by dataset: Hybrid: Automated/Human
Properties (Quantity, Dataset Descriptions, Sensor(s)): Evaluated on diverse reasoning benchmarks including mathematical problem solving, code generation challenges, question answering tasks, and instruction following assessments. Comprehensive evaluation across standard academic benchmarks for language understanding, mathematical reasoning, and code generation capabilities.
Inference:
Acceleration Engine: TensorRT-LLM Test Hardware: NVIDIA A100 80GB
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
Generated by NVIDIA Model Card Generator Toolkit.