TLDR (GUANGDONG): Researchers with Huawei used Ascend 910C chips to complete full-parameter post-training of DeepSeek V4 Pro, refining the model's instruction-following and safety behavior. The run, covering a 1.6-trillion-parameter model on at least 1,000 chips, signals that China is moving from inference toward training despite US sanctions, with implications for domestic AI teams and chipmakers.
Key Takeaways:
- China has excelled at AI inference on domestic chips, but training models, especially post-training, has remained far harder under US sanctions.
- Huawei and partners used at least 1,000 Ascend 910C chips to run full-parameter post-training of DeepSeek V4 Pro, tuning its instruction-following and safety behavior.
- If the approach scales, Chinese AI and semiconductor supply chains could shift from one-way answering (running fixed models) to iterative model refinement that raises performance ceilings.
Inference is the easy part. Post-training is where the model learns the rules of the road, and China is clearly trying to build the whole highway system, not just the exits.
Q&A
Why does full-parameter post-training matter more than inference wins for China’s AI ambitions?
Inference proves chips can run models. Full-parameter post-training shows they can update all of a model's weights at scale, a step closer to the training pipelines that determine long-term capability.
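To make the distinction concrete, here is a toy sketch assuming a tiny three-weight linear model trained with plain gradient descent on squared error. The helpers `predict` and `full_parameter_step` are hypothetical illustrations and bear no relation to the Ascend software stack or DeepSeek's actual recipe; the point is only that inference reads the weights while a full-parameter update changes every one of them.

```python
# Toy illustration only: a 3-weight linear model, not a real LLM.

def predict(weights, features):
    """Inference: a forward pass that reads the weights but never changes them."""
    return sum(w * x for w, x in zip(weights, features))


def full_parameter_step(weights, features, target, lr=0.02):
    """One gradient-descent step on squared error that updates *every* weight.

    Loss L = (prediction - target)^2, so dL/dw_i = 2 * (prediction - target) * x_i.
    """
    error = predict(weights, features) - target
    return [w - lr * 2 * error * x for w, x in zip(weights, features)]


weights = [0.5, -0.2, 0.1]
features = [1.0, 2.0, 3.0]

before = predict(weights, features)          # inference: weights untouched
updated = full_parameter_step(weights, features, target=1.0)
after = predict(updated, features)           # prediction moves toward the target

print(f"prediction before: {before:.3f}, after one update: {after:.3f}")
print("weight deltas:", [round(u - w, 3) for u, w in zip(updated, weights)])
```

In a real post-training run the same idea applies to billions of weights at once, which is why the gradient traffic and optimizer state, not the forward pass, become the hard part.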
What computational bottlenecks could appear next when scaling from post-training to broader training tasks?
Post-training still demands heavy communication across many chips. Extending to longer pre-training runs or repeated fine-tuning could strain memory bandwidth, interconnect throughput, and system stability.
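A rough way to see why communication dominates: in a ring all-reduce, a standard collective for synchronizing gradients, each of N devices sends about 2(N-1)/N times the payload it holds (a reduce-scatter phase followed by an all-gather phase). The sketch below assumes that textbook cost model and a hypothetical 1 GB gradient payload per device; the article does not say which collectives or shard sizes Huawei used.

```python
def ring_allreduce_bytes_sent(payload_bytes: float, num_devices: int) -> float:
    """Bytes each device sends in a ring all-reduce (reduce-scatter + all-gather).

    Each phase moves (N - 1) chunks of size payload / N per device.
    """
    if num_devices < 2:
        return 0.0
    return 2 * (num_devices - 1) / num_devices * payload_bytes


GB = 1e9
payload = 1 * GB  # hypothetical gradient payload per device per sync

for n in (8, 128, 1000):
    sent = ring_allreduce_bytes_sent(payload, n)
    print(f"{n:>5} devices: {sent / GB:.3f} GB sent per device per sync")
```

Per-device traffic approaches twice the payload as the cluster grows, so link bandwidth sets the floor on step time, while the number of ring steps, 2(N-1), grows with cluster size and makes latency and stragglers more costly.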
How might US sanctions indirectly shape the technical choices behind Ascend 910C-based workflows?
Restrictions can push teams toward domestic toolchains, model sizes that fit available memory, and training methods that tolerate hardware differences, making optimization strategy as important as raw compute.
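Memory is where the 1.6-trillion-parameter figure bites. The estimate below assumes a common mixed-precision recipe with an Adam-style optimizer (bf16 weights and gradients, fp32 master weights, and two fp32 moment buffers, about 16 bytes per parameter) and an even shard across 1,000 chips. The article does not state Huawei's precision, optimizer, or sharding scheme, and activations are ignored.

```python
# Assumed per-parameter training state (mixed precision + Adam); not confirmed by the source.
BYTES_PER_PARAM = {
    "bf16 weights": 2,
    "bf16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}


def training_state_bytes(num_params: float) -> float:
    """Total bytes of weights, gradients, and optimizer state (activations excluded)."""
    return num_params * sum(BYTES_PER_PARAM.values())


def per_chip_bytes(num_params: float, num_chips: int) -> float:
    """Per-chip share if the state is sharded evenly across all chips."""
    return training_state_bytes(num_params) / num_chips


params = 1.6e12   # parameter count reported in the article
chips = 1000      # "at least 1,000" chips; the minimum is used here

total_tb = training_state_bytes(params) / 1e12
per_chip_gb = per_chip_bytes(params, chips) / 1e9
print(f"total training state: {total_tb:.1f} TB")
print(f"per chip over {chips} chips: {per_chip_gb:.1f} GB (before activations)")
```

Under these assumptions the training state totals about 25.6 TB, or roughly 25.6 GB per chip before a single activation is stored, which is why sharding strategy and model-size choices constrain a run as much as raw compute does.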
What does using a cluster of at least 1,000 chips suggest about China’s approach to AI training under constraints?
It points to a scaling playbook built around parallelism and cluster orchestration, leaning on system integration and software optimization to compensate for limited access to cutting-edge hardware.
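One common way to orchestrate a cluster this size is to factor it into tensor-, pipeline-, and data-parallel groups. The sketch below uses a hypothetical 1,024-chip layout (8-way tensor, 16-way pipeline, 8-way data parallelism) and a hypothetical `ParallelLayout` helper; the source does not disclose the actual parallelism scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical 3D-parallel layout; degrees multiply to the world size."""
    tensor: int
    pipeline: int
    data: int

    @property
    def world_size(self) -> int:
        return self.tensor * self.pipeline * self.data

    def coords(self, rank: int) -> tuple[int, int, int]:
        """Map a global rank to (data, pipeline, tensor) indices.

        Tensor-parallel ranks are innermost so they stay adjacent, which is
        usually chosen to keep them on the fastest (intra-node) links.
        """
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} outside world of {self.world_size}")
        tp = rank % self.tensor
        pp = (rank // self.tensor) % self.pipeline
        dp = rank // (self.tensor * self.pipeline)
        return dp, pp, tp

    def tensor_group(self, rank: int) -> list[int]:
        """Ranks that share the same (data, pipeline) position as `rank`."""
        base = rank - rank % self.tensor
        return list(range(base, base + self.tensor))


layout = ParallelLayout(tensor=8, pipeline=16, data=8)
print("world size:", layout.world_size)
print("rank 0 ->", layout.coords(0))
print("rank 1023 ->", layout.coords(1023))
print("tensor group of rank 130:", layout.tensor_group(130))
```

The ordering matters: the chattiest traffic (tensor parallelism) is kept on neighboring ranks, while data-parallel gradient syncs, which happen once per step, span the slower cross-node links.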
If models can self-reflect during post-training, how will companies measure quality and keep models from drifting into unsafe behavior?
Expect tighter evaluation loops using instruction following tests, safety rule checks, and human preference comparisons, because iterative weight updates can amplify both strengths and failure modes.
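A minimal sketch of such a loop is a promotion gate that compares each new checkpoint against the current baseline and refuses any safety regression. The `EvalScores` fields, the `should_promote` function, its thresholds, and the sample scores are hypothetical placeholders, not figures from Huawei or DeepSeek.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalScores:
    """Hypothetical per-checkpoint metrics, each in [0, 1]."""
    instruction_following: float  # share of instruction tests passed
    safety_pass_rate: float       # share of safety-rule checks passed
    preference_win_rate: float    # win rate vs. baseline in human comparisons


def should_promote(candidate: EvalScores, baseline: EvalScores,
                   max_safety_drop: float = 0.0,
                   min_win_rate: float = 0.5) -> tuple[bool, str]:
    """Accept a checkpoint only if safety holds and quality improves."""
    if candidate.safety_pass_rate < baseline.safety_pass_rate - max_safety_drop:
        return False, "safety regression"
    if candidate.instruction_following < baseline.instruction_following:
        return False, "instruction-following regression"
    if candidate.preference_win_rate < min_win_rate:
        return False, "not preferred over baseline"
    return True, "promote"


baseline = EvalScores(0.82, 0.97, 0.50)
better = EvalScores(0.86, 0.97, 0.58)
unsafe = EvalScores(0.90, 0.94, 0.66)
print(should_promote(better, baseline))   # (True, 'promote')
print(should_promote(unsafe, baseline))   # (False, 'safety regression')
```

Checking safety first, with zero tolerance by default, means a large gain in helpfulness cannot buy back a safety regression; loosening `max_safety_drop` is a policy decision rather than a tuning detail.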