TLDR: Huawei-backed researchers in Shenzhen say they post-trained DeepSeek V4-Pro, a 1.6-trillion-parameter model, on at least 1,000 Ascend 910C chips. The claim suggests Chinese accelerators can now handle training-style workloads, though no benchmarks were provided.
Key Takeaways:
- U.S. export controls have pushed Chinese firms to replace Nvidia hardware for training, the area where domestic accelerators have lagged furthest behind.
- Researchers from Huawei and Harbin Institute of Technology (Shenzhen) claim full-parameter post-training of DeepSeek V4-Pro on at least 1,000 Ascend 910C chips.
- If the claim holds, the Ascend 910C can support instruction tuning on domestic silicon, but that still does not prove it can handle frontier pre-training.
Huawei is trying to move from inference bragging rights to training credibility. The trouble is that the claim arrives without proof, so the market still has to decide whether this is progress or salesmanship.
Q&A
What does full-parameter post-training validate that thin adapter tuning does not?
It shows the system can update every weight, which stresses compute, memory bandwidth, and optimizer stability far more than training lightweight adapter layers.
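A rough back-of-the-envelope sketch of why updating every weight is heavier. It assumes standard mixed-precision Adam bookkeeping (about 16 bytes per trainable parameter: bf16 weights and gradients, plus fp32 master weights and two fp32 moment buffers), perfect sharding across chips, and no activation memory. The 8192 x 8192 matrix shape and LoRA rank of 16 are hypothetical illustrations; only the 1.6-trillion-parameter and 1,000-chip figures come from the report.

```python
# Assumed mixed-precision Adam cost: 2 (bf16 weight) + 2 (bf16 grad)
# + 4 (fp32 master weight) + 4 (Adam m) + 4 (Adam v) = 16 bytes per trainable param.
BYTES_PER_TRAINABLE_PARAM = 16


def training_state_bytes(trainable_params: int) -> int:
    """Memory for weights, gradients, and Adam states (activations excluded)."""
    return trainable_params * BYTES_PER_TRAINABLE_PARAM


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a rank-r LoRA adapter (A: r x d_in, B: d_out x r)."""
    return rank * (d_in + d_out)


TOTAL_PARAMS = 1_600_000_000_000  # 1.6T parameters, as reported
CHIPS = 1_000                     # "at least 1,000" Ascend 910C chips, as reported

# Full-parameter training: every weight carries gradients and optimizer state.
full_bytes = training_state_bytes(TOTAL_PARAMS)
per_chip_gb = full_bytes / CHIPS / 1e9  # assumes perfectly even sharding

# Hypothetical single projection matrix, full update vs. LoRA adapter.
d = 8192
full_matrix = d * d
adapter = lora_params(d, d, rank=16)

print(f"{full_bytes / 1e12:.1f} TB of training state, {per_chip_gb:.1f} GB per chip")
print(f"LoRA r=16 trains {adapter:,} of {full_matrix:,} weights ({adapter / full_matrix:.2%})")
```

Under these assumptions the full-parameter run carries roughly 25.6 TB of training state (about 25.6 GB per chip before any activations), while the adapter on a single matrix trains well under 1% of its weights. Real systems shift these numbers with optimizer sharding, offloading, and sparse architectures, but the gap in scale is the point.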
Why does post-training matter even if frontier pre-training remains the bigger hurdle?
Post-training drives instruction following and alignment behavior, so better post-training can make a model feel smarter even when its underlying pre-training capability is unchanged.
What could make the Ascend 910C cluster look good on paper but fail under broader workloads?
Without benchmarks, the open questions include effective utilization, interconnect bottlenecks, data pipeline throughput, and software-layer performance over long runs.
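"Effective utilization" is usually reported as model FLOPs utilization (MFU): the training FLOPs a run actually achieves divided by the cluster's theoretical peak. The minimal sketch below uses the common approximation of about 6 x N x D FLOPs for training on D tokens, where N is the number of parameters touched per token (the full count for a dense model, the active count for a mixture-of-experts model). Every input is a hypothetical placeholder, since no throughput, token count, or per-chip peak figures were published.

```python
def model_flops_utilization(
    active_params: float,
    tokens: float,
    seconds: float,
    chips: int,
    peak_flops_per_chip: float,
) -> float:
    """MFU using the ~6 * N * D estimate of training FLOPs (forward + backward)."""
    achieved_flops_per_second = 6 * active_params * tokens / seconds
    peak_flops_per_second = chips * peak_flops_per_chip
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical run: every number below is a placeholder, not a reported figure.
mfu = model_flops_utilization(
    active_params=50e9,         # assumed parameters touched per token
    tokens=1e9,                 # assumed tokens processed
    seconds=3_600,              # assumed wall-clock time (one hour)
    chips=1_000,                # chip count, as reported
    peak_flops_per_chip=300e12, # assumed per-chip peak, not an Ascend spec
)
print(f"MFU = {mfu:.1%}")
```

A published figure like this, alongside the same calculation for an Nvidia baseline, is the kind of number that would separate a cluster that looks good on paper from one that sustains real throughput.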
How might gaps in Huawei's software stack have blocked earlier DeepSeek training attempts, and what would need to improve?
Reports pointed to gaps in CANN, Huawei's AI software layer, and to unstable inter-chip performance; closing them would require smoother distributed-training primitives and higher sustained throughput at scale.
If the Shenzhen team can deliver post-training, what is the next test Chinese teams are likely to push publicly?
Teams will likely attempt reproducible training runs with clear metrics such as time to train, loss curves, and comparisons against Nvidia baselines on the same model family.