Huawei Researchers Develop LLM With 1.085 Trillion Parameters

The model was trained on 329 billion tokens in more than 40 natural and programming languages.
A group of Huawei researchers developed a system that trained a language model, PanGu-Σ, under the MindSpore 5 framework on a cluster of Ascend 910 AI processors, using 329 billion tokens over 100 days. The model was launched towards the second half of March.

PanGu-Σ inherits its parameters from PanGu-α and extends that model's Transformer decoder architecture with Random Routed Experts (RRE).

The RRE design makes it simple to extract sub-models from PanGu-Σ for a variety of downstream applications, including conversation, translation, code generation, and general natural language understanding.
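
To give a rough feel for why a Random Routed Experts (RRE) design could make sub-model extraction simple, here is a minimal toy sketch in Python. It is not Huawei's implementation. It assumes that experts are grouped by domain and that each token ID is sent to an expert through a fixed random table rather than a learned router. The function names (`build_layer`, `forward`, `extract_submodel`) and the use of a single scalar weight per expert are hypothetical simplifications made for illustration.

```python
import random


def build_layer(domains, experts_per_domain, vocab_size, seed=0):
    """Build a toy layer with one group of experts per domain (assumption)."""
    rng = random.Random(seed)
    # Each "expert" is reduced to a single scalar weight for illustration.
    experts = {
        d: [rng.uniform(-1.0, 1.0) for _ in range(experts_per_domain)]
        for d in domains
    }
    # Fixed, non-learned table: token id -> expert index inside a group.
    routing_table = [rng.randrange(experts_per_domain) for _ in range(vocab_size)]
    return {"experts": experts, "routing_table": routing_table}


def forward(layer, domain, token_ids, hidden):
    """Apply the expert chosen by the random table to each hidden value."""
    group = layer["experts"][domain]
    table = layer["routing_table"]
    return [h * group[table[t]] for t, h in zip(token_ids, hidden)]


def extract_submodel(layer, domain):
    """Keep only one domain's experts; the routing table is reused unchanged."""
    return {
        "experts": {domain: list(layer["experts"][domain])},
        "routing_table": list(layer["routing_table"]),
    }


layer = build_layer(["chinese", "english", "code"], experts_per_domain=4, vocab_size=100)
code_model = extract_submodel(layer, "code")
tokens, hidden = [3, 17, 42], [1.0, 2.0, 3.0]
full_out = forward(layer, "code", tokens, hidden)
sub_out = forward(code_model, "code", tokens, hidden)
```

In this sketch, routing does not depend on any trained gating network, so dropping the experts of other domains leaves the outputs for the kept domain unchanged. That is the property the toy example is meant to illustrate.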

According to the research paper, overall training throughput is 6.3 times higher than that of a model with the MoE architecture and the same hyper-parameters. The Chinese-domain sub-model of PanGu-Σ significantly outperforms previous SOTA models, including PanGu-α with 13 billion parameters and ERNIE 3.0 Titan with 260 billion parameters, across 16 downstream tasks in six categories in the zero-shot setting, without any multitask finetuning or instruction tuning.

Huawei gathered datasets in 40 domains, with a significant amount of data in four key domains: Chinese, English, bilingual (Chinese and English), and code, to demonstrate the model's ability to learn effectively and independently from many domains.

The research paper asserts that PanGu-Σ achieves state-of-the-art results in a variety of downstream tasks, such as few-shot NLU, open-domain dialogue, question answering, machine translation, and code generation, by extending PanGu-α and continuing its training on 329B tokens.

Shyam Nandan Upadhyay
Shyam is a tech journalist with expertise in policy and politics, and exhibits a fervent interest in scrutinising the convergence of AI and analytics in society. In his leisure time, he indulges in anime binges and mountain hikes.
