The Environmental Impact of LLMs

GPT-3's training produced carbon emissions equivalent to nearly 500 times those of a passenger flying a New York-San Francisco round trip.
With the adoption of LLMs, a new concern is gaining ground: the environmental impact of training these models. The carbon footprint of training a single large model can run into hundreds of tonnes. According to the sixth edition of Stanford University's AI Index Report (2023), the carbon dioxide-equivalent emissions produced by GPT-3 stood at 502 tonnes in 2022, the highest among the comparable large models surveyed.

However, the study does not factor in the newer GPT-4, which may fare even worse on this count; notably, OpenAI has not disclosed GPT-4's parameter count to the public. Researchers use several criteria to estimate the carbon emissions of AI systems. These include the number of parameters in the trained model, the data centre's power usage effectiveness (PUE), and the carbon intensity of the electricity grid. OpenAI's latest technical report likewise reveals nothing about environmental impact, carbon emissions, or even parameter size.
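A common back-of-the-envelope method multiplies the energy a training run consumes by the facility's PUE and the grid's carbon intensity. Here is a minimal Python sketch of that calculation; the function and every number in the example are illustrative assumptions, not reported figures for any particular model:

```python
def training_emissions_tco2e(gpu_count: int,
                             avg_gpu_power_kw: float,
                             training_hours: float,
                             pue: float,
                             grid_kg_co2e_per_kwh: float) -> float:
    """Estimate training emissions in tonnes of CO2-equivalent:
    energy (kWh) = GPUs x average power per GPU (kW) x hours,
    emissions    = energy x PUE x grid carbon intensity."""
    energy_kwh = gpu_count * avg_gpu_power_kw * training_hours
    kg_co2e = energy_kwh * pue * grid_kg_co2e_per_kwh
    return kg_co2e / 1000.0  # kg -> tonnes

# Illustrative (assumed) numbers for a large training run:
print(training_emissions_tco2e(
    gpu_count=10_000,
    avg_gpu_power_kw=0.3,          # ~300 W average draw per GPU
    training_hours=336,            # two weeks wall-clock
    pue=1.1,                       # efficient hyperscale facility
    grid_kg_co2e_per_kwh=0.4,      # roughly a US-average grid mix
))  # ~443.5 tCO2e
```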

The AI Index Report compared four LLMs, and GPT-3 had the highest emissions of the group. It was higher even than DeepMind's Gopher, a model trained at 280B parameters. BLOOM, the open multilingual model with roughly the same parameter count as GPT-3, produced 25 tonnes of carbon in 2022, about one-twentieth of GPT-3's figure. Meta's Open Pre-trained Transformer (OPT) consumed the least power, with roughly one-seventh of the carbon emissions produced by GPT-3.

Table: CO2-equivalent emissions of select LLMs (Source: AI Index Report 2023)

The power usage effectiveness (PUE) in the table above is a metric of a data centre's energy efficiency: the ratio of the total energy the facility consumes, including cooling and air conditioning, to the energy actually delivered to the computing equipment. For example, a facility that draws 1.2 MWh in total to deliver 1 MWh to its servers has a PUE of 1.2; the lower the value, the more efficient the data centre, with 1.0 as the theoretical minimum.


Below is a comparison of these carbon-emission estimates against real-life reference points such as cars, air travel, and the yearly footprint of a human life. GPT-3's 502 tonnes come to almost 500 times the emissions attributable to one passenger on a New York to San Francisco round-trip flight (roughly one tonne of CO2-equivalent).

Figure: Carbon-emission estimates compared with real-life examples (Source: AI Index Report 2023)

(The Stanford report notes the difficulty of directly comparing the carbon footprints of these models, as the accounting methodologies for reporting carbon emissions are not standardised.)

AI for Reducing Energy?

Now, AI is being tested as a way to combat the high energy consumption of AI systems themselves. While training LLMs will always expend energy, researchers have been experimenting with reinforcement learning for controlling commercial cooling systems. New reinforcement learning agents such as DeepMind's BCOOLER (BVE-based Constrained Optimisation Learner with Ensemble Regularization) target energy optimisation in data centres.
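BCOOLER itself learns a constrained value function from real facility data, which is beyond a short example; the toy Python sketch below only conveys the flavour of the problem, with an epsilon-greedy agent choosing a cooling setpoint to minimise energy under a temperature constraint. The simulator, setpoints, and penalty are all invented for illustration, not BCOOLER's actual algorithm:

```python
import random

SETPOINTS = [18.0, 20.0, 22.0, 24.0]  # candidate chiller setpoints (deg C)
TEMP_LIMIT = 23.0                     # constraint on server-inlet temperature

def step(setpoint: float) -> float:
    """Toy plant model: higher setpoints save energy but risk violating
    the temperature constraint. All coefficients are made up."""
    energy = 100.0 - 3.0 * setpoint + random.gauss(0, 1)  # kWh this interval
    inlet_temp = setpoint + random.gauss(0.5, 0.3)        # resulting temperature
    penalty = 50.0 if inlet_temp > TEMP_LIMIT else 0.0    # constraint violation
    return -(energy + penalty)  # reward: low energy, no violations

# Epsilon-greedy action-value learning over the discrete setpoints.
values = {s: 0.0 for s in SETPOINTS}
counts = {s: 0 for s in SETPOINTS}
for t in range(5000):
    s = random.choice(SETPOINTS) if random.random() < 0.1 \
        else max(values, key=values.get)
    r = step(s)
    counts[s] += 1
    values[s] += (r - values[s]) / counts[s]  # incremental mean update

print(max(values, key=values.get))  # typically settles on 22.0
```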

DeepMind and Google have been conducting live experiments at two real-world facilities to reduce energy use. The experiment showed energy savings of 9% and 13% at the two sites.

Figure: Energy savings over time in the BCOOLER experiment (Source: arxiv.org)

Training on Fewer GPUs

Efforts are also underway to shrink the sizeable carbon footprint LLMs leave behind by reducing the compute these models need. Recently, researchers released FlexGen, a high-throughput generation engine for running large language models with limited resources, such as a single commodity GPU. FlexGen uses a linear programming optimiser to search for the most efficient pattern in which to store and access tensors. By compressing weights and enabling larger batch sizes, FlexGen increases throughput; it was able to achieve high throughput running OPT-175B on a single 16GB GPU.
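FlexGen's real optimiser solves a linear program over where weights, activations, and the KV cache live (GPU, CPU, or disk). The hedged Python sketch below brute-forces a much-simplified version of that placement search, for weights only, to show the shape of the trade-off; all capacities and bandwidths are assumed numbers, not FlexGen's cost model:

```python
from itertools import product

MODEL_GB = 350                              # e.g., OPT-175B weights in FP16
GPU_GB, CPU_GB = 16, 200                    # assumed memory capacities
GPU_BW, CPU_BW, DISK_BW = 900.0, 30.0, 2.0  # assumed read bandwidths (GB/s)

best = None
for gpu_pct, cpu_pct in product(range(101), repeat=2):
    disk_pct = 100 - gpu_pct - cpu_pct
    if disk_pct < 0:
        continue
    gpu_gb = MODEL_GB * gpu_pct / 100
    cpu_gb = MODEL_GB * cpu_pct / 100
    if gpu_gb > GPU_GB or cpu_gb > CPU_GB:
        continue  # this split does not fit in memory
    # Approximate time per pass as the sum of streaming times per tier.
    time_s = (gpu_gb / GPU_BW + cpu_gb / CPU_BW
              + MODEL_GB * disk_pct / 100 / DISK_BW)
    if best is None or time_s < best[0]:
        best = (time_s, gpu_pct, cpu_pct, disk_pct)

print(best)  # fastest feasible (seconds, %GPU, %CPU, %disk) split
```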

DistilBERT, a 'distilled' version of the BERT pre-training technique, allows question-answering systems and other models to be trained on a single GPU. DistilBERT is a lighter, faster, and cheaper version of BERT: it retains over 95% of BERT's performance with 40% fewer parameters and runs 60% faster.
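For a sense of how accessible this makes question answering, here is a minimal usage sketch with the Hugging Face transformers library, assuming it is installed and the published distilbert-base-cased-distilled-squad checkpoint can be downloaded:

```python
from transformers import pipeline

# Load a DistilBERT checkpoint fine-tuned for extractive question answering.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

result = qa(
    question="How much faster is DistilBERT than BERT?",
    context="DistilBERT retains over 95% of BERT's performance "
            "with 40% fewer parameters and runs 60% faster.",
)
print(result["answer"])  # e.g. "60%"
```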

Advances in smaller models could also mean lower emissions, owing to the reduced number of parameters being trained. Meta AI released LLaMA, a family of foundation models ranging from 7B to 65B parameters. LLaMA-13B is said to outperform GPT-3 despite being more than ten times smaller.

