14 Feb 2024

The Cost of Inference: Running the Models


In our previous blog post, we discussed the various factors that influence the carbon footprint of LLMs, and we calculated the carbon footprint associated with training several commonly used LLMs and foundation models. In this blog post, we will delve further into the environmental impact and carbon footprint of model inference.

The Cost of Inference

Training LLMs like GPT-4 is an intensive process, demanding substantial computational resources. Typically, these models are trained on clusters of GPUs or specialized hardware over months. The power consumption during this phase is significant due to the complexity of tasks like processing vast datasets and continuously adjusting the model's parameters.

Inference, on the other hand, is generally less power-intensive than training. Once a model is trained, running it for inference doesn't require the same level of continuous, heavy computation. Depending on the model's complexity and the task at hand, inference can even be performed on less powerful machines, including CPUs (though this is usually not the case for LLMs).

The training phase is responsible for the bulk of the carbon footprint associated with LLMs. The extensive use of thousands of high-powered GPUs and the duration of training contribute significantly to greenhouse gas emissions.
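A common way to estimate this training footprint is to multiply the number of accelerators by their average power draw, the training duration, and the datacenter's power usage effectiveness (PUE). The sketch below illustrates the arithmetic; every figure in it is an illustrative assumption, not a measurement of any specific model.

```python
# Back-of-envelope estimate of training energy:
# GPU count x average power per GPU x training hours x datacenter PUE.
# All values below are illustrative assumptions.

NUM_GPUS = 1000            # assumed cluster size
GPU_POWER_KW = 0.4         # assumed average draw per GPU (400 W)
TRAINING_HOURS = 24 * 90   # assumed ~3 months of continuous training
PUE = 1.2                  # assumed datacenter power usage effectiveness

energy_kwh = NUM_GPUS * GPU_POWER_KW * TRAINING_HOURS * PUE
print(f"Estimated training energy: {energy_kwh:,.0f} kWh")
```

Multiplying the result by a grid carbon intensity (kg CO2e per kWh) converts the energy estimate into an emissions estimate.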

The carbon footprint of the inference phase is markedly lower compared to training. Since inference requires fewer computational resources, the associated emissions are correspondingly reduced. However, it's crucial to consider the frequency of inference operations. In applications where LLMs are queried incessantly, the cumulative carbon footprint can become substantial.

The real-world environmental cost of using LLMs hinges on the scale and frequency of their application. Services that continuously rely on these models for real-time responses, like chatbots or content generation tools, can accumulate significant energy usage over time.
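To see how quickly per-query emissions add up at scale, consider a hypothetical chatbot service. The sketch below accumulates inference emissions over time; the per-query energy, query volume, and grid carbon intensity are all assumed placeholder values.

```python
# Back-of-envelope sketch: cumulative inference emissions for a
# hypothetical chatbot service. All figures are illustrative
# assumptions, not measured values.

ENERGY_PER_QUERY_KWH = 0.003   # assumed energy per LLM query (~3 Wh)
QUERIES_PER_DAY = 10_000_000   # assumed daily query volume
CARBON_INTENSITY = 0.4         # assumed grid intensity, kg CO2e per kWh

def cumulative_emissions_tonnes(days: int) -> float:
    """Total inference emissions (tonnes CO2e) after `days` of operation."""
    kwh = ENERGY_PER_QUERY_KWH * QUERIES_PER_DAY * days
    return kwh * CARBON_INTENSITY / 1000  # kg -> tonnes

for days in (1, 30, 365):
    print(f"{days:>4} days: {cumulative_emissions_tonnes(days):,.0f} t CO2e")
```

Even under these modest assumptions, a heavily used service accumulates thousands of tonnes of CO2e per year from inference alone.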

Inference at Meta

Meta has been notably transparent about the environmental impact of its AI operations. In a research paper, the company disclosed that power within its AI infrastructure is allocated in a 10:20:70 ratio across three key phases: Experimentation, Training, and Inference, with Inference consuming the lion's share.

This distribution reflects a crucial aspect of AI usage: while Experimentation and Training are intensive, they are finite phases, whereas Inference is a long-running process. As such, the carbon emissions from Inference accumulate over time, potentially surpassing the total emissions from the initial training of the model.
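This crossover point can be estimated with simple division: a one-time training footprint divided by a daily inference footprint gives the number of serving days until inference emissions overtake training. Both figures in the sketch below are assumed for illustration.

```python
# Illustrative sketch: estimate when cumulative inference emissions
# overtake the one-time training emissions. Both inputs are assumptions.
import math

TRAINING_EMISSIONS_T = 500.0    # assumed one-time training footprint, t CO2e
INFERENCE_T_PER_DAY = 5.0       # assumed daily inference footprint, t CO2e

def days_to_surpass_training(training_t: float, daily_inference_t: float) -> int:
    """Days of serving until cumulative inference emissions exceed training."""
    return math.ceil(training_t / daily_inference_t)

print(days_to_surpass_training(TRAINING_EMISSIONS_T, INFERENCE_T_PER_DAY))
```

With these placeholder numbers, inference overtakes training after roughly 100 days of operation; a busier service crosses that threshold sooner.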


The diagram from the paper showcases the operational carbon footprint of various large-scale machine learning tasks. The black bars represent the carbon footprint during the offline training phase. This phase has a substantial carbon impact, indicating the significant energy required to process and learn from massive datasets.

The orange bars, although fewer, indicate that the models undergoing online training also contribute notably to carbon emissions. Online training allows models to update and refine their learning continuously, which, while beneficial for performance, adds to the carbon footprint.

The patterned bars illustrate the carbon footprint during the inference phase. For many models, this footprint is smaller per unit of time than that of the training phases. However, because inference is ongoing, these emissions accumulate and in many cases eclipse the one-time training emissions, especially for heavily used models.

Real-World Implications: LLMs in Daily Use

As we have established, the environmental cost of LLMs depends on the scale of their application. For instance, energy consumption becomes a crucial factor with the increasing use of models like ChatGPT as search engines. A single ChatGPT query is estimated to consume around 0.003 kWh (about 3 Wh), compared to roughly 0.0003 kWh for a standard Google search. This puts the energy cost of an LLM query at roughly ten times that of a simple Google search, highlighting the significant environmental impact of frequent LLM usage.

Optimize AI. Reduce Costs. Minimize Environmental Impact. Carbon ScaleDown empowers businesses and individuals to make every AI interaction more efficient, cost-effective, and sustainable

All rights reserved Carbon ScaleDown| Designed and built with love - Copyright© 2024
