21 Mar 2024
The Cost of Inference: Running the Models Part 2
Resources
How was the energy usage calculated?
Unfortunately, we do not know much about OpenAI's inference infrastructure. Sam Altman offered a hint in a tweet, suggesting that a single prompt costs "probably single-digit cents," which gives us a worst-case figure of $0.09 per request.
This Stack Exchange answer delves into how this figure was arrived at and what it means in more tangible terms.
The cost of processing an AI request is not just a matter of computational power; it also involves significant energy consumption. Altman estimates that at least half of the cost of a single AI request can be attributed to energy usage. Assuming an energy price of $0.15 per kilowatt-hour (kWh), we can dissect the expenses further:
Cost per Request: $0.09
Proportion of Energy Cost: 50% of the total cost
Energy Price: $0.15 per kWh
Using these figures, the energy consumption per AI request can be calculated as follows:
Energy per request = (Cost per request × Energy share) / Energy price = ($0.09 × 0.5) / ($0.15/kWh) = 0.3 kWh/request
This translates to 300 watt-hours (Wh) per request.
To put this into a more relatable context, consider the energy required to charge a smartphone. An average smartphone charge takes about 5 Wh, so a single ChatGPT request uses as much energy as charging a smartphone 60 times!
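The arithmetic above can be sketched in a few lines of Python (the figures are the article's estimates, not measured values):

```python
# Back-of-the-envelope energy per request, using the article's estimates.
COST_PER_REQUEST = 0.09   # USD, Altman's "single-digit cents" worst case
ENERGY_SHARE = 0.5        # fraction of cost attributed to energy
PRICE_PER_KWH = 0.15      # USD per kWh

energy_kwh = COST_PER_REQUEST * ENERGY_SHARE / PRICE_PER_KWH
energy_wh = energy_kwh * 1000

PHONE_CHARGE_WH = 5       # rough energy for one smartphone charge
charges = energy_wh / PHONE_CHARGE_WH

print(f"{energy_kwh:.1f} kWh/request = {energy_wh:.0f} Wh = {charges:.0f} phone charges")
# → 0.3 kWh/request = 300 Wh = 60 phone charges
```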
Calculating the Carbon Footprint of GPT-4
The energy consumption of LLMs over a given time frame (say, one hour) can be calculated using the formula:
Energy (kWh) = Number of servers × TDP per server (kW) × PUE × Time (hours)
From leaks and tweets, we can get an estimate about the infrastructure running ChatGPT:
Number of Hardware Units: 28,936 Nvidia A100 GPUs
TDP: 6.5 kW/server
PUE: 1.2 (a measure of how efficiently a data center uses energy). This number is reported by Azure.
This formula gives us the total energy consumption for the ChatGPT infrastructure in one hour.
Next, we need to determine the total number of tokens generated by ChatGPT in one hour:
From leaks and industry estimates, we know the following:
DAU: 13 million
Average Daily Queries/User: 15
Average Tokens/Query: 2,000
With this data, we can finally calculate the energy needed to generate each token, and from that the carbon footprint:
Energy per Token: the hourly energy consumption divided by the number of tokens generated per hour
gCO2e/KWh: 240.6 gCO2e/KWh for Microsoft Azure US West
This puts the operational footprint of GPT-4 at roughly 0.3 gCO2e per 1,000 tokens.
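Putting the throughput and fleet-energy estimates together (again assuming 8 GPUs per server, which is not stated in the article; with the PUE included, the result lands a little above the article's ~0.3 gCO2e figure but in the same ballpark):

```python
# Carbon intensity per 1k tokens, combining fleet energy and token throughput.
# Assumption (not from the article): 8 A100 GPUs per server.
hourly_energy_kwh = (28_936 / 8) * 6.5 * 1.2     # fleet kWh per hour, incl. PUE

DAU = 13_000_000            # daily active users
QUERIES_PER_USER = 15       # average daily queries per user
TOKENS_PER_QUERY = 2_000    # average tokens per query
tokens_per_hour = DAU * QUERIES_PER_USER * TOKENS_PER_QUERY / 24

GRID_GCO2E_PER_KWH = 240.6  # Microsoft Azure US West carbon intensity

kwh_per_token = hourly_energy_kwh / tokens_per_hour
gco2e_per_1k_tokens = kwh_per_token * 1000 * GRID_GCO2E_PER_KWH

print(f"{gco2e_per_1k_tokens:.2f} gCO2e per 1k tokens")  # → roughly 0.4
```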
Estimation for DALL-E 3
The estimated carbon footprint for DALL-E 2 was 2.2 gCO2e per image. Assuming technological advancements and increased efficiency, but also the increased complexity of DALL-E 3, we can hypothesize that DALL-E 3 might have a carbon footprint of at least 4 gCO2e per image.
Typical ChatGPT Emissions
So a typical query with one thousand tokens and two generated images releases approximately 8.3 gCO2e, equivalent to charging one smartphone or driving about 30 meters in a gas-powered car.
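The total combines the text and image estimates from the sections above:

```python
# Emissions for a typical query: 1k text tokens plus two generated images.
TEXT_GCO2E_PER_1K_TOKENS = 0.3   # GPT-4 estimate from above
IMAGE_GCO2E = 4.0                # hypothesized DALL-E 3 figure from above

total = 1 * TEXT_GCO2E_PER_1K_TOKENS + 2 * IMAGE_GCO2E
print(f"{total:.1f} gCO2e per query")  # → 8.3 gCO2e per query
```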
Carbon ScaleDown
As foundation models like GPT-4 and DALL-E become more ubiquitous, it becomes increasingly clear that these powerful tools also bring significant environmental costs. We must weigh the incredible capabilities of LLMs in enhancing our digital experiences against the environmental impact they carry.