Finding Climate Targets with LLMs - Part 1: The Basics
An intro to corporate climate targets and LLM/RAG in long corporate disclosures.
Introduction #
This is the first of a series of posts in which I set out to explore the problem of obtaining machine-readable climate emission targets from unstructured corporate disclosures using LLMs.
To clarify what all of that means, I will start by giving some background on corporate climate change mitigation and the challenges of finding data, along with an introduction to how LLMs can be used. In subsequent posts, I will run experiments to try various approaches and assess them quantitatively.
What are climate targets? #
Understanding corporate climate emissions targets requires awareness of how companies manage climate risk and the regulatory frameworks they have to comply with.
As climate change increasingly appeared to pose material business risks, companies began enacting climate-related policies. Investors also began paying closer attention, especially to long-term exposure to climate risks across their portfolios. This led to the formation of the Task Force on Climate-related Financial Disclosures (TCFD), which started standardising the kind of climate-related information that financial markets should know about each company.
Prior to this, the Greenhouse Gas Protocol had already developed a standard for measuring and reporting corporate emissions, divided into directly and indirectly created emission categories. The idea behind it assumed that you cannot meaningfully change what you cannot measure. Companies were given a framework to measure and account for their contribution to greenhouse gases (GHGs), covering a diverse range of sources, including production, travel, buildings, purchased materials, logistics, and more.
Companies investing this time and effort would then often try to develop a strategy to “draw down” their emissions. A strategy becomes actionable when the company develops it into a climate transition plan. The plan should set out decarbonisation goals, explain how the company plans to manage climate-related risks, and define measurable targets for tracking progress.
Climate-related targets can cover any activity that materially affects climate change risks. Arguably, the most important targets reflect an anticipated decrease, by a given date, in either absolute emissions or emissions intensity, that is, the amount of emissions per unit of business activity (e.g. the emissions produced per macguffin). We refer to these as corporate climate emission targets. They tell an outside observer what a company aims to achieve in terms of their emissions and by when. The company should then report annually on their progress towards them.
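To make the difference between absolute and intensity targets concrete, here is a minimal sketch in Python. All figures are made up for illustration, and the function names `emissions_intensity` and `target_level` are my own, not part of any reporting standard.

```python
def emissions_intensity(total_emissions_tco2e: float, units_of_activity: float) -> float:
    """Emissions per unit of business activity (e.g. tCO2e per macguffin)."""
    if units_of_activity <= 0:
        raise ValueError("units_of_activity must be positive")
    return total_emissions_tco2e / units_of_activity


def target_level(baseline_value: float, reduction_pct: float) -> float:
    """Value a metric must fall to in order to meet a percentage reduction."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return baseline_value * (1 - reduction_pct / 100)


# Illustrative, made-up base-year figures.
baseline_emissions = 100_000.0  # tCO2e
baseline_output = 50_000.0      # macguffins produced

# "Cut absolute emissions by 50%": a fixed cap, whatever the output.
absolute_target = target_level(baseline_emissions, 50)

# "Cut emissions intensity by 50%": a cap per macguffin.
baseline_intensity = emissions_intensity(baseline_emissions, baseline_output)
intensity_target = target_level(baseline_intensity, 50)

# Emissions the intensity target would still allow if output doubled.
allowed_if_output_doubles = intensity_target * (2 * baseline_output)
```

With these figures, the absolute target caps emissions at 50,000 tCO2e regardless of output, whereas the intensity target of 1.0 tCO2e per macguffin would still allow 100,000 tCO2e if output doubled. This is why it matters which kind of target a disclosure actually states.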
Setting climate targets should only come after all the work of measuring emissions and developing a transition plan. Nonetheless, nothing stops a company from stating “we plan to cut emissions by 50%”; in fact, many do make such statements, until they realise how much work they have inadvertently committed themselves to, or they educate themselves a little more and realise they need to measure their emissions and come up with a strategic plan first.
In recent years, a growing body of sustainability regulation has motivated many companies to commit to “climate action.” More recently, however, political pressure amid a poorer economic outlook has led to the weakening of many of these regulations.
Where can we find climate targets? #
Companies ought to report the targets they set. The IFRS Foundation has taken over from the TCFD the monitoring of companies’ progress on climate-related disclosures. This suggests that climate-related disclosures are now treated much like financial reporting.
Despite this, regulators have yet to introduce strong compliance requirements, and no universal standards for formulating or communicating climate targets exist. Companies can conceptualise and communicate them any way they want.
Companies often communicate climate targets in writing within investor-focused disclosures. Authors often compress a target into one or two sentences inside a 300-page report, which can feel like searching for a needle in a haystack. Authors also present targets in charts or tables. As a result, disclosures often contain targets as unstructured information in multiple forms.
To reiterate, companies can communicate climate targets in their main annual reports, in dedicated sustainability reports, or in standalone climate reports, all aimed at investors. Companies decide what to disclose and how to present it, within their compliance requirements. These reports often span hundreds of pages and use dense, complex formatting that can challenge even human readers.
Structured formats specifically for targets do exist, however. The Science Based Targets initiative (SBTi) was created as a voluntary initiative to provide standards, tools, and guidance for setting credible, science-based targets, along with a validation service. Once it has validated a company’s targets, SBTi publishes them in a structured database, available as Excel spreadsheet exports on its website.
SBTi sets high standards for their targets, which typically suit large organisations in a limited set of industries. However, more than 9000 companies now have validated targets and the number increases each year.
One of the founding members of SBTi, the Carbon Disclosure Project (CDP), also provides structured targets information. CDP aims to make climate and other related sustainability disclosures more structured and standardised in general, and many consider it the “gold standard” of corporate climate-related reporting data. Reporting to CDP takes a considerable amount of effort and skill and so only relatively few companies report to them. CDP disclosures follow a well-designed and constantly revised structured format.
CDP targets in particular follow a very well defined schema. However, accessing and using CDP data requires an expensive data license. Anyone serious about analysing corporate climate data should get access to CDP data but bear in mind the expense and the narrow coverage.
If you want to find a company’s climate targets, start by establishing whether you have access to CDP. If you do, and the company has made a CDP disclosure for the latest reporting year, use that as a reliable snapshot of their climate target status. Otherwise, check SBTi. However, the company may have validated only some of their targets with SBTi! You will then have to go to their annual investor reporting, their website, or any other primary disclosure touchpoint to find a mention of their targets.
ESG-focused investors will seek out targets to help understand the climate risk of a company. Procurement or sustainability teams of a company’s value chain partners also make use of targets, since companies in a value chain all have interlinked climate risk dependencies. In other words, to manage its own climate risk position, a company should know the climate performance and forward-looking position of its suppliers. In some cases, where judged material, it should know the position of its customers too. Climate targets become an important leading indicator in the forecasting of climate risk.
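The lookup order described above can be sketched as a small decision function. This is only an illustration of the reasoning: the `SourceAvailability` fields and the `sources_to_check` function are hypothetical names, and no real CDP or SBTi API is involved.

```python
from dataclasses import dataclass


@dataclass
class SourceAvailability:
    """What we know about where a company's targets might be found."""
    has_cdp_license: bool      # do we have access to CDP data at all?
    in_cdp_latest_year: bool   # did the company disclose to CDP for the latest year?
    in_sbti: bool              # does the company appear in the SBTi export?


def sources_to_check(availability: SourceAvailability) -> list[str]:
    """Return the sources to consult, in order of preference."""
    if availability.has_cdp_license and availability.in_cdp_latest_year:
        # A current CDP disclosure is treated as a reliable snapshot.
        return ["CDP"]
    sources = []
    if availability.in_sbti:
        sources.append("SBTi")
    # SBTi may hold only some of the targets, so primary disclosures
    # (annual reports, website) are always checked as a fallback.
    sources.append("company reports and website")
    return sources


no_cdp = SourceAvailability(has_cdp_license=False, in_cdp_latest_year=False, in_sbti=True)
checklist = sources_to_check(no_cdp)
```

The last branch is the one this series is concerned with: whenever structured sources are missing or incomplete, someone has to read the primary disclosures.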
Using LLMs to find climate targets #
As LLMs matured, they improved at turning unstructured text into structured, machine-readable data at useful quality levels. Previously, edge cases would routinely defeat automation, requiring substantial manual work by humans, making scalability impossible in many cases. LLMs, on the other hand, can trawl through piles of unstructured text looking for items of interest, and then convert them into useful structured forms.
In practice, we would use something like Retrieval Augmented Generation (RAG) for such a task. RAG combines LLMs with external knowledge sources so the model can access and use those sources at query time. Data from these sources is first loaded and indexed. When a user submits a query, the system searches the index to filter and rank the most relevant pieces of information. That retrieved context, together with the user’s query and a system prompt, is then passed to the LLM. This allows the model to ground its answers in the proprietary data, reducing hallucinations and improving accuracy and verifiability. Diagram 1 shows how such a RAG system looks at a schematic level.
flowchart TD
%% Retrieval-Augmented Generation
subgraph I[Indexing]
PDFS@{ shape: docs, label: "PDFs"}
LI[Extract Text & Load]
PDFS --> LI
end
subgraph Q[Querying]
U@{ shape: manual-input, label: "User Query"}
R[Retrieve]
LLM[Send to LLM]
U --> R
U --> LLM
end
IND[(Index)]
A[Get Answer]
LLM --> A
LI --> IND
IND --> R --> LLM
Diagram 1: Overview of a basic RAG system.
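The flow in Diagram 1 can be sketched in a few lines of Python using only the standard library. This is a toy under stated assumptions, not a production design: retrieval here uses cosine similarity over raw term counts as a stand-in for the embedding-based search a real system would likely use, the stopword list and sample chunks are invented, the function names are my own, and the final call to an LLM is left out entirely.

```python
import math
import re
from collections import Counter

# Tiny, hand-picked stopword list (an assumption for this toy example).
STOPWORDS = {"the", "a", "to", "by", "and", "of", "in", "on", "from",
             "what", "has", "is", "our", "we"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Indexing step: store each text chunk alongside its term counts."""
    return [(chunk, Counter(tokenize(chunk))) for chunk in chunks]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Retrieval step: rank chunks by similarity to the query, keep the top k."""
    query_terms = Counter(tokenize(query))
    ranked = sorted(index, key=lambda item: cosine(query_terms, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(system_prompt: str, context: list[str], query: str) -> str:
    """Combine system prompt, retrieved context, and user query for the LLM."""
    joined = "\n---\n".join(context)
    return f"{system_prompt}\n\nContext:\n{joined}\n\nQuestion: {query}"


# Invented chunks standing in for text extracted from a report PDF.
chunks = [
    "Revenue grew by 4% year on year, driven by strong demand in Europe.",
    "Our target is to reduce absolute Scope 1 and 2 emissions by 50% by 2030 from a 2019 base year.",
    "The board met six times during the reporting period.",
]
index = build_index(chunks)
query = "What emissions reduction targets has the company set?"
top_chunks = retrieve(index, query, k=1)
prompt = build_prompt("Answer using only the context provided.", top_chunks, query)
```

The resulting `prompt` string is what would be sent to the model. Note how brittle term matching is: the target sentence is retrieved here only because it shares the word "emissions" with the query, which is one reason real systems tend to rely on semantic embeddings instead.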
Nowadays, RAG systems have advanced considerably, reaching the levels of sophistication seen in this article here.
RAG systems like these could operate on a set of documents, find unstructured information of interest within, and then turn it into structured, machine-readable data. For example, we could operate on a big set of corporate documents in order to search for and extract climate targets for downstream analysis.
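As a rough idea of what "structured, machine-readable" could mean here, the sketch below defines a small target record and parses a JSON response into it. The field names are hypothetical and do not follow the CDP or SBTi schemas, and `raw_response` is a hand-written string standing in for whatever an LLM might return.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClimateTarget:
    """Hypothetical minimal schema for one extracted emissions target."""
    target_type: str                 # "absolute" or "intensity"
    reduction_pct: float             # e.g. 50.0 for "cut by 50%"
    target_year: int                 # the "by when"
    base_year: Optional[int] = None  # year the reduction is measured against
    scope: Optional[str] = None      # e.g. "Scope 1 and 2"


def parse_targets(raw_json: str) -> list[ClimateTarget]:
    """Turn a JSON array of candidate targets into validated records.

    Items that are malformed or missing required fields are skipped.
    """
    data = json.loads(raw_json)
    if not isinstance(data, list):
        return []
    targets = []
    for item in data:
        if not isinstance(item, dict):
            continue
        if item.get("target_type") not in {"absolute", "intensity"}:
            continue
        try:
            base_year = item.get("base_year")
            targets.append(ClimateTarget(
                target_type=item["target_type"],
                reduction_pct=float(item["reduction_pct"]),
                target_year=int(item["target_year"]),
                base_year=int(base_year) if base_year is not None else None,
                scope=item.get("scope"),
            ))
        except (KeyError, TypeError, ValueError):
            continue  # skip items whose values cannot be interpreted
    return targets


# Hand-written stand-in for an LLM response; the second item is unusable.
raw_response = """[
  {"target_type": "absolute", "reduction_pct": 50, "target_year": 2030,
   "base_year": 2019, "scope": "Scope 1 and 2"},
  {"target_type": "intensity", "reduction_pct": "not stated", "target_year": 2035}
]"""
targets = parse_targets(raw_response)
```

Validating at this boundary means that anything downstream only ever sees well-formed records, while the skipped items can be counted as one signal of extraction quality.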
Next steps #
Subsequent work will involve harvesting data and beginning to build a baseline RAG system, along with a way to evaluate its performance quantitatively.
After building a baseline, I would like to explore some of the ideas in this article, and experimentally assess them to see what improvements they can bring.