RESEARCH

Publications

Informing climate risk analysis using textual information - A research agenda (2024)
With Malte Schierholz, Bolei Ma, Jacob Beck, Andreas Dimmelmeier, Hendrik Christian Doll, Maurice Fehr, Frauke Kreuter, and Alex Fraser. 

Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024), available at the ACL 2024 Workshop ClimateNLP homepage.

This project is part of the larger research agenda GIST - Greenhouse Gas Insights and Sustainability Tracking, a research collaboration between Deutsche Bundesbank and LMU Munich to generate high-quality, granular firm-level emissions and sustainability data. More information can be found here

Abstract:
We present a research agenda focused on efficiently extracting, assuring quality, and consolidating textual company sustainability information to address urgent climate change decision-making needs. Starting from the goal to create integrated FAIR (Findable, Accessible, Interoperable, Reusable) climate-related data, we identify research needs pertaining to the technical aspects of information extraction as well as to the design of the integrated sustainability datasets that we seek to compile. Regarding extraction, we leverage technological advancements, particularly in large language models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines, to unlock the underutilized potential of unstructured textual information contained in corporate sustainability reports. In applying these techniques, we review key challenges, which include the retrieval and extraction of CO2 emission values from PDF documents, especially from unstructured tables and graphs therein, and the validation of automatically extracted data through comparisons with human-annotated values. We also review how existing use cases and practices in climate risk analytics relate to choices of what textual information should be extracted and how it could be linked to existing structured data.

Working Papers

Do Investors Use Sustainable Assets as Carbon Offsets? (2024)
With Jakob Famulok and Daniel Worring.

Available at SSRN.

Abstract:
We present novel evidence that retail investors attempt to offset their carbon footprints by investing sustainably. Analyzing 6,151 bank clients and conducting an experiment with 4,249 participants, we find higher footprints are linked to greener portfolios. In a randomized controlled trial, we show that the salience of investors’ carbon footprints compared to their peers causally shifts sustainable asset allocations, driven by participants with moderate environmental beliefs. We additionally identify a substitution effect between carbon offsetting through donations and sustainable assets. Our findings contribute to understanding behavioral drivers in sustainable investing, crucial for designing policies that align financial markets with environmental goals.

Extraction of CO2 emissions from corporate sustainability reports (2024)
With Malte Schierholz, Anna Steinberg, Jacob Beck, and Laia Domenech Burin.

Solicited contribution to the 65th ISI World Statistics Congress 2025, The Hague, NL. 

This project is part of the larger research agenda GIST - Greenhouse Gas Insights and Sustainability Tracking, a research collaboration between Deutsche Bundesbank and LMU Munich to generate high-quality, granular firm-level emissions and sustainability data. More information can be found here

Abstract:
Financial regulators and central banks are increasingly integrating sustainability aspects into their operations, but significant data gaps remain. The CSRD directive requires all large European enterprises to annually publish their greenhouse gas emissions (CO2-equivalents) in their management report, annual report, or sustainability report. The amount of information available, i.e., the value and unit for each scope, direct emissions (Scope 1), indirect energy-related emissions (Scope 2), and other indirect emissions (Scope 3), is immense, but the data are spread over thousands of PDF documents, published online on company websites, and historically often without abiding by official standards or guidelines. Until now, private companies have extracted carbon emissions and other indicators from these PDF documents and sold them in a structured, tabular data format to the Bundesbank and to other public authorities. However, although extracting values from PDF documents appears to pose little difficulty, the reliability between values extracted by different companies is rather low. Given this unsatisfactory situation, we leverage Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to build several fully automated data extraction pipelines, which are then compared with data bought from private providers and evaluated using a specially curated gold standard dataset of our own. Open-source software is shared with the community, enabling everyone to extract CO2-related indicators from company sustainability reports.

The President Reacts to News Channel of Government Communication (2023)
With Farshid Abdi, Loriana Pelizzon, Mila Getmansky Sherman, and Zorka Simon. 

SAFE Working Paper No. 314, available at SSRN.
Revise & Resubmit at Management Science. 

Abstract:
Studying about 1,200 economy-related tweets of President Trump, we establish the "President reacts to news" channel of stock returns. Using high-frequency identification of market movements and machine learning to classify the topics and textual sentiment of tweets, we address the observed heterogeneity in the aggregate stock market response to these messages. After controlling for market trends preceding tweets, we find that 80% of tweets are reactive and predictable rather than novel and informative. The exceptions are trade war tweets, where the President has direct policy authority, and his tweets can reveal investable private information or information about his policy function.

Do Gamblers Invest in Lottery Stocks? (2023)
With Tobin Hanspal and Andreas Hackethal. 

SAFE Working Paper No. 373, available at SSRN.

Abstract:
Previous studies document a relationship between gambling activity at the aggregate level and investments in securities with lottery-like features. We combine data on individual gambling consumption with portfolio holdings and trading records to examine whether gambling and trading act as substitutes or complements. We find that gamblers are more likely than the average investor to hold lottery stocks, but significantly less likely than active traders who do not gamble. Our results suggest that gambling behavior across domains is less relevant than other portfolio characteristics that predict investing in high-risk and high-skew securities, and that gambling on and off the stock market act as substitutes to satisfy the same need, e.g., sensation seeking.

Gray Literature

Houston, we have a problem: Can satellite information bridge the climate-related data gap? (2024)
With Andrés Alonso Robisco, José Manuel Carbó Martínez, and Elena Triebskorn.

Documento Ocasional No. 2428, available via Banco de España.

Abstract:
Central banks and international supervisors have identified the difficulty of obtaining climate information as one of the key obstacles to the development of green financial products and markets. To bridge this data gap, the use of satellite information from Earth Observation (EO) systems may be necessary. To better understand this process, we analyse the potential of applying satellite data to green finance. First, we summarise the policy debate from a central banking perspective. We then briefly describe the main challenges for economists in dealing with the EO data format and quantitative methodologies for measuring its economic materiality. Finally, using topic modelling, we perform a systematic literature review of recent academic studies to identify the research areas in which satellite data are currently being used in green finance. We find the following topics: physical risk materialisation (including both acute and chronic risk), deforestation, energy and emissions, agricultural risk, and land use and land cover. We conclude with a comprehensive analysis of the financial materiality of this alternative data source, a mapping of these application domains to new green financial instruments and markets under development, such as thematic bonds or carbon credits, and some key considerations for policy discussion.

The climate data iceberg – A depth of information to integrate (2024)
With Hendrik Christian Doll, Susanne Walter, and Gabriela Alves Werb.

Forthcoming in Bulletin on the 12th biennial IFC Conference on "Statistics and beyond: new data for decision making in central banks", slides available via this download link.

Abstract:
Central banks need climate-related data to align evidence-based climate change considerations with their core tasks. While structured data from administrative and proprietary sources are limited and contain considerable gaps, a wealth of climate-related information is dispersed and lies below the surface in unstructured form, such as sustainability reports or satellite images. To characterise this situation, we introduce the image of the climate data iceberg. Information from unstructured sources can bridge current data gaps and enhance the usability of existing data by improving its accuracy, extending its scope, and reducing data sharing barriers. In this paper, we discuss the challenges and opportunities central banks and supervisors face in leveraging this unstructured information for climate analysis and research. We further investigate how innovative efforts between central banks and other institutions can help generate actionable and usable climate-related data, exemplified by our own experiences and early-stage learnings from such collaborations.

Extracting Data Citations with Large Language Models (2024)
With Hendrik Christian Doll and Sebastian Seltmann.

Abstract:
Empirical researchers and research data centers (RDCs) face challenges in efficiently understanding and categorizing data sources and methodologies used in scholarly papers. This process currently relies on human readers and is time-consuming and prone to errors. To address this, we explore the potential of using Large Language Models (LLMs), specifically GPT-3.5, to automate the identification and categorization of research data sources. We analyze the accuracy of GPT-3.5 in detecting and summarizing data sources and methods in economics and finance papers. By employing web-scraping techniques, we collect a comprehensive sample of research papers and create human-labeled validation datasets. We evaluate the detection and prediction accuracy and address the issue of false answers provided by the model. Additionally, we assess the pre-processing requirements of GPT-3.5 for cost-effective implementation. Our paper also provides a guide for implementing our proposed solution at research institutions and RDCs worldwide, aiming to enhance data analysis and research data provision services.

Includes upcoming presentations. Unpublished papers available upon request.