Can Expert Prediction Markets Forecast Climate-Related Risks? in Science Advances

The Promise of Prediction Markets for Climate Forecasting

In The Critique of Pure Reason, Immanuel Kant wrote that “The usual touchstone, whether that which someone asserts is merely his persuasion—or at least his subjective conviction, that is, his firm belief—is betting.” The mathematical theory of probability was born out of gambling when, in the mid-seventeenth century, Blaise Pascal and Pierre de Fermat set out to explain the consistent losses of the Chevalier de Méré in a popular dice game. While recreational gambling is frowned upon by some, regulated in many places, and banned in others, examples involving tossed coins, dice, and lotteries are a staple of probability and statistics classes.

In recent decades, economists have taken a more than recreational interest in the ability of betting to elicit and aggregate information and have advocated the use of “information markets” or “prediction markets” as a way of synthesizing disparate sources of information and expertise using the mechanics of betting (Arrow et al. 2008; Abramowicz 2008). The challenge of combining multiple sources of information, different modeling approaches, and human knowledge is common to seasonal and climate forecasting.

While the use of prediction markets for climate forecasting has been suggested by some climate scientists, legal scholars, and economists, there are few examples of their use (Hsu 2011; Vandenbergh et al. 2013; Nay et al. 2016; Lucas and Mormann 2018; Aliakbari and McKitrick 2018; Roulston et al. 2022). This is partly because of regulatory obstacles, due to their similarity with recreational gambling, and because switching to a system that directly rewards forecasters for accuracy would represent a fundamental change for many funders of seasonal and climate forecasting.

In this article, we will introduce prediction markets and explain how they can solve some pressing problems faced by the users of climate forecasts. We will also describe a suite of climate-related prediction markets, with expert participants, that we have run with horizons up to one year ahead. These have demonstrated the ability of relevant experts to engage with prediction markets and have allowed us to make a preliminary evaluation of whether the collective probability forecasts the markets generate are probabilistically calibrated, or reliable, in the terminology of meteorology.

How Prediction Markets Work

Most prediction markets are based on conditional contracts which economists call “Arrow–Debreu” securities. These are simply contracts that pay out a fixed amount (often $1.00) if the event specified in the contract occurs. For example, a contract might pay out $1.00 if it snows in New York on Christmas Day. Participants trade these contracts with each other and, if we assume the prevailing price is equal to the expected payout, Prob(snow) × $1.00, then the price can be interpreted as the probability that it will snow according to the collective wisdom of “the market.” At the point of buying a contract, the arrangement is no different from fixed-odds betting: a price of $0.25 would correspond to odds of 3:1 against.
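
The arithmetic linking price, probability, and odds can be written out in a few lines. This is a minimal sketch that assumes, as the paragraph above does, that the prevailing price equals the expected payout; the function names are illustrative and not taken from any platform.

```python
def implied_probability(price: float, payout: float = 1.0) -> float:
    """Probability implied by a contract price, assuming price = Prob(event) x payout."""
    if not 0.0 < price < payout:
        raise ValueError("price must lie strictly between 0 and the payout")
    return price / payout


def odds_against(price: float, payout: float = 1.0) -> float:
    """Fixed-odds equivalent: units won per unit staked if the event occurs."""
    p = implied_probability(price, payout)
    return (1.0 - p) / p


# A $0.25 contract paying $1.00 implies a 25% probability and odds of 3:1 against.
snow_probability = implied_probability(0.25)
snow_odds = odds_against(0.25)
```

A buyer who pays $0.25 and collects $1.00 makes a profit of $0.75, three times the stake, which is what the odds of 3:1 against express.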

In a prediction market, however, participants have the option of selling and so do not have to hold the contract until expiration. Buying and selling by participants determines a prevailing price, and it is this that allows a prediction market to be a mechanism for dynamically aggregating information.

There are different mechanisms by which contracts are traded. Many prediction markets use a continuous double auction (CDA) in which participants can post the price at which they are prepared to buy a contract (a “bid”) or the price at which they are prepared to sell one (an “ask”). If the highest bid matches or exceeds the lowest ask, a trade occurs. CDAs work well if there are many participants with divergent views. If there are frequent trades, the price of the most recent one can be used as a proxy for the probability of the event occurring. CDAs work less well if there are fewer participants or if there is strong agreement about the fair value of the contract, in which case very few trades may occur. The market-based estimate of the probability can be assumed to lie between the highest bid and the lowest ask, but sometimes this “bid–ask spread” can be large, making estimation of the probability difficult.
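
The matching rule of a CDA can be illustrated with a toy order book. The sketch below is a simplification under stated assumptions: every order is for a single contract, and a crossing order trades at the price of the order already resting in the book (a common convention, but not one specified in the text). The class and method names are hypothetical.

```python
import heapq


class OrderBook:
    """Toy continuous double auction for one binary contract (single-unit orders)."""

    def __init__(self):
        self._bids = []  # max-heap of bid prices, stored negated
        self._asks = []  # min-heap of ask prices
        self.last_trade_price = None

    def post_bid(self, price: float) -> None:
        # A bid that matches or exceeds the lowest ask trades immediately.
        if self._asks and price >= self._asks[0]:
            self.last_trade_price = heapq.heappop(self._asks)
        else:
            heapq.heappush(self._bids, -price)

    def post_ask(self, price: float) -> None:
        # An ask at or below the highest bid trades immediately.
        if self._bids and price <= -self._bids[0]:
            self.last_trade_price = -heapq.heappop(self._bids)
        else:
            heapq.heappush(self._asks, price)

    def spread(self):
        """Return (highest bid, lowest ask), or None if either side is empty."""
        if not self._bids or not self._asks:
            return None
        return (-self._bids[0], self._asks[0])


book = OrderBook()
book.post_bid(0.20)
book.post_ask(0.30)
book.post_ask(0.35)
spread_before = book.spread()  # (0.20, 0.30): no trade yet, wide spread
book.post_bid(0.32)            # crosses the 0.30 ask, so a trade occurs
spread_after = book.spread()   # (0.20, 0.35)
```

Before the crossing bid arrives, all that can be said is that the implied probability lies somewhere between 0.20 and 0.30, which is the estimation difficulty described above.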

An alternative to a CDA is an automated market maker (AMM) that will always quote a price at which it will buy or sell contracts. Many financial markets have market makers who are participants prepared to both buy and sell, but these market makers seek to make a profit by buying at prices lower than they sell. In contrast, the AMM in a prediction market can be subsidized and designed to lose money in return for accurate information. This is a fundamentally different goal than that of traditional market makers or bookmakers in gambling. AMMs adjust their pricing based on how many contracts for different outcomes they have already sold and can be based on “proper scoring rules” for evaluating probability forecasts, such as the logarithmic or Brier scores (Hanson 2003). Under proper scores, which are a standard way of verifying probabilistic weather forecasts, participants maximize their expected reward by expressing their true beliefs about probabilities (Gneiting and Raftery 2007). Proper scoring rules are “incentive compatible” in the jargon of economics. By always being prepared to take the other side of a bet, AMMs allow prediction markets to function with a small number of participants, or if participants agree.
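
The incentive-compatibility property can be checked numerically. The sketch below assumes a participant whose true belief is that an event has probability 0.3 and searches over possible reported probabilities for the one that optimizes the expected logarithmic and Brier scores. The belief value and the grid of candidate reports are illustrative assumptions.

```python
import math


def expected_log_score(report: float, belief: float) -> float:
    """Expected logarithmic score (higher is better) of reporting `report`."""
    return belief * math.log(report) + (1.0 - belief) * math.log(1.0 - report)


def expected_brier_score(report: float, belief: float) -> float:
    """Expected Brier score (lower is better) of reporting `report`."""
    return belief * (1.0 - report) ** 2 + (1.0 - belief) * report ** 2


belief = 0.3
candidate_reports = [i / 100 for i in range(1, 100)]

best_log_report = max(candidate_reports, key=lambda r: expected_log_score(r, belief))
best_brier_report = min(candidate_reports, key=lambda r: expected_brier_score(r, belief))
```

Under both rules the best report coincides with the true belief of 0.3, so a participant gains nothing in expectation by shading the forecast in either direction.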

While subsidized AMMs can greatly improve the functioning of a prediction market, they require a subsidy. A sponsor must essentially provide a prize which will be shared among participants in proportion to the contribution they make to the accuracy of the collective forecast. It is not necessary for participants to understand the functioning of the AMM; they merely need to decide whether the prices it quotes are above or below what they consider “fair value.”
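
One well-known AMM built on the logarithmic score is the logarithmic market scoring rule (LMSR) introduced by Hanson (2003). The sketch below is offered as an illustration of how such a market maker quotes prices and bounds the sponsor's subsidy; the text does not state which AMM any particular platform uses, and the class name and the `liquidity` parameter value are assumptions.

```python
import math


class LMSRMarketMaker:
    """Logarithmic market scoring rule over mutually exclusive outcomes."""

    def __init__(self, n_outcomes: int, liquidity: float = 100.0):
        self.b = liquidity                # larger b means prices move more slowly
        self.q = [0.0] * n_outcomes       # contracts sold so far, per outcome

    def _cost(self, q) -> float:
        # C(q) = b * log(sum_i exp(q_i / b)), computed stably by factoring out max(q).
        m = max(q)
        return m + self.b * math.log(sum(math.exp((x - m) / self.b) for x in q))

    def prices(self):
        """Current quoted prices; they are positive and sum to one."""
        m = max(self.q)
        weights = [math.exp((x - m) / self.b) for x in self.q]
        total = sum(weights)
        return [w / total for w in weights]

    def buy(self, outcome: int, contracts: float) -> float:
        """Sell `contracts` of `outcome` to a participant and return the amount charged."""
        new_q = list(self.q)
        new_q[outcome] += contracts
        charge = self._cost(new_q) - self._cost(self.q)
        self.q = new_q
        return charge

    def max_subsidy(self) -> float:
        """Worst-case loss to the sponsor when starting from uniform prices."""
        return self.b * math.log(len(self.q))


amm = LMSRMarketMaker(n_outcomes=2, liquidity=100.0)
opening_prices = amm.prices()      # [0.5, 0.5]
charge = amm.buy(0, 50.0)          # a participant buys 50 contracts of outcome 0
updated_prices = amm.prices()      # outcome 0 is now priced above 0.5
```

In this formulation the sponsor's prize is capped at `b * log(n)` credits for `n` outcomes, so the liquidity parameter sets both the size of the subsidy and how far a given trade moves the price.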

Prediction Markets in Action

The first academic prediction market, now known as the Iowa Electronic Markets (IEM), was established by the University of Iowa in the 1980s. The IEM focuses on predicting the outcome of elections (Berg and Rietz 2014). Participants stake real money, although the number of participants and the size of their stakes are restricted in an agreement with the Commodity Futures Trading Commission (CFTC). Forecasts produced by the IEM have been shown to be on average more accurate than conventional opinion polling (Berg et al. 2008a). It is likely, however, that participants take polls into consideration when placing their trades, which underlines the important point that prediction markets are not really an alternative mechanism for making forecasts but a mechanism for aggregating many forecasts into a unified prediction. This aggregation can include tacit expertise concerning the relative merits of different information sources.

More recently, prediction markets with expert participants have been used to predict whether psychology experiments will replicate. These markets were found to be more accurate than surveying the same experts (Dreber et al. 2015). Their ability to allow a wider range of participants to contribute is an oft-cited advantage of prediction markets, but results such as those of Dreber et al. (2015) suggest that, even for a given group of experts, it is advantageous to use prediction markets for combining their views.

Prediction markets for climate are not a new idea. Mark Boslough created a pilot market in 2011 for the global mean temperature anomaly in the following year on the now-defunct commercial platform Intrade (Boslough 2011). This market consisted of a strip of contracts for whether the temperature anomaly would exceed a range of thresholds from +0.30° to +1.10°C at intervals of 0.05°C. By partitioning a continuous quantity, like the temperature anomaly, into discrete intervals, a prediction market using binary contracts can be used to generate an implied probability distribution for the variable. Unfortunately, Intrade used a CDA and the market suffered from the problem of low activity and very large bid–ask spreads discussed above. This problem was exacerbated by the number of different contracts available. These issues illustrate the advantages of using an AMM when markets are for more specialized topics that struggle to attract mass interest in the way that sports betting does.
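
The step from a strip of threshold contracts to an implied probability distribution is a matter of differencing adjacent prices. The sketch below assumes each price is the market's probability that the anomaly exceeds the corresponding threshold; the three thresholds and prices are invented for illustration and are not data from the Intrade market.

```python
def interval_probabilities(thresholds, exceed_prices):
    """Convert prices for 'anomaly exceeds threshold' into interval probabilities."""
    if len(thresholds) != len(exceed_prices):
        raise ValueError("need one price per threshold")
    pairs = list(zip(exceed_prices, exceed_prices[1:]))
    if any(later > earlier for earlier, later in pairs):
        raise ValueError("exceedance prices must not increase with the threshold")

    bins = {f"<= {thresholds[0]:.2f}": 1.0 - exceed_prices[0]}
    for lo, hi, (p_lo, p_hi) in zip(thresholds, thresholds[1:], pairs):
        bins[f"{lo:.2f} to {hi:.2f}"] = p_lo - p_hi
    bins[f"> {thresholds[-1]:.2f}"] = exceed_prices[-1]
    return bins


# Hypothetical prices for three adjacent thresholds (degrees C).
implied = interval_probabilities([0.50, 0.55, 0.60], [0.90, 0.60, 0.20])
```

In a thinly traded CDA the prices of adjacent contracts may not decrease monotonically, in which case no consistent distribution can be extracted; the check in the function flags that situation rather than returning negative probabilities.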

Solving Problems in Climate Forecasting with Prediction Markets

Prediction markets offer a solution to a couple of significant problems that affect climate forecasting: how to aggregate many sources of information and how to align the incentives of forecasters with those of forecast users. In addition, more sophisticated prediction markets can address the interdependence of forecasts of greenhouse gas (GHG) concentrations and forecasts of future climate, a problem that has been called “circularity.”

Climate forecasting is a multidisciplinary activity. Predicting future climate given the future concentration of GHGs involves atmospheric science, oceanography, and other physical sciences. Much of this expertise is codified in coupled atmosphere–ocean general circulation models (CGCMs), but there is also knowledge not included within the models, as demonstrated by differences in simulations from different CGCMs. Furthermore, unconditional forecasts of future climate require future GHG concentrations to be predicted as well, an endeavor that calls for insight into economics, public policy, the quality of institutions, and the likely path of technological innovation (Moore et al. 2022; Venmans and Carr 2024). As well as the diversity of disciplines, there is a diversity of modeling approaches: CGCMs are augmented with different ways to downscale predictions to produce more localized forecasts, including regional climate models, statistical methods, and more recently AI-based techniques. Combining disparate information to produce forecasts relevant to decision-makers is a task for which prediction markets are well suited.

Weather forecasters, whose predictions are for a few days ahead, can be rigorously evaluated with only a few months of forecasts and their subsequent verifications. Evaluating seasonal forecasts, with prediction horizons of a few months, with similar statistical robustness requires several years of forecast-verification pairs. Some practitioners attempt to get around this constraint by evaluating “reforecasts,” which are forecasts made retrospectively for verification times in the past (e.g., Hamill et al. 2006; Weisheimer and Palmer 2014; Risbey et al. 2022). While reforecasts should only use information that would have been available at the point when the forecast was made, in practice reforecasting can be susceptible to model overfitting and model selection biases, leading to exaggerated forecast skill. In finance, this type of retrospective forecasting is called “backtesting” and, because of selection biases, its results are treated with extreme caution (Bailey et al. 2014). In climate forecasting, the phrase “artificial skill” is used to describe the way that forecast skill estimates can be inflated when information that would not have been available when the forecast was made is indirectly incorporated into the forecast, such as when predictor variables are screened (DelSole and Shukla 2009) or via bias correction or the definition of climatology (Risbey et al. 2021). The most stringent forecast evaluations use truly out-of-sample predictions in which the forecast was issued before the verification time. This makes accumulating a sufficiently large dataset difficult for seasonal forecasts and usually impossible for climate forecasts with horizons of years to decades.

This is a major problem for users of these forecasts who do not know how good a forecaster is and have no robust way of evaluating them before making use of their forecasts. Economists call this “information asymmetry” and have explained how it can cause the breakdown of markets when buyers are not prepared to pay for quality they cannot verify, and sellers are unwilling to invest in quality they cannot demonstrate (Akerlof 1970). Under these conditions, it is more rational for forecast providers to focus on presentation and the user-friendliness of their portals than on the accuracy of their forecasts. A common solution to the problem of asymmetric information is for the compensation of sellers to be contingent on quality, for example, by providing warranties (Grossman 1981). Prediction markets do this by rewarding forecasters based on the accuracy of their predictions.

If a prediction market for the future global temperature anomaly implies only a moderate rise in temperatures over the next few decades, this could be because market participants believe that the sensitivity of climate to GHGs is low, or it could be because they expect effective action to reduce GHGs. If these two scenarios cannot be differentiated, the prediction is of limited use to policy makers trying to mitigate climate change (although it is still potentially useful for climate adaptation). This interdependence of policy and prediction is sometimes referred to as circularity in the analogous context of interest rate setting and inflation forecasting (Bernanke and Woodford 1997; Sumner and Jackson 2008). It can be addressed using more complicated prediction markets, which, for example, allow for joint predictions of future GHG concentrations and global temperature anomalies. Such a market generates a two-dimensional distribution of prices that can be interpreted as the joint probability distribution for GHG concentrations and temperature anomaly. Probability distributions for temperature, conditional on a particular GHG concentration, can then be extracted, allowing the low-sensitivity scenario to be distinguished from the low-GHG-concentration scenario.
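
Extracting a conditional distribution from joint prices amounts to renormalizing one row of the price grid. The sketch below uses a deliberately coarse, invented two-by-two grid (low or high GHG concentration against small or large temperature rise); the numbers are illustrative assumptions, not market data.

```python
def conditional_temperature_distribution(joint_prices, ghg_index):
    """P(temperature bin | GHG bin) from a grid of joint prices.

    joint_prices[i][j] is the price of the outcome 'GHG bin i and temperature bin j'.
    """
    row = joint_prices[ghg_index]
    marginal = sum(row)  # implied probability of this GHG bin
    if marginal <= 0.0:
        raise ValueError("GHG bin has zero implied probability")
    return [price / marginal for price in row]


# Rows: low GHG, high GHG. Columns: small rise, large rise.
joint = [
    [0.30, 0.10],
    [0.15, 0.45],
]

# Marginal temperature forecast, which mixes the two GHG scenarios.
marginal_temperature = [sum(column) for column in zip(*joint)]

given_low_ghg = conditional_temperature_distribution(joint, 0)
given_high_ghg = conditional_temperature_distribution(joint, 1)
```

Here the marginal forecast gives the small rise a probability of about 0.45, which on its own says nothing about sensitivity. The conditionals show that a small rise is likely (about 0.75) only if concentrations are low, which is the distinction between the two scenarios that a one-dimensional market cannot make.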

The distinguishing feature of prediction markets is that their primary and often only purpose is “information discovery.” Traditional financial markets for stocks and futures also perform information discovery, but they do this as a side effect of their primary purpose, which is the transfer of assets or risks. For example, there are futures contracts called weather derivatives whose payout depends on weather-related variables, such as the average monthly temperature in a specified city (Zeng 2000; Dutton 2002; Jewson and Brix 2005). These contracts were invented to allow firms, such as power companies, to hedge their weather-related risks. The prices of these derivatives provide indirect predictions of the weather variables to some extent, but they focus on specific cities and are only actively traded up to a year or so ahead, so they provide little information about long-range climate change. Researchers have also studied real estate prices and found that homes more exposed to sea level rise can sell at discounts compared to similar but less exposed properties (Bernstein et al. 2019), although these discounts may still be underpricing the risk (Gourevitch et al. 2023). Many of the most significant climate risks will ultimately fall to governments who may be unable or unwilling to transfer them but who could still benefit from market-based predictions of the risks. Prediction markets can decouple the ability of markets to aggregate information from their role in the transfer of risk.

Putting Prediction Markets to the Test

Since 2018, we have run two dozen individual prediction markets for climate-related risks covering four topics: U.K. monthly temperatures and rainfall, the Niño-3.4 sea surface temperature anomaly, Atlantic hurricane activity, and U.K. wheat yield. The primary goal of these markets was to test and refine the design of prediction markets for climate-related applications and to familiarize relevant experts with the concept. However, this collection of proof-of-concept markets also constitutes an “experiment-of-opportunity” that allows us to perform a preliminary evaluation of the collective forecasts produced by prediction markets.

None of the markets were “pay-to-play.” Instead, teams and individuals with relevant expertise were invited to participate and endowed with credits with which to trade contracts. After the actual outcomes were known, and the markets were settled, the credits that participants had accumulated were converted into cash rewards provided by the market sponsor. This arrangement avoided falling foul of laws regulating online betting. The exact mechanism used for converting on-platform credits to cash varied and was influenced by the sponsor and a desire to test different incentive schemes.

The prediction horizons of the demonstration markets ranged from a couple of months to 1 year ahead. Even the longest horizons were significantly shorter than those relevant to long-range climate prediction, but, because the markets were proofs-of-concept, the horizons were a compromise between using time scales relevant to climate and being able to collect enough forecast-verification pairs to allow statistically meaningful analyses.

All the markets were hosted on versions of the AGORA prediction market platform developed by Winton Group and Hivemind Technologies Ltd. (Roulston et al. 2016). On this platform, each market has a set of outcomes defined so that one outcome, and only one, will occur. Once the actual outcome is known, any contract including this outcome converts to 1.00 credit while all other contracts become worthless. If a participant believes an outcome is undervalued, that is, the price of the outcome is less than the probability of it occurring, then they can buy the outcome and the AMM responds by raising its price and lowering the price of other outcomes so that prices across all outcomes always sum to one. If a participant believes an outcome is overvalued, then, because the outcomes cover every eventuality and their prices always sum to one, there must be other outcomes they believe are undervalued which they can buy.

The 24 markets fell into four groups:

  1. U.K. Temperature and Rainfall Markets: In 2018, six markets were run by Winton to simultaneously predict the monthly average of the maximum daily temperature for the United Kingdom and the total monthly rainfall—statistics published by the Met Office—for April–September. The outcome space of each market was a two-dimensional grid with temperature partitioned into intervals of 0.2°C, ranging from 0° to 25°C, and rainfall partitioned into intervals of 5 mm, from 0 to 200 mm. Open intervals covered temperatures below 0°C and above 25°C and rainfall above 200 mm. With 127 temperature intervals and 41 rainfall intervals, there were 5207 joint outcomes. The purpose of these markets was to test the viability of joint-outcome markets with two-dimensional outcome spaces and a very large number of distinct outcomes. A market with this structure could be used to make joint predictions of