
What we found last time

In our v1 analysis we looked at what it costs to split a single question into multiple binary contracts on Polymarket. The dataset was the full census of multi-market events: 18,863 events containing 172,869 binary contracts across 6 categories. Two pathologies came out cleanly. First, volume concentrates in the top 5 markets per event regardless of how many markets the event contains. Second, the ghost market rate (the share of markets with under $1K lifetime volume) scales with N, the number of markets in the event: 36% of markets were dead at N=11–20 and 72% at N=51+.

That cut was multi-market vs single-market — what does splitting cost compared to running the question as one market? But v1 treated all multi-market events the same. "Who wins the NBA championship?" and "What price will BTC close at this month?" sat in the same dataset. One is a list of named teams — a genuinely unordered set. The other is a continuous quantity bucketed into arbitrary price ranges. The concentration findings held across both, but the argument that a continuous-distribution primitive could fix these problems only applies to the second type.


v2 looks at continuous vs the rest

The question for v2 was simple — when we separate events that discretise a continuous quantity (price ranges, margin-of-victory percentages, temperature brackets) from events that list named entities (election candidates, sports teams, award nominees), do the pathology findings survive?

The answer turned out to be more interesting than "yes" or "no". Some findings held across both types. One finding — the ghost market scaling — turned out to be largely a categorical phenomenon, obvious in hindsight but enlightening nevertheless.


How we classified events

We built a heuristic classifier that labels each of the 18,863 multi-market events based on the structure of its sibling market questions. The method extracts the varying fragment across siblings (strip the common prefix and suffix from each question title), then tags each fragment as numeric or categorical using pattern-matching signals.
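The fragment-extraction step can be sketched roughly as follows. This is a minimal illustration, not the classifier itself: the function names, the word-level tokenisation, the "contains a digit" signal, and the three-way event rule are all assumptions made for the example, and the real keyword lists and decision rules live in the sprint notes.

```python
import re

# Assumed signal: a fragment is "numeric" if it contains a digit.
# The actual classifier uses richer pattern-matching signals.
HAS_DIGIT = re.compile(r"\d")

def _shared_len(token_lists):
    """Number of leading positions where every list holds the same token."""
    n = 0
    for column in zip(*token_lists):
        if len(set(column)) > 1:
            break
        n += 1
    return n

def varying_fragments(titles):
    """Strip the word-level prefix and suffix shared by all sibling titles."""
    tokens = [t.split() for t in titles]
    shortest = min(len(t) for t in tokens)
    prefix = _shared_len(tokens)
    suffix = _shared_len([t[::-1] for t in tokens])
    suffix = min(suffix, shortest - prefix)  # prefix and suffix must not overlap
    return [" ".join(t[prefix:len(t) - suffix]) for t in tokens]

def tag_fragment(fragment):
    """Tag one varying fragment as numeric or categorical."""
    return "numeric" if HAS_DIGIT.search(fragment) else "categorical"

def label_event(titles):
    """Hypothetical event rule: all numeric -> continuous, none -> categorical."""
    tags = {tag_fragment(f) for f in varying_fragments(titles)}
    if tags == {"numeric"}:
        return "continuous"
    if tags == {"categorical"}:
        return "categorical"
    return "mixed"
```

Working on whole words rather than characters keeps a shared trailing "0k" from being stripped out of "$90k" and "$100k". A digit-only signal would still mislabel names that contain numbers, which is one reason the real rules are more involved.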

Event classification of the 18,863 multi-market events
Label | Events | Volume | Share of volume
Categorical | 10,179 | $25.8B | 72.0%
Continuous | 7,751 | $6.7B | 18.6%
Discrete numeric | 42 | $2.8B | 7.8%
Mixed | 838 | $0.5B | 1.4%
Ambiguous | 53 | $0.05B | 0.1%

Categorical dominates by volume because of a handful of very large events — the 2024 Presidential Election ($3.7B), NBA Champion ($1.7B), Super Bowl ($1.2B). The continuous slice is $6.7B across 7,751 events, driven by crypto price thresholds, weather brackets, political margin-of-victory questions, and date timing events.

The 42 discrete numeric events are all central bank rate decisions ($2.8B). We excluded these from the continuous headline because the Fed only chooses between a small fixed set of basis-point moves. A continuous-distribution primitive does not add resolution where the outcome set is not continuous in the first place. We explored extending this exclusion to other candidates (tariff rates, turnout, vote counts, medal counts) and found that none of them are genuinely policy-bounded the way Fed events are. The full decision rules, keyword lists, and QA spot checks are documented in the sprint notes.

The classifier is an experimental model. The point is transparency — the reader can check our working and argue with any specific decision.

Top continuous events headline-slice sanity check — table of top 10 continuous events with rank, name, N markets, dollar volume, example varying span, and classifier verdict

Figure 1. Top continuous events with the varying fragment highlighted and the assigned label, to show how the heuristic splits continuous from categorical.


How much of Polymarket is continuous, and how is it changing?

Two charts side by side: left shows volume per quarter by category from 2022Q3 to 2026Q1; right shows continuous share of volume by category. Politics spikes during election cycles, crypto and weather are nearly 100% continuous.

Figure 2. Quarterly volume by category (left) and continuous share of volume by category (right), 2022Q3 – 2026Q1.

By event count, continuous has been growing fast. In 2026Q1 continuous events overtook categorical for the first time — 3,012 new continuous events vs 1,192 categorical. By volume, categorical still dominates historically because of election-cycle spikes ($5.3B in 2025Q3 alone). But continuous volume has been growing steadily — from $452M in 2025Q1 to $1.7B in 2025Q4 — driven by crypto and weather events that don't spike with political cycles.

Volume by category, continuous vs categorical (lifetime, $M)
Category | Continuous vol | Categorical vol | Total vol | % continuous
Politics | $3,788M | $12,608M | $19,511M | 19%
Sports | $87M | $11,814M | $11,962M | <1%
Crypto | $1,872M | $77M | $1,963M | 95%
Culture | $463M | $1,097M | $1,569M | 29%
Finance | $198M | $176M | $534M | 37%
Weather | $264M | $0.4M | $267M | 99%

Crypto is 95% continuous. Weather is 99%. These are almost entirely events that discretise a continuous quantity into binary buckets — BTC price brackets, temperature ranges, precipitation thresholds. Sports is the opposite — 99% categorical (team winners). Politics is the battleground: $3.8B continuous (margins, timing, seat counts) alongside $12.6B categorical (election winners).
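For readers checking the table, the sketch below recomputes the continuous share from the published figures. It assumes that "% continuous" is continuous volume divided by total volume, and that the gap between the total and the two label columns is volume under the remaining labels (discrete numeric, mixed, ambiguous). That reading is an inference: the gaps sum to about $3.36B, close to the $3.35B those three labels hold in the classification table. Because the inputs are rounded, a recomputed share can differ from the table by a point.

```python
# Lifetime volume in $M as (continuous, categorical, total), copied from the table.
VOLUME_M = {
    "Politics": (3788, 12608, 19511),
    "Sports": (87, 11814, 11962),
    "Crypto": (1872, 77, 1963),
    "Culture": (463, 1097, 1569),
    "Finance": (198, 176, 534),
    "Weather": (264, 0.4, 267),
}

def continuous_share(category):
    """Continuous volume as a fraction of the category's total volume."""
    continuous, _, total = VOLUME_M[category]
    return continuous / total

def other_label_volume(category):
    """Volume labelled neither continuous nor categorical, in $M."""
    continuous, categorical, total = VOLUME_M[category]
    return total - continuous - categorical

for name in VOLUME_M:
    print(f"{name}: {continuous_share(name):.1%} continuous, "
          f"${other_label_volume(name):,.1f}M under other labels")
```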

The growth trend matters because it means the discretisation cost analysis applies to a growing share of the platform. As Polymarket expands beyond high-attention election markets into crypto, weather, and finance, a larger fraction of its events are the type where continuous-distribution primitives would be the natural fit.


Liquidity concentration: both labels hit 90% by rank 6

For each event we ranked markets by volume and computed the cumulative share at each rank. The question is the same one from v1: how many of the N markets do you need to account for most of the trading?
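A minimal sketch of that calculation, assuming each event is available as a plain list of per-market lifetime volumes. The function names are illustrative; the filters (at least 5 markets, event volume above $1K) follow the figure description, and treating events with fewer markets than the requested rank as already at 100% is an assumption of this sketch.

```python
from statistics import median

def cumulative_share(volumes):
    """Cumulative share of event volume by market rank (rank 1 = largest)."""
    ranked = sorted(volumes, reverse=True)
    total = sum(ranked)
    shares, running = [], 0.0
    for v in ranked:
        running += v
        shares.append(running / total)
    return shares

def median_share_at_rank(events, rank, min_markets=5, min_volume=1_000):
    """Median across qualifying events of the cumulative share at `rank`."""
    curves = [
        cumulative_share(v)
        for v in events
        if len(v) >= min_markets and sum(v) > min_volume
    ]
    # An event with fewer markets than `rank` has already reached 100%.
    return median(c[min(rank, len(c)) - 1] for c in curves)
```

Calling `median_share_at_rank(events, 5)` separately on the continuous and categorical subsets would give the rank-5 row of the table.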

Cumulative share of event volume by market rank
Rank | Continuous | Categorical
1 | 31.7% | 42.8%
3 | 68.9% | 78.0%
5 | 89.0% | 93.2%
10 | 99.4% | 99.9%
Cumulative volume by market rank: median across events with N>=5 and event volume above $1K. Continuous (blue) and categorical (orange) both reach 90% by rank 5–6, with categorical slightly more concentrated at low ranks.

Figure 3. Cumulative share of event volume by market rank, continuous vs categorical. 90% threshold shown as a red dashed reference line.

Both curves are steep and flatten before rank 10. The median continuous event reaches 90% of total volume by rank 6. Categorical gets there by rank 5.

Concentration in the top handful of markets is a property of the multi-market binary architecture. It holds on both labels with similar magnitude. Categorical is slightly more concentrated because in an election or championship the favourite naturally dominates. In a price-bracket event the "at-the-money" bucket is hot but doesn't swallow 43% of volume because adjacent brackets also carry meaningful probability.

The small gap between the labels is worth noting. Continuous events spread volume slightly more evenly across their top buckets, which means there is real multi-bucket trading activity. Traders are engaging with 5–6 buckets, not just one. A continuous distribution would capture that existing demand on a single shared instrument instead of splitting it across 5–6 separate order books.


Concentration vs N

The cumulative curve above shows median behaviour across all N values. To test whether adding more markets spreads volume more evenly, we plotted top-3 concentration against N for every event, with a red reference line showing where uniform allocation (3/N) would sit. Continuous events distribute better at higher N.

Top-3 share of event volume by N tier
N tier | Continuous top-3 | Categorical top-3 | If uniform
3–5 | 89% | 100% | 75%
6–10 | 67% | 93% | 38%
11–15 | 67% | 79% | 23%
16–20 | 49% | 73% | 17%
21–30 | 33% | 63% | 12%
31–50 | 28% | 62% | 8%
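The two quantities compared in the table can be sketched as below. The function names are illustrative, and the ratio helper is an extra way of reading the table rather than a metric the analysis reports.

```python
def top3_share(volumes):
    """Share of event volume held by the three largest markets."""
    ranked = sorted(volumes, reverse=True)
    return sum(ranked[:3]) / sum(ranked)

def uniform_top3(n_markets):
    """Top-3 share if volume were split evenly across all markets (3/N)."""
    return min(1.0, 3 / n_markets)

def concentration_ratio(observed_top3, n_markets):
    """How many times the uniform baseline the observed top-3 share is."""
    return observed_top3 / uniform_top3(n_markets)
```

Using the table's own 12% baseline for N=21–30, the continuous top-3 share of 33% is about 2.8 times uniform, while the categorical 63% is about 5.3 times uniform.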

Ghost markets: a categorical phenomenon

A ghost market is a market within a multi-market event with less than $1K in lifetime volume. v1 found ghost rates of 36% at N=11–20, 60% at N=21–50, and 72% at N=51+. The implication was that adding more buckets creates dead markets.
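As a sketch of how the ghost rate can be computed per event and averaged by tier, assuming each event is a list of per-market lifetime volumes. The tier boundaries for 11–20, 21–50, and 51+ come from the v1 figures quoted here; the lower tiers (2–3, 4–5, 6–10) are assumed for the example.

```python
from collections import defaultdict
from statistics import mean

GHOST_THRESHOLD = 1_000  # dollars of lifetime volume

# Upper bound None means open-ended. Lower tiers are assumed, not from the source.
TIERS = [(2, 3), (4, 5), (6, 10), (11, 20), (21, 50), (51, None)]

def ghost_rate(volumes, threshold=GHOST_THRESHOLD):
    """Fraction of an event's markets with lifetime volume below the threshold."""
    return sum(v < threshold for v in volumes) / len(volumes)

def tier_of(n_markets):
    """Name of the N tier that an event with `n_markets` markets falls into."""
    for lo, hi in TIERS:
        if n_markets >= lo and (hi is None or n_markets <= hi):
            return f"{lo}+" if hi is None else f"{lo}-{hi}"
    raise ValueError("a multi-market event needs at least 2 markets")

def mean_ghost_rate_by_tier(events):
    """Mean of per-event ghost rates within each N tier."""
    by_tier = defaultdict(list)
    for volumes in events:
        by_tier[tier_of(len(volumes))].append(ghost_rate(volumes))
    return {tier: mean(rates) for tier, rates in by_tier.items()}
```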

v2 shows that finding was largely driven by categorical events.

Categorical ghost rate is roughly 2–3× higher than continuous at N=6–50. Sports tournaments with longshot teams, election fields with no-hope candidates, primary races with placeholder names — these are the events where Polymarket creates many binary contracts and most go dead. On continuous events (BTC price brackets, weather temperature ranges, margin percentages) the bucketing is more functional. Most buckets at intermediate N do get traded.

Continuous events still show meaningful ghost rates at N=11–20 (40%) and N=51+ (45%), so the architecture is not perfect.

Bar chart of ghost market rate by N tier — continuous (blue) vs categorical (orange) — across six tiers from N=2–3 to N=51+. Categorical bars are 2–3× taller than continuous at most tiers; the gap closes at the extremes.

Figure 4. Mean ghost-market rate by N tier (sample sizes shown above each bar).

Sample size caveat

The N=21–50 continuous tier has only 108 events, dominated by Elon Musk tweet-count events (high engagement across all buckets). The mean (17%) is more representative than the median (0%) for this small sample. We use mean throughout for stability.
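A small illustration of why the two statistics diverge. The per-event ghost rates below are made up for the example, not taken from the dataset: when more than half of the events in a tier have no ghost markets at all, the median is zero no matter how many ghosts the remaining events carry.

```python
from statistics import mean, median

# Hypothetical per-event ghost rates for a ten-event tier (illustration only).
rates = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.4, 0.5, 0.6]

tier_mean = mean(rates)
tier_median = median(rates)
print(f"mean={tier_mean:.0%}, median={tier_median:.0%}")
```

With these numbers the mean is 17% and the median is 0%, so the median hides the four events that do have dead buckets.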


Where liquidity falls off

The ghost chart uses a fixed $1K threshold. To show where liquidity actually drops off as a continuous function, we computed a survival curve: at each volume threshold from $10 to $1M, what fraction of markets have at least that much lifetime volume?
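A minimal sketch of the survival calculation, assuming a flat list of per-market lifetime volumes. The log-spaced threshold grid and the helper that locates where the curve first falls below a chosen level (80% by default) are assumptions for the example, not the notebook's implementation.

```python
import math

def log_thresholds(lo=10, hi=1_000_000, per_decade=4):
    """Log-spaced volume thresholds from `lo` to `hi` inclusive."""
    steps = round(math.log10(hi / lo) * per_decade)
    return [lo * 10 ** (i / per_decade) for i in range(steps + 1)]

def survival_curve(volumes, thresholds):
    """Fraction of markets with at least each threshold in lifetime volume."""
    n = len(volumes)
    return [sum(v >= t for v in volumes) / n for t in thresholds]

def first_drop_below(volumes, thresholds, level=0.8):
    """First threshold at which the alive share falls below `level`, else None."""
    for t, alive in zip(thresholds, survival_curve(volumes, thresholds)):
        if alive < level:
            return t
    return None
```

Running `survival_curve` on subsets split by label or by N tier would give the curves compared in the two panels.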

Two-panel liquidity survival curve. Left: share of markets above each $ threshold, continuous vs categorical, log x-axis from $10 to $1M. Right: continuous slice broken out by N tier (2–10, 11–20, 21+), showing the cliff shifting leftward as N grows.

Figure 5. Market survival curves. Left: continuous vs categorical. Right: continuous slice broken out by N tier — the liquidity cliff shifts left as N grows.

Continuous markets survive longer at every threshold. 65% of continuous markets have at least $1K in volume vs 49% of categorical. At $10K the gap is 37% vs 27%.

The more interesting view is the continuous slice broken out by event size:

  • Small events (N=2–10) stay above 80% alive until ~$10K. The architecture works at low N. Most buckets get meaningful volume.
  • Medium events (N=11–20) drop more steeply, with the liquidity cliff between $1K and $10K.
  • Large events (N=21+) drop earliest, with the cliff starting at ~$100.

The liquidity cliff shifts left as N grows. At N=2–10 the cliff is at ~$10K. At N=21+ it is at ~$100. The architecture handles continuous bucketing reasonably well at low N and breaks at high N. The failure point is a smooth function of event size.

Share of markets alive at each volume threshold
Threshold | Continuous alive | Categorical alive
$100 | 73% | 59%
$1,000 | 65% | 49%
$10,000 | 37% | 27%
$100,000 | 11% | 9%

What v2 shows

v1 told a story about binary discretisation creating dead markets and un-tradeable tails. v2 splits the dataset by event type and finds the story is more specific than that.

Concentration is a property of the architecture. Both continuous and categorical events reach 90% of total volume by rank 5–6. This holds on both sides of the split with similar magnitude.

But continuous events distribute better as N grows. At N=21–30 the top 3 markets capture only 33% of continuous event volume (vs 63% for categorical). Traders on continuous events are already distributing capital across more buckets. The demand for multi-bucket expression is visible in the data.

Ghost markets are largely a categorical problem. The v1 finding that 60–72% of markets are dead at high N was driven by sports tournaments and election long-shots. Continuous events have substantially fewer ghosts at every N tier. The architecture handles continuous bucketing reasonably well at intermediate N, but that conclusion depends on the $1K ghost threshold: at a $10K threshold only 37% of continuous markets count as alive.

The liquidity cliff shifts left with N. The survival curve shows continuous markets at N=2–10 stay above 80% alive until ~$10K. At N=21+ the cliff starts at ~$100. The architecture works at low N and breaks at high N, and the failure point is a smooth function of event size.

Polymarket's continuous activity is growing. Continuous events overtook categorical by event count in 2026Q1. Crypto (95% continuous) and Weather (99% continuous) are growing categories. The discretisation cost analysis applies to a growing share of the platform.


Notes and limitations

Analysis based on 18,863 multi-market events and 172,869 individual markets from the Polymarket Gamma API (July 2022 – March 2026). Classification methodology and reproducible notebooks: github.com/igorfunctionspace/polymarket-discretisation/tree/main/v2

Metadata only, no trade-level data. We used event and market-level volume, liquidity, and prices from the Gamma API. We cannot measure trader behaviour, order book dynamics, or how liquidity changes over a market's lifecycle.

Point-in-time liquidity. The liquidityClob field is a snapshot. We used volume as the primary metric and restricted liquidity-dependent analysis to open events.

Heuristic classifier. The continuous/categorical labels are produced by a regex-based classifier on sibling market question titles, not by manual tagging. The full rules, QA process, and known gaps are documented in the sprint notes. 53 events (0.3% of events, 0.1% of volume) are ambiguous.

6 categories. We fetched Politics, Crypto, Sports, Finance, Culture, and Weather. Events tagged only as Science, Tech, World, Business, or Entertainment are excluded.

~281 NFL/NBA matchup events ($63M) sit in the continuous slice because their spread/total spans parse as numeric. These are sports betting compound events and should probably be in mixed. They account for 0.9% of continuous volume.