How we learned that schema purity matters less than matching the natural grain of your queries.


Setting the Scene: Digital Billboards and Audience Targeting

Before we talk schema design, we need to discuss the problem. Without the domain context, the technical decisions won’t make sense.

The Business: Out-of-Home Advertising

Think of the digital billboards you see on highways, in malls, airports, or gyms. Each physical screen is called a frame.

Advertisers don’t just ask, “How many people see this screen?”
They ask:

Which frames should I buy to best reach my target audience?

The Data We Work With

Frames: Physical advertising screens. In the US alone, there are millions — Times Square, shopping centers, transit hubs, everywhere.

Audiences: Behavioral audiences like luxury car buyers, fitness enthusiasts, or working parents. A typical ad campaign targets 1–6 audiences simultaneously.

Impressions: The estimated number of people who see a frame during a specific hour. Example: A Times Square billboard at 5 pm has more impressions than a suburban bus stop at 3 am.

DMA (Designated Market Area): Geographic regions used in advertising. Think New York metro area or Los Angeles market.

The Key Metric: Audience Index

Raw impression counts are misleading.

A Times Square frame may reach more fitness enthusiasts than a suburban bus stop, but it also reaches far more of everyone else. What advertisers really care about is concentration, not absolute volume.

We measure this using an audience index:

audience_index = (frame's audience ratio) / (market's natural delivery) × 100

Where:

  • Frame’s audience ratio = audience impressions / total impressions at this frame

  • Natural delivery = audience impressions / total impressions across ALL frames in the market

Interpretation:

  • 100 → performs exactly at market average

  • 150 → over-indexes by 50% (1.5× better than average)

  • 80 → under-indexes by 20%
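
For example, a frame where 15% of viewers belong to the target audience, in a market whose natural delivery is 10%, scores (0.15 / 0.10) × 100 = 150 and over-indexes by 50%.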

Why Hourly Granularity Is Non-Negotiable

Audience composition changes dramatically over the day:

  • 8 am at a subway station: Commuters, working professionals

  • 2 pm at a mall: Stay-at-home parents, retirees

  • 8 pm at a gym: Fitness enthusiasts

Advertisers buy specific dayparts, such as weekday mornings or weekend evenings. That leads to a crucial requirement:

Audience index must be computed at the hourly level, and only then averaged across the selected hours.

You cannot compute a single index from daily totals without losing correctness: an index computed from summed impressions is weighted toward high-traffic hours, while the requirement is an unweighted average of hourly indices.
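
A small hypothetical example (all numbers invented for illustration) makes the difference concrete:

8 am (busy hour):  frame 30 / 100 audience ratio = 0.30, market 150,000 / 1,000,000 natural delivery = 0.15 → hourly index 200
3 am (quiet hour): frame 1 / 20 audience ratio = 0.05, market 2,000 / 20,000 natural delivery = 0.10 → hourly index 50

Average of hourly indices (what the requirement asks for) = (200 + 50) / 2 = 125
Single index from summed daily totals = (31 / 120) / (152,000 / 1,020,000) × 100 ≈ 173

The aggregated index is pulled toward the busy hour because it weights every hour by its traffic; the requirement is the unweighted average.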

The Query in Plain English

Show me all frames in the New York market that over-index (index ≥ 100) for young professionals during weekday commute hours (8–10 am and 5–7 pm, Monday through Friday), subject to dynamic frame-level filters such as location type (e.g., roadside vs mall), media owner, pricing constraints, and availability.

This query is not a batch report or an offline analysis. It sits directly in the critical path of a chat-based user experience, where advertisers explore options conversationally and expect responses in near real time.

This requires:

  1. For each frame, for each selected hour: calculate the audience index

  2. Average those hourly indices to get a single score per frame

  3. Filter to frames with average index ≥ 100

On top of that, the query must return within 1–2 seconds to keep the conversational flow intact, even as users refine audiences and time windows interactively.

This query cannot be answered using precomputed DMA-level indices alone. While market-level natural delivery can be aggregated, the final index must be recomputed over the user-selected subset of frames, which varies per request based on interactive filters.

Now, let’s design a schema to make this query fast.

The Scale of the Problem

At full resolution, the data looks like this:

  • 1.7 million Frames

  • We support around 300 Audiences on the platform

  • 168 Time slots (24 hours × 7 days)

That’s potentially 85 billion data points (1.7M × 300 × 168).

We don’t materialize all of them at once, but the schema must support this dimensionality efficiently.


The Intuition: Columnar Should Win

We use ClickHouse, a columnar OLAP database. The standard advice is clear:

Put different attributes in different columns so queries read only what they need.

Applied naively, that suggested a very wide table: 16,800 columns (100 audiences × 168 hours) to capture hourly audience impressions.

To test this locally, we decided to ingest 80,000 frames with 100 audiences:

CREATE TABLE frames_hourly_insights (
    id String,
    dma LowCardinality(String),
    -- ... other frame attributes ...
    -- 100 audiences × 168 hours = 16,800 columns
    `audience1_mon_08` Float32,
    `audience1_mon_09` Float32,
    -- ... continues for all 168 hours per audience ...
)

The appeal was obvious:

  • Perfect column pruning

  • Querying young professionals during weekday commute hours (8–10 am and 5–7 pm, Monday through Friday) reads just 20 columns (see the sketch after this list)

  • ~99.9% of data skipped
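
Concretely, with the audienceN_day_HH naming from the DDL above, the pruned read for a single audience touches only columns like these (Monday shown; Tuesday through Friday follow the same pattern; a sketch, not the production query):

SELECT
    id,
    dma,
    audience1_mon_08, audience1_mon_09,  -- 8-10 am
    audience1_mon_17, audience1_mon_18   -- 5-7 pm
    -- ..., audience1_tue_08, ..., audience1_fri_18 (20 columns in total)
FROM frames_hourly_insights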

On paper, this looked ideal. In practice, it was dead on arrival.

Why Extreme Width Breaks Down:

In ClickHouse, each column is a first-class entity:

  • It has its own data files and mark files per part

  • It participates independently in merges

  • It contributes metadata that must be tracked, planned, and opened at query time

At tens of thousands of columns, these costs compound quickly.
Even before considering query complexity, the system had to:

  • Manage and reason about thousands of column files per part

  • Open and coordinate large numbers of file handles

  • Perform background merges that touch every column, regardless of how many are queried

This overhead exists independent of the ingestion mechanism. While ClickPipe can mitigate write-time memory pressure by streaming data from object storage, it does not eliminate the operational costs of maintaining and merging extremely wide parts.

Even on a reduced dataset of 1000 frames, early queries exhibited disproportionate overhead:

Duration: 1167 ms
Memory:   52 MiB
Parts:    40

The parts metric revealed the problem: the query touched 40 separate parts, and with 16,800 columns each part carries its own per-column data and mark files. Despite reading very little actual data, query execution was dominated by column metadata handling rather than I/O or computation.

At this scale, the benefits of column pruning were outweighed by the cost of managing the column set itself.
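
ClickHouse makes this bookkeeping visible: the system.parts_columns table lists every column of every data part. A query along these lines (table name taken from the wide-table example above) shows how much per-part, per-column metadata the schema drags around:

SELECT
    name                                          AS part_name,
    count()                                       AS column_entries,
    formatReadableSize(sum(column_bytes_on_disk)) AS on_disk
FROM system.parts_columns
WHERE database = currentDatabase()
  AND table = 'frames_hourly_insights'
  AND active
GROUP BY name
ORDER BY column_entries DESC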

Lesson:

  • Extremely wide tables impose metadata, merge, and planning overheads that eventually dominate execution time.

  • Columnar databases are optimized for scanning fewer columns, not for managing tens of thousands of them.

  • In practice, supporting ~300 audiences would require roughly 50,000 columns (300 × 168), well beyond the operational comfort zone of a ClickHouse table, regardless of whether data arrives via direct INSERTs or through ClickPipe.


The Array Approach: A Promising Middle Ground

We pivoted to arrays. Instead of 168 columns per audience, one column containing an array of 168 values:

CREATE TABLE frames_hourly_insights_array (
    id String,
    dma LowCardinality(String),
    -- ... other frame attributes ...
    -- 168-element array per audience (7 days × 24 hours)
    -- Index = day_index × 24 + hour + 1 (Mon=0, Tue=1, ..., Sun=6)
    all_adults_impressions Array(Float32),
    `audience1_impressions` Array(Float32),
    `audience2_impressions` Array(Float32),
    -- ... 100 audience columns total ...
)

Now we had:

  • ~100 audience columns instead of 16,800

  • Clean writes

  • Much lower metadata overhead

Initial tests with 80,000 frames and 100 audiences, run locally, were promising:

Duration: 137 ms
Memory:   21.7 MiB
Parts:    2

For a moment, this felt like the solution.

The Query Pattern Trap:

Those early tests used a simplified calculation (sketched after this list):

  • Sum impressions across selected hours

  • Compute one index from the totals
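
A sketch of roughly what those early tests computed, assuming the array schema above, the same 20 commute-hour indices used in the full query further down, and audience1_impressions standing in for the selected audience (the CTE names are ours):

WITH frame_totals AS (
    -- Sum the selected hours inside the arrays: one row per frame, no explosion
    WITH [9, 10, 18, 19, 33, 34, 42, 43, 57, 58, 66, 67,
          81, 82, 90, 91, 105, 106, 114, 115] AS target_idx
    SELECT
        id,
        dma,
        arraySum(i -> audience1_impressions[i], target_idx) AS aud_imp,
        arraySum(i -> all_adults_impressions[i], target_idx) AS all_imp
    FROM frames_hourly_insights_array
),
market AS (
    -- Natural delivery computed from the same summed totals, per DMA
    SELECT dma, sum(aud_imp) / sum(all_imp) AS natural_delivery
    FROM frame_totals
    GROUP BY dma
)
SELECT
    f.id,
    f.dma,
    round((f.aud_imp / f.all_imp) / m.natural_delivery * 100, 2) AS audience_index
FROM frame_totals AS f
JOIN market AS m ON f.dma = m.dma
ORDER BY audience_index DESC

Because the summing happens inside the arrays, everything stays at one row per frame, which is why these first numbers looked so good.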

But that wasn’t the real requirement.
The correct computation is: For each selected hour h:

  1. frame_ratio[h] = audience[h] / total[h]

  2. natural_delivery[h] = SUM(audience[h]) / SUM(total[h]) (per DMA)

  3. hourly_index[h] = frame_ratio[h] / natural_delivery[h] × 100

Final score = AVG(hourly_index[h])

This requires joining at an hourly granularity.

Query:

WITH target_hours AS (
      -- Array index = day_index * 24 + hour + 1 (Mon=0, Tue=1, ..., Fri=4)
      SELECT arrayJoin([
          9, 10, 18, 19,      -- Monday: 8am, 9am, 5pm, 6pm
          33, 34, 42, 43,     -- Tuesday
          57, 58, 66, 67,     -- Wednesday
          81, 82, 90, 91,     -- Thursday
          105, 106, 114, 115  -- Friday
      ]) AS hour_idx
  ),
  exploded AS (
      SELECT
          id,
          dma,
          hour_idx,
          all_adults_impressions[hour_idx] AS all_adults_imp,
          audience_impressions[hour_idx] AS audience_imp  -- stands in for the selected audience's column (e.g. audience1_impressions)
      FROM frames_hourly_insights_array
      CROSS JOIN target_hours
  ),
  hourly_nat AS (
      SELECT
          dma,
          hour_idx,
          sum(audience_imp) / sum(all_adults_imp) AS natural_delivery
      FROM exploded
      GROUP BY dma, hour_idx
  ),
  hourly_index AS (
      SELECT
          e.id,
          e.dma,
          (e.audience_imp / e.all_adults_imp) / n.natural_delivery * 100 AS hour_index
      FROM exploded e
      JOIN hourly_nat n
          ON e.dma = n.dma
          AND e.hour_idx = n.hour_idx
  )
  SELECT
      id,
      dma,
      round(avg(hour_index), 2) AS audience_index
  FROM hourly_index
  GROUP BY id, dma
  HAVING audience_index >= 100
  ORDER BY audience_index DESC

With arrays, that means one thing: explosion.

To join per-hour natural delivery, arrays must be converted into rows. For the 20 selected commute hours, that turns:

80,000 frames × 1 row each into 80,000 frames × 20 hours = 1,600,000 intermediate rows

Every query. This intermediate explosion:

  • Consumes memory

  • Adds CPU overhead

  • Defeats the original elegance of the array schema

Lesson:

  • Arrays are cheap to store. Exploding arrays repeatedly is not.

  • The array-based model would have been a good fit if the requirement were to aggregate impressions across hours first, rather than compute hour-level indices before aggregation.


The “Wrong” Solution: Pre-Exploded Rows

Against our instincts, we tried a schema that already lived at the query’s natural grain:

CREATE TABLE frames_hourly_insights_row_based (
    id String,
    dma LowCardinality(String),
    -- ... other frame attributes (repeated 168× per frame!) ...
    day_of_week LowCardinality(String),  -- 'mon', 'tue', etc.
    hour_of_day UInt8,                    -- 0-23
    all_adults_impressions Float32,       -- Single value for this hour
    `audience1_impressions` Float32,
    `audience2_impressions` Float32,
    -- ... 100 audience columns ...
)

The Pre-Exploded table is ordered by (dma, day_of_week, hour_of_day, id), ensuring data locality for both hourly natural delivery aggregation and frame-level joins.

However, with this approach, each frame becomes 168 rows — one per hour.

Yes:

  • Frame attributes are repeated

  • Row count explodes

  • Storage grows dramatically

But look at the shape:

dma | day | hour | id | audience impressions

This is exactly what the query wants.

  • No array explosion.

  • No CROSS JOIN.

  • Just filters and joins on native columns.

Query: Determining frames for weekdays between 8–10 am and 5–7 pm becomes natural:

WITH hourly_nat AS (
      SELECT
          dma,
          day_of_week,
          hour_of_day,
          sum(audience_impressions) / sum(all_adults_impressions) AS natural_delivery
      FROM frames_hourly_insights_row_based
      WHERE day_of_week IN ('mon', 'tue', 'wed', 'thu', 'fri')
        AND hour_of_day IN (8, 9, 17, 18)
      GROUP BY dma, day_of_week, hour_of_day
  ),
  hourly_index AS (
      SELECT
          r.id,
          r.dma,
          (r.audience_impressions / r.all_adults_impressions)
              / n.natural_delivery * 100 AS hour_index
      FROM frames_hourly_insights_row_based r
      JOIN hourly_nat n
          ON r.dma = n.dma
          AND r.day_of_week = n.day_of_week
          AND r.hour_of_day = n.hour_of_day
      WHERE r.day_of_week IN ('mon', 'tue', 'wed', 'thu', 'fri')
        AND r.hour_of_day IN (8, 9, 17, 18)
  )
  SELECT
      id,
      dma,
      round(avg(hour_index), 2) AS audience_index
  FROM hourly_index
  GROUP BY id, dma
  HAVING audience_index >= 100
  ORDER BY audience_index DESC

Let’s break down what this query does:

  • hourly_nat CTE: For each DMA + day + hour combination, calculate the market’s natural delivery (what percentage of all impressions are from our target audience?)

  • hourly_index CTE: For each frame at each hour, compare the frame’s audience concentration to the market average. A frame where 15% of viewers are young professionals in a market where only 10% are — that’s an index of 150.

  • Final aggregation: Average each frame’s hourly indices to get one score per frame.

No CROSS JOIN. No array explosion. Just a natural join on existing dimensions.


Results: Less Data Magic, More Performance

At 80,000 frames with hourly-level index calculation:

| Metric         | Array Table | Pre-Exploded Table | Winner                      |
|----------------|-------------|--------------------|-----------------------------|
| Query Duration | 1,391 ms    | 718 ms             | Pre-Exploded (1.9× faster)  |
| Query Memory   | 95.73 MiB   | 69.11 MiB          | Pre-Exploded (1.4× less)    |
| Bytes Read     | 211.54 MiB  | 614.32 MiB         | Array (2.9× less)           |
| Storage Rows   | 160,002     | 17,694,010         | Array (110× less)           |

The Pre-Exploded table reads more data but processes it faster because:

  1. No intermediate explosion: Data is already at the right granularity

  2. Efficient joins: Day and hour are native columns, not computed indices

  3. Lower memory: No need to materialize exploded array data
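
For anyone reproducing the comparison, the duration, memory, and bytes-read figures above can be pulled from ClickHouse's query log after each run; a minimal sketch (the ILIKE pattern is only illustrative):

SELECT
    event_time,
    query_duration_ms,
    formatReadableSize(memory_usage) AS memory,
    formatReadableSize(read_bytes)   AS bytes_read,
    read_rows
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE '%frames_hourly_insights%'
ORDER BY event_time DESC
LIMIT 10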


What This Is Really About

This wasn’t a win for Pre-Exploded storage over columnar storage.

It was a win for matching schema grain to query grain.

All three designs were columnar in the physical sense. The difference was where data reshaping happened:

  • The wide-table design pushed complexity into the schema width

  • The array-based design pushed complexity into runtime reshaping

  • The pre-exploded design pushed complexity into storage

Only one of those aligned with how the query actually executed.

  • Arrays work beautifully when you aggregate within them.

  • Arrays become expensive when you must repeatedly explode and rejoin them.

  • Pre-exploded data trades storage efficiency for predictable execution, stable memory usage, and simpler queries. Beyond median latency, this design exhibited more predictable memory usage, making it safer under concurrent query load.

The deeper lesson is this:

If your query needs to repeatedly reshape data at runtime, that reshaping likely belongs in the schema.

This principle isn’t ClickHouse-specific. It applies to any analytical system where correctness, latency, and predictability matter more than theoretical elegance.

Schema design is ultimately about choosing where complexity lives.
In this case, putting that complexity into storage turned out to be the least expensive option.

Key Takeaways

  • Schema design is query-driven
    Don’t design for elegance — design for how the data is actually consumed.

  • Avoid runtime reshaping in hot paths
    Array explosion, CROSS JOINs, and dynamic joins are often more expensive than extra rows.

  • Columnar systems still have practical limits
    Thousands of columns introduce memory and metadata overhead.

  • Benchmark real queries, not toy ones
    Our initial tests favored arrays — until we ran the real calculation.

  • Storage is cheaper than correctness bugs and slow queries
    Especially when user-facing performance is involved. In practice, the storage overhead is mitigated by efficient compression: low-cardinality dimensions (day_of_week, dma) and compact numeric types compress extremely well in ClickHouse.
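
A quick way to sanity-check that compression claim on your own data is to compare compressed and uncompressed sizes per column; a sketch against the row-based table from above:

SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE database = currentDatabase()
  AND table = 'frames_hourly_insights_row_based'
ORDER BY data_uncompressed_bytes DESC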

Closing Thought

Sometimes a theoretical “wrong” approach is right. The goal isn’t schema purity, it’s query performance. And performance comes from understanding your access patterns, not from following architectural dogma.

Build what works. Measure. Iterate.


Review Credits

Thanks to Amaan and Puneet for reviewing this post and providing valuable feedback.


Tested on ClickHouse 24.1 on a local Docker container with 8GB RAM. Results may vary based on hardware, data distribution, and ClickHouse version.


Tags: ClickHouse, Database Design, Schema Design, Columnar Databases, Performance Optimization, Data Engineering