
Segments: How Sequential Filtering Makes Journey Analytics Actually Useful

By Trevor Paulsen

This is part of a series where we're building a DIY journey analytics platform from scratch. If you're just joining, check out the earlier posts to catch up!

In the last post, we covered how the semantic layer turns messy event data into clean, trustworthy metrics and dimensions. That's a huge step forward, but it also highlights a gap that I think is worth digging into.

Think about a retail brand with both an online store and physical locations. Looking at web purchases alone might tell you the site has a 2% conversion rate, which could reasonably lead you to question the digital marketing spend when 98% of visitors appear to leave without buying anything.

The problem is that a significant chunk of those "non-converting" visitors might actually be browsing products online and then walking into a store the next day to purchase. They look like bounces on the web, but they're really cross-channel conversions that traditional tools just can't connect. To find these people, you'd need to join online browsing sessions to in-store purchase events across different data sources, match them to the same person, and verify that the online browsing happened before the in-store purchase - and that's a really hard thing to express in a standard WHERE clause.

This is where segments become extremely important. Segments in a journey analytics context aren't just filters - they're cross-event, cross-session, cross-channel definitions that understand the order things happened in, and they're honestly one of the features I'm most excited about in what we've built so far.

Demo: Building segments with sequences, scoping, and the AI segment builder

What Makes Journey Analytics Segments Different?

If you've used filters in Tableau, Looker, or even basic SQL, you're used to the idea of narrowing down data. "Show me orders from California." "Show me revenue from Q4." These are row-level filters - each row either matches or it doesn't.

Journey analytics segments are different in three important ways:

  1. They evaluate across rows. "Users who had at least 3 page views" requires counting events per person, not checking a single row.
  2. They care about order. "Users who viewed a product and then purchased" is fundamentally different from "users who purchased and then viewed a product."
  3. They operate at different scopes. "Sessions where a purchase happened" is a different question than "People who ever purchased" - even though the raw filter condition looks the same.

These three capabilities - cross-row evaluation, sequence awareness, and scope control - are what make journey analytics segments powerful. And they're what Claude and I set out to build.

Scope: The Foundation of Segment Evaluation

Every segment in Trevorwithdata has a scope that determines how it's evaluated: event, session, or person.

This might sound like a small detail, but it's actually the most important design decision in the whole segment engine. Scope determines the "lens" through which your segment condition is applied.

Event Scope

Event scope is the simplest case - each event is evaluated independently, with no aggregation, no window functions, and no cross-row logic.

-- Segment: "Mobile Events"
-- Scope: event
{device_type} = 'mobile'

This evaluates to true or false for each individual event. It's fast and straightforward - basically a WHERE clause.

Session Scope

Session scope evaluates the condition across all events within each session, then marks every event in matching sessions.

-- Segment: "Sessions with a Purchase"
-- Scope: session
ANY({event_type} = 'purchase')

That ANY() keyword is one of several we built to make segment expressions readable. It means "at least one event in this group matches the condition." Under the hood, our query engine expands this to a window function:

(countIf(event_type = 'purchase')
  OVER (PARTITION BY person_id, _session_id)) > 0
  as _seg_sessions_with_purchase

The engine automatically adds the OVER (PARTITION BY ...) clause based on the scope you chose. You just write the logic, and the system handles the windowing.

Person Scope

Person scope works the same way, but evaluates across all of a person's events across every session in the reporting + lookback window.

-- Segment: "High Value Customers"
-- Scope: person
SUM({revenue}) > 500

This finds people whose total revenue exceeds $500 across all their sessions in the reporting + lookback window, then includes every event for those people. The engine wraps this with OVER (PARTITION BY person_id) automatically.
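To make the scope mechanics concrete, here's a minimal sketch of how scope-driven windowing could work. This is illustrative only - the names (expand_scope, PARTITIONS) are hypothetical, not the actual engine code - but it follows the expansion shown above:

```python
# Hypothetical sketch: wrap an aggregate condition in a window clause
# chosen by the segment's scope. Names here are illustrative.

PARTITIONS = {
    "event": None,                        # plain row-level condition
    "session": "person_id, _session_id",  # evaluate per session
    "person": "person_id",                # evaluate per person
}

def expand_scope(agg_sql: str, comparison: str, scope: str, seg_id: str) -> str:
    partition = PARTITIONS[scope]
    if partition is None:
        # event scope: no windowing needed
        return f"{agg_sql} {comparison} AS _seg_{seg_id}"
    return (f"({agg_sql} OVER (PARTITION BY {partition})) "
            f"{comparison} AS _seg_{seg_id}")

sql = expand_scope("countIf(event_type = 'purchase')", "> 0",
                   "session", "sessions_with_purchase")
# (countIf(event_type = 'purchase') OVER (PARTITION BY person_id, _session_id)) > 0 AS _seg_sessions_with_purchase
```

Switching the same condition from session to person scope only changes the PARTITION BY clause, which is exactly why scope can be a dropdown rather than a query rewrite.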

Why Scope Matters So Much

Here's a concrete example of why this matters. Say you want to analyze the marketing channels that drive purchases. You could define the same basic condition - ANY({event_type} = 'purchase') - at two different scopes:

  • Session scope: Shows you only the sessions where a purchase happened. You'll see the marketing channel that directly preceded the purchase.
  • Person scope: Shows you all sessions for people who have ever purchased. You'll see every marketing channel they've ever interacted with - including the ones from weeks ago that might have started the journey.

It's the same filter expression, but the analytical questions are completely different - and that's the power of scope.

Keywords: Making Complex Logic Readable

One of the things I'm happiest with is the keyword system. ClickHouse aggregate functions like countIf(), uniqExact(), and argMax() are powerful, but they're not exactly something you want to be writing by hand every time you build a segment. So we created two sets of keyword wrappers that make segment expressions way more intuitive.

Before we get into the keywords though, one important detail: all column references in segment expressions use {brace} syntax - like {event_type} or {revenue} - which maps to dimensions and metrics defined in our semantic layer. This means segment expressions build on top of your existing business definitions rather than referencing raw column names. If a dimension handles complex object array joins or lookup logic under the hood, the segment author doesn't need to worry about any of that.

Convenience Keywords

These simplify common comparisons and work at any scope. You can write the raw SQL equivalents, but the keywords are easier to read and less error-prone:

  • CONTAINS(field, 'value') - string contains value (LIKE '%value%')
  • NOT_CONTAINS(field, 'value') - string doesn't contain value (NOT LIKE '%value%')
  • STARTS_WITH(field, 'value') - string starts with value (LIKE 'value%')
  • ENDS_WITH(field, 'value') - string ends with value (LIKE '%value')
  • IS_EMPTY(field) - null or empty string (IS NULL OR = '')
  • IS_NOT_EMPTY(field) - has a value (IS NOT NULL AND != '')
  • BETWEEN(field, low, high) - value in range, inclusive (BETWEEN low AND high)
  • IN_LIST(field, val1, val2, ...) - matches any value in the list (IN (val1, val2, ...))
  • MATCHES(field, 'pattern') - regex match, re2 syntax (match(field, 'pattern'))
  • NOT_MATCHES(field, 'pattern') - negated regex match (NOT match(field, 'pattern'))
  • CONTAINS_ALL(field, v1, v2, ...) - all substrings present (LIKE '%v1%' AND LIKE '%v2%')
  • CONTAINS_ANY(field, v1, v2, ...) - any substring present (LIKE '%v1%' OR LIKE '%v2%')

A few examples of these in action:

-- Find events where the page URL contains "/pricing"
CONTAINS({page_url}, '/pricing')
 
-- Users from North America
IN_LIST({country}, 'US', 'CA', 'MX')
 
-- Events with a coupon code applied
IS_NOT_EMPTY({coupon_code})
 
-- Product pages matching a URL pattern
MATCHES({page_url}, '^/products/[0-9]+')
 
-- Referrals from any major search engine
CONTAINS_ANY({referrer}, 'google', 'bing', 'yahoo', 'duckduckgo')

These are small quality-of-life improvements, but they really add up. Nobody wants to write {coupon_code} IS NOT NULL AND {coupon_code} != '' when IS_NOT_EMPTY({coupon_code}) says the same thing more clearly.
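As a rough illustration, here are string-template versions of three of these keywords. The real engine presumably parses expressions properly rather than pasting strings together, but the expansions match the ones listed above:

```python
# Hypothetical expansions of three convenience keywords to SQL.
# These function names are illustrative, not the engine's API.

def contains(field: str, value: str) -> str:
    return f"{field} LIKE '%{value}%'"

def is_not_empty(field: str) -> str:
    return f"({field} IS NOT NULL AND {field} != '')"

def in_list(field: str, *values: str) -> str:
    quoted = ", ".join(f"'{v}'" for v in values)
    return f"{field} IN ({quoted})"

expanded = in_list("country", "US", "CA", "MX")
# country IN ('US', 'CA', 'MX')
```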

Aggregate Keywords

For cross-row logic - things like "3+ purchases", "never did X", or "total revenue over $500" - we have aggregate keywords. These are best used with session or person scope since they evaluate across multiple events:

  • ANY(condition) - at least one event matches (countIf(condition) > 0)
  • EVERY(condition) - all events match (countIf(NOT(condition)) = 0)
  • NONE(condition) - no events match (countIf(condition) = 0)
  • COUNT(condition) - number of matching events (countIf(condition))
  • UNIQUE(expr) - count of distinct values (uniqExact(expr))
  • SUM(expr) - sum of values (sum(expr))
  • AVG(expr) - average across rows (avg(expr))
  • MIN(expr) - minimum value (min(expr))
  • MAX(expr) - maximum value (max(expr))
  • FIRST(expr) - value at the earliest event (argMax(expr, -event_timestamp))
  • LAST(expr) - value at the latest event (argMax(expr, event_timestamp))

Both keyword types compose naturally together. A segment like "people who visited from more than 3 unique marketing channels, used a coupon, and spent over $100" becomes:

-- Scope: person
UNIQUE({marketing_channel}) > 3 AND ANY(IS_NOT_EMPTY({coupon_code})) AND SUM({revenue}) > 100

Sequential Segments: The Game Changer

This is the feature I'm most excited about. Sequential segments let you define ordered patterns of events using a THEN syntax.

-- "Users who viewed a product, then added to cart, then purchased"
-- Scope: person
{event_type} = 'product_view' THEN {event_type} = 'add_to_cart' THEN {event_type} = 'purchase'

This isn't just checking that all three event types exist for a person - it's verifying they happened in this specific order. Under the hood, our engine compiles this to ClickHouse's sequenceMatch() function, which is purpose-built for pattern matching in ordered event streams.
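A compiler for a simple THEN chain could look something like the sketch below. This is a hypothetical simplification - the names are made up, and the production engine surely handles more cases - but it shows the shape of the sequenceMatch() call, which takes a pattern string, a timestamp column, and one condition argument per step:

```python
# Illustrative sketch: compile the steps of a THEN chain into a
# ClickHouse sequenceMatch() call. Names here are hypothetical.

def compile_sequence(steps, ts_col="event_timestamp"):
    # pattern like '(?1)(?2)(?3)': one numbered slot per step, in order
    pattern = "".join(f"(?{i})" for i in range(1, len(steps) + 1))
    args = ", ".join([ts_col] + list(steps))
    return f"sequenceMatch('{pattern}')({args})"

sql = compile_sequence([
    "event_type = 'product_view'",
    "event_type = 'add_to_cart'",
    "event_type = 'purchase'",
])
# sequenceMatch('(?1)(?2)(?3)')(event_timestamp, event_type = 'product_view', event_type = 'add_to_cart', event_type = 'purchase')
```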

Adding Time Constraints with WITHIN

Sequences become even more powerful when you add time constraints:

-- "Browsed, then purchased within 30 minutes"
-- Scope: session
{event_type} = 'product_view' THEN WITHIN 30m {event_type} = 'purchase'

The WITHIN modifier constrains how much time can elapse between sequential steps. You can use s (seconds), m (minutes), h (hours), or d (days). This is incredibly useful for conversion funnel analysis - the difference between "eventually purchased" and "purchased within 30 minutes of browsing" tells you a lot about buying intent.
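A small helper for the duration syntax might look like this. It's a hypothetical sketch: the function name is made up, but the unit letters (s, m, h, d) are the ones described above, and sequenceMatch-style time conditions expect the bound in seconds:

```python
import re

# Hypothetical helper for the WITHIN modifier: convert a duration
# token like '30m' into seconds for a (?t<=N) time condition.

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def within_seconds(token):
    match = re.fullmatch(r"(\d+)([smhd])", token)
    if match is None:
        raise ValueError(f"bad WITHIN duration: {token!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]

# '30m' would compile into a time condition between two pattern steps:
time_condition = f"(?t<={within_seconds('30m')})"
# (?t<=1800)
```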

WITHIN SESSION: Cross-Session vs. Same-Session Sequences

Sometimes you need the entire sequence to happen within a single session. The WITHIN SESSION modifier handles this:

-- "Browsed the website then called the call center in the same session"
-- Scope: person
{event_type} = 'page_view' THEN WITHIN SESSION IS_NOT_EMPTY({call_reason})

This is a great example of cross-channel journey analytics in action. The sequence evaluates at person scope (across all sessions) but requires both events to happen within a single session. You're finding people who browsed and then called - maybe they had questions about something they saw on the site. That's a pattern that would be nearly impossible to detect without unified cross-channel data and sequence-aware segments.

Combining Sequences with Aggregates

You can mix sequential conditions with aggregate conditions using parentheses:

-- "Users who browsed then purchased, AND spent over $200 total"
-- Scope: person
({event_type} = 'product_view' THEN {event_type} = 'purchase') AND SUM({revenue}) > 200

The sequential part goes in parentheses, and the aggregate part combines with AND. This lets you build surprisingly sophisticated audience definitions in a single expression.

Temporal Modifiers: Windowing Around Events

Beyond sequences, we support temporal modifiers that let you restrict your analysis to events relative to a specific anchor event. This is something I haven't seen in most analytics tools, and it's incredibly powerful for lifecycle analysis.

The syntax uses AFTER, FROM, BEFORE, or UNTIL with a FIRST or LAST anchor:

-- "Page views that happened after the user's first purchase"
-- Scope: person
AFTER FIRST {event_type} = 'purchase': {event_type} = 'page_view'

The colon separates the temporal boundary from the main expression. Here's what each modifier does:

  • AFTER - strictly after the anchor (excludes the boundary event)
  • FROM - at or after the anchor (includes the boundary event)
  • BEFORE - strictly before the anchor (excludes the boundary event)
  • UNTIL - up to and including the anchor (includes the boundary event)
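The inclusive/exclusive semantics boil down to which comparison operator each modifier implies against the anchor event's timestamp. A tiny sketch (names hypothetical; only the semantics come from the modifiers above):

```python
import operator

# Illustrative mapping from temporal modifier to the timestamp
# comparison applied against the anchor event's timestamp.

BOUNDARY_OPS = {
    "AFTER":  operator.gt,  # strictly after, excludes the anchor
    "FROM":   operator.ge,  # at or after, includes the anchor
    "BEFORE": operator.lt,  # strictly before, excludes the anchor
    "UNTIL":  operator.le,  # up to and including the anchor
}

anchor_ts = 1000
# An event at exactly the anchor timestamp is kept by FROM and UNTIL
# but dropped by AFTER and BEFORE:
kept = [m for m, op in BOUNDARY_OPS.items() if op(anchor_ts, anchor_ts)]
# ['FROM', 'UNTIL']
```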

The main expression after the colon is actually optional - if you leave it off, the system just includes all events in that window. This is useful when you want to restrict your analysis to a specific lifecycle phase without any additional filtering:

-- "Everything from the user's first signup onward"
-- Scope: person
AFTER FIRST {event_type} = 'signup'
 
-- "All events after first purchase but before churn"
-- Scope: person
AFTER FIRST {event_type} = 'purchase': BEFORE LAST {event_type} = 'churn'

You can also combine temporal modifiers with sequences or aggregate keywords:

-- "After first signup, did they view a product then purchase?"
-- Scope: person
AFTER FIRST {event_type} = 'signup': {event_type} = 'product_view' THEN {event_type} = 'purchase'
 
-- "After first purchase, did they return anything?"
-- Scope: person
FROM FIRST {event_type} = 'purchase': ANY({event_type} = 'return')

This kind of lifecycle-aware segmentation is really hard to express in traditional BI tools. In SQL, you'd be writing multiple CTEs to find boundary timestamps, then joining back to the event stream, then applying your filter logic. Here it's one line.

How Segments Integrate with the Query Engine

If you read the query-time processing post, you know our SQL builder uses a layered architecture. Segments slot into Layer 4 of that stack:

Layer 1: Raw event table
Layer 2: Session window functions (session boundaries, session IDs)
Layer 3: Dimension computation + metric aggregation
Layer 3.5: Temporal boundary timestamps (if temporal segments exist)
Layer 4: Segment evaluation ← segments get evaluated here
Layer 5: Deduplication
Layer 6: Final aggregation + GROUP BY

For standard segments, the engine adds inline window functions at Layer 4. For sequential segments, it uses CTEs (Common Table Expressions) that compute sequenceMatch() results and join them back to the main query.

The key insight is that segments produce a boolean column per event row. Every event gets a _seg_your_segment_id column that's either 1 or 0. Then when a report applies that segment, it simply filters on _seg_your_segment_id = 1 before aggregation. This means segments compose cleanly - you can apply multiple segments and they AND together naturally.

// In the report configuration
metrics: [
  {
    metric_id: "revenue",
    segment_ids: ["high_value_customers", "mobile_sessions"]
  }
]
// Revenue is only counted for events where BOTH segments are true

The AI Segment Builder

Here's where things get really fun. We built an AI-powered segment builder that takes plain English descriptions and generates validated segment expressions.

When you type something like "people who came from paid search and then purchased within the same session, but only if they spent more than $50", the AI:

  1. Inspects your dimensions and metrics - Calls a tool to see what dimensions and metrics are actually available in your data group
  2. Checks for duplicates - If you already have a segment that does what you're asking for, it'll point you to the existing one rather than creating a redundant copy
  3. Pushes back if it can't help - If your data doesn't have the dimensions or metrics needed to fulfill the request, it tells you that instead of guessing and generating something broken
  4. Samples data if needed - If it's not sure about field values (like what marketing channels exist), it looks at actual data to verify
  5. Builds the expression - Generates the segment expression, picks the right scope, and decides whether to use sequential or standard syntax
  6. Validates against real data - Runs the expression against ClickHouse to make sure it actually works before returning it

That last step is critical. The AI doesn't just generate SQL and hope for the best - it has a mandatory validation step where it tests the expression against your actual data. If the validation fails (say, a field name was wrong), it sees the error message and tries again.

The whole thing streams progress updates to the UI in real time, so you can see what the AI is doing: "Inspecting schemas...", "Sampling data...", "Building segment...", "Validating..."

And because the AI knows about all the keywords - convenience keywords like CONTAINS and IN_LIST, aggregate keywords like ANY and SUM, plus the THEN/WITHIN syntax - it generates expressions in the same human-readable format you'd write by hand. No raw ClickHouse functions, no OVER (PARTITION BY ...) clauses - just clean, readable segment logic.

Segment Validation and Coverage Preview

Before you save a segment, you can preview its coverage - how much of your data it actually matches. The validation endpoint runs your expression against real data and returns three coverage metrics:

  • People coverage: What percentage of identified users match this segment?
  • Session coverage: What percentage of sessions match?
  • Event coverage: What percentage of events match?

This is genuinely helpful for catching mistakes. If your "High Value Customers" segment matches 98% of all users, something is probably wrong with your threshold. If your sequential segment matches 0 people, maybe the event types you referenced don't exist in this data group. The preview lets you iterate quickly before committing to a definition.
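The three coverage numbers fall out naturally once every event row carries a boolean segment flag. Here's a hypothetical sketch of the computation - field names (person_id, session_id, seg) are illustrative, not the actual schema:

```python
# Hypothetical sketch of people/session/event coverage, computed from
# event rows that already carry a boolean segment flag.

def coverage(events):
    people = {e["person_id"] for e in events}
    sessions = {(e["person_id"], e["session_id"]) for e in events}
    hits = [e for e in events if e["seg"]]
    return {
        "people": len({e["person_id"] for e in hits}) / len(people),
        "sessions": len({(e["person_id"], e["session_id"]) for e in hits}) / len(sessions),
        "events": len(hits) / len(events),
    }

rows = [
    {"person_id": 1, "session_id": "a", "seg": True},
    {"person_id": 1, "session_id": "b", "seg": False},
    {"person_id": 2, "session_id": "c", "seg": False},
]
stats = coverage(rows)
# person 1 matches: half the people, one of three sessions and events
```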

Applying Segments to Reports

Once segments are defined, applying them to reports is straightforward - just drag a segment from the available list into the segment zone above your table or onto a metric. The segment appears as a removable pill, and the table recalculates immediately.

Multiple segments AND together, so you can layer them: "High Value Customers" + "Mobile Sessions" gives you high-value customers but only their mobile sessions. Any segment can also be toggled to exclude mode, which inverts the logic - a "Mobile Users" segment with exclude turned on becomes "Non-Mobile Users." This is especially handy for table-level filtering when you want to remove a specific audience from a report without creating a separate segment.

And because segments are evaluated as boolean columns at query time, adding or removing them doesn't require restructuring your data or rebuilding any tables.

This is a fundamental advantage of the query-time architecture. Segments aren't materialized into separate tables or pre-computed user lists. They're evaluated fresh on every query, which means they always reflect the latest data and the current reporting window. Change your date range, and the segment coverage adjusts automatically.

Why This Is Hard to Do in Traditional BI

Let me be clear: you can do sequence analysis in many tools. SQL-based BI tools can use window functions, LAG/LEAD, and self-joins to detect patterns. Some have dedicated sequence or funnel features.

What's different here is the combination of three things:

  1. Unified syntax. Event-level filters, cross-row aggregates, sequential patterns, and temporal windows all live in one expression language. You don't need a different tool or feature for each type of segment.

  2. Scope as a first-class concept. Changing a segment from session scope to person scope is a single dropdown change, not a query rewrite. The engine handles the window function differences automatically.

  3. Composability. Segments compose with each other (AND multiple segments together), with metrics (apply segments to specific metrics), and with the rest of the query engine (dimensions, date ranges, deduplication). Everything just works together.

In my experience, most teams that try to do this kind of analysis in traditional BI end up with a sprawling collection of CTEs, temp tables, and pre-computed user lists that are fragile and hard to maintain. The power of having a purpose-built segment engine is that all of that complexity is handled once, correctly, and every segment benefits from it.

What's Next

With segments in place, our next step will be building the last type of analysis component you need to start slicing, dicing, and visualizing data: calculated metrics. Stay tuned!

In the meantime, if you want to try building segments yourself, hit the signup for early access button at trevorwithdata.com or just below this post. And if you're working on similar problems in your own analytics stack, I'd love to hear about it - connect with me on LinkedIn.


Trevor Paulsen is a data product professional at UKG and a former product leader of Adobe's Customer Journey Analytics. All views expressed are his own.
