From measurement plan to causal proof
Align on what to measure, diagnose where credit and causality break down, then optimize spend.
START HERE
You have tracking but no shared plan. Map business outcomes to GA4 events and align your team on what to measure.
📊PROBLEM
Your marketing team is fighting over credit. See how last-touch vs multi-touch changes everything.
🔗PROBLEM
See how a first-order Markov chain with removal effects calculates true credit by simulating what happens when each channel is removed.
🎯PROBLEM
MTA shows 1,600 conversions. Past tests suggest 50% incrementality. Calculate true impact.
📉PROBLEM
When does more spend stop working? How long do effects last? Explore diminishing returns.
📈PROBLEM
Is your growth real or just correlation? Interactive demonstration of confounded vs randomized data.
📝PROBLEM
Post-purchase surveys cut through walled-garden opacity and memory gaps.
📐PROBLEM
Backward- and forward-looking views of incremental lift.
💰OPTIMIZE
Set your efficiency goal, add your channels, and instantly see which to scale, hold, reduce, or cut.
A simple change in the attribution model can dramatically shift budget decisions. Explore how different models assign value across a customer journey.
Platforms like Facebook, Google, and TikTok can claim the same sale. Watch how their claims can add up to more than actual revenue. This is why calibration exists.
Prefer a step-by-step exercise? Open our simple walkthrough ↗
Use past holdout data to calibrate MTA and reveal true campaign impact.
| Last Quarter's Holdout Test (Ground Truth) | |
|---|---|
| Actual Conversions (from Test) | 600 |
| MTA-Attributed Conversions (during Test) | 800 |
| This Quarter's Attribution (MTA) | |
| Campaign A | 1,800 |
| Campaign B | 1,200 |
| Your Calibration Task | |
| Calibrated Campaign A | |
| Calibrated Campaign B | |
| Calibrated Total | |
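If you want to check your answers outside the widget, here is a minimal Python sketch of the same arithmetic (variable names are illustrative):

```python
# Calibration factor: how much MTA over-credits relative to the holdout ground truth
actual_conversions = 600      # from last quarter's holdout test
mta_conversions = 800         # MTA-attributed during the same test window
calibration_factor = actual_conversions / mta_conversions  # 0.75

# Apply the factor to this quarter's MTA-attributed numbers
mta_attribution = {"Campaign A": 1800, "Campaign B": 1200}
calibrated = {c: v * calibration_factor for c, v in mta_attribution.items()}

print(calibrated)                # {'Campaign A': 1350.0, 'Campaign B': 900.0}
print(sum(calibrated.values()))  # 2250.0 calibrated total
```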
When sales and ad spend both rise, it’s easy to assume causation. But is the growth real, or just correlation? This simulation reveals the expensive truth of confusing the two.
This view suggests a 17x return, justifying aggressive spending. However, it fails to account for customers who would have converted anyway.
By isolating a control group, the A/B test reveals the true return is 2x. This is a profitable, but vastly different, business case that prevents millions in misallocated budget.
Find where more spend stops working and how long your marketing efforts last.
As spend increases, each additional dollar brings back less revenue. The aim is to stop before your Marginal ROAS falls too low.
A single pulse of spend decays over time. The "Half-Life" tells you how many weeks it takes for impact to fall by 50%.
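Both ideas are easy to play with in code. A minimal sketch, assuming a logarithmic response curve and geometric carryover (common simplifications, not the only choices):

```python
import numpy as np

# Diminishing returns: a saturating (here, logarithmic) response curve.
# Marginal ROAS is the return on the *next* dollar, i.e. the local slope of the curve.
def revenue(spend, scale=50_000.0, k=0.0001):
    return scale * np.log1p(k * spend)

spend = 50_000.0
marginal_roas = revenue(spend + 1) - revenue(spend)   # revenue from one extra dollar
print(f"marginal ROAS at ${spend:,.0f}: {marginal_roas:.2f}")

# Half-life: with geometric carryover, a half-life of H weeks implies
# a weekly decay rate of 0.5 ** (1 / H).
half_life_weeks = 3
carryover = 0.5 ** (1 / half_life_weeks)
pulse = [100 * carryover**t for t in range(8)]   # remaining impact of a single spend pulse
print([round(x, 1) for x in pulse])              # falls to ~50 by week 3, ~25 by week 6
```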
The more you want to learn, the more data you need.
MMM works best when you have time on your side, movement in budgets, and a focused set of questions. The tiles below outline the key signals that indicate readiness for a successful MMM project.
SIGNAL
≥ 2 years of weekly data (or 4-5 years monthly) allows the model to see seasonality and trends.
SIGNAL
Flat budgets hide impact. Sensible increases and decreases are required for the model to learn.
SIGNAL
Every channel, control, and seasonal factor costs data. Start with a focused scope.
SIGNAL
Use a consistent metric like revenue or conversions. Noisy or sparse data may need aggregation.
SIGNAL
While there's no magic number, MMM becomes more cost-effective as media spend grows ($1M+/yr).
SIGNAL
Markets change. Plan to retrain your model on a cadence that matches your planning cycles.
A short plan beats a hundred random events. Watch the quick explainer, then sketch your own plan with the interactive builder.
You have tracking. But no shared measurement plan.
No one agrees on goals, events, or what "good" looks like. You want a short, written plan that maps real business outcomes to GA4 and the rest of your stack.
See the difference between scattered tracking and intentional measurement. One creates confusion, the other creates clarity.
Answer five quick questions. We'll generate a mini measurement plan and a prioritized implementation table you can export.
| KPI | SUGGESTED GA4 EVENT NAME | OTHER TOOLS |
|---|---|---|
| KPI | EVENT NAME | PRIORITY | NOTES |
|---|---|---|---|
Set your efficiency goal, add your channels, and instantly see which to scale, hold, reduce, or cut.
CPA mode: enter spend and conversions. Lower CPA = better performance.
| Channel | Category | Spend | Conv. | CPA |
|---|---|---|---|---|
No channels yet. Click "+ Add" to get started.
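Under the hood, the scale/hold/reduce/cut call is just a comparison of each channel's CPA to your target. A minimal sketch of that logic (the thresholds here are illustrative, not the tool's exact rules):

```python
def recommend(spend, conversions, target_cpa):
    """Classify a channel by comparing its CPA to the target CPA.
    Thresholds are illustrative; tune them to your own tolerance."""
    if conversions == 0:
        return "cut"
    cpa = spend / conversions
    if cpa <= 0.8 * target_cpa:
        return "scale"      # well under target: room to grow
    if cpa <= target_cpa:
        return "hold"       # at or under target
    if cpa <= 1.25 * target_cpa:
        return "reduce"     # modestly over target
    return "cut"            # far over target

print(recommend(spend=10_000, conversions=250, target_cpa=50))  # CPA 40 -> "scale"
```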
A first-order Markov chain with removal effects—the math behind advanced attribution.
A Markov chain is a system that moves between states, where the probability of the next state depends only on the current state—not on how you got there. This is the Markov property (memorylessness).
In attribution, the states are marketing channels (Display, Search, Email, etc.) plus two absorbing states: Conversion and No Conversion. Once you enter an absorbing state, you stay there forever—the journey is over.
Removal effects are the key insight. For each channel, we ask: “What happens to the overall conversion rate if we remove this channel entirely?”
If removing Search drops conversions from 70% to 30%, Search has a large removal effect. If removing Display only drops conversions from 70% to 65%, Display is less critical. We normalize these effects so they sum to 100%, giving each channel its attribution share.
Unlike last-click or first-click models, Markov chain attribution captures interdependencies between channels. A channel that rarely gets the last click but frequently assists conversions will still receive appropriate credit.
Enter customer journeys using the format: Channel → Channel → Conversion ($value) or No Conversion
Channels: D (Display), F (Facebook), S (Search), E (Email)
This matrix shows the probability of moving from one state to another based on your journey data.
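If you want to reproduce the mechanics outside the tool, here is a compact Python sketch on made-up journeys: build the transition matrix by counting moves, compute the baseline conversion probability, then recompute it with each channel removed (any journey that would pass through the removed channel is sent to No Conversion) and normalize the drops into attribution shares.

```python
from collections import defaultdict

# Toy journeys (illustrative, not your data). Each ends in "Conversion" or "Null".
journeys = [
    ["Start", "Search", "Email", "Conversion"],
    ["Start", "Display", "Search", "Conversion"],
    ["Start", "Display", "Null"],
    ["Start", "Email", "Null"],
    ["Start", "Search", "Conversion"],
]

def transition_probs(paths, removed=None):
    """Count transitions; a journey that hits the removed channel is routed to Null."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for a, b in zip(path, path[1:]):
            if b == removed:
                counts[a]["Null"] += 1
                break
            counts[a][b] += 1
    return {s: {t: c / sum(nxt.values()) for t, c in nxt.items()} for s, nxt in counts.items()}

def conversion_prob(probs, state="Start", depth=0):
    """Probability of eventually reaching Conversion (toy data is acyclic)."""
    if state == "Conversion":
        return 1.0
    if state not in probs or depth > 20:          # Null and unseen states never convert
        return 0.0
    return sum(p * conversion_prob(probs, nxt, depth + 1) for nxt, p in probs[state].items())

base_cr = conversion_prob(transition_probs(journeys))
removal_effects = {
    ch: (base_cr - conversion_prob(transition_probs(journeys, removed=ch))) / base_cr
    for ch in ["Display", "Search", "Email"]
}

total = sum(removal_effects.values())
attribution = {ch: round(eff / total, 3) for ch, eff in removal_effects.items()}
print(attribution)   # channel shares, normalized to sum to 1
```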
Enter your Test vs Control results per segment to measure incremental lift: the causal effect of your campaign, i.e., the test response minus the counterfactual (what would have happened without treatment).
| Segment | Test: Conv | Test: Exp | Test: Rev | Control: Conv | Control: Exp | Control: Rev |
|---|---|---|---|---|---|---|
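As a reference for what the table computes, a minimal sketch of per-segment lift, assuming Conv = conversions, Exp = users exposed (group size), and Rev = revenue:

```python
def segment_lift(test_conv, test_exp, control_conv, control_exp):
    """Incremental lift = test conversion rate minus control conversion rate."""
    test_rate = test_conv / test_exp
    control_rate = control_conv / control_exp
    lift = test_rate - control_rate                      # absolute lift in conversion rate
    incremental_conversions = lift * test_exp            # conversions caused by the campaign
    relative_lift = lift / control_rate if control_rate else float("inf")
    return lift, incremental_conversions, relative_lift

# Example segment: 500 of 10,000 exposed users convert vs 300 of 10,000 held out.
print(segment_lift(500, 10_000, 300, 10_000))  # (0.02, 200.0, ~0.667)
```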
Use post-purchase surveys to triangulate walled garden attribution and build statistically rigorous incrementality estimates.
Post-purchase surveys provide a ground-truth layer for marketing measurement by asking customers directly about their purchase journey, enabling cross-platform comparison and walled garden calibration.
Traditional platform attribution (Meta, TikTok, Google) has inherent limitations. Post-purchase surveys complement it by helping you:
Understand why customers purchased and which channels they actually recall
Much cheaper than MTA platforms or geo-lift studies
Compare performance across all paid and organic channels
Create multipliers to normalize platform-reported metrics
⚠️ Important: Surveys have response bias, attrition issues, and memory limitations. When properly designed and analyzed with rigorous survey methodology, they provide invaluable triangulation data.
Ask customers: "How did you hear about us?"
What matters is why someone purchased and what they remember. A customer's first touchpoint (platform-attributed) could be something they don't recall at all.
Adjust the inputs below to see how implied revenue and ROAS are calculated:
| Channel | Spend | Survey Revenue | Implied Revenue | Implied ROAS |
|---|---|---|---|---|
The multiplier translates platform-reported ROAS to comparable survey-implied ROAS:
Enter platform-reported revenue to calculate normalization multipliers:
Platform revenue fields are automatically added for each channel with spend. Add channels in the Methodology tab.
| Channel | Survey ROAS | Platform ROAS | % Multiplier | Calibrated ROAS |
|---|---|---|---|---|
Use calibrated ROAS for budget decisions. This accounts for platform attribution inflation and provides a more accurate cross-channel comparison.
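One plausible way to wire this up in code, assuming the multiplier is the ratio of survey-implied ROAS to platform-reported ROAS and is then used to deflate platform-reported figures (check against the tool's definitions before reusing; all numbers and field names are illustrative):

```python
# Hypothetical channel-level figures.
channels = {
    "Meta":   {"spend": 50_000, "survey_revenue": 120_000, "platform_revenue": 200_000},
    "TikTok": {"spend": 30_000, "survey_revenue": 45_000,  "platform_revenue": 90_000},
}

for name, c in channels.items():
    survey_roas = c["survey_revenue"] / c["spend"]
    platform_roas = c["platform_revenue"] / c["spend"]
    multiplier = survey_roas / platform_roas       # <1 means the platform over-claims
    c.update(survey_roas=survey_roas, platform_roas=platform_roas, multiplier=multiplier)
    print(f"{name}: survey ROAS {survey_roas:.2f}, platform ROAS {platform_roas:.2f}, "
          f"multiplier {multiplier:.0%}")

# The channel multiplier can then deflate platform-reported numbers for any campaign
# in that channel, giving a calibrated ROAS that is comparable across channels.
campaign_platform_roas = 4.0
calibrated = campaign_platform_roas * channels["Meta"]["multiplier"]
print(f"calibrated campaign ROAS: {calibrated:.2f}")
```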
Track completion rates, drop-off points, and timing patterns
Analyze which channels customers remember vs. actual touchpoints
Understand multi-touch journeys from survey responses
Break down responses by age, gender, location, etc.
Create binary variables for extreme responses and sum across items to identify respondents with systematic extreme answering patterns.
Compute proportion of middle-option selections per respondent and flag those >2 SD above mean.
Identify respondents who systematically agree regardless of content.
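A rough pandas sketch of the first two checks above, assuming Likert items scored 1 to 5 in columns named q1…qk (column names and thresholds are illustrative):

```python
import pandas as pd

def flag_response_styles(df, item_cols, scale_max=5):
    """Flag extreme and midpoint response styles on 1..scale_max Likert items (odd scales)."""
    items = df[item_cols]
    # Extreme response style: count of endpoint answers (1 or scale_max) per respondent.
    extreme_count = ((items == 1) | (items == scale_max)).sum(axis=1)
    # Midpoint response style: share of middle-option answers, flagged if >2 SD above mean.
    midpoint = (scale_max + 1) / 2
    mid_share = (items == midpoint).mean(axis=1)
    mid_flag = mid_share > mid_share.mean() + 2 * mid_share.std()
    return pd.DataFrame({"extreme_count": extreme_count,
                         "mid_share": mid_share,
                         "midpoint_flag": mid_flag})

# usage: flags = flag_response_styles(responses, item_cols=["q1", "q2", "q3", "q4"])
```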
For conjoint/vignette designs in PPS:
Total attrition rate, stage-specific attrition, correlation with demographics
Page load times, drop-off probability by page, question-type friction
Weighting methods, model-based approaches, imputation
Raw vs. corrected results, robustness checks, benchmark comparisons
Define cells (age × gender × education) and adjust weights to match population totals.
Estimate probability of responding given covariates, then weight by inverse probability.
Iteratively match marginal distributions when joint distributions are sparse.
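A minimal sketch of the inverse-probability idea from the middle item above, using a logistic model for response propensity (column names are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def nonresponse_weights(frame: pd.DataFrame, covariates: list, responded_col: str):
    """Weight respondents by the inverse of their estimated probability of responding."""
    X = frame[covariates]
    y = frame[responded_col]
    propensity = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    weights = np.where(y == 1, 1.0 / propensity, 0.0)   # non-respondents get zero weight
    weights = weights * (len(frame) / weights.sum())    # normalize to the full sample size
    return pd.Series(weights, index=frame.index, name="ip_weight")

# usage: df["ip_weight"] = nonresponse_weights(df, ["age", "is_female", "tenure"], "responded")
```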
Random or deterministic donor selection from observed data
Predict missing values using observed covariates
Add random residuals to preserve variance
⚠️ Critical: All imputations must be conditional, multivariate, and stochastic to preserve data structure.
Remove bot responses, filter failed attention checks, handle partial completes
Typical range: 15-30%
Below 15%: Investigate design issues
Above 30%: Excellent engagement
Computed from your channel data in the Methodology and Calibration tabs. Adjust inputs there to update recommendations.
| Channel | Budget | Survey ROAS | Platform ROAS | Recommendation |
|---|---|---|---|---|
SurveyMonkey, Typeform, Google Forms, Qualtrics
Excel, R, Python (pandas, statsmodels), SPSS
Zapier, Shopify apps, custom API integrations
A technical dive into calculating incremental impact, defining customer segments, and visualizing model performance with Propensity and Qini charts.
Uplift modeling requires a fundamental shift in how we prepare data. We cannot simply train on "conversions." We must train on the difference in conversion caused by treatment.
You cannot build an uplift model without a control group. Run a randomized experiment (A/B test) where one group receives the treatment (email, coupon) and one is held out (control). This removes bias.
Since we can't observe the counterfactual (we don't know what a treated user would have done if untreated), we use proxy methods:
The model outputs an Uplift Score for every user. This score represents the estimated difference in probability: P(convert | treated) − P(convert | not treated).
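One common proxy method (often called the two-model or T-learner approach; the tool here may use a different one) is to train separate conversion models on the treated and control groups and subtract their predicted probabilities. A minimal scikit-learn sketch:

```python
from sklearn.ensemble import GradientBoostingClassifier

def two_model_uplift(X_train, treated, converted, X_score):
    """T-learner: fit separate conversion models for treated and control users,
    then score uplift as the difference in predicted conversion probability."""
    model_t = GradientBoostingClassifier().fit(X_train[treated == 1], converted[treated == 1])
    model_c = GradientBoostingClassifier().fit(X_train[treated == 0], converted[treated == 0])
    p_if_treated = model_t.predict_proba(X_score)[:, 1]
    p_if_control = model_c.predict_proba(X_score)[:, 1]
    return p_if_treated - p_if_control   # positive = persuadable, negative = sleeping dog

# usage (arrays come from a randomized experiment):
# uplift_scores = two_model_uplift(X, treated_flags, converted_flags, X_new)
```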
Based on the Uplift Score, we categorize users into four distinct quadrants. This dictates our strategy.
Sleeping Dogs: Negative Uplift. They are likely to buy if left alone, but the treatment annoys them and prevents the sale. Action: Do Not Disturb.
Sure Things: Zero Uplift. High probability to buy regardless of treatment. Marketing spend here is wasted budget. Action: Ignore.
Lost Causes: Zero Uplift. They won't buy, regardless of whether you treat them or not. Action: Ignore.
Persuadables: Positive Uplift. They only buy if treated. This is the only group that generates incremental ROI. Action: Target Aggressively.
We plot customers based on their two probabilities: Probability to Buy if Treated (X-axis) vs. Probability to Buy if Control (Y-axis).
How to read this:
Standard accuracy metrics (ROC-AUC) don't work for uplift because we never observe the true ground truth for an individual. Instead, we use the Qini Curve.
The Qini Coefficient (AUUC) condenses the curve into a single number: the area between your model's Qini curve and the random-targeting diagonal. The larger that area, the better the model separates persuadable customers from the rest.
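A rough NumPy sketch of how a Qini curve and coefficient can be computed from experiment data (array names are illustrative; real implementations differ in how they handle ties and very early cutoffs):

```python
import numpy as np

def qini_curve(uplift_score, treated, converted):
    """Qini values as an increasing fraction of users (ranked by uplift) is targeted."""
    order = np.argsort(-uplift_score)
    t = treated[order].astype(float)
    y = converted[order].astype(float)
    cum_treat = np.cumsum(t)                       # treated users targeted so far
    cum_ctrl = np.cumsum(1 - t)                    # control users "targeted" so far
    conv_treat = np.cumsum(y * t)                  # conversions among them
    conv_ctrl = np.cumsum(y * (1 - t))
    # Scale control conversions to the treated group size at each cutoff.
    scale = np.divide(cum_treat, cum_ctrl, out=np.zeros_like(cum_treat), where=cum_ctrl > 0)
    return conv_treat - conv_ctrl * scale

def qini_coefficient(uplift_score, treated, converted):
    """Area between the model's Qini curve and the random-targeting diagonal."""
    q = qini_curve(uplift_score, treated, converted)
    x = np.arange(1, len(q) + 1) / len(q)
    random_line = q[-1] * x                        # straight line to the same endpoint
    return np.trapz(q - random_line, x)
```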
Evaluating Causal Inference models. How to check for common support and covariate balance using Box Plots and Love Plots.
In observational data, Treated users usually look different from Control users. We calculate a Propensity Score P(T=1|X) to estimate the probability of receiving treatment based on features.
A logistic regression predicts the likelihood of being treated. A Score = 0.85 means an 85% chance this user would have been treated.
We find Control users with the same score as Treated users. If a Control user has a score of 0.85, they are a perfect "counterfactual" twin.
Before calculating a causal effect, we must verify Common Support. Do we have Control units that look like our Treated units across the score range?
Named after Thomas Love, this plot displays the Standardized Mean Difference (SMD) for every covariate. We want the dots to be inside the central threshold lines.
Goal: Shrink the difference between groups towards zero.
The line connects the Unadjusted state to the Adjusted state.
Ideally, all teal dots should fall within the dashed lines (SMD < 0.1).
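To make the SMD concrete, here is a small sketch of the numbers behind a Love plot: standardized mean differences per covariate, unadjusted vs. weighted by inverse propensity (column names are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def smd(x, t, w=None):
    """Standardized mean difference for one covariate, optionally weighted."""
    w = np.ones(len(x)) if w is None else w
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    pooled_sd = np.sqrt((x[t == 1].var() + x[t == 0].var()) / 2)
    return (m1 - m0) / pooled_sd

def love_plot_table(df, covariates, treat_col):
    t = df[treat_col].values
    ps = LogisticRegression(max_iter=1000).fit(df[covariates], t).predict_proba(df[covariates])[:, 1]
    ipw = np.where(t == 1, 1 / ps, 1 / (1 - ps))   # inverse-propensity weights
    return pd.DataFrame({
        "unadjusted": [smd(df[c].values, t) for c in covariates],
        "adjusted":   [smd(df[c].values, t, ipw) for c in covariates],
    }, index=covariates)

# Aim for |adjusted SMD| < 0.1 on every row (the dashed lines in the Love plot).
```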
One score fixes the past (bias). The other predicts the future (impact).
You have historical data, but you didn't run a randomized experiment. Your "Treated" users are fundamentally different from your "Control" users (e.g., they were wealthier, more active, or older).
If you simply compare their averages, you are comparing apples to oranges. You have Selection Bias.
To simulate the randomized experiment you wish you had run. By calculating a Propensity Score, we re-weight the data to make the Control group look identical to the Treated group, allowing for a fair "counterfactual" comparison.
Explore Propensity Scores →
Standard models only predict who will buy. They fail to distinguish between a user who buys because of your ad, and a user who would have bought anyway ("Sure Things").
Targeting "Sure Things" wastes budget. Targeting "Sleeping Dogs" (who react negatively to ads) actively destroys value.
To predict incrementality. We isolate the "Persuadables": the specific segment of people who convert if and only if they are treated. Stop wasting money on people who were going to buy anyway.
Explore Uplift Modeling →
MMM uses regression to decompose total sales into the contribution of each marketing channel, baseline demand, and external factors, so you can see what actually drove results.
Marketing Mix Modeling (MMM) is a top-down, regression-based approach that estimates how much each marketing channel, along with external factors like seasonality and promotions, contributes to an outcome such as sales or sign-ups. It uses aggregated, historical data—no user-level tracking required—making it privacy-safe and resilient to signal loss.
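A deliberately simplified sketch of the idea: regress weekly sales on channel spend plus controls, then read each channel's contribution as coefficient × spend. Real MMMs add adstock, saturation, and priors; this only shows the decomposition logic, and all column names are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# df: one row per week with spend columns, a seasonality control, and sales.
channels = ["tv_spend", "search_spend", "social_spend"]
controls = ["holiday_flag"]

def fit_simple_mmm(df: pd.DataFrame):
    X = df[channels + controls]
    model = LinearRegression().fit(X, df["sales"])
    contributions = pd.DataFrame(
        {col: model.coef_[i] * df[col] for i, col in enumerate(channels + controls)}
    )
    contributions["baseline"] = model.intercept_
    return model, contributions

# Each column of `contributions` is that driver's modeled share of weekly sales;
# summing a channel's column over the year gives its bar in the waterfall chart below.
```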
How to Read a Waterfall Chart
The grey bar is your baseline—sales that happen without any marketing. This is often 40–70% of total revenue.
Each colored bar shows the incremental revenue that channel drove above what would have happened anyway.
Bars stack left to right. The running total grows with each channel.
The final outline bar is your total revenue—baseline plus all marketing contributions.
Key insight: If one channel’s bar is tiny despite heavy spend, that channel may have low efficiency. If a channel’s bar is large relative to its spend, it’s punching above its weight.
Model Limitations
A waterfall only shows modeled contributions. The model assumes it can separate each channel’s effect—in reality, channels interact (TV drives search, social drives brand recall). Treat these as estimates, not ground truth.
How to Read a Response Curve
The curve shows the relationship between how much you spend on a channel and how much revenue it generates.
Every curve starts steep (early dollars are highly productive) and flattens (later dollars are wasted).
The dot marks each channel’s current spend level.
The green zone means room to scale. The yellow zone means near optimal. The red zone means past the point of diminishing returns—reallocate that budget.
Average vs. Marginal ROI
The single most common mistake: confusing average ROI with marginal ROI. A channel can have a great average ROI (the line from origin looks steep) while having a terrible marginal ROI (the curve is flat at your current spend). Average ROI tells you about the past. Marginal ROI tells you what to do next.
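A tiny worked illustration with a made-up square-root response curve: the same channel shows a 2x average ROI but only about 1x marginal ROI at its current spend.

```python
import math

def revenue(spend):            # illustrative response curve with diminishing returns
    return 100 * math.sqrt(spend)

spend = 2_500
average_roi = revenue(spend) / spend                       # 2.0  (looks great)
marginal_roi = revenue(spend + 1) - revenue(spend)         # ~1.0 (the next dollar barely pays back)
print(average_roi, round(marginal_roi, 2))
```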