GeoLift-SDID Library for Marketing Causal Inference¶
This document outlines the core components and technical advantages of the GeoLift-SDID Python library for measuring the causal impact of marketing and advertising campaigns.
Core Methodological Foundation¶
The GeoLift-SDID library implements the Synthetic Difference-in-Differences (SDID) methodology, a robust causal inference technique specifically optimized for geo-based marketing analysis. This approach provides a rigorous framework for isolating true incremental impact (lift) by comparing treatment geographic areas against a data-driven synthetic control group.
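For reference, the canonical SDID estimator (Arkhangelsky et al., 2021) frames this as a weighted two-way fixed-effects problem; the notation below is the standard one from that paper, not code from this library:

```latex
\hat{\tau}^{\mathrm{sdid}} = \arg\min_{\tau,\,\mu,\,\alpha,\,\beta}
\sum_{i=1}^{N} \sum_{t=1}^{T}
\left( Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\,\tau \right)^2
\hat{\omega}_i \, \hat{\lambda}_t
```

Here \(Y_{it}\) is the outcome for geo \(i\) at time \(t\), \(\alpha_i\) and \(\beta_t\) are unit and time fixed effects, \(W_{it}\) indicates treatment, and the unit weights \(\hat{\omega}_i\) and time weights \(\hat{\lambda}_t\) are chosen so that the control geos match the treated geos' pre-treatment trends.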
Technical Architecture and Implementation¶
Core Library Components¶
Modular Design: The codebase maintains a clean separation between:
Core statistical methods (synthdid directory)
Analysis scripts (recipes directory)
Configuration management (configs directory)
Class Hierarchy:
Base SyntheticDifferenceInDifferences class combines core functionality through composition
Specialized implementations for single-cell and multi-cell analysis
Robust matrix handling with fallback mechanisms for dimension constraints
Robust Statistical Implementation:
Bootstrap standard error calculation when direct matrix computation fails
Matrix dimension mismatch detection and automatic fallback mechanisms
Proper significance testing via statistical best practices
AI-Powered Interpretation:
Automated business interpretation using DeepSeek API
Standardized output formats for consistent analysis
First-principles breakdown of statistical results into business implications
Analysis Capabilities¶
Single-Cell Analysis¶
The single-cell implementation provides focused analysis on individual treatment markets:
Technical Advantages:
Direct data extraction when model API fails due to structural constraints
Bootstrap resampling for standard error estimation
Specialized visualization for treatment vs. synthetic comparison
Mathematical Robustness:
Handles pre/post period dimension mismatches (e.g., 60 pre-treatment vs. 30 post-treatment periods)
Properly calculates relative effects against appropriate baselines
Quantifies uncertainty through bootstrap confidence intervals
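The bootstrap confidence interval described above could be sketched as follows; `bootstrap_lift_ci` is an illustrative function, not the library's API:

```python
import numpy as np


def bootstrap_lift_ci(treated_post, synthetic_post, n_boot=1000, alpha=0.05, seed=42):
    """Percentile-bootstrap CI for relative lift vs. the synthetic baseline (sketch)."""
    treated_post = np.asarray(treated_post, dtype=float)
    synthetic_post = np.asarray(synthetic_post, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(treated_post)
    lifts = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample post-period time points
        baseline = synthetic_post[idx].sum()
        lifts.append((treated_post[idx].sum() - baseline) / baseline)
    # Point estimate: incremental outcome relative to the synthetic baseline.
    point = (treated_post.sum() - synthetic_post.sum()) / synthetic_post.sum()
    lo, hi = np.percentile(lifts, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return point, (lo, hi)
```

A 10% uplift over a flat baseline, for example, yields a point estimate of 0.10 with a tight interval around it.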
Multi-Cell Analysis¶
For analysis across multiple treatment markets simultaneously:
Optimized Implementation:
Efficient parallel processing of multiple treatment units
Consistent statistical framework across all markets
Produces standardized outputs for cross-market comparison
Strategic Advantages:
Enables portfolio-level analysis of marketing impact
Facilitates cross-market comparisons of effectiveness
Identifies relative performance drivers
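The parallel, standardized execution described above might look like this; the function names and result schema are assumptions for illustration (a process pool may suit CPU-bound SDID fits better, but a thread pool keeps the sketch simple):

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_market(market: str) -> dict:
    """Placeholder per-market analysis; the real library would fit SDID here."""
    # ... run single-cell SDID for `market` and compute its lift ...
    return {"market": market, "lift": 0.0}


def analyze_portfolio(markets: list[str], max_workers: int = 4) -> list[dict]:
    """Run each treatment market independently and collect standardized results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_market, markets))
```

Because every market returns the same record shape, cross-market comparison reduces to sorting or aggregating the result list.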
Analytical Utilities¶
Donor Evaluation¶
The donor evaluator identifies optimal control markets for synthetic comparison:
Evaluation Metrics:
Pre-treatment fit quality assessment
Similarity scoring through multiple correlation methods
Ranked recommendations based on composite metrics
Technical Implementation:
Produces detailed diagnostic outputs
Configurable through YAML profiles
Outputs standardized recommendations for downstream analysis
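A sketch of the kind of composite similarity scoring described above; the 50/50 weighting and the `rank_donors` name are illustrative assumptions, not the evaluator's actual metric:

```python
import pandas as pd


def rank_donors(treated: pd.Series, donors: pd.DataFrame) -> pd.Series:
    """Rank candidate control markets by a composite similarity score (sketch)."""
    scores = {}
    for name, series in donors.items():
        corr = treated.corr(series)  # pre-treatment co-movement
        # Scale similarity: penalize donors whose average level differs greatly.
        scale = 1.0 - abs(treated.mean() - series.mean()) / max(treated.mean(), series.mean())
        scores[name] = 0.5 * corr + 0.5 * scale  # illustrative 50/50 weighting
    return pd.Series(scores).sort_values(ascending=False)
```

The ranked series can then feed directly into donor selection for the synthetic control fit.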
Power Analysis¶
Calculates the minimum detectable effect (MDE) and statistical power for a proposed test:
Core Functionality:
Simulation-based approach for power estimation
T-test approximation for efficient computation
Configurable effect sizes and test durations
Practical Applications:
Determines required test duration for desired sensitivity
Verifies experiment design can detect business-relevant effects
Informs resource allocation for marketing tests
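The simulation-based approach with a t-test approximation could be sketched as below; the function is illustrative (a normal approximation to the two-sample t-test stands in for speed), not the library's implementation:

```python
import math

import numpy as np


def simulated_power(baseline, noise_sd, effect, n_periods,
                    n_sims=2000, z_crit=1.96, seed=0):
    """Estimate statistical power by simulation (hypothetical function).

    Injects a known relative lift into noisy synthetic data and counts how
    often a two-sample test (normal approximation) rejects at the ~5% level.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(baseline, noise_sd, n_periods)
        treated = rng.normal(baseline * (1 + effect), noise_sd, n_periods)
        se = math.sqrt(control.var(ddof=1) / n_periods
                       + treated.var(ddof=1) / n_periods)
        z = (treated.mean() - control.mean()) / se
        hits += abs(z) > z_crit
    return hits / n_sims
```

Sweeping `effect` over candidate lifts (or `n_periods` over test lengths) yields a power curve from which the minimum detectable effect can be read off.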
Recent Technical Enhancements¶
Bootstrap Standard Error:
Implemented robust standard error calculation through bootstrap resampling
Handles cases where matrix dimension constraints prevent direct calculation
Provides valid p-values despite structural limitations
Dimension Mismatch Handling:
Detects and resolves matrix dimension incompatibilities
Implements fallback statistical calculations
Maintains calculation integrity despite dimensional constraints
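One way such a detect-and-fall-back mechanism might be structured; `safe_standard_error` and its arguments are hypothetical, not the library's API:

```python
import numpy as np


def safe_standard_error(y_post: np.ndarray, weights: np.ndarray,
                        n_boot: int = 500, seed: int = 0) -> float:
    """Standard error with a bootstrap fallback on dimension mismatch (sketch)."""
    try:
        # Direct route: project post-period outcomes onto the donor weights.
        projected = y_post @ weights  # raises ValueError if dimensions clash
        return float(projected.std(ddof=1) / np.sqrt(len(projected)))
    except ValueError:
        # Fallback: bootstrap the mean outcome over resampled post-period rows.
        rng = np.random.default_rng(seed)
        means = [y_post[rng.integers(0, len(y_post), len(y_post))].mean()
                 for _ in range(n_boot)]
        return float(np.std(means, ddof=1))
```

The caller always receives a finite standard error, so downstream p-value computation never breaks on shape incompatibilities.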
AI Interpretation Pipeline:
Standardizes statistical outputs for AI consumption
Directly generates business-focused analysis from technical results
Produces actionable recommendations based on statistical findings
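The standardization step might look like the sketch below, which packages results into a machine-readable prompt; the field names are illustrative assumptions, and the actual payload the library sends to the DeepSeek API may differ:

```python
import json


def build_interpretation_prompt(results: dict) -> str:
    """Standardize statistical output into a prompt for an LLM interpreter (sketch)."""
    payload = {
        "lift_pct": round(results["lift"] * 100, 2),       # illustrative schema
        "ci_pct": [round(b * 100, 2) for b in results["ci"]],
        "p_value": results["p_value"],
        "significant": results["p_value"] < 0.05,
    }
    return (
        "Interpret the following geo-experiment results for a marketing "
        "stakeholder, focusing on business implications:\n"
        + json.dumps(payload, indent=2)
    )
```

Keeping the payload schema fixed is what makes the downstream AI interpretations comparable across analyses.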
Data Validation and Preprocessing:
Robust type handling for diverse input formats
Automatic date format conversion
Treatment unit validation and type alignment
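A sketch of this kind of validation step; the column names and the `validate_panel` function are assumptions for illustration:

```python
import pandas as pd


def validate_panel(df: pd.DataFrame, treatment_units: list[str],
                   date_col: str = "date", unit_col: str = "geo") -> pd.DataFrame:
    """Validate and normalize an input panel (sketch; column names are assumed)."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])          # automatic date conversion
    out[unit_col] = out[unit_col].astype(str).str.strip()  # align unit types
    missing = set(map(str, treatment_units)) - set(out[unit_col])
    if missing:
        raise ValueError(f"Treatment units not found in data: {sorted(missing)}")
    return out
```

Failing fast on missing treatment units surfaces data problems before any model fitting begins.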
Usage Guidelines¶
The library provides a structured workflow for marketing effectiveness measurement:
1. Start with donor evaluation to identify optimal control markets
2. Run power analysis to verify experimental sensitivity
3. Conduct GeoLift analysis (single- or multi-cell) to measure causal impact
4. Generate AI interpretation for business-focused insights
See the pipeline_workflow.md document for detailed command-line examples and parameter references.
The current implementation supports both configuration-file and direct command-line parameter approaches, providing flexibility while maintaining analytical rigor.