GeoLift-SDID Library for Marketing Causal Inference¶
This document outlines the core components and technical advantages of the GeoLift-SDID Python library for measuring the causal impact of marketing and advertising campaigns.
Core Methodological Foundation¶
The GeoLift-SDID library implements the Synthetic Difference-in-Differences (SDID) methodology, a robust causal inference technique specifically optimized for geo-based marketing analysis. This approach provides a rigorous framework for isolating true incremental impact (lift) by comparing treatment geographic areas against a data-driven synthetic control group.
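For reference, the canonical SDID estimator (Arkhangelsky et al., 2021) frames this as a weighted two-way fixed-effects problem; the notation below is the standard one from that paper, not code from this library:

```latex
\hat{\tau}^{\mathrm{sdid}} = \arg\min_{\tau,\,\mu,\,\alpha,\,\beta}
\sum_{i=1}^{N} \sum_{t=1}^{T}
\left( Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\,\tau \right)^2
\hat{\omega}_i \, \hat{\lambda}_t
```

Here \(Y_{it}\) is the outcome for geo \(i\) at time \(t\), \(\alpha_i\) and \(\beta_t\) are unit and time fixed effects, \(W_{it}\) indicates treatment, and the unit weights \(\hat{\omega}_i\) and time weights \(\hat{\lambda}_t\) are chosen so that the control geos match the treated geos' pre-treatment trends.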
Technical Architecture and Implementation¶
Core Library Components¶
Modular Design: The codebase maintains a clean separation between:
Core statistical methods (synthdid directory)
Analysis scripts (recipes directory)
Configuration management (configs directory)
Class Hierarchy:
Base SyntheticDifferenceInDifferences class combines core functionality through composition
Specialized implementations for single-cell and multi-cell analysis
Robust matrix handling with fallback mechanisms for dimension constraints
Robust Statistical Implementation:
Bootstrap standard error calculation when direct matrix computation fails
Matrix dimension mismatch detection and automatic fallback mechanisms
Proper significance testing via statistical best practices
AI-Powered Interpretation:
Automated business interpretation using DeepSeek API
Standardized output formats for consistent analysis
First-principles breakdown of statistical results into business implications
Analysis Capabilities¶
Single-Cell Analysis¶
The single-cell implementation provides focused analysis on individual treatment markets:
Technical Advantages:
Direct data extraction when model API fails due to structural constraints
Bootstrap resampling for standard error estimation
Specialized visualization for treatment vs. synthetic comparison
Mathematical Robustness:
Handles pre/post period dimension mismatches (e.g., 60 pre-treatment vs. 30 post-treatment periods)
Properly calculates relative effects against appropriate baselines
Quantifies uncertainty through bootstrap confidence intervals
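The bootstrap confidence interval described above could be sketched as follows; `bootstrap_lift_ci` is an illustrative function, not the library's API:

```python
import numpy as np


def bootstrap_lift_ci(treated_post, synthetic_post, n_boot=1000, alpha=0.05, seed=42):
    """Percentile-bootstrap CI for relative lift vs. the synthetic baseline (sketch)."""
    treated_post = np.asarray(treated_post, dtype=float)
    synthetic_post = np.asarray(synthetic_post, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(treated_post)
    lifts = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample post-period time points
        baseline = synthetic_post[idx].sum()
        lifts.append((treated_post[idx].sum() - baseline) / baseline)
    # Point estimate: incremental outcome relative to the synthetic baseline.
    point = (treated_post.sum() - synthetic_post.sum()) / synthetic_post.sum()
    lo, hi = np.percentile(lifts, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return point, (lo, hi)
```

A 10% uplift over a flat baseline, for example, yields a point estimate of 0.10 with a tight interval around it.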
Multi-Cell Analysis¶
For analysis across multiple treatment markets simultaneously:
Optimized Implementation:
Efficient parallel processing of multiple treatment units
Consistent statistical framework across all markets
Produces standardized outputs for cross-market comparison
Strategic Advantages:
Enables portfolio-level analysis of marketing impact
Facilitates cross-market comparisons of effectiveness
Identifies relative performance drivers
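The parallel, standardized execution described above might look like this; the function names and result schema are assumptions for illustration (a process pool may suit CPU-bound SDID fits better, but a thread pool keeps the sketch simple):

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_market(market: str) -> dict:
    """Placeholder per-market analysis; the real library would fit SDID here."""
    # ... run single-cell SDID for `market` and compute its lift ...
    return {"market": market, "lift": 0.0}


def analyze_portfolio(markets: list[str], max_workers: int = 4) -> list[dict]:
    """Run each treatment market independently and collect standardized results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_market, markets))
```

Because every market returns the same record shape, cross-market comparison reduces to sorting or aggregating the result list.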
Analytical Utilities¶
Donor Evaluation¶
The donor evaluator identifies optimal control markets for synthetic comparison:
Evaluation Metrics:
Pre-treatment fit quality assessment
Similarity scoring through multiple correlation methods
Ranked recommendations based on composite metrics
Technical Implementation:
Produces detailed diagnostic outputs
Configurable through YAML profiles
Outputs standardized recommendations for downstream analysis
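A sketch of the kind of composite similarity scoring described above; the 50/50 weighting and the `rank_donors` name are illustrative assumptions, not the evaluator's actual metric:

```python
import pandas as pd


def rank_donors(treated: pd.Series, donors: pd.DataFrame) -> pd.Series:
    """Rank candidate control markets by a composite similarity score (sketch)."""
    scores = {}
    for name, series in donors.items():
        corr = treated.corr(series)  # pre-treatment co-movement
        # Scale similarity: penalize donors whose average level differs greatly.
        scale = 1.0 - abs(treated.mean() - series.mean()) / max(treated.mean(), series.mean())
        scores[name] = 0.5 * corr + 0.5 * scale  # illustrative 50/50 weighting
    return pd.Series(scores).sort_values(ascending=False)
```

The ranked series can then feed directly into donor selection for the synthetic control fit.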
Power Analysis¶
Calculates the minimum detectable effect (MDE) and statistical power for a proposed test:
Core Functionality:
Simulation-based approach for power estimation
T-test approximation for efficient computation
Configurable effect sizes and test durations
Practical Applications:
Determines required test duration for desired sensitivity
Verifies experiment design can detect business-relevant effects
Informs resource allocation for marketing tests
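The simulation-based approach with a t-test approximation could be sketched as below; the function is illustrative (a normal approximation to the two-sample t-test stands in for speed), not the library's implementation:

```python
import math

import numpy as np


def simulated_power(baseline, noise_sd, effect, n_periods,
                    n_sims=2000, z_crit=1.96, seed=0):
    """Estimate statistical power by simulation (hypothetical function).

    Injects a known relative lift into noisy synthetic data and counts how
    often a two-sample test (normal approximation) rejects at the ~5% level.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(baseline, noise_sd, n_periods)
        treated = rng.normal(baseline * (1 + effect), noise_sd, n_periods)
        se = math.sqrt(control.var(ddof=1) / n_periods
                       + treated.var(ddof=1) / n_periods)
        z = (treated.mean() - control.mean()) / se
        hits += abs(z) > z_crit
    return hits / n_sims
```

Sweeping `effect` over candidate lifts (or `n_periods` over test lengths) yields a power curve from which the minimum detectable effect can be read off.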
Recent Technical Enhancements¶
Bootstrap Standard Error:
Implemented robust standard error calculation through bootstrap resampling
Handles cases where matrix dimension constraints prevent direct calculation
Provides valid p-values despite structural limitations
Dimension Mismatch Handling:
Detects and resolves matrix dimension incompatibilities
Implements fallback statistical calculations
Maintains calculation integrity despite dimensional constraints
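One way such a detect-and-fall-back mechanism might be structured; `safe_standard_error` and its arguments are hypothetical, not the library's API:

```python
import numpy as np


def safe_standard_error(y_post: np.ndarray, weights: np.ndarray,
                        n_boot: int = 500, seed: int = 0) -> float:
    """Standard error with a bootstrap fallback on dimension mismatch (sketch)."""
    try:
        # Direct route: project post-period outcomes onto the donor weights.
        projected = y_post @ weights  # raises ValueError if dimensions clash
        return float(projected.std(ddof=1) / np.sqrt(len(projected)))
    except ValueError:
        # Fallback: bootstrap the mean outcome over resampled post-period rows.
        rng = np.random.default_rng(seed)
        means = [y_post[rng.integers(0, len(y_post), len(y_post))].mean()
                 for _ in range(n_boot)]
        return float(np.std(means, ddof=1))
```

The caller always receives a finite standard error, so downstream p-value computation never breaks on shape incompatibilities.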
AI Interpretation Pipeline:
Standardizes statistical outputs for AI consumption
Directly generates business-focused analysis from technical results
Produces actionable recommendations based on statistical findings
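The standardization step might look like the sketch below, which packages results into a machine-readable prompt; the field names are illustrative assumptions, and the actual payload the library sends to the DeepSeek API may differ:

```python
import json


def build_interpretation_prompt(results: dict) -> str:
    """Standardize statistical output into a prompt for an LLM interpreter (sketch)."""
    payload = {
        "lift_pct": round(results["lift"] * 100, 2),       # illustrative schema
        "ci_pct": [round(b * 100, 2) for b in results["ci"]],
        "p_value": results["p_value"],
        "significant": results["p_value"] < 0.05,
    }
    return (
        "Interpret the following geo-experiment results for a marketing "
        "stakeholder, focusing on business implications:\n"
        + json.dumps(payload, indent=2)
    )
```

Keeping the payload schema fixed is what makes the downstream AI interpretations comparable across analyses.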
Data Validation and Preprocessing:
Robust type handling for diverse input formats
Automatic date format conversion
Treatment unit validation and type alignment
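A sketch of this kind of validation step; the column names and the `validate_panel` function are assumptions for illustration:

```python
import pandas as pd


def validate_panel(df: pd.DataFrame, treatment_units: list[str],
                   date_col: str = "date", unit_col: str = "geo") -> pd.DataFrame:
    """Validate and normalize an input panel (sketch; column names are assumed)."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])          # automatic date conversion
    out[unit_col] = out[unit_col].astype(str).str.strip()  # align unit types
    missing = set(map(str, treatment_units)) - set(out[unit_col])
    if missing:
        raise ValueError(f"Treatment units not found in data: {sorted(missing)}")
    return out
```

Failing fast on missing treatment units surfaces data problems before any model fitting begins.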
Usage Guidelines¶
The library provides a structured workflow for marketing effectiveness measurement:
1. Start with donor evaluation to identify optimal control markets
2. Run power analysis to verify experimental sensitivity
3. Conduct GeoLift analysis (single- or multi-cell) to measure causal impact
4. Generate AI interpretation for business-focused insights
See the pipeline_workflow.md document for detailed command-line examples and parameter references.
The current implementation supports both configuration-file and direct command-line parameter approaches, providing flexibility while maintaining analytical rigor.