# GeoLift-SDID Library for Marketing Causal Inference

This document outlines the core components and technical advantages of the GeoLift-SDID Python library for measuring the causal impact of marketing and advertising campaigns.

## Core Methodological Foundation

The GeoLift-SDID library implements the Synthetic Difference-in-Differences (SDID) methodology, a robust causal inference technique optimized for geo-based marketing analysis. This approach isolates true incremental impact (lift) by comparing treated geographic areas against a data-driven synthetic control group.

## Technical Architecture and Implementation

### Core Library Components

1. **Modular Design**: The codebase cleanly separates:
   - Core statistical methods (`synthdid` directory)
   - Analysis scripts (`recipes` directory)
   - Configuration management (`configs` directory)
2. **Class Hierarchy**:
   - A base `SyntheticDifferenceInDifferences` class that composes the core functionality
   - Specialized implementations for single-cell and multi-cell analysis
   - Robust matrix handling with fallback mechanisms for dimension constraints
3. **Robust Statistical Implementation**:
   - Bootstrap standard error calculation when direct matrix computation fails
   - Matrix dimension mismatch detection with automatic fallback mechanisms
   - Significance testing that follows statistical best practices
4. **AI-Powered Interpretation**:
   - Automated business interpretation using the DeepSeek API
   - Standardized output formats for consistent analysis
   - First-principles breakdown of statistical results into business implications

## Analysis Capabilities

### Single-Cell Analysis

The single-cell implementation provides focused analysis of individual treatment markets:

1. **Technical Advantages**:
   - Direct data extraction when the model API fails due to structural constraints
   - Bootstrap resampling for standard error estimation
   - Specialized visualization for treatment vs.
synthetic comparison

2. **Mathematical Robustness**:
   - Handles pre/post period dimension mismatches (e.g., 60 pre-treatment vs. 30 post-treatment periods)
   - Calculates relative effects against appropriate baselines
   - Quantifies uncertainty through bootstrap confidence intervals

### Multi-Cell Analysis

For analysis across multiple treatment markets simultaneously:

1. **Optimized Implementation**:
   - Efficient parallel processing of multiple treatment units
   - A consistent statistical framework across all markets
   - Standardized outputs for cross-market comparison
2. **Strategic Advantages**:
   - Enables portfolio-level analysis of marketing impact
   - Facilitates cross-market comparisons of effectiveness
   - Identifies relative performance drivers

## Analytical Utilities

### Donor Evaluation

The donor evaluator identifies optimal control markets for synthetic comparison:

1. **Evaluation Metrics**:
   - Pre-treatment fit quality assessment
   - Similarity scoring through multiple correlation methods
   - Ranked recommendations based on composite metrics
2. **Technical Implementation**:
   - Produces detailed diagnostic outputs
   - Configurable through YAML profiles
   - Outputs standardized recommendations for downstream analysis

### Power Analysis

Calculates the minimum detectable effect and statistical power:

1. **Core Functionality**:
   - Simulation-based approach to power estimation
   - T-test approximation for efficient computation
   - Configurable effect sizes and test durations
2. **Practical Applications**:
   - Determines the test duration required for the desired sensitivity
   - Verifies that the experiment design can detect business-relevant effects
   - Informs resource allocation for marketing tests

## Recent Technical Enhancements

1. **Bootstrap Standard Error**:
   - Robust standard error calculation through bootstrap resampling
   - Handles cases where matrix dimension constraints prevent direct calculation
   - Provides valid p-values despite structural limitations
2. **Dimension Mismatch Handling**:
   - Detects and resolves matrix dimension incompatibilities
   - Implements fallback statistical calculations
   - Maintains calculation integrity despite dimensional constraints
3. **AI Interpretation Pipeline**:
   - Standardizes statistical outputs for AI consumption
   - Generates business-focused analysis directly from technical results
   - Produces actionable recommendations based on statistical findings
4. **Data Validation and Preprocessing**:
   - Robust type handling for diverse input formats
   - Automatic date format conversion
   - Treatment unit validation and type alignment

## Usage Guidelines

The library provides a structured workflow for marketing effectiveness measurement:

1. Start with **donor evaluation** to identify optimal control markets
2. Run **power analysis** to verify experimental sensitivity
3. Conduct **GeoLift analysis** (single- or multi-cell) to measure causal impact
4. Generate **AI interpretation** for business-focused insights

See the `pipeline_workflow.md` document for detailed command-line examples and parameter references. The current implementation supports both configuration-file and direct command-line parameter approaches, providing flexibility while maintaining analytical rigor.
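As an appendix, the bootstrap fallback for standard errors mentioned throughout this document can be illustrated with a minimal sketch. This is not the library's actual API: the function name, signature, and the specific resampling scheme (resampling post-treatment treated-minus-synthetic differences over time periods) are illustrative assumptions.

```python
import numpy as np

def bootstrap_standard_error(treated, synthetic, n_boot=1000, seed=0):
    """Estimate the standard error of the average lift (treated minus
    synthetic control) by resampling post-treatment differences with
    replacement.

    Hypothetical sketch -- not the GeoLift-SDID API. `treated` and
    `synthetic` are 1-D arrays of post-treatment outcomes of equal length.
    """
    rng = np.random.default_rng(seed)
    # Per-period lift: treated outcome minus synthetic-control outcome.
    diffs = np.asarray(treated, dtype=float) - np.asarray(synthetic, dtype=float)
    point_estimate = diffs.mean()
    # Resample the per-period differences and recompute the mean lift.
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(diffs, size=diffs.size, replace=True)
        boot_means[b] = sample.mean()
    # Spread of the bootstrap means approximates the standard error.
    se = boot_means.std(ddof=1)
    return point_estimate, se
```

The resulting standard error can feed a normal-approximation z-test (`z = estimate / se`) to produce p-values when direct matrix-based variance computation is unavailable, which is the fallback situation described above.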