# GeoLift-SDID Library for Marketing Causal Inference

This document outlines the core components and technical advantages of the GeoLift-SDID Python library for measuring the causal impact of marketing and advertising campaigns.

## Core Methodological Foundation

The GeoLift-SDID library implements the Synthetic Difference-in-Differences (SDID) methodology, a robust causal inference technique optimized for geo-based marketing analysis. This approach isolates true incremental impact (lift) by comparing treated geographic areas against a data-driven synthetic control group.

## Technical Architecture and Implementation

### Core Library Components

1. **Modular Design**: The codebase cleanly separates:
   - Core statistical methods (`synthdid` directory)
   - Analysis scripts (`recipes` directory)
   - Configuration management (`configs` directory)
2. **Class Hierarchy**:
   - A base `SyntheticDifferenceInDifferences` class that composes the core functionality
   - Specialized implementations for single-cell and multi-cell analysis
   - Robust matrix handling with fallback mechanisms for dimension constraints
3. **Robust Statistical Implementation**:
   - Bootstrap standard error calculation when direct matrix computation fails
   - Matrix dimension mismatch detection with automatic fallback mechanisms
   - Significance testing that follows statistical best practices
4. **AI-Powered Interpretation**:
   - Automated business interpretation using the DeepSeek API
   - Standardized output formats for consistent analysis
   - First-principles breakdown of statistical results into business implications

## Analysis Capabilities

### Single-Cell Analysis

The single-cell implementation provides focused analysis of individual treatment markets:

1. **Technical Advantages**:
   - Direct data extraction when the model API fails due to structural constraints
   - Bootstrap resampling for standard error estimation
   - Specialized visualization for treatment vs.
synthetic comparison

2. **Mathematical Robustness**:
   - Handles pre/post period dimension mismatches (e.g., 60 pre-treatment vs. 30 post-treatment periods)
   - Calculates relative effects against appropriate baselines
   - Quantifies uncertainty through bootstrap confidence intervals

### Multi-Cell Analysis

For analysis across multiple treatment markets simultaneously:

1. **Optimized Implementation**:
   - Efficient parallel processing of multiple treatment units
   - A consistent statistical framework across all markets
   - Standardized outputs for cross-market comparison
2. **Strategic Advantages**:
   - Enables portfolio-level analysis of marketing impact
   - Facilitates cross-market comparisons of effectiveness
   - Identifies relative performance drivers

## Analytical Utilities

### Donor Evaluation

The donor evaluator identifies optimal control markets for synthetic comparison:

1. **Evaluation Metrics**:
   - Pre-treatment fit quality assessment
   - Similarity scoring through multiple correlation methods
   - Ranked recommendations based on composite metrics
2. **Technical Implementation**:
   - Produces detailed diagnostic outputs
   - Configurable through YAML profiles
   - Outputs standardized recommendations for downstream analysis

### Power Analysis

Calculates the minimum detectable effect and statistical power:

1. **Core Functionality**:
   - Simulation-based approach to power estimation
   - T-test approximation for efficient computation
   - Configurable effect sizes and test durations
2. **Practical Applications**:
   - Determines the test duration required for the desired sensitivity
   - Verifies that the experiment design can detect business-relevant effects
   - Informs resource allocation for marketing tests

## Recent Technical Enhancements

1. **Bootstrap Standard Error**:
   - Robust standard error calculation through bootstrap resampling
   - Handles cases where matrix dimension constraints prevent direct calculation
   - Provides valid p-values despite structural limitations
2. **Dimension Mismatch Handling**:
   - Detects and resolves matrix dimension incompatibilities
   - Implements fallback statistical calculations
   - Maintains calculation integrity despite dimensional constraints
3. **AI Interpretation Pipeline**:
   - Standardizes statistical outputs for AI consumption
   - Generates business-focused analysis directly from technical results
   - Produces actionable recommendations based on statistical findings
4. **Data Validation and Preprocessing**:
   - Robust type handling for diverse input formats
   - Automatic date format conversion
   - Treatment unit validation and type alignment

## Usage Guidelines

The library provides a structured workflow for marketing effectiveness measurement:

1. Start with **donor evaluation** to identify optimal control markets
2. Run **power analysis** to verify experimental sensitivity
3. Conduct **GeoLift analysis** (single- or multi-cell) to measure causal impact
4. Generate **AI interpretation** for business-focused insights

See the `pipeline_workflow.md` document for detailed command-line examples and parameter references. The current implementation supports both configuration-file and direct command-line parameter approaches, providing flexibility while maintaining analytical rigor.
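As an appendix, the bootstrap fallback for standard errors mentioned throughout this document can be illustrated with a minimal sketch. This is not the library's actual API: the function name, signature, and the specific resampling scheme (resampling post-treatment treated-minus-synthetic differences over time periods) are illustrative assumptions.

```python
import numpy as np

def bootstrap_standard_error(treated, synthetic, n_boot=1000, seed=0):
    """Estimate the standard error of the average lift (treated minus
    synthetic control) by resampling post-treatment differences with
    replacement.

    Hypothetical sketch -- not the GeoLift-SDID API. `treated` and
    `synthetic` are 1-D arrays of post-treatment outcomes of equal length.
    """
    rng = np.random.default_rng(seed)
    # Per-period lift: treated outcome minus synthetic-control outcome.
    diffs = np.asarray(treated, dtype=float) - np.asarray(synthetic, dtype=float)
    point_estimate = diffs.mean()
    # Resample the per-period differences and recompute the mean lift.
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(diffs, size=diffs.size, replace=True)
        boot_means[b] = sample.mean()
    # Spread of the bootstrap means approximates the standard error.
    se = boot_means.std(ddof=1)
    return point_estimate, se
```

The resulting standard error can feed a normal-approximation z-test (`z = estimate / se`) to produce p-values when direct matrix-based variance computation is unavailable, which is the fallback situation described above.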