🔍 Computer Vision Hybrid Approach

🎯 Overview

The Animation Designer Bot uses a hybrid computer vision approach that combines dual rendering from After Effects with deterministic, classical CV methods, achieving pixel-perfect ground truth without the computational overhead of deep learning.

🌈 Dual-Rendering Methodology

Core Concept

Instead of relying on complex deep learning models, the system uses a dual-rendering approach from After Effects:

  1. Normal Render: Standard output for final video
  2. Color-Coded Render: Elements tagged with specific colors for automatic identification
  3. Deterministic Analysis: Classical CV methods applied to both renders, which are walked frame-by-frame in lockstep (see the sketch below)
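
A minimal sketch of pairing the two renders frame-by-frame with OpenCV; the function name iter_dual_frames is illustrative, not part of the shipped pipeline:

import cv2

def iter_dual_frames(normal_path, coded_path):
    """Yield (frame_index, normal_frame, coded_frame) from the two renders in lockstep"""
    normal = cv2.VideoCapture(normal_path)
    coded = cv2.VideoCapture(coded_path)
    frame_idx = 0
    try:
        while True:
            ok_n, normal_frame = normal.read()
            ok_c, coded_frame = coded.read()
            if not (ok_n and ok_c):
                break  # both renders come from the same comp, so they should end together
            yield frame_idx, normal_frame, coded_frame
            frame_idx += 1
    finally:
        normal.release()
        coded.release()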

Color Encoding System

# RGB color in the coded render -> semantic element type
COLOR_ENCODING = {
    # Text Elements
    (255, 0, 0):     "texto_principal",      # Red
    (255, 255, 0):   "texto_secundario",     # Yellow
    (255, 128, 0):   "texto_descriptivo",    # Orange

    # Interactive Elements
    (0, 255, 0):     "cta_button",           # Green
    (0, 255, 128):   "link_interactivo",     # Light Green

    # Branding Elements
    (0, 0, 255):     "logo_principal",       # Blue
    (128, 0, 255):   "logo_secundario",      # Purple

    # Graphic Elements
    (0, 255, 255):   "icono",                # Cyan
    (255, 0, 255):   "decoracion",           # Magenta

    # Background Elements
    (128, 0, 0):     "background",           # Maroon
    (0, 128, 0):     "imagen_producto",      # Dark Green
    (0, 0, 128):     "frame_border",         # Dark Blue

    # Special Elements
    (128, 128, 0):   "precio_oferta",        # Olive
    (128, 0, 128):   "countdown_timer",      # Dark Purple
    (0, 128, 128):   "progress_indicator",   # Teal
}
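
Because segmentation later matches each color within a ±tolerance window, it is worth a quick sanity check that no two palette entries can ever claim the same pixel. A hedged helper (check_palette_separation is not part of the pipeline):

from itertools import combinations

def check_palette_separation(encoding, tolerance=10):
    """Raise if two palette colors could both match one pixel under the +/- tolerance window"""
    for (c1, name1), (c2, name2) in combinations(encoding.items(), 2):
        # inRange windows overlap only when every channel differs by <= 2 * tolerance
        if max(abs(a - b) for a, b in zip(c1, c2)) <= 2 * tolerance:
            raise ValueError(f"Palette collision: {name1} vs {name2}")

check_palette_separation(COLOR_ENCODING)  # the palette above passes: every pair differs by >= 127 in some channel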

🔍 Deterministic Computer Vision Pipeline

1. Ground Truth Extraction

class GroundTruthExtractor:
    def extract_from_dual_render(self, normal_video, coded_video, fps=30.0):
        """Extract pixel-perfect ground truth from the color-coded render"""

        ground_truth_data = []

        for frame_idx, (normal_frame, coded_frame) in enumerate(
            zip(self.extract_frames(normal_video), 
                self.extract_frames(coded_video))):

            # Extract elements from color-coded frame
            elements = self.extract_elements_from_coded(coded_frame)

            # Map to normal frame coordinates
            mapped_elements = self.map_to_normal_frame(elements, normal_frame)

            frame_data = {
                "frame_index": frame_idx,
                "timestamp": frame_idx / 30.0,  # Assuming 30fps
                "elements": mapped_elements,
                "normal_frame": normal_frame,
                "coded_frame": coded_frame
            }

            ground_truth_data.append(frame_data)

        return ground_truth_data
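
The raw frames are kept in memory for downstream analysis; when persisting ground truth they can be stripped first. A minimal sketch (save_ground_truth is a hypothetical helper, and it assumes the element records are JSON-serializable):

import json

def save_ground_truth(ground_truth_data, path):
    """Persist ground truth metadata, dropping the raw frame arrays"""
    slim = [
        {key: value for key, value in frame.items()
         if key not in ("normal_frame", "coded_frame")}
        for frame in ground_truth_data
    ]
    with open(path, "w") as f:
        json.dump(slim, f, indent=2)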

2. Color Segmentation

import cv2
import numpy as np


class ColorSegmentator:
    def __init__(self, tolerance=10):
        self.tolerance = tolerance  # per-channel slack to absorb compression artifacts

    def segment_by_color(self, image, target_color):
        """Exact segmentation by RGB color (expects an RGB image; convert BGR frames first)"""
        # Build a per-channel window of +/- tolerance around the target color
        lower = np.array([max(0, c - self.tolerance) for c in target_color])
        upper = np.array([min(255, c + self.tolerance) for c in target_color])

        mask = cv2.inRange(image, lower, upper)

        # Find contours
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

        return self.contours_to_regions(contours)

    def extract_all_elements(self, coded_image):
        """Extract all elements from color-coded image"""
        elements = {}

        for color, element_type in COLOR_ENCODING.items():
            regions = self.segment_by_color(coded_image, color)
            elements[element_type] = regions

        return elements
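
contours_to_regions is referenced above but not shown; one possible minimal shape, written as a free function for brevity (in the class it would be a method) and assuming a small Region record that carries the contour and its bounding box:

from dataclasses import dataclass

import cv2
import numpy as np

@dataclass
class Region:
    contour: np.ndarray
    bbox: tuple  # (x, y, w, h) from cv2.boundingRect

def contours_to_regions(contours, min_area=25):
    """Wrap raw contours as Region records, discarding tiny noise blobs"""
    regions = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:
            continue  # compression noise and stray pixels
        regions.append(Region(contour=contour, bbox=cv2.boundingRect(contour)))
    return regions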

3. Deterministic Feature Analysis

class DeterministicAnalyzer:
    def analyze_region_characteristics(self, region):
        """Analyze features without ML"""

        # Geometric characteristics
        area = cv2.contourArea(region.contour)
        perimeter = cv2.arcLength(region.contour, True)

        # Bounding box
        x, y, w, h = cv2.boundingRect(region.contour)
        aspect_ratio = w / h if h > 0 else 0.0

        # Shape characteristics
        hull = cv2.convexHull(region.contour)
        hull_area = cv2.contourArea(hull)
        solidity = area / hull_area if hull_area > 0 else 0

        # Position characteristics
        image_h, image_w = region.image_shape
        center_x, center_y = x + w/2, y + h/2

        position_features = {
            "relative_x": center_x / image_w,
            "relative_y": center_y / image_h,
            "is_center": 0.3 < center_x/image_w < 0.7 and 0.3 < center_y/image_h < 0.7,
            "is_corner": center_x/image_w < 0.2 or center_x/image_w > 0.8 or 
                        center_y/image_h < 0.2 or center_y/image_h > 0.8
        }

        return {
            "geometry": {
                "area": area,
                "perimeter": perimeter,
                "aspect_ratio": aspect_ratio,
                "solidity": solidity,
                "relative_area": area / (image_w * image_h)
            },
            "position": position_features
        }
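
A tiny worked example of the analyzer on a synthetic rectangle (the _Region stand-in is illustrative; it just carries the two attributes the analyzer reads):

import numpy as np

class _Region:
    def __init__(self, contour, image_shape):
        self.contour = contour
        self.image_shape = image_shape  # (height, width)

# A 100x40 axis-aligned rectangle placed in a 1920x1080 frame
rect = np.array([[[200, 100]], [[300, 100]], [[300, 140]], [[200, 140]]], dtype=np.int32)
features = DeterministicAnalyzer().analyze_region_characteristics(_Region(rect, (1080, 1920)))

print(features["geometry"]["aspect_ratio"])  # ~2.5: wide and short, text-like
print(features["geometry"]["solidity"])      # 1.0: a rectangle equals its convex hull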

4. Rule-Based Classification Engine

from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    confidence: float


class DeterministicRulesEngine:
    def __init__(self):
        # Confidence values are illustrative defaults; tune them during validation
        self.rules = [
            # Text rules: long, thin regions covering little of the frame
            Rule("texto_largo",
                 lambda f: f["geometry"]["aspect_ratio"] > 3.0 and
                           f["geometry"]["relative_area"] < 0.1,
                 confidence=0.8),

            # Button rules: moderately wide, small, centrally placed
            Rule("boton_probable",
                 lambda f: 1.5 < f["geometry"]["aspect_ratio"] < 4.0 and
                           0.01 < f["geometry"]["relative_area"] < 0.05 and
                           f["position"]["is_center"],
                 confidence=0.7),

            # Logo rules: roughly square, small, tucked into a corner
            Rule("logo_probable",
                 lambda f: 0.8 < f["geometry"]["aspect_ratio"] < 1.5 and
                           f["position"]["is_corner"] and
                           f["geometry"]["relative_area"] < 0.03,
                 confidence=0.7),
        ]

    def classify_region(self, features):
        """Classify a region using deterministic rules"""
        classifications = []

        for rule in self.rules:
            if rule.condition(features):
                classifications.append({
                    "type": rule.name,
                    "confidence": rule.confidence,
                    "rule_fired": rule.name
                })

        return classifications
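
How the three pieces compose on a single coded frame (illustrative wiring only; coded_frame is assumed to be an RGB array, and each Region is assumed to carry the image_shape attribute the analyzer expects):

segmenter = ColorSegmentator(tolerance=10)
analyzer = DeterministicAnalyzer()
engine = DeterministicRulesEngine()

elements = segmenter.extract_all_elements(coded_frame)
for element_type, regions in elements.items():
    for region in regions:
        features = analyzer.analyze_region_characteristics(region)
        rule_hits = engine.classify_region(features)
        # element_type comes from the color code; rule_hits cross-check the geometry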

🧠 Pattern Extraction for WFC

Wave Function Collapse Integration

The system extracts patterns from analyzed motion graphics to feed into a Wave Function Collapse (WFC) system for procedural generation:

class WFCPatternExtractor:
    def extract_spatial_patterns(self, ground_truth_data):
        """Extract spatial patterns for WFC system"""
        patterns = {
            "element_relationships": [],
            "spatial_constraints": [],
            "timing_patterns": [],
            "transition_patterns": []
        }

        for frame_data in ground_truth_data:
            # Extract element relationships
            relationships = self.analyze_element_relationships(frame_data["elements"])
            patterns["element_relationships"].extend(relationships)

            # Extract spatial constraints
            constraints = self.extract_spatial_constraints(frame_data["elements"])
            patterns["spatial_constraints"].extend(constraints)

        return patterns

    def extract_temporal_patterns(self, ground_truth_data):
        """Extract temporal patterns for animation generation"""
        temporal_patterns = {
            "entry_animations": [],
            "exit_animations": [],
            "transition_types": [],
            "timing_profiles": []
        }

        # Analyze frame-to-frame changes
        for i in range(len(ground_truth_data) - 1):
            current_frame = ground_truth_data[i]
            next_frame = ground_truth_data[i + 1]

            changes = self.analyze_frame_changes(current_frame, next_frame)
            temporal_patterns["transition_types"].extend(changes)

        return temporal_patterns
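
extract_spatial_constraints is one of several hooks left abstract above. A hedged sketch of what it might derive, reusing the Region.bbox shape sketched earlier and emitting simple "A above B" ordering facts:

from itertools import combinations

def extract_spatial_constraints(elements):
    """Sketch: derive vertical-ordering constraints between element types"""
    boxes = [(etype, region.bbox) for etype, regions in elements.items()
             for region in regions]
    constraints = []
    for (type_a, (ax, ay, aw, ah)), (type_b, (bx, by, bw, bh)) in combinations(boxes, 2):
        if ay + ah <= by:          # A's bottom edge sits above B's top edge
            constraints.append({"above": type_a, "below": type_b})
        elif by + bh <= ay:
            constraints.append({"above": type_b, "below": type_a})
    return constraints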

📊 Performance Metrics

Expected Performance Characteristics

Metric              Target    Method
------              ------    ------
Overall Precision   >90%      Validation against extracted ground truth
Element Recall      >85%      Detection rate over known elements
Processing Speed    >30 FPS   Real-time throughput on a standard CPU
False Positives     <5%       Share of incorrect classifications
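
Since the coded render supplies exact labels, precision and recall reduce to set comparisons. A minimal sketch, assuming predictions and ground truth are both sets of (frame_index, element_type, bbox) tuples:

def precision_recall(predicted, ground_truth):
    """Compute precision/recall from exact-match sets of detections"""
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall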

Advantages of Hybrid Approach

✅ Technical Benefits

  1. Perfect Ground Truth: Pixel-exact correspondence between coded labels and rendered elements
  2. High Computational Efficiency: Deterministic methods cost far less than deep learning inference
  3. Real-time Processing: Low memory consumption and fast execution
  4. Complete Interpretability: Explicit, auditable rules

✅ Business Benefits

  1. Cost-Effective: No GPU requirements for inference
  2. No Training Costs: No massive training data requirements
  3. Simplified Maintenance: Easy debugging and adjustment
  4. Scalable: Can process thousands of examples efficiently

🚀 Implementation Pipeline

Phase 1: Infrastructure Setup

  • Dual-rendering system in After Effects
  • Ground truth extraction pipeline
  • Basic computer vision pipeline

Phase 2: Deterministic Methods

  • Color segmentation implementation
  • Geometric feature analysis
  • Basic rule engine

Phase 3: Template Matching

  • Template library creation
  • Hierarchical matching system
  • Rule integration

Phase 4: Validation and Optimization

  • Automated validation system
  • Parameter optimization
  • Performance metrics

Phase 5: WFC Integration

  • Pattern extraction for WFC
  • Integration with main system
  • End-to-end testing

🔧 Technical Requirements

Software Dependencies

  • OpenCV: Computer vision operations
  • scikit-image: Image analysis
  • NumPy/SciPy: Array processing
  • Pillow: Image manipulation
  • After Effects: Dual-rendering setup

Hardware Requirements

  • CPU: Standard multi-core processor
  • Memory: 8GB+ RAM for processing
  • Storage: SSD for fast I/O operations
  • GPU: Optional for template matching acceleration

This hybrid approach combines the precision of automated ground truth with the efficiency of deterministic computer vision methods, creating a robust and scalable solution for motion graphics analysis.