Data Scientist - Nielsen

Job Description

Role Overview

As a Hybrid Data Scientist, you will sit at the intersection of high-scale data pipelining and advanced statistical methodology. You will own the end-to-end lifecycle of Incremental Reach and Audience Measurement products, from architecting Python-based data pipelines to implementing Bayesian and Machine Learning models that quantify the lift of Digital media over a Linear TV baseline.
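As a rough illustration of what "lift over a Linear TV baseline" means, the sketch below computes deduplicated and incremental reach under the simplifying assumption that TV and Digital exposure are statistically independent (random duplication). That assumption is chosen only for illustration. It is not the MaxEnt or panel-calibrated deduplication described in this posting, and the function name `incremental_reach` is hypothetical.

```python
def incremental_reach(tv_reach: float, digital_reach: float) -> dict:
    """Deduplicated and incremental reach, ASSUMING independent TV and Digital exposure.

    Both inputs are fractions of the target audience in [0, 1].
    """
    for r in (tv_reach, digital_reach):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reach must be a fraction between 0 and 1")
    overlap = tv_reach * digital_reach  # reached by both platforms, under independence
    combined = tv_reach + digital_reach - overlap  # inclusion-exclusion
    return {
        "combined_reach": combined,
        "overlap": overlap,
        "incremental_reach": combined - tv_reach,  # Digital's lift over the TV-only baseline
    }


print(incremental_reach(tv_reach=0.60, digital_reach=0.30))
```

With 60% TV reach and 30% Digital reach, combined reach is 72%, so Digital adds 0.30 x (1 - 0.60) = 0.12, or 12 points of incremental reach. When real audience overlap departs from independence, this estimate is biased, which is why calibrated deduplication models are used instead.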

Key Responsibilities

1. Advanced Statistical Modeling (The "Science" Side)

  • Incremental Reach Frameworks:

    • Small-N Datasets: Implement Bayesian Model Averaging (BMA) across regression models built from different combinations of candidate drivers, providing robust coefficients and credible intervals when study data is limited.

    • Large-Scale Prediction: Deploy Gradient Boosted Regression Trees (GBM) to identify non-linear patterns and rank the impact of "Reach Drivers" (Media Weight, On-Target %, Frequency).

  • Audience Deduplication: Use Maximum Entropy (MaxEnt) models to estimate unique audience reach across fragmented platforms by reconciling census and panel data.

  • Additional Frameworks:

    • Mixed-Effect Models: Use Hierarchical/Multilevel modeling to account for nested data (e.g., campaigns nested within specific industry verticals).

    • Causal Lift: Apply Synthetic Control Methods to measure incremental shifts in behavior for campaigns with fixed timeframes where a clean control group is unavailable.
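A minimal sketch of BMA over regression combinations, assuming the common BIC approximation to posterior model probabilities with a uniform prior over models. This is one of several ways to do BMA; a full Bayesian treatment in PyMC or Stan would also yield credible intervals, which are omitted here for brevity. The sketch fits OLS to every subset of drivers, weights each fit by exp(-BIC/2), and reports Posterior Inclusion Probabilities (PIPs) and model-averaged coefficients. The function names, driver names, and data are hypothetical and synthetic.

```python
import itertools
import math


def _ols(X, y):
    """OLS via the normal equations (Gauss-Jordan with partial pivoting).

    Rows of X include a leading 1.0 for the intercept. Returns (coefficients, RSS).
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] for i in range(p)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return beta, rss


def bma_bic(drivers, y):
    """BMA over all driver subsets with BIC weights (hypothetical helper).

    drivers: dict mapping driver name -> list of values, same length as y.
    Returns (posterior inclusion probabilities, model-averaged coefficients).
    """
    names = list(drivers)
    n = len(y)
    fits = []
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            X = [[1.0] + [drivers[name][i] for name in subset] for i in range(n)]
            beta, rss = _ols(X, y)
            if rss <= 0.0:
                raise ValueError("perfect fit: BIC is undefined")
            # Gaussian-likelihood BIC with k slopes + 1 intercept
            bic = n * math.log(rss / n) + (k + 1) * math.log(n)
            fits.append((subset, beta, bic))
    best = min(bic for _, _, bic in fits)
    # exp(-BIC/2) approximates each model's marginal likelihood; uniform model prior
    raw = [math.exp(-0.5 * (bic - best)) for _, _, bic in fits]
    total = sum(raw)
    pip = dict.fromkeys(names, 0.0)
    coef = dict.fromkeys(names, 0.0)  # excluded drivers contribute 0 to the average
    for (subset, beta, _), r in zip(fits, raw):
        weight = r / total
        for j, name in enumerate(subset):
            pip[name] += weight
            coef[name] += weight * beta[j + 1]  # beta[0] is the intercept
    return pip, coef


# Synthetic example: 8 hypothetical studies
media_weight = [1, 2, 3, 4, 5, 6, 7, 8]
frequency = [40, 40, 30, 30, 40, 40, 30, 30]
reach_lift = [5.5, 7.5, 10.5, 14.5, 17.5, 19.5, 22.5, 26.5]
pip, coef = bma_bic({"media_weight": media_weight, "frequency": frequency}, reach_lift)
print(pip, coef)
```

In this toy data `frequency` was constructed to carry no information about the outcome, so it ends up with a PIP of roughly 0.26 while `media_weight` gets a PIP close to 1. Exhaustive enumeration costs 2^p fits, which is affordable in the small-N, few-driver setting this approach targets.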

2. Data Engineering & Pipeline Architecture (The "Engineering" Side)

  • Python-Centric ETL: Architect and maintain robust data pipelines using Python (Pandas, PySpark) to ingest, clean, and harmonize data from Linear TV logs and Digital ad servers.

  • Feature Engineering: Automate the extraction of Base Drivers (GRP, Reach Efficiency, Seasonality) and Custom Drivers (Share of Voice, Flighting) into a schema ready for supervised learning.

  • Productionization: Wrap statistical models in production-grade APIs or scheduled, containerized jobs (Docker/Airflow) to ensure repeatable and scalable measurement.

  • Cloud Operations: Manage large-scale datasets in cloud data warehouses and platforms (Snowflake, AWS, or GCP), optimizing SQL queries for high-performance analytics.
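One way to sketch the driver extraction step in plain Python, assuming a hypothetical spot-log schema with `campaign`, `category`, `week`, and `grp` fields and one category per campaign. Share of Voice is taken here as a campaign's GRPs divided by total GRPs in its category, and a simple flighting ratio (active weeks over the campaign's calendar span) stands in for Flighting. Both definitions, and the function name, are illustrative assumptions rather than Nielsen's specifications.

```python
from collections import defaultdict


def build_driver_features(spots):
    """Aggregate spot-level rows into per-campaign driver features (hypothetical schema).

    spots: iterable of dicts with keys 'campaign', 'category', 'week' (int), 'grp' (float).
    """
    grp_by_campaign = defaultdict(float)
    weeks_by_campaign = defaultdict(set)
    category_of = {}
    grp_by_category = defaultdict(float)
    for s in spots:
        c = s["campaign"]
        grp_by_campaign[c] += s["grp"]
        weeks_by_campaign[c].add(s["week"])
        category_of[c] = s["category"]  # assumes one category per campaign
        grp_by_category[s["category"]] += s["grp"]

    features = {}
    for c, total in grp_by_campaign.items():
        weeks = weeks_by_campaign[c]
        span = max(weeks) - min(weeks) + 1  # calendar weeks from first to last airing
        category_total = grp_by_category[category_of[c]]
        features[c] = {
            "total_grp": total,
            "active_weeks": len(weeks),
            "flighting_ratio": len(weeks) / span,  # 1.0 = continuous, < 1.0 = pulsed
            "share_of_voice": total / category_total if category_total else 0.0,
        }
    return features


spots = [
    {"campaign": "A", "category": "auto", "week": 1, "grp": 50.0},
    {"campaign": "A", "category": "auto", "week": 1, "grp": 30.0},
    {"campaign": "A", "category": "auto", "week": 4, "grp": 20.0},
    {"campaign": "B", "category": "auto", "week": 2, "grp": 100.0},
    {"campaign": "B", "category": "auto", "week": 3, "grp": 100.0},
]
print(build_driver_features(spots))
```

At scale the same aggregation would be a pandas `groupby` or a PySpark aggregation; plain Python keeps the sketch dependency-free.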

3. Experimental Design & Methodology

  • Control/Test Logistics: Design scientifically valid Control and Test groups, ensuring proper randomization or, where randomization is not feasible, using Propensity Score Matching to mitigate selection bias.

  • Variable Importance: Provide stakeholders with Posterior Inclusion Probabilities to identify which media levers (Duration, Weight, etc.) most consistently drive incremental reach.

  • Cross-Media Calibration: Reconcile Linear TV's "One-to-Many" metrics with Digital's "One-to-One" tracking to provide a unified view of the consumer.


Qualifications

  • Experience: 3-6 years of statistical model development, with mastery of Python (specifically for data manipulation and ML) and advanced SQL. Experience with PySpark or Dask for distributed computing is a plus.

  • Statistical Mastery: Proven experience with GBM (XGBoost/LightGBM) and Bayesian Frameworks (e.g., PyMC, Stan, or the R BMA package), among other Data Science models.

  • Media Knowledge: Understanding of Linear TV vs. Digital dynamics, including Reach/Frequency, GRPs, and Deduplication logic.

  • Education: Bachelor’s or Master’s in a quantitative field (Statistics, Computer Science, Economics) or equivalent professional experience.

Connect
  • Last Date to Apply: 27 Mar 2026
  • Job Location: Bangalore, India
  • Salary (CTC): Not Disclosed
  • Experience: 3-6 Years