Role Overview
As a Hybrid Data Scientist, you will sit at the intersection of large-scale data engineering and advanced statistical methodology. You will own the end-to-end lifecycle of Incremental Reach and Audience Measurement products: from architecting Python-based data pipelines to implementing Bayesian and machine learning models that quantify the lift of Digital media over a Linear TV baseline.
Key Responsibilities
1. Advanced Statistical Modeling (The "Science" Side)
Incremental Reach Frameworks (illustrative sketches follow this list):
Small-N Datasets: Implement Bayesian Model Averaging (BMA) to cycle through candidate regression specifications, providing robust coefficients and credible intervals when study data is limited.
Large-Scale Prediction: Deploy Gradient Boosted Regression Trees (GBM) to identify non-linear patterns and rank the impact of "Reach Drivers" (Media Weight, On-Target %, Frequency).
Audience Deduplication: Use Maximum Entropy (MaxEnt) models to estimate unique audience reach across fragmented platforms by reconciling census and panel data.
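Below are minimal, self-contained sketches of the three approaches above. They use synthetic data and hypothetical driver and column names (media_weight, on_target_pct, frequency), so treat them as illustrations of the techniques rather than production code.

First, a BMA sketch: enumerate candidate regression subsets, weight each fit by a BIC approximation of its posterior model probability, then average coefficients and report per-driver inclusion probabilities.

```python
# Bayesian Model Averaging over all driver subsets for a small-N study.
# BIC-based weights approximate posterior model probabilities; the driver
# names and data are synthetic placeholders.
import itertools

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
drivers = ["media_weight", "on_target_pct", "frequency"]
n = 25                                            # a deliberately small study
X = rng.normal(size=(n, len(drivers)))
y = 2.0 + 1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

models = []
for k in range(1, len(drivers) + 1):
    for subset in itertools.combinations(range(len(drivers)), k):
        design = sm.add_constant(X[:, list(subset)])
        models.append((subset, sm.OLS(y, design).fit()))

# Weight each model by exp(-BIC/2), normalized (a standard BMA approximation).
bics = np.array([fit.bic for _, fit in models])
weights = np.exp(-(bics - bics.min()) / 2.0)
weights /= weights.sum()

for j, name in enumerate(drivers):
    coef = sum(w * fit.params[1 + subset.index(j)]
               for w, (subset, fit) in zip(weights, models) if j in subset)
    pip = sum(w for w, (subset, _) in zip(weights, models) if j in subset)
    print(f"{name}: averaged coef = {coef:.2f}, inclusion prob = {pip:.2f}")
```

The inclusion probabilities printed here are the Posterior Inclusion Probabilities referenced under Experimental Design below.

Second, a GBM sketch for the large-scale side, assuming LightGBM is installed: fit a boosted-tree regressor to a non-linear reach response and rank the drivers by importance.

```python
# Rank hypothetical Reach Drivers with gradient boosted trees (LightGBM).
import lightgbm as lgb
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(5000, 3)),
                 columns=["media_weight", "on_target_pct", "frequency"])
# A non-linear response: saturating media-weight effect plus a quadratic term.
y = (np.tanh(X["media_weight"]) + 0.3 * X["on_target_pct"] ** 2
     + rng.normal(scale=0.1, size=len(X)))

gbm = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05).fit(X, y)
print(pd.Series(gbm.feature_importances_, index=X.columns)
        .sort_values(ascending=False))            # most influential drivers first
```

Third, a toy MaxEnt deduplication: with only per-platform reach as constraints, the maximum-entropy joint exposure distribution reduces to the independence estimate, which the optimizer recovers numerically.

```python
# Maximum-entropy deduplication across two platforms. p indexes the joint
# exposure states [neither, digital only, tv only, both]; the reach figures
# stand in for assumed panel/census inputs.
import numpy as np
from scipy.optimize import minimize

r_tv, r_dig = 0.40, 0.25

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return np.sum(p * np.log(p))                  # minimizing this maximizes entropy

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},
    {"type": "eq", "fun": lambda p: p[2] + p[3] - r_tv},    # P(TV exposed)
    {"type": "eq", "fun": lambda p: p[1] + p[3] - r_dig},   # P(Digital exposed)
]
res = minimize(neg_entropy, np.full(4, 0.25),
               bounds=[(0.0, 1.0)] * 4, constraints=constraints)
print("deduplicated reach:", 1 - res.x[0])        # ~0.55
print("independence check:", r_tv + r_dig - r_tv * r_dig)
```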
Additional Frameworks (illustrative sketches follow this list):
Mixed-Effect Models: Use Hierarchical/Multilevel modeling to account for nested data (e.g., campaigns nested within specific industry verticals).
Causal Lift: Apply Synthetic Control Methods to measure incremental shifts in behavior for campaigns with fixed timeframes where a clean control group is unavailable.
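Hedged sketches of both frameworks, again on synthetic data with hypothetical names. First, a random-intercept model with statsmodels' MixedLM, letting each industry vertical carry its own baseline:

```python
# Hierarchical / mixed-effect model: campaigns nested within verticals.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_verticals, campaigns_per = 8, 40
vertical = np.repeat(np.arange(n_verticals), campaigns_per)
vertical_shift = rng.normal(scale=0.8, size=n_verticals)[vertical]
media_weight = rng.normal(size=vertical.size)
lift = (1.0 + 0.6 * media_weight + vertical_shift
        + rng.normal(scale=0.3, size=vertical.size))

df = pd.DataFrame({"lift": lift, "media_weight": media_weight,
                   "vertical": vertical.astype(str)})
fit = smf.mixedlm("lift ~ media_weight", df, groups=df["vertical"]).fit()
print(fit.summary())    # fixed media_weight effect + between-vertical variance
```

Second, a bare-bones synthetic control: learn convex weights over untreated donor markets that reproduce the treated market's pre-period trajectory, then read the lift off the post-period gap.

```python
# Synthetic control with a simplex constraint on donor weights (SLSQP).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
T_pre, T_post, n_donors = 30, 10, 12
donors = rng.normal(size=(T_pre + T_post, n_donors)).cumsum(axis=0)
treated = (donors @ rng.dirichlet(np.ones(n_donors))
           + rng.normal(scale=0.2, size=T_pre + T_post))
treated[T_pre:] += 1.5                     # simulated campaign effect

def pre_period_gap(w):
    return np.sum((treated[:T_pre] - donors[:T_pre] @ w) ** 2)

res = minimize(pre_period_gap, np.full(n_donors, 1.0 / n_donors),
               bounds=[(0.0, 1.0)] * n_donors,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
synthetic = donors @ res.x
print("estimated lift:", (treated[T_pre:] - synthetic[T_pre:]).mean())  # ~1.5
```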
2. Data Engineering & Pipeline Architecture (The "Engineering" Side)
Python-Centric ETL: Architect and maintain robust data pipelines using Python (Pandas, PySpark) to ingest, clean, and harmonize data from Linear TV logs and Digital ad servers (see the harmonization sketch after this list).
Feature Engineering: Automate the extraction of Base Drivers (GRP, Reach Efficiency, Seasonality) and Custom Drivers (Share of Voice, Flighting) into a schema ready for supervised learning.
Productionization: Wrap statistical models into production-grade APIs or scheduled containers (Docker/Airflow) to ensure repeatable and scalable measurement (see the Airflow sketch after this list).
Cloud Operations: Manage large-scale datasets within cloud data warehouses such as Snowflake running on AWS or GCP, optimizing SQL queries for high-performance analytics.
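A minimal Pandas harmonization sketch covering the ETL and feature-engineering items above. The schemas, the campaign, and the target-universe size are all invented; a real pipeline would pull from TV log files and ad-server exports (or PySpark equivalents at scale).

```python
# Harmonize Linear TV and Digital logs into one schema, then derive GRPs.
import pandas as pd

tv = pd.DataFrame({   # toy stand-in for a Linear TV log extract
    "air_date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02"]),
    "campaign_id": ["c1", "c1", "c1"],
    "spot_impressions": [1_200_000, 800_000, 950_000],
})
digital = pd.DataFrame({   # toy stand-in for an ad-server export
    "event_date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "campaign_id": ["c1", "c1"],
    "impressions": [400_000, 450_000],
})

tv_h = tv.rename(columns={"air_date": "date", "spot_impressions": "impressions"})
tv_h["channel"] = "linear_tv"
dg_h = digital.rename(columns={"event_date": "date"})
dg_h["channel"] = "digital"
unified = pd.concat([tv_h, dg_h], ignore_index=True)

# Base driver: GRPs = impressions / target universe * 100.
TARGET_UNIVERSE = 120_000_000              # assumed target-population size
daily = (unified.groupby(["campaign_id", "channel", "date"], as_index=False)
                ["impressions"].sum())
daily["grp"] = daily["impressions"] / TARGET_UNIVERSE * 100
print(daily)
```

For the productionization item, a skeletal Airflow DAG that scores fresh data on a daily schedule. The DAG id and callable body are illustrative, and the `schedule` argument assumes Airflow 2.4+ (older versions use `schedule_interval`).

```python
# Daily scheduled scoring task wrapped in an Airflow DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def score_latest_data():
    """Load the persisted model, pull yesterday's features, write predictions."""
    ...  # hypothetical scoring logic lives here

with DAG(dag_id="incremental_reach_scoring",
         start_date=datetime(2024, 1, 1),
         schedule="@daily",
         catchup=False) as dag:
    PythonOperator(task_id="score", python_callable=score_latest_data)
```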
3. Experimental Design & Methodology
Control/Test Logistics: Design scientifically valid Control and Test groups, ensuring proper randomization or using Propensity Score Matching to mitigate selection bias (a matching sketch follows this list).
Variable Importance: Provide stakeholders with Posterior Inclusion Probabilities to identify which media levers (Duration, Weight, etc.) most consistently drive incremental reach.
Cross-Media Calibration: Reconcile Linear TV's "One-to-Many" metrics with Digital's "One-to-One" tracking to provide a unified view of the consumer.
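A short Propensity Score Matching sketch for the control/test item above: model exposure probability from hypothetical covariates, then pair each exposed user with the unexposed user nearest in propensity score.

```python
# Propensity score matching: logistic propensity model + 1-NN matching.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 4))             # e.g. age, income, tv_hours, ...
exposed = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))  # selection on X

ps = LogisticRegression().fit(X, exposed).predict_proba(X)[:, 1]
test_idx = np.flatnonzero(exposed == 1)
pool_idx = np.flatnonzero(exposed == 0)

nn = NearestNeighbors(n_neighbors=1).fit(ps[pool_idx].reshape(-1, 1))
_, match = nn.kneighbors(ps[test_idx].reshape(-1, 1))
control_idx = pool_idx[match.ravel()]      # matched control group

print("mean propensity, test vs. matched control:",
      round(ps[test_idx].mean(), 3), round(ps[control_idx].mean(), 3))
```

Matching with replacement, as here, keeps every exposed user but can reuse controls; caliper or without-replacement variants trade bias against variance.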
Experience: 3-6 years of statistical model development; mastery of Python (specifically for data manipulation and ML) and advanced SQL. Experience with PySpark or Dask for distributed computing is a plus.
Statistical Mastery: Proven experience with GBMs (XGBoost/LightGBM) and Bayesian frameworks (e.g., PyMC, Stan, or R's BMA package), alongside the broader data science toolkit.
Media Knowledge: Understanding of Linear TV vs. Digital dynamics, including Reach/Frequency, GRPs, and Deduplication logic.
Education: Bachelor’s or Master’s in a quantitative field (Statistics, Computer Science, Economics) or equivalent professional experience.