Data integration to improve inference

Introduction

In the week on ‘designed big data’, we discussed integrating design-based surveys with ‘big data’ from data donation, sensors, or other sources as a way to do inference in a world where more and more data are available.

Data integration, more generally, combines data from several sources to improve inference. Whereas in designed big data multiple data sources are collected for the same individuals, this week we discuss several approaches that integrate data collected from different people.

Literature

  • Wisniowski, A., Sakshaug, J. W., Perez Ruiz, D. A., & Blom, A. G. (2020). Integrating probability and nonprobability samples for survey inference. Journal of Survey Statistics and Methodology, 8(1), 120-147.

Lecture

  • Introduction to the reasons for data integration
  • Several practical examples
  • More detailed discussion on integrating probability and non-probability datasets
  • Integrating micro-data and aggregate data

Slides

Class and Take home exercise

  • Exercise on integrating probability and non-probability surveys (see lecture slides)
  • If time permits: class discussion on inference. Is there a general methodology for data integration and inference for the 21st century?