Estimation of a probability density function using interval aggregated data

abstract

© 2016 Informa UK Limited, trading as Taylor & Francis Group. In economics and government statistics, aggregated data rather than individual-level data are usually reported, for reasons of data confidentiality and simplicity. In this paper we develop a method for flexibly estimating the probability density function of the population from aggregated data obtained as group averages when individual-level data are grouped according to quantile limits. The kernel density estimator has commonly been applied to such data without taking the data aggregation process into account, and has been shown to perform poorly. Our method models the quantile function as an integral of the exponential of a spline function and deduces the density function from the quantile function. We match the aggregated data to their theoretical counterparts using least squares, and regularize the estimation by using the squared second derivatives of the density function as the penalty function. A computational algorithm is developed to implement the method. Applications to simulated data and US household income survey data show that our penalized spline estimator can accurately recover the density function of the underlying population, while the commonly used kernel density estimator is severely biased. The method is applied to study the dynamics of China's urban income distribution using published interval aggregated data for 1985-2010.
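The forward relationship described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: there is no spline basis, no least-squares fit, and no penalty. It only maps a log-derivative function s(u) = log Q'(u) to the quantile function Q, to the density through the identity f(Q(u)) = 1/Q'(u), and to the group averages implied by a set of quantile limits. The function name `forward_model`, the parameter `q0` (the value of Q at u = 0), and the grid size are hypothetical choices made for illustration.

```python
import numpy as np

def forward_model(s, q0, limits, n_grid=2001):
    """Return (group_means, u, Q, density) implied by a log-derivative function s.

    s      : callable, s(u) = log Q'(u); stands in for the paper's spline (assumption)
    q0     : value of the quantile function at u = 0 (hypothetical parameter)
    limits : increasing quantile limits, starting at 0 and ending at 1
    """
    u = np.linspace(0.0, 1.0, n_grid)
    dq = np.exp(s(u))                      # Q'(u) = exp(s(u)) > 0, so Q is increasing
    h = np.diff(u)
    # Q(u) = q0 + integral_0^u exp(s(t)) dt, by the cumulative trapezoid rule
    Q = q0 + np.concatenate(([0.0], np.cumsum(0.5 * (dq[1:] + dq[:-1]) * h)))
    # Running integral of Q, used to average Q over each quantile interval
    IQ = np.concatenate(([0.0], np.cumsum(0.5 * (Q[1:] + Q[:-1]) * h)))
    limits = np.asarray(limits, dtype=float)
    IQ_at_limits = np.interp(limits, u, IQ)
    # Mean of the observations falling between quantile levels a and b
    # equals (1 / (b - a)) * integral_a^b Q(u) du
    group_means = np.diff(IQ_at_limits) / np.diff(limits)
    density = 1.0 / dq                     # f(Q(u)) = 1 / Q'(u) = exp(-s(u))
    return group_means, u, Q, density

# Toy check: s = 0 gives Q(u) = u, i.e. the uniform distribution on [0, 1],
# whose quintile-group means are the interval midpoints.
limits = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
means, u, Q, density = forward_model(lambda t: np.zeros_like(t), q0=0.0, limits=limits)
print(means)
```

In an estimation setting, s would be replaced by a spline with unknown coefficients, and those coefficients would be chosen so that `group_means` matches the published averages, subject to the roughness penalty mentioned in the abstract.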

authors

published proceedings

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION

author list (cited authors)

Huang, J. Z., Wang, X., Wu, X., & Zhou, L.

citation count

2

complete list of authors

Huang, Jianhua Z||Wang, Xueying||Wu, Ximing||Zhou, Lan

publication date

October 2016

publisher

Taylor & Francis

published in

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION

keywords

Density Estimation
Gini Index
Growth Incidence Curve
Income Disparity
Lorenz Curve
Penalized Splines

Digital Object Identifier (DOI)

10.1080/00949655.2016.1150481

start page

3093

end page

3105

volume

86

issue

15

URL

http://dx.doi.org/10.1080/00949655.2016.1150481
