Feature screening with large-scale and high-dimensional survival data.

abstract

Very large data sets present great challenges in modeling, inference, and computation. In handling big data, much attention has been directed to "large p, small n" settings, where p is the number of variables and n is the sample size. Relatively less work has addressed problems in which p and n are both large, even though such data have become more accessible than before. A large volume of data does not automatically ensure good-quality inference, because many unimportant variables may be collected alongside the informative ones. To carry out valid statistical analysis, it is imperative to screen out noisy variables that have no predictive value for explaining the outcome variable. In this paper, we develop a screening method for handling large-scale survival data, where the sample size n is large and the covariate dimension p is of non-polynomial order in n, the so-called NP-dimension. We rigorously establish theoretical results for the proposed method and conduct numerical studies to assess its performance. Our research offers multiple extensions of existing work and enlarges the scope of high-dimensional data analysis. The proposed method capitalizes on the connections among useful regression settings and offers a computationally efficient screening procedure. Our method can be applied to different situations with large-scale data, including genomic data.

authors

Carroll, Raymond

published proceedings

Biometrics

altmetric score

2

author list (cited authors)

Yi, G. Y., He, W., & Carroll, R. J.

citation count

0

complete list of authors

Yi, Grace Y||He, Wenqing||Carroll, Raymond J

publication date

September 2022

publisher

Wiley Publisher

published in

Biometrics Journal

keywords

Case-control Data Analysis
Censored Data
Cox Proportional Hazards Model
Genome
Genomics
High-dimensional Covariates
Large Size
Proportional Hazards Models
Sample Size
Screening Analysis

PubMed ID (PMID)

33881782

Digital Object Identifier (DOI)

10.1111/biom.13479

start page

894

end page

907

volume

78

issue

3

URL

http://dx.doi.org/10.1111/biom.13479

publication type

Academic Article
