Merging nucleus datasets by correlation-based cross-training.

abstract

Fine-grained nucleus classification is challenging because of high inter-class similarity and intra-class variability. Therefore, a large amount of labeled data is required to train effective nucleus classification models. However, it is difficult to label a large-scale nucleus classification dataset comparable to ImageNet in natural images, because high-quality nucleus labeling requires specific domain knowledge. In addition, the existing publicly available datasets are often inconsistently labeled, with divergent labeling criteria. Because of this inconsistency, conventional models have to be trained on each dataset separately and work independently to infer their own classification results, which limits their classification performance. To fully utilize all annotated datasets, we formulate the nucleus classification task as a multi-label problem with missing labels, so that all datasets can be handled in a unified framework. Specifically, we merge all datasets and combine their labels as multiple labels. Thus, each sample has one ground-truth label and several missing labels. We devise a base classification module that is trained using all data but sparsely supervised by the ground-truth labels only. We then exploit the correlation among different label sets with a label correlation module. By doing so, we obtain two trained basic modules and further cross-train them with both ground-truth labels and pseudo labels for the missing ones. Importantly, data without any ground-truth labels can also be used in our framework, since we can regard them as data with all labels missing and generate the corresponding pseudo labels. We carefully re-organized multiple publicly available nucleus classification datasets, converted them into a uniform format, and tested the proposed framework on them. Experimental results show substantial improvement compared to the state-of-the-art methods.
The code and data are available at https://w-h-zhang.github.io/projects/dataset_merging/dataset_merging.html.
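
The multi-label formulation with missing labels described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows how samples from several datasets might be merged into one label matrix, and how a classifier could be supervised by the observed labels only. The names `MISSING`, `merge_labels`, and `masked_cross_entropy` are hypothetical, and the label correlation module, pseudo labels, and cross-training are not covered.

```python
import numpy as np

MISSING = -1  # hypothetical sentinel: no annotation under this label set


def merge_labels(dataset_labels):
    """Stack per-dataset labels into one (n_samples, n_label_sets) matrix.

    dataset_labels[k] holds class indices for the samples of dataset k,
    annotated under label set k only; every other column is MISSING.
    """
    n_sets = len(dataset_labels)
    blocks = []
    for k, labels in enumerate(dataset_labels):
        block = np.full((len(labels), n_sets), MISSING, dtype=np.int64)
        block[:, k] = labels
        blocks.append(block)
    return np.concatenate(blocks, axis=0)


def masked_cross_entropy(logits_per_set, merged):
    """Mean cross-entropy over observed (sample, label set) pairs only."""
    total, count = 0.0, 0
    for k, logits in enumerate(logits_per_set):
        observed = merged[:, k] != MISSING
        if not observed.any():
            continue
        z = logits[observed]
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        targets = merged[observed, k]
        total -= log_probs[np.arange(len(targets)), targets].sum()
        count += len(targets)
    return total / max(count, 1)


# Two toy datasets: label set 0 has 3 classes, label set 1 has 2 classes.
merged = merge_labels([np.array([0, 2]), np.array([1])])
logits = [np.zeros((3, 3)), np.zeros((3, 2))]
loss = masked_cross_entropy(logits, merged)
```

In this sketch, a sample with no annotation at all would simply be a row filled with `MISSING`; it contributes nothing to the loss until pseudo labels are supplied for it.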

authors

Wang, Wenping

published proceedings

Med Image Anal

author list (cited authors)

Zhang, W., Zhang, J., Wang, X., Yang, S., Huang, J., Yang, W., Wang, W., & Han, X.

citation count

0

complete list of authors

Zhang, Wenhua; Zhang, Jun; Wang, Xiyue; Yang, Sen; Huang, Junzhou; Yang, Wei; Wang, Wenping; Han, Xiao

publication date

February 2023

publisher

Elsevier Publisher

published in

Medical Image Analysis Journal
