Merging nucleus datasets by correlation-based cross-training. Academic Article uri icon

abstract

  • Fine-grained nucleus classification is challenging because of the high inter-class similarity and intra-class variability. Therefore, a large number of labeled data is required for training effective nucleus classification models. However, it is challenging to label a large-scale nucleus classification dataset comparable to ImageNet in natural images, considering that high-quality nucleus labeling requires specific domain knowledge. In addition, the existing publicly available datasets are often inconsistently labeled with divergent labeling criteria. Due to this inconsistency, conventional models have to be trained on each dataset separately and work independently to infer their own classification results, limiting their classification performance. To fully utilize all annotated datasets, we formulate the nucleus classification task as a multi-label problem with missing labels to utilize all datasets in a unified framework. Specifically, we merge all datasets and combine their labels as multiple labels. Thus, each data has one ground-truth label and several missing labels. We devise a base classification module that is trained using all data but sparsely supervised by the ground-truth labels only. We then exploit the correlation among different label sets by a label correlation module. By doing so, we can have two trained basic modules and further cross-train them with both ground-truth labels and pseudo labels for the missing ones. Importantly, data without any ground-truth labels can also be involved in our framework, as we can regard them as data with all labels missing and generate the corresponding pseudo labels. We carefully re-organized multiple publicly available nucleus classification datasets, converted them into a uniform format, and tested the proposed framework on them. Experimental results show substantial improvement compared to the state-of-the-art methods. The code and data are available at https://w-h-zhang.github.io/projects/dataset_merging/dataset_merging.html.

published proceedings

  • Med Image Anal

author list (cited authors)

  • Zhang, W., Zhang, J., Wang, X., Yang, S., Huang, J., Yang, W., Wang, W., & Han, X.

citation count

  • 0

complete list of authors

  • Zhang, Wenhua||Zhang, Jun||Wang, Xiyue||Yang, Sen||Huang, Junzhou||Yang, Wei||Wang, Wenping||Han, Xiao

publication date

  • February 2023