Self-supervised learning discovers novel morphological clusters linked to patient outcome and molecular phenotypes
Histopathological images are the largest source of data for studying tumour phenotypes, from the morphology of individual cell types to complex microenvironment interactions. Deep learning holds promise for better understanding these phenotypes and for improving patient diagnosis and treatment. Despite recent progress, most methods rely on supervised paradigms whose success depends on association with a restricted set of labels, which limits their impact in clinical applications and basic discovery. We present Histomorphological Phenotype Learning (HPL), a methodology that extracts tissue patterns through self-supervised learning and community detection. HPL led to the de novo discovery of histomorphological phenotype clusters, which are tissue regions with common morphological features. We demonstrate the utility of these clusters in classifying lung adenocarcinoma and squamous cell carcinoma, and in predicting survival and recurrence in lung adenocarcinoma patients. These predictions can be thoroughly explained, because HPL links its clusters to classical morphological characteristics and to a wide range of omics-based molecular phenotype profiles.
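
To make the clustering idea concrete, the following is a minimal sketch of grouping tile representations by community detection on a nearest-neighbour graph. It is an illustration under stated assumptions, not the HPL implementation: the abstract does not name the community detection algorithm, so a simple deterministic label propagation is used here as a stand-in, and the toy 2-D vectors stand in for tile embeddings that would come from a self-supervised encoder. The names `knn_graph`, `label_propagation` and `tile_vectors` are hypothetical.

```python
from collections import Counter


def knn_graph(vectors, k):
    """Build an undirected k-nearest-neighbour graph as {node: set(neighbours)}."""
    n = len(vectors)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        # Squared Euclidean distance to every other vector; ties resolved by index.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j])), j)
            for j in range(n)
            if j != i
        )
        for _, j in dists[:k]:
            graph[i].add(j)
            graph[j].add(i)  # symmetrise: keep an edge if either end selects it
    return graph


def label_propagation(graph, max_iters=100):
    """Stand-in community detection (assumption, not the method used by HPL).

    Each node repeatedly adopts the most common label among its neighbours;
    ties go to the smallest label so the result is deterministic.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[nb] for nb in graph[node])
            best = min(counts, key=lambda lab: (-counts[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Toy "tile embeddings": two well-separated groups of four points each.
tile_vectors = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]

graph = knn_graph(tile_vectors, k=3)
clusters = label_propagation(graph)
print(clusters)
```

In this toy setting, the first four vectors end up sharing one cluster label and the last four another. On real data, each resulting cluster would correspond to a candidate group of tiles with similar morphology, whose frequencies per slide or patient could then be used as features for downstream prediction.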