Kaggle Competition for Multi-label Classification of Cell Organelles in Proteome-Scale Human Protein Atlas Data

Interview with Professor Emma Lundberg


The Cell Atlas, a part of the Human Protein Atlas (HPA), was created by the group of Prof. Emma Lundberg at SciLifeLab, KTH Royal Institute of Technology, in Stockholm, Sweden. She is currently a visiting professor at Stanford University with support from the Chan Zuckerberg Initiative. The Cell Atlas was created, in large part, using data acquired with Leica confocal instruments.

On the occasion of the Kaggle competition "Human Protein Atlas Image Classification", Prof. Lundberg gave an interview to Dr. Constantin Kappel of Leica Microsystems.

The scope of an -omics scale project like the Human Protein Atlas (HPA) is obviously enormous. Can you share some of the history, especially the early beginnings, of the HPA?

The HPA project started back in 2003, and we started work on the Cell Atlas in 2006. The Genome Project was finished in 2001, and with the knowledge that humans have approximately 20,000 genes, Prof. Mathias Uhlén decided to start a project aimed at characterizing the proteins that these genes encode. This is important knowledge, because proteins are a closer proxy for function. By identifying the context in which the proteins are expressed, that is, in which organs, cells and organelles, we can start to understand their function.

Initial efforts were directed towards generating a proteome-wide collection of antibodies and validating their specificity. To pull off such a mammoth task, early efforts aimed at establishing robust automated platforms and protocols for performing immunoassays at large scale.

Learn more: Kaggle and HPA

By sponsoring this competition, Leica Microsystems is able to contribute both to extending and improving biological knowledge and to building tools for precisely analyzing the vast amount of data being created.


What are the potential applications for the vast knowledge and data that are being created with the HPA?

The Human Protein Atlas database has over 250,000 visits per month from researchers around the world. The data is used for everything from basic cell biology and systems biology to studies aimed at understanding and preventing disease and developing better drugs. For the HPA Cell Atlas in particular, the subcellular distribution of proteins is something that cannot be reliably inferred from sequencing. This data is therefore highly complementary to sequencing-based studies aimed at understanding cellular functions.

How will the “image classifier”, built during this competition, help to foster new insights into cellular biology?

A current bottleneck in our work is the reliable classification of the subcellular distribution of proteins in our images. Several factors make this task complex. We know that about half of all proteins are localized to multiple compartments in the cell, and we are also using a range of different cell types with different morphologies. There is also a significant class imbalance, with a mix of very rare and highly common patterns in the images. With the Kaggle challenge, we hope to obtain a robust classifier that can assign the subcellular location(s) of proteins in all the different cell types. This classifier will not only relieve us of time-consuming manual pattern classification, but also provide opportunities for improved analysis of the cellular architecture.

Once a successful “image classifier”, along with the HPA, is publicly available, what would be the next steps?

The next computational problem to tackle would be not only to identify the patterns, but actually to segment them to allow better subsequent analysis of the data. Segmenting a single fine pattern is a challenging task, and segmenting mixed patterns even more so. Being able to segment the different organelles more efficiently would be a great leap forward towards quantitative measurements of differences in protein localization upon perturbation and a quantitative understanding of the cell as a system.

What do you see as the future of imaging for cellular biology?

Recent developments in machine learning and novel methods for large-scale, high-parametric imaging will greatly influence the future of imaging for cellular biology. Developments in computational imaging will improve resolution over large fields of view, and "label-free" applications, where structures of the cell are predicted, will enable minimally invasive and high-parametric measurements of live cells. Another area of interest is highly multiplexed technologies, where hundreds of proteins or RNAs can be visualized in the same sample. I believe that these developments, together with the establishment of open-access image repositories and improved computational models for quantifying spatial patterns and environments, will position imaging as a key technology in the quest to characterize all human cells.

What are your expectations when collaborating with a company on research projects like this one?

Collaborative projects between academia and industry are greatly needed to advance science. Our lab develops pipelines for automated feedback microscopy, and in this case we depend on good communication with the microscope provider and, ideally, on being included in the company's development plans to maximize synergy.
