Integrative Open-Source Software for Image Analysis in Biology

November 28, 2012

Imaging techniques are indispensable in many fields of life sciences today. With state-of-the-art optics and metrology, they provide hundreds of gigabytes of still images and videos. Correspondingly, there is a growing need for complex software solutions to ensure that the amounts of generated data can be automatically managed, processed and analyzed – and shared online with a large group of users. The combination of individual open-source software projects is proving especially useful for solving such complex image analysis problems.

Overview

There are many different applications for imaging techniques in life sciences: they quantify and localize signaling proteins, measure dynamic changes of entire cell structures, track cancer cell growth in time-lapse recordings, or distinguish the phases of the cell life cycle. Automatic image analysis enables huge amounts of data to be processed within a short time and ensures reproducible results.

The last few years have seen a remarkable number of new platforms and software packages for processing biological image data. Until recently, the proprietary file formats of the image analysis software supplied with microscopes made it difficult to use open-source platforms. However, open-source applications are now getting off the ground, partly because freely available program libraries make it possible to convert almost any image format into standard formats. The BioFormats library [1], for example, reads approximately 125 image formats of various microscope manufacturers, including Zeiss LSM, Metamorph Stack, Leica LCS LEI and DICOM.

Modern software for processing and analyzing biological image data has to meet a wide range of specifications. Besides being able to process a gigantic amount of heterogeneous image data, it must be easy to use. However, open-source tools are often designed to solve highly specific problems and can therefore satisfy only certain aspects of the requirements made of them.

Even applications developed for universal use have frequently been created within the framework of specific projects and are found wanting when required to perform other image analysis tasks. This makes it very difficult to choose a suitable application for a particular image analysis problem. There is so much freely available software that it is practically impossible for anyone without expert knowledge to know which to use in any one case. In fact, it is usually necessary to use a combination of different programs in order to solve all the aspects of the image analysis task in hand.

With the combination of different open-source projects, a new and increasingly important requirement of image analysis software is emerging: interoperability [2, 3]. This is understood as the capability of heterogeneous, independent applications to be used together without major restrictions. One sign of the growing significance of interoperability in open-source development is the effort made in many software projects to support cooperation and shared interfaces or program libraries.

A program library usually consists of basic data structures and algorithms that software developers use to design software. These "toolsets" enable swift and flexible integration and testing of new functions. If two applications are designed using the same program library, it is often easier to accomplish their integration and joint use.

Image databases

The breathtaking pace of developments in optical imaging techniques for studying complex biological processes in ever-increasing resolution and dimensionality is leading to a phenomenal increase in the amount of new image data. Experiments are often restricted simply because the available hardware lacks the capacity to store and manage the data.

Today, therefore, large amounts of scientific image data are stored in image databases. These serve as integrative platforms, frequently offering functions for managing, exploring, processing and analyzing the experiment data [2]. Besides the actual images, they also handle the corresponding metadata for more precise descriptions of the images and experiments. Two popular web-based open-source image databases are Open Microscopy Environment Remote Objects (OMERO) [4] and the Bio-Image Semantic Query User Environment (BISQUE) [5].

Program libraries

Recently, the open-source library ImgLib2 [6] was published. It uses special algorithms and data structures to store image data efficiently in memory for fast access. In particular, ImgLib2 enables algorithms to be developed independently of dimensionality (one-, two-, three-, n-dimensional), image type (8-bit, 12-bit, 16-bit, etc.) and memory strategy (main memory, hard disk, external server). This can substantially reduce the implementation effort. Many significant projects, such as KNIME [7], OMERO [4], Fiji [8] and ImageJ2 (www.imagej-dev.org), are already using ImgLib2 as a basic or additional library. The shared use of the same basic libraries increases the desired interoperability of the various software packages.
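The idea of writing an algorithm independently of dimensionality can be illustrated with a minimal sketch. It is written in plain Python and does not use the ImgLib2 API; the class NDImage and the function threshold are hypothetical names chosen for this illustration only. Pixel storage is hidden behind get and set, so the same algorithm would also work if the values lived on disk or on a server.

```python
from itertools import product
from typing import Dict, Iterator, Sequence, Tuple

Position = Tuple[int, ...]


class NDImage:
    """Hypothetical n-dimensional image: pixel values keyed by integer position."""

    def __init__(self, dims: Sequence[int], fill: float = 0.0) -> None:
        self.dims = tuple(dims)
        # Storage detail (here: a dict in main memory) is private to the class.
        self._data: Dict[Position, float] = {p: fill for p in self.positions()}

    def positions(self) -> Iterator[Position]:
        # Visit every pixel position, whatever the number of dimensions.
        return product(*(range(d) for d in self.dims))

    def get(self, pos: Position) -> float:
        return self._data[pos]

    def set(self, pos: Position, value: float) -> None:
        self._data[pos] = value


def threshold(image: NDImage, cutoff: float) -> NDImage:
    """Binarize an image; the code never mentions the number of dimensions."""
    result = NDImage(image.dims)
    for pos in image.positions():
        result.set(pos, 1.0 if image.get(pos) > cutoff else 0.0)
    return result


# The same function handles a 1D line profile and a 3D volume.
line = NDImage([4])
line.set((2,), 9.0)
volume = NDImage([2, 2, 2])
volume.set((1, 0, 1), 9.0)

for image in (line, volume):
    binary = threshold(image, 5.0)
    print(image.dims, sum(binary.get(p) for p in binary.positions()))
```

Because threshold only relies on positions, get and set, nothing in it has to change when a new image shape or storage backend is introduced. This is the kind of saving in implementation effort the paragraph above refers to.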

Open-source image analysis libraries that are not based on Java, such as VTK and ITK, consist of a large number of functions for reading in, processing and outputting data. Used in combination, they provide a flexible solution for a wide variety of image analysis problems. Whereas VTK was primarily designed for the visualization of 2D and 3D image data, ITK is mainly intended for direct image processing and analysis.

Analysis software

Imaging for evaluating and displaying biological structures and dynamic processes is making more and more demands on image analysis software. The quantification of thousands of microscopic images, sample screenings, fluorescence lifetime measurements and many other biological investigations creates a proliferation of image data requiring objective and reproducible evaluation. Therefore, increasing use is being made of segmentation, object tracking, machine learning or visualization algorithms. Widely used open-source solutions that support this include 3D Slicer [9], BioImageXD [10], CellProfiler [11], Fiji [8], FluoRender [12], Icy [13], Image Surfer [14], IMOD [15], KNIME [7], OsiriX [16] and Reconstruct [17].

In this context, ImageJ (originally NIH Image) plays a special role among open-source programs for biological image processing [2, 18–20]. It is the most widely known and frequently used bioimage analysis tool. One of the main reasons for its success is its modular design, realized by a plug-in mechanism. This plug-in mechanism not only allows the compilation of user-specific ImageJ versions, but also enables software developers to design algorithms without particularly detailed knowledge of the native ImageJ programming interfaces. Several hundreds of plug-ins and macros have been published in the last few years for solving a wide variety of image analysis problems.

However, it can be difficult for end users to decide on the right ImageJ plug-in for their specific application as there are so many to choose from and many are only marginally different. For this reason, Fiji offers a version of ImageJ that is tailored to the needs of the bioimaging community in that it has special plug-ins and macros for analyzing microscope images. These plug-ins can be easily and transparently managed via an integrated update mechanism.

Another well-known and flexible open-source image analysis software program is CellProfiler [11]. Complex image processing pipelines can be mapped by linking different modules. Many different problems have already been solved with the software, especially in the field of high-content screening.

The further analysis, evaluation and publication of image analysis results often call for visual editing of both statistical data and processed image data. The sheer number of visualization options for potentially multidimensional and sometimes extremely large image data alone requires special methods and software [21].

Fig. 1: After segmentation of the two channels (nucleus, cytoplasm) with standard methods (left box), numeric properties for each cell are calculated (center box), enabling cell classification (right box).

Integration platforms

As a monolithic platform, ImageJ already offers a wealth of functions for analyzing biological image data. In view of the highly varied nature of the problems, however, it often makes sense to combine various applications. One example is the analysis of the publicly available image data set of Ilya Ravkin ("human cytoplasm-nucleus translocation assay", Broad Bioimage Benchmark Collection – www.broadinstitute.org/bbbc). Here it may be necessary to first read in image data from OMERO, then use a combination of ImageJ filters to pre-process the images and segmentation techniques from ImgLib, and finally apply classification algorithms of the external WEKA Data Mining Library (www.cs.waikato.ac.nz/ml/weka) to classify the resulting segmentation (Figure 1).
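The sequence of segmentation, feature calculation and classification described here can be sketched in a few lines. The following example is a simplified illustration in plain Python; it does not use OMERO, ImageJ, ImgLib or WEKA. All function names, the toy image data and the cutoff values are assumptions made for this sketch, and the fixed cutoff rule merely stands in for a trained classifier.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[float]]
Pixel = Tuple[int, int]


def segment(image: Image, cutoff: float) -> List[List[Pixel]]:
    """Threshold the image and group bright pixels into 4-connected regions."""
    height, width = len(image), len(image[0])
    seen = set()
    regions = []
    for y in range(height):
        for x in range(width):
            if image[y][x] <= cutoff or (y, x) in seen:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(y, x)])
            seen.add((y, x))
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] > cutoff and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def mean_intensity(image: Image, region: List[Pixel]) -> float:
    """Numeric property of one region: mean pixel value in the given channel."""
    return sum(image[y][x] for y, x in region) / len(region)


def classify_cells(nuclei_channel: Image, protein_channel: Image,
                   nucleus_cutoff: float, feature_cutoff: float) -> List[str]:
    """Label each nucleus by the protein signal measured inside it."""
    labels = []
    for region in segment(nuclei_channel, nucleus_cutoff):
        feature = mean_intensity(protein_channel, region)
        # Stand-in for a trained classifier: a single fixed cutoff.
        labels.append("nuclear" if feature > feature_cutoff else "cytoplasmic")
    return labels


# Toy data: two nuclei; the protein signal is high only inside the first one.
nuclei = [[0, 9, 9, 0, 0, 0],
          [0, 9, 9, 0, 9, 9],
          [0, 0, 0, 0, 9, 9]]
protein = [[1, 8, 8, 1, 5, 5],
           [1, 8, 8, 1, 2, 2],
           [1, 1, 1, 1, 2, 2]]

print(classify_cells(nuclei, protein, nucleus_cutoff=5.0, feature_cutoff=5.0))
```

In a real analysis each of the three steps would typically come from a different tool, which is exactly why the hand-over of data between them becomes the critical point.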

Manual data transfer between the individual applications or program libraries may prove time-consuming, complex and error-prone. In addition, such combinations of several open-source tools are often hard to document and incomprehensible for other users. This explains the growing popularity of integrative workflow systems for easier mapping and orchestration of complex processing sequences in image processing and analysis.

Workflow-based platforms make it easy to configure complex analysis pipelines using standardized dialogs and to map them in an understandable visual form, while still giving software developers a high degree of flexibility.

With some open-source workflow systems like Taverna (www.taverna.org.uk) or Galaxy (www.galaxy.psu.edu), the focus is on algorithms of traditional bioinformatics. The Konstanz Information Miner (KNIME – www.knime.org) [7], on the other hand, offers not only integrations for text and network mining, etc., but also interfaces to ImageJ, ImageJ2, ImgLib2, BioFormats, Weka and OMERO, and thus supports image analysis as well.

KNIME is a user-friendly open-source integration, processing, analysis and exploration platform that is capable of processing large amounts of heterogeneous data sets. The platform allows visual modeling of workflows, as can be seen in Figure 1. The smallest functional unit of a workflow is called a node. Properties of particular relevance for image processing are:

  • Visual modeling of complex workflows: Once created, a workflow can be easily modified, reconfigured, adapted and shared with other users. Results can be quickly reproduced and validated by other users.
  • Caching: Thanks to the intelligent caching strategy, thousands of images can be processed with one single workflow and limited hardware resources.
  • Integration platform: External program libraries and tools can be flexibly integrated and provided as nodes. Both the software platform itself and the software community already offer integrations for a multitude of programs from a great variety of domains. The combination of the extensions soon leads to highly complex workflows.
  • Easy extensibility and modularity: Processing units (KNIME nodes) are not interdependent, i.e. each node can be separately developed and improved, and new nodes can be added (e.g. via the internal plug-in mechanism) without influencing existing functionalities.
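The properties listed above can be made more concrete with a small sketch of a node-based workflow. It is written in plain Python and is not based on the KNIME API; the classes Node and Workflow and the two example nodes are hypothetical. Caching is reduced here to re-using a node's stored output for an input it has already seen, which is a simplification and not a description of KNIME's actual caching strategy.

```python
from typing import Any, Callable, Dict, List


class Node:
    """Hypothetical processing unit: wraps one function and caches its results."""

    def __init__(self, name: str, func: Callable[[Any], Any]) -> None:
        self.name = name
        self.func = func
        self.runs = 0  # how often the wrapped function was really executed
        self._cache: Dict[Any, Any] = {}

    def execute(self, data: Any) -> Any:
        # Re-use the stored result when the same (hashable) input arrives again.
        if data not in self._cache:
            self._cache[data] = self.func(data)
            self.runs += 1
        return self._cache[data]


class Workflow:
    """Linear chain of nodes; each node only sees the output of its predecessor."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes

    def run(self, data: Any) -> Any:
        for node in self.nodes:
            data = node.execute(data)
        return data


# Two independent nodes operating on a row of pixel values (a tuple).
invert = Node("invert", lambda px: tuple(255 - v for v in px))
binarize = Node("binarize", lambda px: tuple(1 if v > 100 else 0 for v in px))

workflow = Workflow([invert, binarize])
result = workflow.run((10, 200, 90))
print(result)

# Running the workflow again on the same input re-uses the cached results.
workflow.run((10, 200, 90))
print(invert.runs, binarize.runs)
```

A further node can be appended to the list passed to Workflow without touching invert or binarize, which mirrors the modularity described in the last bullet point.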

Conclusion

Due to the advance of technology and the new possibilities it creates, researchers increasingly have to familiarize themselves with widely differing image processing and analysis software programs. Fortunately, the advance of technology is being paralleled by further development of the available software. The last few years have seen an increase in user-friendliness as well as in the range of functions. The previous tendency to design single, insular platforms for solving specific problems is gradually being replaced by a trend to combine and integrate different projects, as the open-source community has realized the great need for, and benefit of, closer cooperation among the separate projects (Figure 2).

This cooperative approach not only enhances the interoperability of different applications, but also provides users and software developers with new ways of analyzing biological image data.

Fig. 2: Schematic integration of various open-source software packages for the analysis of image data.

References

  1. Linkert M et al.: Metadata matters: access to image metadata in the real world. J Cell Biology 189 (2010).
  2. Eliceiri KW et al.: Biological imaging software tools. Nature Methods 9 (2012).
  3. Carpenter AE et al.: A call for bioimaging software usability. Nature Methods 9 (2012).
  4. Allan C et al.: OMERO: flexible, model-driven data management for experimental biology. Nature Methods 9 (2012).
  5. Kvilekval K et al.: Bisque: a platform for bioimage analysis and management. Bioinformatics 26 (2010).
  6. Preibisch S et al.: Into ImgLib – Generic Image Processing in Java. ImageJ User and Developer Conference (2010).
  7. Berthold MR et al.: KNIME: The Konstanz Information Miner. Springer (2007).
  8. Schindelin J et al.: Fiji: an open-source platform for biological-image analysis. Nature Methods 9 (2012).
  9. Pieper S et al.: From Nano to Macro. Proceedings of the 3rd IEEE International Symposium on Biomedical Imaging (2006).
  10. Kankaanpää P et al.: BioImageXD: an open general-purpose and high-throughput image-processing platform. Nature Methods 9 (2012).
  11. Carpenter AE et al.: CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biology 7 (2006).
  12. Wan Y et al.: Proceedings of IEEE Pacific Visualization Symposium (2012).
  13. de Chaumont F et al.: Icy: an open bioimage informatics platform for extended reproducible research. Nature Methods 9 (2012).
  14. Feng D et al.: Stepping into the third dimension. J Neuroscience 27 (2007).
  15. Kremer JR et al.: Computer visualization of three-dimensional image data using IMOD. J. Struct. Biology 116 (1996).
  16. Rosset A et al.: OsiriX: An open-source software for navigating in multidimensional DICOM images. Journal Digital Imaging 17 (2004).
  17. Fiala JC: Reconstruct: a free editor for serial section microscopy. J. Microscopy 218 (2005).
  18. Schneider CA et al.: NIH Image to ImageJ: 25 years of image analysis. Nature Methods 9 (2012).
  19. Abramoff M et al.: Image processing with ImageJ. Biophotonics International 11 (2004).
  20. Collins TJ: ImageJ for microscopy. Biotechniques 43 (2007).
  21. Walter T: Visualization of image data from cells to organisms. Nature Methods 7 (2010).

This article was originally published in German in: BioPhotonik 3 (2012) 38–40.
English reprint with kind permission from AT-Fachverlag GmbH
