An Introduction to the Discovery Tools

FlowJo v10 has released some new tools for use with high-parameter data! If you are working with 10+ colors, these tools are must-haves. We at FlowJo refer to them collectively as “discovery tools.” They ship with the FlowJo v10.2 installers and are placed in a folder labeled “plugins.” Once FlowJo is open, the plugins can be found on the Workspace tab, in the Population band, within the Plugins drop-down menu (see Figure 1).

Figure 1. Location of the new discovery tools.

Why are they called discovery tools? Because they attempt to find or “discover” populations without—or with very little—human guidance. The toolset includes a downsampler, a dimensionality reduction tool, two clustering algorithms, and a plugin for querying a database. This article introduces you to these tools and their functions. 

The downsampler is perhaps the most straightforward. This plugin simply creates a population with a reduced number of events [1]. For example, if you have 90,000 events and you want to reduce that population to 5,000 events with the same representative structure, the downsample plugin can produce the 5,000-cell population. The dimensionality reduction and clustering algorithms tend to be computationally intensive, so reducing the number of cells before using them in downstream applications saves you time.
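To make the idea concrete, here is a minimal Python sketch of downsampling. It assumes uniform random sampling without replacement, which may not be how the plugin itself chooses events; the `downsample` function and the synthetic event data are hypothetical and only illustrate the 90,000 to 5,000 reduction described above.

```python
import random

def downsample(events, target_size, seed=0):
    """Return a uniform random subset of `events` with `target_size` rows.

    Assumption: uniform sampling without replacement. The FlowJo plugin's
    actual sampling strategy is not described in this article.
    """
    if target_size >= len(events):
        return list(events)  # nothing to reduce
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return rng.sample(events, target_size)

# Hypothetical data: 90,000 events, each with three made-up parameter values.
data_rng = random.Random(1)
events = [
    (data_rng.gauss(0, 1), data_rng.gauss(5, 2), data_rng.gauss(10, 3))
    for _ in range(90_000)
]

subset = downsample(events, 5_000)
print(len(events), "->", len(subset))
```

Because every event has the same chance of being kept, large populations stay proportionally large in the subset, while very rare populations may end up with only a handful of events.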

Dimensionality reduction is a method for viewing and analyzing data in a low-dimensional space (e.g., 2 dimensions) rather than in the original high-dimensional space (e.g., 20 dimensions). We are used to seeing and interacting with a three-dimensional world, but pile on extra dimensions, like time or hyperspace, and suddenly our ability to clearly interpret objects is reduced. The algorithm allows you to see distinct populations of cells in a single two-dimensional plot, as opposed to having to go through every pairwise 2D plot in a multi-color experiment (20 parameters already give 190 of them). The tool that FlowJo v10 provides is known as t-distributed Stochastic Neighbor Embedding (tSNE; pronounced “tee-snee”) [2,3].
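For readers who want the underlying math, the original t-SNE paper (reference 2) defines the embedding roughly as follows. This summarizes the published formulation, not necessarily the exact settings used by the FlowJo plugin. Here $x_i$ is a cell's measured high-dimensional vector, $y_i$ is its position in the 2D plot, $N$ is the number of cells, and $\sigma_i$ is a per-cell bandwidth derived from the perplexity setting.

```latex
\begin{align}
p_{j|i} &= \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)} \\
p_{ij}  &= \frac{p_{j|i} + p_{i|j}}{2N} \\
q_{ij}  &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
                {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}} \\
C       &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
\end{align}
```

The algorithm moves the points $y_i$ to minimize $C$, so cells that are close neighbors in the full marker space stay close in 2D. The heavy-tailed Student t kernel in $q_{ij}$ lets dissimilar cells sit far apart, which tends to produce visually separated islands.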

The output of the tSNE algorithm is two derived parameters: tSNE-X and tSNE-Y. The reduced-space plot is viewed along these axes. These parameters can be gated just like any other (Figure 2), and their contents explored in the Layout Editor (Figure 3).

Figure 2. tSNE parameters viewed in a graph window, both as a pseudocolor dot plot and smoothed pseudocolor plot.

Figure 3. Identifying grouped populations of cells using a gated overlay in the Layout Editor.

The clustering algorithms Spanning-tree Progression Analysis of Density-normalized Events (SPADE) [4] and FlowMeans [5] are also included with the latest installers. These algorithms attempt to identify populations in complex data sets. In addition, SPADE provides information about the relationships between the populations discovered. Both of these tools produce clusters, or populations, in the FlowJo workspace, as if they had been derived from gates. SPADE also outputs PDFs of the spanning trees for each marker. The heatmapping of the spanning tree is based on the median fluorescence intensity (MFI) or coefficient of variation (CV) of a marker, population counts, and other factors (Figure 4).
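As a quick reminder of what the two heatmap statistics mean, the sketch below computes them for a single cluster. The intensity values are made up, and the CV here uses the population standard deviation; whether the plugin uses the sample or population form is an assumption on our part.

```python
import statistics

def mfi(intensities):
    """Median fluorescence intensity of one marker in one cluster."""
    return statistics.median(intensities)

def cv_percent(intensities):
    """Coefficient of variation as a percentage: 100 * SD / mean.

    Assumption: population standard deviation (pstdev) is used.
    """
    return 100.0 * statistics.pstdev(intensities) / statistics.mean(intensities)

# Hypothetical intensities for one marker in one cluster.
cluster_intensities = [120.0, 150.0, 135.0, 160.0, 110.0]

print("MFI:", mfi(cluster_intensities))
print("CV (%):", round(cv_percent(cluster_intensities), 2))
```

The median is less sensitive to a few very bright events than the mean, which is one reason it is a common summary for fluorescence data.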

In general, FlowMeans is faster than SPADE and does well with dense, round populations. In contrast, SPADE is a bit more complex and therefore slower, but is good at retaining rare populations.
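For intuition about why round, dense populations are the easy case, here is a minimal sketch of plain k-means, the building block that FlowMeans starts from before merging clusters [5]. This is not the FlowMeans algorithm itself and not the plugin's code; the `kmeans` function and the two synthetic populations are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on a list of coordinate tuples (illustration only)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen events
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each event joins its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers

# Two hypothetical round populations in a 2-parameter space.
data_rng = random.Random(42)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(200)]
blob_b = [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(200)]

labels, centers = kmeans(blob_a + blob_b, k=2)
print("cluster sizes:", labels.count(0), labels.count(1))
```

Because each event is assigned to its nearest center by straight-line distance, k-means implicitly favors compact, roughly spherical clusters.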

Figure 4. Output from the SPADE plugin. FlowMeans and SPADE both produce populations in the workspace (top panel; FlowMeans not shown) that can be further subgated and used to derive statistics. SPADE also includes heatmapped spanning trees as a visualization aid (bottom panel).

The final tool in the box is the CellOntology plugin. This plugin allows you to define a cell or population of cells based on its “+” and “-” phenotype (sorry, no lo, mid, or hi definitions at this time) and query the cell ontology database [6]. If your cell/population of interest has previously been described, the database should return a match and a hierarchical tree diagram that can be viewed in the Layout Editor. The results of the database query are placed in a folder labeled “CellOntology,” which is created wherever your workspace is stored. The cell name as determined by the database is indicated in the tree diagram (Figure 5), and can also be found in the CSV file produced by the query.
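To illustrate what a “+”/“-” phenotype definition looks like as data, the following sketch parses a marker string into a lookup table. The `parse_phenotype` function is hypothetical and does not reflect how the plugin or the ontology database represents queries; it only shows why intermediate levels such as lo or hi do not fit a two-state definition.

```python
def parse_phenotype(text):
    """Turn a string like 'CCR7+,CD45+,CD8+' into {marker: is_positive}.

    Hypothetical helper for illustration; not part of FlowJo or the plugin.
    """
    phenotype = {}
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        marker, sign = token[:-1], token[-1]
        if sign not in ("+", "-") or not marker:
            raise ValueError(f"expected a marker followed by '+' or '-': {token!r}")
        phenotype[marker] = sign == "+"
    return phenotype

query = parse_phenotype("CCR7+,CD45+,CD8+")
print(query)
```

A token such as `CD4lo` raises a `ValueError` in this sketch, mirroring the restriction to positive and negative definitions mentioned above.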

Figure 5. Output from the CellOntology plugin. This hierarchical tree diagram defines the queried population and its location in hematopoietic progression (CCR7+,CD45+,CD8+ cell).

Stay tuned for a tutorial on a general “discovery” workflow.

Have questions or comments? Please send us an email at techsupport [at] flowjo [dot] com.

References.

  1. FlowJo, LLC. 2015. A downsample plugin for FlowJo v10. [Computer software], Ashland, Oregon.

  2. L.J.P. van der Maaten and G.E. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.

  3. L.J.P. van der Maaten. Accelerating t-SNE using tree-based algorithms. Journal of Machine Learning Research, 15(Oct):3221–3245, 2014.

  4. Qiu P, Simonds EF, Bendall SC, et al. Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE. Nature Biotechnology. 2011;29(10):886-891. doi:10.1038/nbt.1991.

  5. Aghaeepour N, Nikolic R, Hoos HH, Brinkman RR. Rapid Cell Population Identification in Flow Cytometry Data. Cytometry Part A: The Journal of the International Society for Analytical Cytology. 2011;79(1):6-13. doi:10.1002/cyto.a.21007.

  6. Bard J, Rhee SY, Ashburner M. An ontology for cell types. Genome Biology. 2005;6(2):R21. doi:10.1186/gb-2005-6-2-r21.