Dimensionality Reduction with the t-Distributed Stochastic Neighbor Embedding (tSNE) Algorithm

So you’ve got access to a new high-parameter cytometry instrument, and 15+ parameter panels are calling your name. What do you do with all those markers? How do you start discovering the differences between a control and a treated sample in your disease model when you don’t know what phenotypes to look at?

One approach is to use a dimensionality reduction algorithm, which projects an N-dimensional data space into two dimensions while preserving as much of the neighborhood structure of the data as possible.

FlowJo v10 now comes with a dimensionality reduction algorithm plugin called t-Distributed Stochastic Neighbor Embedding (tSNE). The tSNE algorithm computes two new derived parameters from a user-defined selection of cytometric parameters. These tSNE-generated parameters are optimized in such a way that data points that were close together in the raw high-dimensional data remain close together in the reduced data space (Figure 1).

Figure 1. Example of a 15-color flow cytometry panel after tSNE has been used to reduce dimensionality into a 2-dimensional data space. Manually-gated populations of known phenotype were overlaid onto the tSNE space in the FlowJo Layout editor, revealing how distinct phenotypic subsets of events cluster together and are enriched in distinct areas of the continent-like structure.
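The optimization described above can be written down compactly. The block below is the standard tSNE formulation; we assume, without having checked the plugin's source, that FlowJo optimizes this objective, possibly with speed-up approximations. Here x_i is an event's vector of selected parameter values, y_i its two tSNE coordinates, N the number of events, and σ_i a per-event bandwidth chosen so that the perplexity of P_i matches a user-set value.

```latex
p_{j \mid i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                    {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2N}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
\mathrm{Perp}(P_i) = 2^{H(P_i)}, \quad H(P_i) = -\sum_{j} p_{j \mid i} \log_2 p_{j \mid i}
```

Because the KL divergence penalizes placing true neighbors (large p_ij) far apart much more heavily than placing distant events close together, the map mainly preserves local neighborhoods; distances between well-separated "continents" should not be over-interpreted.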

While tSNE is a powerful visualization technique, it has two practical limitations. First, running the algorithm is computationally expensive, so you must limit the number of events fed into it for the calculation to finish in a reasonable period of time. Second, its output is non-deterministic and depends on the input data, so if you run the algorithm more than once (on two separate samples, for example), each run produces its own 2-dimensional data space, and the resulting maps cannot be compared directly.
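To make the non-determinism concrete, here is a minimal exact tSNE written in NumPy from the standard formulation. It is an illustrative sketch, not FlowJo's implementation: the function names are hypothetical, and it omits refinements such as early exaggeration and Barnes-Hut acceleration that production implementations commonly use. In this sketch the only random ingredient is the starting layout, so the same seed reproduces a map while a different seed yields different coordinates for the very same events.

```python
import numpy as np


def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    s = np.sum(Z * Z, axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * Z @ Z.T, 0.0)


def joint_affinities(X, perplexity, tol=1e-5, max_steps=50):
    """Symmetric P matrix; each row's Gaussian bandwidth is binary-searched
    so that the row's entropy (in nats) equals log(perplexity)."""
    n = X.shape[0]
    D = pairwise_sq_dists(X)
    P = np.zeros((n, n))
    target = np.log(perplexity)
    for i in range(n):
        d = np.delete(D[i], i)            # distances to all other events
        beta, lo, hi = 1.0, 0.0, np.inf   # beta = 1 / (2 * sigma_i^2)
        for _ in range(max_steps):
            w = np.exp(-(d - d.min()) * beta)  # shift by min for stability
            p = w / w.sum()
            entropy = -np.sum(p * np.log(p + 1e-12))
            if abs(entropy - target) < tol:
                break
            if entropy > target:          # too spread out: narrow the kernel
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:                         # too peaked: widen the kernel
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return (P + P.T) / (2.0 * n)


def tsne(X, perplexity=5.0, n_iter=300, learning_rate=100.0, seed=0):
    """Exact t-SNE (cost grows with N^2) by gradient descent with momentum."""
    P = joint_affinities(np.asarray(X, dtype=float), perplexity)
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(P.shape[0], 2))  # random start layout
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        num = 1.0 / (1.0 + pairwise_sq_dists(Y))      # Student-t kernel
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        PQ = (P - Q) * num
        # Gradient of KL(P || Q) with respect to each map point y_i.
        grad = 4.0 * ((np.diag(PQ.sum(axis=1)) - PQ) @ Y)
        momentum = 0.5 if it < 100 else 0.8
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    return Y


# Toy data: two well-separated groups of 20 events in 5 parameters.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(6.0, 1.0, (20, 5))])

Y_a = tsne(X, seed=1)
Y_b = tsne(X, seed=1)  # same seed: the map is reproduced
Y_c = tsne(X, seed=2)  # different seed: same events, different coordinates
```

Because the coordinates from one run carry no meaning in another run, embedding two samples separately gives two unrelated maps, whereas embedding one concatenated matrix places every event in a single shared space.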

Therefore, the most effective way to compare samples is to first reduce the number of events in each (a process called downsampling), then merge the samples into a single FCS file (a process called concatenation), and finally perform dimensionality reduction on the concatenated data set. Because every event is embedded in the same run, all samples share one tSNE data space.

During the concatenation step, you have the option to create new derived parameters based on keyword/value pairs, which allows you to gate on and separate out the events from different samples or experimental conditions (such as timepoints, treatment groups, or stimulations) within the common tSNE-generated, dimensionally-reduced data space.

Example Workflow

I have four samples that were treated for different lengths of time with a drug that induces signaling and cytokine expression. My plan is to:

A. Clean up the data—The best analyses begin with clean raw data, so I will first apply manual gates to exclude doublets, debris, and dead cells from each sample. This step reduces noise in the data and can improve the tSNE output.

B. Downsample—Computation time for a tSNE calculation grows with the number of events fed into the algorithm. As a result, running the calculation on a gated population of 20,000 events (instead of 50,000 or 100,000 events) substantially reduces calculation time. In Figure 2, I'm using the DownSample plugin in the Plugins menu to create, on each sample's live-cell gate, a downsample gate containing a fixed number of events.

Figure 2. Initiating the DownSample plugin.
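Steps A and B can be sketched in a few lines of Python. This is a minimal illustration, not what FlowJo does internally: the parameter names ("FSC-A", "FSC-H", "Viability"), the gate thresholds, and the helper names are all hypothetical, and simple threshold gates stand in for hand-drawn manual gates. Downsampling here means drawing a fixed number of events uniformly at random without replacement.

```python
import random


def clean_up(events, fsc_min, singlet_ratio_max, viability_max):
    """Stand-in for manual clean-up gates (hypothetical thresholds):
    debris = low FSC-A, doublets = FSC-A large relative to FSC-H,
    dead cells = bright viability-dye signal."""
    return [
        e for e in events
        if e["FSC-A"] >= fsc_min
        and e["FSC-A"] / e["FSC-H"] <= singlet_ratio_max
        and e["Viability"] <= viability_max
    ]


def downsample(events, n, seed=None):
    """Keep n events chosen uniformly at random without replacement
    (all events if the gate already holds n or fewer)."""
    if len(events) <= n:
        return list(events)
    return random.Random(seed).sample(events, n)


# Toy sample of 5,000 events with made-up parameter values.
gen = random.Random(0)
sample = [
    {"FSC-A": gen.uniform(0, 250_000),
     "FSC-H": gen.uniform(50_000, 250_000),
     "Viability": gen.uniform(0, 1_000)}
    for _ in range(5_000)
]
live = clean_up(sample, fsc_min=30_000, singlet_ratio_max=1.3, viability_max=500)
small = downsample(live, 1_000, seed=1)
```

For an exact tSNE, which evaluates every pair of events, cost grows roughly with N²: going from 100,000 to 20,000 events cuts the number of pairs by (100,000 / 20,000)² = 25-fold. Approximate variants such as Barnes-Hut scale closer to N log N, so the saving is smaller but still substantial.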

C. Concatenate Populations—I will then merge the four downsampled gates using the Export/Concatenate Populations tool. During concatenation, I will create new keyword-based derived parameters, which I can then gate on to separate the individual samples representing different stimulation conditions within the concatenated file. (Note that even without selecting any keywords, a new parameter named Sample ID is created, which assigns each event a value identifying the sample it came from.)

Figure 3. Concatenating populations.
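Conceptually, concatenation stacks the events of all samples and tags each event with where it came from. The sketch below assumes each sample is a (keywords, events) pair and that the chosen keyword has numeric values, as the Condition values 1 to 4 do in this example; the "SampleID" key and the function name are hypothetical stand-ins for FlowJo's derived parameters.

```python
def concatenate(samples, keyword):
    """Merge (keywords, events) pairs into one event list, adding a 1-based
    "SampleID" and a numeric parameter named after `keyword` to each event.
    Both are hypothetical stand-ins for FlowJo's derived parameters, and the
    keyword values are assumed to be numeric strings such as "1".."4"."""
    merged = []
    for sample_id, (keywords, events) in enumerate(samples, start=1):
        value = float(keywords[keyword])
        for event in events:
            row = dict(event)  # copy so the source samples stay unchanged
            row["SampleID"] = sample_id
            row[keyword] = value
            merged.append(row)
    return merged


samples = [
    ({"Condition": "1"}, [{"CD3": 120.0}, {"CD3": 95.0}]),
    ({"Condition": "2"}, [{"CD3": 3400.0}]),
]
merged = concatenate(samples, "Condition")
```

Keeping the origin of each event as an ordinary parameter is what later allows a single tSNE map to be split back into samples simply by gating on that parameter.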

D. tSNE—Run the tSNE plugin on the concatenated file to create the tSNE parameters.

Figure 4. Initiating the tSNE plugin.

E. Gate on Keyword Parameter—Either the Sample ID parameter or a new keyword-derived parameter can be used to separate the concatenated data: gate on each group of events distinguished by a unique Sample ID or keyword value. In this case, the keyword parameter I created is called Condition, and the samples have numerical Condition values of 1, 2, 3, and 4, which order them appropriately in the concatenated file. In Figure 5, I gated the concatenated file on the Condition parameter, then drew a gate around an unknown population in the tSNE data space and applied this ‘Unknown Pop1’ gate as a child of each Condition gate. This revealed that the population is enriched specifically in the Stim1 condition.

Figure 5. Gating on a keyword-based derived parameter to pull apart individual samples within a concatenated file.
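The per-condition comparison in Figure 5 amounts to splitting events by their Condition value and counting how many fall inside the tSNE gate. A minimal sketch, assuming concatenated events stored as dicts with hypothetical keys "Condition", "tSNE1", and "tSNE2", and a rectangular gate standing in for whatever shape is actually drawn:

```python
def rect_gate(x_range, y_range, x="tSNE1", y="tSNE2"):
    """Build a rectangular gate on two parameters (hypothetical; FlowJo gates
    can be polygons or other shapes)."""
    def contains(event):
        return (x_range[0] <= event[x] <= x_range[1]
                and y_range[0] <= event[y] <= y_range[1])
    return contains


def frequency_by_condition(events, gate, key="Condition"):
    """Split concatenated events by keyword value and return, per value,
    the fraction of that condition's events falling inside `gate`."""
    totals, hits = {}, {}
    for event in events:
        c = event[key]
        totals[c] = totals.get(c, 0) + 1
        if gate(event):
            hits[c] = hits.get(c, 0) + 1
    return {c: hits.get(c, 0) / totals[c] for c in sorted(totals)}


events = [
    {"Condition": 1.0, "tSNE1": 0.5, "tSNE2": 0.5},
    {"Condition": 1.0, "tSNE1": 9.0, "tSNE2": 9.0},
    {"Condition": 2.0, "tSNE1": 1.0, "tSNE2": 1.5},
    {"Condition": 2.0, "tSNE1": 1.2, "tSNE2": 0.8},
]
unknown_pop1 = rect_gate((0.0, 2.0), (0.0, 2.0))
freqs = frequency_by_condition(events, unknown_pop1)  # {1.0: 0.5, 2.0: 1.0}
```

Reporting a fraction of each condition, rather than a raw count, keeps the comparison fair even if the downsampled gates ended up with different event numbers.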

F. Gate Unknown Populations and Identify Phenotypes—Gates can be drawn in the dimensionally-reduced tSNE space, and the gated populations evaluated for expression of all markers. In Figure 6, I've used the histogram multigraph function in the Layout Editor to help call the phenotype of events within the Unknown Pop1 gate. In this case, the phenotype (HLADR-CD3-Perforin+CD38+CD4-CD8+/-) indicates that the unknown population most likely consists of natural killer cells of the innate immune system. I then verified this phenotype using manual gates overlaid on the dimensionally-reduced data space: Stim1 Unknown Pop1 (purple in the overlay) falls within the total concatenated HLADR-CD3- population (blue in the overlay).

Figure 6. Determining phenotypic expression using multigraph histogram plots and verifying with manual gates.
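Reading a phenotype off histograms can be approximated by comparing each marker's median intensity in the gated population with a positivity threshold. This is only a rough sketch under stated assumptions: the thresholds and intensity values below are hypothetical (in practice they might come from unstained or FMO controls), and a single median cannot express a mixed call such as the CD8+/- seen here, which is one reason the histogram view remains useful.

```python
from statistics import median


def call_phenotype(events, markers, thresholds):
    """Call each marker "+" or "-" by comparing the population's median
    intensity with a per-marker positivity threshold (hypothetical values)."""
    calls = []
    for m in markers:
        sign = "+" if median(e[m] for e in events) > thresholds[m] else "-"
        calls.append(m + sign)
    return "".join(calls)


# Made-up intensities for three events inside a tSNE gate.
pop1 = [
    {"HLADR": 50, "CD3": 80, "Perforin": 3000},
    {"HLADR": 70, "CD3": 60, "Perforin": 4000},
    {"HLADR": 40, "CD3": 90, "Perforin": 2500},
]
thresholds = {"HLADR": 500, "CD3": 500, "Perforin": 1000}
phenotype = call_phenotype(pop1, ["HLADR", "CD3", "Perforin"], thresholds)
# phenotype == "HLADR-CD3-Perforin+"
```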

Aug 03, 2017