
Clustering


A Highly Efficient Algorithm for Cluster Analysis


Introduction

In flow cytometry, clustering can be used to automatically identify subsets of cells. Clustering is the process of automatically identifying subsets of events in a data collection that share similar characteristics. Previously, no satisfactory clustering algorithm existed to handle immunophenotyping data, because most clustering algorithms are computationally expensive, do not take "domain-specific" knowledge into account, or ignore the fact that a given "distance" in one region of multidimensional space may be more or less important than the same "distance" in a different region.

The novel approach to clustering flow data implemented in FlowJo Version 4 was developed by Dr. Mario Roederer. For more information about this cluster algorithm, please contact us. Note that this algorithm is under development and is intended to be used as an exploratory tool. There is no guarantee that the subsets of cells identified by this algorithm necessarily correspond to meaningful biological subpopulations.

General Approach

Algorithmic operations that normally work on an event-by-event basis (such as distance or similarity calculations during cluster joining) can be significantly accelerated by working on a group of cells at a time. In this cluster algorithm, similar events in the dataset are grouped together, and these groups then serve as surrogates for individual cells. The Probability Binning Algorithm (ref. 1-3), developed for the statistical comparison of samples of flow data (Population Comparison Platforms), laid the foundation for this approach. Probability Binning uses adaptive binning to group events together into "bins"; statistical operations are performed on the bins rather than on individual events.
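As a rough illustration of adaptive binning, the sketch below recursively splits a set of events at the median of the parameter with the largest variance, so that each bin ends up holding a similar number of events. The function name `adaptive_bins`, the `min_events` stopping rule, and the choice of split parameter are assumptions made for this example; they are not taken from FlowJo's implementation.

```python
from statistics import median, pvariance


def adaptive_bins(events, min_events=4):
    """Recursively split events (a list of equal-length tuples) into bins.

    Hypothetical sketch: each split cuts at the median of the parameter
    with the largest variance, so both halves hold about the same number
    of events. Splitting stops when a bin is too small to halve.
    """
    if len(events) < 2 * min_events:
        return [events]
    n_params = len(events[0])
    # Pick the parameter whose values are most spread out within this bin.
    dim = max(range(n_params), key=lambda d: pvariance([e[d] for e in events]))
    cut = median(e[dim] for e in events)
    low = [e for e in events if e[dim] < cut]
    high = [e for e in events if e[dim] >= cut]
    if not low or not high:
        # Too many tied values to split along this parameter; keep the bin.
        return [events]
    return adaptive_bins(low, min_events) + adaptive_bins(high, min_events)
```

Because every bin holds a comparable number of events, dense regions of the data are covered by many small bins and sparse regions by a few large ones.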

The algorithm functions in two primary stages. In the first stage, adaptive binning is used to divide the events into hyperrectangular bins. This process is critical to the success of the algorithm: sufficient division must be done to ensure separation of distinct populations, but not so much as to negate the computational gain of collecting similar events into single bins. In the second stage, bins are joined to create clusters. Clusters are joined only if they are immediately adjacent to each other and if joining would not significantly change the distribution of any parameter involved in the clustering. Because adjacency is a prerequisite for joining, binning should not be so aggressive as to "orphan" events of the same cluster simply because an empty bin was created between them.
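The joining stage can be sketched as follows, under stated assumptions. Each bin is represented as a list of `(low, high)` intervals, one per parameter; two bins count as adjacent when they share a face. The statistical test for "would not significantly change the distribution" is not specified here, so the sketch takes a caller-supplied `compatible` predicate as a placeholder. The names `adjacent`, `join_bins`, and `compatible` are hypothetical, and the pairwise union-find merge is a simplification rather than the algorithm FlowJo uses.

```python
def adjacent(a, b):
    """True if hyperrectangles a and b share a face.

    They must touch along exactly one parameter and overlap in all others.
    """
    touching = 0
    for (alo, ahi), (blo, bhi) in zip(a, b):
        if ahi == blo or bhi == alo:
            touching += 1
        elif min(ahi, bhi) <= max(alo, blo):
            return False  # separated by a gap along this parameter
    return touching == 1


def join_bins(bins, compatible=lambda i, j: True):
    """Group bin indices into clusters using adjacency plus a placeholder test."""
    parent = list(range(len(bins)))

    def find(i):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(bins)):
        for j in range(i + 1, len(bins)):
            if adjacent(bins[i], bins[j]) and compatible(i, j):
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(bins)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

In this sketch, a bin separated from its neighbors by an empty region never passes the adjacency test, which is the "orphan" effect described above.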

How do I cluster my data?

In the Workspace window, click once to select the sample or subset you wish to cluster. From the Platforms menu, choose Clustering. This opens the clustering platform (similar to the window below).

  • Step 1 - Select which fluorescence or scatter parameters to use for clustering
  • Step 2 - Select cluster parameters (try with the default options first)
  • Step 3 - Create the clusters
  • Step 4 - Join the clusters (if desired)
  • Step 5 - Create gates (click to select clusters first)

Generally, you should begin by creating gates for the major populations. Once a gate has been created for a cluster, you can click on the cluster's name and rename it; it will be renamed in the workspace as well. In the table of clusters, an asterisk (*) in the "gated" column means that a gate has been created for that cluster and that it can be shown. Double-click in the "show" column to show or hide that population (an asterisk in the "show" column indicates that the population is shown). You can select any single cluster and change its color from the color box. You can resize the table relative to the graphs by clicking on the vertical bar between the two and dragging. The table shows the approximate expression of each parameter for each cluster, ranging from "-" to "++++", where each step corresponds to 20% of the range of the parameter (i.e., the parameter is divided into 5 equal-size parts; if the cluster is in the first part it is assigned a "-", if it is in the third (middle) part it is given a "++", etc.).
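The mapping from a parameter value to an expression label can be sketched as below. The function name `expression_label` is hypothetical, and which summary value of a cluster (for example, its median) is compared against the range is an assumption, since it is not stated here.

```python
def expression_label(value, lo, hi):
    """Map a value within the parameter range [lo, hi] to "-" through "++++".

    The range is divided into 5 equal parts of 20% each.
    """
    labels = ["-", "+", "++", "+++", "++++"]
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    fraction = (value - lo) / (hi - lo)
    # Clamp so values at or beyond the ends of the range still get a label.
    index = min(max(int(fraction * 5), 0), 4)
    return labels[index]
```

For a parameter ranging from 0 to 100, a cluster value of 50 falls in the third part and receives "++".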

Note: Once clusters have been created, you can deselect any of the histogram parameters to remove them from the view, or select any unselected parameters to add them to the view. In other words, when you first cluster, the selected parameters control the clustering. Once clustering is done, however, you can view graphs for any combination of parameters (whether or not they were used in the clustering).

Click here for a play-by-play of the clustering process.
View a detailed description of the clustering algorithm parameters.

1) Roederer M, Treister A, Moore W, Herzenberg LA. Probability binning comparison: A metric for quantitating univariate distribution differences. Cytometry. 2001 Sep 1;45(1):37-46.

2) Roederer M, Moore W, Treister A, Hardy RR, Herzenberg LA. Probability binning comparison: a metric for quantitating multivariate distribution differences. Cytometry. 2001 Sep 1;45(1):47-55.

3) Roederer M, Hardy RR. Frequency difference gating: A multivariate method for identifying subsets that differ between samples. Cytometry. 2001 Sep 1;45(1):56-64.
