| Clustering - A New, Highly Efficient Algorithm for Cluster Analysis
Introduction
In flow cytometry, clustering can be used to automatically identify
subsets of cells. More generally, clustering is the process of automatically
identifying subsets of events in a data collection that share similar
characteristics. Previously, no satisfactory clustering algorithm existed
for immunophenotyping data, for three reasons: most clustering algorithms
are computationally expensive; they do not take domain-specific knowledge
into account; and they do not allow for the fact that a given distance in
one region of multidimensional space may be more or less important than
the same distance in a different region.
The novel approach to clustering flow data implemented in FlowJo
Version 4 was developed by Dr. Mario Roederer. For more information
about this cluster algorithm, please contact
us. Note that this algorithm is under development and is intended
to be used as an exploratory tool. There is no guarantee that the
subsets of cells identified by this algorithm necessarily correspond
to meaningful biological subpopulations.
General Approach
Algorithmic operations that normally work on an event-by-event
basis (such as distance or similarity calculations during cluster
joining) can be significantly accelerated by working on a group
of cells at a time. In this clustering algorithm, similar events in
the dataset are grouped together, and these groups then serve
as surrogates for individual cells. The Probability Binning Algorithm
(ref. 1-3), developed for the statistical comparison of samples of
flow data (Population Comparison Platforms),
laid the foundation for this approach. Probability Binning uses
adaptive binning to group events together into bins;
statistical operations are then performed on the bins rather than on
individual events.
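The idea of adaptive binning can be illustrated with a minimal sketch. It assumes that each split is made at the median of the parameter with the largest variance, so that both halves hold roughly the same number of events; this splitting rule, the function name `adaptive_bins`, and the `min_events` parameter are assumptions made for illustration and are not taken from the FlowJo implementation.

```python
import random
import statistics


def adaptive_bins(events, min_events=50):
    """Recursively split events (a list of equal-length tuples) into bins.

    Assumption for this sketch: split at the median of the parameter
    with the largest variance, and stop when a further split would
    leave fewer than min_events events in a bin.
    """
    if len(events) < 2 * min_events:
        return [events]
    n_params = len(events[0])
    # Pick the parameter along which the events are most spread out.
    axis = max(
        range(n_params),
        key=lambda p: statistics.pvariance([e[p] for e in events]),
    )
    ordered = sorted(events, key=lambda e: e[axis])
    mid = len(ordered) // 2
    return (adaptive_bins(ordered[:mid], min_events)
            + adaptive_bins(ordered[mid:], min_events))


# Synthetic two-parameter data, for illustration only.
rng = random.Random(0)
events = [(rng.gauss(200, 30), rng.gauss(600, 80)) for _ in range(1000)]
bins = adaptive_bins(events, min_events=50)
```

Because every bin holds a similar number of events, dense regions end up with many small bins and sparse regions with a few large ones. Later steps can then operate on one summary per bin rather than on every event.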
The algorithm functions in two primary stages. In the first stage,
adaptive binning is used to divide the events into hyperrectangular
bins. This process is critical to the success of the algorithm:
sufficient division must be done to ensure separation of distinct
populations, but not so much as to obviate the computational gain
of collecting similar events into single bins. In the second stage,
bins are joined to create clusters. Clusters are joined only if
they are immediately adjacent to each other and if joining would
not significantly change the distribution of any parameter involved
in the clustering. The use of adjacency as a prerequisite for joining
means that binning should not be so aggressive as to orphan
events of the same cluster simply because an empty bin was created
between them.
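The second stage can be sketched as follows. This is a simplified illustration, not the FlowJo algorithm: the test for "would not significantly change the distribution" is not specified here, so the sketch substitutes a hypothetical stand-in (the bin means must differ by no more than `tol` on every parameter), and it compares bins pairwise rather than re-evaluating whole clusters. The names `adjacent`, `similar`, and `join_bins`, and the dictionary layout of a bin, are all assumptions.

```python
def adjacent(a, b):
    """True if two hyperrectangles touch or overlap on every axis."""
    return all(a["lo"][i] <= b["hi"][i] and b["lo"][i] <= a["hi"][i]
               for i in range(len(a["lo"])))


def similar(a, b, tol):
    """Hypothetical stand-in for the distribution test: compare means."""
    return all(abs(x - y) <= tol for x, y in zip(a["mean"], b["mean"]))


def join_bins(bins, tol):
    """Group bin indices into clusters using a union-find structure."""
    parent = list(range(len(bins)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(bins)):
        for j in range(i + 1, len(bins)):
            # Adjacency is a prerequisite; similarity alone is not enough.
            if adjacent(bins[i], bins[j]) and similar(bins[i], bins[j], tol):
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(bins)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Three one-parameter bins; the region between 2 and 3 holds no bin.
bins = [
    {"lo": (0.0,), "hi": (1.0,), "mean": (0.8,)},
    {"lo": (1.0,), "hi": (2.0,), "mean": (1.2,)},
    {"lo": (3.0,), "hi": (4.0,), "mean": (3.1,)},
]
clusters = join_bins(bins, tol=2.0)
```

In this example the second and third bins pass the similarity check but are separated by a gap, so they are not joined. This is the orphaning effect that overly aggressive binning can cause.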
How do I cluster my data?
Click once to select the sample or subset you wish to cluster in
the Workspace window.
From the Platforms menu, choose Clustering.
This will open the clustering platform (similar to the window below).
Step 1 - Select which fluorescence or scatter parameters to use
for clustering
Step 2 - Select cluster parameters (try with the default options
first)
Step 3 - Create the clusters
Step 4 - Join the clusters (if desired)
Step 5 - Create gates (click to select clusters first)

Generally, to begin with, you should create gates for the major
populations. You can click on any cluster's name and rename it (once
a gate has been created for that cluster); it will be renamed in
the workspace as well. In the table of clusters a "*"
in the gated column means that that cluster has been created and
can be shown. Double click in the "show" column to show
or hide that population (the "*" in the show column indicates
if the population is shown). You can select any single cluster and
change its color from the color box. You can resize the table relative to
the graphs by clicking and dragging the vertical bar between the two.
The table shows the approximate expression of each parameter for each
cluster, ranging from "-" to "++++", where each
step corresponds to 20% of the range of the parameter (i.e., the
parameter is divided into 5 equal-size parts; if the cluster is
in the first part it is assigned a "-", if it's in the
third part (middle), it's given a "++", etc.).
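The mapping from a parameter value to an expression symbol can be sketched as below. The function name `expression_symbol` is hypothetical, and the sketch assumes that a single representative value per cluster (for example a median) is compared against the parameter's range; which statistic the table actually uses is not stated here.

```python
SYMBOLS = ["-", "+", "++", "+++", "++++"]


def expression_symbol(value, lo, hi):
    """Map a value to one of five symbols, each covering 20% of [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    fraction = (value - lo) / (hi - lo)
    # Clamp so that values at or beyond the ends fall in the outer parts.
    index = max(0, min(int(fraction * 5), 4))
    return SYMBOLS[index]


# A value in the middle fifth of a 0-1000 range is reported as "++".
middle = expression_symbol(500, 0, 1000)
```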
Note: Once clusters have been created, you can deselect any of
the histogram parameters to remove them from the view, or select
any unselected parameters to add them to the view. In other words,
when you first cluster, the selected parameters control the clustering.
Once clustering is done, however, you can view graphs for any combination
of parameters (whether or not they were used in the clustering).
Click here for a play-by-play of the clustering
process.
View a detailed description of the clustering algorithm
parameters.
1) Roederer M, Treister A, Moore W, Herzenberg LA. Probability
binning comparison: a metric for quantitating univariate distribution
differences. Cytometry. 2001 Sep 1;45(1):37-46.
2) Roederer M, Moore W, Treister A, Hardy RR, Herzenberg LA. Probability
binning comparison: a metric for quantitating multivariate distribution
differences. Cytometry. 2001 Sep 1;45(1):47-55.
3) Roederer M, Hardy RR. Frequency difference gating: a multivariate
method for identifying subsets that differ between samples.
Cytometry. 2001 Sep 1;45(1):56-64.
|