DL Applications

Exploring high-dimensional data: t-SNE

t-SNE is an effective way to visualize high-dimensional data by means of a non-linear transformation.

The goal is to take a set of points in a high-dimensional space and find a faithful representation of those points in a lower-dimensional space, typically the 2D plane.

The algorithm is non-linear and adapts to the underlying data, performing different transformations on different regions.

A second feature of t-SNE is a tunable parameter, “perplexity,” which says (loosely) how to balance attention between local and global aspects of your data. The parameter is, in a sense, a guess about the number of close neighbors each point has. The perplexity value has a complex effect on the resulting pictures.
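To make the "number of close neighbors" reading more concrete: in the standard t-SNE formulation, each point gets its own Gaussian bandwidth, chosen so that the perplexity of its neighbor distribution (2 raised to the Shannon entropy) matches the value you set. The sketch below illustrates that calibration using only the Python standard library. The function names and the toy distances are assumptions made for illustration, not the code of any particular t-SNE library.

```python
import math


def conditional_probs(sq_dists, sigma):
    """Gaussian-kernel neighbor probabilities for one point.

    sq_dists holds squared distances from the point to each other point.
    """
    # Subtract the smallest distance so the exponent never overflows.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity = 2 ** (Shannon entropy measured in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def sigma_for_perplexity(sq_dists, target, tol=1e-4, max_steps=200):
    """Bisect (in log space) for the bandwidth whose perplexity hits target."""
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_steps):
        sigma = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        perp = perplexity(conditional_probs(sq_dists, sigma))
        if abs(perp - target) < tol:
            break
        # Perplexity grows with sigma: a wider kernel reaches more neighbors.
        if perp > target:
            hi = sigma
        else:
            lo = sigma
    return sigma


# Toy data (made up): 20 neighbors at distances 1, 2, ..., 20.
toy_sq_dists = [float(k * k) for k in range(1, 21)]
toy_sigma = sigma_for_perplexity(toy_sq_dists, target=5.0)
print("sigma:", toy_sigma)
print("perplexity:", perplexity(conditional_probs(toy_sq_dists, toy_sigma)))
```

A uniform distribution over k neighbors has perplexity exactly k, which is why the setting can be read as an effective neighbor count.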

Getting the most from t-SNE may mean analyzing multiple plots with different perplexities.

An additional hyperparameter to tune is the number of steps/iterations:

If you see a t-SNE plot with strange “pinched” shapes, chances are the process was stopped too early. Unfortunately, there’s no fixed number of steps that yields a stable result. Different data sets can require different numbers of iterations to converge.
A reasonably safe default for most datasets is 5000 iterations.

Usually, if you re-run the algorithm on the same dataset with the same hyperparameters, you should see the same qualitative behavior, but there are always a few exceptions.

Separately, the sizes of clusters in a t-SNE plot don't mean anything, because the algorithm adapts its notion of “distance” to regional density variations in the data set. As a result, it naturally expands dense clusters and contracts sparse ones, evening out cluster sizes. In short, you cannot read the relative sizes of clusters off a t-SNE plot. Likewise, distances between well-separated clusters may mean nothing.
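One way to see why density information is lost: the per-point bandwidth is calibrated to a fixed perplexity, so a cluster that is merely a stretched copy of another ends up with the same neighbor probabilities. The sketch below assumes the standard Gaussian-kernel calibration and uses made-up distances. All names are hypothetical, and it illustrates only the input-similarity step, not a full t-SNE run.

```python
import math
import random


def neighbor_probs(sq_dists, sigma):
    """Gaussian-kernel neighbor probabilities for one point."""
    d_min = min(sq_dists)  # shift for numerical stability
    w = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(w)
    return [x / total for x in w]


def perplexity(probs):
    """2 ** entropy (in bits) of a probability list."""
    return 2.0 ** -sum(p * math.log2(p) for p in probs if p > 0.0)


def calibrate_sigma(sq_dists, target, steps=100):
    """Log-space bisection for the bandwidth that yields the target perplexity."""
    lo, hi = 1e-10, 1e10
    for _ in range(steps):
        sigma = math.sqrt(lo * hi)
        if perplexity(neighbor_probs(sq_dists, sigma)) > target:
            hi = sigma
        else:
            lo = sigma
    return math.sqrt(lo * hi)


rng = random.Random(0)
# Squared distances from one point to 50 neighbors in a tight cluster.
dense = [rng.uniform(0.1, 1.0) ** 2 for _ in range(50)]
# The same layout stretched 10x in every direction (squared distances x100).
sparse = [100.0 * d for d in dense]

sigma_dense = calibrate_sigma(dense, target=10.0)
sigma_sparse = calibrate_sigma(sparse, target=10.0)
p_dense = neighbor_probs(dense, sigma_dense)
p_sparse = neighbor_probs(sparse, sigma_sparse)

print("bandwidth ratio (sparse / dense):", sigma_sparse / sigma_dense)
print("largest probability difference:",
      max(abs(a - b) for a, b in zip(p_dense, p_sparse)))
```

The bandwidth grows in step with the stretch, so the two clusters look alike to the optimizer, which is consistent with cluster sizes being evened out in the plot.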

Low perplexity values often produce clusters that are not statistically meaningful. Recognizing these clumps as random noise is an important part of reading t-SNE plots. However, once the perplexity is increased appropriately, t-SNE does something powerful with high-dimensional normal distributions. Such a distribution is very close to a uniform distribution on a sphere: the points are evenly distributed, with roughly equal spaces between them, and that is exactly what the t-SNE plot shows. In that way, it is actually more accurate than a linear projection.
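The claim that a high-dimensional normal distribution sits close to a sphere can be checked numerically: as the dimension grows, the distances of the samples from the origin become nearly identical relative to their average. The sketch below is a small standard-library experiment. The dimensions, sample count, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics


def gaussian_point(dim, rng):
    """One sample from a standard normal distribution in `dim` dimensions."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def relative_spread(dim, n_points=200, seed=0):
    """Standard deviation of sample norms divided by their mean."""
    rng = random.Random(seed)
    norms = [norm(gaussian_point(dim, rng)) for _ in range(n_points)]
    return statistics.stdev(norms) / statistics.mean(norms)


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative spread of norms={relative_spread(dim):.3f}")
```

The relative spread shrinks as the dimension increases, meaning the samples concentrate in a thin shell around a sphere rather than filling the ball.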

Sometimes you can read topological information off a t-SNE plot, but that typically requires views at multiple perplexities.

