Paper Summary: Visualizing Large-scale and High-dimensional Data

- LargeVis - a technique to visualize large-scale and high-dimensional data in low-dimensional space.
- The problem relates to both the information visualization and the machine learning (and data mining) domains.
- Link to the paper

**t-SNE**

- The state-of-the-art method for this visualization problem.
- Starts by constructing a similarity structure from the data and then projects the structure into 2- or 3-dimensional space.
- An accelerated version uses a K-nearest-neighbor (KNN) graph as the similarity structure.
- Limitations:
    - Computational cost of constructing the KNN graph for high-dimensional data.
    - Efficiency of graph visualization techniques breaks down for large data (*O(N log N)* complexity).
    - Parameters are sensitive to the dataset.

**LargeVis**

- Constructs the KNN graph in a more efficient manner than t-SNE.
- Uses a principled probabilistic model for graph visualization.
- Say the input consists of *N* datapoints in *d*-dimensional space.

**KNN Graph Construction**

- A random projection tree is used for nearest-neighbor search in the high-dimensional space, with Euclidean distance as the distance metric.
- Tree is constructed by partitioning the entire space and the nodes in the tree correspond to subspaces.
- For every non-leaf node of the tree, a random hyperplane is selected that splits the subspace corresponding to that node into two children.
- This is repeated until the number of points in a subspace falls below a threshold.
- Once the tree is constructed, each data point is assigned to a leaf node, and the points in the subspace of that leaf node are the candidate nearest neighbors of that datapoint.
- For high accuracy, a large number of trees should be constructed (which increases the computational cost).
- The paper counters this bottleneck by using the idea "a neighbor of my neighbor is also my neighbor" to increase the accuracy of the constructed graph.
- Basically construct an approximate KNN graph using random projection trees and then for each node, search its neighbor's neighbors as well.
- Edges are assigned weights just like in t-SNE.
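The construction above can be sketched in Python. This is a minimal illustration, not the paper's implementation: `n_trees`, `leaf_size`, and all function names are assumptions, and the refinement is a single neighbor-of-neighbor pass.

```python
import numpy as np

def build_tree(indices, data, leaf_size, rng):
    """Recursively split points with random hyperplanes; leaves are subspaces."""
    if len(indices) <= leaf_size:
        return [indices]
    # Random hyperplane through the midpoint of two randomly chosen points.
    a, b = rng.choice(indices, size=2, replace=False)
    normal = data[a] - data[b]
    offset = normal @ ((data[a] + data[b]) / 2.0)
    side = data[indices] @ normal > offset
    left, right = indices[side], indices[~side]
    if len(left) == 0 or len(right) == 0:   # degenerate split: stop early
        return [indices]
    return (build_tree(left, data, leaf_size, rng)
            + build_tree(right, data, leaf_size, rng))

def _k_nearest(i, candidates, data, k):
    cand = np.array([c for c in candidates if c != i], dtype=int)
    if cand.size == 0:
        return []
    dists = np.linalg.norm(data[cand] - data[i], axis=1)   # Euclidean metric
    return [int(c) for c in cand[np.argsort(dists)[:k]]]

def approx_knn(data, k, n_trees=3, leaf_size=10, seed=0):
    """Approximate KNN graph: random projection trees, then one pass of
    'a neighbor of my neighbor is also my neighbor' refinement."""
    rng = np.random.default_rng(seed)
    n = len(data)
    candidates = [set() for _ in range(n)]
    for _ in range(n_trees):
        # Leaf mates are the candidate nearest neighbors of each point.
        for leaf in build_tree(np.arange(n), data, leaf_size, rng):
            for i in leaf:
                candidates[int(i)].update(int(j) for j in leaf)
    knn = [_k_nearest(i, candidates[i], data, k) for i in range(n)]
    for i in range(n):                       # neighbor-of-neighbor expansion
        expanded = set(knn[i])
        for j in knn[i]:
            expanded.update(knn[j])
        knn[i] = _k_nearest(i, expanded, data, k)
    return knn
```

Because accuracy comes from the refinement pass rather than from building many trees, a small `n_trees` suffices, which is the computational saving the paper describes.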

**Probabilistic Model For Graph Visualization**

- Given a pair of vertices, the probability of observing a binary edge between them is modeled as a function of the distance between their projections in the lower-dimensional space.
- The probability of observing an edge with weight *w* is the *w*-th power of the probability of observing a binary edge.
- Given a weighted graph *G*, the likelihood of the graph is modeled as the likelihood of observed edges plus the likelihood of negative edges (vertex pairs without edges).
- To simplify the objective function, some negative edges are sampled for each vertex using a noisy distribution.
- The edges are sampled with probability proportional to their weight and then treated as binary edges.
- The resulting objective function can be optimized using asynchronous stochastic gradient descent (very effective on sparse graphs).
- The overall complexity is *O(sMN)*, where *s* is the dimension of the lower-dimensional space, *M* is the number of negative samples, and *N* is the number of vertices.
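A minimal sketch of one such stochastic update, assuming the commonly used edge-probability function *f(x) = 1/(1 + x²)*. The learning rate, `gamma`, and `M` are illustrative values, and negatives are sampled uniformly here rather than from the paper's noisy distribution:

```python
import numpy as np

def edge_prob(yi, yj):
    """P(binary edge between i and j) = f(d) with f(x) = 1 / (1 + x^2)."""
    return 1.0 / (1.0 + np.sum((yi - yj) ** 2))

def sgd_step(Y, i, j, rng, lr=0.1, M=5, gamma=7.0, eps=1e-8):
    """One stochastic update for a sampled positive edge (i, j):
    pull i and j together, push M sampled negative vertices away from i."""
    # Positive edge: ascend the gradient of log P(e_ij = 1).
    diff = Y[i] - Y[j]
    p = edge_prob(Y[i], Y[j])
    grad = -2.0 * p * diff                       # d/dy_i of -log(1 + d^2)
    Y[i] += lr * grad
    Y[j] -= lr * grad
    # Negative edges: ascend the gradient of gamma * log(1 - P(e_ik = 1)).
    for k in rng.choice(len(Y), size=M):
        if k == i or k == j:
            continue
        diff = Y[i] - Y[k]
        p = edge_prob(Y[i], Y[k])
        d2 = np.sum(diff ** 2)
        grad = gamma * 2.0 * p * diff / (d2 + eps)   # repulsive force
        Y[i] += lr * grad
        Y[k] -= lr * grad
    return Y
```

Each update touches only one positive edge and *M* negatives, which is why the graph's sparsity makes asynchronous SGD effective: concurrent updates rarely collide on the same vertices.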

**Experiments**

- Data is first preprocessed with embedding-learning techniques like SkipGram and LINE and brought down to a 100-dimensional space.
- One limitation is that the data is preprocessed to reduce the number of dimensions to 100, which seems to go against the claim of scaling to hundreds of dimensions.
- KNN construction is fastest for LargeVis followed by random projection trees, NN Descent, and vantage point trees (used in t-SNE).
- LargeVis requires very few iterations to create highly accurate KNN graphs.
- KNN Classifier is used to measure the accuracy of graph visualization quantitatively.
- Compared to t-SNE, LargeVis is much more stable with respect to learning rate across datasets.
- LargeVis benefits from its linear complexity (compared to t-SNE's log-linear complexity) and, for the default learning rate, outperforms t-SNE on larger datasets.
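The KNN-classifier evaluation above can be sketched as a leave-one-out accuracy computed directly on the low-dimensional embedding. This is an illustrative metric, not the paper's exact protocol, and the function and parameter names are assumptions:

```python
import numpy as np

def knn_accuracy(embedding, labels, k=5):
    """Leave-one-out KNN classification accuracy on the low-dimensional
    embedding; higher accuracy means class structure is better preserved."""
    labels = np.asarray(labels)
    n = len(embedding)
    correct = 0
    for i in range(n):
        dists = np.linalg.norm(embedding - embedding[i], axis=1)
        dists[i] = np.inf                    # exclude the query point itself
        nearest = np.argsort(dists)[:k]
        values, counts = np.unique(labels[nearest], return_counts=True)
        if values[np.argmax(counts)] == labels[i]:
            correct += 1
    return correct / n
```

The intuition: if points with the same label remain close together after projection, a KNN classifier on the 2-D coordinates will score well, so the accuracy serves as a quantitative proxy for visualization quality.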


About the author:

Shagun Sodhani
