How to Contribute
Sign up · Log in
1 year ago
Paper Summary: Visualizing Large-scale and High-dimensional Data
LargeVis - a technique to visualize large-scale and high-dimensional data in low-dimensional space.
Problem relates to both information visualization and machine learning (and data mining) domain.
Link to the paper
State of the art method for visualization problem.
Start by constructing a similarity structure from the data and then project the structure to 2/3 dimensional space.
An accelerated version proposed that uses a K-nearest Neighbor (KNN) graph as the similarity structure.
Computational cost of constructing KNN graph for high-dimensional data.
Efficiency of graph visualization techniques breaks down for large data (
Parameters are sensitive to the dataset.
Constructs KNN graph (in a more efficient manner as compared to t-SNE).
Uses a principled probabilistic model for graph visualization.
Let us say the input is in the form of
KNN Graph Construction
Random projection tree used for nearest-neighborhood search in the high-dimensional space with euclidean distance as metric of distance.
Tree is constructed by partitioning the entire space and the nodes in the tree correspond to subspaces.
For every non-leaf node of the tree, select a random hyperplane that splits the subspace (corresponding to the leaf node) into two children.
This is done till the number of nodes in the subspace reaches a threshold.
Once the tree is constructed, each data point gets assigned a leaf node and points in the subspace of the leaf node are the candidate nearest neighbors of that datapoint.
For high accuracy, a large number of trees should be constructed (which increases the computational cost).
The paper counters this bottleneck by using the idea "a neighbor of my neighbor is also my neighbor" to increase the accuracy of the constructed graph.
Basically construct an approximate KNN graph using random projection trees and then for each node, search its neighbor's neighbors as well.
Edges are assigned weights just like in t-SNE.
Probabilistic Model For Graph Visualization
Given a pair of vertices, the probability of observing an edge between them is given as a function of the distance between the projection of the pair of vertices in the lower dimensional space.
The probability of observing an edge with weight
is given as
power of probability of observing an edge.
Given a weighted graph,
, the likelihood of the graph can be modeled as the likelihood of observed edges plus the likelihood of negative edges (vertex pairs without edges).
To simplify the objective function, some negative edges are sampled corresponding to each vertex using a noisy distribution.
The edges are sampled with probability proportional to their weight and then treated as binary edges.
The resulting objective function can be optimized using asynchronous stochastic gradient descent (very effective on sparse graphs).
The overall complexity is
is the dimension of lower dimensional space,
is the number of negative samples and
is the number of vertices.
Data is first preprocessed with embedding learning techniques like SkipGram and LINE and brought down to 100-dimensional space.
One limitation is that the data is preprocessed to reduce the number of dimensions to 100. This seems to go against the claim of scaling for hundreds of dimensions.
KNN construction is fastest for LargeVis followed by random projection trees, NN Descent, and vantage point trees (used in t-SNE).
LargeVis requires very few iterations to create highly accurate KNN graphs.
KNN Classifier is used to measure the accuracy of graph visualization quantitatively.
Compared to t-SNE, LargeVis is much more stable with respect to learning rate across datasets.
LargeVis benefits from its linear complexity (as compared to t-SNE's log linear complexity) and for default learning rate, outperforms t-SNE for larger datasets.
Read more…(612 words)
About the author:
Join the discussion. Add a reply…
Table of contents
About the author
Ready to join our community?
Sign up below to automatically get notified of new courses, get
to finish ones you subscribe to, and
lessons to read later.
Continue with Facebook
— OR —
Your Full Name
I have an account. Log in instead
By signing up, you agree to our
Natural Language Processing
UX & UI Design
How to Contribute
Upgrade to Pro
Help and FAQ
Privacy, Terms & Refunds
Get in touch
Copyright 2016-18, Compose Labs Inc. All rights reserved.