- 1. Sampling Algorithms
- 2. Map-Reduce
- 3. Graph Algorithms
- 4. Feature Selection
- 5. Algorithms to work efficiently
- 6. Classification/Regression Algorithms
- 7. Clustering Methods
- 8. Other algorithms you can learn about
As a data scientist, I believe a lot of work has to be done before classification, regression, or clustering methods are applied to the data you get, which may be messy, unwieldy, and big. So here is a list of algorithms that help a data scientist build better models from the data they have:
In case you want to work with a sample of data.
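One classic sampling algorithm worth knowing here is reservoir sampling, which keeps a uniform random sample of fixed size from a stream you can only pass over once. A minimal sketch (the function name and parameters are illustrative, not from the original article):

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace an existing item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(1000), 10, seed=42)
```

This lets you sample from data too large to hold in memory, since only the reservoir of k items is ever stored.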
If you want to work with the whole data.
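The map-reduce pattern splits work into a map phase (emit key-value pairs), a shuffle (group by key), and a reduce phase (aggregate each group). A toy single-machine word-count sketch of the idea; in practice the phases run distributed on Hadoop or Spark:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Emit a (word, 1) pair for every word in the document.
    return [(word, 1) for word in doc.lower().split()]

def shuffle(pairs):
    # Group values by key, as the framework's shuffle step would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate each key's values into a single count.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data is big", "data science is fun"]
mapped = chain.from_iterable(map_phase(d) for d in docs)
counts = reduce_phase(shuffle(mapped))
```

The same map/shuffle/reduce decomposition is what lets frameworks parallelize the work across a cluster.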
Can be used for feature creation. For example, I had a use case with a graph of 60 million customers and 130 million accounts. Two accounts were connected if they had the same SSN, or the same name, DOB, and address, and I had to find a customer ID for each account. Parsing such a graph on a single node took more than 2 days; on an 80-node Hadoop cluster, a connected-components algorithm took less than 24 minutes. On Spark it is even faster.
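On a single machine, connected components over account-linkage edges can be sketched with a union-find structure; this is a simplified stand-in for the distributed version the article describes, and the account names are made up for illustration:

```python
def find(parent, x):
    # Walk to the root, compressing the path along the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def connected_components(edges):
    """Return a dict mapping each node to a component representative."""
    parent = {}
    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb  # union the two components

    return {node: find(parent, node) for node in parent}

# Hypothetical accounts linked by shared SSN or name+DOB+address:
edges = [("acct1", "acct2"), ("acct2", "acct3"), ("acct4", "acct5")]
labels = connected_components(edges)
```

Each component representative then serves as a customer ID shared by all accounts in that component.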
Recently I was working on an optimization problem focused on finding the shortest distances and routes between two points in a store layout. Routes cannot cut through the aisles, so we could not use Euclidean distances. We solved this by treating the turning points in the store layout as graph nodes and applying Dijkstra's algorithm.
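A minimal heap-based Dijkstra sketch over a graph of turning points; the node names and edge weights here are invented for illustration, not the actual store layout:

```python
import heapq

def dijkstra(graph, source):
    """graph: dict mapping node -> list of (neighbor, weight) edges.

    Returns shortest distances from source to every reachable node.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already found a shorter path
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical turning points with walkable-distance weights:
graph = {
    "A": [("B", 2), ("C", 5)],
    "B": [("C", 1), ("D", 4)],
    "C": [("D", 1)],
    "D": [],
}
dist = dijkstra(graph, "A")
```

Because the weights are walking distances along permitted paths, the result respects the aisle constraints that plain Euclidean distance would ignore.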
Apart from the algorithms above, sometimes you may need to write your own. I now think of big algorithms as combinations of small but powerful ones; you just need to know these building blocks to make a better, more efficient product. Some of these powerful algorithms that can help you are:
The usual suspects. At a minimum, you must know:
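As one small example of the classification idea, here is a toy k-nearest-neighbors classifier in plain Python (the dataset and function name are illustrative only; in practice you would reach for a library such as scikit-learn):

```python
import math

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train)
    )
    top = [label for _, label in dists[:k]]
    return max(set(top), key=top.count)

# Two made-up clusters of labeled points:
X = [(0, 0), (0, 1), (5, 5), (6, 5)]
y = ["a", "a", "b", "b"]
pred = knn_predict(X, y, (0.5, 0.5))
```

The same fit/predict shape carries over to the heavier classifiers and regressors you would normally use.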
For unsupervised learning
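The workhorse here is k-means: assign each point to its nearest center, then move each center to the mean of its assigned points, and repeat. A minimal sketch with a naive initialization (real implementations use k-means++ and a convergence check; the data is made up):

```python
def kmeans(points, k, iters=20):
    """Cluster tuples of coordinates into k groups by alternating
    assignment and center updates."""
    # Naive init: take the first k points as starting centers.
    centers = [points[i] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign p to the nearest center (squared Euclidean distance).
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            tuple(sum(vals) / len(cluster) for vals in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
```

On these two well-separated blobs the loop settles into two clusters of three points each.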
Hope this has been helpful.
Link to original article: Machine Learning algorithms for Data Scientists