
Featured Contributions

Contributed 13%

wiki

Dimensionality Reduction and Principal Component Analysis

**Dimensionality reduction** aims to reduce the number of features of a high-dimensional dataset in order to overcome the difficulties that arise from the curse of dimensionality.

There are two approaches: **feature selection** and **feature extraction**. Feature selection finds a subset of the original attributes, whereas feature extraction transforms the original high-dimensional space into a lower-dimensional one. Ideally, the transformation should preserve enough of the data's structure to retain the information we care about. Algorithms can be unsupervised (principal component analysis or PCA, independent component analysis or ICA) or supervised (linear discriminant analysis or LDA). In feature extraction, transformations can be linear (PCA, LDA) or non-linear (t-SNE, autoencoders).
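As a minimal sketch of feature extraction, here is PCA applied with scikit-learn; the dataset below is synthetic and the number of components is an illustrative choice, not a recommendation:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))        # 100 samples, 10 original features

pca = PCA(n_components=2)             # extract 2 linear components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # fraction of variance each component captures
```

The `explained_variance_ratio_` attribute is a common way to decide how many components to keep.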

There are plenty of applications, such as visualizing hidden patterns (by removing highly correlated attributes), noise reduction (removing irrelevant features), further exploration, data compression and storage, etc. In fact, dimensionality reduction is usually applied as a **preprocessing step** for other machine learning and data ...

Read more…(951 words)

Contributed 49%

wiki

Dropout (neural network regularization)

**Dropout** is a widely used regularization technique for neural networks. Neural networks, especially deep neural networks, are flexible machine learning models and hence prone to overfitting. In this tutorial, we'll explain what dropout is and how it works, including a sample TensorFlow implementation.

> If you [have] a deep neural net and it's not overfitting, you should probably be using a bigger one and using dropout, ... - Geoffrey Hinton [2]

Dropout is a regularization technique where, during each iteration of gradient descent, we *drop* a set of neurons selected at random. By *drop*, we mean that we essentially act as if they do not exist.
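The drop step can be sketched in a few lines of NumPy. This is an illustrative sketch, not the article's TensorFlow implementation; it uses the common "inverted dropout" convention of scaling the surviving activations, and all names here are made up:

```python
import numpy as np

def dropout(activations, p=0.5, rng=np.random.default_rng(0)):
    # each neuron is kept with probability 1 - p ...
    mask = rng.random(activations.shape) >= p
    # ... and survivors are scaled by 1 / (1 - p) so the
    # expected value of each activation is unchanged
    return activations * mask / (1 - p)

a = np.ones((4, 3))
print(dropout(a, p=0.5))  # roughly half the entries zeroed, the rest scaled to 2.0
```

At test time, dropout is simply turned off; the inverted scaling above is what makes that valid without further adjustment.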

Read more…(474 words)

Contributed 38%

wiki

Logistic Regression

Logistic Regression is a variant of linear regression where the dependent or output variable is categorical. Don't be confused by the name: despite being called logistic *regression*, it is a classification algorithm. In particular, we can use it for **binary classification**. For example, we might want to predict whether or not a person has diabetes, or whether or not an email is spam.

The term *logistic* in logistic regression comes from the **logistic function** (also known as the **sigmoid function**), which can be written as:

f(z) = \frac{1}{1 + e^{-z}} = \frac{e^z}{e^z + 1}
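As a quick sanity check on this definition, a minimal sketch using only the standard library:

```python
import math

def sigmoid(z):
    # the logistic function f(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))   # 0.5, the midpoint of the curve
print(sigmoid(6))   # ~0.9975, approaches 1 for large positive z
print(sigmoid(-6))  # ~0.0025, approaches 0 for large negative z
```

Because the output always lies in (0, 1), it can be interpreted as the probability of the positive class.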

The following is what we get if we plot *f*(*z*):

Read more…(522 words)

Contributed 39%

wiki

Overfitting, Cross Validation and Regularization

**Overfitting** is one of the most important problems (and concepts) in machine learning.

It's not a good idea to test a machine learning model on a dataset which we used to train it, since it won't give any indication of how well our model performs on unseen data. The ability to perform well on unseen data is called **generalization**, and is the desirable characteristic we want in a model.

When a model performs well on **training data** (the data on which the algorithm was trained) but does not perform well on **test data** (new or unseen data), we say that it has **overfit** the training data or that the model is *overfitting*. This happens because the model learns the noise present in the training data as if it was a reliable pattern.
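A minimal sketch of detecting overfitting by comparing training and test accuracy; the dataset and the choice of an unconstrained decision tree are illustrative, not from the article:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# an unconstrained tree can memorize the training data, noise included
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print(model.score(X_train, y_train))  # training accuracy: 1.0, a perfect fit
print(model.score(X_test, y_test))    # test accuracy is what indicates generalization
```

A large gap between the two scores is the practical symptom of overfitting.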

Conversely, when a model does not perform well on tr...

Read more…(1310 words)

Contributed 24%

wiki

Types of Machine Learning problems: Supervised, Unsupervised and Reinforcement Learning

Currently, most machine learning products use supervised learning. In supervised learning, we have a set of features or inputs X (for example, an image) and our model will __predict a target or output variable y__ (for example, a caption for the image).

y = f(X)

In other words, our model learns a function that maps inputs to desired outputs. Features are **independent variables** and targets are the **dependent variable**.
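A tiny illustration of learning y = f(X): the model fits a mapping from independent variables to the dependent variable. The data and model choice below are made up for illustration:

```python
from sklearn.linear_model import LogisticRegression

X = [[1], [2], [3], [8], [9], [10]]  # features (independent variable)
y = [0, 0, 0, 1, 1, 1]               # binary target (dependent variable)

f = LogisticRegression().fit(X, y)   # learn an approximation of y = f(X)
print(f.predict([[2], [9]]))         # -> [0 1]
```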

Supervised learning problems can be further grouped ...

Read more…(895 words)

Contributed 16%

wiki

Learning = Representation + Evaluation + Optimization

This tutorial describes the important components of a learning algorithm: **representation** (what the model looks like), **evaluation** (how do we differentiate good models from bad ones), and **optimization** (what is our process for finding the good models among all the possible models).

Read more…(505 words)

Contributed 50%

wiki

Linear Regression Tutorial with Example

Linear Regression is a simple machine learning model for regression problems, i.e., when the target variable is a real value.

Let's start with an example. Suppose we have a dataset with information about the area of a house (in square feet) and its price (in thousands of dollars), and our task is to build a machine learning model which can predict the price given the area. Here is what our dataset looks like:
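The setup can be sketched as follows; the area/price numbers here are hypothetical (chosen to lie exactly on a line), not the article's dataset:

```python
from sklearn.linear_model import LinearRegression

area = [[1000], [1500], [2000], [2500]]  # square feet
price = [200, 300, 400, 500]             # thousands of dollars (here price = 0.2 * area)

model = LinearRegression().fit(area, price)
print(model.predict([[1800]]))           # -> [360.]
```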

Read more…(666 words)

Contributed 29%

wiki

Gradient Descent, an intuitive introduction

**Gradient Descent** is one of the most popular and widely used **optimization algorithms**. Given a machine learning model with parameters (weights and biases) and a cost function to evaluate how good a particular model is, our learning problem reduces to finding a set of weights for our model that minimizes the cost function.
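The idea can be sketched on a toy cost function; the function, learning rate, and iteration count below are illustrative choices, not part of the article:

```python
# minimize cost(w) = (w - 3)^2, whose minimum is at w = 3
def gradient(w):
    return 2 * (w - 3)  # derivative of (w - 3)^2

w = 0.0                 # initial weight
learning_rate = 0.1
for _ in range(100):
    w = w - learning_rate * gradient(w)  # step opposite the gradient

print(round(w, 4))      # -> 3.0, the minimizer of the cost
```

Each step moves the weight a little in the direction that decreases the cost; the same update rule, applied to every weight, is what trains real models.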

Read more…(880 words)

Contributed 65%

wiki

K-nearest neighbors

**K-nearest neighbors (KNN)** is one of the simplest Machine Learning algorithms. It is a supervised learning algorithm which can be used for both classification and regression.

Let us understand this algorithm with a classification problem. For simplicity of visualization, we'll assume that our input data is 2 dimensional. We have two possible classes - *green* and *red*.

Let's plot our training data in feature space.

There is no explicit training phase in KNN! In other words, for classifying new data points, we'll directly use our dataset (in some sense, the dataset *is* the model).

To classify a new data point, we find the **k** points in the training data closest to it, and make a prediction based on whichever class is most common among these **k** points (i.e. we simulate a vote). Here, *closest* is defined by a suitable distance metric such as Euclidean distance. Other distance metrics are discussed below.

For example, to classify the blue point shown in the following figure, we consider the **k** nearest data points and assign the class that has the majority.

If **k = 3**, we get two data points with the green class and one with the red class. Hence, we'll predict green for the new point.

Here's another example: let's change the position of the new point (the blue point) as shown below.

If we take **k = 5**, we get four neighbors with the red class and one with the green class. Hence, the new point will be classified as red.

In the case of regression (when the target variable is a real value), we take the average of the target values of the **k** nearest neighbors.
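A minimal sketch of this averaging, using scikit-learn's `KNeighborsRegressor` on synthetic one-dimensional data (the numbers are made up for illustration):

```python
from sklearn.neighbors import KNeighborsRegressor

X = [[1], [2], [3], [10], [11], [12]]
y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

model = KNeighborsRegressor(n_neighbors=3).fit(X, y)
# the 3 nearest neighbors of x=2 are x=1, 2, 3,
# so the prediction is the mean of their targets
print(model.predict([[2]]))  # -> [2.]
```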

A small value of k means that noise will have a higher influence on the result, while a large value makes the algorithm computationally expensive. Usually, we perform cross-validation to find the best value of k (or to choose the value of k that best suits our accuracy / speed trade-off). If you don't want to try multiple values of k, a rule of thumb is to set k equal to the square root of the total number of data points. For more on choosing the best value of k, refer to __this stackoverflow__ thread.

There are various options for the distance metric, such as Euclidean or Manhattan distance. The most commonly used metric is **Euclidean distance**.

Minkowski distance is a generalization of Euclidean and Manhattan distance.
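Concretely, for n-dimensional points x and y and a parameter p ≥ 1, the Minkowski distance can be written as:

d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}

Setting p = 1 gives Manhattan distance, and p = 2 gives Euclidean distance.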

Note that you'll want to do some pre-processing on the input data (for example, make sure each dimension has 0 mean and unit variance) so that the distance metrics above are meaningful.

```python
## load the dataset
from sklearn.datasets import load_iris
dataset = load_iris()
X = dataset.data
y = dataset.target

## preprocessing
# standardize the data to make sure each feature contributes equally
# to the distance
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X_processed = ss.fit_transform(X)

## split the dataset into train and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X_processed, y, test_size=0.3, random_state=42)

## fit k nearest neighbor model
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
# p=2 for Euclidean distance
model.fit(X_train, y_train)
# output:
# KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
#                      metric_params=None, n_jobs=1, n_neighbors=5, p=2,
#                      weights='uniform')

## evaluate
model.score(X_test, y_test)
# output: 1.0
```

In a parametric model, we continuously update a fixed number of parameters to learn a function that can classify new data points without requiring the training data (for example, logistic regression). In a non-parametric model, the number of parameters grows with the size of the training data. This is what happens in KNN.

Read more…(502 words)

Contributed 100%

wiki

Practical Machine Learning With Python [Part 3b]: K-means Clustering

K-means is one of the simplest unsupervised learning algorithms, used for clustering problems. Clustering is the process of finding groups of similar objects; our goal is to group objects based on the similarity of their features. K-means is easy to understand, easy to implement, and computationally efficient. Now, let us see how it works.

The basic idea behind K-means is that we define **k centroids**, one for each cluster. Here, k is a hyperparameter and we should choose it carefully; usually, you should try a range of values to determine the best value of k. Where do we place the centroids initially? A common choice is to place them as far apart as possible. Next, we assign each data point to the nearest centroid. Once every data point has been assigned to one of the centroids, our next step is to recalculate k new centroids. How do we do that? By moving each (old) centroid to the center of the data points that were assigned to it. And how do we find that center? By taking the mean of the data points in that cluster.
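The two alternating steps above (assignment and centroid update) can be sketched in NumPy. This is an illustrative sketch: initializing centroids as the first k points is a simplification, not the far-apart placement described above, and the data is synthetic:

```python
import numpy as np

def kmeans(X, k, n_iters=10):
    centroids = X[:k].astype(float)  # simplistic initialization for the sketch
    for _ in range(n_iters):
        # assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]])
centroids, labels = kmeans(X, k=2)
print(labels)  # the two points near (1, 1) end up in one cluster, the two near (8, 8) in the other
```

In practice, `sklearn.cluster.KMeans` handles initialization (e.g. k-means++) and convergence checks for you.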

K-means clustering aims to find positions *μ_{i}*, i=1,2,....

Read more…(246 words)
