Featured Contributions

Contributed 1%

2.

Tutorial · Machine Learning

End-to-End Example: Using Logistic Regression for predicting Diabetes

In this tutorial, we will see how to predict whether a person has diabetes or not, based on information like blood pressure, body mass index (BMI), age, etc.

The data was collected and made available by the National Institute of Diabetes and Digestive and Kidney Diseases as part of the Pima Indians Diabetes Database. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are of Pima Indian heritage (a subgroup of Native Americans) and are females aged 21 and above.

We will be using Python as our programming language, along with some popular Python packages for machine learning and data science. First of all, we will import pandas to read our data from a CSV file and manipulate it for further use. We will also use numpy to convert our data into a format suitable to feed our classif...
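A minimal sketch of the loading step described above, assuming a CSV with a header row. The column names (`BloodPressure`, `BMI`, `Age`, `Outcome`) and the three rows are made-up placeholders rather than the real dataset, and `io.StringIO` stands in for the file path you would normally pass to `pd.read_csv`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV file; the values are invented for illustration.
csv_text = """BloodPressure,BMI,Age,Outcome
72,33.6,50,1
66,26.6,31,0
64,23.3,32,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Separate the features from the label column, then convert both to NumPy arrays.
X = df.drop(columns=["Outcome"]).to_numpy(dtype=float)
y = df["Outcome"].to_numpy(dtype=int)

print(X.shape, y.shape)
```

Keeping the features in a 2-D float array and the labels in a 1-D integer array is the shape most classifiers expect, though the exact preprocessing in the full tutorial may differ.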


Contributed 1%

3.

Tutorial · Machine Learning

Introduction to NumPy (follow along)

NumPy, short for *Numerical Python*, is a library for scientific computing in Python. As the name suggests, it provides a host of tools for mathematical and numerical routines. One of these high-performing tools is the NumPy array. This multidimensional array object is a powerful data structure for efficient computation on vectors and matrices. In this article, we will explore these arrays and their power-packed functionalities.
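As a small taste of what such an array allows, here is a minimal sketch with invented values. It is an illustration only and is not taken from the tutorial itself.

```python
import numpy as np

# A 2-D array (matrix) and a 1-D array (vector).
m = np.array([[1, 2, 3], [4, 5, 6]])
v = np.array([10, 20, 30])

print(m.shape)  # dimensions of the array
print(m * 2)    # element-wise arithmetic, no explicit loop
print(m + v)    # v is broadcast across each row of m
print(m @ v)    # matrix-vector product
```

Each operation acts on the whole array at once, which is where most of NumPy's efficiency on vectors and matrices comes from.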

You are encouraged to **follow along** with the tutorial and play around with NumPy, trying various things and making sure you're getting the hang of it. Let's get started!

Installation instructions for NumPy can be found here: Installing packages — SciPy.org (NumPy is part of the *Scientific Python* (SciPy) stack). In addition, we recommend you install IPython, which provides a lot of nice features helpful when using Python interact...


Contributed 1%

4.

Tutorial · Machine Learning

Linear Algebra: Vectors, Matrices and their properties (Quick Review)

Large datasets often comprise hundreds to millions of individual data items. It is easier to work with this data and operate on it when it is represented in the form of vectors and matrices. Broadly speaking, linear algebra is a branch of mathematics that deals with vectors and operations on vectors. Linear algebra is also extremely important in various machine learning and data processing algorithms.

This tutorial covers the basics of vectors and matrices, as well as the concepts that are required for data science and machine learning. It also introduces you to terminology such as "dot product" and "trace of a matrix". You can always Google these terms later, but it helps to be somewhat familiar with them.
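To make two of those terms concrete, here is a minimal sketch using NumPy. The choice of library and the numbers are assumptions made for illustration; the tutorial itself may present these ideas purely on paper.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

dot = np.dot(u, w)    # dot product: 1*4 + 2*5 + 3*6
trace = np.trace(A)   # trace: sum of the diagonal entries, 1 + 4
A_t = A.T             # transpose: rows become columns

print(dot, trace)
print(A_t)
```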


Contributed 1%

5.

Tutorial · Machine Learning

Introduction to Data Visualization with Matplotlib (follow along)

**Matplotlib** is the most popular Python package for data visualization. It provides a quick way to visualize data from Python and create publication-quality figures in a variety of formats. Matplotlib is a multi-platform data visualization library built on NumPy arrays, which allows it to work with the broader SciPy stack.

In this article, we are going to explore Matplotlib in interactive mode, covering 7 basic cases. You are encouraged to **follow along** with the tutorial and play around with Matplotlib, trying various things and making sure you're getting the hang of it. Let's get started!

Just as we use the np shorthand for NumPy and the pd shorthand for Pandas, we will use some standard shorthands for Matplotlib imports:

import matplotlib as mpl
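The excerpt stops after the first import. As a hedged sketch of how the shorthands are commonly used together, the block below adds `plt` for `matplotlib.pyplot`, which is a widespread convention but an assumption about what the tutorial goes on to use. It also selects the non-interactive `Agg` backend so that it runs without a display, which differs from the interactive mode the article describes.

```python
import io

import matplotlib as mpl
mpl.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
line, = ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()

# Write the figure to an in-memory buffer; pass a filename instead to save to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```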


Contributed 1%

6.

Tutorial · Machine Le…

Introduction to Pandas (follow along)

If real-world datasets contain numbers, text, alphanumeric values, timestamps, and various other unstructured data types, how does one store, retrieve, and easily manipulate these multidimensional datasets? The answer is a data science library like Pandas! Pandas is a powerful data analysis toolkit with high-performance, easy-to-use data structures. Compared with Excel and SQL, it offers a host of tools and methods that set it apart for row-wise and column-wise data manipulation. We will visit these functionalities in this tutorial.
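A minimal sketch of the kind of row-wise and column-wise manipulation meant here. The column names and values are invented for illustration and do not come from the tutorial.

```python
import pandas as pd

# Invented records mixing text, numbers, and timestamps.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [3.5, 4.0, 2.5],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
})

high = df[df["score"] > 3.0]      # row-wise: keep rows that match a condition
df["scaled"] = df["score"] * 10   # column-wise: derive a new column
first = df.loc[0, "name"]         # label-based lookup of a single cell

print(high)
print(df.dtypes)
```

Each column keeps its own data type, which is what lets one table hold mixed kinds of data.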

In [7]: from IPython.display import Image
   ...: from IPython.core.display import HTML
   ...: Image(url="https://raw.githubusercontent.com/Back2Basics/Pandas-Talks/master/Python-datasci.jpg", width=400, height=300)
Out[7]: <IPython.core.display.Image object>


Contributed 1%

7.

Tutorial · Deep Learning

The Curse of Dimensionality

Let's say you're working on a cool image processing project, and your goal is to build an algorithm that analyzes faces for emotions. It takes in a 256 pixel by 256 pixel grayscale image as its input and spits out an emotion as an answer. For example, if you passed in the following image, you'd expect the algorithm to label it as "happy."
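To get a feel for the size of that input, here is a rough back-of-the-envelope sketch. The 256 intensity levels (8-bit grayscale) are an assumption, since the text only gives the image dimensions.

```python
import math

side = 256     # the image is 256 x 256 pixels (from the text)
levels = 256   # assumption: 8-bit grayscale, 256 intensity levels per pixel

n_dims = side * side  # one input dimension per pixel

# The number of distinct images is levels ** n_dims. Count its decimal digits
# with logarithms rather than building the enormous integer.
n_digits = math.floor(n_dims * math.log10(levels)) + 1

print(n_dims)
print(n_digits)
```

Under these assumptions, each image is a single point in a space with tens of thousands of dimensions, and the count of possible inputs has well over a hundred thousand digits.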

Now this is all well and good, but before we're sati...


Contributed 1%

8.

Tutorial · Machine Le…

K-Means Clustering

**K-means clustering** is an algorithm to perform clustering. It is simple to understand and implement, and hence is one of the most popular methods, and often the first algorithm to be tried out when performing clustering.

The idea behind **clustering** is to segregate the data into groups called *clusters*, so that instances with similar behavior are assigned to the same cluster. It is used in data mining, pattern recognition and anomaly detection. Clustering is the most popular **unsupervised learning** method. Unsupervised learning is a set of techniques to identify patterns and underlying characteristics in data.
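A minimal sketch of K-means in its standard form (Lloyd's algorithm), written with NumPy. The function name `kmeans`, its parameters, and the toy data are hypothetical choices made for illustration, and the tutorial's own implementation may differ.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct points drawn from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if it has none.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy groups (invented data).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

The cluster numbering depends on the random initialisation, so only the grouping is meaningful, not which group is called 0 or 1.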

K-means clustering partitions the dataset into *k* clu...


Contributed 1%

9.

Tutorial · Machine Learning

Logistic Regression

Logistic regression is a variant of linear regression in which the dependent (output) variable is *categorical*, i.e. it takes one of a few possible discrete values. Don't be confused by the name logistic *regression*: it's a *classification* algorithm.

In particular, we can use it for **binary classification** (two categories). For example, we might want to predict whether or not a person has diabetes, or whether or not an email is spam.

The term *logistic* in logistic regression comes from the **logistic function** (also known as the **sigmoid function**), which can be written as:

f(z) = \frac{1}{1 + e^{-z}} = \frac{e^z}{e^z + 1}

The following is what we get if we plot *f*(*z*):
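As a numerical sketch of that curve (an illustration, not the tutorial's original figure), the block below evaluates *f*(*z*) at a few points and checks that the two forms of the formula agree. The function names `sigmoid` and `sigmoid_alt` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_alt(z):
    """Equivalent form f(z) = e^z / (e^z + 1)."""
    return np.exp(z) / (np.exp(z) + 1.0)

z = np.linspace(-6, 6, 7)  # -6, -4, -2, 0, 2, 4, 6
values = sigmoid(z)

for zi, fi in zip(z, values):
    print(f"{zi:+.0f}  {fi:.4f}")
```

The values rise from near 0 for large negative *z*, pass through 0.5 at *z* = 0, and approach 1 for large positive *z*, which gives the function its S shape.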
