Thanks for noticing the error. It has been corrected.

CommonLounge

Categories

Active In

Machine Learning

1 reply. 1 discussion. Member

Deep Learning

Member

Algorithms and Data Structures

Member

"The Batman" Universe

Member

Big Data

Member

Data Science and Big Data

Member

Data Analytics

Member

Board Games

Member

Applying Logic

Member

Featured Contributions

tutorial

Pandas: Apply functions and GroupBy by Wiki

So far in the course, we have learnt quite a bit about DataFrames. In particular, we learnt about using various boolean and arithmetic operations on DataFrame columns, and also about indexing to select and modify various subsets of a DataFrame.

In this tutorial, we will learn another way to operate on and modify a DataFrame, using DataFrame methods like apply() and applymap(). These methods allow us to *apply* a function over an entire DataFrame.

Let's get started!

As in the previous tutorials, let us load the Pandas and Numpy libraries at the beginning.

    import pandas as pd
    import numpy as np

Read more…(1655 words)

Category: Machine Learning

comment in this discussion

Contributed 2%

3.

tutorial

Correlation Analysis: Multivariable [Under Construction]

Note: This tutorial is currently under construction. The final version is expected to be ready on or before June 15th 2019.

Correlation analysis is a statistical evaluation method used to study the strength of the relationship between two numerical variables. This type of analysis is useful when we want to check whether there are any positive or negative connections between the variables.

We will start by loading the wine_v2, tips and questions_data datasets.

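As a rough illustration of what a multivariable correlation analysis produces, the sketch below computes a correlation matrix with pandas. The DataFrame and its column names are hypothetical toy data, not one of the datasets named above.

```python
import pandas as pd

# Hypothetical toy data (assumption): NOT the wine_v2, tips or questions_data datasets.
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0],
    "tip":        [1.0, 2.5, 2.8, 4.1, 5.0],
    "wait_time":  [25.0, 20.0, 15.0, 10.0, 5.0],
})

# DataFrame.corr() returns the pairwise Pearson correlation
# between every pair of numeric columns.
corr_matrix = df.corr()
print(corr_matrix.round(2))
```

Values close to +1 indicate a strong positive connection, values close to -1 a strong negative one, and values near 0 little linear relationship.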

Read more…(1907 words)

Category: Data Science and Big Data

Contributed 6%

4.

tutorial

Correlation Analysis: Two Variables

Correlation analysis can help us understand whether, and how strongly, a pair of variables are related.

In data science and machine learning, this can help us understand relationships between features/predictor variables and outcomes. It can also help us understand dependencies between different feature variables.

For example:

- How strong is the correlation between mental stress and cardiac issues?
- Is there a correlation between literacy rate and frequency of criminal activities?
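Questions like these can be approached by computing a correlation coefficient. The sketch below computes the Pearson coefficient for two variables, once by hand and once with NumPy. The variable names and values are made up for illustration, and Pearson is only one of several possible correlation measures.

```python
import numpy as np

# Hypothetical measurements for two variables (illustrative values only).
hours_studied = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
exam_score = np.array([52.0, 58.0, 65.0, 70.0, 80.0])

# Pearson r: sum of products of deviations, divided by the square root
# of the product of the summed squared deviations.
x_dev = hours_studied - hours_studied.mean()
y_dev = exam_score - exam_score.mean()
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

# Cross-check against NumPy's built-in correlation matrix.
r_numpy = np.corrcoef(hours_studied, exam_score)[0, 1]

print(round(r_manual, 3), round(r_numpy, 3))
```

The sign of the coefficient tells us the direction of the relationship, and its magnitude (between 0 and 1) tells us how strong it is.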

This tutorial will help you learn the different tech...

Read more…(2843 words)

Category: Data Science and Big Data

Contributed 46%

5.

tutorial

Pandas: Indexing and Slicing

You have already learnt about the basics of the Pandas DataFrame, a 2-dimensional data structure supported by Pandas which looks and behaves like a *table*.

In this tutorial and the next one, we will learn how to select various subsets of the DataFrame. The Pandas library provides us with a number of flexible options to do this.

As in the first Pandas tutorial, let's start by importing the libraries and loading the dataset.

    import pandas as pd
    import numpy as np
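To give a flavour of the selection options mentioned above, here is a minimal sketch on a small hypothetical table. The column names, values and index labels are assumptions for illustration; the tutorial itself works with its own dataset.

```python
import pandas as pd

# Hypothetical toy table (assumption): not the dataset loaded in the tutorial.
df = pd.DataFrame(
    {
        "age": [39, 50, 38, 53],
        "education": ["Bachelors", "Bachelors", "HS-grad", "11th"],
        "hours_per_week": [40, 13, 40, 40],
    },
    index=["a", "b", "c", "d"],
)

ages = df["age"]                     # a single column, returned as a Series
subset = df[["age", "education"]]    # several columns, returned as a DataFrame
by_label = df.loc["b":"c", "age"]    # label-based slice: the end label is included
by_position = df.iloc[1:3, 0]        # position-based slice: the end position is excluded

print(by_label.tolist(), by_position.tolist())
```

Note that `loc` and `iloc` select the same two rows here, even though the slice endpoints look different, because label slices include the end label while positional slices do not.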

Read more…(1733 words)

Category: Machine Learning

Contributed 94%

6.

tutorial

Polynomial Regression and Feature Engineering

Polynomial features are higher-power polynomial terms of the original features, which are added to the feature space of a model.

Let us understand this with a few examples.

Suppose we have a dataset with features x_1, x_2 and target variable y. A multivariable linear regression model for this set of data would be:

y_{pred}=w_1x_1+w_2x_2+b

Polynomial features are higher powers of x_1 and x_2 which we can add to this model, e.g. x_1^2, x_1^3, x_2^2, etc.
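As a sketch of this idea, the code below adds squared terms to two features and fits a linear model on the expanded feature set using ordinary least squares. The data is synthetic, and the choice of degree-2 terms and of `np.linalg.lstsq` as the fitting method are assumptions made for illustration.

```python
import numpy as np

# Synthetic data (assumption): generated from y = 2*x1^2 + 3*x2 + 1, without noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(-2, 2, size=50)
x2 = rng.uniform(-2, 2, size=50)
y = 2 * x1**2 + 3 * x2 + 1

# Original features plus their squares, and a column of ones for the bias b.
X_poly = np.column_stack([x1, x2, x1**2, x2**2, np.ones_like(x1)])

# The model is still linear in its weights, so least squares applies directly.
weights, *_ = np.linalg.lstsq(X_poly, y, rcond=None)

# Order of weights: x1, x2, x1^2, x2^2, bias
print(np.round(weights, 3))
```

The fitted weights should come out close to 0, 3, 2, 0 and 1, matching the function used to generate the data. A model using only x_1 and x_2 could not capture the x_1^2 term.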

Our new model would look like this:

Read more…(1077 words)

Category: Machine Learning

Contributed 58%

7.

tutorial

Feature Scaling

Feature scaling is an important part of data pre-processing.

Often, numeric variables in a dataset have very different scales. For example, let's say we have a dataset which includes the area of a house (in square feet) and its corresponding price (in US dollars). Typically, the area of the house will be in the range 500 - 5000 square feet, but the price will range from $100,000 - $5,000,000. As you can see, the scales of the features are very different. In this case, the price values are up to about 1000 times larger than the area values.

In this tutorial, we will first talk about how having all the variables on a similar scale helps us. Then, we will talk about various methods to perform scaling.
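The sketch below shows two commonly used scaling methods, min-max scaling and standardization, applied to the house example. The specific values and function names are hypothetical, and these two methods are an assumption about what such a tutorial typically covers.

```python
import numpy as np

# Hypothetical house data within the ranges described above.
area = np.array([500.0, 1500.0, 3000.0, 5000.0])                    # square feet
price = np.array([100_000.0, 400_000.0, 1_500_000.0, 5_000_000.0])  # US dollars

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    return (values - values.min()) / (values.max() - values.min())

def standardize(values):
    """Rescale values to have mean 0 and standard deviation 1."""
    return (values - values.mean()) / values.std()

scaled_area = min_max_scale(area)
scaled_price = min_max_scale(price)
standardized_price = standardize(price)

print(scaled_area.round(3), scaled_price.round(3))
```

After scaling, both features lie in the same range, so neither one dominates simply because of its units.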

Read more…(929 words)

Category: Machine Learning

Contributed 77%

8.

tutorial

Categorical Features

In statistical modeling and machine learning, a *categorical variable* is a variable which can only have a fixed set of values. Some examples of categorical variables are nationality, size of clothes (small, medium, large), day of the week, genre of music, educational qualification (doctorate, graduate, diploma), etc.

**Categorical Feature labels**

As opposed to continuous numerical variables such as height, age, and distance, the above variables are not intrinsically represented by continuous numbers. Instead, they are represented by *labels*. Each unique value in a categorical variable is known as a label.

For example, if the categorical variable was Size of clothing, the labels would be small, medium and large. For the categorical feature Educational qualification, some of the possible labels would be doctorate, graduate, diploma, etc.
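To make the idea of labels concrete, the sketch below lists the labels of a small hypothetical clothing-size column in pandas and maps them to integer codes. The data and the chosen ordering of the sizes are assumptions for illustration.

```python
import pandas as pd

# Hypothetical column of clothing sizes.
sizes = pd.Series(["small", "large", "medium", "small", "medium"], name="size")

# The labels are simply the unique values of the variable.
labels = sorted(sizes.unique())

# Declare the column as categorical with an explicit label order (assumed here).
size_dtype = pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
size_cat = sizes.astype(size_dtype)

# Each label is stored internally as an integer code based on that order.
codes = size_cat.cat.codes

print(labels)
print(codes.tolist())
```

The codes are just a compact representation of the labels; they do not turn the variable into a continuous numerical one.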

Read more…(1002 words)

Category: Machine Learning

Contributed 75%

9.

tutorial

Generalization and Overfitting

In this tutorial, we will talk about:

- **generalization**: the ability of a machine learning model to perform well on new, *unseen data*, and
- **overfitting**: when a machine learning model performs much better on the data used to train it than on new, *unseen data*.

Both of these are extremely important concepts in machine learning.
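The two concepts can be seen in a small experiment. The sketch below fits a low-degree and a high-degree polynomial to a handful of noisy training points and compares the error on the training data with the error on new data. The sine-shaped data, the noise level and the polynomial degrees are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_sine(x):
    """Synthetic target (assumption): a sine curve plus Gaussian noise."""
    return np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)

# A small training set and a larger set of new, unseen points.
x_train = np.linspace(0, 1, 10)
y_train = noisy_sine(x_train)
x_test = rng.uniform(0, 1, size=200)
y_test = noisy_sine(x_test)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the points (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, test MSE {results[degree][1]:.4f}")
```

The degree-9 polynomial has enough parameters to pass through all 10 training points, so its training error is close to zero, yet its error on unseen data is larger than its training error. That gap between training and unseen-data performance is what overfitting refers to.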

Let's get started!

Before we make predictions using a machine learning model, we first estimate the parameters (such as weights and bias) of the model.

The dataset based on which we estimate the parameter...

Read more…(1181 words)

Category: Machine Learning

Contributed 51%

10.

tutorial

Pandas: Apply functions and GroupBy

So far in the course, we have learnt quite a bit about DataFrames. In particular, we learnt about using various boolean and arithmetic operations on DataFrame columns, and also about indexing to select and modify various subsets of a DataFrame.

In this tutorial, we will learn another way to operate on and modify a DataFrame, using DataFrame methods like apply() and applymap(). These methods allow us to *apply* a function over an entire DataFrame.

Let's get started!

As in the previous tutorials, let us load the Pandas and Numpy libraries at the beginning.

    import pandas as pd
    import numpy as np
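As a quick preview of what applying a function looks like, here is a minimal sketch on a hypothetical two-column DataFrame. The column names and the lambda functions are made up for illustration.

```python
import pandas as pd

# Hypothetical toy DataFrame.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# DataFrame.apply with the default axis=0 calls the function once per column.
col_ranges = df.apply(lambda col: col.max() - col.min())

# With axis=1 the function is called once per row.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Series.apply calls the function once per element of a single column.
squared_a = df["a"].apply(lambda value: value ** 2)

print(col_ranges.tolist(), row_sums.tolist(), squared_a.tolist())
```

The sketch sticks to apply() because, in pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map() for element-wise operations.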

Read more…(1655 words)

Category: Machine Learning

Contributed 59%

11.

tutorial

Pandas: Modifying DataFrames

In the last couple of tutorials, we learned how to select various subsets of a DataFrame. In this tutorial, we will use these techniques to select a subset of the DataFrame and modify the selected data.

As in the previous tutorials, let's start by importing the libraries and loading the dataset.

We will use the same dataset that we used in the Indexing and Slicing and Criteria Based Selection tutorials.

    import pandas as pd
    import numpy as np
    # load the dataset
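The general pattern of selecting a subset and then assigning to it can be sketched as follows. The DataFrame, its column names and the cap of 45 hours are hypothetical and stand in for the tutorial's own dataset.

```python
import pandas as pd

# Hypothetical toy data (assumption): not the dataset loaded in the tutorial.
df = pd.DataFrame({"age": [39, 50, 38, 53], "hours_per_week": [40, 13, 40, 60]})

# Add a new column computed from an existing one.
df["full_time"] = df["hours_per_week"] >= 40

# Select rows with a boolean condition and modify one column for those rows only.
df.loc[df["hours_per_week"] > 45, "hours_per_week"] = 45

# Modify a single cell, addressed by row label and column name.
df.loc[0, "age"] = 40

print(df)
```

Doing the selection and the assignment in a single `.loc` call means the change is written to the original DataFrame rather than to a temporary copy.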

Read more…(1253 words)

Category: Machine Learning
