Commonlounge

Bishal Lakha

AI and Physics Enthusiast. Robotics Teacher. Previously worked at Coursera.

Active In

Quantum Computing

Member

Competitive Programming

Member

Deep Learning

Member

Reading

Member

Featured Contributions

Contributed 86%

discussion

Data Science and Machine Learning Project Ideas

Contributed 62%

tutorial

Tutorial | Machine Learning | Last updated

Predicting Titanic Survivors with Machine Learning (Detailed End-to-End ML Example)

On 15 April 1912, the Titanic sank after colliding with an iceberg. The ship was carrying 2224 people, and the accident cost the lives of 1502 of them.

In this tutorial, we will predict which passengers survived the accident and which did not, using features such as age, sex, and class. This problem is hosted by **Kaggle** as a challenge...

Read more…(1940 words)

Contributed 47%

tutorial

Tutorial | Machine Learning | Last updated

End-to-End Example: Using Logistic Regression for predicting Diabetes

In this tutorial, we will see how to use Data Science to predict whether a person has diabetes or not, based on information like blood pressure, body mass index (BMI), age, etc.

The data was collected and made available by the "National Institute of Diabetes and Digestive and Kidney Diseases" as part of the Pima Indians Diabetes Database. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are of Pima Indian heritage (a subgroup of Native Americans) and are females aged 21 and above.

We will be using Python as our programming language, along with some popular Python data science packages. First, we will import pandas to read our data from a CSV file and manipulate it for further use. We will also use numpy to convert our data into a format suitable for feeding to our classification model. We'll ...

Read more…(1400 words)

Contributed 59%

tutorial

Tutorial | Big Data | Last updated

MapReduce: A Gentle Tutorial with Examples

MapReduce is a programming framework that allows us to perform parallel processing on large data sets in a distributed environment. It is a data processing paradigm for condensing large volumes of data into useful aggregated results.

As the name suggests, Map and Reduce functions are performed on the data. The *Map* function processes a block of data and generates a set of intermediate key/value pairs. The *Reduce* function merges all the intermediate values associated with the same intermediate key into a single value.

Each of the two operations is designed so that it can be implemented efficiently in distributed systems. Moreover, the two operations together provide a lot of flexibility and expressiveness, so that a variety of real-world tasks can be implemented within the MapReduce paradigm.
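As a rough illustration of the Map and Reduce steps described above, here is a minimal single-machine sketch in Python that counts words. The function names (`map_words`, `group_by_key`, `reduce_counts`, `word_count`) and the sample text are assumptions made for this example; a real MapReduce framework would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict


def map_words(block):
    """Map: emit an intermediate (word, 1) pair for every word in a block of text."""
    return [(word.lower(), 1) for word in block.split()]


def group_by_key(pairs):
    """Collect all intermediate values that share the same intermediate key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce: merge all values for one key into a single value (their sum)."""
    return key, sum(values)


def word_count(blocks):
    """Run map on each block, group by key, then reduce each group."""
    intermediate = []
    for block in blocks:  # in a distributed setting, each block is mapped independently
        intermediate.extend(map_words(block))
    grouped = group_by_key(intermediate)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Hypothetical input: two blocks of data that could live on different machines.
blocks = ["the cat sat", "the dog sat down"]
counts = word_count(blocks)
```

Because each block is mapped independently and each key is reduced independently, both phases could in principle be spread over many workers.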

Read more…(1514 words)

Contributed 65%

tutorial

Tutorial | Data Analytics | Last updated

A Gentle Introduction to Data Science

Trillions of gigabytes of data are produced every year, and the volume is still growing exponentially. It is estimated that for every person, 1.7 megabytes of data will be produced every second by 2020, and that accumulated digital data will reach about 44 zettabytes, or 44 trillion gigabytes. This explosion of data is also shown in the graph below.

Data is only a raw material, and extracting information from it requires further work. Our society is becoming increasingly data-dependent, and data science is the field that helps us make sense of this huge quantity of data.

Read more…(1635 words)

Contributed 75%

tutorial

Tutorial | Machine Learning | Last updated

Probability: Conditional and Marginal Probabilities, Bayes' Rule and Random Variables

Probability is used to mathematically describe the chance of an event occurring. It quantifies randomness and uncertainty. For example, probability tells us the chance of rain on a particular day, or of someone winning a lottery. The probability that an event occurs is always between 0 and 1, where 1 represents absolute certainty and 0 represents impossibility.

Probability can be determined in two basic ways: theoretically and empirically. The theoretical (also called classical) method is used especially to determine probabilities in games of chance such as lotteries, roulette wheels, and coin flips, while the empirical (also called observational) method is used to determine the probability of an event whose outcome can’t be predetermined.

To describe probability mathematically, we need a few basic elements, which are discussed below:
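To make the two approaches concrete, here is a small Python sketch that computes the probability of rolling an even number on a fair six-sided die both ways. The die example, the function names, and the fixed random seed are all assumptions chosen for illustration; the empirical estimate comes from simulated rolls rather than real observations.

```python
import random
from fractions import Fraction


def theoretical_probability(favourable, sample_space):
    """Classical method: favourable outcomes divided by all equally likely outcomes."""
    return Fraction(len(favourable), len(sample_space))


def empirical_probability(event, trials, rng):
    """Observational method: fraction of trials in which the event was observed."""
    hits = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return hits / trials


sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}

# Theoretical value, derived by counting outcomes.
p_theory = theoretical_probability(even, sample_space)

# Empirical estimate from 10,000 simulated rolls (seeded so the run is repeatable).
rng = random.Random(0)
p_observed = empirical_probability(lambda roll: roll in even, 10_000, rng)
```

With enough trials, the empirical estimate should land close to the theoretical value, and both stay within the range 0 to 1.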

**Sample Space**

Read more…(1536 words)

Contributed 75%

tutorial

Tutorial | Data Analytics | Last updated

Statistics: Central Tendency metrics, Dispersion and Correlation

Statistics is a very broad branch of mathematics that deals with everything related to data, from its collection and organization to its analysis, interpretation and presentation. With the ever-increasing amount of data, statistics has become an indispensable tool in every field where one has to work with data.

When the amount of data we are dealing with is fairly small, it may be possible to talk about all the data items individually. However, when we are dealing with large quantities of data, which is almost always the case in real-world situations, we need some characteristic values that can represent the data.

In this tutorial, we'll first introduce such measures for a single variable, for example the weight of students in a particular school. These measures will include measures of central tendency and measures of dispersion. Then, we'll look at measures for und...

Read more…(1124 words)

Contributed 86%

tutorial

Tutorial | Machine Learning | Last updated

Linear Algebra: Vectors, Matrices and their properties

Large datasets often comprise hundreds to millions of individual data items. It is easier to work with this data and operate on it when it is represented in the form of vectors and matrices. Broadly speaking, linear algebra is a branch of mathematics that deals with vectors and operations on vectors. Linear algebra is also extremely important in various machine learning and data processing algorithms.

A vector can be thought of simply as an array of numbers in which the order of the numbers matters. Vectors are typically represented by a lowercase bold letter such as **x**. The individual members are denoted by writing the vector name with a subscript indicating the position of the member, like x_{1} for the first member, x_{2}...

Read more…(1272 words)

Contributed 78%

tutorial

Tutorial | Machine Learning | Last updated

Introduction to Databases and SQL with Examples

A database is a collection of logically interrelated data, together with a description of this data, designed to meet the information needs of an organization. The primary objective of creating a database is to store, update and retrieve data efficiently and use it for analysis. The data stored can also include transient data such as input documents, reports and intermediate results obtained during processing. To achieve this, a database must have several characteristics, some of which are:

- It should have a structure which deals with data types and data behavior.
- It should have a proper retrieval method which could be a declarative query language (like SQL, which we'll talk about in this article) or a procedural database programming language (like Java or Python).
- The storing and retrieving process should ...

Read more…(1289 words)

Contributed 82%

tutorial

Tutorial | Machine Learning | Last updated

Network Analysis, metrics for Centrality and The PageRank Algorithm

Networks are everywhere. Be it the network of friends in a university or the online social networks such as Facebook and Twitter or the transportation networks or financial networks, networks are ubiquitous in human society. A **network** is a collection of objects (**nodes**) with relationships / interconnections (**edges**) between them.

Since networks are so universal, analyzing them gives us valuable information about different complex phenomena. For example, by analyzing a network of people in an organization, we can find the most influential people in that organization. Or, by analyzing a network of airports or train stations, we can find which places carry the highest risk of spreading a particular virus. Network analysis provides us with the tools to answer these types of questions.
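As a small illustration of the "most influential people" example, the sketch below stores a network as nodes with edges between them and ranks the nodes by degree centrality (the share of other nodes each node is directly connected to). The names, the edge list, and the choice of degree centrality as the measure of influence are assumptions made for this example only.

```python
def build_adjacency(edges):
    """Build an undirected network: each node maps to the set of its neighbours."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly connected to."""
    others = len(adjacency) - 1
    return {node: len(neighbours) / others for node, neighbours in adjacency.items()}


# Hypothetical network of people in an organization (edges = working relationships).
edges = [
    ("Asha", "Bikram"),
    ("Asha", "Chen"),
    ("Asha", "Dana"),
    ("Bikram", "Chen"),
    ("Dana", "Eli"),
]

adjacency = build_adjacency(edges)
centrality = degree_centrality(adjacency)
most_connected = max(centrality, key=centrality.get)
```

Degree centrality is only one possible notion of influence; other centrality metrics can rank the same network differently.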

The most common and easy method to represent network...

Read more…(1231 words)
