Good Feature Building Techniques - Tricks for Kaggle - My Kaggle Code Repository
It often happens that we fall short of creativity, and creativity is one of the basic ingredients of what we do. Creating features needs creativity. So here is a list of ideas I gather in day-to-day life, where people have used creativity to get great results on Kaggle leaderboards.
This post is inspired by a Kernel on Kaggle written by Beluga, one of the top Kagglers, for a knowledge-based competition.
Some of the techniques/tricks I am sharing have been taken directly from that kernel, so you can take a look yourself. Otherwise stay here and read on.
1. Don't try predicting the future when you don't have to:
If both training/test comes from the same timeline, ...
Today I Learned This Part 2: Pretrained Neural Networks What are they?
Deep learning is the buzzword right now. I was working through the deep learning course by Jeremy Howard, and one thing I noticed was pretrained deep neural networks. In the first lesson he used a pretrained NN to predict on the Dogs vs Cats competition on Kaggle and achieved very good results.
What are pretrained Neural Networks?
So let me tell you a little bit about the background. There is a challenge that happens every year in the visual recognition community: the ImageNet Challenge. The task there is to classify images into 1000 categories using the ImageNet training data. People train big convolutional deep learning models for this challenge.
Now what does training a neural model actually mean?...
Top Data Science Resources on the Internet right now
I have been looking to create this list for a while now. There are many people on Quora who ask me how I started in the data science field, and so I wanted to create this reference.
To be frank, when I first started learning, it all looked very utopian and out of this world. The Andrew Ng course felt like black magic. And it still doesn't cease to amaze me. After all, we are predicting the future. Take the case of Nate Silver - what else can you call his success if not black magic?
But it is not magic. And this is a path an aspiring data scientist could take to become self-trained. Follow it in order. I have tried to include everything that comes to my mind. So here goes:
As a data scientist, I believe that a lot of work has to be done before classification/regression/clustering methods are applied to the data you get. That data may be messy, unwieldy and big. So here is the list of algorithms that help a data scientist make better models using the data they have:
This is a post which deviates from the pattern of the blogs I have written till now, but I found that finance also uses a lot of statistics, so it isn't too far a stretch to put this on my blog. I recently started investing in mutual funds, so I thought of researching the area before going all in. Here is the result of some of my research.
Always Buy No Load Mutual Funds
There are many different sites from where you can buy Mutual funds. Most of these sites take a commission to let you the i...
The US elections have got everyone startled. Trump. Seriously. The most powerful man on the entire earth is Donald Trump. Just think about it for a second. The most coveted post in all the world, and America could not find a better person to fill it? Now, to tell you the truth, I was not at all surprised to see it happen. I don’t know if America has ever seen this brand of politics, but we Indians have seen it many times and have suffered a lot because of it.
To put my case forward, we have many examples in our country, like Bal Thackeray, whose slogan was “Uthao lungi aur bajao pungi”, which basically meant sending the South Indians away from Mumbai. He was worshipped in Mumbai by the local people for this, and around 1.5 million people came to attend his funeral. His son Raj Thackeray has the same brand of politics, but he wants to send away North Indians. Wow!
Then there are issues of caste-based politics, where people like Mulayam Singh and Mayawati use reservations in India to their advantage to pull in the caste vote bank.
Trump Brand of politics was pretty much the same — D...
I have been working with Pandas for quite a few days now, and I feel I have gotten quite good at it. (Quite a braggart, I know.) So I thought about adding a post about Pandas usage here. I intend to make this post quite practical, and since I find the pandas syntax quite self-explanatory, I won't be explaining much of the code. Just the use cases and the code to achieve them.
1. Import Pandas
We start by importing the libraries that we will need to use.
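As a minimal sketch of this step: the exact imports used in the original post are not shown here, so the snippet below assumes only pandas itself, under the conventional `pd` alias. The toy DataFrame is hypothetical and is there just to confirm that the import works.

```python
import pandas as pd

# Hypothetical toy data, used only to check that pandas is importable and usable
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]})

# Inspect the number of rows and columns
print(df.shape)
```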
Deploying ML Apps using Python and Flask - Learning about Flask - Part 1
It has been a long time since I wrote anything on my blog. So I thought about giving everyone a treat this time. Or so I think it is.
Recently I was thinking about a way to deploy all these machine learning models I create in Python. I searched the web but couldn't find anything nice and easy. Then I came across this book by Sebastian Raschka, and I knew that it was what I was looking for. To tell you the truth, I did have some experience with Flask earlier, but this book made it a whole lot easier to deploy a machine learning model in Flask.
So today I am going to give a brief intro to Flask apps and how to deploy them using a service called OpenShift.
So what is Flask?
Flask is a Python web framework that makes it easier to create web apps in Python.
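To make that concrete, here is a minimal sketch of a Flask app with a single route. The route path and the response text are placeholders chosen for illustration, not taken from the book or from the original post. The snippet uses Flask's built-in test client to call the route, so it runs without starting a server.

```python
from flask import Flask

# Create the application object; Flask uses the module name to locate resources
app = Flask(__name__)


@app.route("/")
def index():
    # Return a plain-text response for the root URL (placeholder text)
    return "Hello from Flask!"


# Exercise the route with the built-in test client instead of a live server
with app.test_client() as client:
    response = client.get("/")
    print(response.status_code, response.get_data(as_text=True))
```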
Openshift is a free service(if we only use 1 small i...
Shell Basics Every Data Scientist Should Know - Part II (AWK)
Yesterday I got introduced to awk programming on the shell and is it cool. It lets you do stuff on the command line which you never imagined. As a matter of fact, it's a whole data analytics software in itself when you think about it. You can do selections, groupby, mean, median, sum, duplication, append. You just ask. There is no limit actually.
And it is easy to learn.
In this post, I will try to give you a brief intro to how you could add awk to your daily workflow.
Please see my previous post if you want some background or some basic to intermediate understanding of shell commands.
So let me start with an example first. Say you wanted to sum a column in a comma-delimited file. How would you do that in the shell?
Here is the command. The great thing about awk is th...
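For comparison with the shell approach, the same task (summing one column of a comma-delimited file) can be sketched with Python's standard `csv` module. This is only an illustration under assumptions: the helper name `sum_column`, the column index, and the sample data are all hypothetical and do not come from the original post.

```python
import csv
import io


def sum_column(lines, column_index, has_header=True):
    """Sum one numeric column of comma-delimited text (hypothetical helper)."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row if there is one
    total = 0.0
    for row in reader:
        # Ignore short rows and empty cells rather than failing on them
        if len(row) > column_index and row[column_index].strip():
            total += float(row[column_index])
    return total


# Hypothetical sample data standing in for a real file on disk
sample = io.StringIO("name,amount\na,1.5\nb,2.5\nc,4\n")
total = sum_column(sample, column_index=1)
print(total)
```

With a real file, passing an open file handle (for example from `open(path, newline="")`) in place of the `StringIO` object should work the same way.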