One of the most prominent fields of application for machine learning is sports, and a lot of people love sports statistics. It is an excellent domain for practicing data exploration and visualization. In fact, most machine learning work that performs well on sports data is roughly 90% data exploration and 10% model building.
Cricsheet has a bunch of cricket data available for download.
According to the site, it provides ball-by-ball data for men's and women's Test matches, One-Day Internationals, Twenty20 Internationals, some other international T20s, and all Indian Premier League seasons.
Which models? Note that it's usually a bad idea to apply complex machine learning models to sports data, because the number of data points is usually in the hundreds to thousands. SVMs and linear regression are what you're looking for most of the time, not neural networks (unless you start with a cricket match video :P).
Here are a couple more interesting prompts to get you started:
- Can you come up with a good evaluation metric for ranking players? In cricket, most of the time we compare batsmen to batsmen and bowlers to bowlers. Can you come up with a unified metric for the value of a player? One way to do this would be to train a linear SVM that predicts which team will win using only the player statistics as input. The coefficients learnt by the SVM can then be thought of as an evaluation metric: each coefficient says how much one statistic contributes to winning.
- When you are watching a cricket match, the broadcast keeps showing the percentage chance of your country winning as the match goes along, and this number updates after every ball. What's the best you can do at predicting the result of a game at various stages of the game?
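
To make the player-ranking prompt a bit more concrete, here is a minimal sketch of the "SVM coefficients as a metric" idea. Everything in it is an assumption made for illustration: the player statistics are random numbers rather than real Cricsheet data, the statistic names and `TRUE_W` weights are invented, and match results are simulated from those weights. The linear SVM is trained with plain subgradient descent on the hinge loss so that the sketch needs only the standard library; on real data you would more likely reach for a library implementation.

```python
import random

random.seed(0)

# Hypothetical per-player statistics (assumed already standardized).
STATS = ["batting_avg", "strike_rate", "wickets_per_match", "economy"]
# Made-up "ground truth" weights, used only to simulate match results.
TRUE_W = [1.0, 0.5, 0.8, -0.6]

N_PLAYERS, TEAM_SIZE, N_MATCHES = 60, 11, 600
players = [[random.gauss(0, 1) for _ in STATS] for _ in range(N_PLAYERS)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def team_features(team_a, team_b):
    """Mean statistics of team A minus mean statistics of team B."""
    feats = []
    for j in range(len(STATS)):
        mean_a = sum(players[p][j] for p in team_a) / TEAM_SIZE
        mean_b = sum(players[p][j] for p in team_b) / TEAM_SIZE
        feats.append(mean_a - mean_b)
    return feats


# Simulate matches: label +1 if team A wins, -1 otherwise.
X, y = [], []
for _ in range(N_MATCHES):
    picked = random.sample(range(N_PLAYERS), 2 * TEAM_SIZE)
    x = team_features(picked[:TEAM_SIZE], picked[TEAM_SIZE:])
    X.append(x)
    y.append(1 if dot(x, TRUE_W) + random.gauss(0, 0.3) > 0 else -1)


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * |w|^2 + mean hinge loss by batch subgradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) < 1:  # margin violated: subgradient is -yi * xi
                for j, xij in enumerate(xi):
                    grad[j] -= yi * xij / len(X)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


w = train_linear_svm(X, y)

# A player's unified value is the learned weighting of their statistics.
scores = [dot(w, p) for p in players]
ranking = sorted(range(N_PLAYERS), key=lambda p: scores[p], reverse=True)

for name, wj in zip(STATS, w):
    print(f"{name:>18}: {wj:+.3f}")
print("top 5 players by learned value:", ranking[:5])
```

The match features are differences between the two teams' average statistics, so no intercept is needed and swapping the teams simply flips the label. With real data, the interesting part is checking whether the learned weights agree with cricketing intuition, for example whether a high economy rate ends up with a negative weight.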