Sometimes we encounter problems that are really hard to solve by writing a computer program. For example, let's say we wanted to program a computer to recognize hand-written digits:
You could imagine trying to devise a set of rules to distinguish each individual digit. Zeros, for instance, are basically one closed loop. But what if the person didn't perfectly close the loop? Or what if the top right of the loop closes below where the top left of the loop starts?
In this case, we have difficulty differentiating zeros from sixes. We could establish some sort of cutoff, but how would you decide on the cutoff in the first place? As you can see, it quickly becomes quite complicated to compile a list of heuristics (i.e., rules and guesses) that accurately classify handwritten digits.
And there are so many more classes of problems that fall into this category: recognizing objects, understanding concepts, comprehending speech. We don't know what program to write because we still don't know how it's done by our own brains. And even if we did have a good idea about how to do it, the program might be horrendously complicated.
So instead of trying to write a program, we try to develop an algorithm that a computer can use to look at hundreds or thousands of examples (and the correct answers), and then use that experience to solve the same problem in new situations. Essentially, our goal is to teach the computer to solve problems by example, very similar to how we might teach a young child to distinguish a cat from a dog.
The field itself: ML is a field of study which harnesses principles of computer science and statistics to create statistical models. These models are generally used to do two things:
- Prediction: make predictions about the future based on data about the past
- Inference: discover patterns in data
Machine Learning (ML) and Artificial Intelligence (AI) are highly interconnected fields, and there is no universally agreed upon distinction between the two.
However, in general, when people say machine learning, they are referring to making machines (computers) learn certain patterns and then make predictions using those learnt patterns. When people say artificial intelligence, they are usually referring to computers behaving intelligently. That is, AI makes use of machine learning. Hence, machine learning is a subset / a component of AI.
In addition, historically (1960s-1970s), people often used AI to refer to a technical field which focused on programming computers to make decisions (usually based on hardcoded rules), whereas ML focuses more on making predictions about the future (based on patterns learnt from data).
Models: Teaching a computer to make predictions involves feeding data into machine learning models, which are representations of how the world supposedly works. If I tell a statistical model that the world works a certain way (say, for example, that the rent of a house grows with the number of rooms in the house), then this model can tell me which of two houses will have the higher rent: one with 2 rooms, or another with 3 rooms.
I may believe, based on what I’ve seen, that a given house's rent is, on average, equal to the number of rooms times 1200, plus the number of restrooms times 400. That is,
Rent = Rooms × $1200 + Restrooms × $400
So, if a house has 2 rooms and 1 restroom, then I’ll guess that the rent is probably $2800 / month. If it has 3 rooms and 2 restrooms, I think the rent is $4400 / month.
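As a quick illustration, here is that hand-specified rent model as a tiny Python function (the $1200 and $400 coefficients are the made-up ones above, not learnt from any data):

```python
def predict_rent(rooms, restrooms):
    """Hand-specified rent model: $1200 per room plus $400 per restroom."""
    return rooms * 1200 + restrooms * 400

print(predict_rent(2, 1))  # 2800
print(predict_rent(3, 2))  # 4400
```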
Here’s the main point: Machine learning refers to a set of techniques for estimating functions (like the one involving rent) based on datasets (room count, restroom count, and rent for many, many houses). These functions, which are called models, can then be used to make predictions on future data.
Here, room count and restroom count are the features or variables, and rent is the target. In this problem, we have 2 features, but in general, we may have many (for example, size of rooms, the year the house was constructed, and so on).
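To make "estimating functions based on datasets" concrete, here is a minimal sketch (the house data below is invented purely for illustration) that uses ordinary least squares to learn the per-room and per-restroom coefficients from examples instead of hard-coding them:

```python
import numpy as np

# Hypothetical dataset: each row of X is [rooms, restrooms] for a house,
# and y holds the corresponding observed rent.
X = np.array([[1, 1], [2, 1], [2, 2], [3, 2], [4, 3]], dtype=float)
y = np.array([1600, 2800, 3200, 4400, 6000], dtype=float)

# Estimate one coefficient per feature with least squares.
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)  # roughly [1200, 400] for this made-up data

# Use the learnt model to predict the rent of a new, unseen house.
new_house = np.array([3, 1], dtype=float)
print(new_house @ coeffs)  # roughly 4000
```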
To explain what is being learnt in machine learning, let's start with an example application: spam classification. One approach to writing a computer program that separates spam emails from non-spam emails is to maintain a list of words that appear more frequently in spam emails. Some examples of such words might be 'loan', '$', 'credit', 'discount', 'offer', 'password', 'viagra', and so on. When a new email comes in, we split the email into individual words, and if the email has a substantial number of these spammy words, it should be classified as spam.
Although the strategy above might give fairly good results (say detect spam with an accuracy of 80%), the accuracy depends in large part on the list of words we maintain, and on the precise threshold we choose to classify an email as spam.
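To see why the word list and the threshold matter so much, here is a minimal sketch of this hand-coded approach; the word list and the threshold of 2 below are arbitrary choices, which is exactly the weakness:

```python
SPAMMY_WORDS = {"loan", "$", "credit", "discount", "offer", "password", "viagra"}
THRESHOLD = 2  # classify as spam if at least this many spammy words appear

def is_spam(email_text):
    words = email_text.lower().split()
    spammy_count = sum(1 for word in words if word in SPAMMY_WORDS)
    return spammy_count >= THRESHOLD

print(is_spam("Exclusive offer on your next loan"))    # True
print(is_spam("Meeting notes attached for tomorrow"))  # False
```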
In machine learning, the strategy is to learn the list of words and the threshold from examples. In fact, in addition to which words are considered spammy, we could also learn how spammy each word is. (This example is quite realistic, and is how many spam classification algorithms work.)
So in this case, the thing being learnt is a notion of how spammy each word is. Note that this is not the only way to frame the problem. We framed the problem this way because we noticed a pattern that spam emails often contain specific words, and then we came up with a strategy that treats every word as a possible suspect. This strategy might give inaccurate results for other tasks, or be too inefficient.
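As one possible sketch of this framing (a simplified word-counting scheme with a tiny invented dataset, not a production spam filter), we can learn a spamminess score for each word from labelled examples and then classify new emails by summing those scores:

```python
from collections import Counter

# Tiny invented training set of (email text, is spam?) pairs.
train = [
    ("cheap loan offer click now", True),
    ("credit card discount offer", True),
    ("reset your password now", True),
    ("lunch meeting at noon", False),
    ("project report attached", False),
    ("see you at the game tonight", False),
]

spam_counts, ham_counts = Counter(), Counter()
for text, label in train:
    (spam_counts if label else ham_counts).update(text.split())

def spamminess(word):
    # Learnt score: how much more often the word shows up in spam than in
    # non-spam (add-one smoothing, so words never seen in training score 0).
    return (spam_counts[word] + 1) / (ham_counts[word] + 1) - 1

def classify(text, threshold=1.0):
    score = sum(spamminess(word) for word in text.split())
    return score > threshold  # True means "spam"

print(classify("discount offer on a loan"))  # True on this toy data
print(classify("meeting at noon"))           # False on this toy data
```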
You might notice that using machine learning to learn how bad each word is has many advantages over maintaining this list manually.
- It reduces the amount of manual work involved in creating the list. Think about how long this list could get if you try to do this manually. Also, if you're trying to maintain the list manually, how would you deal with hundreds of languages across the world? This task can easily become infeasible without machine learning.
- The same strategy works for other similar tasks. Say we wanted to classify whether a movie review is speaking positively or negatively about a movie. If we were creating lists of words manually, we would have to create a brand-new list for this task as well. But if we learn it, the same algorithm works, given that we already have some data (say, ratings and reviews left by users on IMDb).
- It updates automatically. Let's say tomorrow the spammers become more advanced and start typing the word 'password' as 'passw0rd'. Or they might try to sell you insurance, something we haven't yet encountered. We can simply set the machine learning algorithm to be trained daily, and it will use the new data available and keep adapting over time to changing behavior.