Ensemble Methods (Part 2): Boosting and AdaBoost

- Boosting
- AdaBoost

Ensemble methods were introduced in a previous tutorial. In this tutorial we explore another approach to ensemble learning called **boosting**, and a specific boosting algorithm called **AdaBoost**.

**Boosting** aims to build a *strong learner* from a set of *weak learners*. A **weak learner** is a predictive model that performs only slightly better than random guessing, whereas a **strong learner** is a model whose predictions are close to perfect.

Boosting proceeds by repeatedly modifying the training set based on the performance of the earlier predictors. It can be seen as a *sequential ensemble method*. It assigns greater importance to the data points that were previously misclassified, leading subsequent models to focus more of their resources on those tougher data points. **AdaBoost** (adaptive boosting) is a popular boosting algorithm.

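Consider a binary classification dataset of *n* training samples (*x_{i}*, *y_{i}*), where *y_{i}* ∈ {−1, +1}. AdaBoost combines *m* weak classifiers *h_{1}*, ..., *h_{m}* into a weighted majority vote: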

\hat{y}(x) = \operatorname{sign} \left( \sum_{j=1}^m \alpha_j h_j(x) \right) \qquad (1)
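To make the vote in (1) concrete, here is a minimal Python sketch. The three threshold classifiers, the values of `alphas`, and the convention that a score of exactly zero maps to +1 are all assumptions chosen for illustration; none of them come from the tutorial.

```python
def sign(z: float) -> int:
    """Return +1 for non-negative scores, -1 otherwise (tie rule is an assumption)."""
    return 1 if z >= 0 else -1


def weighted_vote(alphas, classifiers, x):
    """Equation (1): sign of the alpha-weighted sum of weak predictions in {-1, +1}."""
    score = sum(a * h(x) for a, h in zip(alphas, classifiers))
    return sign(score)


# Three hypothetical weak classifiers on a scalar input.
classifiers = [
    lambda x: 1 if x > 0.0 else -1,
    lambda x: 1 if x > 2.0 else -1,
    lambda x: 1 if x > 4.0 else -1,
]
alphas = [0.9, 0.3, 0.4]  # illustrative values, not fitted

# At x = 1.0 the classifiers vote +1, -1, -1, so the score is 0.9 - 0.3 - 0.4.
prediction = weighted_vote(alphas, classifiers, 1.0)
```

Here two of the three classifiers vote −1, yet the prediction is +1 because the first classifier carries more weight than the other two combined. This is the sense in which the vote is weighted rather than a plain majority.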

The classifier weights *α_{j}* are chosen so that more accurate classifiers have a greater influence on the vote. At each model iteration, the misclassification error *ε_{j}* of *h_{j}* on the weighted training set is calculated as

\epsilon_j = \dfrac{\sum_{i=1}^n w_j(i) \ I(y_i \neq h_j(x_i))}{\sum_{i=1}^n w_j(i)} = \dfrac{\sum_{y_i \neq h_j(x_i)} w_j(i)}{\sum_{i=1}^n w_j(i)} \qquad (2)

and used to compute α_{j}:

\alpha_j = \dfrac{1}{2} \ln \left( \dfrac{1-\epsilon_j}{\epsilon_j} \right) > 0 \qquad (3)
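A small worked example may help with (2) and (3). The sketch below assumes ten equally weighted samples and a hypothetical classifier that gets three of them wrong; the helper names `weighted_error` and `classifier_weight` are made up for this illustration.

```python
import math


def weighted_error(weights, y_true, y_pred):
    """Equation (2): weight of the misclassified samples over the total weight."""
    missed = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)
    return missed / sum(weights)


def classifier_weight(eps):
    """Equation (3); only defined for 0 < eps < 1."""
    return 0.5 * math.log((1.0 - eps) / eps)


weights = [0.1] * 10                            # uniform starting weights
y_true = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1, -1, -1, -1, 1]   # three mistakes (illustrative)

eps = weighted_error(weights, y_true, y_pred)   # about 0.3
alpha = classifier_weight(eps)                  # 0.5 * ln(0.7 / 0.3), about 0.42
```

Note that the inequality in (3) relies on the weak-learner assumption: *α_{j}* is positive only when *ε_{j}* < 1/2, and it is exactly zero for a classifier that is no better than chance on the weighted data.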

In addition, probability weights W_{j} = (w_{j}(1), ..., w_{j}(n)) are used to assign different importances to the training instances. They are individually modified after each model iteration, before the training set is resampled for the next classifier *h_{j+1}*:

w_{j+1}(i) = w_j(i)\ e^{-\alpha_j y_i h_j(x_i)} \qquad (4)

where, for a binary classification,

y_i h_j(x_i) =
\begin{cases}
+1 &\text{if } x_i \text{ is correctly classified} \\
-1 &\text{if } x_i \text{ is misclassified}
\end{cases}

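Since *α_{j}* > 0 whenever *ε_{j}* < 1/2, weights are increased for instances misclassified by *h_{j}* and decreased for correctly classified ones. The updated weights are then normalized so that they sum to 1: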

w_{j+1}(i) \leftarrow \dfrac{w_{j+1}(i)}{\sum_k w_{j+1}(k)}
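The update (4) followed by the normalization step can be sketched as below. The four-sample data and the helper name `update_weights` are assumptions for illustration; `alpha` is obtained from (3) using the weighted error of this hypothetical classifier, which is 0.25.

```python
import math


def update_weights(weights, alpha, y_true, y_pred):
    """Apply equation (4), then normalize so the weights sum to 1."""
    raw = [w * math.exp(-alpha * y * p)
           for w, y, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return [w / total for w in raw]


weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, -1, -1, -1]             # only the second sample is misclassified
alpha = 0.5 * math.log(0.75 / 0.25)  # equation (3) with eps = 0.25

new_weights = update_weights(weights, alpha, y_true, y_pred)
```

In this example the misclassified sample ends up with weight 1/2 and each correctly classified sample with weight 1/6, so the next classifier sees the hard sample as equal in importance to the three easy ones together.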

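Note that there are two types of weights: the instance weights *w_{j}(i)*, used when training the models, and the classifier weights *α_{j}*, used to combine the models' predictions in the final vote (1).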

__Pseudocode__

    # Probability weights initialization
    for i = 1, ..., n
        w_1(i) = 1/n

    # Model iteration
    for j = 1, ..., m
        resample training set using w_j and fit model h_j
        calculate misclassification error ε_j using (2)
        calculate weight α_j by applying (3)
        for i = 1, ..., n
            compute w_{j+1}(i) by applying (4)
        normalize all w_{j+1}

    # Termination
    output the weighted majority vote ŷ using (1)
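The pseudocode can be turned into a short, runnable Python sketch. Several choices here are assumptions that do not come from the tutorial: the weak learners are one-dimensional threshold classifiers (decision stumps), the weights are passed directly to the stump fit instead of resampling the training set, the error is clipped to avoid taking the logarithm of zero, and the function names and toy data are made up for illustration.

```python
import math


def stump_predict(x, threshold, polarity):
    """Predict `polarity` above the threshold and `-polarity` at or below it."""
    return polarity if x > threshold else -polarity


def fit_stump(X, y, w):
    """Return (weighted error, threshold, polarity) of the best stump under w."""
    xs = sorted(set(X))
    # Candidate thresholds: one below all points, then midpoints between neighbours.
    thresholds = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(xi, t, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost_fit(X, y, m):
    """Fit m stumps; returns a list of (alpha, threshold, polarity) triples."""
    n = len(X)
    w = [1.0 / n] * n                              # probability weights initialization
    model = []
    for _ in range(m):
        err, t, polarity = fit_stump(X, y, w)      # reweighting, not resampling
        eps = err / sum(w)                         # equation (2)
        eps = min(max(eps, 1e-10), 1 - 1e-10)      # assumption: guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)    # equation (3)
        model.append((alpha, t, polarity))
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, t, polarity))
             for xi, yi, wi in zip(X, y, w)]       # equation (4)
        total = sum(w)
        w = [wi / total for wi in w]               # normalization
    return model


def adaboost_predict(model, x):
    """Equation (1): weighted majority vote (a zero score maps to +1 by assumption)."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in model)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single stump can classify perfectly.
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

model = adaboost_fit(X, y, m=3)
train_errors = sum(adaboost_predict(model, xi) != yi for xi, yi in zip(X, y))
```

On this toy set the best single stump misclassifies three points, while the vote of three stumps classifies every training point correctly. Replacing the weighted fit with resampling, as the pseudocode describes, would make the result depend on the random draw.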

About the contributors:

- Keshav Dhandhania, MSc in Deep Learning @ MIT (2014): 100%
- Bernard Mathias: Minor
