CommonLounge

Marta Enesco

Data Scientist, Graduate Research Assistant at University of Potsdam

Active In

Machine Learning

1 discussion. Member

TED Talks

Member

Featured Contributions

Contributed 76%

2.

tutorial

Conditional Random Fields

Conditional Random Fields (CRFs) are **probabilistic graphical models for sequential or structured data**. They let us classify each element while taking into account the context supplied by the rest of the sequence. This is *structured prediction*: the segments of the input are assumed to be related to each other.

By doing so, valuable contextual information, which would be lost if each element were classified individually, is made available to the model. For example, words in a sentence are grammatically connected: after an adjective, a noun is more likely than a verb. This hint can be used to label the noun *books* in the sentence “The woman carefully carried the two red __books__”.

CRFs are mostly used in NLP tasks such as part-of-speech tagging or sequence labeling (for extracting specific words). They also have applications in computer vision: from image segmentation (lab...
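
As an illustration of how a linear-chain CRF exploits such context, the sketch below decodes the most likely tag sequence with the Viterbi algorithm. The tag set, emission, and transition scores are hand-set toy values, not learned weights:

```python
# Minimal Viterbi decoding for a linear-chain CRF (illustrative, hand-set scores).

def viterbi(observations, tags, emission, transition):
    """Return the highest-scoring tag sequence for the observations."""
    # best[t][tag] = best score of any tag sequence ending in `tag` at step t
    best = [{tag: emission[observations[0]][tag] for tag in tags}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for tag in tags:
            # choose the best previous tag to transition from
            prev = max(tags, key=lambda p: best[-1][p] + transition[p][tag])
            scores[tag] = best[-1][prev] + transition[prev][tag] + emission[obs][tag]
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # backtrack from the best final tag
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

# Toy example: "red" looks like an adjective, and adjective -> noun transitions
# score highly, so "books" is tagged as a noun.
tags = ["ADJ", "NOUN"]
emission = {"red": {"ADJ": 2.0, "NOUN": 0.5}, "books": {"ADJ": 0.2, "NOUN": 1.5}}
transition = {"ADJ": {"ADJ": 0.1, "NOUN": 1.0}, "NOUN": {"ADJ": 0.3, "NOUN": 0.2}}
print(viterbi(["red", "books"], tags, emission, transition))  # ['ADJ', 'NOUN']
```

Note how the transition scores carry exactly the kind of contextual hint described above: the high ADJ→NOUN weight is what pushes *books* toward the noun label.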

Category: Machine Learning

Contributed 48%

3.

tutorial

Feature Engineering: Techniques, examples and case studies

Let us say you have a machine learning problem that you'd like to solve. You try a few machine learning algorithms, and they give okay results, say in the range 55%-65% accuracy. You'd like to do better.

In practice, the most important thing that needs to be done is what is known as **feature engineering**. Feature engineering is the process of finding the optimal set of features (the inputs) to give to the machine learning model.
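
As a minimal sketch of what this looks like in practice, the example below derives ratio, interaction, and indicator features from hypothetical raw housing records (all field names and values are made up for illustration):

```python
# Hypothetical raw records; the derived features (ratios, differences, flags)
# are the point of the example.
raw = [
    {"area_sqm": 120, "rooms": 4, "year_built": 1995, "year_sold": 2020},
    {"area_sqm": 60,  "rooms": 2, "year_built": 2018, "year_sold": 2020},
]

def engineer(record):
    """Turn a raw record into a feature vector with derived features."""
    area_per_room = record["area_sqm"] / record["rooms"]      # ratio feature
    age_at_sale = record["year_sold"] - record["year_built"]  # difference of two columns
    is_new = 1 if age_at_sale <= 5 else 0                     # threshold/indicator feature
    return [record["area_sqm"], area_per_room, age_at_sale, is_new]

features = [engineer(r) for r in raw]
print(features)  # [[120, 30.0, 25, 0], [60, 30.0, 2, 1]]
```

Derived features like these often expose structure (price per square meter, age of the house) that a model would struggle to learn from the raw columns alone.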

Category: Machine Learning

Contributed 27%

4.

tutorial

Theoretical guarantees are not what they seem

Machine learning has a number of popular theorems. However, one should keep in mind that theorems are stated in a mathematically strict sense. Extrapolating from these theorems to make decisions in practice is prone to errors, and one needs to be very careful. In this article, we will discuss some examples on how theoretical guarantees might not be what they seem, or how we can avoid reaching incorrect conclusions.

This article does not intend to say *theory is bad*. Its main point is that theory and practice often have a wide gap. When practitioners with little experience in theory come across widely applicable theorems, they often extrapolate them in undesirable ways.

**Example A**: The **universality theorem** asserts that __neural networks can approximate any continuous function__. Moreover, even a NN ...

Category: Machine Learning

Contributed 91%

5.

tutorial

Anomaly Detection

**Anomaly detection** refers to the technique of identifying **unusual patterns** and finding **outliers** in a set of observations. Outliers are data points that differ considerably from the rest of the dataset: usually, extreme values that diverge from the normal or expected behavior.

Historically, statistics was applied to find and remove outliers, for example from the tails of a Gaussian distribution. The idea was that outliers resulting from errors (noise, human mistakes, etc.) could lead to misleading interpretations. In addition, by filtering them out, modern supervised learning algorithms can gain in accuracy. On the other hand, anomalies are nowadays also an object of interest in their own right, as is the case with *“rare events”* in physics, medicine, business, or cybersecurity.

Datasets vary in their nature, but the most typical ones are time series and spatial data. There are three main types of outliers:

**Points** are single occurrences that are anomalous with respect to the complete dataset. For example, the stock v...
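
A minimal sketch of the classical statistical approach mentioned above: flag points that lie too many standard deviations from the mean (the data and threshold here are illustrative):

```python
import math

def zscore_outliers(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [v for v in values if abs(v - mean) > threshold * std]

data = [10, 11, 9, 10, 12, 10, 11, 95]  # 95 is the injected point anomaly
print(zscore_outliers(data, threshold=2.0))  # [95]
```

This works for simple point anomalies in roughly Gaussian data; contextual or collective anomalies in time series generally need methods that model the sequence itself.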

Category: Machine Learning

Contributed 59%

6.

tutorial

Pro Content

Dimensionality Reduction and Principal Component Analysis

Category: Machine Learning

Contributed 94%

7.

tutorial

Bayesian Machine Learning

The idea behind the Bayesian approach is to incorporate into machine learning algorithms some **prior beliefs about the model θ** by applying *Bayes' rule*. It is highly useful when data is scarce or difficult to obtain, which is often the case in practice. In Bayesian analysis, the data D is not assumed to be definitive, but is allowed to become “less wrong with size”. The process consists of recursively updating our initial belief or knowledge (the *prior*) as more evidence is obtained (the *data*). The goal can be either to find the most probable model **θ*** (*Bayesian inference*) or to directly compute optimal predictions y* (*Bayesian prediction*).

Let's say we have an empirical dataset D = {(x_{1}, y_{1}), ..., (x_{n}, y_{n})} and a model θ. Then, by Bayes' theorem, we have

P(\theta | D) = \dfrac{P(D|\theta) \times P(\theta)}{P(D)}
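
As a concrete sketch of this recursive updating, the example below uses the conjugate Beta-Bernoulli model for a coin's bias, where Bayes' theorem reduces to simple count updates (the prior and observations are illustrative):

```python
# Conjugate Beta-Bernoulli update: with prior Beta(a, b) on a coin's bias,
# observing heads/tails gives posterior Beta(a + heads, b + tails).

def update(a, b, observations):
    """Update a Beta(a, b) prior with a list of 0/1 observations."""
    heads = sum(observations)
    tails = len(observations) - heads
    return a + heads, b + tails

def posterior_mean(a, b):
    """Mean of Beta(a, b): the point prediction for the next flip."""
    return a / (a + b)

a, b = 1, 1  # uniform prior: no initial belief about the coin's bias
a, b = update(a, b, [1, 1, 0, 1, 1, 1, 0, 1])  # 6 heads, 2 tails
print((a, b), posterior_mean(a, b))  # (7, 3) 0.7
```

With little data the prior dominates; as more flips arrive, the posterior concentrates around the empirical frequency, which is exactly the “less wrong with size” behavior described above.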

Category: Machine Learning

Contributed 91%

8.

tutorial

Ensemble Methods (Part 3): Meta-learning, Stacking and Mixture of Experts

Ensemble methods were introduced in a previous tutorial. In this tutorial we will explore two more ensemble learning algorithms, namely **stacking** and **mixture of experts**. Both methods can be seen as examples of **meta-learning**, in which machine learning models are trained on the predictions output by other machine learning models.

Let us continue with the scenario where *m* models are trained on a dataset of *n* samples. **Stacking** (or stacked generalization) builds the models in the ensemble using different learning algorithms (e.g. one neural network, one decision tree, ...), as opposed to **bagging** or **boosting**, which train various incarnations of the same learner (e.g. all decision trees).

The outputs of the models are combined to compute the final prediction for any instance *x*:

\hat{y}(x) = \sum_{j=1}^m \beta_j h_j(x)
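
A minimal sketch of this combination rule, with two hypothetical base models and hand-set combiner weights standing in for a trained meta-learner:

```python
# Two hypothetical base models h1, h2 and weights beta that a meta-learner
# would normally fit on held-out predictions (hard-coded here for illustration).

def h1(x):  # e.g. an under-estimating base model
    return 2 * x

def h2(x):  # e.g. an over-estimating base model
    return 3 * x + 1

beta = [0.5, 0.5]  # combiner weights beta_1, beta_2

def stacked(x):
    """y_hat(x) = sum_j beta_j * h_j(x), the stacked prediction."""
    return beta[0] * h1(x) + beta[1] * h2(x)

print(stacked(2))  # 0.5 * 4 + 0.5 * 7 = 5.5
```

In real stacking the weights (or a more general combiner model) are learned on predictions the base models make for data they were not trained on, to avoid the combiner simply rewarding overfit base models.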

Category: Machine Learning

Contributed 18%

9.

tutorial

Improving Results After Implementing a Working Model

In this article, we'll discuss the importance of collecting more data and using more complex machine learning models for prediction in different situations and circumstances. What helps more (data or algorithm) depends on the specific problem at hand. In the discussion, we'll also highlight different caveats that might affect the desirability of pursuing each of these directions.

Collecting more data is one of the most promising ways of improving performance. In fact, you might have come across quotes like "*We don’t have better algorithms. We just have more data.*" or the article "*The Unreasonable Effectiveness of Data*" by Peter Norvig (Director of Research, Google).

One of the most effective ways in improving algorith...

Category: Machine Learning

Contributed 71%

10.

tutorial

Pro Content

K-Means Clustering

Category: Machine Learning

Contributed 70%

11.

tutorial

Hidden Markov Models

Let us imagine a dynamic system whose future states depend only on the current one. Such a stochastic process is called a **Markov process**, and it is said to satisfy the **Markov property (“memorylessness”)**:

p(h_{t+1} |h_1,..., h_t) = p(h_{t+1}|h_t)

Now, let us assume that the real state is not directly available. Instead, we only have access to an observation that serves as an indicator of it. This is, in fact, a realistic scenario: for example, a system with noise-corrupted measurements, or a process that cannot be completely measured. There is uncertainty about the real state of the world, which is referred to as **hidden**. A **Hidden Markov Model (HMM)** serves as a probabilistic model of such a system.

Let H be the *latent, hidden* variable that evolves over time. Let O be the random variable over its *observations*, also known as the *output sequence*. Graphically, the system at time steps {1, …,...
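
To make the model concrete, the sketch below runs the forward recursion on a toy two-state HMM, computing the probability of an observation sequence by summing over all hidden paths (the states and probabilities are purely illustrative):

```python
# Forward algorithm for a toy 2-state HMM; all probabilities are illustrative.
states = ["Rainy", "Sunny"]
start = {"Rainy": 0.5, "Sunny": 0.5}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"umbrella": 0.9, "no_umbrella": 0.1},
        "Sunny": {"umbrella": 0.2, "no_umbrella": 0.8}}

def forward(observations):
    """P(observations): forward recursion over the hidden states H."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs]
                 for s in states}
    return sum(alpha.values())

print(round(forward(["umbrella", "umbrella"]), 4))  # 0.3585
```

The same recursion, with a max in place of the sum, yields the Viterbi algorithm for recovering the most likely hidden state sequence.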

Category: Machine Learning
