TF-IDF, short for Term Frequency-Inverse Document Frequency, is a very common algorithm for transforming text into a meaningful numeric representation. The technique is widely used to extract features in many NLP applications. This article will help you understand why TF-IDF matters and how to compute and apply it in your own applications.
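To make the name concrete, here is a minimal sketch of one common TF-IDF variant. It assumes whitespace tokenization after lowercasing, a term frequency equal to the term's count divided by the document length, and the unsmoothed inverse document frequency log(N / df), where N is the number of documents and df is the number of documents containing the term. Libraries such as scikit-learn use slightly different (smoothed) formulas by default. The function name `tf_idf` and the example documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df),
    with plain whitespace tokenization after lowercasing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog sat", "the dog barked"]
for weights in tf_idf(docs):
    print({term: round(w, 3) for term, w in weights.items()})
```

Because "the" occurs in every document, its idf is log(3 / 3) = 0, so it gets a weight of 0.0 everywhere, while a rarer term such as "cat" receives a higher weight than "sat", which appears in two of the three documents.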
Vector representation of text
To apply a machine learning algorithm or a statistical technique to any form of text, you first need to transform the text into a numeric or vector representation. This representation should capture the significant characteristics of the text. There are many such techniques, for example occurrence counts, term frequency, TF-IDF, word co-occurrence matrices, word2vec and GloVe.
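The two simplest techniques in that list, occurrence counts and term frequency, can be sketched in a few lines. This is an illustrative example only: it assumes plain whitespace tokenization, builds a shared vocabulary from two made-up documents, and the helper names `occurrence_vector` and `term_frequency_vector` are hypothetical. Each document becomes a vector with one position per vocabulary term.

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat"]

# Shared vocabulary: every distinct token, sorted so vector positions are stable.
vocab = sorted({tok for doc in docs for tok in doc.split()})


def occurrence_vector(doc):
    """Raw count of each vocabulary term in the document (0 if absent)."""
    counts = Counter(doc.split())
    return [counts[term] for term in vocab]


def term_frequency_vector(doc):
    """Counts divided by document length, so longer documents don't dominate."""
    tokens = doc.split()
    counts = Counter(tokens)
    return [counts[term] / len(tokens) for term in vocab]


print(vocab)
for doc in docs:
    print(occurrence_vector(doc), term_frequency_vector(doc))
```

With this vocabulary (`cat, dog, mat, on, sat, the`), the first document maps to the occurrence vector `[1, 0, 1, 1, 1, 2]`. Its term-frequency vector divides each entry by 6, the document's length, so its entries sum to 1.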
Occurrence-based vector representation
Since TF-IDF is an occurrence based numeric represen...