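- TF-IDF (term frequency-inverse document frequency) is a common technique for turning documents into vectors of word weights, where each weight reflects how important a word is to one particular document within a collection.
- It combines two factors: how often a word occurs in the document (TF) and how rare the word is across all documents (IDF). Words that appear often in one document but in few others get the highest weights. Words that appear in nearly every document (such as "the") get low weights, or exactly zero, depending on the IDF variant.
- To compute the weight of a term in a document, multiply its term frequency by its inverse document frequency, commonly defined as `idf(t) = log(N / df(t))`, where `N` is the number of documents and `df(t)` is the number of documents containing term `t`. Because most vocabulary words do not appear in any given document, the resulting vectors are sparse (mostly zeros), and their nonzero entries emphasize the words that distinguish that document.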
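The computation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes lowercase whitespace tokenization, TF as a term's count divided by the document's token count, and the unsmoothed IDF `log(N / df)`. The function name `tf_idf` is hypothetical. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and L2-normalize the vectors by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (a sparse vector).

    Assumptions: whitespace tokenization, tf = count / doc length,
    idf = log(N / df) with no smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # df(t): number of documents that contain term t at least once
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # idf(t) = log(N / df(t)); a term present in every document gets 0
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # tf(t, d) = occurrences of t in d / number of tokens in d
        weights = {term: (c / total) * idf[term] for term, c in counts.items()}
        # Store only nonzero weights, which is what makes the vector sparse
        vectors.append({t: w for t, w in weights.items() if w > 0})
    return vectors


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
vectors = tf_idf(docs)
for i, vec in enumerate(vectors):
    top = max(vec, key=vec.get)
    print(i, top, round(vec[top], 3))
```

In this toy corpus, "the" occurs in every document, so its IDF is zero and it disappears from every vector, while the highest-weighted term in each document is the word that occurs only there ("mat", "log", and "chased").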