Term Frequency-Inverse Document Frequency (TF-IDF) Python code example

Example 1: get TF-IDF scores for the sentences in a corpus

>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> corpus = [
...     'This is the first document.',
...     'This document is the second document.',
...     'And this is the third one.',
...     'Is this the first document?',
... ]
>>> vectorizer = TfidfVectorizer()
>>> X = vectorizer.fit_transform(corpus)
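>>> # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
>>> print(vectorizer.get_feature_names_out().tolist())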
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
>>> print(X.shape)
(4, 9)

Example 2: calculate term frequency in Python

from collections import Counter

# Count token frequencies in a sentence (split on whitespace, case-sensitive)
sentence = "Texas A&M University is located in Texas"

term_frequencies = Counter(sentence.split())
print(term_frequencies.most_common(1))  # [('Texas', 2)]
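
Raw counts like these become TF-IDF weights once each count is multiplied by an inverse document frequency and the result is length-normalized. The sketch below does this by hand with the standard library only. It assumes the smoothed IDF, ln((1 + n) / (1 + df)) + 1, and the L2 normalization that scikit-learn's TfidfVectorizer uses by default. The helpers `tokenize` and `tfidf` are hypothetical names introduced for illustration.

```python
import math
import re
from collections import Counter

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

def tokenize(text):
    # Lowercase and keep word tokens of two or more characters.
    return re.findall(r"\b\w\w+\b", text.lower())

docs = [tokenize(text) for text in corpus]
n_docs = len(docs)

# Document frequency: the number of documents that contain each term.
df = Counter(term for doc in docs for term in set(doc))

def tfidf(doc):
    tf = Counter(doc)  # raw term counts for this document
    # Weight each count by the smoothed IDF: ln((1 + n) / (1 + df)) + 1
    weights = {
        term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }
    # L2-normalize so the document's weight vector has unit length.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}

scores = tfidf(docs[0])
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

Writing the formula out makes it easier to see why a term found in every document still gets a non-zero weight: the trailing + 1 keeps the IDF from dropping to zero.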