Name | Brief | Python Technique Used | Small Code Example

Name: Sentiment Analysis
Brief: Analyzing the sentiment (positive/negative) of text.
Python Technique Used: TextBlob, VADER
Small Code Example:
    from textblob import TextBlob
    text = "I love this product!"
    blob = TextBlob(text)
    sentiment = blob.sentiment.polarity

Name: Optical Character Recognition (OCR)
Brief: Extracting text from images.
Python Technique Used: Tesseract OCR, Pytesseract
Small Code Example:
    import pytesseract
    from PIL import Image
    image = Image.open('image.png')
    text = pytesseract.image_to_string(image)

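Name: Text Categorization
Brief: Categorizing text into predefined categories.
Python Technique Used: scikit-learn
Small Code Example:
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    X_train = ['text data']
    Y_train = [1]
    # Turn the raw text into word counts, then fit the classifier on them
    vectorizer = CountVectorizer()
    model = MultinomialNB().fit(vectorizer.fit_transform(X_train), Y_train)
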
Name: Word Prediction
Brief: Predicting the next word or sentence.
Python Technique Used: Keras, TensorFlow
Small Code Example:
    from keras.preprocessing.text import Tokenizer
    text = ["hello world"]
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(text)

Name: Speech Recognition
Brief: Converting speech to text.
Python Technique Used: SpeechRecognition
Small Code Example:
    import speech_recognition as sr
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    text = recognizer.recognize_google(audio)

Name: Machine Translation
Brief: Translating text from one language to another.
Python Technique Used: Googletrans
Small Code Example:
    from googletrans import Translator
    translator = Translator()
    result = translator.translate('Hello', src='en', dest='es')

Name: Text Preprocessing
Brief: Cleaning and preparing text data for further analysis.
Python Technique Used: re, nltk
Small Code Example:
    import re
    text = "This is a sample text!"
    cleaned_text = re.sub(r'[^\w\s]', '', text)

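Cleaning usually involves more than stripping punctuation. As a minimal sketch (the `preprocess` helper and its choice of steps are assumptions for illustration, not part of the table), the same regular expression can be combined with lowercasing and whitespace normalization:

```python
import re


def preprocess(text: str) -> str:
    """Lowercase, strip punctuation, and collapse runs of whitespace."""
    text = text.lower()
    # Same pattern as the table's example: drop anything that is not a
    # word character or whitespace. Note that \w keeps digits and underscores.
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse tabs, newlines, and repeated spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


print(preprocess("  This is a SAMPLE text!!  "))
```

Which steps are appropriate depends on the downstream task; sentiment analysis, for instance, may want to keep exclamation marks.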
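Name: Tokenization
Brief: Splitting text into smaller chunks like words or sentences.
Python Technique Used: nltk.word_tokenize, spaCy
Small Code Example:
    import nltk
    # Tokenizer data; newer NLTK releases may ask for 'punkt_tab' instead
    nltk.download('punkt')
    tokens = nltk.word_tokenize("This is a sample sentence.")
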
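Name: Lemmatization and Stemming
Brief: Reducing words to their base or root form.
Python Technique Used: nltk, spaCy
Small Code Example:
    from nltk.stem import WordNetLemmatizer
    # Requires the WordNet data: nltk.download('wordnet')
    lemmatizer = WordNetLemmatizer()
    # pos="v" treats the word as a verb; the default (noun) leaves "running" unchanged
    lemma = lemmatizer.lemmatize("running", pos="v")
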
Name: Feature Extraction
Brief: Extracting meaningful features from text data.
Python Technique Used: CountVectorizer, TfidfVectorizer
Small Code Example: No code example needed.

Name: NLP Terminology
Brief: Understanding terms used in NLP.
Python Technique Used: nltk, spaCy
Small Code Example: No code example needed.

Name: Components of NLP
Brief: Key components like parsing, part-of-speech tagging, named entity recognition.
Python Technique Used: spaCy, nltk
Small Code Example: No code example needed.

Name: Term Frequency (TF)
Brief: Frequency of a term in a document.
Python Technique Used: scikit-learn
Small Code Example:
    from sklearn.feature_extraction.text import CountVectorizer
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(['sample text'])

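To make the definition concrete, here is a from-scratch sketch using only the standard library. It computes relative term frequency (count divided by document length), which is one common convention; `CountVectorizer` itself returns raw counts. The `term_frequency` helper and the whitespace tokenization are assumptions for illustration.

```python
from collections import Counter


def term_frequency(document: str) -> dict[str, float]:
    """Return each term's share of the tokens in a single document."""
    tokens = document.lower().split()  # naive whitespace tokenization
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


tf = term_frequency("sample text with sample words")
print(tf["sample"])  # "sample" is 2 of the 5 tokens
```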
Name: Inverse Document Frequency (IDF)
Brief: Measures the importance of a term in the corpus.
Python Technique Used: scikit-learn
Small Code Example: No code example needed.

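The table gives no example here. As an illustration, the textbook definition is idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The sketch below implements that plain form with the standard library; the helper name and toy corpus are assumptions. scikit-learn's `TfidfVectorizer` uses a smoothed variant by default, ln((1 + N) / (1 + df(t))) + 1, so its numbers will differ.

```python
import math


def inverse_document_frequency(corpus: list[str]) -> dict[str, float]:
    """Textbook IDF: log(N / df), with naive whitespace tokenization."""
    n_docs = len(corpus)
    doc_sets = [set(doc.lower().split()) for doc in corpus]
    vocabulary = set().union(*doc_sets)
    idf = {}
    for term in vocabulary:
        # Document frequency: how many documents contain the term.
        df = sum(term in doc for doc in doc_sets)
        idf[term] = math.log(n_docs / df)
    return idf


# Toy corpus, invented for illustration.
corpus = ["the cat sat", "the dog sat", "the bird flew"]
idf = inverse_document_frequency(corpus)
for term in ("the", "sat", "cat"):
    print(term, round(idf[term], 3))
```

A term found in every document ("the") scores zero, while rarer terms score higher, which is why IDF is used to down-weight common words.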
Name: Modeling using TF-IDF
Brief: Creating models based on TF-IDF values for text analysis.
Python Technique Used: scikit-learn
Small Code Example: No code example needed.

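No example is given for this entry. One possible sketch, assuming a classification task, chains `TfidfVectorizer` with a linear classifier in a scikit-learn pipeline. The choice of `LogisticRegression`, the labels, and the four training sentences are all assumptions made for illustration; a real model needs far more data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented training set; labels are hypothetical topic names.
train_texts = [
    "the team won the match",
    "a late goal decided the game",
    "the new phone has a fast processor",
    "software update improves battery life",
]
train_labels = ["sports", "sports", "tech", "tech"]

# The pipeline converts text to TF-IDF features, then fits the classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

predictions = model.predict(["the team scored a late goal"])
print(predictions)
```

Wrapping both steps in one pipeline means new text is vectorized with the vocabulary learned during training, which avoids a common mismatch bug.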
Name: Spam Filtering
Brief: Classifying text as spam or non-spam.
Python Technique Used: Naive Bayes, scikit-learn
Small Code Example: No code example needed.

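The table lists no example for spam filtering. A minimal sketch with the two named tools, word counts fed into a multinomial Naive Bayes classifier, might look like this. The messages and the "spam"/"ham" labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented training set: two spam and two non-spam ("ham") messages.
messages = [
    "win a free prize now",
    "free money click now",
    "meeting at noon tomorrow",
    "lunch with the project team",
]
labels = ["spam", "spam", "ham", "ham"]

# Word counts feed a multinomial Naive Bayes classifier.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

results = spam_filter.predict(["free prize", "project meeting tomorrow"])
print(results)
```

With so little data the classifier simply keys on words seen in each class; a usable filter would need a labeled corpus of real messages and a held-out test set.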