1. Program to Perform Basic Word Analysis Using Natural Language Processing (NLP)
Aim
To perform basic word analysis using Natural Language Processing (NLP) techniques in Python
with the NLTK library.
Algorithm
Step 1: Start the program.
Step 2: Import the required modules from the NLTK library:
word_tokenize, pos_tag, PorterStemmer, WordNetLemmatizer.
Step 3: Download the required NLTK datasets:
- punkt (for tokenization)
- averaged_perceptron_tagger (for POS tagging)
- wordnet (for lemmatization)
Step 4: Define a sample text for analysis.
Step 5: Tokenize the text into words.
Step 6: Perform POS tagging on the tokens.
Step 7: Apply stemming to each token.
Step 8: Apply lemmatization to each token.
Step 9: Display the tokens, POS tags, stems, and lemmas.
Step 10: End the program.
PROGRAM:
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
# Download necessary NLTK data
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
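# Note: depending on the NLTK version, additional resources such as 'punkt_tab' may also need to be downloaded for tokenization.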
# Sample text for word analysis
text = "The quick brown fox jumps over the lazy dog."
# Step 1: Tokenization
tokens = word_tokenize(text)
print("Tokens:", tokens)
# Step 2: POS Tagging
pos_tags = pos_tag(tokens)
print("\nPOS Tags:", pos_tags)
# Step 3: Stemming
stemmer = PorterStemmer()
stems = [stemmer.stem(word) for word in tokens]
print("\nStems:", stems)
# Step 4: Lemmatization
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(word, pos='v') for word in tokens]
print("\nLemmas:", lemmas)
OUTPUT:
Tokens: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']
POS Tags: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'),
('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]
Stems: ['the', 'quick', 'brown', 'fox', 'jump', 'over', 'the', 'lazi', 'dog', '.']
Lemmas: ['The', 'quick', 'brown', 'fox', 'jump', 'over', 'the', 'lazy', 'dog', '.']
Result:
Thus, the Python program to perform basic word analysis using NLP techniques was executed
and verified successfully.
2. Program to Generate a Random Word Using a Given Corpus
Aim
To write a Python program that generates a random word of a given length using a predefined set
of characters (corpus).
Algorithm
Step 1: Start the program.
Step 2: Import the random module.
Step 3: Define a sample corpus containing lowercase English alphabets.
Step 4: Define a function generate_word(length) to create a random word:
4.1: Use a loop to select random characters from the corpus.
4.2: Join them into a single string.
4.3: Return the generated word.
Step 5: Call the function with a specified length (e.g., 6).
Step 6: Display the generated word.
Step 7: End the program.
Program
import random
# Sample corpus of characters
corpus = "abcdefghijklmnopqrstuvwxyz"
# Function to generate a new word
def generate_word(length):
    word = "".join(random.choice(corpus) for _ in range(length))
    return word
# Generate a word of length 6
new_word = generate_word(6)
print("Generated Word:", new_word)
Output
Generated Word: tnwaey
Result
Thus, the Python program to generate a random word from a given corpus was executed and
verified successfully.
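The same idea extends from a character corpus to a word-level corpus, picking whole words instead of single characters. The sketch below is illustrative only; the word list and the name generate_sentence are hypothetical examples, not part of the recorded program.
import random
# Hypothetical word-level corpus; any list of words would work
word_corpus = ["data", "model", "token", "corpus", "parse", "learn"]
def generate_sentence(n_words):
    # Pick n_words random entries from the corpus and join them with spaces
    return " ".join(random.choice(word_corpus) for _ in range(n_words))
print("Generated Sentence:", generate_sentence(4))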
3. Program to Perform Morphological Analysis Using Stemming and Lemmatization
AIM
To perform morphological analysis in Natural Language Processing (NLP) using stemming and
lemmatization, two common techniques for analysing the structure of words and reducing them
to their base forms.
ALGORITHM
Step 1: Start the program.
Step 2: Import PorterStemmer and WordNetLemmatizer from the nltk.stem module.
Step 3: Download the WordNet dataset for lemmatization.
Step 4: Create a list of words for morphological analysis.
Step 5: Initialise the stemmer and lemmatizer objects.
Step 6: For each word in the list:
6.1: Find the stem using stemmer.stem(word).
6.2: Find the lemma using lemmatizer.lemmatize(word, pos='v').
6.3: Display the original word, stem, and lemma in a tabular format.
Step 7: End the program.
PROGRAM
import nltk
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
# Download necessary NLTK data
nltk.download('wordnet')
# Sample list of words for morphological analysis
words = ["running", "jumps", "easily", "fairly", "happier"]
# Initialize the stemmer and lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
# Perform stemming and lemmatization
print(f"{'Word':<10} {'Stem':<10} {'Lemma':<10}")
for word in words:
    stem = stemmer.stem(word)
    lemma = lemmatizer.lemmatize(word, pos='v')  # 'v' for verb
    print(f"{word:<10} {stem:<10} {lemma:<10}")
OUTPUT
Word       Stem       Lemma
running    run        run
jumps      jump       jump
easily     easili     easily
fairly     fairli     fairly
happier    happier    happier
RESULT
Thus, the Python program to perform morphological analysis using stemming and lemmatization
was executed and verified successfully.
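Note that the lemma returned by lemmatize() depends on the pos argument: 'happier' is unchanged above because it was looked up as a verb. A minimal sketch, assuming the same WordNet data downloaded in the program, is shown below.
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
# Looked up as an adjective, the comparative form typically reduces to its base word
print(lemmatizer.lemmatize("happier", pos='a'))  # expected output: happy
# Looked up as a verb (the setting used in the program), it is returned unchanged
print(lemmatizer.lemmatize("happier", pos='v'))  # expected output: happier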