NLP Toolkits for AI Students

Uploaded by

Haisam Abbas

WEEK 2

Natural Language Processing
CSC 4106

MUHAMMAD ATIF SAEED

LECTURER (Artificial Intelligence & Robotics)
Course Outline

• Natural Language Processing: Toolkits and Concepts

• Toolkits
• Natural Language Toolkit (NLTK), Apache OpenNLP, Stanford
CoreNLP, Unstructured Information Management Architecture
(UIMA)

SLIDE 02
Natural Language Processing (NLP) | MUHAMMAD ATIF SAEED (Lecturer)
Implementation of NLP Application
Step 1:
- Analyze the Task - Define the Framework
Step 2:
- Preprocess Data - Inspect and Get Insights
Step 3:
- Define Relevant Information - Extract Information
Step 4:
- Select Appropriate Algorithm - Implement the Algorithm
Step 5:
- Apply Your Algorithm in Practice - Test and Evaluate

Step 1: Analysis of the task
Define exactly what the task involves: e.g., ask yourself how you would
solve it by hand (without ML)
• In spam filtering: you probably pay attention to certain characteristics
(sender, fonts, format, how many recipients the email has, etc.)
• You may also pay attention to the content: “lottery”, “click on this
link”, “your account is blocked”, and similar
• Most probably, you classify the emails into two types – normal emails
and spam
• ⇒ Binary classification task

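The manual strategy from Step 1 can be sketched as a small rule-based filter. This is only an illustration, not a production spam filter: the function name `looks_like_spam`, the `RED_FLAGS` tuple (built from the phrases quoted on the slide) and the recipient threshold are assumptions made for the example.

```python
# Hypothetical red-flag phrases, taken from the examples on the slide.
RED_FLAGS = ("lottery", "click on this link", "your account is blocked")


def looks_like_spam(body: str, n_recipients: int = 1, max_recipients: int = 50) -> bool:
    """Binary decision: True for spam, False for a normal email."""
    text = body.lower()  # so that "LOTTERY" and "lottery" both match
    has_red_flag = any(phrase in text for phrase in RED_FLAGS)
    mass_mailing = n_recipients > max_recipients  # assumed threshold
    return has_red_flag or mass_mailing


print(looks_like_spam("You won the LOTTERY! Click on this link."))
print(looks_like_spam("Minutes of Monday's meeting are attached."))
```

The function returns one of two values, which is what makes the task a binary classification problem.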
Step 2: Analysis and preprocessing
• Given the “red flags” (words and phrases), you could start with
hand-written templates
• For machine learning, define what the relevant data is and how to
prepare it:
• You need access to labelled data of two classes
• What is the distribution of classes?
• Are you going to use only textual features?
• Are there any other significant differences (e.g., spam emails being
considerably shorter)?
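Two of the questions in Step 2 (class distribution and length differences) can be checked with a few lines of code. The sketch below uses a tiny invented dataset; the `emails` list, the label names and both helper functions are assumptions for illustration only.

```python
from collections import Counter
from statistics import mean

# Toy labelled data (invented for illustration): (text, label) pairs.
emails = [
    ("Win the lottery now", "spam"),
    ("Click on this link", "spam"),
    ("Please find the meeting agenda for next Monday attached", "ham"),
    ("Can we move our call to Thursday afternoon instead", "ham"),
    ("Here are the lecture notes you asked for last week", "ham"),
]


def class_distribution(data):
    """Fraction of examples per label."""
    counts = Counter(label for _, label in data)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def mean_length(data, label):
    """Average number of whitespace-separated tokens for one class."""
    return mean(len(text.split()) for text, lab in data if lab == label)


print(class_distribution(emails))
print(mean_length(emails, "spam"), mean_length(emails, "ham"))
```

If one class dominates or the classes differ clearly in length, that is worth knowing before choosing features and a baseline.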

Step 3: Definition and extraction of the relevant information
• Identify relevant signal in the data
• Is it single words (“lottery”, “blocked”) or phrases (“click on this link”)?
• Are you going to learn from misspellings?
• Are you going to learn from different ways to spell words (e.g., “Now”, “now”,
“NOW”)?
• Are you going to learn from word occurrences or word distribution?
• Will you apply any other normalisation techniques?
• The above points refer to feature selection, feature representation,
and feature weighting
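The choices listed in Step 3 map onto a few small functions. The following is a minimal sketch, assuming a simple regular-expression tokenizer and raw counts as feature weights; the names `tokenize`, `ngrams` and `bag_of_features` are hypothetical and not taken from any toolkit.

```python
import re
from collections import Counter


def tokenize(text: str, lowercase: bool = True) -> list[str]:
    """Split text into word tokens; optionally fold case so 'NOW' == 'now'."""
    if lowercase:
        text = text.lower()
    return re.findall(r"[a-z0-9']+", text, flags=re.IGNORECASE)


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Contiguous n-token phrases, e.g. n=4 can capture 'click on this link'."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_features(text: str, max_n: int = 2) -> Counter:
    """Count single words and phrases up to max_n tokens long."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


feats = bag_of_features("Click NOW, click now: lottery!")
print(feats["now"], feats["click now"], feats["lottery"])
```

Setting `lowercase=False` keeps “Now”, “now” and “NOW” as separate features, which is the alternative the slide asks about; replacing counts with 0/1 values would model occurrence instead of distribution.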

Step 4: Implementation of the algorithm

• No algorithm can be considered absolutely the best for all tasks
and all datasets
• Analyse the task to identify which one suits best in each
particular case

Step 5: Testing and evaluation
• It is important to understand how your current algorithm performs and
what you can do better
• E.g., for classification tasks, you can measure accuracy, precision, recall, F1
• Arguably, it is better to let some annoying spam messages slip through
than to send important “normal” emails to the spam box – is precision or
recall more important?
• It is advisable to set up a baseline: How large is the majority class?
How would the simplest algorithm perform? Are you really
doing better using a more sophisticated approach?
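The metrics and the majority-class baseline from Step 5 can be computed directly from gold and predicted labels. The sketch below uses invented labels; the function names and the choice of "spam" as the positive class are assumptions for the example.

```python
from collections import Counter


def evaluate(gold, predicted, positive="spam"):
    """Accuracy, precision, recall and F1 with respect to one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def majority_baseline(gold):
    """Predict the most frequent label for every example."""
    label, _ = Counter(gold).most_common(1)[0]
    return [label] * len(gold)


gold = ["spam", "ham", "ham", "ham", "spam", "ham"]
pred = ["spam", "ham", "spam", "ham", "ham", "ham"]
print(evaluate(gold, pred))
print(evaluate(gold, majority_baseline(gold)))
```

On this toy data the classifier and the baseline reach the same accuracy, so only precision, recall and F1 show a difference. With "spam" as the positive class, a normal email sent to the spam box is a false positive, so it lowers precision.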

Natural Language Toolkit (NLTK)

Natural Language Toolkit (NLTK) is a leading platform for building
Python programs to work with human language data. It provides
easy-to-use interfaces to over 50 corpora and lexical resources
such as WordNet, along with a suite of text processing libraries for
classification, tokenization, stemming, tagging, parsing, and
semantic reasoning, and wrappers for industrial-strength NLP libraries.

spaCy

spaCy is a library for advanced Natural Language Processing in
Python and Cython. It's built on the very latest research, and was
designed from day one to be used in real products. spaCy comes
with pretrained pipelines and currently supports tokenization and
training for 60+ languages. It also features neural network models
for tagging, parsing, named entity recognition, text classification
and more, as well as multi-task learning with pretrained transformers
like BERT.

CoreNLP

CoreNLP is a set of natural language analysis tools written in Java.
CoreNLP enables users to derive linguistic annotations for text,
including token and sentence boundaries, parts of speech, named
entities, numeric and time values, dependency and constituency
parses, coreference, sentiment, quote attributions, and relations.

NLPnet

NLPnet is a Python library for Natural Language Processing tasks
based on neural networks. It performs part-of-speech tagging,
semantic role labeling and dependency parsing.

Flair

Flair is a simple framework for applying state-of-the-art Natural
Language Processing (NLP) models to your text, such as named entity
recognition (NER) and part-of-speech tagging (PoS), with special
support for biomedical data, sense disambiguation and classification,
and support for a rapidly growing number of languages.

Catalyst

Catalyst is a C# Natural Language Processing library built for speed.
Inspired by spaCy's design, it brings pre-trained models, out-of-the-box
support for training word and document embeddings, and
flexible entity recognition models.

Apache OpenNLP

Apache OpenNLP is an open-source, machine-learning-based toolkit
for processing natural language text. It features an API for use
cases like named entity recognition, sentence detection,
part-of-speech (POS) tagging, tokenization, feature extraction,
chunking, parsing, and coreference resolution.

DyNet
DyNet is a neural network library developed by Carnegie Mellon
University and many others. It is written in C++ (with bindings in
Python) and is designed to be efficient when run on either CPU or
GPU, and to work well with networks that have dynamic structures
that change for every training instance. These kinds of networks
are particularly important in natural language processing tasks,
and DyNet has been used to build state-of-the-art systems for
syntactic parsing, machine translation, morphological inflection,
and many other application areas.

MLpack

MLpack is a fast, flexible machine learning library written in
C++ and built on the Armadillo linear algebra library, the
ensmallen numerical optimization library, and parts of Boost.

OpenNN

OpenNN is an open-source neural networks library for machine
learning. It contains sophisticated algorithms and utilities for
building many kinds of artificial intelligence solutions.

Microsoft Cognitive Toolkit (CNTK)
Microsoft Cognitive Toolkit (CNTK) is an open-source toolkit for
commercial-grade distributed deep learning. It describes neural
networks as a series of computational steps via a directed graph.
CNTK allows the user to easily realize and combine popular model
types such as feed-forward DNNs, convolutional neural networks
(CNNs) and recurrent neural networks (RNNs/LSTMs). CNTK
implements stochastic gradient descent (SGD, error
backpropagation) learning with automatic differentiation and
parallelization across multiple GPUs and servers.

NVIDIA cuDNN

NVIDIA cuDNN is a GPU-accelerated library of primitives for deep
neural networks. cuDNN provides highly tuned implementations
for standard routines such as forward and backward convolution,
pooling, normalization, and activation layers. cuDNN accelerates
widely used deep learning frameworks, including Caffe2, Chainer,
Keras, MATLAB, MXNet, PyTorch, and TensorFlow.

TensorFlow

TensorFlow is an end-to-end open-source platform for machine
learning. It has a comprehensive, flexible ecosystem of tools,
libraries and community resources that lets researchers push the
state of the art in ML and developers easily build and deploy
ML-powered applications.

Keras

Keras is a high-level neural networks API, written in Python and
capable of running on top of TensorFlow, Microsoft Cognitive
Toolkit (CNTK), Theano, or PlaidML. It was developed with a focus
on enabling fast experimentation.

PyTorch

PyTorch is an open-source deep learning library that provides
tensor computation with GPU acceleration and automatic
differentiation. It was primarily developed by Facebook's AI
Research lab.

Scikit-Learn

Scikit-Learn is a Python module for machine learning built on top
of SciPy, NumPy, and matplotlib, making it easier to apply robust
and simple implementations of many popular machine learning
algorithms.

Theano

Theano is a Python library that allows you to define, optimize, and
evaluate mathematical expressions involving multi-dimensional
arrays efficiently, including tight integration with NumPy.

Apache Spark

Apache Spark is a unified analytics engine for large-scale data
processing. It provides high-level APIs in Scala, Java, Python, and R,
and an optimized engine that supports general computation
graphs for data analysis. It also supports a rich set of higher-level
tools including Spark SQL for SQL and DataFrames, MLlib for
machine learning, GraphX for graph processing, and Structured
Streaming for stream processing.

Apache Spark Connector

Apache Spark Connector for SQL Server and Azure SQL is a
high-performance connector that enables you to use transactional data
in big data analytics and persist results for ad-hoc queries or
reporting. The connector allows you to use any SQL database,
on-premises or in the cloud, as an input data source or output data
sink for Spark jobs.

Unstructured Information Management

Unstructured Information Management applications are software
systems that analyze large volumes of unstructured information in
order to discover knowledge that is relevant to an end user. An
example UIM application might ingest plain text and identify
entities, such as persons, places, organizations; or relations, such
as works-for or located-at.
UIMA enables applications to be decomposed into components, for example
"language identification" => "language specific segmentation" =>
"sentence boundary detection" => "entity detection (person/place names etc.)".

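The idea of decomposing an application into chained components can be sketched in plain Python. This is not the UIMA API: every function below is a hypothetical toy stand-in for a real analysis component, and the heuristics (function-word language check, capitalisation-based entity detection) are assumptions chosen only to keep the example short.

```python
import re


def identify_language(doc):
    # Toy heuristic (assumption): common English function words mark English.
    words = set(doc["text"].lower().split())
    doc["language"] = "en" if words & {"the", "is", "in", "for"} else "unknown"
    return doc


def segment(doc):
    # "Language-specific" segmentation: a word/punctuation regex for English,
    # plain whitespace splitting otherwise.
    if doc["language"] == "en":
        doc["tokens"] = re.findall(r"\w+|[^\w\s]", doc["text"])
    else:
        doc["tokens"] = doc["text"].split()
    return doc


def detect_sentences(doc):
    # Naive boundary detection: split after ., ! or ? followed by whitespace.
    doc["sentences"] = [s for s in re.split(r"(?<=[.!?])\s+", doc["text"]) if s]
    return doc


def detect_entities(doc):
    # Toy entity detection: capitalised words that do not start a sentence.
    doc["entities"] = [
        token.strip(".,!?")
        for sentence in doc["sentences"]
        for token in sentence.split()[1:]
        if token[0].isupper()
    ]
    return doc


# Components run in order; each adds annotations to the shared document.
PIPELINE = [identify_language, segment, detect_sentences, detect_entities]


def run_pipeline(text):
    doc = {"text": text}
    for component in PIPELINE:
        doc = component(doc)
    return doc


result = run_pipeline("Alice works for Acme in Lahore. She is happy.")
print(result["language"], result["sentences"], result["entities"])
```

Because each component only reads and extends a shared document, any one of them could be swapped for a better implementation without touching the others. The toy entity detector misses sentence-initial names such as "Alice", which is the kind of weakness a real component would have to address.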
Fuzzy Logic

Fuzzy logic is a heuristic approach that allows for more advanced
decision-tree processing and better integration with rules-based
programming.

Fuzzy Logic (Cont.)

Fuzzy logic is an extension of classical Boolean logic that permits
the representation of intermediate values between completely
true and completely false. Instead of strict "0" or "1" values, fuzzy
logic allows degrees of truth ranging from 0 to 1.
In classical logic: "The weather is hot" might be True or False.
In fuzzy logic: "The weather is hot" could have a truth value of 0.7,
indicating it's somewhat hot but not extremely hot.
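A degree of truth such as 0.7 is usually produced by a membership function. The sketch below assumes a simple linear ramp between two temperatures; the thresholds of 20 °C and 40 °C and the function names are illustrative assumptions, while min, max and 1 − x are the standard fuzzy counterparts of AND, OR and NOT.

```python
def hot_membership(temp_c: float, cold: float = 20.0, hot: float = 40.0) -> float:
    """Degree of truth of 'the weather is hot', rising linearly from 0 to 1."""
    if temp_c <= cold:
        return 0.0
    if temp_c >= hot:
        return 1.0
    return (temp_c - cold) / (hot - cold)


def fuzzy_and(a: float, b: float) -> float:
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    return max(a, b)


def fuzzy_not(a: float) -> float:
    return 1.0 - a


print(hot_membership(34.0))             # degree to which 34 °C counts as hot
print(fuzzy_not(hot_membership(34.0)))  # degree to which it is "not hot"
```

With these assumed thresholds, 34 °C is hot to degree 0.7, matching the example on the slide; a classical Boolean rule would have to pick a single cut-off instead.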

Fuzzy Logic (Cont.)
Text Classification:
Traditional text classification models assign texts to categories in a
binary manner, but documents often belong to multiple topics to
varying degrees.
Fuzzy classification allows assigning a document to multiple
categories with varying degrees of membership. For example, an
article about "climate change" could belong to both
"Environment" (0.8) and "Politics" (0.4).

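One very simple way to obtain such membership degrees is to measure how many of a category's keywords occur in the document. This is only a sketch under strong assumptions: the keyword sets, the scoring rule and the function name `fuzzy_memberships` are invented for illustration, and a real system would learn the memberships from data.

```python
# Hypothetical keyword lists; a real system would learn these from data.
CATEGORY_KEYWORDS = {
    "Environment": {"climate", "emissions", "warming", "carbon", "ecosystem"},
    "Politics": {"policy", "government", "election", "treaty", "parliament"},
}


def fuzzy_memberships(text: str) -> dict[str, float]:
    """Degree of membership per category: share of its keywords found in text."""
    words = set(text.lower().split())
    return {
        category: len(words & keywords) / len(keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }


article = "climate warming and carbon emissions now shape government policy"
print(fuzzy_memberships(article))
```

The degrees are independent of each other and need not sum to 1, which is what distinguishes fuzzy membership from a probability distribution over categories.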
Support Vector Machine (SVM)

Support Vector Machine (SVM) is a supervised machine learning model
that uses classification algorithms for two-group classification
problems.

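To make the two-group setting concrete, here is a minimal sketch of a linear soft-margin SVM trained by subgradient descent on the regularised hinge loss. It is a teaching illustration in pure Python, not how production libraries solve the problem; the toy data, the learning rate `lr`, the regularisation strength `lam` and the function names are all assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=200, lr=0.1):
    """Minimise hinge loss plus an L2 penalty by subgradient descent.

    points: list of feature tuples; labels: +1 or -1 (the two groups).
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 penalty shrinks the weights a little on every step.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:  # point is misclassified or inside the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (invented for illustration).
X = [(-2.0, -1.0), (-1.0, -2.0), (-1.5, -1.5), (2.0, 1.0), (1.0, 2.0), (1.5, 1.5)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

Only points with a margin below 1 change the weights, so the final boundary is determined by the examples closest to it, which is the intuition behind the term "support vectors".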
Random Forest

Random forest is a commonly used machine learning algorithm
which combines the output of multiple decision trees to reach a
single result. Each tree is typically grown on a random sample of
the training data and left unpruned, and the forest's prediction is
selected by combining the trees' outputs, for example by majority
vote. Its ease of use and flexibility have fueled its adoption, as
it handles both classification and regression problems.
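Combining many trees can be illustrated with a deliberately simplified forest of one-split trees ("decision stumps"). This is a sketch under stated assumptions, not a full random forest: real implementations grow deeper trees and pick a random feature subset at every split, whereas here each stump gets one randomly chosen feature; the data, the function names and the seed are invented for the example.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """Best single-threshold split on one feature (a depth-1 decision tree)."""
    majority = Counter(labels).most_common(1)[0][0]
    values = sorted(set(r[feature] for r in rows))
    best = (len(rows) + 1, None)  # (number of errors, stump)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if errors < best[0]:
            best = (errors, (feature, t, l_lab, r_lab))
    # With a single distinct value there is nothing to split: constant leaf.
    return best[1] or (feature, float("inf"), majority, majority)


def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) examples with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))  # one random feature per tree
        forest.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = [l if row[f] <= t else r for f, t, l, r in forest]
    return Counter(votes).most_common(1)[0][0]


rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0),
        (6.0, 7.0), (7.0, 6.0), (8.0, 9.0), (9.0, 8.0)]
labels = ["ham"] * 4 + ["spam"] * 4
forest = train_forest(rows, labels)
print(predict(forest, (0.0, 0.0)), predict(forest, (10.0, 10.0)))
```

For regression the vote would be replaced by an average of the trees' numeric outputs.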
