Course Introduction
Tanmoy Chakraborty
Associate Professor, IIT Delhi
https://tanmoychak.com/
Introduction to Large Language Models
Instructors:
• Tanmoy Chakraborty (IIT Delhi)
• Soumen Chakrabarti (IIT Bombay)
Teaching Assistants:
• Anwoy Chatterjee (PhD student, IIT Delhi)
• Poulami Ghosh (PhD student, IIT Bombay)
Course Content
• This is an introductory graduate course, and we will be teaching the fundamental concepts underlying large language models.
• This course will start with a short introduction to NLP and Deep Learning, and then move on to the architectural intricacies of Transformers, followed by the recent advances in LLM research.
Basics
• Introduction
• Intro to NLP
• Intro to Deep Learning
• Intro to Language Models (LMs)
• Word Embeddings (Word2Vec, GloVe)
• Neural LMs (CNN, RNN, Seq2Seq, Attention)

Architecture
• Intro to Transformer
• Positional encoding
• Tokenization strategies
• Decoder-only LM, Prefix LM, Decoding strategies
• Encoder-only LM, Encoder-decoder LM

Learnability
• Instruction fine-tuning
• In-context learning
• Advanced prompting (Chain of Thoughts, Graph of Thoughts, Prompt Chaining, etc.)
• Alignment
• PEFT

Knowledge & Retrieval
• Knowledge graphs
• Open-book question answering
• Retrieval augmentation techniques

Ethics and Misc.
• Overview of recently popular models
• Bias, toxicity and hallucination
Pre-Requisites
• Excitement about language!
• Willingness to learn

Mandatory:
• Data Structures & Algorithms
• Machine Learning
• Python programming

Desirable:
• NLP
• Deep Learning

This course will NOT cover:
• Details of NLP, Machine Learning and Deep Learning
• Generative models for modalities other than text
Reading and Reference Materials
• Books (optional reading)
• Speech and Language Processing, Dan Jurafsky and James H. Martin
https://web.stanford.edu/~jurafsky/slp3/
• Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze
• Natural Language Processing, Jacob Eisenstein
https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf
• A Primer on Neural Network Models for Natural Language Processing, Yoav Goldberg
http://u.cs.biu.ac.il/~yogo/nnlp.pdf
• Journals
• Computational Linguistics, Natural Language Engineering, TACL, JMLR, TMLR, etc.
• Conferences
• ACL, EMNLP, NAACL, COLING, ICML, NeurIPS, ICLR, AAAI, WWW, KDD, SIGIR, etc.
Research Papers Repositories
• https://aclanthology.org/
• https://arxiv.org/list/cs.CL/recent
Acknowledgements (Non-exhaustive List)
• Advanced NLP, Graham Neubig http://www.phontron.com/class/anlp2022/
• Advanced NLP, Mohit Iyyer https://people.cs.umass.edu/~miyyer/cs685/
• NLP with Deep Learning, Chris Manning, http://web.stanford.edu/class/cs224n/
• Understanding Large Language Models, Danqi Chen https://www.cs.princeton.edu/courses/archive/fall22/cos597G/
• Natural Language Processing, Greg Durrett https://www.cs.utexas.edu/~gdurrett/courses/online-course/materials.html
• Large Language Models: https://stanford-cs324.github.io/winter2022/
• Natural Language Processing at UMBC, https://laramartin.net/NLP-class/
• Computational Ethics in NLP, https://demo.clab.cs.cmu.edu/ethical_nlp/
• Self-supervised Models, CS 601.471/671, Johns Hopkins University
• WING.NUS Large Language Models, https://wing-nus.github.io/cs6101/
• And many more…
What is a Language Model (LM)?
A language model gives the probability distribution over a sequence of tokens.

Vocabulary: V = {arrived, delhi, have, is, monsoon, rains, the}

Example probabilities assigned by the language model:
• P(the monsoon rains have arrived) = 0.2
• P(monsoon the have rains arrived) = 0.001
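As a hands-on illustration (not part of the slides), here is a minimal Python sketch of the definition above: it scores a token sequence with a small pretrained auto-regressive LM via Hugging Face Transformers. The model choice (GPT-2) and the example sentences are illustrative assumptions.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sequence_log_prob(text: str) -> float:
    """Return log P(text) = sum_i log P(x_i | x_<i) under the LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                      # (1, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits, dim=-1)
    # The token at position i is predicted from the logits at position i-1.
    token_lp = log_probs[0, :-1].gather(1, ids[0, 1:, None]).squeeze(-1)
    return token_lp.sum().item()

print(sequence_log_prob("the monsoon rains have arrived"))
print(sequence_log_prob("monsoon the have rains arrived"))   # expect a much lower score

A well-trained LM should assign the grammatical sentence a much higher (log-)probability than the scrambled one, mirroring the 0.2 vs. 0.001 example above.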
LMs can ‘Generate’ Text!

Vocabulary: V = {arrived, delhi, have, is, monsoon, rains, the}

Given the input ‘the monsoon rains have’, the LM can calculate
P(xi | the monsoon rains have), ∀ xi ∈ V

• For generation, the next token is sampled from this probability distribution.
• Auto-regressive LMs calculate this distribution efficiently, e.g., using ‘deep’ neural networks.
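To make the sampling step concrete, here is a toy Python sketch (the conditional probabilities are invented for illustration; a real LM would produce them with a neural network, as the slide notes):

import random

V = ["arrived", "delhi", "have", "is", "monsoon", "rains", "the"]
# Hypothetical distribution P(x | "the monsoon rains have") an LM might assign:
p_next = {"arrived": 0.85, "delhi": 0.02, "have": 0.01, "is": 0.05,
          "monsoon": 0.02, "rains": 0.02, "the": 0.03}

# Sample the next token from the distribution (rather than always taking the argmax).
next_token = random.choices(V, weights=[p_next[w] for w in V], k=1)[0]
print("the monsoon rains have", next_token)

Repeating this step, each time appending the sampled token to the context, is exactly how auto-regressive generation proceeds.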
‘Large’ Language Models
The ‘Large’ refers to both the model's size (# parameters) and the massive size of the training dataset.

Model sizes have increased by a factor of roughly 5000x over just the last 4 years!

Other recent models: PaLM (540B), OPT (175B), BLOOM (176B), Gemini-Ultra (1.56T), GPT-4 (1.76T)
Disclaimer: For API-based models like GPT-4/Gemini-Ultra, the number of parameters is not announced officially; these are rumored numbers from the web.

Image source: https://hellofuture.orange.com/en/the-gpt-3-language-model-revolution-or-evolution/
LLMs in AI Landscape
Image source: https://www.manning.com/books/build-a-large-language-model-from-scratch
Evolution of (L)LMs
Image source: https://synthedia.substack.com/p/a-timeline-of-large-language-model
Post-Transformers Era: The LLM Race

Google Designed Transformers: But Could It Take Advantage?
• BERT marked the beginning of the use of the Transformer as a language representation model.
• BERT achieved SOTA on 11 NLP tasks.
• Compact variants followed: DistilBERT, TinyBERT, MobileBERT.
However, someone was waiting for the right opportunity!!
Guess who?
OpenAI Started Pushing the Frontier
• Use of a decoder-only architecture
• The idea of generative pre-training over a large corpus
The Beginning of Scale
• GPT-1 (117M) → GPT-2 (1.5B): a 13x increase in # parameters
• Minimal changes (some LayerNorms added, modified weight initialization)
• Increase in context length: GPT-1 (512 tokens) → GPT-2 (1024 tokens)
• Performance boosts across tasks
What Was Google Developing in Parallel?
• Similar broader goal of converting all text-based language problems into a text-to-text format
• Used an encoder-decoder architecture
• Pre-training strategy differs from GPT's; it is more similar to BERT's
Was It Only Google vs OpenAI? Where Did Meta Stand?

RoBERTa:
• A replication study of BERT pretraining
• Measured the impact of many key hyperparameters and training data size
• Found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it

XLM:
• Proposed methods to learn cross-lingual language models (XLMs)
• Obtained SOTA on:
  • cross-lingual classification
  • unsupervised and supervised machine translation
OpenAI Continues to Scale
• GPT-3: 175B parameters!
• OpenAI stops open-sourcing!!
Google Starts Scaling Too (But Is It Late)!
• PaLM: 540B parameters!
• Google follows OpenAI in stopping open-sourcing!
• It’s now the “LLM Race”
2021-2022: A Flurry of LLMs
• Megatron-Turing NLG
• Codex
Meta Promotes Open-Sourcing!
• OPT: a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters
• Open-sourced!!!
The ChatGPT Moment
November 30, 2022
2023: The Year of Rapid Pace
• Feb 2023: Google releases Bard
• Feb 2023: Meta releases its LLaMA family of open-source models
• March 2023: Anthropic, a start-up founded in 2021 by ex-OpenAI researchers, releases Claude
• March 2023: OpenAI releases GPT-4, which is multimodal
• June 2023: Microsoft releases Phi-1, a 1.3B LLM for code
• Sept 2023: Mistral AI releases the Mistral-7B model
• Nov 2023: xAI releases Grok
• Dec 2023: Google releases Gemini
And now, 2024 is seeing even more rapid advancements!

Why Does This Course Exist?
Why do we need a separate course on LLMs? What changes with the scale of LMs?

Emergence
Although the technical machinery is almost the same, ‘just scaling up’ these models results in new emergent behaviors, which lead to significantly different capabilities and societal impacts.
Content credits: https://stanford-cs324.github.io/winter2022/
Why Does This Course Exist?
LLMs show emergent capabilities not observed previously in ‘small’ LMs.
• In-context learning: A pre-trained language model can be guided with only prompts to perform different tasks (without separate task-specific fine-tuning); a small prompt sketch follows this slide.
• In-context learning is an example of emergent behavior.

LLMs are widely adopted in the real world.
• Research: LLMs have transformed the NLP research world, achieving state-of-the-art performance across a wide range of tasks such as sentiment classification, question answering, summarization, and machine translation.
• Industry: Here is a very incomplete list of some high-profile large language models being used in production systems:
  • Google Search (BERT)
  • Facebook content moderation (XLM)
  • Microsoft’s Azure OpenAI Service (GPT-3/3.5/4)
Content credits: https://stanford-cs324.github.io/winter2022/
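A minimal in-context learning sketch (the prompt, the labels, and the choice of GPT-2 are illustrative assumptions, not from the slides): the task is specified entirely through the prompt, with no task-specific fine-tuning.

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Review: The movie was fantastic. Sentiment: positive\n"
    "Review: I wasted two hours of my life. Sentiment: negative\n"
    "Review: The monsoon rains ruined the concert. Sentiment:"
)
out = generator(prompt, max_new_tokens=2, do_sample=False)
print(out[0]["generated_text"])

Note that a 124M-parameter GPT-2 will often get this wrong; reliable in-context learning is precisely the kind of emergent ability that appears only at much larger scale.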
Why Does This Course Exist?
With tremendous capabilities, LLMs’ usage also carries various risks.
• Reliability & Disinformation: LLMs often hallucinate, i.e., generate responses that seem correct but are not factually correct.
  • A significant challenge for high-stakes applications like healthcare
• Social bias: Most LLMs show performance disparities across demographic groups, and their predictions can enforce stereotypes.
  • P(He is a doctor) > P(She is a doctor) (see the probing sketch after this slide)
  • Training data contains inherent bias
• Toxicity: LLMs can generate toxic/hateful content.
  • Trained on a huge amount of Internet data (e.g., Reddit), which inevitably contains offensive content
  • A challenge for applications such as writing assistants or chatbots
• Security: LLMs are trained on a scrape of the public Internet, and anyone can put up a website that can enter the training data.
  • An attacker can perform a data poisoning attack.
Content credits: https://stanford-cs324.github.io/winter2022/
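One crude way to probe the bias claim above (a sketch; the model choice and sentences are illustrative, and results vary by model) is to compare the log-probabilities a pretrained auto-regressive LM assigns to the two sentences:

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def log_prob(text):
    # Sum of log P(x_i | x_<i) over all next-token predictions.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        lp = torch.log_softmax(model(ids).logits, dim=-1)
    return lp[0, :-1].gather(1, ids[0, 1:, None]).sum().item()

print(log_prob("He is a doctor."), log_prob("She is a doctor."))
# If the first score is higher, the model prefers the stereotyped completion,
# one symptom of bias inherited from the training data.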
We Will Cover Almost All of These in 5 Modules

Module-1: Basics
• A refresher on the basics of NLP required to understand and appreciate LLMs.
• A brief introduction to the basics of Deep Learning.
• The basics of Statistical Language Modelling.
• How did we end up in Neural NLP? We will discuss the transition and the foundations of Neural NLP.
• Initial Neural LMs
Topics: Intro to NLP, Intro to Deep Learning, Intro to Language Models (LMs), Word Embeddings (Word2Vec, GloVe), Neural LMs (CNN, RNN, Seq2Seq, Attention)
Module-2: Architecture
• Workings of the vanilla Transformer
  • Positional encoding and tokenization strategies
• Different Transformer variants
  • How do their training strategies differ? How are masked LMs (like BERT) different from auto-regressive LMs (like GPT)?
  • Response generation (decoding) strategies
Topics: Intro to Transformer, Positional encoding, Tokenization strategies, Decoder-only LM, Prefix LM, Decoding strategies, Encoder-only LM, Encoder-decoder LM
Module-3: Learnability
• What makes modern LLMs so good at following user instructions?
• What is in-context learning? What are its various facets?
• What kinds of prompting techniques are required to elicit reasoning in LLMs?
• How are LLMs made to generate responses preferred by humans? Does it remove toxicity in responses?
• Efficiency is crucial in production systems. How are LLMs efficiently fine-tuned?
Topics: Instruction fine-tuning, In-context learning, Advanced prompting, Alignment, PEFT
Module-4: Knowledge and Retrieval
• Knowledge graphs (KGs)
  • Representation, completion
  • Tasks: alignment and isomorphism
  • Distinction between graph neural networks and neural KG inference
• Open-book question answering: retrieving from structured and unstructured sources
• Retrieval augmentation techniques
  • Key-value memory networks in QA for simple paths in KGs
  • Early HotPotQA solvers, pointer networks, reading comprehension
  • REALM, RAG, FiD, Unlimiformer
  • KGQA (e.g., EmbedKGQA, GrailQA)
Module-5: Ethics and Miscellaneous
• A discussion of ethical issues and risks of LLM usage (bias, toxicity and hallucination)
• An overview of recent popular LLMs, such as GPT-4, Llama 3, Claude 3, Mistral, and Gemini
Suggestions (For Effective Learning)
• To understand the concepts clearly, experiment with the models (Hugging Face makes life easier).
  • Smaller models (like GPT-2) can be run on Google Colab / Kaggle.
  • Even 7B models can be run with proper quantization (see the sketch below).

Always get your hands dirty!
LLM research is all about implementing and experimenting with your ideas.

Rule of thumb: Never believe in any hypothesis until your experiments verify it!
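As a hands-on starting point, here is a minimal sketch (the model name, 4-bit settings, and generation call are illustrative assumptions) of loading a ~7B-parameter model with 4-bit quantization so it fits in the memory of a free Colab GPU; it requires the transformers, accelerate, and bitsandbytes packages.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "mistralai/Mistral-7B-v0.1"   # any ~7B causal LM works similarly
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit precision
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",                     # place layers on the available GPU/CPU
)

inputs = tokenizer("The monsoon rains have", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))

Smaller models such as GPT-2 need none of this and load directly with AutoModelForCausalLM.from_pretrained("gpt2").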