Chapter 1

Uploaded by

Yahya Alnwsany
Copyright © All Rights Reserved

Introduction to Hugging Face
WORKING WITH HUGGING FACE

Jacob H. Marquez
Lead Data Engineer
What is Hugging Face?

Collaboration platform

Open-source machine learning


Text, vision, and audio tasks

Models, datasets, frameworks

Reduce barriers to entry

1 https://huggingface.co/

In this course
Navigate and use the Hugging Face Hub

Explore models and datasets

Build pipelines for text, image, and audio data

Fine-tuning, generation, embeddings, and semantic search

Large Language Models (LLMs)
Understand and generate human-like text

Trained on massive amounts of data

Learn patterns in sequences

Built on the transformer architecture

Popular options are GPT and Llama

1 https://en.wikipedia.org/wiki/Large_language_model
2 https://towardsdatascience.com/transformers-89034557de14

Benefits of Hugging Face

Faster experimentation

Supports every step of the process


Smoother adoption

Deciding when to use

Use Hugging Face when:
You need a quick way to run ML tasks
You don't have deep ML expertise
You are testing several models
You need a dataset

Use another solution when:
Your computer is slow
You need highly customized architectures
Your domain-specific needs are not yet met
You are not leveraging advanced ML techniques

Installing Hugging Face
Hugging Face

pip install transformers datasets

ML Framework

pip install torch torchvision torchaudio

1 https://pytorch.org/
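To confirm the installs above worked, a short standard-library check can report which packages are importable and their installed versions. This is a sketch: `check_installation` is a hypothetical helper, and the package list simply mirrors the two pip commands above.

```python
# Sketch: report which packages from the pip commands above are installed.
# check_installation is a hypothetical helper; only the standard library is used.
from importlib import metadata
from importlib.util import find_spec

# Names from the pip commands; for these packages the import name matches
PACKAGES = ("transformers", "datasets", "torch", "torchvision", "torchaudio")


def check_installation(packages=PACKAGES):
    """Return {package: version string, "unknown", or None if not importable}."""
    report = {}
    for name in packages:
        if find_spec(name) is None:
            report[name] = None  # not importable in this environment
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "unknown"  # importable, but no installed-package metadata
    return report


for package, version in check_installation().items():
    print(f"{package:<13} {version or 'NOT INSTALLED'}")
```

`find_spec` only locates a module without importing it, so the check stays fast even when torch is installed.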

Let's practice!

Transformers and the Hub

Transformers - the Hugging Face package

1 https://github.com/huggingface/transformers

Transformers - the model architecture
Neural network models

Learn context and understanding

Core components:
Encoder and decoder: transform input to numerical representations

Self-attention mechanism: helps the model understand the context of the input

1 https://www.turing.com/kb/brief-introduction-to-transformers-and-their-power
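The self-attention idea can be sketched in a few lines of plain Python. This is a simplified, single-head version of scaled dot-product attention that assumes the queries, keys, and values are the raw token vectors; real transformers learn separate projection matrices for each and stack many heads and layers.

```python
# Sketch: single-head scaled dot-product self-attention in plain Python.
# Assumption: queries, keys and values are the input vectors themselves
# (no learned projection matrices), to keep the example small.
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """x: list of token vectors, all of length d.
    Returns (outputs, weights): outputs[i] mixes every token, weighted by weights[i]."""
    d = len(x[0])
    outputs, weights = [], []
    for query in x:
        # How strongly this token attends to every token, scaled by sqrt(d)
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        w = softmax(scores)
        weights.append(w)
        # Context-aware vector: attention-weighted sum of the value vectors
        outputs.append([sum(w[j] * x[j][i] for j in range(len(x))) for i in range(d)])
    return outputs, weights


# Three toy 2-dimensional "token embeddings"
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(tokens)
for row in attn:
    print([round(a, 3) for a in row])
```

Each row of attention weights sums to 1, so every output vector is a blend of all the tokens, which is how the model brings in context from the rest of the input.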

Use cases of transformers

Use cases for text, image, and audio

Classification for all three

Automatic speech recognition

Text summarization

Object detection for autonomous driving

A key benefit of transformers: transfer learning

Enables Hugging Face models to perform well on new tasks with little data

1 https://www.topbots.com/transfer-learning-in-nlp/#transfer-learning

The Hub

1 https://huggingface.co/

Navigating the Hub

1 https://huggingface.co/

Searching for models

1 https://huggingface.co/models

Model cards

1 https://huggingface.co/openai/whisper-large-v3

Using huggingface_hub
pip install huggingface_hub

from huggingface_hub import HfApi

api = HfApi()
# Without a limit, this pages through every public model on the Hub and can take a long time
list(api.list_models())

[ModelInfo: {'_id': '622fea36174feb5439c2e4be', 'author': 'cardiffnlp', ...}]

1 https://github.com/huggingface/huggingface_hub

Using huggingface_hub
from huggingface_hub import HfApi

api = HfApi()

# Older huggingface_hub releases passed the task as filter=ModelFilter(task=...)
models = api.list_models(
    task="text-classification",  # task searches for the specified task
    sort="downloads",            # sort orders the list
    direction=-1,                # -1 for descending, all other numbers for ascending
    limit=5,                     # limit caps the number of models returned
)

modelList = list(models)
print(modelList[0])

Model Name: albert/albert-base-v1, Tags: [...]

1 https://github.com/huggingface/huggingface_hub
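The call above can be wrapped in a small reusable helper. `top_models` is a hypothetical name, and the `api` parameter is an assumption added so the function can be tried with a stand-in object; when it is omitted, a real `HfApi` client is created and network access is required. The `.id` and `.downloads` attributes are the ones `ModelInfo` exposes in recent huggingface_hub releases.

```python
# Sketch: a reusable wrapper around the list_models call above.
# top_models is a hypothetical helper. Any object with a compatible
# list_models method can be passed as `api`; by default a real HfApi is used.


def top_models(task, n=5, api=None):
    """Return (model id, download count) for the n most-downloaded models for `task`."""
    if api is None:
        from huggingface_hub import HfApi  # imported lazily; needs `pip install huggingface_hub`
        api = HfApi()
    models = api.list_models(
        task=task,
        sort="downloads",
        direction=-1,  # descending: most downloaded first
        limit=n,
    )
    # ModelInfo exposes the repo id as .id and the download count as .downloads
    return [(m.id, m.downloads) for m in models]
```

For example, `top_models("automatic-speech-recognition", n=3)` would return the three most-downloaded speech-recognition models at the time of the call.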

Saving a model locally
# Import AutoModel
from transformers import AutoModel

modelId = "distilbert-base-uncased-finetuned-sst-2-english"

# Download model using the modelId
# Note: AutoModel loads only the base DistilBERT encoder; the checkpoint's sentiment-classification
# head is not loaded (AutoModelForSequenceClassification keeps it)
model = AutoModel.from_pretrained(modelId)

# Save the model to a local directory
model.save_pretrained(save_directory=f"models/{modelId}")

Be mindful of storage!

1 https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModel
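Saving only the model leaves out its tokenizer, which is needed to turn text into model inputs. The sketch below saves both into the same folder and reports how much disk space that folder uses, in the spirit of "Be mindful of storage!". `save_to_local` and `directory_size_mb` are hypothetical helpers; the model and tokenizer are assumed to be objects with a `save_pretrained()` method, such as those returned by `AutoModel.from_pretrained` and `AutoTokenizer.from_pretrained`.

```python
# Sketch: save a model and its tokenizer together, then report disk usage.
# save_to_local and directory_size_mb are hypothetical helpers; `model` and
# `tokenizer` are assumed to expose save_pretrained(directory).
from pathlib import Path


def directory_size_mb(path):
    """Total size of all files under `path`, in megabytes (1 MB = 1,000,000 bytes)."""
    total = sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())
    return total / 1_000_000


def save_to_local(model, tokenizer, model_id, root="models"):
    """Save model and tokenizer under root/model_id; return (path, size in MB)."""
    target = Path(root) / model_id  # ids like "org/name" become nested folders
    target.mkdir(parents=True, exist_ok=True)
    model.save_pretrained(target)
    tokenizer.save_pretrained(target)
    return target, directory_size_mb(target)
```

With the objects from the slide, `tokenizer = AutoTokenizer.from_pretrained(modelId)` followed by `path, size = save_to_local(model, tokenizer, modelId)` writes both to disk; passing `path` back to `AutoModel.from_pretrained` then loads the local copy instead of downloading it again.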

Let's practice!

Working with datasets

Datasets in Hugging Face

1 https://huggingface.co/datasets

Searching for datasets

1 https://huggingface.co/datasets

Dataset cards
Description

Dataset structure

An example

Field metadata

Training and testing splits

1 https://huggingface.co/datasets/imdb

datasets package
pip install datasets

Access

Download

Mutate

Use

Share

1 https://huggingface.co/docs/datasets/index

Inspecting a dataset
from datasets import load_dataset_builder

# Load only the dataset's metadata, without downloading the data files
data_builder = load_dataset_builder("imdb")

print(data_builder.info.description)

Large Movie Review Dataset. This is a dataset for sentiment classification...

print(data_builder.info.features)

{'text': Value(dtype='string', id=None), 'label': ClassLabel(names=['neg', 'pos'], id=None)}

1 https://huggingface.co/docs/datasets/load_hub

Downloading a dataset
from datasets import load_dataset

data = load_dataset("imdb")

Split parameter

data = load_dataset("imdb", split="train")

Configuration parameter

# The "20231101.en" configuration (English, 2023-11-01 dump) is hosted under wikimedia/wikipedia; it is a large download
data = load_dataset("wikimedia/wikipedia", "20231101.en")

1 https://huggingface.co/docs/datasets/v2.15.0/loading

Use in datasets

Apache Arrow dataset formats

1 https://arrow.apache.org/overview/

Mutating a dataset
from datasets import load_dataset

imdb = load_dataset("imdb", split="train")

# Filter imdb to keep only negative reviews (label 0)
filtered = imdb.filter(lambda row: row['label']==0)

print(filtered[0])

{'text': 'I rented I AM CURIOUS-YELLOW...', 'label': 0}

1 https://huggingface.co/docs/datasets/process#select-and-filter

Mutating a dataset
# Slicing: select the first two rows of the filtered dataset
sliced = filtered.select(range(2))

print(sliced)

Dataset({features: ['text', 'label'], num_rows: 2})

print(sliced[0]['text'])

1 https://huggingface.co/docs/datasets/process#select-and-filter

Benefits of datasets

Accessible and shareable

Relevant to common ML tasks

Efficient processing on large data

Faster querying

Convenient complementary datasets package

Let's practice!