PLC_codeGen_codelama.ipynb - Colab
!pip install transformers accelerate torch datasets peft bitsandbytes
Attempting uninstall: nvidia-cublas-cu12
Found existing installation: nvidia-cublas-cu12 12.5.3.2
Uninstalling nvidia-cublas-cu12-12.5.3.2:
Successfully uninstalled nvidia-cublas-cu12-12.5.3.2
Attempting uninstall: nvidia-cusparse-cu12
Found existing installation: nvidia-cusparse-cu12 12.5.1.3
Uninstalling nvidia-cusparse-cu12-12.5.1.3:
Successfully uninstalled nvidia-cusparse-cu12-12.5.1.3
Attempting uninstall: nvidia-cudnn-cu12
Found existing installation: nvidia-cudnn-cu12 9.3.0.75
Uninstalling nvidia-cudnn-cu12-9.3.0.75:
Successfully uninstalled nvidia-cudnn-cu12-9.3.0.75
Attempting uninstall: nvidia-cusolver-cu12
Found existing installation: nvidia-cusolver-cu12 11.6.3.83
Uninstalling nvidia-cusolver-cu12-11.6.3.83:
Successfully uninstalled nvidia-cusolver-cu12-11.6.3.83
Successfully installed bitsandbytes-0.45.3 datasets-3.3.2 dill-0.3.8 multiprocess-0.70.16 nvidia-cublas-cu12-12.4.5.8 nvidia-cuda-cupt
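Since the install swaps out several NVIDIA CUDA libraries, a quick sanity check (an addition here, not part of the original output) confirms the runtime still imports torch and sees the GPU:

# Verify torch still imports and the GPU is visible after the library reinstall
import torch
print(torch.__version__, torch.cuda.is_available())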
from datasets import Dataset
from transformers import AutoTokenizer
# Load CSV dataset
import pandas as pd
df = pd.read_csv("dataset.csv")
# Load tokenizer
model_name = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# 🚨 FIX: Assign padding token explicitly
tokenizer.pad_token = tokenizer.eos_token # Use EOS token for padding
# Tokenize dataset
def tokenize_function(examples):
    inputs = tokenizer(
        examples["name"], truncation=True, padding="max_length", max_length=512
    )
    targets = tokenizer(
        examples["code"], truncation=True, padding="max_length", max_length=512
    )
    return {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
        "labels": targets["input_ids"],  # Expected output
    }
# Convert Pandas DataFrame to Hugging Face Dataset
dataset = Dataset.from_pandas(df)
# Apply tokenization
dataset = dataset.map(tokenize_function, batched=True, remove_columns=["name", "code"])
# Split dataset into training and validation sets
dataset = dataset.train_test_split(test_size=0.1)
print("✅ Dataset successfully tokenized and prepared!")
Map: 100% 50/50 [00:00<00:00, 449.15 examples/s]
✅ Dataset successfully tokenized and prepared!
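A caveat on the labels above: because the targets are padded to max_length, the pad positions end up in the labels and contribute to the loss. Below is a minimal sketch of one common refinement (an addition, not part of the original notebook) that masks padded label positions with -100 so the loss ignores them; the name tokenize_function_masked is hypothetical:

# Variant of tokenize_function that masks padded label positions with -100
def tokenize_function_masked(examples):
    inputs = tokenizer(
        examples["name"], truncation=True, padding="max_length", max_length=512
    )
    targets = tokenizer(
        examples["code"], truncation=True, padding="max_length", max_length=512
    )
    # Use the targets' attention mask to tell real tokens (1) apart from padding (0)
    labels = [
        [tok if keep == 1 else -100 for tok, keep in zip(seq, mask)]
        for seq, mask in zip(targets["input_ids"], targets["attention_mask"])
    ]
    return {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
        "labels": labels,
    }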
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
# Define 4-bit quantization settings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model_name = "codellama/CodeLlama-7b-hf"
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Free GPU memory before loading
torch.cuda.empty_cache()
from accelerate import infer_auto_device_map
# NOTE: infer_auto_device_map expects an already-instantiated model, so this line relies on
# a `model` object left over from an earlier run; with only 12 GiB of GPU memory budgeted it
# also maps some modules to the CPU, which the 4-bit loader below rejects (see the error output).
device_map = infer_auto_device_map(model, max_memory={0: "12GiB", "cpu": "20GiB"})
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
    torch_dtype=torch.float16,
)
print("✅ Model loaded successfully on Google Colab's T4 GPU!")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-18-dfc82b45ae0f> in <cell line: 0>()
21 device_map = infer_auto_device_map(model, max_memory={0: "12GiB", "cpu": "20GiB"})
22
---> 23 model = AutoModelForCausalLM.from_pretrained(
24 model_name,
25 quantization_config=bnb_config,
2 frames
/usr/local/lib/python3.11/dist-packages/transformers/quantizers/quantizer_bnb_4bit.py in validate_environment(self, *args, **kwargs)
101 pass
102 elif "cpu" in device_map_without_lm_head.values() or "disk" in device_map_without_lm_head.values():
--> 103 raise ValueError(
104 "Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the "
105 "quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules "
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you
want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set
`llm_int8_enable_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
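The ValueError is the 4-bit quantizer refusing a device map that pushes some modules to the CPU. One way around it on a single 16 GB T4, sketched here as an assumption rather than what the notebook ultimately did, is to drop the hand-built device map and keep the whole quantized model on GPU 0:

# Hedged fix: load the 4-bit model entirely on the first GPU instead of using infer_auto_device_map
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map={"": 0},  # place every module on GPU 0
)

# If CPU offload is genuinely needed, the error message's own suggestion applies instead:
# set llm_int8_enable_fp32_cpu_offload=True in BitsAndBytesConfig and pass a custom device_map.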
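The next cell loads a model from ./finetuned_codellama, so a fine-tuning step must have run in between; it is not visible in this export. The sketch below is only a plausible reconstruction using PEFT/LoRA (which the pip install at the top pulls in); every hyperparameter and the choice of target modules are assumptions:

# Hypothetical LoRA fine-tuning step (reconstruction, not from the original notebook)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import Trainer, TrainingArguments

model = prepare_model_for_kbit_training(model)  # make the 4-bit base model trainable
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="./finetuned_codellama",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()

# Save the adapter and tokenizer to the directory the next cell reads from
model.save_pretrained("./finetuned_codellama")
tokenizer.save_pretrained("./finetuned_codellama")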
# Load fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("./finetuned_codellama")
model = AutoModelForCausalLM.from_pretrained("./finetuned_codellama", torch_dtype=torch.float16, device_map="auto")
# New Prompt
prompt = "generate ladder logic for a bottle filling system on a conveyor belt"
# Tokenize input
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# Generate code
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
# Decode output
generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)
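If the fine-tuning step saved only LoRA adapter weights rather than a merged model, AutoModelForCausalLM.from_pretrained will not apply them by itself. Assuming the PEFT-style training sketched above, one adapter-aware way to load is:

# Hypothetical adapter-aware loading (only relevant if ./finetuned_codellama holds a LoRA adapter)
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(
    "./finetuned_codellama",
    torch_dtype=torch.float16,
    device_map="auto",
)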