Model Interpretation and Performance Improvement with Large Language Models and Data Attribution

In this project we develop methods to improve language model performance by using important training data as context, and use large language models to explain smaller models through data attribution results.
- cifar_clip.ipynb: Implements image classification using the CLIP model on the CIFAR dataset.
- cifar_llm.ipynb: Uses a large language model for CIFAR image classification.
- cifar_resnet.ipynb: Implements ResNet model for CIFAR image classification.
- cifar_trak.ipynb: Performs TRAK analysis on CIFAR image classification.
For detailed methodology, results, and analysis, please refer to the project report.
- Clone the Repository:
  git clone <repository_url>
  cd <repository_directory>
- Install Dependencies: Ensure you have Python 3.10 or later installed, then install the required packages with pip:
  pip install -r requirements.txt
- Download Necessary Data: Some notebooks require downloading datasets; follow the instructions within each notebook to download the necessary data.
- Launch Jupyter Notebook:
  jupyter notebook
- Open the Desired Notebook: Navigate to the notebook you want to run (e.g., cifar_clip.ipynb) and open it.
- Run the Notebook: Follow the instructions within the notebook to execute the cells, running them in order to avoid errors.
- cifar_resnet.ipynb:
  - Run this notebook to train a ResNet9 architecture on the CIFAR-10 dataset. We will use this model for data attribution with TRAK.
  - Save model checkpoints in the directory CHECKPOINT_DIR.
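The notebook writes one checkpoint per epoch into CHECKPOINT_DIR. The helper below is a minimal sketch of that save/load cycle: the filename pattern `resnet9_epoch_{n}.pt` is a hypothetical convention, and pickle stands in for torch.save (the real notebook serializes the ResNet9 state dict with PyTorch), but the per-epoch directory layout is the same idea.

```python
import pickle
from pathlib import Path

def save_checkpoint(state, checkpoint_dir, epoch):
    """Write one checkpoint per epoch into the checkpoint directory.

    The filename pattern is a hypothetical convention; the actual
    notebook uses torch.save on the model's state_dict.
    """
    path = Path(checkpoint_dir) / f"resnet9_epoch_{epoch}.pt"
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path

def load_checkpoint(checkpoint_dir, epoch):
    """Read back the checkpoint saved for a given epoch."""
    path = Path(checkpoint_dir) / f"resnet9_epoch_{epoch}.pt"
    with open(path, "rb") as f:
        return pickle.load(f)
```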
- cifar_trak.ipynb:
  - Run this notebook to load the ResNet9 model and compute the TRAK scores on the training set.
  - Load model checkpoints from the directory CHECKPOINT_DIR, and save the images with their TRAK scores in the directory IMAGE_DIR.
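TRAK assigns each training example a score measuring how much it helped or hurt a given prediction, so once the scores are computed, picking the most helpful and most harmful training images is a simple ranking. The sketch below shows only that ranking step on toy scores; the score computation itself uses the trak library with the saved checkpoints, and `rank_by_trak` is an illustrative name, not the notebook's API.

```python
def rank_by_trak(scores, k):
    """Return indices of the k highest-scoring (most helpful) and
    k lowest-scoring (most harmful) training examples for one target.

    `scores` is a flat list of TRAK scores over the training set; the
    helpful/harmful reading follows the usual TRAK sign convention.
    """
    order = sorted(range(len(scores)), key=scores.__getitem__)
    return order[-k:][::-1], order[:k]

# Toy scores for six training images.
scores = [0.1, -0.4, 0.9, 0.0, 0.5, -0.2]
top, bottom = rank_by_trak(scores, k=2)  # top=[2, 4], bottom=[1, 5]
```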
- cifar_clip.ipynb:
  - Run this notebook to run the CLIP model on the scored CIFAR-10 training data.
  - Load the images with their TRAK scores from the directory IMAGE_DIR.
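CLIP classifies an image by embedding both the image and a text prompt for each class into a shared space, then picking the class whose text embedding is most similar to the image embedding. The sketch below shows that final similarity-argmax step on toy, unit-normalized vectors; producing real embeddings requires the CLIP model itself, and `zero_shot_classify` is an illustrative name rather than the notebook's API.

```python
def zero_shot_classify(image_emb, class_embs):
    """Return the index of the class text embedding most similar to
    the image embedding. On unit-normalized vectors, the dot product
    equals cosine similarity, which is what CLIP zero-shot uses."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    sims = [dot(image_emb, emb) for emb in class_embs]
    return max(range(len(sims)), key=sims.__getitem__)

# Toy 2-D embeddings for two classes.
classes = [[1.0, 0.0], [0.0, 1.0]]
zero_shot_classify([0.9, 0.1], classes)  # -> 0 (closer to the first class)
```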
- cifar_llm.ipynb:
  - Run this notebook to run the CLIP model, then generate LLM descriptions and score the reconstruction accuracy.
  - Ensure you have the torchvision and transformers libraries installed.
  - Load the images with their TRAK scores from the directory IMAGE_DIR, and set the OpenAI API key in the api_key variable.
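Reconstruction accuracy here means the fraction of images whose true class can be recovered from the LLM-generated description. The snippet below is a minimal sketch of that scoring step only; generating the descriptions requires the OpenAI API call with api_key set, and `reconstruction_accuracy` is an illustrative name, not the notebook's function.

```python
def reconstruction_accuracy(recovered, true_labels):
    """Fraction of images whose true class was recovered from the
    LLM-generated description of the image."""
    if len(recovered) != len(true_labels):
        raise ValueError("label lists must have equal length")
    hits = sum(r == t for r, t in zip(recovered, true_labels))
    return hits / len(true_labels)

# Toy example: the class was recovered for 3 of the 4 images.
reconstruction_accuracy(["cat", "dog", "ship", "cat"],
                        ["cat", "dog", "ship", "frog"])  # -> 0.75
```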