A fork of NLPrinceton / SARC.
- Download the dataset. Create a folder named `dataset` that is structured like this (don't forget to extract the files):

  ```
  dataset/
  ├─ main/
  |  ├─ comments.json
  |  ├─ test-balanced.csv
  |  └─ train-balanced.csv
  └─ pol/
     ├─ comments.json
     ├─ test-balanced.csv
     └─ train-balanced.csv
  ```

- Put the `dataset` folder at this repo's root directory.
- Still at the repo's root directory, run `git submodule update --init`. This pulls in one of the dependencies needed to create bags of n-grams (bong).
- If you want to use word embeddings instead of bong, download the 1600-dimensional Amazon GloVe embeddings (NOTE: 2.6 GB compressed, 8.7 GB uncompressed), then put the extracted .txt file inside the `dataset` folder.
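Before running the evaluation, a quick sanity check can confirm the layout matches the tree above (a sketch; it only tests that the six expected files exist):

```python
import os

# The six files expected under dataset/, per the tree above.
REQUIRED = [
    os.path.join("dataset", sub, name)
    for sub in ("main", "pol")
    for name in ("comments.json", "test-balanced.csv", "train-balanced.csv")
]

missing = [p for p in REQUIRED if not os.path.isfile(p)]
if missing:
    print("Missing files:")
    for p in missing:
        print(" ", p)
else:
    print("Dataset layout looks good.")
```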
Run one of the following commands (the embedding commands assume the extracted GloVe file is at `dataset/amazon_glove1600.txt`):
### 'all' dataset

```shell
# Bag-of-Words on all
python eval.py main -l --min_count 5

# Bag-of-Bigrams on all
python eval.py main -n 2 -l --min_count 5

# Embedding on all
python eval.py main -e -l --embedding dataset/amazon_glove1600.txt
```
### 'pol' dataset

```shell
# Bag-of-Words on pol
python eval.py pol -l

# Bag-of-Bigrams on pol
python eval.py pol -n 2 -l

# Embedding on pol
python eval.py pol -e -l --embedding dataset/amazon_glove1600.txt
```
### 'pol' dataset

VADER sentiment analysis scores:

```shell
python turn-level-sentiment.py pol
```
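VADER is a lexicon- and rule-based sentiment analyzer: at its core it looks up word valences in a curated lexicon and combines them. The script above is not reproduced here, but the lexicon-scoring idea can be illustrated with a toy sketch (the scores below are invented, and real VADER also applies heuristics for negation, punctuation, and capitalization):

```python
# Toy lexicon-based scorer illustrating the idea behind VADER.
# Invented valences -- NOT the real VADER lexicon.
LEXICON = {"great": 3.1, "good": 1.9, "bad": -2.5, "terrible": -2.1}

def valence(text):
    """Mean valence of known words; 0.0 if no word is in the lexicon."""
    vals = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(vals) / len(vals) if vals else 0.0

print(valence("what a great movie"))    # positive score
print(valence("terrible acting overall"))  # negative score
```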
Evaluation code for the Self-Annotated Reddit Corpus (SARC).
Dependencies: NLTK, scikit-learn, text_embedding.
To recreate the all-balanced and pol-balanced results in Table 2 of the paper:
- download 1600-dimensional Amazon GloVe embeddings (NOTE: 2.4 GB compressed)
- set the root directory of the SARC dataset at the top of utils.py
- run the following ($EMBEDDING is the file of downloaded GloVe embeddings):
- Bag-of-Words on all: `python SARC/eval.py main -l --min_count 5`
- Bag-of-Bigrams on all: `python SARC/eval.py main -n 2 -l --min_count 5`
- Embedding on all: `python SARC/eval.py main -e -l --embedding $EMBEDDING`
- Bag-of-Words on pol: `python SARC/eval.py pol -l`
- Bag-of-Bigrams on pol: `python SARC/eval.py pol -n 2 -l`
- Embedding on pol: `python SARC/eval.py pol -e -l --embedding $EMBEDDING`
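The eval script itself is not shown here, but the general recipe these commands run (bag-of-words or bag-of-bigram features fed to a linear classifier, as in the paper) can be sketched with scikit-learn on toy stand-in data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in data; the real script reads SARC comment pairs from the CSVs.
train_texts = ["yeah right, great idea", "that was a great idea",
               "oh sure, totally believable", "the results look solid"]
train_labels = [1, 0, 1, 0]  # 1 = sarcastic

# ngram_range=(1, 2) yields bag-of-words plus bag-of-bigrams features;
# min_df plays the role of the --min_count cutoff above.
vec = CountVectorizer(ngram_range=(1, 2), min_df=1, lowercase=True)
X = vec.fit_transform(train_texts)
clf = LogisticRegression().fit(X, train_labels)

print(clf.predict(vec.transform(["oh sure, great idea"])))
```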
If you find this code useful please cite the following:

```
@inproceedings{khodak2018corpus,
  title={A Large Self-Annotated Corpus for Sarcasm},
  author={Khodak, Mikhail and Saunshi, Nikunj and Vodrahalli, Kiran},
  booktitle={Proceedings of the Linguistic Resource and Evaluation Conference (LREC)},
  year={2018}
}
```