This repository contains the resources for our paper "Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack".
- Download and unzip the dataset from Google Drive.
- Run the evaluation script:
    python evaluation/eval_accuracy.py \
        --detector hc3 \
        --tests ./output/hc3/**/*.jsonl \
        --output_file /tmp/hc3_evaluation.csv
- Distill sample labels from your target victim detector, then train a surrogate model with train_detector.py.
- Follow attack.multi_flint_attack to start the multi-process attack.
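
The distillation step in the list above can be illustrated with a minimal, self-contained sketch. It is not the code in train_detector.py: the `victim_detector` rule, the `distill_labels` helper, and the `NaiveBayesSurrogate` class are all hypothetical names invented here, and a small bag-of-words Naive Bayes classifier stands in for whatever surrogate model the repository actually trains. The sketch only shows the idea of querying a black-box detector for hard labels and fitting a local model to imitate it.

```python
import math
from collections import Counter


def victim_detector(text):
    """Hypothetical stand-in for a black-box victim detector.

    Returns 1 for "machine-generated" and 0 for "human-written".
    The rule below is a toy assumption, not a real detector.
    """
    return int(any(p in text.lower() for p in ("as an ai", "in conclusion")))


def distill_labels(texts, detector):
    """Query the victim once per sample and keep its hard label."""
    return [(t, detector(t)) for t in texts]


class NaiveBayesSurrogate:
    """Tiny bag-of-words Naive Bayes classifier used as the surrogate."""

    def fit(self, labeled):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter()
        for text, label in labeled:
            self.class_counts[label] += 1
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        scores = {}
        for label in (0, 1):
            # Smoothed log prior, so an unseen class never produces log(0).
            score = math.log((self.class_counts[label] + 1) / (total + 2))
            # Add-one smoothing over the vocabulary plus one unknown-word slot.
            denom = sum(self.word_counts[label].values()) + len(self.vocab) + 1
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy samples (invented for this sketch, not taken from the dataset).
texts = [
    "As an AI language model, I cannot browse the internet.",
    "In conclusion, the results are significant overall.",
    "lol that movie was wild, my cat slept through it",
    "grabbed coffee with sam before the rain started",
]
labeled = distill_labels(texts, victim_detector)
surrogate = NaiveBayesSurrogate().fit(labeled)

# Fraction of samples on which the surrogate reproduces the victim's label.
agreement = sum(surrogate.predict(t) == y for t, y in labeled) / len(labeled)
print(f"surrogate/victim agreement on distilled samples: {agreement:.2f}")
```

In practice the agreement would be measured on held-out samples rather than on the distilled training set, since the attack relies on the surrogate generalizing to texts the victim has not labeled.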
If you find our paper/resources useful, please cite:
@inproceedings{Zhou2024_COLING,
  author    = {Ying Zhou and
               Ben He and
               Le Sun},
  title     = {Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack},
  booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation},
  year      = {2024},
}