Updated README · basedrhys/obfuscated-code2vec@f5e1f24 · GitHub

Commit f5e1f24

Updated README
1 parent 0e27701 commit f5e1f24

File tree

4 files changed: +22 −2 lines changed

README.md

Lines changed: 17 additions & 2 deletions
@@ -1,13 +1,28 @@
 # Obfuscated code2vec: Improving Generalisation by Hiding Information

-## Instructions
-To
+![Overall project view](img/overall.png)
+
+Code for the paper: *Obfuscated code2vec: Improving Generalisation by Hiding Information*
+
+This repository contains code for the dataset pipeline, as well as the obfuscation tool used for obfuscating the datasets.

 All of the model-related code (`common.py`, `model.py`, `PathContextReader.py`), as well as the `JavaExtractor` folder, is from the original [code2vec repository](https://github.com/tech-srl/code2vec). This was used for invoking the trained code2vec models to create method embeddings.

 All models/datasets are in the paper's Google Drive folder:
 https://drive.google.com/drive/u/1/folders/1CXgSXKf292BTlryASui2kBvYvJSvFnWN

+## Usage - Dataset Pipeline
+
+![Dataset Pipeline View](img/pipeline.png)
+
+To run the dataset pipeline and create class-level embeddings for a dataset of Java files:
+1. Download a `.java` dataset (from the supplied datasets or your own) and put it in the `java_files/` directory
+2. Download a code2vec model checkpoint and put the checkpoint folder in the `models/` directory
+3. Change the paths and definitions in `model_defs.py` and the number of models in `create_datasets.sh` to match your setup
+4. Run `create_datasets.sh`. This will loop through each model and create class-level embeddings for the supplied datasets. The resulting datasets will be in `.arff` format in the `weka_files/` folder
+
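The loop that `create_datasets.sh` performs over the model definitions can be sketched roughly as follows. This is a minimal illustration, not the actual script: the `create_dataset.py` entry point, its flag names, and the `models/random/...` checkpoint path are assumptions; only the dictionary keys (`location`, `name`, `args`) come from `model_defs.py`.

```python
# Hypothetical sketch of what create_datasets.sh does: for each model
# definition, build one pipeline invocation that creates class-level
# embeddings. MODEL_DEFS mirrors the structure used in model_defs.py.
MODEL_DEFS = [
    # 'location' for the random model is a placeholder path
    {'location': 'models/random/saved_model_iter2',
     'name': 'random', 'args': '-r'},
    {'location': 'models/type-obfuscated/saved_model_iter2',
     'name': 'type_obfuscated', 'args': '-o'},
]

def build_commands(model_defs, out_dir='weka_files'):
    """Return one (illustrative) pipeline command per model definition."""
    cmds = []
    for m in model_defs:
        cmds.append(f"python create_dataset.py --model {m['location']} "
                    f"--name {m['name']} {m['args']} --out {out_dir}")
    return cmds
```

Each command would write its resulting `.arff` datasets into the `weka_files/` folder, one run per model checkpoint.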
+### Config
+By default the pipeline uses the full range of values for each parameter, which creates a huge number of resulting `.arff` datasets (>1000). To reduce this number, remove (or comment out) some of the items in the arrays at the end of `reduction_methods.py` and `selection_methods.py`. Our experiments showed that the `SelectAll` selection method and the `NoReduction` reduction method performed best in most cases, so you may want to keep only these.
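Trimming those arrays might look like the sketch below. Only `SelectAll` and `NoReduction` are named in this README; the other class and array names here are placeholders for whatever the real files define.

```python
# Hypothetical shape of the arrays at the end of selection_methods.py /
# reduction_methods.py. Commenting out entries shrinks the number of
# generated .arff datasets multiplicatively.
class SelectAll: ...      # named in the README
class NoReduction: ...    # named in the README
class SelectTopK: ...     # placeholder for another selection method

SELECTION_METHODS = [
    SelectAll,
    # SelectTopK,         # commented out to reduce the dataset count
]

REDUCTION_METHODS = [
    NoReduction,
]
```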

 ## Datasets


img/overall.png (111 KB)

img/pipeline.png (129 KB)

model_defs.py

Lines changed: 5 additions & 0 deletions
@@ -9,4 +9,9 @@
     'name': 'random',
     'args': "-r"
 },
+{
+    'location': 'models/type-obfuscated/saved_model_iter2',
+    'name': 'type_obfuscated',
+    'args': "-o"
+},
 ]
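Since step 3 of the pipeline asks you to edit `model_defs.py` by hand, a small sanity check can catch mistakes before a long run. The required keys (`location`, `name`, `args`) come from the diff above; the check itself is an illustrative addition, not part of the repository.

```python
# Hypothetical sanity check for model_defs.py entries: each definition
# needs a checkpoint 'location', a 'name' used for output naming, and
# the 'args' flag for the obfuscation tool.
import os

def validate_model_defs(model_defs):
    """Return a list of human-readable problems (empty list = all good)."""
    problems = []
    for m in model_defs:
        missing = {'location', 'name', 'args'} - m.keys()
        if missing:
            problems.append(f"{m.get('name', '?')}: missing {sorted(missing)}")
        elif not os.path.isdir(m['location']):
            problems.append(f"{m['name']}: checkpoint folder not found")
    return problems
```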
