Each attack reads elements from the data set in DATA_LOC. An element named X-Y is treated as instance Y of site X, which is monitored. An element named with a single integer Z (no hyphen) is treated as a non-monitored, open-world element; each such element should come from a different site. X and Y should be numbered starting from 0.
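The naming rule above can be illustrated with a small classifier for element names. This is a minimal sketch, not code from this repository: the helper `classify_element` is hypothetical, and it assumes the name has already been stripped of any file extension (for example one implied by DATA_TYPE).

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, not part of this repository.
# "X-Y" -> instance Y of monitored site X; "Z" -> open-world element Z.
_MONITORED = re.compile(r"^(\d+)-(\d+)$")
_OPEN_WORLD = re.compile(r"^(\d+)$")


def classify_element(name: str) -> Tuple[str, int, Optional[int]]:
    """Return ("monitored", X, Y) or ("open", Z, None) for an element name."""
    m = _MONITORED.match(name)
    if m:
        return ("monitored", int(m.group(1)), int(m.group(2)))
    m = _OPEN_WORLD.match(name)
    if m:
        return ("open", int(m.group(1)), None)
    raise ValueError(f"unrecognised element name: {name!r}")
```

For example, `classify_element("3-17")` gives `("monitored", 3, 17)` and `classify_element("42")` gives `("open", 42, None)`.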
The input files should be well-filtered, i.e. they should not contain missing or empty elements, and each instance should contain at least 75 cells.
Attacks read data based on TRAIN_LIST and TEST_LIST. They also need to know whether there are any open-world elements (OPEN = 0 or 1).
gen-list.py generates those files:
python gen-list.py options-XX
It reads an options file (options-gen-list) that sets MODE, CLOSED_SITENUM, CLOSED_INSTNUM, OPEN_INSTNUM, DATA_LOC, DATA_TYPE, OUTPUT_LOC, and FOLD. It generates two files: OUTPUT_LOC + "trainlist" and OUTPUT_LOC + "testlist".
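How gen-list.py assigns elements to folds is not described here. As a hypothetical illustration only, the sketch below shows one common ten-fold scheme: in fold `fold`, instance Y of every monitored site (and open-world element Z) goes to the test list when its index modulo the number of folds equals `fold`, and to the train list otherwise. The function `split_fold` and its modulo rule are assumptions; the sketch returns names rather than writing the trainlist/testlist files, whose exact format is not specified.

```python
from typing import List, Tuple


def split_fold(closed_sitenum: int, closed_instnum: int, open_instnum: int,
               fold: int, num_folds: int = 10) -> Tuple[List[str], List[str]]:
    """Hypothetical fold split: return (train_names, test_names) for one fold."""
    if not 0 <= fold < num_folds:
        raise ValueError("fold must be in [0, num_folds)")
    train: List[str] = []
    test: List[str] = []
    # Monitored elements X-Y: instance Y is held out in fold Y % num_folds.
    for site in range(closed_sitenum):
        for inst in range(closed_instnum):
            name = f"{site}-{inst}"
            (test if inst % num_folds == fold else train).append(name)
    # Open-world elements Z follow the same rule on Z.
    for z in range(open_instnum):
        (test if z % num_folds == fold else train).append(str(z))
    return train, test
```

With 2 sites of 10 instances and 20 open-world elements, fold 3 holds out `0-3`, `1-3`, `3` and `13`. The modulo rule keeps every site represented in each test fold even when CLOSED_INSTNUM is not a multiple of ten.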
Data is in the following format: each line is a pair time\tpacketsize. Packetsize is positive if outgoing and negative if incoming. In the cell format, |packetsize| = 1.
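A reader for this format, together with the filtering requirements stated earlier (no empty elements, at least 75 cells per instance), might look like the following minimal sketch. The function names are hypothetical, and the sketch assumes packet sizes are written as integers.

```python
from typing import List, Tuple

MIN_CELLS = 75  # from this README: at least 75 cells per instance


def parse_trace(text: str) -> List[Tuple[float, int]]:
    """Parse 'time<TAB>packetsize' lines into (time, packetsize) pairs."""
    packets = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected time<TAB>packetsize")
        time, size = float(fields[0]), int(fields[1])
        if size == 0:
            raise ValueError(f"line {lineno}: packetsize must be non-zero")
        packets.append((time, size))
    return packets


def directions(packets: List[Tuple[float, int]]) -> List[int]:
    """+1 for outgoing (positive size), -1 for incoming (negative size)."""
    return [1 if size > 0 else -1 for _, size in packets]


def is_usable_cell_instance(packets: List[Tuple[float, int]]) -> bool:
    """True if every entry is a cell (|packetsize| == 1) and there are enough cells."""
    return len(packets) >= MIN_CELLS and all(abs(size) == 1 for _, size in packets)
```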
To run a non-lev-based attack algorithm, do the following:
python attack-tenfold.py attack_list options
The above command does the following for each attackname in attack_list, after the cell data has been put in INPUT_LOC:
- For fold_num from 0 to 9:
-- "python gen-list.py options" generates train/test lists for fold fold_num.
-- "python attackname.py options" calculates accuracy for fold fold_num.
- Combine the above results.
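The final "combine" step is not spelled out. One plausible reading, shown here as a hypothetical sketch rather than what attack-tenfold.py actually does, is to pool the per-fold counts (such as the x/x fractions reported in each .log) by summing numerators and denominators.

```python
from typing import Iterable, Tuple


def combine_folds(per_fold: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Pool per-fold (hits, total) counts into one (hits, total) pair."""
    hits = total = 0
    for h, t in per_fold:
        hits += h
        total += t
    return hits, total


def rate(hits: int, total: int) -> float:
    """Fraction hits/total, or 0.0 when there is nothing to divide by."""
    return hits / total if total else 0.0


# Example: ten folds, each with 95 of 100 test instances correct.
folds = [(95, 100)] * 10
pooled = combine_folds(folds)
print(pooled, rate(*pooled))  # (950, 1000) 0.95
```

Pooling counts gives the same result as averaging per-fold rates when every fold has the same denominator; when fold sizes differ, pooling weights each fold by its size.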
A variation of attack-tenfold.py is attack-kNC.py, which uses ten-fold cross-validation to increase the true negative rate (TNR).
The non-lev-based algorithms are:
cc jac nb mnb timing Pa-FeaturesSVM vngpp kNN Pa-CUMUL Ha-kFP
kNN requires flearner.cpp to be compiled.
Lev-based algorithms are slow and require pre-processing. Therefore, running each attack is a two-step process, with each step triggered manually. The first step (pre-processing) is generally the computationally expensive one. Pre-processing starts by compiling clLev:
mpiCC clLev.cpp -o clLev
mpirun -n CORE_TOTAL ./clLev options-XX
The lev files are written to the OUTPUT_LOC given in options-XX; clgen_stratify will read them.
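Assuming "lev" refers to Levenshtein-style edit distance between traces, the sketch below shows why this step is expensive and why it is spread over CORE_TOTAL processes. It computes plain unit-cost edit distance between two direction sequences; clLev may use different operation costs. It also shows one hypothetical way to split the n(n-1)/2 instance pairs across ranks round-robin. Both `levenshtein` and `pairs_for_rank` are illustrative names, not clLev's actual code; a real MPI program would obtain `rank` and `size` from MPI_Comm_rank and MPI_Comm_size.

```python
from typing import Iterator, Sequence, Tuple


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Unit-cost edit distance between two sequences (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,             # delete x
                         cur[j - 1] + 1,          # insert y
                         prev[j - 1] + (x != y))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def pairs_for_rank(n: int, rank: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield the (i, j) pairs, i < j, that process `rank` of `size` would compute."""
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            if k % size == rank:  # round-robin assignment of pairs to ranks
                yield (i, j)
            k += 1
```

Each distance costs time proportional to the product of the two trace lengths, and the number of pairs grows quadratically with the number of instances. This is why the distances are precomputed once and stored for clgen_stratify rather than recomputed per attack run.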
OUTPUT_LOC specifies where the following output files are created:
1. OUTPUT_LOC + ".log" - contains miscellaneous details, but the final two lines must be "time, TPR: x/x" and "time, FPR: x/x".
2. OUTPUT_LOC + ".results" - contains the classification of every input file. Each line contains (2 + number of classes) tab-delimited numbers: the first is the time, the second is the ground truth, and the remaining numbers are the "match" scores for each class. The highest-scoring match is the assigned class.
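Both output files can be read back with a few lines of code. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the log regex only relies on the "TPR: x/x" / "FPR: x/x" part of each line, and `accuracy` assumes the ground truth is written as the index of the matching score column. Open-world labels are not handled specially here.

```python
import re
from typing import Dict, List, Tuple

# Matches e.g. "TPR: 90/100" anywhere on a line; the leading time field is ignored.
_RATE = re.compile(r"\b(TPR|FPR):\s*(\d+)\s*/\s*(\d+)")


def read_rates(log_text: str) -> Dict[str, Tuple[int, int]]:
    """Extract TPR/FPR counts from the final two lines of a .log file."""
    rates: Dict[str, Tuple[int, int]] = {}
    for line in log_text.rstrip("\n").splitlines()[-2:]:
        m = _RATE.search(line)
        if m:
            rates[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return rates


def parse_result_line(line: str) -> Tuple[float, int, int]:
    """Return (time, ground_truth, assigned_class) for one .results line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected time, ground truth, and at least one score")
    time = float(fields[0])
    truth = int(float(fields[1]))
    scores = [float(s) for s in fields[2:]]
    # Assigned class = index of the highest-scoring match (first one on ties).
    assigned = max(range(len(scores)), key=scores.__getitem__)
    return time, truth, assigned


def accuracy(lines: List[str]) -> float:
    """Fraction of non-blank .results lines whose assigned class equals the ground truth."""
    parsed = [parse_result_line(l) for l in lines if l.strip()]
    correct = sum(1 for _, truth, assigned in parsed if truth == assigned)
    return correct / len(parsed) if parsed else 0.0
```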