clgen_stratify.cpp flearner.cpp


Data set

Each attack takes in elements from the data set in DATA_LOC. Elements named X-Y are treated as instance Y of monitored site X. Elements named with a single integer Z (no hyphen) are treated as non-monitored elements in the open world; each should come from a different site. X and Y should start from 0.
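The naming rule above can be sketched in Python as follows. This is only an illustration: the helper name parse_element_name and its return shape are assumptions and are not part of the attack code.

```python
import re

# Hypothetical helper (not part of the attack code): classify an element
# name according to the X-Y / Z naming rule described above.
_MONITORED = re.compile(r"(\d+)-(\d+)")
_OPEN = re.compile(r"\d+")


def parse_element_name(name):
    """Return ("monitored", site, inst) or ("open", index, None)."""
    m = _MONITORED.fullmatch(name)
    if m:
        # X-Y: instance Y of monitored site X
        return ("monitored", int(m.group(1)), int(m.group(2)))
    if _OPEN.fullmatch(name):
        # Z: a single non-monitored (open-world) element
        return ("open", int(name), None)
    raise ValueError("unrecognized element name: %r" % name)
```

For example, parse_element_name("3-12") describes instance 12 of monitored site 3, while parse_element_name("57") describes open-world element 57.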

The input files should be well-filtered, i.e. they should not have missing or empty elements. There should be at least 75 cells in each instance.
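A quick pre-check along these lines might look like the sketch below. It assumes each line of an instance is one cell in the tab-separated time/packetsize format used by the data set; the function name is_well_filtered is hypothetical.

```python
MIN_CELLS = 75  # minimum number of cells per instance, as stated above


def is_well_filtered(lines, min_cells=MIN_CELLS):
    """Check one instance: no empty or incomplete lines, enough cells.

    Assumes one cell per line with two tab-separated fields.
    """
    count = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Reject empty lines and lines with a missing field.
        if len(fields) != 2 or not fields[0] or not fields[1]:
            return False
        count += 1
    return count >= min_cells
```

Running such a check over every file in DATA_LOC before generating the train/test lists would catch truncated or partially written instances early.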

Attacks read data based on TRAIN_LIST and TEST_LIST. They also need to know if there are any open world elements (OPEN = 0 or 1).

The TRAIN_LIST and TEST_LIST files are created by generating a training/testing set:

python options-XX

It takes in options-gen-list, which is based on: MODE, CLOSED_SITENUM, CLOSED_INSTNUM, OPEN_INSTNUM, DATA_LOC, DATA_TYPE, OUTPUT_LOC, and FOLD. It generates two files, OUTPUT_LOC + "trainlist" and OUTPUT_LOC + "testlist".

Data is in the following format: each line is a pair time\tpacketsize. Packetsize is positive if outgoing and negative if incoming. In the cell format, |packetsize| = 1.
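A minimal reader for this format could look like the following sketch. The function names are illustrative only, and the code assumes that packetsize is written as an integer.

```python
def parse_trace(lines):
    """Parse tab-separated time/packetsize lines into (time, size) tuples."""
    packets = []
    for line in lines:
        time_str, size_str = line.rstrip("\n").split("\t")
        # Assumption: packetsize is stored as an integer.
        packets.append((float(time_str), int(size_str)))
    return packets


def count_directions(packets):
    """Return (outgoing, incoming) counts; positive sizes are outgoing."""
    outgoing = sum(1 for _, size in packets if size > 0)
    incoming = sum(1 for _, size in packets if size < 0)
    return outgoing, incoming


def is_cell_format(packets):
    """True if every packet has |packetsize| = 1."""
    return all(abs(size) == 1 for _, size in packets)
```

With the three lines "0.0\t1", "0.12\t-1" and "0.30\t-1", count_directions reports one outgoing and two incoming cells, and is_cell_format returns True.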

Non-lev-based attacks

To run a non-lev-based attack algorithm, do the following:

python attack_list options

The above code does the following for each attackname in attack_list, after cell data is put in INPUT_LOC:
- For fold_num from 0 to 9:
-- " options" generates train/test lists for fold X.
-- "python options" calculates accuracy for fold X.
- Combine the above results.

A variation of is, which uses ten-fold cross validation to increase TNR.

The non-lev-based algorithms are:

cc jac nb mnb timing Pa-FeaturesSVM vngpp kNN Pa-CUMUL Ha-kFP

kNN requires flearner.cpp to be compiled.

Lev-based attacks

Lev-based algorithms are slow and require pre-processing. Therefore, running each attack is a two-step process (each step triggered manually). The first step generally requires substantial computational power. Pre-processing starts by compiling clLev:

mpiCC clLev.cpp -o clLev

To run:

mpirun -n CORE_TOTAL ./clLev options-XX

The lev files go to OUTPUT_LOC of options-XX. clgen_stratify will read them.


OUTPUT_LOC states where these files are to be created:

1. OUTPUT_LOC + ".log" - contains miscellaneous details, but the final two lines must be "time, TPR: x/x" and "time, FPR: x/x".

2. OUTPUT_LOC + ".results" - contains the classification of every input file. Each line contains (2 + number of classes) tab-delimited numbers. The first is the time, the second is the ground truth, and the numbers thereafter are the "match" of each class. The highest-scoring match is the assigned class.
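As a rough illustration of how a ".results" file could be read back, the sketch below picks the highest-scoring class on each line and compares it with the ground truth. It assumes that the ground truth is a class index starting at 0, in the same order as the match columns, and that ties go to the lowest class index; neither assumption is confirmed by the attack code, and the function names are hypothetical.

```python
def parse_results_line(line):
    """Return (time, ground_truth, assigned_class) for one results line."""
    fields = line.rstrip("\n").split("\t")
    time = float(fields[0])
    # Assumption: the ground truth is a 0-based class index.
    truth = int(float(fields[1]))
    scores = [float(x) for x in fields[2:]]
    # The highest-scoring match is the assigned class (first one wins ties).
    assigned = max(range(len(scores)), key=scores.__getitem__)
    return time, truth, assigned


def accuracy(lines):
    """Fraction of non-blank lines whose assigned class equals the truth."""
    correct = 0
    total = 0
    for line in lines:
        if not line.strip():
            continue
        _, truth, assigned = parse_results_line(line)
        correct += int(truth == assigned)
        total += 1
    return correct / total if total else 0.0
```

This gives only an overall accuracy; the TPR and FPR reported in the ".log" file depend on how monitored and non-monitored classes are distinguished, which this sketch does not attempt.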