Website Fingerprinting

When browsing the web, many users would prefer to have privacy. Clients who wish to avoid behavioral marketing, tracking, and surveillance can use an anonymizing proxy service such as Tor. Tor, however, is susceptible to website fingerprinting, in which a local, passive adversary (such as your ISP, or anyone with access to your ISP's data) can identify a user's behavior from patterns in their packet sequence.

We have implemented new and old website fingerprinting attacks and defenses, in order to demonstrate the realistic threat of website fingerprinting and to defend against it. We have five papers describing our research and implementations. DL indicates a direct link and L indicates a link to the paper's entry at the relevant publisher.

T. Wang and I. Goldberg. Improved Website Fingerprinting on Tor (L). WPES 2013.

T. Wang, X. Cai, R. Nithyanand, R. Johnson and I. Goldberg. Effective Attacks and Provable Defenses for Website Fingerprinting (DL). USENIX 2014.

X. Cai, R. Nithyanand, T. Wang, R. Johnson and I. Goldberg. A Systematic Approach to Developing and Evaluating Website Fingerprinting Defenses (L). CCS 2014.

T. Wang and I. Goldberg. Walkie-Talkie: An Effective and Efficient Defense for Website Fingerprinting (DL). Tech report.

T. Wang and I. Goldberg. On Realistically Attacking Tor with Website Fingerprinting (DL). Tech report.


Download

You may need to do some editing to get some defenses to work with different data sets (for example, changing the folder names in the code). Please feel free to e-mail Tao Wang at his @uwaterloo.ca address with any questions (the e-mail account name is t55wang).

Our Work:

Attacks:

We developed and implemented these attacks.

OSAD attack. WPES 2013. As it is a modification of Cai's attack, much of this code was written by Xiang Cai et al.
kNN attack. USENIX 2014.
Code and data for splitting and training set update. Under submission. (141MB)

Defenses:

Tamaraw: Works with TCP traces below. CCS 2014.
Supersequences: Works with cell traces below. USENIX 2014.
Walkie-Talkie Browser: Code that changes Tor Firefox to use half-duplex communication. Tech report.
Walkie-Talkie Padding: Code that adds padding to Walkie-Talkie. Tech report.

Other data:

120 sensitive sites we used for USENIX 2014: List of sites banned in the UK, Saudi Arabia, and China. We cut this list down to 100 sites by removing some.

Other Work:

Attacks:

These are our implementations of previously published attacks. They all work with the standard format, where each line is a (time \t packetsize) pair.
Some attacks require several files to work; for those, runattackname.sh will run the attack.
Jaccard: Liberatore and Levine. "Inferring the source of encrypted HTTP connections." CCS 2006.
Naive Bayes: Liberatore and Levine. "Inferring the source of encrypted HTTP connections." CCS 2006.
Timing: Shmatikov and Wang. "Timing analysis in low-latency mix networks: Attacks and defenses." ESORICS 2006.
Multinomial Naive Bayes: Herrmann, Wendolsky and Federrath. "Website fingerprinting: attacking popular privacy enhancing technologies with the multinomial naive-bayes classifier." CCSW 2009.
SVM: Panchenko, Niessen, Zinnen and Engel. "Website fingerprinting in onion routing based anonymization networks." WPES 2011.
VNG++: Dyer, Coull, Ristenpart and Shrimpton. "Peek-a-boo, I still see you: Why efficient traffic analysis countermeasures fail." Oakland 2012.
For your convenience, we have a single zip file of all these attacks.
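
As an illustration of the standard format mentioned above, here is a minimal Python sketch of a reader for it. It is not part of any of the attack packages. The function names (parse_trace, load_trace) are hypothetical, and the sketch assumes that the time is a decimal number and the packet size is a signed integer; the sign is kept as-is and not interpreted.

```python
from typing import List, Tuple

Packet = Tuple[float, int]  # (time, packetsize)


def parse_trace(text: str) -> List[Packet]:
    """Parse trace text in which each non-empty line is 'time<TAB>packetsize'."""
    packets = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        time_str, size_str = line.split("\t")
        # Assumption: time is a decimal number, packetsize a signed integer.
        packets.append((float(time_str), int(size_str)))
    return packets


def load_trace(path: str) -> List[Packet]:
    """Read one trace file from disk and parse it."""
    with open(path) as f:
        return parse_trace(f.read())
```

For example, parse_trace("0.0\t1\n0.12\t-1\n") returns a list of two (time, packetsize) tuples. If your traces store the size in another form, adjust the conversion accordingly.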

Defenses:

We implemented these defenses. Some work by default with the traces we used for USENIX 2014 (below).

Traffic morphing (NDSS 2009): Works with TCP traces below.
HTTPOS split (NDSS 2011): Works with TCP traces below.
Decoy pages (WPES 2011): Works with cell traces below.
BuFLO (IEEE S&P 2012): Works with cell traces below.

Traces:

We used these traces for our work; we do not have the traces of other authors. Each file name is either a single number, or two numbers separated by a hyphen. In the latter case the first number represents a site: for example, 39-40 and 39-20 come from the same site, but 39-40 and 59-40 do not.

Cai's traces, converted to cells (WPES 2013): 100 sites, 40 instances each.
Traces (WPES 2013): 100 sites, 40 instances each.
Traces for open world (WPES 2013): 5 sites, 40 instances each, plus 900 open world instances.
Cell traces (USENIX 2014): 100 sites, 90 instances each, plus 9000 open world instances.
TCP traces (USENIX 2014): Lost when my hard disk burned out. Sorry.
Cell traces: Gathered under the Walkie-Talkie defense above.
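
The file naming convention described above can be sketched in Python as follows. This is only an illustration: the helper names (parse_trace_name, same_site) are hypothetical, and the sketch assumes the file names carry no extension or directory prefix.

```python
from typing import Optional, Tuple


def parse_trace_name(name: str) -> Tuple[int, Optional[int]]:
    """Split a trace file name into (first number, second number or None)."""
    parts = name.split("-")
    if len(parts) == 1:
        # A single number: there is no hyphenated second part.
        return int(parts[0]), None
    if len(parts) == 2:
        # Two numbers separated by a hyphen: the first one is the site.
        return int(parts[0]), int(parts[1])
    raise ValueError("unexpected trace file name: " + name)


def same_site(a: str, b: str) -> bool:
    """True if both names are hyphenated and share the same site number."""
    site_a, second_a = parse_trace_name(a)
    site_b, second_b = parse_trace_name(b)
    return second_a is not None and second_b is not None and site_a == site_b
```

With these helpers, same_site("39-40", "39-20") is True and same_site("39-40", "59-40") is False, matching the example above.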

Example:

Suppose you want to test our kNN attack's performance against decoy pages on Tor. Download the kNN attack under Attacks, "Decoy pages" under Defenses, and "Cell traces" under Traces. Apply the defense to the cell traces (pdef.py), and then use the feature extractor (fextractor.py) in our kNN attack to generate feature files for the data. Finally, run our kNN (flearner.cpp) to get the accuracy. You will need to do some editing to get all of these files to work together (e.g. the number of instances used, folder names).
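
To give a rough idea of what the defense step does to a trace, here is a minimal Python sketch of overlaying a decoy page load on a real one. This is not pdef.py and may differ from it in every detail. It assumes each trace is already a time-sorted list of (time, size) pairs, and the function name overlay_decoy is hypothetical.

```python
import heapq
from typing import List, Tuple

Packet = Tuple[float, int]  # (time, size)


def overlay_decoy(real: List[Packet], decoy: List[Packet]) -> List[Packet]:
    """Merge two time-sorted traces into a single time-sorted trace.

    Assumption: both page loads start at time 0, so the attacker
    observes the two packet sequences interleaved by timestamp.
    """
    return list(heapq.merge(real, decoy, key=lambda p: p[0]))
```

The merged trace can then be written back out in the same format and passed on to feature extraction in place of the undefended trace.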


Change Log

27th June 2015: Updated with one more paper.
24th June 2015: Updated with new paper.
3rd March 2015: Uploaded implementation of other researchers' attacks.
1st March 2015: Uploaded splitting algorithm.
11th July 2014: Updated Tamaraw.
31st May 2014: Created this site and uploaded data.