Monday, March 11, 2013

Vaurien: The Chaos Proxy

I had a need to test out a program's behavior when its backend web server returned errors. Unit testing would have been difficult in this case, because I specifically wanted to test the program's macro result when encountering, say, 10% server response errors.

After searching around, I came across Vaurien, the Chaos TCP Proxy. It's seriously cool. As its name suggests, it's a TCP proxy server that you can route a local connection through to a backend server. In TCP mode, it can delay packets, insert bad data, or drop packets, testing the socks (pun intended) off your application.

It also supports the application-level protocols: HTTP, Memcache, MySQL, Redis, and SMTP, and its architecture is very modular, so it's easy to plug in new protocols.

For example, here's how I can run it in HTTP mode:
$ vaurien --protocol http --proxy 127.0.0.1:8888 \
          --backend www.google.com:80 --behavior 50:error
This says to run an HTTP proxy that connects to google.com on the backend and return a 5xx HTTP error code 50% of the time. Testing it out with curl:
$ curl --head -H "Host: www.google.com" http://127.0.0.1:8888/
HTTP/1.1 200 OK
...

$ curl --head -H "Host: www.google.com" http://127.0.0.1:8888/
HTTP/1.1 500 Internal Server Error
Content-Type: text/html; charset=UTF-8

$ curl --head -H "Host: www.google.com" http://127.0.0.1:8888/
HTTP/1.1 502 Bad Gateway
Content-Type: text/html; charset=UTF-8

Filed this away as a super useful tool.

Thursday, January 17, 2013

Git Branches by Date

I tend to create a huge number of branches in my git repositories, but I have a bad habit of not cleaning them up once I'm finished with them.

I found a nice command from an answer on StackOverflow that allows you to sort branches by date. I modified it slightly to also show the date when printing the branch information:

$ git for-each-ref --sort=-committerdate refs/heads/ --format='%(committerdate) %09 %(refname:short)'
Mon Jan 14 23:46:15 2013 +0000   keyfile-seek-rebase
Sat Jan 12 02:09:04 2013 +0000   perfdiag-mbit-fix
Fri Jan 11 17:38:13 2013 -0800   keyfile-seek
Thu Jan 10 01:05:43 2013 +0000   master

You can also set the date format to be relative (or other possibilities, see man git-for-each-ref):

$ git for-each-ref --sort=-committerdate refs/heads/ --format='%(committerdate:relative) %09 %(refname:short)'
3 days ago   keyfile-seek-rebase
6 days ago   perfdiag-mbit-fix
6 days ago   keyfile-seek
8 days ago   master

I added it to my .gitconfig file as an alias:
[alias]
    branchdates = for-each-ref --sort=-committerdate refs/heads/ --format='%(committerdate:relative) %09 %(refname:short)'

That allows me to just type git branchdates and get a nice listing of my local branches by date.

Monday, January 14, 2013

Google Cloud Storage Signed URLs

Google Cloud Storage has a feature called Signed URLs, which allows you to use your private key file to authorize access to a specific operation to a third-party.

Putting all the bits together to create a properly signed URL can be a bit tricky, so I wrote a Python example that we just open-sourced in a repository called storage-signedurls-python. It demonstrates signing a PUT, GET, and DELETE request to Cloud Storage.

The example uses the awesome requests module for its HTTP operations and PyCrypto for its RSA signing methods.

Monday, December 17, 2012

Google Cloud Storage JSON Example

I recently started working at Google on the Google Cloud Storage platform.

I wanted to try out the experimental Cloud Storage JSON API, and I was extremely excited to see that they support CORS for all of their methods, and you can even enable CORS on your own buckets and files.

To show the power of Cloud Storage + CORS, I released an example on GitHub that uses HTML, JavaScript and CSS to display the contents of a bucket.

The demo uses Bootstrap, jQuery, jsTree, and only about 200 lines of JavaScript that I wrote myself. The project is called Metabucket, because it's very "meta" - the demo is hosted in the same bucket that it displays!

Thursday, September 13, 2012

Bash Completion for Mac OS X

I've been using my Mac more often lately, and I realized I was really missing the fancy bash completion features that Ubuntu has. Turns out, it's pretty easy to enable on Mac with Homebrew:

brew install bash-completion

Then, as usual, homebrew tells you exactly what you have to do. Just add this to .bashrc:

if [ -f $(brew --prefix)/etc/bash_completion ]; then
  . $(brew --prefix)/etc/bash_completion
fi

Now you get fancy completion. Here's me hitting tab twice with a git command:

$ git checkout rdiankov
rdiankov-master          rdiankov/animation       rdiankov/animation2
rdiankov/collada15spec   rdiankov/master          rdiankov/pypy

Awesome.

Sunday, September 02, 2012

Comparing Image Comparison Algorithms

As part of my research into optimizing 3D content delivery for dynamic virtual worlds, I needed to compare two screenshots of a rendering of a scene and come up with an objective measure for how different the images are. For example, here are two images that I want to compare:


As you can clearly see, the image on the left is missing some objects, and the texture for the terrain is a lower resolution. A human could look at this and notice the difference, but I needed a numerical value to measure how different the two images are.

This is a problem that has been well-studied in the vision and graphics research communities. The most common algorithm is the Structural Similarity Image Metric (SSIM), outlined in the 2004 paper Image Quality Assessment: From Error Visibility to Structural Similarity by Wang, et al. The most common previous methods were PNSR and RMSE, but those metrics didn't take into account the perceptual difference between the two images. Another metric that takes into account perception is the 2001 paper Spatiotemporal Sensitivity and Visual Attention for Ef´Čücient Rendering of Dynamic Environments by Yee, et al.

I wanted to compare a few of these metrics to try and see how they differ. Zhou Wang from the 2004 SSIM paper maintains a nice site with some information about SSIM and comparisons to other metrics. I decided to start by taking the images of Einstein he created, labeled as Meanshift, Contrast, Impulse, Blur, and JPG:


The comparison programs I chose were pyssim for SSIM, perceptualdiff for the 2001 paper, and the compare command from ImageMagick for RMSE and PNSR. The results I came up with for those five images of Einstein:


As you can see, PNSR and RMSE are pretty much useless at comparing these images. It treats them all equally. Note that the SSIM values are actually inverted so that lower values mean a lower error rate for both the SSIM and perceptualdiff graphs. The SSIM and perceptualdiff metrics seem to agree on Meanshift and JPG, but have a reverse ordering of the error rates for Contrast, Impulse, and Blur.

To get a better idea of how these algorithms would work for my dataset, I took an example run of my scene rendering. This is a series of screenshots over time, with both the ground truth, correct image, and an image I want to compare to it. I plotted the error values for all of the four metrics over time:


Here we see a different story. RMSE and SSIM seem to be almost identical (with respect to their scale) and PNSR (inverted) and perceptualdiff also show similar shapes. All of the metrics seem to compare about equally in this case. You can find the code for generating these two graphs in my repository image-comparison-comparison on GitHub.

I think the moral of the story is make sure you explore multiple options for your dataset. Your domain might be different from other areas where the relevant algorithms were applied. I actually ended up using perceptualdiff because it gives a nice, intuitive output value: the number of pixels perceptually different in the two images, and because it supports MPI, so it runs really fast on machines with 16 cores.

Wednesday, August 22, 2012

How I Write LaTeX

I've been really happy with using the TeXlipse plugin for Eclipse to write my LaTeX documents. There are some really awesome features, so I thought I'd highlight some:

LaTeX Context

TeXlipse knows all the LaTeX commands, so you get auto-completion when you start typing:

Document Context

TeXlipse keeps an index of labels and references, so if you try and reference or cite something that doesn't exist, you get a warning:

It even has auto-completion for these labels too:

Bibliography Folding

The editor is full of little things that make your life easier. For example, you can configure bibliography files to be folded by default when you open them. It makes it much easier to navigate the document:

Spell Checking

TeXclipse can use the aspell command, or you can download a dictionary file and get instant feedback as you type:

Continuous Compiling

TeXclipse runs pdflatex (or you can use latex+dvipdf, pslatex+ps2pdf, etc.) in the background after you save a file. It knows how many times to invoke the compiler so the document gets built properly. If you have the PDF document open, most viewers will automatically reload the file when it gets changed, so you get instant feedback on how the final document looks. It also has the option of using Pdf4Eclipse, a PDF viewer that runs inside Eclipse. What's really nice about this is that it uses SyncTeX to connect the PDF with the original source files. You can double click on something in the PDF and it jumps directly to the source that generated that part of the document. It makes it really easy to make quick edits while reading the PDF.

Makefile

Since I usually collaborate with other people that don't use Eclipse, and in case I want to build the project outside of Eclipse, I also use latex-makefile project. This Makefile is amazing. You don't have to configure anything. You just drop it into the source directory and everything magically works. I can't recommend this enough.