In 1996, the US government decided that large numbers of its nuclear weapons would require replacement, refurbishing, or decommissioning. Accordingly, the Department of Energy set up a refurbishment program aimed at extending the service lives of older nuclear weapons. In 2000, the National Nuclear Security Administration (NNSA) specified a life-extension program for W76 warheads that would enable them to remain in service until at least 2040.[2]
It was soon realized that the FOGBANK material was a potential source of problems for the program: few records of its manufacturing process had been retained when it was originally made in the 1980s, and nearly all staff members with expertise in its production had retired or left the agency. The NNSA briefly investigated sourcing a substitute for FOGBANK, but eventually decided that since FOGBANK had been produced before, it could be produced again.[2] Additionally, "Los Alamos computer simulations at that time were not sophisticated enough to determine conclusively that an alternate material would function as effectively as Fogbank," according to a Los Alamos publication.[3]
Another benefit of this approach is that it allows you to save storage space by eliminating the need to create a copy of the JSON data in the DB's own internal format.
It's genuinely refreshing to have a senator who isn't embarrassed to admit that his newest legislative proposal is inspired by the plot of a Hollywood thriller.
Several years ago I had the idea of using Markov chain algorithms to auto-generate Thomas Friedman articles. I was going to build a site around it called mechanicalfriedman.com. I just discovered someone else beat me to it:
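For anyone curious, the core of such a generator is tiny. A minimal sketch of a first-order, word-level Markov chain (the function names and toy corpus here are mine, not from any actual site):

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length=10):
    """Walk the chain from `start`, picking a random successor each step."""
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break  # dead end: no word ever followed this one
        out.append(random.choice(followers))
    return " ".join(out)
```

Real generators typically use order-2 or order-3 chains (keying on the previous two or three words) to get output that reads more like the source author.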
Author here. Believe it or not, I originally had the compression ratio graph rotated 90 degrees, and had manually modified it to run from 0.00 to 1.00. Google Docs, for some god-awful reason, insists on starting at 0.2 by default. Anyway, when my colleagues reviewed a draft of this post they asked me to rotate the graph back, and in the process I forgot to reset the scale. Sorry for the confusion. It's fixed now. As for the definition of "compression ratio", I looked this up and went with the definition found here: http://en.wikipedia.org/wiki/Data_compression_ratio
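For concreteness, that definition is uncompressed size divided by compressed size, so higher is better. A quick illustration (zlib here is just a stand-in codec, not the compressor benchmarked in the post):

```python
import zlib

data = b"All work and no play makes Jack a dull boy. " * 1000
compressed = zlib.compress(data)

# Compression ratio per the Wikipedia definition: uncompressed / compressed.
ratio = len(data) / len(compressed)
print(f"{len(data)} -> {len(compressed)} bytes, ratio {ratio:.1f}")
```

Worth noting that some tools report the reciprocal (compressed divided by uncompressed), which falls between 0 and 1, so it pays to check which convention a given graph is using.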
Each of the seven queries we used in our benchmark required a sequential scan of the 32GB dataset. It's unlikely that the ARC had any impact on the results since the EC2 instance had only 7GiB of memory.
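A toy model of why: a repeated sequential scan over a working set larger than the cache gets essentially nothing from a plain LRU cache, since each block is evicted before it's needed again. (The ARC is more scan-resistant than LRU, but it still can't hold a 32GB scan in 7GiB.) A sketch with made-up unit-sized "blocks":

```python
from collections import OrderedDict

def lru_hits(num_blocks, cache_size, passes=2):
    """Count cache hits for `passes` sequential scans over `num_blocks`
    blocks through an LRU cache that holds `cache_size` blocks."""
    cache = OrderedDict()
    hits = 0
    for _ in range(passes):
        for b in range(num_blocks):
            if b in cache:
                hits += 1
                cache.move_to_end(b)  # refresh recency on a hit
            else:
                cache[b] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
    return hits

# 32 "GB" scanned repeatedly through a 7 "GB" cache: every access misses.
print(lru_hits(32, 7))  # 0
```

If the working set fit in the cache, the second pass would be all hits; with the dataset roughly 4.5x the size of RAM, even a perfect cache could only hold a small fraction of it.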
I wasn't aware that Reiser4 supported compression. Thanks for pointing that out. As for why we chose ZFS over Btrfs: we feel that ZFS is closer to a state where an enterprise customer would be comfortable deploying it in production. ZFS has been in development for over a decade, with many Solaris sites already running it in production, while Btrfs is still marked as "unstable".
EDIT: I realize you said "near" and "closer" to production ready, but I think it's worth mentioning --
No FUD intended, but I don't consider ZFS on Linux production ready. Wanting to use ZFS, I recently started regularly reading their GitHub issues.
There are deadlocks and un-importable pools in certain situations (hard links being one; think rsync). I would not want production boxes in the same predicaments experienced by several bug reporters. Moreover, applying debug and (hopefully) hot-fix kernel patches, with the associated downtime, is a no-go for me in production.
Mind you, the project leads are very responsive and it's making great strides.
In addition, I believe the Linux implementation currently lacks the L2ARC (which can make ZFS really fly, caching to SSDs).
However, I would absolutely run ZFS on Illumos or Solaris, for the stability and for the compression benefits mentioned in the article.
I'm using ZFS with L2ARC and write logs on an SSD on Ubuntu right now. Not sure I'd use it in production yet for the reasons you mention, but for things like my home workstation and office NAS it works great!
While I can't claim that we logged CPU load while running these tests, I can say that I watched the output of top and iotop and that the CPU load was relatively light. It's also worth pointing out that Amazon describes the I/O performance of c1.xlarge instances as "high". We also considered using an hs1.8xlarge "High Storage" instance for these tests, but eventually decided that we were more interested in testing against conventional disks as opposed to SSDs.