That's like saying a bicycle is faster than a car because I can buy a bike in 10...

TallGuyShort · on Feb 20, 2013

If you do enough queries, you should spend the time to use RCFile for Hive, in which case Redshift won't come out _that_ much faster. The point is that the 17 hours is not negligible.

dromidas · on Feb 20, 2013

That is a good case, since customers who typically need a data warehouse aren't just going to upload data once... they are probably going to upload frequently.

TallGuyShort · on Feb 20, 2013

You're missing my point and resorting to sarcasm - very nice </sarcasm>. My point is not that Hive is the better choice because everyone is going to reload their data frequently. My point is that if you want a fair benchmark, don't use an obviously slow data format for Hive. They spent time importing data in a form optimized for Redshift, but they took a very naive approach for Hive. I'm sure Redshift will still be faster, but not 10 times faster.

seanmcdirmid · on Feb 20, 2013

That's assuming Hive doesn't have its own special format that the data could be converted to in order to improve performance, right?