
That's like saying a bicycle is faster than a car because I can buy a bike in 10 minutes while it may take me a couple of hours to get through the car's paperwork.

If you run enough queries, Redshift will come out faster (assuming the numbers are correct).



If you run enough queries, you should spend the time to use RCFile for Hive, in which case Redshift won't come out _that_ much faster. The point is that the 17 hours is not negligible.
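
To put a rough number on "enough queries", here is a minimal break-even sketch. It assumes the 17 hours is a one-time, up-front cost of the faster system. The per-query times are hypothetical placeholders, not figures from the benchmark, and `break_even_queries` is a name made up for illustration.

```python
import math


def break_even_queries(load_overhead_hours, slow_query_hours, fast_query_hours):
    """Return the smallest number of queries at which the system with the
    up-front load cost is no slower in total than the system without it.

    All three arguments are illustrative inputs, not measured values.
    """
    saved_per_query = slow_query_hours - fast_query_hours
    if saved_per_query <= 0:
        raise ValueError("the 'fast' system must be faster per query")
    # Total time is equal when load_overhead == n * saved_per_query.
    return math.ceil(load_overhead_hours / saved_per_query)


# 17.0 is the load time quoted in the thread; the query times are hypothetical.
# Case 1: the fast system is 10 times faster per query.
print(break_even_queries(17.0, 1.0, 0.1))
# Case 2: a better Hive format narrows the gap to 2 times faster.
print(break_even_queries(17.0, 0.5, 0.25))
```

The smaller the per-query gap, the more queries it takes to pay back the load time, which is why the choice of Hive storage format matters to the comparison.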


That is a good case, since customers who typically need a data warehouse aren't just going to upload data once... they are probably going to upload frequently.


You're missing my point and resorting to sarcasm - very nice </sarcasm>. My point is not that Hive is the better choice because everyone is going to reload their data frequently. My point is that if you want a fair benchmark, don't use an obviously slow data format for Hive. They spent time importing data optimized for Redshift, but they took a very naive approach for Hive. I'm sure Redshift will still be faster, but not 10 times faster.


That's assuming Hive doesn't have its own special format that the data could be converted to in order to improve performance, right?



