What's very interesting is that in most cases, with either modern PCIe-based SSDs or multiple SSD drives, database performance has become CPU-limited once again in operations that require multiple transactions. These chips, with their higher core counts, surely help there.
Wouldn't SSDs (and even better, system-bus-connected storage contraptions) also drive the use of different algorithms? I can see how merely cutting disk latencies by two (or more) orders of magnitude leaves you starved for computational resources, but at that point you also have to do many things differently. Data packing is different (it could even be CPU-matched now?), access patterns can be more random, etc.
That's an interesting question, but I expect many of the algorithms will stay the same. Think of a computer as having a multi-level cache: network -> disk -> ram -> L2 cache -> L1 cache -> register, each one smaller than the last. We're seeing a big change in disk latency, but that's only one part of the chain. For example, we're still going to be optimizing data structures to fit in L1 cache lines.
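To make that multi-level picture concrete, here's a small sketch with rough, 2016-era ballpark latencies (approximate illustrative figures, not measurements):

```python
# Approximate access latencies in nanoseconds for each level of the
# "multi-level cache" described above. Rough order-of-magnitude figures.
LATENCY_NS = {
    "register": 0.3,
    "L1 cache": 1,
    "L2 cache": 4,
    "RAM": 100,
    "SSD read": 100_000,        # ~100 us for a PCIe SSD
    "network RTT": 500_000,     # a few hundred us across Ethernet
    "HDD seek": 5_000_000,      # ~5 ms spinning-disk seek
}

def hierarchy(latencies):
    """Return the levels ordered fastest-first."""
    return [name for name, ns in sorted(latencies.items(), key=lambda kv: kv[1])]
```

Note that with an SSD the disk drops below the network in this ordering, while a spinning disk sits above it, which is exactly the reshuffling the rest of the thread discusses.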
Interestingly, that may be the new ordering if the disks are SSDs, but the typical seek latency on a spinning disk (~5 ms) is definitely higher than the latency of reading data from another machine's memory across Ethernet (a few hundred µs), and even the bandwidths are comparable (~150 MB/s).
So, now it has jumped from (disk -> network -> memory -> ...) to (network -> disk -> memory -> ...), which is a big change.
It's more and more like network/disk -> L3 cache -> L2 cache. DRAM is pretty slow.
Because the PCIe controller is on the same die as the L3 cache anyway, there's no reason to send the data on a long trip to DRAM and back. Until, of course, the cache line gets evicted for one reason or another.
Having said that, I do wonder to what degree various database engines are aware of the underlying disk platforms.
I've definitely noticed that when optimizing a query and deciding between a high number of seeks vs. a table scan, older versions of MSSQL will tend to be pessimistic about drive latencies and just go with the full scan (potentially incorrectly / prematurely). In an uncached scenario on an SSD, this is probably sub-optimal. My guess would be that instead of looking at actual seek latency, the optimizer was using reasonable guesses for spinning disks. I'm guessing newer versions are more SSD-aware, though.
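A toy sketch of the trade-off being described, in the spirit of the cost knobs some optimizers expose (e.g. PostgreSQL's `random_page_cost`); the numbers and function names here are hypothetical, not MSSQL's actual cost model:

```python
# Hypothetical planner cost model: random I/O is priced at a multiple of
# sequential I/O, which is the "reasonable guess for spinning disks".
def scan_cost(pages, seq_page_cost=1.0):
    """Full table scan: read every page sequentially."""
    return pages * seq_page_cost

def seek_cost(rows_needed, random_page_cost):
    """Index lookups: roughly one random page read per matching row."""
    return rows_needed * random_page_cost

def choose_plan(pages, rows_needed, random_page_cost):
    if seek_cost(rows_needed, random_page_cost) < scan_cost(pages):
        return "seeks"
    return "scan"
```

With a spinning-disk-style ratio (random I/O ~40x sequential), 500 lookups lose to scanning 10,000 pages and the planner picks the scan; with an SSD-like ratio (~4x), the same query flips to the index seeks.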
They all seem to be an extremely incremental (to be kind) iteration on what was already out there, at least as far as price goes. I have no idea why so many people seem excited by them. The chips might be good, but the prices negate any of the gains.
There is no way Intel would be charging over $4000.00 for a chip if they had any competition in that space.
Cool, might finally see an update to Apple's Mac Pro - it's been waiting on the E5 for quite a while considering the current Mac Pro ships with Intel's Ivy Bridge architecture!
One of those announcements that makes me stop everything and reminisce... This is simply incredible! I could actually run 100 production Windows servers on this with my Data-center license... the ROI is insane... thank you Intel!
Ah I see - good point, and it makes sense. I actually prefer that they keep pushing up pricing on the Server and SQL solutions so that people like me will finally start seriously considering open-source solutions.
I work for a Microsoft partner and I am involved with their sales processes. The discussion is never around competing with open source. It's not on their radar. The majority of the discussion is about replacing Oracle, which is still dramatically more expensive than SQL Server Enterprise Edition.
Maybe that's true for list price, but it's definitely not true for my organization for negotiated price. We have ELAs in place for both Oracle and Microsoft and the upshot of it all is Oracle RDBMS is significantly cheaper for us than SQL Server. Enterprise licensing and contract negotiation is definitely a case of where YMMV.
I'm not intimate with Oracle pricing, but Microsoft doesn't want to sell just the RDBMS, but the entire SQL Server ecosystem. The enterprise license includes all of SSAS, SSRS, SSIS, integrated R in the DB, integrated SQL over big data sources, and a whole bunch of other stuff. My understanding is that the fully loaded cost of the entire data pipeline from source to business insight is where the TCO argument comes from.
I'm not trying to shill and hope I don't come across that way. I just want to share a semi-insider perspective to help others understand where MS is coming from. Do with that understanding what you will. I don't have a horse in this race.
We used the E5-2620 series since v1 and now it gets two additional cores for the same price.
However, when it comes to CPUs the price has been really good for a while now; hopefully the prices for SAS drives will soon be as good as CPUs' as well.
I mean, computing power is probably cheaper than storage at the moment.
It depends on your DB workload ;)
Actually we sell the 2620 for shared nothing architectures to all kinds of small to midsize customers.
If you don't need to run fast queries against your analytics or keep logs for something like 20 years, you really won't get too much data. Especially when the only thing you index is the content of business documents.
Throw that kit in a SuperMicro MicroBlade Chassis with Dual Node Blades, kerplow. 56 nodes with 64G RAM, 4TB SSD, and 8 cores per node for ~140W/node. Banging.
The rated clock speed is what you're guaranteed to get when you run all cores. If you run a subset of cores, you'll get significantly more speed via turbo.
It would be a more powerful tool if it were multi-threaded. I'm not an expert, but my impression is that it was able to be more powerful in other ways (stability, features) because of the choice to make it mostly single-threaded (given labor and complexity constraints).
Is there an alternative RAM database that you like better that is multi-threaded?
As much as Redis is an incredibly potent tool and the quality of craftsmanship on it is very high, there are some incredibly peculiar design decisions that have been made.
Single-threading is one of those. There are times when having more than one thread to help process things would come in very handy, though I recognize that the cost of adding this can be very high.
It's something that will have to be addressed eventually for a single Redis process to take advantage of newer hardware with very low ceilings on CPU power, but huge numbers of cores.
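The usual workaround today is to run one single-threaded process per core and shard keys across them. A minimal sketch of that idea (a stable hash for illustration only, not Redis Cluster's actual CRC16 slot scheme):

```python
import hashlib

N_SHARDS = 8  # e.g. one single-threaded server process per core

def shard_for(key: str, n_shards: int = N_SHARDS) -> int:
    """Map a key to a shard with a stable hash, so the same key always
    lands on the same process regardless of Python's hash randomization."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_shards
```

Because all operations on a given key hit the same process, per-key ordering is preserved without any locking, which is much of what single-threading buys you in the first place.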
That seems unrealistic. Single core performance still matters for algorithms that can't be efficiently parallelised. Moreover, writing efficient, parallel versions of a lot of algorithms is hard, and often introduces significant overheads of its own that must be outweighed by the better scalability that the parallelisation brings.
> Single core performance still matters for algorithms that can't be efficiently parallelised.
Any real world examples? Especially considering that at this point, sacrificing cores to boost the remaining ones seems to be a really bad deal with current silicon. Core power requirements appear to decrease faster than their actual computational speed does if you go low-power. Even if you lose 40% of performance due to overhead, if the same-TDP CPU package is twice as fast with more cores, you still win. (And who's to say that your implementation can't be improved in the future?)
I honestly don't know how to answer that. Are you suggesting that you know how to parallelise an arbitrary expensive algorithm? Because if you've beaten Amdahl's law, a lot of people would like to make you very, very rich.
Sounds like an ambiguous question. Most algorithms are "arbitrarily expensive". It generally depends on some measure of the data you're putting in. But in case you mean "an arbitrary algorithm", then no, nobody knows how to do that. But it appears that the most useful things people actually want to do lie somewhere in the middle: not trivial to parallelize but also not exactly impossible.
Just to be clear, I meant what I wrote: an arbitrary expensive algorithm, i.e., solving the general case.
As for "most useful things people actually want to do", it seems to me that a lot of relatively computationally expensive software still isn't using lots of cores where they're available in practice today.
One significant example is computer games. Since the advent of GPUs with effectively hundreds or thousands of parallel computations available, rendering hasn't been the bottleneck it once was. Today the bottleneck might instead be the game control logic that runs on a CPU, and is often still either single-core or divided among at most a small, fixed number of cores doing different tasks.
Another common real world example is graphics and image processing software. You'd think there might be a lot of natural data parallelism to exploit, but software in this area has made relatively little use of algorithms that scale to arbitrary numbers of cores so far.
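To illustrate the kind of data parallelism that is available in principle here, a minimal sketch: a per-pixel operation applied to independent image rows (the `brighten` operation and its representation of an image as a list of rows are purely hypothetical, not any real package's API):

```python
from multiprocessing import Pool

def brighten_row(row, delta=30):
    """Per-pixel operation on one row; rows are fully independent."""
    return [min(255, px + delta) for px in row]

def brighten(image, workers=4):
    """Distribute independent rows across a pool of worker processes."""
    with Pool(workers) as pool:
        return pool.map(brighten_row, image)
```

In theory this scales to however many cores the pool is given; in practice, as noted above, surprisingly little shipping software in this space is structured this way.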
A third example would be real-time processing, say operations on high speed network traffic. In this case you can sometimes dispatch different packets to different cores to process them in parallel, but the amount of processing you can do on any given packet might well be limited by the speed of a single core, because the overheads for cache misses or inter-core communications are prohibitive. If your processing needs to consider more than one packet at once, so you can't just spray packets at different cores as they arrive, then this can become a very significant real world bottleneck.
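The per-flow dispatch described above can be sketched in a few lines (hypothetical framing: real systems do this in hardware with NIC receive-side scaling, not in Python):

```python
from collections import defaultdict

def flow_key(packet):
    """Identify a flow by its 4-tuple so one flow always lands on one core."""
    return (packet["src"], packet["sport"], packet["dst"], packet["dport"])

def dispatch(packets, n_cores):
    """Spray packets across per-core queues by flow hash. Packets of the
    same flow stay on one core and keep their arrival order."""
    queues = defaultdict(list)
    for pkt in packets:
        core = hash(flow_key(pkt)) % n_cores
        queues[core].append(pkt)
    return queues
```

The limitation the comment describes falls straight out of this sketch: a single heavy flow cannot be split, so its processing rate is capped by one core.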
This isn't to say that none of these problems will ever be solved as we develop more understanding and better tools, but even in 2016 the state of the art is far from using as many cores as we have available efficiently for a lot of real world use cases. Manual parallelisation often has architecture-level implications and few development teams have the experience and foresight to get it right consistently with today's programming tools. Automatic optimisation to exploit data parallelism is an interesting research field but still in its infancy, and many mainstream programming languages have far from ideal semantics for such optimisations because of aliasing issues and the like. Either or probably both of these areas will have to advance considerably before we can assume that scaling out into more cores is generally going to give better performance than scaling up with faster CPUs and related hardware architecture.
This article gives higher clock speeds than the chart on Ars Technica does. http://www.anandtech.com/show/10158/the-intel-xeon-e5-v4-rev... The Xeon E5-2699 v4 can run at 2.8 GHz with all cores busy, and is capable of boosting up to 3.6 GHz. But the one on Ars says the E5-2699 v4 is only 2.2 GHz.
Clock speeds depend on many things, like thermal limits, power draw, what instructions are used, how many cores are active, whether the integrated GPU is running (for OpenCL) if it's enabled, …
2.2 GHz is guaranteed without AVX; with AVX you only get 1.8. 2.8 GHz is possible, assuming there are thermal reserves, power consumption is not hitting a limit, etc.
Ok, so all cores could run at 2.8 GHz momentarily, but it would probably start thermal throttling pretty fast. And maybe not at all if the cores were using power-intensive units.
Clock speed is the last thing you should watch on a processor.
There are other factors which boost performance incredibly,
like the L1-L3 caches.
And on servers you probably want to watch the TDP, too.
Also, a lower clock speed doesn't mean a processor is slower; there are processors with a slower clock speed that still have a higher IPS.
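That last point is just first-order arithmetic: throughput is roughly instructions-per-cycle times clock, so a slower-clocked chip with better IPC can come out ahead (the figures below are made up for illustration):

```python
def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """First-order throughput model: IPC x clock. Ignores memory stalls,
    turbo, vector width, and everything else that matters in practice."""
    return ipc * clock_ghz * 1e9
```

For example, a hypothetical 2.2 GHz core retiring 4 instructions per cycle out-throughputs a 3.6 GHz core retiring 2.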
Understood that Broadwell is the "tick" on Haswell's "tock"; I was questioning what the previous poster meant by the EP uncore. I thought all Xeon 2600s were "efficient performance" (EP).
This is useful for me when comparing ranges of cores, power, price, and maybe it will be for you too.
I'll filter to my range of options, then make decisions on $/W, $/Ghz, etc..
https://docs.google.com/spreadsheets/d/1PcjgdtSV-2JLJXDpktjg...
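The derived metrics themselves are trivial to compute once you have the specs; a sketch using approximate launch specs for two parts mentioned in this thread (treat the exact numbers as ballpark, not authoritative):

```python
# Approximate launch specs (ballpark figures for illustration).
CPUS = {
    "E5-2699 v4": {"price": 4115, "tdp_w": 145, "cores": 22, "base_ghz": 2.2},
    "E5-2620 v4": {"price": 417,  "tdp_w": 85,  "cores": 8,  "base_ghz": 2.1},
}

def dollars_per_ghz(spec):
    """Price per aggregate GHz (cores x base clock)."""
    return spec["price"] / (spec["cores"] * spec["base_ghz"])

def dollars_per_watt(spec):
    """Price per watt of TDP."""
    return spec["price"] / spec["tdp_w"]
```

By either metric the low-end part is the better deal per unit of compute or power, which is the usual shape of these comparisons: you pay a steep premium for density at the top of the range.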