Since TensorFlow was dubbed as slower than most (http://arxiv.org/abs/1511.06435...

mrry · on April 13, 2016

We've made good progress on single-machine performance and memory usage in the latest release, especially for convolutional models (such as adding support for NCHW data layouts), and we'll continue to aim for best-in-class performance in that setting.

The cool thing about distributed TensorFlow is that it supports efficient synchronous optimizers, so you can scale up the effective batch size by using multiple GPUs, to get increased throughput without losing accuracy.
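As a rough sketch of why synchronous updates can grow the effective batch size without changing the math: with equal-sized shards, averaging the per-worker gradients gives the same update as one worker processing the combined batch. This is a toy in plain Python, not TensorFlow code; the model (`y = w * x`), the helper names `grad_mse` and `sync_step`, and the two-worker split are all assumptions made for illustration.

```python
# Toy illustration only: plain Python, no TensorFlow APIs involved.

def grad_mse(w, batch):
    """Gradient of mean squared error for the model y = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def sync_step(w, shards, lr):
    """One synchronous step: each worker computes a gradient on its shard,
    the gradients are averaged, and a single update is applied."""
    grads = [grad_mse(w, shard) for shard in shards]  # one per hypothetical GPU
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad

data = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 examples, true w = 3
shards = [data[0:4], data[4:8]]                    # 2 workers, 4 examples each

w_sync = sync_step(0.0, shards, lr=0.01)           # 2 workers, synchronous
w_big = 0.0 - 0.01 * grad_mse(0.0, data)           # 1 worker, batch of 8

print(abs(w_sync - w_big) < 1e-9)
```

The equivalence relies on the shards being the same size; with uneven shards the average would need to be weighted by shard size.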

dgacmu · on April 13, 2016

That study's way out of date - it benchmarked the CuDNNv2 version. Soumith's convnet-benchmarks (https://github.com/soumith/convnet-benchmarks) is much more up-to-date, but it hasn't yet been updated to reflect the latest performance improvements in 0.8. We've continued to push on both single-machine and distributed performance, and the next update to soumith's benchmarks should continue to show that improvement.

therobot24 · on April 13, 2016

>> That study's way out of date

I don't know about 'way' out of date: it was first published just a few months ago (November), and the authors pushed a revised version just a few weeks ago (March 30th). But I definitely agree that it's not using the most current implementations.

>> Soumith's convnet-benchmarks is much more up-to-date

I'll definitely check these out, thanks for the link

vrv · on April 13, 2016

And even those numbers on the front page are out of date :) (we're even faster now: https://github.com/soumith/convnet-benchmarks/pull/96, which is from a few weeks ago.)

The field is moving quickly enough that many published benchmarks are stale within 3 months, and it's a lot of hard work to maintain up-to-date benchmarks, given how many frameworks there are. There are also performance/memory/scalability/flexibility tradeoffs everywhere, so it's hard to capture everything in one number without a tremendous number of caveats.

dgacmu · on April 13, 2016

vrv addressed why I called it "way" out of date - in the time since the study was done with CuDNNv2, we've moved TensorFlow to CuDNNv4, and NVIDIA released the CuDNNv5 release candidate a week ago. Each of those releases provides a pretty big speedup for specific types of DNNs, and we've been pushing out some very significant speedups for TensorFlow at the same time.

My conclusion from this is that Soumith's approach to having a living repository is the way to go. It's harder to call it a "publication", but it's providing something of more lasting value than a static performance snapshot in a field where the engineering is moving so quickly.