I hate to ask but can anyone explain this like I'm from 2000?
I guess it's a way to send data to your front-end JavaScript without using JSON, and it compresses it so it's faster? How much better than using JSON is it?
It lets you put data on the wire, in a structured format, right out of memory. Asking the question "how much faster is it" isn't even valid here, because it skips the usual serialization process.
Cap'n Proto generates you some code that contains some data structures. You put data into these structures, and they will automatically be in the right shape to put directly on the wire. And then that data can be pulled right off the wire and right into memory and be fully ready to access, with no intermediate step.
It's, in a sense, infinitely faster than JSON serialization or deserialization. Because it doesn't even perform any serialization. It's just data.
There are some other tricks at play here, but I won't go into them. This is plenty cool.
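To make the "no serialization step" idea concrete, here's a rough Python sketch. The `PersonBuilder` class, the field names, and the 16-byte layout are all made up for illustration; this is not the real Cap'n Proto encoding, just the general shape of "write fields directly into a wire-ready buffer":

```python
import struct

# Hypothetical "Person" layout (NOT the real Cap'n Proto encoding):
# age (uint16) at offset 0, padding, score (float64) at offset 8.
PERSON_SIZE = 16

class PersonBuilder:
    def __init__(self):
        # This buffer IS the wire format; setters write straight into it.
        self.buf = bytearray(PERSON_SIZE)

    def set_age(self, age: int) -> None:
        struct.pack_into("<H", self.buf, 0, age)

    def set_score(self, score: float) -> None:
        struct.pack_into("<d", self.buf, 8, score)

    def to_wire(self) -> bytes:
        # No serialize() call: the buffer is already the message.
        return bytes(self.buf)

b = PersonBuilder()
b.set_age(34)
b.set_score(99.5)
wire = b.to_wire()
```

The point is that `to_wire()` does no per-field work; the setters already put each field at its final position.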
It's infinitely faster if you have control over the data layout in your application - which most likely means you are developing your application in C/C++, or maybe Rust. In the case of JS you don't, so accessing and writing the data is slow (since you would need [de]serialization there to write it into a byte array). In fact any serialization format in JS is slower than JSON, since JSON is natively implemented in the JS VMs while the others are not.
No no, this is a common misunderstanding about Cap'n Proto. It does not take your regular in-memory data structures from your regular programming language (even C++) and put them on the wire. What it does is defines its own specific data layout which happens to be appropriate both for in-memory random-access use and for transmission.
Cap'n Proto generates classes which wrap a byte buffer and give you accessor methods that read the fields straight out of the buffer.
That actually works equally well in C++ and JavaScript.
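Here's the read side of the same made-up layout as a Python sketch (again, a hypothetical `PersonReader` with an invented 16-byte layout, not the real generated code): the constructor does no parsing, and each accessor reads its field straight out of the buffer at a fixed offset.

```python
import struct

class PersonReader:
    """Wraps a byte buffer; accessors read fields at fixed offsets."""
    def __init__(self, buf: bytes):
        self.buf = buf  # no parsing happens here

    def age(self) -> int:
        return struct.unpack_from("<H", self.buf, 0)[0]

    def score(self) -> float:
        return struct.unpack_from("<d", self.buf, 8)[0]

# Bytes "received off the wire" are ready to use immediately:
# uint16 at offset 0, six bytes of padding, float64 at offset 8.
wire = struct.pack("<H6xd", 34, 99.5)
p = PersonReader(wire)
```

Nothing is copied or transformed until a field is actually asked for.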
Ok, but these accessor methods will still have a very different performance. In C++ copying a UTF8 string from one byte array into an std::string is super fast. Whereas in JS it's really slow, since you need to read from an ArrayBuffer, convert code points to UTF16 in JS and then store these in a string (which is not efficient, since the strings are immutable). In node you could at least speed that up through some native extension, but even then it would most likely be slower than JSON. In the browser it would be in any case. But that's really a speciality of JS.
In general, I think the difference (for non-C++) between your method and others (protobuf, thrift, ...) is that yours incurs the cost of a field deserialization at the moment the field is accessed, whereas in the others all fields are deserialized at once. But in the end it should have the same cost if I need all fields, e.g. in order to convert the data into a plain Java/JavaScript/C#/... object, or am I missing something there? For C++ I absolutely believe that you can have a byte-array-backed proxy object with accessor methods that have the same properties as accessing native C++ structures.
Even if you access every field, Cap'n Proto's approach should still be faster in theory because:
- Making one pass instead of two is better for the cache. When dealing with messages larger than the CPU cache, memory bandwidth can easily be the program's main bottleneck, at which point using one pass instead of two can actually double your performance.
- Along similar lines, when you parse a protobuf upfront, you have to parse it into some intermediate structure. That intermediate structure takes memory, which adds cache pressure. Cap'n Proto has no intermediate structure.
- Protobuf and many formats like it are branch-heavy. For example, protobuf likes to encode integers as "varints" (variable-width integers), which require a branch on every byte to check if it's the last byte. Also, protobuf is a tag-value stream, which means the parser has to be a switch-in-a-loop, which is a notoriously CPU-unfriendly pattern. Cap'n Proto uses fixed widths and fixed offsets, which means there are very few branches. As a result, an upfront Cap'n Proto parser would be expected to outperform a Protobuf parser. The fact that parsing happens lazily at time of use is a bonus.
All that said, it's true that if you are reading every field of your structure, then Cap'n Proto serialization is more of an incremental improvement, not a paradigm shift.
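The branchiness point can be seen in a small Python sketch. The varint decoder below follows the protobuf wire-format rule (7 payload bits per byte, high bit as a continuation flag) and needs a data-dependent branch per byte; the fixed-width read is one unconditional load at a known offset. Function names here are my own, for illustration:

```python
import struct

def decode_varint(buf: bytes, pos: int):
    """Protobuf-style varint: branch on every byte to test the continuation bit."""
    result = 0
    shift = 0
    while True:                      # data-dependent loop length
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):           # branch per byte
            return result, pos
        shift += 7

def read_fixed_u64(buf: bytes, offset: int) -> int:
    # Cap'n Proto-style: fixed width at a fixed offset, no data-dependent branches.
    return struct.unpack_from("<Q", buf, offset)[0]
```

For example, the value 300 is the two varint bytes `0xAC 0x02`, decoded bit by bit, versus a single 8-byte load in the fixed-width case.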
Well, to be fair, serialization is still performed - but interchange format is cleverly picked out to be as similar as possible to common memory representation.
It's a lot higher-performance, because you virtually don't have to parse. Remember when MySQL started supporting the memcached protocol because it'd reached the point where, for simple primary-key lookups, it was spending more time parsing an SQL query than actually executing it? It's like doing that for your program.
But for me at least, the real advantage over JSON isn't the performance but the schema compatibility. You have a spec for your data and generate code from that, which means the code is guaranteed to match the spec, and there's clear documentation about which changes to the spec are or aren't forward or backward compatible. (You get the same thing from the original Protocol Buffers, though.)