Perhaps we need to start talking with actual code? Here's just a simple idea that's in my brain right now.
template <class Data>
struct HashNode {
    // Probably should be shared_ptr<HashNode<Data>>, but that's
    // now adding even more inefficiencies like a ref_count per element
    HashNode* nextPointerChain;
    Data d; // Maybe Data* d if you're sharing it with other data-structures?
};
template <class Data, int size>
struct ChainHashTable {
    HashNode<Data> theTable[size];
};

template <class Data, int size>
struct LinearProbingHashTable {
    Data theTable[size];
    std::vector<bool> occupied; // sized to "size": false means empty, true means occupied
};
nextPointerChain takes up 8 bytes, even when it's nullptr. No other data structure needs to carry a nextPointerChain. If we use shared_ptr<HashNode<Data>> as per typical modern C++, there are even more inefficiencies on top of that. EDIT: You could probably get away with unique_ptr, now that I think of it.
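To make that overhead concrete, here's a quick check (a standalone sketch mirroring the struct above, assuming a 64-bit platform where pointers are 8 bytes):

```cpp
#include <cstdint>

// Simplified chaining node: one next-pointer per element,
// paid for even when the chain is empty (nextPointerChain == nullptr).
template <class Data>
struct HashNode {
    HashNode* nextPointerChain; // 8 bytes on a 64-bit platform
    Data d;
};
```

For an 8-byte payload like std::uint64_t, sizeof(HashNode<std::uint64_t>) comes out to 16: the chain pointer doubles the per-element footprint before you've stored anything extra.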
----------
Pointers aren't free, btw. If you're sharing a pointer across many different parts of your program, that's definitely one of those things you'll want to start getting rid of. Not only does a pointer cost 8 bytes, it's also _SIGNIFICANTLY_ slower and cache-unfriendly in practice.
L1 cache works by reading data in and around anything you access. If you're only using 8 bytes of a 64-byte cache line for pointer indirection (8/64), you're wasting 56 bytes of every fetch.
Linear probing is extremely fast because it tends to blitz through L1 cache thanks to locality of data. Simple linear probing (or Robin Hood augmented probing) is probably the fastest, and simplest, approach for modern CPUs.
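A minimal open-addressing sketch of the idea (hypothetical names, fixed capacity, no deletion or resizing): on a collision the probe just walks to the next adjacent slot, so lookups scan contiguous memory instead of chasing pointers.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Minimal linear-probing set: keys live contiguously, so a probe
// sequence walks adjacent slots and stays cache-friendly.
struct LinearProbeSet {
    std::vector<std::optional<int>> slots; // nullopt == empty slot

    explicit LinearProbeSet(std::size_t capacity) : slots(capacity) {}

    bool insert(int key) {
        std::size_t home = std::hash<int>{}(key) % slots.size();
        for (std::size_t n = 0; n < slots.size(); ++n) {
            std::size_t idx = (home + n) % slots.size();
            if (!slots[idx]) { slots[idx] = key; return true; }
            if (*slots[idx] == key) return false; // already present
        }
        return false; // table full
    }

    bool contains(int key) const {
        std::size_t home = std::hash<int>{}(key) % slots.size();
        for (std::size_t n = 0; n < slots.size(); ++n) {
            std::size_t idx = (home + n) % slots.size();
            if (!slots[idx]) return false; // empty slot ends the probe chain
            if (*slots[idx] == key) return true;
        }
        return false;
    }
};
```

Robin Hood probing adds one tweak on top of this: on insert, a key that has probed further from its home slot steals the position of a key that probed less far, which keeps probe lengths short even at high load factors.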
You still fail to understand what being part of multiple data structures means.
Shared pointers, a separate container to track occupancy, suggesting a container of pointers: all of these suggest you have no idea how to do hybrid data structures.
The nodes contain the data along with multiple pointers chaining them into the various data structures they are part of (trees, lists, hash tables, etc.). The object in question contains state that is several cache lines long.
You cannot just put the nodes inside your buckets, because then you don't control where the nodes live, and moving them would invalidate the other data structures that point at them. Though I suppose it's a possibility if you know ahead of time how many nodes you're going to need, but in that case you may be able to go for perfect hashing to begin with (unless you're building some kind of fixed-size cache).
A given object, for example, can be part of several hash maps, based on the various pieces of state it is useful to index by.
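One way to picture such a node (hypothetical field names, a sketch rather than any particular codebase): the links live inside the object itself, so one allocation serves every structure the object belongs to.

```cpp
#include <cstdint>

// Sketch of an intrusive node that participates in several data
// structures at once. Each structure follows its own set of links;
// the object is never copied or moved between them.
struct Order {
    // Intrusive doubly-linked list (e.g. LRU / time ordering).
    Order* lruPrev = nullptr;
    Order* lruNext = nullptr;
    // Chain link for a hash table indexed by id.
    Order* idHashNext = nullptr;
    // Chain link for a second hash table indexed by customer.
    Order* customerHashNext = nullptr;
    // Payload; real objects may span several cache lines.
    std::uint64_t id = 0;
    std::uint64_t customer = 0;
};
```

This is exactly why the node's address must be stable: every structure above holds raw pointers into it, and relocating the node (as open addressing does on insert) would silently corrupt all of them.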
In practice you'd use a pool to allocate all your nodes in blocks. There is no per-node overhead for this allocation strategy.
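A minimal pool sketch along those lines (names are hypothetical; real pools add alignment guarantees, thread safety, and object construction on top): nodes are carved out of fixed-size blocks, so there is no per-node heap header, and freed slots are recycled through a free list.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Minimal block/pool allocator sketch. Allocation is a pointer bump
// (or a free-list pop); there is no per-node bookkeeping overhead.
template <class Node>
class NodePool {
public:
    explicit NodePool(std::size_t blockSize) : blockSize_(blockSize) {}

    Node* allocate() {
        if (freeList_) {                      // recycle a freed slot first
            Slot* s = freeList_;
            freeList_ = s->nextFree;
            return reinterpret_cast<Node*>(s);
        }
        if (blocks_.empty() || used_ == blockSize_) {
            blocks_.emplace_back(new Slot[blockSize_]); // carve a new block
            used_ = 0;
        }
        return reinterpret_cast<Node*>(&blocks_.back()[used_++]);
    }

    void deallocate(Node* n) {                // push slot onto the free list
        Slot* s = reinterpret_cast<Slot*>(n);
        s->nextFree = freeList_;
        freeList_ = s;
    }

private:
    // Each slot is either a live Node or a link in the free list.
    union Slot {
        Slot* nextFree;
        alignas(Node) unsigned char storage[sizeof(Node)];
    };
    std::size_t blockSize_;
    std::size_t used_ = 0;
    Slot* freeList_ = nullptr;
    std::vector<std::unique_ptr<Slot[]>> blocks_;
};
```

Because blocks are never moved once allocated, node addresses stay stable for the lifetime of the pool, which is what intrusive structures require.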
Note that LinearProbingHashTable only uses 8 bytes per pointer-sized element, while ChainHashTable still needs nextPointerChain on top of the data, taking up 16 bytes.
The *2 is for a 50% load factor on the linear table. But we still win with linear probing at, say, a 75% load factor (simple linear) or 95% (which is doable with Robin Hood).
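The rough per-element arithmetic behind that comparison can be written down directly (a sketch assuming 8-byte payloads and 64-bit pointers, ignoring allocator overhead and bucket-array sizing details):

```cpp
// Approximate bytes of memory per stored element.
// Chaining: payload + next pointer per node, plus the bucket array's
// head pointer amortized over the elements it holds.
constexpr double chainedBytesPerElement(double bucketsPerElement) {
    return 8.0 /*payload*/ + 8.0 /*next ptr*/ + 8.0 * bucketsPerElement;
}

// Open addressing: just the payload, divided by the load factor
// (empty slots are the only overhead).
constexpr double linearBytesPerElement(double loadFactor) {
    return 8.0 /*payload*/ / loadFactor;
}
```

At 50% load the linear table spends 16 bytes per element (the *2 above); at 75% it's about 10.7, and at 95% about 8.4 — all well under the 24 bytes a chained table with one bucket per element spends.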
Agreed - I do mention in the post that open addressing is impractical when using intrusive linked lists, which are common in low-level data structures.