It's borderline. The official Commodore REU topped out at 8x the base RAM (512 KB), but you could modify it yourself to 32x. Creative Micro Designs also sold a third-party REU, the 1750XL, which supported 32x the RAM (2 megabytes).
So it is somewhat period accurate, albeit very expensive at the time.
I might be mistaken, but I think this is partly because of the undirected structure of RBMs, so you can't build a computational graph in the same way as with feed-forward networks.
By "undirected structure" I assume you refer to the presence of cycles in the graph? I was taught to call such networks "recurrent", but it seems that term has evolved to mean something slightly different. Anyway, yeah: because of the cycles, Gibbs sampling is key to the network's operation. One still employs gradient descent during training, but the procedure to calculate the gradient itself involves Gibbs sampling.
Edit: Actually, I was talking about the General Boltzmann Machine. For the Restricted Boltzmann Machine, an approximation is used which obviates the need for full Gibbs sampling during training. Then (quoting the article, emphasis mine) "after training, it can sample new data from the learned distribution using Gibbs sampling."
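To make that concrete, here is a rough pure-Python sketch of block Gibbs sampling in a small binary RBM, plus a single training update. I am assuming the "approximation" is one-step contrastive divergence (CD-1), which the article quote above does not actually name. All function names and the toy sizes are made up for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_hidden(v, W, b_h, rng):
    # Hidden units are conditionally independent given the visible layer,
    # so the whole layer can be sampled in one pass.
    h = []
    for j in range(len(b_h)):
        act = b_h[j] + sum(v[i] * W[i][j] for i in range(len(v)))
        h.append(1 if rng.random() < sigmoid(act) else 0)
    return h


def sample_visible(h, W, b_v, rng):
    # Same idea in the other direction: visible units given the hidden layer.
    v = []
    for i in range(len(b_v)):
        act = b_v[i] + sum(W[i][j] * h[j] for j in range(len(h)))
        v.append(1 if rng.random() < sigmoid(act) else 0)
    return v


def gibbs_chain(v0, W, b_v, b_h, steps, rng):
    # Alternate hidden and visible sampling; after many steps the visible
    # vector is (approximately) a sample from the model's distribution.
    v = list(v0)
    for _ in range(steps):
        h = sample_hidden(v, W, b_h, rng)
        v = sample_visible(h, W, b_v, rng)
    return v


def cd1_update(v0, W, b_v, b_h, lr, rng):
    # Assumed approximation (CD-1): one Gibbs step stands in for running
    # the chain to equilibrium when estimating the gradient.
    h0 = sample_hidden(v0, W, b_h, rng)
    v1 = sample_visible(h0, W, b_v, rng)
    h1 = sample_hidden(v1, W, b_h, rng)
    for i in range(len(b_v)):
        for j in range(len(b_h)):
            W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        b_v[i] += lr * (v0[i] - v1[i])
    for j in range(len(b_h)):
        b_h[j] += lr * (h0[j] - h1[j])
```

The layer-at-a-time sampling only works because the restricted model has no connections within a layer. In a general Boltzmann machine you would have to resample units one at a time.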
This doesn't make motion capture obsolete: 1) mocap can be applied to rigged characters, and 2) mocap can animate full-body rigs, not just facial expressions.