You can get a lot of the benefits of hexagonal grids with triangular grids if you play your cards right. For example, you can allow units on a given triangle to move as if they were on the hexagonal grid that's formed by gluing triangles together at their corners.
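One way to read that, as a rough sketch rather than a definitive scheme: address each triangle as `(i, j, t)`, where `(i, j)` is a diamond cell on a skewed lattice and `t` picks one of its two triangles. Every third lattice vertex is treated as a hex centre, so each triangle belongs to exactly one hex of six triangles. The names `corners`, `hex_of`, `hex_neighbors` and the `(i - j) % 3` rule are my own assumptions for illustration.

```python
def corners(i, j, t):
    """Lattice corners of triangle t (0 or 1) inside diamond (i, j)."""
    if t == 0:
        return [(i, j), (i + 1, j), (i, j + 1)]
    return [(i + 1, j), (i + 1, j + 1), (i, j + 1)]


def hex_of(i, j, t):
    """Return the hex centre (a lattice vertex) that owns this triangle.

    Exactly one corner of every triangle satisfies (ci - cj) % 3 == 0,
    so the six triangles around such a vertex form one hexagon.
    """
    for ci, cj in corners(i, j, t):
        if (ci - cj) % 3 == 0:
            return (ci, cj)


# Offsets from one hex centre to its six neighbouring hex centres.
HEX_STEPS = [(1, 1), (2, -1), (1, -2), (-1, -1), (-2, 1), (-1, 2)]


def hex_neighbors(center):
    """Hex centres adjacent to the given hex centre."""
    ci, cj = center
    return [(ci + di, cj + dj) for di, dj in HEX_STEPS]
```

A unit standing on any triangle can then move by looking up `hex_of(...)` for its triangle and stepping to one of `hex_neighbors(...)`, while terrain and rendering stay on the finer triangle grid.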
I suggest using triangles in pairs, since diamonds form a grid nicely.
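As a minimal sketch of why pairs are convenient: if each diamond `(i, j)` holds two triangles `t = 0` and `t = 1`, storage is just a 2D array with two slots per cell, and edge adjacency becomes a small lookup. The split direction and the names `corners` and `edge_neighbors` are assumptions made for this example.

```python
def corners(i, j, t):
    """Lattice corners of triangle t (0 or 1) inside diamond (i, j)."""
    if t == 0:
        return [(i, j), (i + 1, j), (i, j + 1)]
    return [(i + 1, j), (i + 1, j + 1), (i, j + 1)]


def edge_neighbors(i, j, t):
    """The three triangles sharing an edge with triangle (i, j, t)."""
    if t == 0:
        # Partner in the same diamond, then the diamonds to the left and below.
        return [(i, j, 1), (i - 1, j, 1), (i, j - 1, 1)]
    # Partner in the same diamond, then the diamonds to the right and above.
    return [(i, j, 0), (i + 1, j, 0), (i, j + 1, 0)]
```

Note that a triangle's neighbours always have the opposite `t`, so the two halves of the grid alternate like a checkerboard.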
Five large strips (with 4 macro-triangles each, 20 faces in total) can form an icosahedron in a fairly sane way.
But IMO the biggest mistake people make is trying to make everything fit on a single square; multi-tile objects are very useful. And at that point, why not make everything take several tiles?
Abandoning tiles entirely in favor of node adjacency can cut memory a lot but requires more thought.
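A rough sketch of what node adjacency can look like, assuming the map is stored as a plain dictionary from node name to neighbouring node names. The node names and the `reachable` helper are made up for illustration. Only locations that actually exist take up memory, but things like movement range need a graph search instead of coordinate arithmetic.

```python
from collections import deque

# Hypothetical map: each node lists the nodes it connects to.
adjacency = {
    "gate": ["yard"],
    "yard": ["gate", "keep", "stable"],
    "keep": ["yard", "tower"],
    "stable": ["yard"],
    "tower": ["keep"],
}


def reachable(adjacency, start, max_steps):
    """Breadth-first search: nodes within max_steps of start, with distances."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_steps:
            continue  # Do not expand past the movement budget.
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist
```

For example, `reachable(adjacency, "gate", 2)` covers the gate, the yard, the keep and the stable, but not the tower.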
I don't know if this is the real historical reason, but if you're doing something 2.5D, isometric, with 2D graphics, at anything other than a 45-degree angle, then anything larger than one square creates clipping problems, because part of it should be behind another sprite on a square whose closest vertex is closer to the camera than the furthest vertex of the forward element. Z-ordering things on the ground between those elements gets even trickier. Making each building (or part of a building) stay within one square is by far the easiest way out of that predicament.
I basically took a square grid and then just randomly displaced each of the vertices a bit to disguise the fact that there is a grid at all. I just wasn't really clever enough to come up with any other way to do deterministic procedural generation.
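Something along these lines, as a sketch and not the code I actually used: hash the integer grid coordinates together with a seed and turn the hash into a small offset, so the same vertex always lands in the same place without storing anything. The function name `jitter_vertex`, the use of SHA-256 and the default `amount` are assumptions for the example.

```python
import hashlib
import struct


def jitter_vertex(ix, iy, seed=0, amount=0.3):
    """Return the displaced position of grid vertex (ix, iy).

    The offset depends only on (seed, ix, iy), so it is deterministic.
    Each coordinate moves by at most `amount` grid units.
    """
    digest = hashlib.sha256(struct.pack("<qqq", seed, ix, iy)).digest()
    a, b = struct.unpack("<II", digest[:8])  # Two 32-bit unsigned integers.
    dx = (a / 0xFFFFFFFF - 0.5) * 2 * amount
    dy = (b / 0xFFFFFFFF - 0.5) * 2 * amount
    return ix + dx, iy + dy
```

Keeping `amount` below 0.5 means neighbouring vertices cannot swap order along either axis, so the cells stay untangled.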