Anything involving pivoting for numerical stability isn’t trivial to parallelize: the pivot search at each step depends on the values produced by the previous step, so every worker has to synchronize once per column. Plain old Gaussian elimination on huge sparse matrices therefore doesn’t scale well in parallel.
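To make the serialization concrete, here is a minimal sketch of LU factorization with partial pivoting (my own illustrative code, not any particular library’s); the data-dependent pivot search is the part that forces synchronization:

```c
#include <math.h>
#include <stddef.h>

/* In-place LU factorization with partial pivoting. piv[k] records the row
   swapped into position k. The argmax search at each step depends on data
   produced by the previous step, so it acts as a serial barrier per column. */
void lu_partial_pivot(size_t n, double *A, size_t *piv)
{
    for (size_t k = 0; k < n; k++) {
        /* Pivot search: data-dependent, serializes the factorization. */
        size_t p = k;
        for (size_t i = k + 1; i < n; i++)
            if (fabs(A[i * n + k]) > fabs(A[p * n + k]))
                p = i;
        piv[k] = p;
        if (p != k)  /* swap rows k and p */
            for (size_t j = 0; j < n; j++) {
                double t = A[k * n + j];
                A[k * n + j] = A[p * n + j];
                A[p * n + j] = t;
            }
        /* Elimination below the pivot: this part parallelizes fine,
           but it cannot start until the pivot above is settled. */
        for (size_t i = k + 1; i < n; i++) {
            double m = A[i * n + k] / A[k * n + k];
            A[i * n + k] = m;
            for (size_t j = k + 1; j < n; j++)
                A[i * n + j] -= m * A[k * n + j];
        }
    }
}
```

The inner elimination loops are embarrassingly parallel; it’s the pivot search and row swap between them, repeated n times, that caps the speedup.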
At the other end of the spectrum, getting a matrix-matrix multiply to run fast isn’t easy either; it’s what necessitated the kind of approach the BLIS authors adopted. On paper the algorithm is trivial, but making it run anywhere near peak speed on real hardware is not.
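A rough sketch of the cache-blocking idea (the tile size and loop order here are illustrative; real libraries like BLIS add data packing, multiple blocking levels tuned to each cache, and hand-written SIMD microkernels):

```c
#include <stddef.h>

#define BS 64  /* tile size; a real library tunes this per cache level */

/* C += A * B for n x n row-major matrices, computed tile by tile so each
   BS x BS working set stays resident in cache instead of streaming the
   whole of B through memory for every row of A. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i0 = 0; i0 < n; i0 += BS)
        for (size_t k0 = 0; k0 < n; k0 += BS)
            for (size_t j0 = 0; j0 < n; j0 += BS)
                /* multiply the (i0,k0) tile of A into the (i0,j0) tile of C */
                for (size_t i = i0; i < i0 + BS && i < n; i++)
                    for (size_t k = k0; k < k0 + BS && k < n; k++) {
                        double a = A[i * n + k];
                        for (size_t j = j0; j < j0 + BS && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

Even this simple restructuring, which computes exactly the same floating-point operations as the naive triple loop, can be several times faster on large matrices purely from better cache reuse, and it’s still far from what a tuned BLAS achieves.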