If you start naively, without a library that already deals with it, memory access becomes the bottleneck. Look at how much effort goes into avoiding that problem, for example with blocking (tiling) algorithms.
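To make the blocking idea concrete, here is a minimal C++ sketch (not taken from the original answer) that compares a naive triple loop with a cache-blocked version for square, row-major matrices stored in `std::vector<double>`. The names `matmul_naive`, `matmul_blocked`, `max_abs_diff` and the tile size `kBlock = 64` are hypothetical choices for illustration. Tuned libraries such as BLAS implementations go much further, with packing, vectorization and tiling for several cache levels.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tile edge length; a real value would be tuned to the target CPU's caches.
constexpr std::size_t kBlock = 64;

// Naive i-j-k loop: the inner loop walks down a column of B with a stride of n doubles,
// so for large n almost every access to B touches a new cache line.
std::vector<double> matmul_naive(const std::vector<double>& A,
                                 const std::vector<double>& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    std::vector<double> C(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    return C;
}

// Blocked version: combine kBlock x kBlock tiles of A, B and C so each tile is
// reused many times while it is still in cache. Edge tiles are clipped with std::min.
std::vector<double> matmul_blocked(const std::vector<double>& A,
                                   const std::vector<double>& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    std::vector<double> C(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += kBlock)
        for (std::size_t kk = 0; kk < n; kk += kBlock)
            for (std::size_t jj = 0; jj < n; jj += kBlock) {
                const std::size_t i_end = std::min(ii + kBlock, n);
                const std::size_t k_end = std::min(kk + kBlock, n);
                const std::size_t j_end = std::min(jj + kBlock, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double a = A[i * n + k];
                        // Unit-stride inner loop over one row of B and one row of C.
                        for (std::size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
    return C;
}

// Largest element-wise absolute difference, for comparing the two results.
double max_abs_diff(const std::vector<double>& X, const std::vector<double>& Y) {
    assert(X.size() == Y.size());
    double m = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i)
        m = std::max(m, std::fabs(X[i] - Y[i]));
    return m;
}
```

Both functions compute the same product. Because the blocked version accumulates partial sums in a different order, the results can differ by rounding, so compare them with a tolerance (for example via `max_abs_diff`) rather than exact equality.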
I've run into problems with both speed and accuracy in my project, which uses a home-brew matrix class built on doubles. I'm inverting a matrix and multiplying it by a vector and a transposed vector. When ...