Yin Zhu
Improve cache performance: matrix multiplication as an example
It is surprising to see that mul1() is 10 times slower than mul2().

mul2(): with j as the inner loop, C[i][j] and B[k][j] are both read along a row (consecutive addresses in a row-major array), so they get good cache hits, while A[i][k] stays constant during the inner loop.

mul1(): in the inner loop, C[i][j] has a constant address, and A[i][k...
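Below is a minimal C sketch of the two loop orders. It assumes row-major `double` matrices of a fixed, hypothetical size `N` (512). The bodies of `mul1()` and `mul2()` are reconstructed from the description above (k innermost versus j innermost) and are not the post's original code. `mul_fn` and `time_mul` are hypothetical helpers added only for measuring.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical matrix size; the post does not state N. */
#define N 512

typedef void (*mul_fn)(double A[N][N], double B[N][N], double C[N][N]);

/* i-j-k order (k innermost): C[i][j] keeps a constant address and A[i][k]
   walks along row i, but B[k][j] jumps N doubles per step (column walk). */
void mul1(double A[N][N], double B[N][N], double C[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (size_t k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}

/* i-k-j order (j innermost): C[i][j] and B[k][j] are both walked along a
   row, and A[i][k] is loaded once per pass of the inner loop. */
void mul2(double A[N][N], double B[N][N], double C[N][N]) {
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++)
            C[i][j] = 0.0;
        for (size_t k = 0; k < N; k++) {
            double a = A[i][k]; /* constant during the j loop */
            for (size_t j = 0; j < N; j++)
                C[i][j] += a * B[k][j];
        }
    }
}

/* Returns the CPU time, in seconds, of one call to mul, measured with clock(). */
double time_mul(mul_fn mul, double A[N][N], double B[N][N], double C[N][N]) {
    clock_t start = clock();
    mul(A, B, C);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Each C[i][j] receives the same additions in the same k order in both versions. Only the memory access pattern differs. At N = 512 each matrix takes 2 MB, so declare the arrays `static` or allocate them on the heap rather than on the stack. Calling `time_mul(mul1, ...)` and `time_mul(mul2, ...)` on the same inputs is one way to compare the two. The measured ratio will depend on N, compiler flags, and the cache hierarchy.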