> I do not see the selling point of BLIS-specific APIs
BLAS assumes your matrices are column-major, and CBLAS only adds a switch between row-major and column-major. BLIS uses general stride instead: every matrix is described by both a row stride and a column stride. That makes it more flexible about matrix storage and makes tensor contractions in higher dimensions easier to implement.
This necessitates a different API.
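To make the stride point concrete, here is a minimal sketch in plain C. It is not the BLIS API: `gs_dgemm` and its parameter names are hypothetical, and the triple loop is a naive reference with no blocking or vectorization. It only shows the addressing rule, where element (i, j) of a matrix lives at `x[i * rs + j * cs]`.

```c
#include <stddef.h>

/* Hypothetical helper, NOT a BLIS function: computes C := C + A * B,
 * where A is m x k, B is k x n, and C is m x n.
 * Each matrix is described by a pointer plus a row stride (rs) and a
 * column stride (cs); element (i, j) is stored at x[i * rs + j * cs]. */
void gs_dgemm(ptrdiff_t m, ptrdiff_t n, ptrdiff_t k,
              const double *a, ptrdiff_t rsa, ptrdiff_t csa,
              const double *b, ptrdiff_t rsb, ptrdiff_t csb,
              double *c, ptrdiff_t rsc, ptrdiff_t csc)
{
    for (ptrdiff_t i = 0; i < m; ++i) {
        for (ptrdiff_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (ptrdiff_t p = 0; p < k; ++p) {
                /* Same indexing rule for every operand, whatever its layout. */
                acc += a[i * rsa + p * csa] * b[p * rsb + j * csb];
            }
            c[i * rsc + j * csc] += acc;
        }
    }
}
```

With this convention, a row-major m x n matrix has `rs = n, cs = 1`, a column-major one has `rs = 1, cs = m`, and a transposed view of the same buffer is just the two strides swapped, with no copy. A slice of a higher-dimensional tensor is also just a pointer plus two strides, which is why this helps with tensor contractions. A BLAS-style interface with a single leading dimension per matrix cannot express all of these cases.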