Small Matrix Multiplication Shootout








Here I’ll compare the speed of small matrix multiplication in a few popular libraries against PaddedMatrices.jl.

I will benchmark the operation $\textbf{C} = \textbf{A} \times \textbf{B}$, where $\textbf{C}\in\mathbb{R}^{M\times N}$, $\textbf{A}\in\mathbb{R}^{M\times K}$, and $\textbf{B}\in\mathbb{R}^{K\times N}$.

I’ll consider every combination of $M\in\{3,\ldots,32\}$ and $N\in\{3,\ldots,32\}$ with $K=32$, and use a column-major data layout. When the matrices are small enough to avoid memory bandwidth problems, as is the case here, $K$ should affect runtime in a perfectly linear fashion: the outer loop should be over $K$, so $K$ should not affect the shape of the kernel, vectorization, or possible register spills.
See here for an introduction on matmul kernels.
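
To make the loop-order argument concrete, here is a minimal reference kernel with the outermost loop over $K$. It is only a sketch for illustration: the name matmul_outer_k! is hypothetical, and this is not the kernel that PaddedMatrices or any of the benchmarked libraries actually uses.

```julia
using LinearAlgebra

# Reference kernel: C = A * B with the outermost loop over k.
# Each iteration of k adds the rank-1 update A[:, k] * B[k, :]' to C,
# so the work done per iteration depends only on M and N.
function matmul_outer_k!(C, A, B)
    M, K = size(A)
    N = size(B, 2)
    @assert size(B, 1) == K && size(C) == (M, N)
    fill!(C, zero(eltype(C)))
    for k in 1:K            # outer loop: K only sets the trip count
        for n in 1:N
            b = B[k, n]
            for m in 1:M    # contiguous in column-major storage
                C[m, n] += A[m, k] * b
            end
        end
    end
    return C
end

A = rand(7, 32); B = rand(32, 10); C = zeros(7, 10)
matmul_outer_k!(C, A, B)
matches = C ≈ A * B   # compare against the built-in product
```

Because $K$ only sets the trip count of the outer loop, doubling $K$ repeats the same inner work twice as often, which is why runtime is expected to grow linearly in $K$.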

We’re testing 900 size combinations in total (30 values of $M$ times 30 values of $N$), and each library gets one function per combination.
So that all matrix sizes are known at compile time, and the templated C++ libraries can take advantage of this information to optimize the operations, I generate the C++ files programmatically:

In [1]:
Mrange = 3:32; Krange = 32; Nrange = 3:32;

function create_eigen_file_string(mrange, krange, nrange)
    base_file = """
#include <Eigen/Dense>

using namespace Eigen;

extern "C" {
"""

    for m ∈ mrange, k ∈ krange, n ∈ nrange
        base_file *= """
\nvoid mul_$(m)x$(k)times$(k)x$(n)(
        Matrix<double,$(m),$(n)> &C,
        Matrix<double,$(m),$(k)> &A,
        Matrix<double,$(k),$(n)> &B){
    C = A * B;
}\n
        """

    end
    base_file * "\n}"
end

function create_blaze_file_string(mrange, krange, nrange)
    base_file = """
#include <blaze/Math.h>
#include <blaze/math/StaticMatrix.h>

using blaze::StaticMatrix;

extern "C" {
"""

    for m ∈ mrange, k ∈ krange, n ∈ nrange
        base_file *= """
\nvoid mul_$(m)x$(k)times$(k)x$(n)(
        StaticMatrix<double,$(m)UL,$(n)UL,blaze::columnMajor> &C,
        StaticMatrix<double,$(m)UL,$(k)UL,blaze::columnMajor> &A,
        StaticMatrix<double,$(k)UL,$(n)UL,blaze::columnMajor> &B){
    C = A * B;
}\n
        """

    end
    base_file * "\n}"
end
Out[1]:
create_blaze_file_string (generic function with 1 method)

Same deal with the Fortran code, although I don’t know any of the implementation details of gfortran’s matmul intrinsic. At the very least, the compiler can use the size information to unroll loops:

In [2]:
function create_fortran_file_string(mrange, krange, nrange)
    base_file = """
    module fixed_size_matmul
    
    use ISO_C_BINDING
    implicit none
    
    contains
    """
    for m ∈ mrange, k ∈ krange, n ∈ nrange
        base_file *= """
        \nsubroutine mul_$(m)x$(k)times$(k)x$(n)(C, A, B) bind(C, name = "mul_$(m)x$(k)times$(k)x$(n)")
            real(C_double), dimension($m, $n), intent(out)  :: C
            real(C_double), dimension($m, $k), intent(in)   :: A
            real(C_double), dimension($k, $n), intent(in)   :: B
        
            C = matmul(A, B)
        
        end subroutine mul_$(m)x$(k)times$(k)x$(n)\n
        """        
    end
    
    base_file * "\n\nend module fixed_size_matmul"
    
end
Out[2]:
create_fortran_file_string (generic function with 1 method)

I will also benchmark the performance impact of PaddedMatrices’s namesake feature, padding, on a CPU supporting AVX-512.
PaddedMatrices is named after the fact that by default it will automatically apply padding between matrix columns, so that the stride between columns is a multiple of SIMD vector width. This should make it easier for the CPU to load columns into registers, and use a single instruction to operate on multiple elements at a time.
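
As a rough sketch of what this padding means for the column stride, the helper below rounds the row count up to a multiple of the vector width. The name padded_rows is hypothetical and this is not PaddedMatrices’s actual rule; for instance, the benchmark code later in this post uses a stride of 4 when there are at most 4 rows.

```julia
# Hypothetical helper (not part of PaddedMatrices): round the number of rows
# up to a multiple of the SIMD width W, measured in elements.
padded_rows(m::Integer, W::Integer) = cld(m, W) * W

W = 8                    # a 512-bit vector holds 8 Float64 values
p7 = padded_rows(7, W)   # 7 real rows are padded out to 8
p8 = padded_rows(8, W)   # already a multiple of 8, so no padding is added
p9 = padded_rows(9, W)   # one row over the boundary rounds up to 16
```

With a stride like this, every column starts a whole number of vectors after the previous one, so each column can be processed with full-width loads.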

Eigen does not support padding, but because PaddedMatrices does, when testing padded multiplication we simply treat the padded number of rows as the row count. That is, if there are 7 real rows, padded out to 8, we call the function for 8 rows. Same story for gfortran and MKL JIT.

Blaze, on the other hand, also pads by default, so I compile Blaze twice, once with padding enabled and once with it disabled.
When the PaddedMatrices matrix is not padded, we call the unpadded Blaze build, and when it is padded, we call the padded build.

To compile the C++ code, I use the latest version of the g++ compiler:

In [3]:
run(`g++ --version`)
g++ (Clear Linux OS for Intel Architecture) 9.1.1 20190529 gcc-9-branch@271737
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Out[3]:
Process(`g++ --version`, ProcessExited(0))

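We turn on aggressive compiler optimizations (-O3) and enable CPU-specific optimizations (-march=native).
-Ofast adds relaxed floating-point optimizations on top of -O3, but it causes Eigen to crash. For Eigen we therefore pass -O3 together with an explicit subset of those floating-point flags (-fno-signed-zeros -fno-trapping-math -fassociative-math), and simply use -Ofast otherwise.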

Looking at the assembly of a simple test function, the difference between -O2 -ftree-vectorize and -Ofast looks fairly negligible. I’m defaulting to the higher optimization levels.

In the case of gfortran, -Ofast looks like it generated excessive amounts of code relative to -O2 -ftree-vectorize. This is because the compiler pursued an inefficient vectorization pattern; -fdisable-tree-cunrolli prevents that, and produces more or less the same behavior as -O2 -ftree-vectorize. I test both versions here for completeness’ sake. However, I’d be careful with that flag: many loops need the pass it disables in order to vectorize, so I wouldn’t want to compile an entire program with it. When I ran into this problem before, I isolated the functions that needed the flag into a single file, and set the makefile to replace -flto with -fdisable-tree-cunrolli for that particular file.

Digression aside, here we write the source files and compile the five shared libraries, four of them as parallel build jobs:

In [4]:
open("eigen_mul.cpp", "w") do io
    eigen_file_string = create_eigen_file_string(Mrange, Krange, Nrange);
    write(io, eigen_file_string)
end;
open("blaze_mul.cpp", "w") do io
    blaze_file_string = create_blaze_file_string(Mrange, Krange, Nrange);
    write(io, blaze_file_string)
end;
open("fortran_intrinsic_mul.f90", "w") do io
    gfort_file_string = create_fortran_file_string(Mrange, Krange, Nrange);
    write(io, gfort_file_string)
end;
run(
    `g++ -O3 -fno-signed-zeros -fno-trapping-math -fassociative-math -march=native -mprefer-vector-width=512 -shared -fPIC -I/usr/include/eigen3 eigen_mul.cpp -o libeigenmul.so` & 
    `g++ -Ofast -march=native -mprefer-vector-width=512 -shared -fPIC blaze_mul.cpp -o libblazemul.so` &
    `g++ -DBLAZE_USE_PADDING=0 -Ofast -march=native -mprefer-vector-width=512 -shared -fPIC blaze_mul.cpp -o libblazemul_unpadded.so` &
    `gfortran -Ofast -march=native -fdisable-tree-cunrolli -mprefer-vector-width=512 -shared -fPIC fortran_intrinsic_mul.f90 -o libgfortmul.so`
)
run(`gfortran -Ofast -march=native -mprefer-vector-width=512 -shared -fPIC fortran_intrinsic_mul.f90 -o libgforttcmul.so`);
f951: note: disable pass tree-cunrolli for functions in the range of [0, 4294967295]

We don’t compile the two gfortran instances at the same time, because compiling two different instances of the same module simultaneously causes problems. But using a module saves me the hassle of defining explicit interfaces for the BIND(C) procedures, which in turn save me the hassle of dealing with name mangling.

Additionally, gfortran compiles quickly, but the two templated C++ libraries take more than six minutes to compile the 900 functions on this computer.

To test Intel MKL’s JIT, we use a simple Fortran file:

module jitmul

include "mkl_direct_call.fi"

use ISO_C_BINDING
implicit none

contains

    subroutine dgemmwrap(C,A,B,M,K,N,alpha,beta) bind(C, name = "dgemmwrapped")
      integer,                        intent(in)  :: M, K, N
!      real(C_double), parameter                   :: alpha = 1.0, beta = 0.0
      real(C_double),                 intent(in)  :: alpha, beta
      real(C_double), dimension(M,K), intent(in)  :: A
      real(C_double), dimension(K,N), intent(in)  :: B
      real(C_double), dimension(M,N), intent(out) :: C

      call dgemm('N', 'N', M, N, K, alpha, A, M, B, K, beta, C, M)

    end subroutine dgemmwrap

end module jitmul

This has the advantage of being very easy to use; all we must do is define the macro MKL_DIRECT_CALL_SEQ_JIT while compiling, and we get to take advantage of JIT compilation.

In [5]:
run(`ifort --version`)
ifort (IFORT) 19.0.4.243 20190416
Copyright (C) 1985-2019 Intel Corporation.  All rights reserved.

Out[5]:
Process(`ifort --version`, ProcessExited(0))

Writing it to a file, and compiling with a recent ifort that supports JIT for larger sizes:

In [6]:
open("fortran_mul.f90", "w") do io
    write(io, """
    module jitmul

    #include "mkl_direct_call.fi"

    use ISO_C_BINDING
    implicit none

    contains

        subroutine dgemmwrap(C,A,B,M,K,N,alpha,beta) bind(C, name = "dgemmwrapped")
          integer,                        intent(in)  :: M, K, N
    !      real(C_double), parameter                   :: alpha = 1.0, beta = 0.0
          real(C_double),                 intent(in)  :: alpha, beta
          real(C_double), dimension(M,K), intent(in)  :: A
          real(C_double), dimension(K,N), intent(in)  :: B
          real(C_double), dimension(M,N), intent(out) :: C

          call dgemm('N', 'N', M, N, K, alpha, A, M, B, K, beta, C, M)

        end subroutine dgemmwrap

    end module jitmul
    """)
end;

run(`ifort -fast -mkl -fpp -DMKL_DIRECT_CALL_SEQ_JIT -shared -fPIC fortran_mul.f90 -o libifortmul.so`);

I’m using (as of June 2nd) the most recent commit of PaddedMatrices, and Julia version:

In [7]:
using PaddedMatrices, LinearAlgebra, OffsetArrays, Random, BenchmarkTools

versioninfo()
Julia Version 1.3.0-DEV.324
Commit de3401908a* (2019-06-01 18:06 UTC)
Platform Info:
  OS: Linux (x86_64-generic-linux)
  CPU: Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.0 (ORCJIT, skylake)

Wrapping these libraries with ccall:

In [8]:
const eigenlib = joinpath(pwd(), "libeigenmul.so")
const blazelib = joinpath(pwd(), "libblazemul.so")
const blazelib_unpadded = joinpath(pwd(), "libblazemul_unpadded.so")
const gfortlib = joinpath(pwd(), "libgfortmul.so")
const gforttclib = joinpath(pwd(), "libgforttcmul.so")
const ifortlib = joinpath(pwd(), "libifortmul.so")

using PaddedMatrices: AbstractMutableFixedSizePaddedMatrix

@generated function eigen_mul!(
            C::AbstractMutableFixedSizePaddedMatrix{M,N,Float64,P},
            A::AbstractMutableFixedSizePaddedMatrix{M,K,Float64,P},
            B::AbstractMutableFixedSizePaddedMatrix{K,N,Float64,K}
        ) where {M,K,N,P}
    func = QuoteNode(Symbol(:mul_, P, :x, K, :times, K, :x, N))
    quote
        ccall(
            ($func, eigenlib), Cvoid,
            (Ptr{Cdouble}, Ptr{Cdouble}, Ptr{Cdouble}),
            C, A, B
        )
    end
end
@generated function blaze_mul!(
            C::AbstractMutableFixedSizePaddedMatrix{M,N,Float64,M},
            A::AbstractMutableFixedSizePaddedMatrix{M,K,Float64,M},
            B::AbstractMutableFixedSizePaddedMatrix{K,N,Float64,K}
        ) where {M,K,N}
    func = QuoteNode(Symbol(:mul_, M, :x, K, :times, K, :x, N))
    quote
        ccall(
            ($func, blazelib_unpadded), Cvoid,
            (Ptr{Cdouble}, Ptr{Cdouble}, Ptr{Cdouble}),
            C, A, B
        )
    end
end
@generated function blaze_mul!(
            C::AbstractMutableFixedSizePaddedMatrix{M,N,Float64,P},
            A::AbstractMutableFixedSizePaddedMatrix{M,K,Float64,P},
            B::AbstractMutableFixedSizePaddedMatrix{K,N,Float64,K}
        ) where {M,K,N,P}
    func = QuoteNode(Symbol(:mul_, M, :x, K, :times, K, :x, N))
    quote
        ccall(
            ($func, blazelib), Cvoid,
            (Ptr{Cdouble}, Ptr{Cdouble}, Ptr{Cdouble}),
            C, A, B
        )
    end
end
@generated function gfort_mul!(
            C::AbstractMutableFixedSizePaddedMatrix{M,N,Float64,P},
            A::AbstractMutableFixedSizePaddedMatrix{M,K,Float64,P},
            B::AbstractMutableFixedSizePaddedMatrix{K,N,Float64,K}
        ) where {M,K,N,P}
    func = QuoteNode(Symbol(:mul_, P, :x, K, :times, K, :x, N))
    quote
        ccall(
            ($func, gfortlib), Cvoid,
            (Ptr{Cdouble}, Ptr{Cdouble}, Ptr{Cdouble}),
            C, A, B
        )
    end
end
@generated function gf_tc_mul!(
            C::AbstractMutableFixedSizePaddedMatrix{M,N,Float64,P},
            A::AbstractMutableFixedSizePaddedMatrix{M,K,Float64,P},
            B::AbstractMutableFixedSizePaddedMatrix{K,N,Float64,K}
        ) where {M,K,N,P}
    func = QuoteNode(Symbol(:mul_, P, :x, K, :times, K, :x, N))
    quote
        ccall(
            ($func, gforttclib), Cvoid,
            (Ptr{Cdouble}, Ptr{Cdouble}, Ptr{Cdouble}),
            C, A, B
        )
    end
end
function ifort_mul!(
            C::AbstractMutableFixedSizePaddedMatrix{M,N,Float64,P},
            A::AbstractMutableFixedSizePaddedMatrix{M,K,Float64,P},
            B::AbstractMutableFixedSizePaddedMatrix{K,N,Float64,K}
        ) where {M,K,N,P}
    ccall(
        (:dgemmwrapped, ifortlib), Cvoid,
        (Ptr{Cdouble}, Ptr{Cdouble}, Ptr{Cdouble}, Ref{Cint}, Ref{Cint}, Ref{Cint}, Ref{Cdouble}, Ref{Cdouble}),
        C, A, B, Ref(Int32(P)), Ref(Int32(K)), Ref(Int32(N)), Ref(1.0), Ref(0.0)
    )
end
Out[8]:
ifort_mul! (generic function with 1 method)
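
The generated functions pick the compiled kernel purely by name, so it is worth checking the name construction on its own. A small sketch (the helper name kernel_name is hypothetical) that builds the same symbol the wrappers build and compares it with the pattern used by the file generators:

```julia
# Same construction as in the wrappers: the pieces are concatenated into one Symbol.
kernel_name(M, K, N) = Symbol(:mul_, M, :x, K, :times, K, :x, N)

# Same naming pattern as the string interpolation in the C++/Fortran generators.
generated_name(m, k, n) = "mul_$(m)x$(k)times$(k)x$(n)"

sym = kernel_name(7, 32, 10)
str = generated_name(7, 32, 10)
same = String(sym) == str   # both spell mul_7x32times32x10
```

Note that the Eigen and gfortran wrappers pass the column stride P rather than M as the first size, so a padded 7-row matrix resolves to the 8-row kernel.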

Now, before looping over a range of tests, let’s confirm that each library gives us the same answer for an arbitrary size, and do an initial test for first impressions.

Specifying 7 rows in the matrix A below tends to cause autovectorizers to struggle, so I figure that is a good test case to highlight here. For that reason, I also manually specify the matrices’ sizes and disable padding.

The parameters in the curly braces { } are:

  1. Number of rows
  2. Number of columns
  3. Element type
  4. Stride between columns.

If the stride between columns (parameter 4) equals the number of rows (parameter 1), then there is no extra padding between them. After initializing undefined matrices, we fill them with random numbers.
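
To illustrate how the stride parameter maps a (row, column) pair onto memory, here is a minimal sketch of column-major indexing with an explicit stride. The helper linear_index is hypothetical and only mirrors the layout described above; it is not a PaddedMatrices function.

```julia
# Hypothetical helper: 1-based linear position of element (i, j) in a
# column-major buffer whose columns are `stride` elements apart.
linear_index(i, j, stride) = i + (j - 1) * stride

unpadded = linear_index(1, 2, 7)   # stride 7: column 2 starts right after column 1
padded   = linear_index(1, 2, 8)   # stride 8: one padding element sits in between

# Buffer length needed for a 7×10 matrix under each layout.
len_unpadded = 7 * 10
len_padded   = 8 * 10
```

The unpadded length of 70 is the last type parameter shown when the 7×10 matrices are printed below.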

In [9]:
C₁ = MutableFixedSizePaddedMatrix{7,10,Float64,7}(undef);
A₁ = MutableFixedSizePaddedMatrix{7,32,Float64,7}(undef);
B₁ = MutableFixedSizePaddedMatrix{32,10,Float64,32}(undef);
randn!(A₁); randn!(B₁);

To multiply in place (overwriting the contents of C) using PaddedMatrices, we use mul! (exported by LinearAlgebra, although the methods for PaddedMatrices are defined in PaddedMatrices itself).
The macro @btime runs the expressions many times to get a reasonably reliable estimate of the minimum time.

Testing each of the libraries:

In [10]:
@btime mul!($C₁, $A₁, $B₁); C₁
  57.097 ns (0 allocations: 0 bytes)
Out[10]:
7×10 MutableFixedSizePaddedArray{Tuple{7,10},Float64,2,7,70}:
  2.30257  -10.2044    7.29207   -4.06196  …   2.15086   -2.43992  -9.95768 
 -1.33198   -5.65042  -0.485499  -3.33891      5.24883   11.0879   -0.600485
  1.11261   -9.81942  -1.03964   -1.2958       4.83331   -3.81413   3.21009 
  2.05042   10.9062    7.15174    5.45935     -6.54997  -16.8048    8.70556 
 -2.71947   -1.28211  -5.32339    4.4657       2.44242    1.22124  -2.28389 
  5.67357   15.7839    1.87185    3.02726  …  -6.48584   -3.22967  -6.39815 
 -2.60538    8.74262  -3.29058    1.23828      6.32987    1.94902   3.07087 
In [11]:
fill!(C₁, 0); @btime eigen_mul!($C₁, $A₁, $B₁); C₁
  396.734 ns (0 allocations: 0 bytes)
Out[11]:
7×10 MutableFixedSizePaddedArray{Tuple{7,10},Float64,2,7,70}:
  2.30257  -10.2044    7.29207   -4.06196  …   2.15086   -2.43992  -9.95768 
 -1.33198   -5.65042  -0.485499  -3.33891      5.24883   11.0879   -0.600485
  1.11261   -9.81942  -1.03964   -1.2958       4.83331   -3.81413   3.21009 
  2.05042   10.9062    7.15174    5.45935     -6.54997  -16.8048    8.70556 
 -2.71947   -1.28211  -5.32339    4.4657       2.44242    1.22124  -2.28389 
  5.67357   15.7839    1.87185    3.02726  …  -6.48584   -3.22967  -6.39815 
 -2.60538    8.74262  -3.29058    1.23828      6.32987    1.94902   3.07087 
In [12]:
fill!(C₁, 0); @btime blaze_mul!($C₁, $A₁, $B₁); C₁
  327.832 ns (0 allocations: 0 bytes)
Out[12]:
7×10 MutableFixedSizePaddedArray{Tuple{7,10},Float64,2,7,70}:
  2.30257  -10.2044    7.29207   -4.06196  …   2.15086   -2.43992  -9.95768 
 -1.33198   -5.65042  -0.485499  -3.33891      5.24883   11.0879   -0.600485
  1.11261   -9.81942  -1.03964   -1.2958       4.83331   -3.81413   3.21009 
  2.05042   10.9062    7.15174    5.45935     -6.54997  -16.8048    8.70556 
 -2.71947   -1.28211  -5.32339    4.4657       2.44242    1.22124  -2.28389 
  5.67357   15.7839    1.87185    3.02726  …  -6.48584   -3.22967  -6.39815 
 -2.60538    8.74262  -3.29058    1.23828      6.32987    1.94902   3.07087 
In [13]:
fill!(C₁, 0); @btime gfort_mul!($C₁, $A₁, $B₁); C₁
  436.869 ns (0 allocations: 0 bytes)
Out[13]:
7×10 MutableFixedSizePaddedArray{Tuple{7,10},Float64,2,7,70}:
  2.30257  -10.2044    7.29207   -4.06196  …   2.15086   -2.43992  -9.95768 
 -1.33198   -5.65042  -0.485499  -3.33891      5.24883   11.0879   -0.600485
  1.11261   -9.81942  -1.03964   -1.2958       4.83331   -3.81413   3.21009 
  2.05042   10.9062    7.15174    5.45935     -6.54997  -16.8048    8.70556 
 -2.71947   -1.28211  -5.32339    4.4657       2.44242    1.22124  -2.28389 
  5.67357   15.7839    1.87185    3.02726  …  -6.48584   -3.22967  -6.39815 
 -2.60538    8.74262  -3.29058    1.23828      6.32987    1.94902   3.07087 
In [14]:
fill!(C₁, 0); @btime gf_tc_mul!($C₁, $A₁, $B₁); C₁
  316.466 ns (0 allocations: 0 bytes)
Out[14]:
7×10 MutableFixedSizePaddedArray{Tuple{7,10},Float64,2,7,70}:
  2.30257  -10.2044    7.29207   -4.06196  …   2.15086   -2.43992  -9.95768 
 -1.33198   -5.65042  -0.485499  -3.33891      5.24883   11.0879   -0.600485
  1.11261   -9.81942  -1.03964   -1.2958       4.83331   -3.81413   3.21009 
  2.05042   10.9062    7.15174    5.45935     -6.54997  -16.8048    8.70556 
 -2.71947   -1.28211  -5.32339    4.4657       2.44242    1.22124  -2.28389 
  5.67357   15.7839    1.87185    3.02726  …  -6.48584   -3.22967  -6.39815 
 -2.60538    8.74262  -3.29058    1.23828      6.32987    1.94902   3.07087 
In [15]:
fill!(C₁, 0); @btime ifort_mul!($C₁, $A₁, $B₁); C₁
  101.845 ns (0 allocations: 0 bytes)
Out[15]:
7×10 MutableFixedSizePaddedArray{Tuple{7,10},Float64,2,7,70}:
  2.30257  -10.2044    7.29207   -4.06196  …   2.15086   -2.43992  -9.95768 
 -1.33198   -5.65042  -0.485499  -3.33891      5.24883   11.0879   -0.600485
  1.11261   -9.81942  -1.03964   -1.2958       4.83331   -3.81413   3.21009 
  2.05042   10.9062    7.15174    5.45935     -6.54997  -16.8048    8.70556 
 -2.71947   -1.28211  -5.32339    4.4657       2.44242    1.22124  -2.28389 
  5.67357   15.7839    1.87185    3.02726  …  -6.48584   -3.22967  -6.39815 
 -2.60538    8.74262  -3.29058    1.23828      6.32987    1.94902   3.07087 

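Initial impressions: the Julia library is clearly the fastest at about 57 ns, and Intel MKL’s JIT is not far behind at about 102 ns (roughly 1.8 times as long). Everything else takes between 316 and 437 ns, more than five times as long as PaddedMatrices.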

Now, a script to perform all the benchmarks:

In [16]:
# Median of a vector; assumes x is already sorted in ascending order.
function sorted_median(x)
    N = length(x)
    N < 1 && return NaN
    if isodd(N)
        x[(N+1)>>1]
    else
        h = N >> 1
        (x[h] + x[h+1]) / 2
    end
end
function bench_matmul!(minimum, median, C, A, B)
    times = (@benchmark eigen_mul!($C, $A, $B)).times
    minimum[1] = first(times)
    median[1] = sorted_median(times)
    
    times = (@benchmark blaze_mul!($C, $A, $B)).times
    minimum[2] = first(times)
    median[2] = sorted_median(times)    
    
    times = (@benchmark gfort_mul!($C, $A, $B)).times
    minimum[3] = first(times)
    median[3] = sorted_median(times)
    
    times = (@benchmark gf_tc_mul!($C, $A, $B)).times
    minimum[4] = first(times)
    median[4] = sorted_median(times)
    
    times = (@benchmark ifort_mul!($C, $A, $B)).times
    minimum[5] = first(times)
    median[5] = sorted_median(times)
    
    times = (@benchmark mul!($C, $A, $B)).times
    minimum[6] = first(times)
    median[6] = sorted_median(times)
end

function bench_range(Mrange, Krange, Nrange)
    mr = Mrange isa Integer ? (Mrange:Mrange) : Mrange
    kr = Krange isa Integer ? (Krange:Krange) : Krange
    nr = Nrange isa Integer ? (Nrange:Nrange) : Nrange
    minimum_times = OffsetArray{Float64}(undef, 1:6, mr, kr, nr, 1:2)
    median_times = OffsetArray{Float64}(undef, 1:6, mr, kr, nr, 1:2)
    for m ∈ Mrange, k ∈ Krange, n ∈ Nrange
        padded_A = @Mutable randn(m, k)
        padded_B = @Mutable randn(k, n)
        padded_C = PaddedMatrices.MutableFixedSizePaddedMatrix{m,n,Float64}(undef)
        @views bench_matmul!( # test with padding
            minimum_times[:,m,k,n,1], median_times[:,m,k,n,1],
            padded_C, padded_A, padded_B
        )
        
        C = PaddedMatrices.MutableFixedSizePaddedMatrix{m,n,Float64,m}(undef)
        A = PaddedMatrices.MutableFixedSizePaddedMatrix{m,k,Float64,m}(undef)
        B = PaddedMatrices.MutableFixedSizePaddedMatrix{k,n,Float64,k}(undef)
        A .= padded_A; B .= padded_B
        @views bench_matmul!( # test without padding
            minimum_times[:,m,k,n,2], median_times[:,m,k,n,2], C, A, B
        )
    end
    minimum_times, median_times
end
Out[16]:
bench_range (generic function with 1 method)

Now, unfortunately Blaze only supports aligned memory accesses (vmovapd) with its statically sized arrays. PaddedMatrices does not offer convenience functions for creating aligned matrices and only uses unaligned accesses (vmovupd).
The differences:

  • On some old architectures vmovapd was faster than vmovupd, but on reasonably recent computers (i.e., AFAIK any desktop CPU from the last 5 years) they’re equally fast.
  • If the memory is not aligned, vmovapd will crash the program, while vmovupd still works.

Memory being aligned means that the address is an integer multiple of the size of the vector being moved by the access.

So instead of using a regular PaddedMatrix, I allocate memory using malloc, round the pointers up to a 64-byte boundary, and wrap them in PtrMatrix objects.
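
The rounding step can be looked at in isolation. The sketch below uses two hypothetical helpers, align_up and is_aligned, which implement the same bit trick as the `( uptr + 63 ) & -64` expressions in the next cell; over-allocating by 63 bytes guarantees that the rounded-up address still leaves room for the data.

```julia
# Hypothetical helpers, for illustration only.
# Round `addr` up to the next multiple of `alignment` (must be a power of two).
function align_up(addr::UInt, alignment::Integer)
    mask = UInt(alignment - 1)
    return (addr + mask) & ~mask
end

# An address is aligned when its low bits are all zero.
is_aligned(addr::UInt, alignment::Integer) = iszero(addr & UInt(alignment - 1))

nbytes = 32 * 32 * 8               # largest matrix in the benchmark, in bytes
ptr = Libc.malloc(nbytes + 63)     # 63 spare bytes leave room to round up
addr = reinterpret(UInt, ptr)
aligned = align_up(addr, 64)

ok = is_aligned(aligned, 64)       # holds by construction
skipped = aligned - addr           # bytes skipped at the front, between 0 and 63
aligned_ptr = reinterpret(Ptr{Float64}, aligned)
Libc.free(ptr)                     # free the original pointer, not the aligned one
```

Only the pointer returned by malloc may be passed to free, which is why the original pointer is kept around next to the aligned ones.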

In [17]:
using PaddedMatrices: PtrMatrix
function bench_range(Mrange, Krange, Nrange)
    mr = Mrange isa Integer ? (Mrange:Mrange) : Mrange
    kr = Krange isa Integer ? (Krange:Krange) : Krange
    nr = Nrange isa Integer ? (Nrange:Nrange) : Nrange
    minimum_times = OffsetArray{Float64}(undef, 1:6, mr, kr, nr, 1:2)
    median_times = OffsetArray{Float64}(undef, 1:6, mr, kr, nr, 1:2)
    
    # max dims * bytes/element + 63
    mem_per = (32^2)*8 + 63
    
    ptr = Libc.malloc(3mem_per)
    uptr = reinterpret(UInt, ptr)
    
    # ptr_A, ptr_B, and ptr_C will be aligned on a 64 byte boundary
    ptr_A = reinterpret(Ptr{Float64}, ( uptr + 63 ) & -64 )
    ptr_B = reinterpret(Ptr{Float64}, ( uptr + mem_per + 63 ) & -64 )
    ptr_C = reinterpret(Ptr{Float64}, ( uptr + 2mem_per + 63 ) & -64 )
    
    for m ∈ Mrange, k ∈ Krange, n ∈ Nrange
        # if m <= 4, then m either 3 or 4; otherwise make it a multiple of 8
        P = m > 4 ? (m + 7) & -8 : 4
        padded_A = PtrMatrix{m,k,Float64,P,(P*k),true}(ptr_A)
        padded_B = PtrMatrix{k,n,Float64,k,(k*n),true}(ptr_B)
        padded_C = PtrMatrix{m,n,Float64,P,(P*n),true}(ptr_C)

        randn!(padded_A); randn!(padded_B)
        @views bench_matmul!( # test with padding
            minimum_times[:,m,k,n,1], median_times[:,m,k,n,1],
            padded_C, padded_A, padded_B
        )
        
        unpadded_A = PtrMatrix{m,k,Float64,m,(m*k),true}(ptr_A)
        unpadded_B = PtrMatrix{k,n,Float64,k,(k*n),true}(ptr_B)
        unpadded_C = PtrMatrix{m,n,Float64,m,(m*n),true}(ptr_C)
        randn!(unpadded_A); randn!(unpadded_B)
        @views bench_matmul!( # test without padding
            minimum_times[:,m,k,n,2], median_times[:,m,k,n,2],
            unpadded_C, unpadded_A, unpadded_B
        )
    end
    
    Libc.free(ptr)
    
    minimum_times, median_times
end
Out[17]:
bench_range (generic function with 1 method)
In [18]:
minimum_times, median_times = bench_range(Mrange, Krange, Nrange);

Some boilerplate to get things to print the way I want:

In [19]:
using DataFrames
function string_to_ind(str)
    if str == "eigen"
        i = 1
    elseif str == "blaze"
        i = 2
    elseif str == "gfortran"
        i = 3
    elseif str == "gfortran tc"
        i = 4
    elseif str == "mkl" || str == "mkl jit"
        i = 5
    elseif str == "native julia"
        i = 6
    else
        throw("$str not recognized; please specify either \"eigen\", \"blaze\", \"gfortran\", \"gfortran tc\", \"mkl\", \"mkl jit\", or \"native julia\".")
    end
    return i
end
function DataFrames.showrowindices(io::IO,
                        df::AbstractDataFrame,
                        rowindices::AbstractVector{Int},
                        maxwidths::Vector{Int},
                        leftcol::Int,
                        rightcol::Int,
                        rowid) # -> Void
    rowmaxwidth = maxwidths[end]

    for i in rowindices
        # Print row ID
        if rowid isa Nothing
            DataFrames.@printf io "| %d" i + 2
        else
            DataFrames.@printf io "| %d" rowid
        end
        padding = rowmaxwidth - DataFrames.ndigits(rowid isa Nothing ? i + 2 : rowid)
        for _ in 1:padding
            write(io, ' ')
        end
        print(io, " | ")
        # Print DataFrame entry
        for j in leftcol:rightcol
            strlen = 0
            if DataFrames.isassigned(df[j], i)
                s = df[i, j]
                strlen = DataFrames.ourstrwidth(io, s)
                if ismissing(s)
                    DataFrames.printstyled(io, s, color=:light_black)
                elseif s === nothing
                    strlen = 0
                else
                    DataFrames.ourshow(io, s)
                end
            else
                strlen = DataFrames.ourstrwidth(io, Base.undef_ref_str)
                DataFrames.ourshow(io, Base.undef_ref_str)
            end
            padding = maxwidths[j] - strlen
            for _ in 1:padding
                write(io, ' ')
            end
            if j == rightcol
                if i == rowindices[end]
                    print(io, " |")
                else
                    print(io, " |\n")
                end
            else
                print(io, " | ")
            end
        end
    end
    return
end
function print_results(compare1, compare2, times; padded = false, sigdigits = 3)
    ind1 = string_to_ind(compare1)
    ind2 = string_to_ind(compare2)
    padind = padded ? 1 : 2
    times_parent = times.parent
    df = round.(times_parent[ind1,:,1,:,padind] ./ times_parent[ind2,:,1,:,padind], sigdigits = sigdigits) |> DataFrame
    offset_m = times.offsets[2]; offset_n = times.offsets[4]
    print_results(df, offset_m, offset_n)
end
function print_results(df, offset_m, offset_n)
    names!(df, [Symbol(:NCols,j+offset_n) for j ∈ 1:size(df,2)])
    show(df, allrows = true, allcols = true)
end
Out[19]:
print_results (generic function with 1 method)

Because I am most interested in the performance of PaddedMatrices.jl, I compare the relative runtimes of each of the other libraries with PaddedMatrices as the baseline.
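
As a small worked example of how to read the tables that follow, the sketch below computes the same kind of ratio from the minimum times of the single 7×10 test earlier in this post. The variable names are only for illustration.

```julia
# Minimum times (ns) from the 7×10 test earlier, in the benchmark's order:
# eigen, blaze, gfortran, gfortran tc, mkl jit, native julia
times = [396.734, 327.832, 436.869, 316.466, 101.845, 57.097]

baseline = times[6]                                  # PaddedMatrices
ratios = round.(times ./ baseline, sigdigits = 3)    # same rounding as print_results
```

A ratio above 1 means the other library took that many times longer than PaddedMatrices for the given size; the baseline itself is 1 by definition.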

First taking a look at Eigen, which seems to be the most popular of the alternatives, employed for example in Stan and TensorFlow:

In [20]:
print_results("eigen", "native julia", minimum_times, padded = false)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 5.07    | 4.87    | 5.64    | 5.88    | 5.63    | 4.85    | 4.92    |
| 4   | 5.62    | 4.1     | 4.68    | 4.96    | 4.65    | 3.91    | 4.38    |
| 5   | 5.55    | 4.16    | 5.16    | 5.86    | 5.5     | 4.23    | 5.05    |
| 6   | 6.9     | 5.81    | 6.23    | 6.78    | 6.76    | 4.76    | 6.08    |
| 7   | 7.75    | 5.63    | 7.19    | 7.59    | 7.49    | 6.33    | 6.75    |
| 8   | 5.75    | 3.94    | 4.66    | 5.34    | 5.67    | 4.28    | 4.41    |
| 9   | 5.2     | 3.75    | 5.03    | 4.83    | 4.77    | 2.96    | 3.84    |
| 10  | 5.92    | 4.64    | 5.59    | 5.26    | 5.24    | 3.73    | 4.13    |
| 11  | 6.75    | 5.28    | 6.22    | 5.85    | 5.71    | 4.08    | 4.64    |
| 12  | 4.98    | 4.34    | 4.5     | 4.35    | 4.23    | 3.68    | 3.67    |
| 13  | 6.33    | 5.01    | 5.69    | 5.39    | 5.02    | 3.98    | 4.38    |
| 14  | 6.91    | 5.7     | 6.18    | 5.76    | 5.77    | 4.18    | 4.95    |
| 15  | 7.65    | 5.86    | 6.98    | 6.58    | 6.3     | 4.52    | 5.66    |
| 16  | 8.07    | 6.22    | 6.65    | 6.44    | 6.54    | 4.97    | 5.32    |
| 17  | 6.4     | 4.76    | 5.12    | 4.88    | 4.92    | 3.79    | 4.23    |
| 18  | 6.74    | 5.18    | 5.46    | 5.05    | 5.24    | 4.3     | 4.55    |
| 19  | 7.71    | 5.6     | 6.17    | 5.68    | 6.03    | 4.4     | 4.66    |
| 20  | 6.53    | 4.57    | 4.73    | 4.51    | 4.55    | 3.66    | 3.81    |
| 21  | 7.1     | 5.16    | 5.41    | 5.28    | 5.25    | 4.03    | 4.14    |
| 22  | 7.47    | 5.19    | 5.74    | 5.59    | 5.6     | 4.53    | 4.89    |
| 23  | 8.19    | 6.11    | 6.4     | 6.15    | 6.21    | 4.66    | 5.22    |
| 24  | 6.57    | 4.99    | 4.7     | 4.89    | 4.96    | 3.99    | 4.15    |
| 25  | 6.3     | 4.13    | 4.3     | 4.4     | 3.89    | 3.08    | 3.51    |
| 26  | 6.58    | 4.52    | 4.66    | 4.37    | 4.02    | 3.39    | 3.79    |
| 27  | 6.75    | 4.74    | 5.42    | 5.09    | 4.16    | 3.6     | 3.93    |
| 28  | 7.0     | 4.57    | 4.61    | 4.77    | 4.24    | 3.53    | 3.77    |
| 29  | 7.71    | 4.91    | 5.44    | 5.24    | 4.44    | 3.95    | 4.25    |
| 30  | 8.28    | 5.28    | 5.69    | 5.51    | 5.12    | 4.08    | 4.65    |
| 31  | 8.21    | 5.42    | 6.23    | 6.22    | 5.01    | 4.3     | 4.78    |
| 32  | 7.69    | 4.93    | 5.26    | 5.18    | 4.48    | 3.79    | 4.13    |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 4.23    | 4.7     | 4.15    | 4.61    | 4.53    | 4.26    | 3.83    |
| 4   | 4.27    | 4.43    | 3.6     | 3.81    | 3.87    | 3.81    | 3.42    |
| 5   | 5.34    | 5.26    | 4.02    | 4.71    | 4.8     | 4.99    | 3.91    |
| 6   | 5.46    | 6.16    | 5.11    | 5.68    | 5.49    | 5.63    | 4.66    |
| 7   | 6.98    | 6.91    | 5.68    | 6.23    | 6.49    | 6.4     | 5.31    |
| 8   | 4.63    | 4.86    | 3.95    | 4.15    | 4.24    | 4.34    | 3.78    |
| 9   | 3.77    | 3.91    | 2.89    | 3.39    | 3.35    | 2.96    | 2.39    |
| 10  | 4.09    | 4.22    | 3.27    | 3.7     | 3.53    | 3.32    | 2.78    |
| 11  | 4.75    | 4.76    | 3.62    | 4.46    | 4.27    | 3.85    | 3.14    |
| 12  | 3.59    | 3.47    | 3.2     | 3.28    | 3.35    | 2.91    | 2.8     |
| 13  | 4.69    | 4.23    | 3.63    | 4.03    | 4.13    | 3.55    | 3.24    |
| 14  | 4.84    | 4.91    | 4.06    | 4.56    | 4.18    | 4.05    | 3.83    |
| 15  | 5.49    | 5.47    | 4.47    | 5.04    | 4.88    | 4.35    | 4.2     |
| 16  | 5.3     | 5.44    | 4.49    | 4.63    | 4.52    | 4.36    | 3.92    |
| 17  | 3.82    | 4.01    | 3.39    | 3.68    | 3.64    | 3.77    | 3.08    |
| 18  | 3.72    | 4.27    | 3.63    | 4.05    | 4.1     | 4.09    | 3.42    |
| 19  | 4.35    | 4.49    | 3.9     | 4.39    | 4.17    | 4.28    | 3.82    |
| 20  | 3.46    | 3.76    | 3.21    | 3.44    | 3.5     | 3.5     | 3.04    |
| 21  | 3.81    | 4.3     | 3.6     | 3.83    | 3.95    | 3.98    | 3.54    |
| 22  | 4.26    | 4.73    | 3.99    | 4.4     | 4.4     | 4.58    | 3.67    |
| 23  | 4.4     | 4.86    | 4.38    | 4.46    | 4.53    | 4.79    | 4.24    |
| 24  | 3.83    | 4.02    | 3.54    | 3.67    | 3.79    | 3.84    | 3.47    |
| 25  | 3.61    | 3.57    | 2.93    | 2.97    | 2.87    | 3.1     | 2.75    |
| 26  | 3.76    | 3.94    | 3.2     | 3.15    | 3.26    | 3.54    | 2.9     |
| 27  | 4.0     | 3.98    | 3.42    | 3.23    | 3.22    | 3.62    | 3.16    |
| 28  | 3.81    | 3.93    | 3.17    | 3.17    | 3.33    | 3.57    | 3.1     |
| 29  | 4.42    | 4.38    | 3.56    | 3.41    | 3.73    | 3.87    | 3.36    |
| 30  | 4.69    | 4.96    | 3.84    | 3.96    | 3.92    | 4.33    | 3.61    |
| 31  | 4.84    | 4.92    | 4.05    | 3.87    | 4.12    | 4.37    | 3.58    |
| 32  | 4.26    | 4.45    | 3.52    | 3.52    | 3.73    | 3.93    | 3.42    |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 4.03    | 4.08    | 3.87    | 3.65    | 3.99    | 3.96    | 3.82    |
| 4   | 3.59    | 3.6     | 3.65    | 3.31    | 3.4     | 3.51    | 3.38    |
| 5   | 4.51    | 4.31    | 4.45    | 3.68    | 4.31    | 4.31    | 4.58    |
| 6   | 5.11    | 5.02    | 5.3     | 4.35    | 4.92    | 4.72    | 5.0     |
| 7   | 5.8     | 5.84    | 5.91    | 5.06    | 5.86    | 5.64    | 5.89    |
| 8   | 3.89    | 4.0     | 3.88    | 3.65    | 3.68    | 3.85    | 3.91    |
| 9   | 3.14    | 3.07    | 3.12    | 2.5     | 3.03    | 3.04    | 2.99    |
| 10  | 3.37    | 3.26    | 3.48    | 2.88    | 3.32    | 3.3     | 3.43    |
| 11  | 3.97    | 3.97    | 3.98    | 3.27    | 3.84    | 3.35    | 3.73    |
| 12  | 2.94    | 3.07    | 3.05    | 2.79    | 2.93    | 3.01    | 3.08    |
| 13  | 3.63    | 3.71    | 3.85    | 3.28    | 3.46    | 3.57    | 3.6     |
| 14  | 4.17    | 4.15    | 4.34    | 3.78    | 4.07    | 4.08    | 3.9     |
| 15  | 4.6     | 4.51    | 4.59    | 4.23    | 4.41    | 4.41    | 4.67    |
| 16  | 4.11    | 4.24    | 4.37    | 3.93    | 4.09    | 4.17    | 4.25    |
| 17  | 3.37    | 3.19    | 3.16    | 2.9     | 3.05    | 3.09    | 3.23    |
| 18  | 3.83    | 3.6     | 3.56    | 3.0     | 3.47    | 3.48    | 3.67    |
| 19  | 3.94    | 3.81    | 3.58    | 3.32    | 3.56    | 3.6     | 3.82    |
| 20  | 3.22    | 3.09    | 3.05    | 2.71    | 2.92    | 3.03    | 3.2     |
| 21  | 3.72    | 3.52    | 3.38    | 3.02    | 3.37    | 3.48    | 3.47    |
| 22  | 4.23    | 4.02    | 3.9     | 3.45    | 3.84    | 3.69    | 3.93    |
| 23  | 4.31    | 4.13    | 3.91    | 3.66    | 3.87    | 3.91    | 4.04    |
| 24  | 3.6     | 3.52    | 3.5     | 3.28    | 3.42    | 3.53    | 3.63    |
| 25  | 2.88    | 3.02    | 2.96    | 2.77    | 2.84    | 3.0     | 3.04    |
| 26  | 3.35    | 3.26    | 3.25    | 2.93    | 3.29    | 3.2     | 3.29    |
| 27  | 3.18    | 3.41    | 3.32    | 3.03    | 3.36    | 3.34    | 3.53    |
| 28  | 3.32    | 3.45    | 3.35    | 3.03    | 3.37    | 3.34    | 3.48    |
| 29  | 3.49    | 3.61    | 3.6     | 3.27    | 3.57    | 3.66    | 3.87    |
| 30  | 4.11    | 4.0     | 4.12    | 3.5     | 3.95    | 3.98    | 4.22    |
| 31  | 4.06    | 4.11    | 4.07    | 3.84    | 4.08    | 4.16    | 4.28    |
| 32  | 3.64    | 3.79    | 3.94    | 3.32    | 3.67    | 3.9     | 3.9     |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 3.5     | 3.83    | 3.72    | 3.62    | 3.5     | 3.78    | 3.62    |
| 4   | 3.12    | 3.25    | 3.29    | 3.41    | 3.09    | 3.21    | 3.36    |
| 5   | 3.48    | 4.1     | 3.95    | 4.14    | 3.44    | 3.73    | 4.05    |
| 6   | 4.16    | 4.8     | 4.57    | 4.83    | 4.06    | 4.75    | 4.59    |
| 7   | 4.91    | 5.57    | 4.75    | 4.79    | 4.8     | 5.14    | 5.32    |
| 8   | 3.48    | 3.53    | 3.62    | 3.73    | 3.44    | 3.57    | 3.63    |
| 9   | 2.46    | 2.75    | 2.71    | 2.81    | 2.24    | 2.29    | 2.36    |
| 10  | 2.71    | 3.21    | 3.08    | 3.25    | 2.58    | 2.6     | 2.74    |
| 11  | 3.14    | 3.52    | 3.56    | 3.61    | 3.14    | 2.93    | 3.01    |
| 12  | 2.87    | 2.9     | 2.91    | 2.95    | 2.61    | 2.47    | 2.56    |
| 13  | 3.27    | 3.42    | 3.41    | 3.49    | 3.1     | 2.83    | 2.91    |
| 14  | 3.76    | 4.01    | 3.98    | 4.21    | 3.58    | 3.49    | 3.37    |
| 15  | 4.14    | 4.25    | 4.34    | 4.4     | 3.85    | 3.63    | 3.75    |
| 16  | 3.88    | 4.19    | 4.26    | 4.45    | 3.87    | 3.79    | 3.94    |
| 17  | 2.91    | 3.02    | 3.1     | 3.31    | 2.8     | 2.93    | 3.06    |
| 18  | 3.13    | 3.43    | 3.25    | 3.54    | 2.96    | 3.3     | 3.24    |
| 19  | 3.41    | 3.33    | 3.48    | 3.66    | 3.19    | 3.5     | 3.55    |
| 20  | 2.81    | 2.95    | 3.03    | 3.19    | 2.77    | 3.03    | 3.0     |
| 21  | 3.09    | 3.21    | 3.41    | 3.53    | 3.09    | 3.32    | 3.42    |
| 22  | 3.38    | 3.78    | 3.54    | 4.1     | 3.38    | 3.83    | 3.72    |
| 23  | 3.66    | 3.91    | 3.85    | 4.17    | 3.57    | 3.97    | 3.98    |
| 24  | 3.39    | 3.5     | 3.55    | 3.65    | 3.29    | 3.49    | 3.64    |
| 25  | 2.8     | 2.82    | 2.98    | 3.03    | 2.93    | 2.99    | 3.07    |
| 26  | 2.99    | 3.14    | 3.01    | 3.38    | 3.09    | 3.38    | 3.3     |
| 27  | 3.21    | 3.14    | 3.3     | 3.27    | 3.25    | 3.49    | 3.41    |
| 28  | 3.16    | 3.2     | 3.3     | 3.44    | 3.11    | 3.24    | 3.31    |
| 29  | 3.36    | 3.41    | 3.58    | 3.77    | 3.5     | 3.64    | 3.65    |
| 30  | 3.55    | 3.83    | 3.81    | 4.14    | 3.82    | 4.04    | 3.92    |
| 31  | 3.87    | 4.03    | 4.18    | 4.28    | 3.78    | 4.02    | 4.11    |
| 32  | 3.61    | 3.52    | 3.71    | 3.88    | 3.54    | 3.63    | 3.64    |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 3.28    | 3.05    |
| 4   | 2.96    | 2.76    |
| 5   | 3.72    | 3.2     |
| 6   | 4.44    | 3.84    |
| 7   | 4.96    | 4.44    |
| 8   | 3.28    | 3.09    |
| 9   | 2.52    | 2.27    |
| 10  | 2.96    | 2.58    |
| 11  | 2.97    | 2.81    |
| 12  | 2.68    | 2.56    |
| 13  | 2.99    | 2.89    |
| 14  | 3.71    | 3.43    |
| 15  | 3.79    | 3.67    |
| 16  | 4.02    | 3.64    |
| 17  | 3.14    | 2.89    |
| 18  | 3.51    | 3.14    |
| 19  | 3.59    | 3.41    |
| 20  | 3.12    | 2.78    |
| 21  | 3.59    | 3.29    |
| 22  | 4.05    | 3.6     |
| 23  | 4.17    | 3.86    |
| 24  | 3.67    | 3.39    |
| 25  | 3.01    | 2.91    |
| 26  | 3.26    | 3.16    |
| 27  | 3.32    | 3.12    |
| 28  | 3.31    | 3.11    |
| 29  | 3.49    | 3.41    |
| 30  | 4.0     | 3.64    |
| 31  | 4.07    | 3.82    |
| 32  | 3.69    | 3.36    |

What surprises me is that Eigen is still slow when $\textbf{A} \in \mathbb{R}^{8i\times32}$ for integer $i$, that is, whenever the number of rows is a multiple of 8.
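
To put a rough number on that, here is a minimal sketch that takes the last column (`NCols32`) of the table directly above and compares the rows where $M$ is a multiple of 8 against the rows just after them ($M = 9, 17, 25$). The variable names are mine and the values are copied by hand from that one column, so treat it as an illustration rather than part of the benchmark code. It also assumes, as the text around the table suggests, that larger ratios mean Eigen was slower.

```julia
# Ratios copied from the NCols32 column of the table above, for M = 3:32.
Ms = 3:32
ratios = [3.05, 2.76, 3.2, 3.84, 4.44, 3.09, 2.27, 2.58, 2.81, 2.56,
          2.89, 3.43, 3.67, 3.64, 2.89, 3.14, 3.41, 2.78, 3.29, 3.6,
          3.86, 3.39, 2.91, 3.16, 3.12, 3.11, 3.41, 3.64, 3.82, 3.36]

avg(x) = sum(x) / length(x)

# Rows where M is a multiple of 8, and the rows immediately after them.
mult8  = [r for (m, r) in zip(Ms, ratios) if m % 8 == 0]
after8 = [r for (m, r) in zip(Ms, ratios) if m % 8 == 1]

println("mean ratio, M multiple of 8: ", round(avg(mult8), digits = 2))
println("mean ratio, M = 9, 17, 25:   ", round(avg(after8), digits = 2))
```

If multiples of 8 were the easy case for Eigen, I would expect the first number to be the smaller of the two; in this column it is the larger one.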

I’m fairly sure gfortran mostly relies on the autovectorizer for vectorization, an idea supported by how much the generated code changes when you change compiler flags.
So for all of Eigen’s templating, all it seems to have to show for it (vs gfortran) is slow compilation times and fat binaries (although Blaze is heaviest):

| Shared Library | File Size |
|----------------|-----------|
| Eigen          | 1.3M      |
| Blaze-Padded   | 1.9M      |
| Blaze-Unpadded | 2.7M      |
| gfortran       | 0.529M    |
| gfortran-tc    | 0.641M    |
| ifort mkl-jit  | 0.018M    |

Of course, MKL-JIT’s file size is small because it generates the matmul code “just in time” while running, instead of ahead of time as in the other examples.

Comparing Eigen and gfortran directly (values are the ratio of Eigen runtimes to gfortran runtimes, so anything above 1 means gfortran was faster):

In [33]:
print_results("eigen", "gfortran tc", minimum_times, padded = false)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 4.58    | 3.7     | 3.92    | 3.58    | 3.51    | 3.06    | 3.25    |
| 4   | 3.77    | 2.59    | 2.68    | 2.67    | 2.67    | 2.11    | 2.16    |
| 5   | 2.07    | 1.35    | 1.43    | 1.39    | 1.35    | 0.971   | 1.15    |
| 6   | 2.35    | 1.6     | 1.65    | 1.51    | 1.47    | 1.11    | 1.25    |
| 7   | 2.39    | 1.58    | 1.72    | 1.57    | 1.5     | 1.14    | 1.26    |
| 8   | 2.03    | 1.46    | 1.53    | 1.54    | 1.57    | 1.22    | 1.39    |
| 9   | 1.33    | 0.799   | 0.908   | 0.845   | 0.823   | 0.585   | 0.674   |
| 10  | 1.42    | 0.887   | 0.976   | 0.906   | 0.912   | 0.654   | 0.742   |
| 11  | 1.49    | 0.949   | 1.01    | 0.955   | 0.935   | 0.682   | 0.771   |
| 12  | 0.978   | 0.701   | 0.666   | 0.641   | 0.622   | 0.54    | 0.529   |
| 13  | 1.2     | 0.775   | 0.826   | 0.777   | 0.745   | 0.593   | 0.639   |
| 14  | 1.18    | 0.816   | 0.847   | 0.771   | 0.782   | 0.632   | 0.667   |
| 15  | 1.25    | 0.885   | 0.894   | 0.829   | 0.815   | 0.655   | 0.725   |
| 16  | 0.983   | 0.659   | 0.678   | 0.649   | 0.651   | 0.504   | 0.528   |
| 17  | 1.67    | 1.28    | 1.39    | 1.32    | 1.34    | 1.0     | 1.12    |
| 18  | 1.76    | 1.28    | 1.36    | 1.27    | 1.29    | 1.05    | 1.11    |
| 19  | 1.9     | 1.32    | 1.4     | 1.36    | 1.42    | 1.05    | 1.13    |
| 20  | 1.44    | 1.0     | 1.03    | 1.02    | 1.02    | 0.797   | 0.84    |
| 21  | 1.44    | 1.01    | 1.09    | 1.06    | 1.04    | 0.796   | 0.876   |
| 22  | 1.52    | 1.06    | 1.12    | 1.07    | 1.07    | 0.824   | 0.895   |
| 23  | 1.52    | 1.08    | 1.13    | 1.09    | 1.07    | 0.82    | 0.924   |
| 24  | 2.59    | 1.83    | 1.8     | 1.85    | 1.87    | 1.52    | 1.58    |
| 25  | 1.87    | 1.25    | 1.32    | 1.36    | 1.37    | 0.988   | 1.08    |
| 26  | 1.88    | 1.29    | 1.32    | 1.28    | 1.32    | 1.02    | 1.13    |
| 27  | 1.69    | 1.18    | 1.36    | 1.27    | 1.23    | 0.972   | 1.03    |
| 28  | 1.76    | 1.14    | 1.17    | 1.12    | 1.14    | 0.874   | 0.934   |
| 29  | 1.71    | 1.1     | 1.19    | 1.15    | 1.14    | 0.915   | 0.937   |
| 30  | 1.76    | 1.14    | 1.23    | 1.14    | 1.23    | 0.9     | 0.981   |
| 31  | 1.67    | 1.17    | 1.27    | 1.23    | 1.17    | 0.901   | 0.959   |
| 32  | 3.07    | 2.01    | 2.08    | 2.05    | 2.06    | 1.59    | 1.68    |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 3.14    | 3.1     | 2.75    | 3.09    | 3.03    | 2.95    | 2.68    |
| 4   | 2.23    | 2.33    | 1.77    | 1.88    | 1.87    | 2.13    | 1.84    |
| 5   | 1.1     | 1.14    | 0.843   | 1.0     | 0.986   | 1.04    | 0.798   |
| 6   | 1.18    | 1.23    | 0.966   | 1.08    | 1.05    | 1.11    | 0.869   |
| 7   | 1.25    | 1.23    | 0.997   | 1.13    | 1.14    | 1.15    | 0.917   |
| 8   | 1.43    | 1.5     | 1.24    | 1.3     | 1.32    | 1.39    | 1.18    |
| 9   | 0.663   | 0.685   | 0.503   | 0.588   | 0.591   | 0.6     | 0.459   |
| 10  | 0.705   | 0.725   | 0.568   | 0.646   | 0.612   | 0.654   | 0.516   |
| 11  | 0.771   | 0.764   | 0.585   | 0.704   | 0.699   | 0.697   | 0.537   |
| 12  | 0.529   | 0.527   | 0.468   | 0.48    | 0.485   | 0.487   | 0.442   |
| 13  | 0.669   | 0.633   | 0.529   | 0.575   | 0.597   | 0.571   | 0.49    |
| 14  | 0.638   | 0.644   | 0.55    | 0.602   | 0.563   | 0.608   | 0.542   |
| 15  | 0.686   | 0.688   | 0.58    | 0.639   | 0.607   | 0.616   | 0.562   |
| 16  | 0.529   | 0.543   | 0.445   | 0.474   | 0.456   | 0.495   | 0.419   |
| 17  | 1.16    | 1.14    | 0.907   | 0.983   | 0.988   | 1.04    | 0.836   |
| 18  | 1.05    | 1.11    | 0.895   | 1.0     | 1.02    | 1.03    | 0.86    |
| 19  | 1.18    | 1.13    | 0.936   | 1.05    | 1.01    | 1.04    | 0.919   |
| 20  | 0.864   | 0.876   | 0.722   | 0.76    | 0.782   | 0.803   | 0.685   |
| 21  | 0.868   | 0.903   | 0.719   | 0.769   | 0.786   | 0.802   | 0.7     |
| 22  | 0.902   | 0.925   | 0.748   | 0.822   | 0.822   | 0.862   | 0.687   |
| 23  | 0.887   | 0.899   | 0.77    | 0.789   | 0.798   | 0.83    | 0.738   |
| 24  | 1.61    | 1.61    | 1.41    | 1.46    | 1.49    | 1.53    | 1.4     |
| 25  | 1.1     | 1.1     | 0.956   | 1.04    | 1.01    | 1.03    | 0.919   |
| 26  | 1.08    | 1.15    | 0.987   | 1.04    | 1.04    | 1.11    | 0.921   |
| 27  | 1.04    | 1.05    | 0.935   | 0.95    | 0.96    | 1.01    | 0.881   |
| 28  | 0.922   | 0.971   | 0.821   | 0.876   | 0.883   | 0.966   | 0.807   |
| 29  | 0.983   | 0.976   | 0.832   | 0.869   | 0.904   | 0.919   | 0.798   |
| 30  | 0.975   | 1.04    | 0.848   | 0.936   | 0.893   | 0.971   | 0.801   |
| 31  | 0.955   | 0.978   | 0.837   | 0.859   | 0.875   | 0.91    | 0.713   |
| 32  | 1.7     | 1.77    | 1.45    | 1.55    | 1.57    | 1.64    | 1.41    |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 2.89    | 2.67    | 2.81    | 2.63    | 2.8     | 3.08    | 3.01    |
| 4   | 2.11    | 2.11    | 2.16    | 1.96    | 2.0     | 2.07    | 2.11    |
| 5   | 0.925   | 0.916   | 0.94    | 0.759   | 0.9     | 0.882   | 0.94    |
| 6   | 0.98    | 0.972   | 1.03    | 0.817   | 0.948   | 0.925   | 0.983   |
| 7   | 1.04    | 1.03    | 1.05    | 0.886   | 1.04    | 0.994   | 1.05    |
| 8   | 1.24    | 1.26    | 1.3     | 1.19    | 1.18    | 1.23    | 1.25    |
| 9   | 0.568   | 0.541   | 0.548   | 0.433   | 0.532   | 0.524   | 0.516   |
| 10  | 0.6     | 0.572   | 0.618   | 0.492   | 0.566   | 0.561   | 0.583   |
| 11  | 0.647   | 0.632   | 0.628   | 0.519   | 0.599   | 0.57    | 0.585   |
| 12  | 0.445   | 0.452   | 0.454   | 0.421   | 0.426   | 0.434   | 0.441   |
| 13  | 0.521   | 0.524   | 0.535   | 0.455   | 0.481   | 0.486   | 0.494   |
| 14  | 0.563   | 0.552   | 0.56    | 0.493   | 0.534   | 0.528   | 0.544   |
| 15  | 0.588   | 0.562   | 0.568   | 0.519   | 0.535   | 0.54    | 0.567   |
| 16  | 0.438   | 0.448   | 0.456   | 0.408   | 0.42    | 0.427   | 0.438   |
| 17  | 0.911   | 0.919   | 0.977   | 0.862   | 0.887   | 0.898   | 0.934   |
| 18  | 0.944   | 1.0     | 1.06    | 0.827   | 0.969   | 0.969   | 1.02    |
| 19  | 0.944   | 0.971   | 0.982   | 0.878   | 0.919   | 0.912   | 0.989   |
| 20  | 0.721   | 0.737   | 0.777   | 0.665   | 0.703   | 0.722   | 0.761   |
| 21  | 0.735   | 0.739   | 0.763   | 0.66    | 0.714   | 0.732   | 0.732   |
| 22  | 0.775   | 0.79    | 0.822   | 0.7     | 0.761   | 0.725   | 0.759   |
| 23  | 0.751   | 0.769   | 0.776   | 0.7     | 0.725   | 0.712   | 0.73    |
| 24  | 1.45    | 1.5     | 1.55    | 1.4     | 1.45    | 1.48    | 1.52    |
| 25  | 0.961   | 0.987   | 1.01    | 0.856   | 0.883   | 0.923   | 0.93    |
| 26  | 1.05    | 0.995   | 0.982   | 0.859   | 0.952   | 0.921   | 0.979   |
| 27  | 0.93    | 0.935   | 0.909   | 0.832   | 0.905   | 0.891   | 0.942   |
| 28  | 0.867   | 0.881   | 0.893   | 0.76    | 0.832   | 0.872   | 0.913   |
| 29  | 0.84    | 0.853   | 0.901   | 0.79    | 0.847   | 0.868   | 0.919   |
| 30  | 0.915   | 0.866   | 0.948   | 0.799   | 0.923   | 0.887   | 0.944   |
| 31  | 0.802   | 0.798   | 0.828   | 0.764   | 0.796   | 0.813   | 0.84    |
| 32  | 1.52    | 1.57    | 1.67    | 1.37    | 1.51    | 1.57    | 1.59    |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 2.74    | 3.0     | 2.89    | 2.97    | 2.74    | 2.98    | 2.94    |
| 4   | 1.87    | 1.96    | 1.98    | 2.11    | 1.87    | 1.9     | 2.01    |
| 5   | 0.732   | 0.86    | 0.822   | 0.866   | 0.709   | 0.802   | 0.841   |
| 6   | 0.779   | 0.924   | 0.889   | 0.94    | 0.779   | 0.879   | 0.883   |
| 7   | 0.862   | 0.988   | 0.979   | 0.986   | 0.856   | 0.914   | 0.921   |
| 8   | 1.12    | 1.17    | 1.18    | 1.21    | 1.11    | 1.14    | 1.17    |
| 9   | 0.424   | 0.474   | 0.475   | 0.485   | 0.415   | 0.455   | 0.461   |
| 10  | 0.472   | 0.545   | 0.522   | 0.548   | 0.466   | 0.535   | 0.526   |
| 11  | 0.499   | 0.55    | 0.551   | 0.561   | 0.524   | 0.536   | 0.54    |
| 12  | 0.41    | 0.413   | 0.415   | 0.422   | 0.402   | 0.405   | 0.415   |
| 13  | 0.456   | 0.478   | 0.465   | 0.476   | 0.46    | 0.456   | 0.463   |
| 14  | 0.495   | 0.521   | 0.513   | 0.543   | 0.496   | 0.526   | 0.5     |
| 15  | 0.505   | 0.521   | 0.528   | 0.535   | 0.507   | 0.518   | 0.534   |
| 16  | 0.397   | 0.421   | 0.426   | 0.444   | 0.399   | 0.421   | 0.432   |
| 17  | 0.844   | 0.88    | 0.903   | 0.942   | 0.84    | 0.806   | 0.828   |
| 18  | 0.832   | 0.959   | 0.865   | 0.983   | 0.812   | 0.889   | 0.865   |
| 19  | 0.874   | 0.899   | 0.852   | 0.882   | 0.808   | 0.857   | 0.858   |
| 20  | 0.67    | 0.697   | 0.719   | 0.738   | 0.676   | 0.717   | 0.714   |
| 21  | 0.632   | 0.655   | 0.696   | 0.699   | 0.644   | 0.67    | 0.686   |
| 22  | 0.65    | 0.722   | 0.702   | 0.757   | 0.656   | 0.721   | 0.706   |
| 23  | 0.666   | 0.709   | 0.695   | 0.724   | 0.67    | 0.717   | 0.713   |
| 24  | 1.41    | 1.45    | 1.46    | 1.48    | 1.39    | 1.44    | 1.49    |
| 25  | 0.859   | 0.91    | 0.94    | 0.946   | 0.903   | 0.93    | 0.958   |
| 26  | 0.87    | 0.958   | 0.919   | 0.984   | 0.901   | 0.987   | 0.974   |
| 27  | 0.857   | 0.885   | 0.923   | 0.92    | 0.875   | 0.937   | 0.919   |
| 28  | 0.792   | 0.831   | 0.839   | 0.9     | 0.788   | 0.867   | 0.842   |
| 29  | 0.804   | 0.853   | 0.876   | 0.903   | 0.831   | 0.865   | 0.533   |
| 30  | 0.816   | 0.898   | 0.885   | 0.941   | 0.863   | 0.622   | 0.606   |
| 31  | 0.773   | 0.832   | 0.843   | 0.852   | 0.589   | 0.605   | 0.61    |
| 32  | 1.45    | 1.48    | 1.51    | 0.539   | 0.504   | 0.521   | 0.534   |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 2.91    | 2.61    |
| 4   | 2.01    | 1.8     |
| 5   | 0.834   | 0.697   |
| 6   | 0.938   | 0.761   |
| 7   | 0.938   | 0.815   |
| 8   | 1.2     | 1.11    |
| 9   | 0.476   | 0.419   |
| 10  | 0.553   | 0.475   |
| 11  | 0.549   | 0.485   |
| 12  | 0.421   | 0.397   |
| 13  | 0.466   | 0.436   |
| 14  | 0.533   | 0.484   |
| 15  | 0.526   | 0.495   |
| 16  | 0.429   | 0.379   |
| 17  | 0.865   | 0.791   |
| 18  | 0.89    | 0.823   |
| 19  | 0.879   | 0.826   |
| 20  | 0.736   | 0.649   |
| 21  | 0.711   | 0.654   |
| 22  | 0.76    | 0.669   |
| 23  | 0.733   | 0.677   |
| 24  | 1.5     | 1.39    |
| 25  | 0.967   | 0.915   |
| 26  | 1.03    | 0.94    |
| 27  | 0.929   | 0.551   |
| 28  | 0.568   | 0.538   |
| 29  | 0.584   | 0.578   |
| 30  | 0.617   | 0.575   |
| 31  | 0.614   | 0.59    |
| 32  | 0.552   | 0.507   |

gfortran did better for small matrices, and Eigen for large matrices. If you’re dealing with larger matrices in Fortran, you’d probably just link a BLAS library. The compiler will then turn matmul calls into the appropriate calls from the linked library.
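
For a sense of what such a library call looks like, here is a minimal sketch from the Julia side using `LinearAlgebra.BLAS.gemm!`, which wraps the BLAS `gemm` routine (`dgemm` for `Float64`). The sizes and variable names are only for illustration; this is not how any of the benchmarks above were run.

```julia
using LinearAlgebra

M, K, N = 16, 32, 16   # illustrative sizes, not tied to any table entry
A = rand(M, K)
B = rand(K, N)
C = zeros(M, N)

# Computes C = 1.0 * A * B + 0.0 * C in place; 'N' means "not transposed".
BLAS.gemm!('N', 'N', 1.0, A, B, 0.0, C)

# Sanity check against Julia's built-in matrix product.
println(C ≈ A * B)
```

A general-purpose BLAS call like this carries dispatch overhead that matters at these sizes, which is part of why the compile-time-sized kernels are worth comparing in the first place.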

Blaze also did worse than gfortran when $M$, the number of rows in matrices $\textbf{A}$ and $\textbf{C}$, was 4 or less. With larger $M$, Blaze performed much better, especially when $\textbf{A}$ and $\textbf{C}$ had 16 rows:

In [32]:
print_results("blaze", "gfortran tc", minimum_times, padded = false)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.65    | 1.41    | 1.67    | 1.57    | 1.75    | 1.79    | 1.85    |
| 4   | 1.81    | 1.73    | 1.95    | 1.86    | 1.98    | 2.05    | 2.05    |
| 5   | 1.07    | 0.855   | 0.993   | 0.867   | 0.967   | 0.873   | 0.944   |
| 6   | 1.21    | 0.938   | 1.11    | 0.981   | 1.18    | 0.979   | 1.06    |
| 7   | 1.27    | 0.998   | 1.18    | 1.02    | 1.13    | 1.15    | 1.1     |
| 8   | 0.278   | 0.302   | 0.369   | 0.352   | 0.325   | 0.336   | 0.432   |
| 9   | 0.291   | 0.252   | 0.304   | 0.268   | 0.258   | 0.25    | 0.271   |
| 10  | 0.387   | 0.333   | 0.382   | 0.338   | 0.353   | 0.327   | 0.353   |
| 11  | 0.465   | 0.393   | 0.45    | 0.402   | 0.419   | 0.383   | 0.423   |
| 12  | 0.534   | 0.451   | 0.503   | 0.452   | 0.479   | 0.439   | 0.468   |
| 13  | 0.598   | 0.493   | 0.58    | 0.506   | 0.544   | 0.492   | 0.534   |
| 14  | 0.659   | 0.527   | 0.621   | 0.539   | 0.581   | 0.528   | 0.572   |
| 15  | 0.709   | 0.57    | 0.663   | 0.582   | 0.623   | 0.563   | 0.615   |
| 16  | 0.157   | 0.128   | 0.153   | 0.128   | 0.123   | 0.119   | 0.124   |
| 17  | 0.351   | 0.325   | 0.387   | 0.342   | 0.339   | 0.307   | 0.338   |
| 18  | 0.448   | 0.364   | 0.424   | 0.38    | 0.406   | 0.362   | 0.399   |
| 19  | 0.515   | 0.44    | 0.498   | 0.443   | 0.471   | 0.429   | 0.485   |
| 20  | 0.561   | 0.476   | 0.55    | 0.537   | 0.533   | 0.481   | 0.525   |
| 21  | 0.572   | 0.488   | 0.571   | 0.507   | 0.543   | 0.495   | 0.543   |
| 22  | 0.638   | 0.527   | 0.618   | 0.552   | 0.593   | 0.535   | 0.585   |
| 23  | 0.665   | 0.604   | 0.645   | 0.571   | 0.615   | 0.559   | 0.609   |
| 24  | 0.578   | 0.507   | 0.547   | 0.5     | 0.541   | 0.509   | 0.536   |
| 25  | 0.492   | 0.386   | 0.455   | 0.386   | 0.442   | 0.398   | 0.435   |
| 26  | 0.548   | 0.433   | 0.494   | 0.437   | 0.492   | 0.447   | 0.487   |
| 27  | 0.552   | 0.437   | 0.515   | 0.452   | 0.513   | 0.464   | 0.509   |
| 28  | 0.643   | 0.516   | 0.585   | 0.493   | 0.544   | 0.498   | 0.543   |
| 29  | 0.638   | 0.507   | 0.587   | 0.513   | 0.57    | 0.513   | 0.556   |
| 30  | 0.683   | 0.592   | 0.625   | 0.543   | 0.658   | 0.545   | 0.588   |
| 31  | 0.725   | 0.572   | 0.666   | 0.573   | 0.633   | 0.577   | 0.62    |
| 32  | 0.562   | 0.469   | 0.515   | 0.453   | 0.494   | 0.455   | 0.488   |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.79    | 1.9     | 1.9     | 1.93    | 1.9     | 2.0     | 1.99    |
| 4   | 2.08    | 2.37    | 2.0     | 2.06    | 1.98    | 2.25    | 2.21    |
| 5   | 0.87    | 0.931   | 0.87    | 0.94    | 0.878   | 0.928   | 0.882   |
| 6   | 0.976   | 1.06    | 0.968   | 1.03    | 0.998   | 1.05    | 0.967   |
| 7   | 1.04    | 1.08    | 1.04    | 1.23    | 1.04    | 1.1     | 1.04    |
| 8   | 0.389   | 0.379   | 0.384   | 0.407   | 0.397   | 0.391   | 0.394   |
| 9   | 0.261   | 0.251   | 0.239   | 0.256   | 0.255   | 0.244   | 0.233   |
| 10  | 0.334   | 0.34    | 0.32    | 0.339   | 0.324   | 0.329   | 0.315   |
| 11  | 0.389   | 0.432   | 0.409   | 0.406   | 0.382   | 0.39    | 0.373   |
| 12  | 0.443   | 0.46    | 0.431   | 0.454   | 0.434   | 0.447   | 0.427   |
| 13  | 0.5     | 0.566   | 0.488   | 0.515   | 0.489   | 0.503   | 0.479   |
| 14  | 0.531   | 0.55    | 0.523   | 0.552   | 0.527   | 0.545   | 0.52    |
| 15  | 0.568   | 0.593   | 0.559   | 0.592   | 0.56    | 0.58    | 0.555   |
| 16  | 0.122   | 0.118   | 0.115   | 0.117   | 0.115   | 0.116   | 0.114   |
| 17  | 0.316   | 0.321   | 0.307   | 0.334   | 0.318   | 0.319   | 0.312   |
| 18  | 0.372   | 0.38    | 0.358   | 0.391   | 0.372   | 0.377   | 0.365   |
| 19  | 0.441   | 0.459   | 0.433   | 0.466   | 0.444   | 0.454   | 0.439   |
| 20  | 0.495   | 0.519   | 0.493   | 0.521   | 0.502   | 0.516   | 0.499   |
| 21  | 0.507   | 0.529   | 0.498   | 0.531   | 0.507   | 0.521   | 0.5     |
| 22  | 0.546   | 0.573   | 0.533   | 0.572   | 0.545   | 0.565   | 0.54    |
| 23  | 0.569   | 0.596   | 0.558   | 0.598   | 0.57    | 0.591   | 0.567   |
| 24  | 0.506   | 0.529   | 0.503   | 0.529   | 0.5     | 0.525   | 0.512   |
| 25  | 0.399   | 0.428   | 0.4     | 0.432   | 0.402   | 0.427   | 0.415   |
| 26  | 0.451   | 0.482   | 0.448   | 0.48    | 0.452   | 0.481   | 0.455   |
| 27  | 0.473   | 0.504   | 0.47    | 0.502   | 0.471   | 0.501   | 0.486   |
| 28  | 0.55    | 0.538   | 0.507   | 0.539   | 0.509   | 0.559   | 0.514   |
| 29  | 0.524   | 0.555   | 0.52    | 0.554   | 0.522   | 0.553   | 0.526   |
| 30  | 0.597   | 0.583   | 0.543   | 0.581   | 0.547   | 0.58    | 0.549   |
| 31  | 0.567   | 0.605   | 0.564   | 0.596   | 0.563   | 0.594   | 0.532   |
| 32  | 0.447   | 0.481   | 0.471   | 0.495   | 0.446   | 0.468   | 0.446   |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 2.04    | 1.85    | 2.04    | 2.11    | 2.0     | 2.21    | 2.24    |
| 4   | 2.5     | 2.42    | 2.53    | 2.45    | 2.51    | 2.44    | 2.52    |
| 5   | 0.922   | 0.884   | 0.921   | 0.883   | 0.916   | 0.883   | 0.911   |
| 6   | 1.01    | 1.14    | 1.04    | 0.963   | 0.999   | 0.995   | 1.03    |
| 7   | 1.09    | 1.04    | 1.09    | 1.04    | 1.08    | 1.04    | 1.08    |
| 8   | 0.408   | 0.392   | 0.386   | 0.387   | 0.405   | 0.405   | 0.394   |
| 9   | 0.245   | 0.236   | 0.238   | 0.231   | 0.24    | 0.234   | 0.235   |
| 10  | 0.333   | 0.322   | 0.326   | 0.316   | 0.327   | 0.317   | 0.321   |
| 11  | 0.392   | 0.378   | 0.387   | 0.373   | 0.388   | 0.374   | 0.38    |
| 12  | 0.444   | 0.428   | 0.439   | 0.425   | 0.439   | 0.427   | 0.434   |
| 13  | 0.5     | 0.479   | 0.493   | 0.47    | 0.49    | 0.473   | 0.484   |
| 14  | 0.544   | 0.577   | 0.533   | 0.517   | 0.535   | 0.517   | 0.53    |
| 15  | 0.582   | 0.553   | 0.569   | 0.611   | 0.574   | 0.553   | 0.567   |
| 16  | 0.118   | 0.115   | 0.116   | 0.114   | 0.117   | 0.116   | 0.121   |
| 17  | 0.329   | 0.317   | 0.321   | 0.317   | 0.33    | 0.32    | 0.353   |
| 18  | 0.389   | 0.432   | 0.402   | 0.371   | 0.403   | 0.395   | 0.409   |
| 19  | 0.465   | 0.45    | 0.459   | 0.447   | 0.462   | 0.446   | 0.463   |
| 20  | 0.52    | 0.502   | 0.517   | 0.498   | 0.519   | 0.503   | 0.515   |
| 21  | 0.529   | 0.513   | 0.524   | 0.504   | 0.524   | 0.511   | 0.521   |
| 22  | 0.568   | 0.55    | 0.564   | 0.606   | 0.563   | 0.55    | 0.543   |
| 23  | 0.594   | 0.574   | 0.592   | 0.573   | 0.594   | 0.557   | 0.57    |
| 24  | 0.555   | 0.518   | 0.531   | 0.514   | 0.542   | 0.524   | 0.532   |
| 25  | 0.44    | 0.408   | 0.43    | 0.386   | 0.395   | 0.385   | 0.399   |
| 26  | 0.479   | 0.456   | 0.447   | 0.425   | 0.503   | 0.429   | 0.448   |
| 27  | 0.504   | 0.479   | 0.471   | 0.453   | 0.473   | 0.458   | 0.536   |
| 28  | 0.538   | 0.513   | 0.533   | 0.492   | 0.512   | 0.514   | 0.537   |
| 29  | 0.56    | 0.528   | 0.565   | 0.532   | 0.556   | 0.534   | 0.557   |
| 30  | 0.581   | 0.553   | 0.579   | 0.618   | 0.582   | 0.56    | 0.582   |
| 31  | 0.559   | 0.537   | 0.565   | 0.543   | 0.566   | 0.547   | 0.569   |
| 32  | 0.462   | 0.445   | 0.462   | 0.444   | 0.459   | 0.444   | 0.456   |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 2.2     | 2.24    | 2.17    | 2.24    | 2.21    | 2.23    | 2.22    |
| 4   | 2.46    | 2.49    | 2.51    | 2.72    | 2.49    | 2.41    | 2.45    |
| 5   | 0.885   | 0.909   | 0.882   | 0.907   | 0.882   | 0.913   | 0.882   |
| 6   | 0.964   | 1.15    | 0.992   | 1.02    | 0.964   | 0.986   | 0.995   |
| 7   | 1.04    | 1.08    | 1.04    | 1.21    | 1.04    | 1.07    | 1.05    |
| 8   | 0.391   | 0.408   | 0.398   | 0.392   | 0.4     | 0.408   | 0.404   |
| 9   | 0.23    | 0.237   | 0.232   | 0.234   | 0.229   | 0.236   | 0.23    |
| 10  | 0.312   | 0.323   | 0.314   | 0.317   | 0.312   | 0.321   | 0.314   |
| 11  | 0.366   | 0.382   | 0.372   | 0.376   | 0.369   | 0.38    | 0.371   |
| 12  | 0.422   | 0.435   | 0.423   | 0.432   | 0.425   | 0.433   | 0.424   |
| 13  | 0.522   | 0.488   | 0.474   | 0.483   | 0.47    | 0.486   | 0.474   |
| 14  | 0.514   | 0.533   | 0.519   | 0.529   | 0.576   | 0.531   | 0.518   |
| 15  | 0.549   | 0.569   | 0.554   | 0.565   | 0.552   | 0.566   | 0.554   |
| 16  | 0.113   | 0.115   | 0.114   | 0.116   | 0.119   | 0.116   | 0.114   |
| 17  | 0.345   | 0.333   | 0.321   | 0.332   | 0.321   | 0.315   | 0.312   |
| 18  | 0.37    | 0.402   | 0.41    | 0.402   | 0.372   | 0.378   | 0.371   |
| 19  | 0.441   | 0.463   | 0.424   | 0.434   | 0.42    | 0.436   | 0.428   |
| 20  | 0.501   | 0.512   | 0.501   | 0.511   | 0.497   | 0.51    | 0.499   |
| 21  | 0.488   | 0.501   | 0.492   | 0.501   | 0.491   | 0.506   | 0.496   |
| 22  | 0.523   | 0.537   | 0.526   | 0.537   | 0.588   | 0.541   | 0.536   |
| 23  | 0.551   | 0.569   | 0.554   | 0.567   | 0.552   | 0.57    | 0.559   |
| 24  | 0.518   | 0.531   | 0.514   | 0.522   | 0.513   | 0.522   | 0.512   |
| 25  | 0.388   | 0.412   | 0.391   | 0.405   | 0.392   | 0.403   | 0.407   |
| 26  | 0.436   | 0.449   | 0.433   | 0.449   | 0.477   | 0.466   | 0.481   |
| 27  | 0.461   | 0.488   | 0.511   | 0.48    | 0.466   | 0.496   | 0.535   |
| 28  | 0.498   | 0.566   | 0.501   | 0.54    | 0.505   | 0.595   | 0.506   |
| 29  | 0.538   | 0.559   | 0.613   | 0.56    | 0.543   | 0.558   | 0.329   |
| 30  | 0.562   | 0.584   | 0.578   | 0.647   | 0.63    | 0.401   | 0.391   |
| 31  | 0.611   | 0.573   | 0.556   | 0.575   | 0.418   | 0.431   | 0.416   |
| 32  | 0.442   | 0.49    | 0.44    | 0.155   | 0.169   | 0.159   | 0.158   |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 2.32    | 2.18    |
| 4   | 2.49    | 2.42    |
| 5   | 0.906   | 0.881   |
| 6   | 1.02    | 0.963   |
| 7   | 1.2     | 1.05    |
| 8   | 0.4     | 0.399   |
| 9   | 0.231   | 0.227   |
| 10  | 0.317   | 0.31    |
| 11  | 0.377   | 0.367   |
| 12  | 0.431   | 0.424   |
| 13  | 0.483   | 0.472   |
| 14  | 0.528   | 0.516   |
| 15  | 0.564   | 0.555   |
| 16  | 0.115   | 0.111   |
| 17  | 0.309   | 0.301   |
| 18  | 0.371   | 0.371   |
| 19  | 0.433   | 0.424   |
| 20  | 0.508   | 0.497   |
| 21  | 0.504   | 0.554   |
| 22  | 0.541   | 0.527   |
| 23  | 0.569   | 0.556   |
| 24  | 0.519   | 0.51    |
| 25  | 0.407   | 0.392   |
| 26  | 0.452   | 0.441   |
| 27  | 0.483   | 0.299   |
| 28  | 0.343   | 0.342   |
| 29  | 0.364   | 0.379   |
| 30  | 0.388   | 0.395   |
| 31  | 0.463   | 0.425   |
| 32  | 0.161   | 0.162   |

We’ll note later that disabling tree-cunrolli improves gfortran’s code a lot for the case of 16 rows — but not by the nearly 9x advantage Blaze has here.
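
The “nearly 9x” figure comes from inverting the ratios in the $M = 16$ row of the Blaze vs gfortran-tc tables above. As a minimal sketch (the values are copied by hand from that row, for $N = 3,\ldots,32$, and the variable names are mine):

```julia
# Blaze / gfortran-tc runtime ratios for M = 16, copied from the tables above.
row16 = [0.157, 0.128, 0.153, 0.128, 0.123, 0.119, 0.124,
         0.122, 0.118, 0.115, 0.117, 0.115, 0.116, 0.114,
         0.118, 0.115, 0.116, 0.114, 0.117, 0.116, 0.121,
         0.113, 0.115, 0.114, 0.116, 0.119, 0.116, 0.114,
         0.115, 0.111]

# A ratio below 1 means Blaze was faster; invert it to get "times faster".
speedup = inv.(row16)

println("smallest and largest speedup: ", round.(extrema(speedup), digits = 2))
```

So across this row Blaze is somewhere between roughly 6.4x and 9x faster than gfortran-tc, with the largest gap at $N = 32$.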

Now, let’s take a look at Blaze. As its much better showing relative to gfortran already suggested, it also did better than Eigen relative to PaddedMatrices without padding. Despite padding being the namesake of PaddedMatrices, it is not dependent on padding to generate optimal code, while Blaze apparently is.

In [21]:
print_results("blaze", "native julia", minimum_times, padded = false)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.83    | 1.85    | 2.4     | 2.58    | 2.8     | 2.84    | 2.81    |
| 4   | 2.7     | 2.74    | 3.41    | 3.45    | 3.46    | 3.79    | 4.15    |
| 5   | 2.88    | 2.63    | 3.59    | 3.64    | 3.94    | 3.8     | 4.15    |
| 6   | 3.56    | 3.41    | 4.2     | 4.42    | 5.43    | 4.21    | 5.15    |
| 7   | 4.1     | 3.55    | 4.92    | 4.92    | 5.61    | 6.4     | 5.89    |
| 8   | 0.787   | 0.813   | 1.13    | 1.22    | 1.17    | 1.18    | 1.36    |
| 9   | 1.14    | 1.18    | 1.68    | 1.53    | 1.49    | 1.26    | 1.54    |
| 10  | 1.61    | 1.74    | 2.19    | 1.96    | 2.03    | 1.86    | 1.96    |
| 11  | 2.11    | 2.19    | 2.77    | 2.47    | 2.56    | 2.29    | 2.54    |
| 12  | 2.72    | 2.8     | 3.4     | 3.07    | 3.26    | 3.0     | 3.24    |
| 13  | 3.17    | 3.19    | 4.0     | 3.51    | 3.67    | 3.3     | 3.66    |
| 14  | 3.86    | 3.68    | 4.53    | 4.03    | 4.29    | 3.49    | 4.25    |
| 15  | 4.33    | 3.78    | 5.18    | 4.62    | 4.82    | 3.89    | 4.8     |
| 16  | 1.29    | 1.21    | 1.5     | 1.26    | 1.24    | 1.17    | 1.25    |
| 17  | 1.35    | 1.21    | 1.43    | 1.26    | 1.24    | 1.16    | 1.28    |
| 18  | 1.71    | 1.47    | 1.69    | 1.52    | 1.65    | 1.49    | 1.64    |
| 19  | 2.09    | 1.87    | 2.19    | 1.85    | 2.0     | 1.8     | 1.99    |
| 20  | 2.55    | 2.17    | 2.52    | 2.39    | 2.39    | 2.21    | 2.38    |
| 21  | 2.81    | 2.48    | 2.84    | 2.53    | 2.74    | 2.51    | 2.57    |
| 22  | 3.12    | 2.59    | 3.16    | 2.89    | 3.11    | 2.94    | 3.19    |
| 23  | 3.58    | 3.41    | 3.65    | 3.21    | 3.56    | 3.17    | 3.44    |
| 24  | 1.47    | 1.38    | 1.43    | 1.32    | 1.43    | 1.34    | 1.41    |
| 25  | 1.66    | 1.27    | 1.48    | 1.25    | 1.25    | 1.24    | 1.42    |
| 26  | 1.92    | 1.51    | 1.74    | 1.5     | 1.5     | 1.48    | 1.64    |
| 27  | 2.2     | 1.76    | 2.04    | 1.82    | 1.74    | 1.72    | 1.94    |
| 28  | 2.56    | 2.06    | 2.3     | 2.1     | 2.03    | 2.01    | 2.19    |
| 29  | 2.88    | 2.25    | 2.67    | 2.34    | 2.23    | 2.21    | 2.52    |
| 30  | 3.22    | 2.74    | 2.9     | 2.63    | 2.73    | 2.47    | 2.79    |
| 31  | 3.55    | 2.66    | 3.26    | 2.91    | 2.71    | 2.75    | 3.09    |
| 32  | 1.41    | 1.15    | 1.3     | 1.15    | 1.07    | 1.09    | 1.2     |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 2.42    | 2.87    | 2.86    | 2.88    | 2.85    | 2.88    | 2.84    |
| 4   | 4.0     | 4.52    | 4.08    | 4.17    | 4.11    | 4.04    | 4.11    |
| 5   | 4.21    | 4.31    | 4.15    | 4.41    | 4.28    | 4.44    | 4.32    |
| 6   | 4.53    | 5.3     | 5.12    | 5.43    | 5.2     | 5.31    | 5.19    |
| 7   | 5.83    | 6.07    | 5.92    | 6.8     | 5.91    | 6.12    | 6.04    |
| 8   | 1.26    | 1.23    | 1.23    | 1.3     | 1.28    | 1.22    | 1.26    |
| 9   | 1.48    | 1.44    | 1.37    | 1.47    | 1.45    | 1.2     | 1.22    |
| 10  | 1.94    | 1.98    | 1.84    | 1.95    | 1.87    | 1.67    | 1.69    |
| 11  | 2.4     | 2.69    | 2.53    | 2.57    | 2.33    | 2.16    | 2.18    |
| 12  | 3.01    | 3.03    | 2.95    | 3.1     | 3.0     | 2.67    | 2.7     |
| 13  | 3.5     | 3.77    | 3.35    | 3.61    | 3.38    | 3.13    | 3.16    |
| 14  | 4.03    | 4.19    | 3.86    | 4.18    | 3.91    | 3.63    | 3.67    |
| 15  | 4.54    | 4.71    | 4.3     | 4.66    | 4.49    | 4.1     | 4.15    |
| 16  | 1.23    | 1.18    | 1.16    | 1.15    | 1.14    | 1.02    | 1.07    |
| 17  | 1.04    | 1.13    | 1.15    | 1.25    | 1.17    | 1.16    | 1.15    |
| 18  | 1.32    | 1.47    | 1.45    | 1.58    | 1.49    | 1.5     | 1.45    |
| 19  | 1.62    | 1.82    | 1.8     | 1.95    | 1.84    | 1.87    | 1.83    |
| 20  | 1.98    | 2.23    | 2.2     | 2.36    | 2.24    | 2.25    | 2.22    |
| 21  | 2.22    | 2.52    | 2.5     | 2.64    | 2.55    | 2.58    | 2.53    |
| 22  | 2.58    | 2.93    | 2.84    | 3.06    | 2.92    | 3.0     | 2.88    |
| 23  | 2.82    | 3.22    | 3.18    | 3.38    | 3.23    | 3.41    | 3.26    |
| 24  | 1.2     | 1.32    | 1.27    | 1.32    | 1.27    | 1.32    | 1.27    |
| 25  | 1.31    | 1.39    | 1.23    | 1.24    | 1.14    | 1.29    | 1.24    |
| 26  | 1.57    | 1.64    | 1.45    | 1.45    | 1.41    | 1.54    | 1.43    |
| 27  | 1.82    | 1.92    | 1.72    | 1.71    | 1.58    | 1.79    | 1.75    |
| 28  | 2.27    | 2.18    | 1.96    | 1.95    | 1.92    | 2.07    | 1.98    |
| 29  | 2.36    | 2.49    | 2.23    | 2.17    | 2.15    | 2.33    | 2.22    |
| 30  | 2.87    | 2.78    | 2.46    | 2.46    | 2.4     | 2.58    | 2.47    |
| 31  | 2.87    | 3.05    | 2.72    | 2.68    | 2.65    | 2.85    | 2.67    |
| 32  | 1.12    | 1.21    | 1.14    | 1.12    | 1.06    | 1.12    | 1.08    |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 2.85    | 2.83    | 2.81    | 2.93    | 2.85    | 2.84    | 2.85    |
| 4   | 4.26    | 4.14    | 4.28    | 4.14    | 4.28    | 4.15    | 4.03    |
| 5   | 4.49    | 4.16    | 4.35    | 4.28    | 4.39    | 4.31    | 4.44    |
| 6   | 5.29    | 5.86    | 5.32    | 5.13    | 5.18    | 5.08    | 5.23    |
| 7   | 6.11    | 5.9     | 6.11    | 5.95    | 6.11    | 5.92    | 6.05    |
| 8   | 1.27    | 1.24    | 1.15    | 1.19    | 1.26    | 1.26    | 1.24    |
| 9   | 1.35    | 1.34    | 1.35    | 1.33    | 1.37    | 1.36    | 1.36    |
| 10  | 1.87    | 1.84    | 1.83    | 1.85    | 1.92    | 1.86    | 1.89    |
| 11  | 2.41    | 2.37    | 2.45    | 2.35    | 2.49    | 2.2     | 2.43    |
| 12  | 2.93    | 2.91    | 2.95    | 2.83    | 3.01    | 2.96    | 3.04    |
| 13  | 3.48    | 3.39    | 3.55    | 3.39    | 3.52    | 3.47    | 3.53    |
| 14  | 4.03    | 4.33    | 4.13    | 3.96    | 4.08    | 4.0     | 3.8     |
| 15  | 4.55    | 4.45    | 4.6     | 4.98    | 4.74    | 4.52    | 4.67    |
| 16  | 1.1     | 1.09    | 1.11    | 1.1     | 1.15    | 1.14    | 1.18    |
| 17  | 1.22    | 1.1     | 1.04    | 1.06    | 1.13    | 1.1     | 1.22    |
| 18  | 1.58    | 1.55    | 1.35    | 1.35    | 1.44    | 1.42    | 1.47    |
| 19  | 1.94    | 1.77    | 1.67    | 1.69    | 1.79    | 1.76    | 1.79    |
| 20  | 2.32    | 2.1     | 2.03    | 2.03    | 2.15    | 2.11    | 2.16    |
| 21  | 2.68    | 2.45    | 2.32    | 2.31    | 2.47    | 2.42    | 2.47    |
| 22  | 3.1     | 2.8     | 2.68    | 2.99    | 2.85    | 2.8     | 2.81    |
| 23  | 3.41    | 3.09    | 2.98    | 3.0     | 3.18    | 3.06    | 3.15    |
| 24  | 1.38    | 1.22    | 1.2     | 1.21    | 1.28    | 1.25    | 1.27    |
| 25  | 1.32    | 1.25    | 1.27    | 1.25    | 1.27    | 1.25    | 1.31    |
| 26  | 1.53    | 1.49    | 1.48    | 1.45    | 1.74    | 1.49    | 1.51    |
| 27  | 1.72    | 1.75    | 1.72    | 1.65    | 1.75    | 1.71    | 2.01    |
| 28  | 2.06    | 2.01    | 2.0     | 1.96    | 2.07    | 1.97    | 2.05    |
| 29  | 2.33    | 2.23    | 2.26    | 2.2     | 2.35    | 2.25    | 2.34    |
| 30  | 2.61    | 2.55    | 2.52    | 2.7     | 2.5     | 2.51    | 2.6     |
| 31  | 2.83    | 2.77    | 2.78    | 2.73    | 2.9     | 2.8     | 2.9     |
| 32  | 1.1     | 1.07    | 1.09    | 1.07    | 1.11    | 1.1     | 1.12    |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 2.82    | 2.85    | 2.79    | 2.74    | 2.82    | 2.84    | 2.74    |
| 4   | 4.11    | 4.15    | 4.17    | 4.39    | 4.11    | 4.08    | 4.1     |
| 5   | 4.21    | 4.33    | 4.24    | 4.34    | 4.28    | 4.25    | 4.25    |
| 6   | 5.15    | 5.99    | 5.09    | 5.27    | 5.02    | 5.32    | 5.17    |
| 7   | 5.95    | 6.08    | 5.06    | 5.86    | 5.84    | 6.04    | 6.04    |
| 8   | 1.21    | 1.23    | 1.22    | 1.21    | 1.24    | 1.28    | 1.25    |
| 9   | 1.33    | 1.38    | 1.33    | 1.35    | 1.24    | 1.19    | 1.18    |
| 10  | 1.8     | 1.9     | 1.85    | 1.88    | 1.72    | 1.56    | 1.64    |
| 11  | 2.3     | 2.44    | 2.41    | 2.42    | 2.21    | 2.08    | 2.06    |
| 12  | 2.95    | 3.05    | 2.97    | 3.02    | 2.76    | 2.64    | 2.62    |
| 13  | 3.73    | 3.5     | 3.47    | 3.53    | 3.16    | 3.02    | 2.98    |
| 14  | 3.91    | 4.1     | 4.03    | 4.1     | 4.15    | 3.53    | 3.49    |
| 15  | 4.51    | 4.64    | 4.55    | 4.65    | 4.19    | 3.97    | 3.89    |
| 16  | 1.1     | 1.15    | 1.14    | 1.16    | 1.16    | 1.04    | 1.04    |
| 17  | 1.19    | 1.14    | 1.1     | 1.17    | 1.07    | 1.15    | 1.15    |
| 18  | 1.39    | 1.44    | 1.54    | 1.45    | 1.36    | 1.41    | 1.39    |
| 19  | 1.72    | 1.72    | 1.73    | 1.8     | 1.66    | 1.78    | 1.77    |
| 20  | 2.1     | 2.17    | 2.11    | 2.21    | 2.04    | 2.15    | 2.1     |
| 21  | 2.39    | 2.45    | 2.41    | 2.53    | 2.35    | 2.51    | 2.48    |
| 22  | 2.72    | 2.81    | 2.65    | 2.91    | 3.03    | 2.87    | 2.82    |
| 23  | 3.02    | 3.14    | 3.07    | 3.26    | 2.94    | 3.16    | 3.12    |
| 24  | 1.25    | 1.28    | 1.25    | 1.29    | 1.22    | 1.27    | 1.25    |
| 25  | 1.27    | 1.28    | 1.24    | 1.3     | 1.27    | 1.29    | 1.3     |
| 26  | 1.5     | 1.47    | 1.42    | 1.54    | 1.64    | 1.6     | 1.63    |
| 27  | 1.72    | 1.73    | 1.82    | 1.7     | 1.73    | 1.84    | 1.98    |
| 28  | 1.99    | 2.18    | 1.97    | 2.06    | 1.99    | 2.23    | 1.99    |
| 29  | 2.24    | 2.23    | 2.5     | 2.34    | 2.28    | 2.35    | 2.25    |
| 30  | 2.44    | 2.49    | 2.49    | 2.84    | 2.79    | 2.6     | 2.53    |
| 31  | 3.06    | 2.77    | 2.76    | 2.88    | 2.69    | 2.86    | 2.8     |
| 32  | 1.1     | 1.16    | 1.08    | 1.12    | 1.19    | 1.11    | 1.08    |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 2.61    | 2.55    |
| 4   | 3.68    | 3.71    |
| 5   | 4.05    | 4.05    |
| 6   | 4.83    | 4.86    |
| 7   | 6.36    | 5.72    |
| 8   | 1.09    | 1.11    |
| 9   | 1.22    | 1.23    |
| 10  | 1.7     | 1.69    |
| 11  | 2.04    | 2.13    |
| 12  | 2.74    | 2.74    |
| 13  | 3.1     | 3.13    |
| 14  | 3.67    | 3.66    |
| 15  | 4.06    | 4.11    |
| 16  | 1.08    | 1.06    |
| 17  | 1.12    | 1.1     |
| 18  | 1.46    | 1.41    |
| 19  | 1.77    | 1.75    |
| 20  | 2.15    | 2.13    |
| 21  | 2.54    | 2.78    |
| 22  | 2.88    | 2.84    |
| 23  | 3.24    | 3.17    |
| 24  | 1.27    | 1.24    |
| 25  | 1.27    | 1.25    |
| 26  | 1.43    | 1.48    |
| 27  | 1.73    | 1.7     |
| 28  | 2.0     | 1.98    |
| 29  | 2.18    | 2.24    |
| 30  | 2.52    | 2.5     |
| 31  | 3.07    | 2.75    |
| 32  | 1.07    | 1.07    |

Blaze was faster only when $M=8$ and $N=3$ or $N=4$; it was slower for every other size.
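As a rough illustration of how to read these ratio tables, the sketch below scans a small excerpt of the Blaze table above (rows $M = 7, 8, 9$, columns $N = 3, 4, 5$) for entries below 1, i.e. sizes where Blaze took less time than PaddedMatrices. The helper `sizes_below_one` is hypothetical and is not part of the notebook's `print_results`. It also assumes each entry is the first library's time divided by the second's, which is what the remark above implies.

```julia
# Hypothetical helper (not from the notebook): collect the (M, N) sizes
# whose ratio is below 1, assuming ratio = time(first) / time(second).
function sizes_below_one(ratios::AbstractMatrix, Ms, Ns)
    hits = Tuple{Int,Int}[]
    # Outer loop over columns (N), inner loop over rows (M).
    for (j, n) in enumerate(Ns), (i, m) in enumerate(Ms)
        ratios[i, j] < 1 && push!(hits, (m, n))
    end
    return hits
end

# Excerpt copied from the Blaze table: rows M = 7:9, columns N = 3:5.
ratios = [4.1   3.55  4.92;
          0.787 0.813 1.13;
          1.14  1.18  1.68]

faster = sizes_below_one(ratios, 7:9, 3:5)
println(faster)
```

On this excerpt, only the two $M=8$ entries fall below 1, which matches the observation above.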

Both gfortran versions were comparatively slow, but at least they were fast to compile (and let you link alternative libraries).

In [22]:
print_results("gfortran", "native julia", minimum_times, padded = false)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 3.44    | 4.3     | 4.92    | 5.73    | 5.88    | 6.04    | 5.88    |
| 4   | 2.79    | 3.12    | 3.41    | 3.69    | 3.38    | 3.71    | 3.82    |
| 5   | 3.47    | 3.93    | 4.64    | 5.31    | 5.21    | 5.57    | 5.75    |
| 6   | 4.16    | 4.91    | 5.31    | 6.28    | 6.34    | 6.09    | 6.88    |
| 7   | 4.59    | 4.91    | 5.79    | 6.69    | 6.95    | 7.74    | 7.4     |
| 8   | 3.03    | 3.34    | 3.6     | 4.23    | 4.43    | 4.43    | 4.15    |
| 9   | 3.62    | 4.09    | 4.87    | 4.95    | 4.95    | 4.19    | 4.71    |
| 10  | 4.33    | 4.9     | 5.41    | 5.47    | 5.34    | 5.3     | 5.25    |
| 11  | 4.43    | 5.4     | 5.94    | 5.87    | 5.8     | 5.65    | 5.7     |
| 12  | 4.93    | 5.85    | 6.36    | 6.41    | 6.35    | 6.38    | 6.41    |
| 13  | 5.13    | 6.18    | 6.6     | 6.68    | 6.44    | 6.38    | 6.46    |
| 14  | 5.79    | 6.71    | 7.1     | 7.23    | 7.21    | 6.43    | 7.21    |
| 15  | 6.27    | 6.94    | 7.93    | 8.01    | 7.82    | 6.94    | 7.84    |
| 16  | 3.12    | 3.55    | 3.48    | 3.44    | 3.4     | 3.35    | 3.32    |
| 17  | 3.84    | 3.73    | 3.69    | 3.69    | 3.67    | 3.78    | 3.74    |
| 18  | 3.83    | 4.01    | 4.0     | 3.99    | 4.06    | 4.15    | 4.11    |
| 19  | 4.06    | 4.31    | 4.4     | 4.18    | 4.25    | 4.2     | 4.11    |
| 20  | 4.55    | 4.55    | 4.58    | 4.45    | 4.48    | 4.59    | 4.53    |
| 21  | 4.86    | 5.09    | 4.98    | 4.97    | 5.04    | 5.07    | 4.73    |
| 22  | 4.9     | 4.91    | 5.12    | 5.23    | 5.24    | 5.5     | 5.44    |
| 23  | 5.39    | 5.64    | 5.66    | 5.63    | 5.79    | 5.68    | 5.64    |
| 24  | 2.54    | 2.73    | 2.61    | 2.64    | 2.65    | 2.63    | 2.62    |
| 25  | 3.37    | 3.3     | 3.26    | 3.26    | 2.83    | 3.11    | 3.25    |
| 26  | 3.5     | 3.48    | 3.52    | 3.42    | 3.04    | 3.32    | 3.36    |
| 27  | 3.99    | 4.01    | 3.97    | 4.02    | 3.39    | 3.7     | 3.8     |
| 28  | 3.98    | 3.99    | 3.93    | 4.27    | 3.73    | 4.04    | 4.04    |
| 29  | 4.52    | 4.44    | 4.56    | 4.57    | 3.9     | 4.32    | 4.53    |
| 30  | 4.7     | 4.63    | 4.64    | 4.84    | 4.15    | 4.53    | 4.74    |
| 31  | 4.9     | 4.66    | 4.89    | 5.08    | 4.28    | 4.77    | 4.98    |
| 32  | 2.51    | 2.45    | 2.53    | 2.55    | 2.18    | 2.39    | 2.46    |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 5.38    | 6.11    | 6.24    | 6.25    | 6.29    | 6.28    | 6.26    |
| 4   | 3.81    | 3.86    | 3.86    | 3.82    | 3.89    | 3.69    | 3.88    |
| 5   | 6.32    | 6.07    | 6.25    | 6.05    | 6.24    | 6.14    | 6.24    |
| 6   | 6.56    | 7.03    | 7.23    | 7.18    | 7.23    | 7.02    | 7.16    |
| 7   | 7.77    | 7.55    | 7.85    | 7.57    | 7.78    | 7.61    | 7.88    |
| 8   | 4.26    | 4.3     | 4.24    | 4.3     | 4.26    | 4.24    | 4.24    |
| 9   | 4.69    | 4.74    | 4.75    | 4.77    | 4.77    | 4.08    | 4.28    |
| 10  | 5.34    | 5.44    | 5.32    | 5.27    | 5.23    | 4.56    | 4.82    |
| 11  | 5.77    | 5.83    | 5.73    | 5.86    | 5.59    | 5.04    | 5.31    |
| 12  | 6.29    | 6.04    | 6.24    | 6.2     | 6.24    | 5.39    | 5.7     |
| 13  | 6.66    | 6.31    | 6.47    | 6.56    | 6.46    | 5.8     | 6.14    |
| 14  | 7.37    | 7.27    | 7.15    | 7.32    | 7.21    | 6.46    | 6.83    |
| 15  | 8.01    | 7.99    | 7.71    | 7.85    | 8.05    | 7.05    | 7.45    |
| 16  | 3.32    | 3.3     | 3.3     | 3.2     | 3.24    | 2.87    | 3.02    |
| 17  | 3.26    | 3.56    | 3.72    | 3.75    | 3.69    | 3.62    | 3.68    |
| 18  | 3.54    | 3.85    | 4.06    | 4.05    | 3.98    | 3.97    | 3.98    |
| 19  | 3.68    | 3.97    | 4.16    | 4.18    | 4.14    | 4.11    | 4.16    |
| 20  | 4.01    | 4.29    | 4.45    | 4.52    | 4.48    | 4.36    | 4.44    |
| 21  | 4.39    | 4.76    | 5.01    | 4.98    | 5.03    | 4.96    | 5.06    |
| 22  | 4.73    | 5.11    | 5.33    | 5.35    | 5.36    | 5.3     | 5.33    |
| 23  | 4.97    | 5.4     | 5.69    | 5.65    | 5.67    | 5.77    | 5.74    |
| 24  | 2.42    | 2.49    | 2.51    | 2.5     | 2.51    | 2.48    | 2.48    |
| 25  | 3.28    | 3.23    | 3.07    | 2.87    | 2.84    | 3.01    | 2.99    |
| 26  | 3.48    | 3.41    | 3.24    | 3.02    | 3.13    | 3.19    | 3.15    |
| 27  | 3.85    | 3.81    | 3.66    | 3.4     | 3.35    | 3.57    | 3.59    |
| 28  | 4.13    | 4.05    | 3.86    | 3.63    | 3.61    | 3.67    | 3.66    |
| 29  | 4.5     | 4.49    | 4.28    | 3.93    | 4.13    | 4.2     | 4.21    |
| 30  | 4.81    | 4.78    | 4.53    | 4.24    | 4.39    | 4.45    | 4.5     |
| 31  | 5.07    | 5.03    | 4.84    | 4.5     | 4.71    | 4.8     | 5.02    |
| 32  | 2.5     | 2.52    | 2.42    | 2.27    | 2.37    | 2.4     | 2.42    |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 6.15    | 6.26    | 6.12    | 6.19    | 6.23    | 6.11    | 6.09    |
| 4   | 3.72    | 3.73    | 3.67    | 3.7     | 3.7     | 3.69    | 3.5     |
| 5   | 6.19    | 5.97    | 5.97    | 6.15    | 6.08    | 6.14    | 6.09    |
| 6   | 6.95    | 7.04    | 6.98    | 6.99    | 6.81    | 6.89    | 6.85    |
| 7   | 7.58    | 7.7     | 7.59    | 7.75    | 7.65    | 7.64    | 7.54    |
| 8   | 4.17    | 4.22    | 4.01    | 4.13    | 4.14    | 4.19    | 4.22    |
| 9   | 4.52    | 4.63    | 4.63    | 4.64    | 4.58    | 4.63    | 4.65    |
| 10  | 4.99    | 5.08    | 4.98    | 5.16    | 5.16    | 5.16    | 5.15    |
| 11  | 5.55    | 5.67    | 5.71    | 5.67    | 5.77    | 5.27    | 5.72    |
| 12  | 5.94    | 6.08    | 6.01    | 5.93    | 6.11    | 6.22    | 6.22    |
| 13  | 6.44    | 6.55    | 6.65    | 6.61    | 6.61    | 6.71    | 6.69    |
| 14  | 7.16    | 7.27    | 7.45    | 7.38    | 7.33    | 7.45    | 6.91    |
| 15  | 7.79    | 7.94    | 7.99    | 8.05    | 8.15    | 8.07    | 8.13    |
| 16  | 3.07    | 3.04    | 3.06    | 3.08    | 3.08    | 3.07    | 3.0     |
| 17  | 3.71    | 3.48    | 3.23    | 3.36    | 3.43    | 3.44    | 3.45    |
| 18  | 4.05    | 3.75    | 3.36    | 3.63    | 3.55    | 3.63    | 3.58    |
| 19  | 4.18    | 3.93    | 3.64    | 3.79    | 3.87    | 3.92    | 3.86    |
| 20  | 4.45    | 4.18    | 3.92    | 4.08    | 4.16    | 4.2     | 4.2     |
| 21  | 5.06    | 4.77    | 4.42    | 4.58    | 4.71    | 4.74    | 4.74    |
| 22  | 5.46    | 5.09    | 4.75    | 4.94    | 5.05    | 5.09    | 5.17    |
| 23  | 5.74    | 5.37    | 5.03    | 5.23    | 5.34    | 5.49    | 5.52    |
| 24  | 2.48    | 2.36    | 2.26    | 2.34    | 2.36    | 2.38    | 2.39    |
| 25  | 3.0     | 3.07    | 2.95    | 3.24    | 3.21    | 3.24    | 3.27    |
| 26  | 3.19    | 3.27    | 3.31    | 3.42    | 3.46    | 3.47    | 3.36    |
| 27  | 3.42    | 3.65    | 3.65    | 3.64    | 3.71    | 3.74    | 3.75    |
| 28  | 3.67    | 3.92    | 3.76    | 3.99    | 3.86    | 3.98    | 3.99    |
| 29  | 4.16    | 4.23    | 3.99    | 4.14    | 4.22    | 4.22    | 4.2     |
| 30  | 4.49    | 4.61    | 4.35    | 4.37    | 4.28    | 4.48    | 4.47    |
| 31  | 5.07    | 5.15    | 4.92    | 5.03    | 5.12    | 5.12    | 5.09    |
| 32  | 2.39    | 2.41    | 2.36    | 2.43    | 2.44    | 2.48    | 2.44    |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 6.12    | 6.12    | 6.08    | 5.89    | 6.18    | 6.16    | 6.01    |
| 4   | 3.63    | 3.58    | 3.65    | 3.51    | 3.61    | 3.61    | 3.61    |
| 5   | 5.95    | 5.99    | 5.98    | 5.94    | 6.02    | 5.75    | 5.94    |
| 6   | 6.97    | 6.97    | 6.88    | 6.87    | 6.72    | 6.94    | 6.93    |
| 7   | 7.66    | 7.58    | 6.53    | 6.5     | 7.55    | 7.53    | 7.73    |
| 8   | 4.17    | 4.05    | 4.1     | 4.14    | 4.14    | 4.15    | 4.13    |
| 9   | 4.63    | 4.63    | 4.56    | 4.6     | 4.28    | 3.98    | 4.03    |
| 10  | 5.03    | 5.14    | 5.15    | 5.15    | 4.8     | 4.21    | 4.51    |
| 11  | 5.61    | 5.71    | 5.77    | 5.74    | 5.33    | 4.87    | 4.94    |
| 12  | 6.23    | 6.22    | 6.21    | 6.2     | 5.78    | 5.4     | 5.47    |
| 13  | 6.56    | 6.55    | 6.71    | 6.7     | 6.17    | 5.69    | 5.78    |
| 14  | 7.31    | 7.4     | 7.48    | 7.45    | 6.94    | 6.4     | 6.51    |
| 15  | 8.11    | 8.05    | 8.12    | 8.16    | 7.54    | 6.95    | 6.98    |
| 16  | 3.05    | 3.07    | 3.08    | 3.08    | 2.95    | 2.73    | 2.73    |
| 17  | 3.44    | 3.44    | 3.44    | 3.52    | 3.33    | 3.64    | 3.69    |
| 18  | 3.58    | 3.6     | 3.58    | 3.6     | 3.82    | 3.71    | 3.92    |
| 19  | 3.89    | 3.7     | 4.08    | 4.16    | 3.95    | 4.08    | 4.14    |
| 20  | 4.2     | 4.23    | 4.21    | 4.32    | 4.1     | 4.22    | 4.21    |
| 21  | 4.89    | 4.89    | 4.9     | 5.05    | 4.8     | 4.96    | 4.99    |
| 22  | 5.19    | 5.23    | 5.04    | 5.41    | 5.15    | 5.31    | 5.27    |
| 23  | 5.49    | 5.52    | 5.55    | 5.76    | 5.33    | 5.54    | 5.58    |
| 24  | 2.41    | 2.41    | 2.43    | 2.43    | 2.34    | 2.43    | 2.42    |
| 25  | 3.27    | 3.09    | 3.17    | 3.21    | 3.24    | 3.21    | 3.2     |
| 26  | 3.44    | 3.27    | 3.27    | 3.43    | 3.43    | 3.43    | 3.39    |
| 27  | 3.74    | 3.54    | 3.57    | 3.55    | 3.71    | 3.72    | 3.71    |
| 28  | 3.84    | 3.66    | 3.94    | 3.98    | 3.94    | 3.91    | 3.77    |
| 29  | 4.17    | 4.0     | 4.08    | 4.18    | 4.2     | 4.21    | 6.84    |
| 30  | 4.35    | 4.26    | 4.31    | 4.4     | 4.43    | 6.53    | 6.45    |
| 31  | 5.0     | 4.85    | 4.96    | 5.02    | 6.4     | 7.29    | 6.87    |
| 32  | 2.48    | 2.36    | 2.44    | 7.46    | 7.06    | 7.15    | 6.82    |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 5.47    | 5.62    |
| 4   | 3.23    | 3.25    |
| 5   | 5.5     | 5.66    |
| 6   | 6.29    | 6.47    |
| 7   | 7.06    | 7.28    |
| 8   | 3.62    | 3.7     |
| 9   | 4.17    | 4.25    |
| 10  | 4.63    | 4.69    |
| 11  | 4.81    | 5.15    |
| 12  | 5.64    | 5.75    |
| 13  | 5.89    | 6.08    |
| 14  | 6.71    | 6.83    |
| 15  | 7.13    | 7.34    |
| 16  | 2.82    | 2.88    |
| 17  | 3.62    | 3.65    |
| 18  | 3.94    | 3.79    |
| 19  | 4.08    | 4.14    |
| 20  | 4.24    | 4.28    |
| 21  | 5.05    | 5.03    |
| 22  | 5.34    | 5.38    |
| 23  | 5.69    | 5.71    |
| 24  | 2.44    | 2.43    |
| 25  | 3.11    | 3.18    |
| 26  | 3.16    | 3.36    |
| 27  | 3.58    | 6.0     |
| 28  | 6.84    | 5.76    |
| 29  | 5.95    | 6.38    |
| 30  | 6.23    | 6.41    |
| 31  | 6.58    | 6.25    |
| 32  | 6.67    | 7.22    |

In [23]:
print_results("gfortran tc", "native julia", minimum_times, padded = false)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.11    | 1.32    | 1.44    | 1.64    | 1.6     | 1.58    | 1.51    |
| 4   | 1.49    | 1.58    | 1.75    | 1.86    | 1.74    | 1.85    | 2.02    |
| 5   | 2.68    | 3.07    | 3.62    | 4.2     | 4.07    | 4.35    | 4.4     |
| 6   | 2.93    | 3.63    | 3.78    | 4.5     | 4.6     | 4.3     | 4.86    |
| 7   | 3.24    | 3.56    | 4.18    | 4.83    | 4.98    | 5.56    | 5.34    |
| 8   | 2.83    | 2.69    | 3.05    | 3.46    | 3.6     | 3.5     | 3.16    |
| 9   | 3.91    | 4.69    | 5.54    | 5.71    | 5.8     | 5.05    | 5.69    |
| 10  | 4.16    | 5.23    | 5.73    | 5.81    | 5.75    | 5.7     | 5.56    |
| 11  | 4.53    | 5.57    | 6.16    | 6.13    | 6.11    | 5.98    | 6.02    |
| 12  | 5.1     | 6.2     | 6.76    | 6.78    | 6.79    | 6.82    | 6.93    |
| 13  | 5.29    | 6.46    | 6.89    | 6.95    | 6.74    | 6.71    | 6.85    |
| 14  | 5.85    | 6.99    | 7.3     | 7.48    | 7.37    | 6.61    | 7.43    |
| 15  | 6.12    | 6.62    | 7.81    | 7.93    | 7.74    | 6.9     | 7.81    |
| 16  | 8.21    | 9.44    | 9.81    | 9.92    | 10.0    | 9.87    | 10.1    |
| 17  | 3.83    | 3.73    | 3.69    | 3.68    | 3.67    | 3.78    | 3.77    |
| 18  | 3.83    | 4.04    | 4.0     | 3.99    | 4.06    | 4.11    | 4.11    |
| 19  | 4.07    | 4.24    | 4.4     | 4.18    | 4.25    | 4.2     | 4.11    |
| 20  | 4.54    | 4.55    | 4.58    | 4.45    | 4.49    | 4.59    | 4.53    |
| 21  | 4.91    | 5.09    | 4.98    | 4.99    | 5.05    | 5.07    | 4.73    |
| 22  | 4.9     | 4.91    | 5.12    | 5.23    | 5.24    | 5.5     | 5.46    |
| 23  | 5.39    | 5.64    | 5.66    | 5.62    | 5.79    | 5.68    | 5.65    |
| 24  | 2.54    | 2.72    | 2.61    | 2.64    | 2.65    | 2.63    | 2.62    |
| 25  | 3.37    | 3.3     | 3.26    | 3.25    | 2.83    | 3.11    | 3.25    |
| 26  | 3.5     | 3.5     | 3.52    | 3.42    | 3.04    | 3.32    | 3.36    |
| 27  | 3.99    | 4.01    | 3.97    | 4.02    | 3.39    | 3.71    | 3.8     |
| 28  | 3.98    | 3.99    | 3.93    | 4.26    | 3.73    | 4.04    | 4.04    |
| 29  | 4.51    | 4.44    | 4.56    | 4.57    | 3.91    | 4.31    | 4.53    |
| 30  | 4.72    | 4.62    | 4.64    | 4.84    | 4.15    | 4.53    | 4.74    |
| 31  | 4.9     | 4.65    | 4.89    | 5.08    | 4.28    | 4.77    | 4.98    |
| 32  | 2.51    | 2.45    | 2.53    | 2.53    | 2.18    | 2.39    | 2.46    |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.35    | 1.51    | 1.51    | 1.49    | 1.5     | 1.44    | 1.43    |
| 4   | 1.92    | 1.9     | 2.04    | 2.02    | 2.08    | 1.79    | 1.86    |
| 5   | 4.83    | 4.63    | 4.77    | 4.69    | 4.87    | 4.79    | 4.91    |
| 6   | 4.64    | 5.0     | 5.29    | 5.29    | 5.21    | 5.08    | 5.36    |
| 7   | 5.6     | 5.6     | 5.7     | 5.51    | 5.69    | 5.58    | 5.79    |
| 8   | 3.24    | 3.24    | 3.19    | 3.19    | 3.21    | 3.13    | 3.2     |
| 9   | 5.69    | 5.72    | 5.73    | 5.76    | 5.68    | 4.94    | 5.21    |
| 10  | 5.79    | 5.82    | 5.75    | 5.73    | 5.76    | 5.08    | 5.38    |
| 11  | 6.16    | 6.23    | 6.19    | 6.34    | 6.11    | 5.52    | 5.85    |
| 12  | 6.79    | 6.58    | 6.84    | 6.84    | 6.9     | 5.97    | 6.33    |
| 13  | 7.01    | 6.67    | 6.86    | 7.0     | 6.92    | 6.23    | 6.6     |
| 14  | 7.59    | 7.61    | 7.39    | 7.58    | 7.43    | 6.66    | 7.05    |
| 15  | 8.0     | 7.95    | 7.7     | 7.88    | 8.03    | 7.06    | 7.48    |
| 16  | 10.0    | 10.0    | 10.1    | 9.78    | 9.92    | 8.81    | 9.36    |
| 17  | 3.29    | 3.53    | 3.73    | 3.75    | 3.68    | 3.64    | 3.68    |
| 18  | 3.55    | 3.86    | 4.06    | 4.05    | 4.0     | 3.96    | 3.98    |
| 19  | 3.69    | 3.97    | 4.16    | 4.18    | 4.14    | 4.11    | 4.16    |
| 20  | 4.01    | 4.29    | 4.45    | 4.52    | 4.47    | 4.36    | 4.45    |
| 21  | 4.39    | 4.76    | 5.01    | 4.98    | 5.02    | 4.96    | 5.06    |
| 22  | 4.73    | 5.11    | 5.33    | 5.36    | 5.35    | 5.31    | 5.33    |
| 23  | 4.96    | 5.41    | 5.69    | 5.65    | 5.67    | 5.77    | 5.74    |
| 24  | 2.38    | 2.49    | 2.51    | 2.5     | 2.54    | 2.5     | 2.48    |
| 25  | 3.28    | 3.23    | 3.07    | 2.87    | 2.84    | 3.01    | 2.99    |
| 26  | 3.49    | 3.41    | 3.24    | 3.02    | 3.13    | 3.19    | 3.15    |
| 27  | 3.84    | 3.81    | 3.66    | 3.4     | 3.35    | 3.57    | 3.59    |
| 28  | 4.13    | 4.05    | 3.86    | 3.62    | 3.77    | 3.7     | 3.84    |
| 29  | 4.5     | 4.49    | 4.28    | 3.93    | 4.13    | 4.21    | 4.21    |
| 30  | 4.81    | 4.78    | 4.53    | 4.23    | 4.39    | 4.46    | 4.51    |
| 31  | 5.07    | 5.03    | 4.83    | 4.5     | 4.71    | 4.8     | 5.02    |
| 32  | 2.51    | 2.52    | 2.42    | 2.27    | 2.37    | 2.4     | 2.42    |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.4     | 1.53    | 1.38    | 1.39    | 1.42    | 1.28    | 1.27    |
| 4   | 1.7     | 1.71    | 1.69    | 1.69    | 1.7     | 1.7     | 1.6     |
| 5   | 4.87    | 4.71    | 4.73    | 4.85    | 4.79    | 4.89    | 4.87    |
| 6   | 5.22    | 5.16    | 5.14    | 5.33    | 5.19    | 5.1     | 5.09    |
| 7   | 5.58    | 5.65    | 5.61    | 5.71    | 5.65    | 5.67    | 5.6     |
| 8   | 3.12    | 3.17    | 2.98    | 3.08    | 3.12    | 3.12    | 3.14    |
| 9   | 5.52    | 5.68    | 5.7     | 5.77    | 5.69    | 5.79    | 5.8     |
| 10  | 5.61    | 5.71    | 5.63    | 5.85    | 5.86    | 5.87    | 5.89    |
| 11  | 6.13    | 6.28    | 6.33    | 6.29    | 6.41    | 5.88    | 6.38    |
| 12  | 6.61    | 6.8     | 6.72    | 6.65    | 6.86    | 6.94    | 7.0     |
| 13  | 6.96    | 7.08    | 7.2     | 7.21    | 7.2     | 7.33    | 7.3     |
| 14  | 7.41    | 7.51    | 7.75    | 7.67    | 7.62    | 7.73    | 7.18    |
| 15  | 7.82    | 8.04    | 8.09    | 8.16    | 8.25    | 8.17    | 8.23    |
| 16  | 9.37    | 9.46    | 9.57    | 9.63    | 9.75    | 9.78    | 9.71    |
| 17  | 3.7     | 3.48    | 3.23    | 3.36    | 3.43    | 3.44    | 3.46    |
| 18  | 4.05    | 3.59    | 3.35    | 3.63    | 3.58    | 3.59    | 3.59    |
| 19  | 4.18    | 3.93    | 3.64    | 3.79    | 3.87    | 3.95    | 3.86    |
| 20  | 4.46    | 4.19    | 3.92    | 4.08    | 4.15    | 4.19    | 4.2     |
| 21  | 5.06    | 4.77    | 4.42    | 4.58    | 4.72    | 4.75    | 4.74    |
| 22  | 5.46    | 5.09    | 4.75    | 4.93    | 5.05    | 5.09    | 5.17    |
| 23  | 5.73    | 5.38    | 5.03    | 5.23    | 5.34    | 5.49    | 5.53    |
| 24  | 2.48    | 2.36    | 2.26    | 2.35    | 2.35    | 2.38    | 2.39    |
| 25  | 2.99    | 3.06    | 2.95    | 3.24    | 3.21    | 3.24    | 3.27    |
| 26  | 3.19    | 3.27    | 3.31    | 3.42    | 3.46    | 3.47    | 3.36    |
| 27  | 3.42    | 3.65    | 3.65    | 3.64    | 3.71    | 3.74    | 3.75    |
| 28  | 3.82    | 3.92    | 3.75    | 3.99    | 4.05    | 3.83    | 3.82    |
| 29  | 4.16    | 4.22    | 4.0     | 4.14    | 4.22    | 4.22    | 4.21    |
| 30  | 4.49    | 4.62    | 4.35    | 4.38    | 4.29    | 4.49    | 4.47    |
| 31  | 5.07    | 5.15    | 4.92    | 5.02    | 5.12    | 5.11    | 5.09    |
| 32  | 2.39    | 2.41    | 2.36    | 2.42    | 2.42    | 2.48    | 2.46    |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.28    | 1.27    | 1.29    | 1.22    | 1.28    | 1.27    | 1.23    |
| 4   | 1.67    | 1.66    | 1.66    | 1.61    | 1.65    | 1.69    | 1.68    |
| 5   | 4.76    | 4.76    | 4.8     | 4.78    | 4.85    | 4.65    | 4.81    |
| 6   | 5.34    | 5.19    | 5.13    | 5.14    | 5.21    | 5.4     | 5.2     |
| 7   | 5.7     | 5.64    | 4.85    | 4.85    | 5.6     | 5.63    | 5.77    |
| 8   | 3.09    | 3.02    | 3.07    | 3.09    | 3.09    | 3.15    | 3.09    |
| 9   | 5.8     | 5.8     | 5.71    | 5.8     | 5.41    | 5.04    | 5.12    |
| 10  | 5.75    | 5.88    | 5.91    | 5.94    | 5.53    | 4.86    | 5.21    |
| 11  | 6.3     | 6.4     | 6.47    | 6.43    | 5.99    | 5.47    | 5.56    |
| 12  | 7.0     | 7.02    | 7.02    | 6.99    | 6.5     | 6.09    | 6.17    |
| 13  | 7.16    | 7.16    | 7.33    | 7.32    | 6.72    | 6.2     | 6.29    |
| 14  | 7.6     | 7.69    | 7.76    | 7.75    | 7.21    | 6.65    | 6.73    |
| 15  | 8.21    | 8.15    | 8.22    | 8.24    | 7.59    | 7.02    | 7.02    |
| 16  | 9.78    | 9.96    | 10.0    | 10.0    | 9.69    | 9.0     | 9.12    |
| 17  | 3.44    | 3.43    | 3.44    | 3.51    | 3.33    | 3.64    | 3.7     |
| 18  | 3.76    | 3.57    | 3.76    | 3.6     | 3.65    | 3.72    | 3.75    |
| 19  | 3.9     | 3.7     | 4.09    | 4.15    | 3.95    | 4.08    | 4.14    |
| 20  | 4.2     | 4.23    | 4.21    | 4.33    | 4.1     | 4.22    | 4.21    |
| 21  | 4.9     | 4.9     | 4.9     | 5.05    | 4.8     | 4.96    | 4.99    |
| 22  | 5.19    | 5.23    | 5.04    | 5.42    | 5.15    | 5.31    | 5.27    |
| 23  | 5.49    | 5.52    | 5.54    | 5.76    | 5.33    | 5.54    | 5.58    |
| 24  | 2.41    | 2.41    | 2.43    | 2.47    | 2.37    | 2.43    | 2.45    |
| 25  | 3.26    | 3.1     | 3.17    | 3.21    | 3.24    | 3.21    | 3.2     |
| 26  | 3.44    | 3.28    | 3.27    | 3.44    | 3.44    | 3.43    | 3.39    |
| 27  | 3.74    | 3.54    | 3.57    | 3.55    | 3.71    | 3.72    | 3.71    |
| 28  | 3.99    | 3.85    | 3.94    | 3.82    | 3.94    | 3.74    | 3.93    |
| 29  | 4.17    | 4.0     | 4.09    | 4.18    | 4.2     | 4.21    | 6.84    |
| 30  | 4.35    | 4.26    | 4.31    | 4.4     | 4.43    | 6.5     | 6.46    |
| 31  | 5.0     | 4.84    | 4.96    | 5.02    | 6.42    | 6.63    | 6.73    |
| 32  | 2.48    | 2.38    | 2.46    | 7.21    | 7.03    | 6.98    | 6.82    |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 1.12    | 1.17    |
| 4   | 1.48    | 1.53    |
| 5   | 4.46    | 4.6     |
| 6   | 4.74    | 5.04    |
| 7   | 5.28    | 5.45    |
| 8   | 2.72    | 2.79    |
| 9   | 5.31    | 5.42    |
| 10  | 5.36    | 5.44    |
| 11  | 5.41    | 5.8     |
| 12  | 6.36    | 6.45    |
| 13  | 6.41    | 6.63    |
| 14  | 6.96    | 7.1     |
| 15  | 7.2     | 7.4     |
| 16  | 9.36    | 9.58    |
| 17  | 3.63    | 3.65    |
| 18  | 3.94    | 3.81    |
| 19  | 4.08    | 4.13    |
| 20  | 4.24    | 4.28    |
| 21  | 5.05    | 5.03    |
| 22  | 5.33    | 5.38    |
| 23  | 5.69    | 5.71    |
| 24  | 2.44    | 2.43    |
| 25  | 3.11    | 3.18    |
| 26  | 3.16    | 3.36    |
| 27  | 3.58    | 5.67    |
| 28  | 5.83    | 5.77    |
| 29  | 5.98    | 5.91    |
| 30  | 6.48    | 6.32    |
| 31  | 6.64    | 6.47    |
| 32  | 6.68    | 6.63    |

More interesting than comparing them with Julia is looking at the impact of the tree-cunrolli compiler pass. Numbers greater than 1 mean the code was faster without the pass, and numbers smaller than 1 mean the pass produced better code:

In [34]:
print_results("gfortran tc", "gfortran", minimum_times, padded = false)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 0.322   | 0.306   | 0.293   | 0.286   | 0.272   | 0.262   | 0.257   |
| 4   | 0.536   | 0.508   | 0.512   | 0.504   | 0.515   | 0.499   | 0.53    |
| 5   | 0.773   | 0.782   | 0.78    | 0.792   | 0.783   | 0.781   | 0.765   |
| 6   | 0.705   | 0.74    | 0.712   | 0.717   | 0.725   | 0.707   | 0.707   |
| 7   | 0.705   | 0.725   | 0.721   | 0.722   | 0.717   | 0.718   | 0.722   |
| 8   | 0.933   | 0.807   | 0.849   | 0.819   | 0.814   | 0.79    | 0.761   |
| 9   | 1.08    | 1.15    | 1.14    | 1.15    | 1.17    | 1.21    | 1.21    |
| 10  | 0.961   | 1.07    | 1.06    | 1.06    | 1.08    | 1.07    | 1.06    |
| 11  | 1.02    | 1.03    | 1.04    | 1.04    | 1.05    | 1.06    | 1.06    |
| 12  | 1.03    | 1.06    | 1.06    | 1.06    | 1.07    | 1.07    | 1.08    |
| 13  | 1.03    | 1.05    | 1.04    | 1.04    | 1.05    | 1.05    | 1.06    |
| 14  | 1.01    | 1.04    | 1.03    | 1.03    | 1.02    | 1.03    | 1.03    |
| 15  | 0.975   | 0.954   | 0.984   | 0.99    | 0.99    | 0.995   | 0.996   |
| 16  | 2.64    | 2.66    | 2.82    | 2.88    | 2.96    | 2.94    | 3.03    |
| 17  | 0.999   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.01    |
| 18  | 1.0     | 1.01    | 1.0     | 1.0     | 1.0     | 0.992   | 1.0     |
| 19  | 1.0     | 0.985   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 20  | 0.996   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 21  | 1.01    | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 22  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 23  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 24  | 1.0     | 0.999   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 25  | 1.0     | 1.0     | 1.0     | 0.998   | 1.0     | 1.0     | 1.0     |
| 26  | 1.0     | 1.01    | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 27  | 1.0     | 1.0     | 1.0     | 0.999   | 0.999   | 1.0     | 1.0     |
| 28  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 0.999   |
| 29  | 0.998   | 1.0     | 0.999   | 1.0     | 1.0     | 1.0     | 0.999   |
| 30  | 1.0     | 0.998   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 31  | 0.999   | 0.998   | 1.0     | 1.0     | 1.0     | 1.0     | 0.999   |
| 32  | 0.999   | 0.999   | 1.0     | 0.993   | 1.0     | 1.0     | 0.999   |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 0.25    | 0.248   | 0.242   | 0.239   | 0.238   | 0.23    | 0.229   |
| 4   | 0.504   | 0.493   | 0.528   | 0.529   | 0.534   | 0.486   | 0.479   |
| 5   | 0.765   | 0.762   | 0.763   | 0.775   | 0.781   | 0.779   | 0.786   |
| 6   | 0.707   | 0.711   | 0.732   | 0.737   | 0.72    | 0.723   | 0.748   |
| 7   | 0.721   | 0.742   | 0.726   | 0.728   | 0.731   | 0.733   | 0.734   |
| 8   | 0.761   | 0.752   | 0.752   | 0.742   | 0.753   | 0.737   | 0.755   |
| 9   | 1.21    | 1.21    | 1.21    | 1.21    | 1.19    | 1.21    | 1.22    |
| 10  | 1.08    | 1.07    | 1.08    | 1.09    | 1.1     | 1.11    | 1.12    |
| 11  | 1.07    | 1.07    | 1.08    | 1.08    | 1.09    | 1.1     | 1.1     |
| 12  | 1.08    | 1.09    | 1.1     | 1.1     | 1.11    | 1.11    | 1.11    |
| 13  | 1.05    | 1.06    | 1.06    | 1.07    | 1.07    | 1.07    | 1.08    |
| 14  | 1.03    | 1.05    | 1.03    | 1.04    | 1.03    | 1.03    | 1.03    |
| 15  | 0.999   | 0.994   | 0.998   | 1.0     | 0.998   | 1.0     | 1.0     |
| 16  | 3.02    | 3.04    | 3.06    | 3.06    | 3.06    | 3.07    | 3.09    |
| 17  | 1.01    | 0.991   | 1.0     | 1.0     | 0.998   | 1.0     | 1.0     |
| 18  | 1.0     | 1.0     | 1.0     | 0.999   | 1.0     | 0.998   | 0.999   |
| 19  | 1.0     | 0.999   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 20  | 0.999   | 1.0     | 1.0     | 1.0     | 0.998   | 0.999   | 1.0     |
| 21  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 22  | 0.998   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 23  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 0.999   | 1.0     |
| 24  | 0.982   | 1.0     | 1.0     | 1.0     | 1.01    | 1.01    | 1.0     |
| 25  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 26  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 27  | 1.0     | 0.999   | 1.0     | 1.0     | 1.0     | 0.999   | 1.0     |
| 28  | 0.999   | 1.0     | 0.999   | 0.997   | 1.04    | 1.01    | 1.05    |
| 29  | 0.999   | 0.999   | 1.0     | 0.999   | 1.0     | 1.0     | 1.0     |
| 30  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 31  | 1.0     | 1.0     | 0.999   | 1.0     | 1.0     | 1.0     | 0.999   |
| 32  | 1.0     | 0.999   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 0.227   | 0.244   | 0.226   | 0.224   | 0.229   | 0.21    | 0.209   |
| 4   | 0.457   | 0.459   | 0.461   | 0.457   | 0.461   | 0.461   | 0.457   |
| 5   | 0.788   | 0.789   | 0.793   | 0.789   | 0.788   | 0.796   | 0.8     |
| 6   | 0.751   | 0.733   | 0.736   | 0.762   | 0.762   | 0.74    | 0.742   |
| 7   | 0.736   | 0.734   | 0.739   | 0.737   | 0.739   | 0.742   | 0.743   |
| 8   | 0.75    | 0.751   | 0.744   | 0.745   | 0.753   | 0.746   | 0.743   |
| 9   | 1.22    | 1.23    | 1.23    | 1.24    | 1.24    | 1.25    | 1.25    |
| 10  | 1.12    | 1.12    | 1.13    | 1.13    | 1.14    | 1.14    | 1.14    |
| 11  | 1.11    | 1.11    | 1.11    | 1.11    | 1.11    | 1.12    | 1.12    |
| 12  | 1.11    | 1.12    | 1.12    | 1.12    | 1.12    | 1.12    | 1.12    |
| 13  | 1.08    | 1.08    | 1.08    | 1.09    | 1.09    | 1.09    | 1.09    |
| 14  | 1.04    | 1.03    | 1.04    | 1.04    | 1.04    | 1.04    | 1.04    |
| 15  | 1.0     | 1.01    | 1.01    | 1.01    | 1.01    | 1.01    | 1.01    |
| 16  | 3.05    | 3.12    | 3.13    | 3.13    | 3.16    | 3.18    | 3.24    |
| 17  | 0.998   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 18  | 1.0     | 0.957   | 0.995   | 0.999   | 1.01    | 0.99    | 1.0     |
| 19  | 1.0     | 0.999   | 1.0     | 0.999   | 1.0     | 1.01    | 1.0     |
| 20  | 1.0     | 1.0     | 1.0     | 1.0     | 0.999   | 0.999   | 0.999   |
| 21  | 1.0     | 1.0     | 1.0     | 0.999   | 1.0     | 1.0     | 1.0     |
| 22  | 0.999   | 1.0     | 1.0     | 0.999   | 1.0     | 1.0     | 1.0     |
| 23  | 0.999   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 24  | 1.0     | 0.999   | 1.0     | 1.0     | 0.999   | 1.0     | 0.999   |
| 25  | 0.999   | 1.0     | 0.999   | 1.0     | 1.0     | 1.0     | 1.0     |
| 26  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 27  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 28  | 1.04    | 1.0     | 0.999   | 1.0     | 1.05    | 0.962   | 0.956   |
| 29  | 0.999   | 1.0     | 1.0     | 1.0     | 0.999   | 1.0     | 1.0     |
| 30  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 31  | 1.0     | 1.0     | 0.999   | 1.0     | 1.0     | 1.0     | 1.0     |
| 32  | 1.0     | 1.0     | 1.0     | 0.993   | 0.992   | 1.0     | 1.01    |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 0.209   | 0.208   | 0.212   | 0.207   | 0.207   | 0.207   | 0.205   |
| 4   | 0.46    | 0.465   | 0.456   | 0.46    | 0.457   | 0.469   | 0.464   |
| 5   | 0.8     | 0.795   | 0.804   | 0.804   | 0.805   | 0.808   | 0.81    |
| 6   | 0.766   | 0.745   | 0.746   | 0.748   | 0.775   | 0.778   | 0.751   |
| 7   | 0.743   | 0.744   | 0.744   | 0.746   | 0.742   | 0.747   | 0.746   |
| 8   | 0.742   | 0.746   | 0.749   | 0.745   | 0.747   | 0.758   | 0.75    |
| 9   | 1.25    | 1.25    | 1.25    | 1.26    | 1.27    | 1.27    | 1.27    |
| 10  | 1.14    | 1.14    | 1.15    | 1.15    | 1.15    | 1.15    | 1.16    |
| 11  | 1.12    | 1.12    | 1.12    | 1.12    | 1.12    | 1.12    | 1.12    |
| 12  | 1.12    | 1.13    | 1.13    | 1.13    | 1.13    | 1.13    | 1.13    |
| 13  | 1.09    | 1.09    | 1.09    | 1.09    | 1.09    | 1.09    | 1.09    |
| 14  | 1.04    | 1.04    | 1.04    | 1.04    | 1.04    | 1.04    | 1.03    |
| 15  | 1.01    | 1.01    | 1.01    | 1.01    | 1.01    | 1.01    | 1.01    |
| 16  | 3.21    | 3.25    | 3.25    | 3.25    | 3.29    | 3.29    | 3.34    |
| 17  | 1.0     | 0.999   | 1.0     | 0.998   | 1.0     | 0.999   | 1.0     |
| 18  | 1.05    | 0.993   | 1.05    | 0.999   | 0.954   | 1.0     | 0.957   |
| 19  | 1.0     | 1.0     | 1.0     | 0.999   | 1.0     | 1.0     | 1.0     |
| 20  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 21  | 1.0     | 1.0     | 1.0     | 0.999   | 0.999   | 1.0     | 1.0     |
| 22  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 23  | 1.0     | 0.999   | 0.999   | 1.0     | 1.0     | 0.999   | 1.0     |
| 24  | 1.0     | 1.0     | 1.0     | 1.01    | 1.01    | 1.0     | 1.01    |
| 25  | 0.997   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 0.999   |
| 26  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 0.999   |
| 27  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 28  | 1.04    | 1.05    | 1.0     | 0.96    | 1.0     | 0.957   | 1.04    |
| 29  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 30  | 1.0     | 1.0     | 0.999   | 0.999   | 1.0     | 0.995   | 1.0     |
| 31  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 0.91    | 0.98    |
| 32  | 1.0     | 1.01    | 1.01    | 0.966   | 0.997   | 0.976   | 1.0     |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 0.206   | 0.208   |
| 4   | 0.457   | 0.472   |
| 5   | 0.812   | 0.813   |
| 6   | 0.753   | 0.779   |
| 7   | 0.749   | 0.749   |
| 8   | 0.753   | 0.755   |
| 9   | 1.27    | 1.28    |
| 10  | 1.16    | 1.16    |
| 11  | 1.13    | 1.13    |
| 12  | 1.13    | 1.12    |
| 13  | 1.09    | 1.09    |
| 14  | 1.04    | 1.04    |
| 15  | 1.01    | 1.01    |
| 16  | 3.32    | 3.32    |
| 17  | 1.0     | 1.0     |
| 18  | 1.0     | 1.01    |
| 19  | 1.0     | 0.999   |
| 20  | 0.999   | 1.0     |
| 21  | 1.0     | 1.0     |
| 22  | 0.999   | 1.0     |
| 23  | 0.999   | 0.999   |
| 24  | 1.0     | 1.0     |
| 25  | 1.0     | 1.0     |
| 26  | 1.0     | 1.0     |
| 27  | 1.0     | 0.944   |
| 28  | 0.852   | 1.0     |
| 29  | 1.0     | 0.926   |
| 30  | 1.04    | 0.987   |
| 31  | 1.01    | 1.04    |
| 32  | 1.0     | 0.918   |

When $M$ was 8 or less, the pass helped. It seemed to make things a little slower (by up to roughly 28%) when there were between 9 and 15 rows, about three times slower for 16 rows, and didn’t make a meaningful difference beyond that.
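As a rough check on that summary, the sketch below averages one column of the table above (NCols16, copied by hand) over the same bands of $M$. The `band_mean` helper is a hypothetical name introduced only for this illustration; it is not part of the benchmark code.

```julia
using Statistics  # standard library, provides `mean`

# NCols16 column of the "gfortran tc" vs "gfortran" table, rows M = 3:32.
ratios = [0.229, 0.479, 0.786, 0.748, 0.734, 0.755,   # M = 3:8
          1.22, 1.12, 1.1, 1.11, 1.08, 1.03, 1.0,     # M = 9:15
          3.09,                                        # M = 16
          1.0, 0.999, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,   # M = 17:24
          1.0, 1.0, 1.0, 1.05, 1.0, 1.0, 0.999, 1.0]  # M = 25:32

# Hypothetical helper: mean ratio over a range of row counts M.
# Row M is stored at index M - 2 because the table starts at M = 3.
band_mean(ms) = mean(ratios[m - 2] for m in ms)

for band in (3:8, 9:15, 16:16, 17:32)
    println("M in ", band, ": mean ratio ", round(band_mean(band), digits = 3))
end
```

A single column is only a spot check; the other columns show the same pattern, with some noise at the largest sizes.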

Finally, let’s take a look at MKL JIT:

In [24]:
print_results("mkl jit", "native julia", minimum_times, padded = false)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 2.15    | 2.01    | 2.03    | 2.1     | 2.02    | 1.92    | 1.77    |
| 4   | 2.16    | 2.08    | 1.98    | 1.98    | 1.69    | 1.77    | 1.76    |
| 5   | 2.18    | 1.89    | 1.97    | 1.99    | 1.8     | 1.74    | 1.7     |
| 6   | 2.31    | 2.07    | 1.94    | 2.01    | 1.85    | 1.61    | 1.75    |
| 7   | 2.16    | 1.84    | 1.94    | 1.92    | 1.84    | 1.87    | 1.73    |
| 8   | 2.08    | 1.82    | 1.82    | 1.93    | 1.88    | 1.76    | 1.67    |
| 9   | 1.83    | 1.74    | 1.82    | 1.71    | 1.62    | 1.38    | 1.49    |
| 10  | 1.84    | 1.86    | 1.79    | 1.69    | 1.59    | 1.52    | 1.4     |
| 11  | 2.09    | 2.14    | 2.19    | 1.93    | 1.84    | 1.65    | 1.65    |
| 12  | 1.94    | 1.9     | 1.81    | 1.71    | 1.61    | 1.54    | 1.49    |
| 13  | 2.17    | 2.27    | 2.32    | 2.02    | 1.89    | 1.7     | 1.71    |
| 14  | 2.24    | 2.28    | 2.25    | 2.01    | 1.91    | 1.56    | 1.71    |
| 15  | 2.51    | 2.05    | 2.32    | 2.01    | 1.9     | 1.53    | 1.71    |
| 16  | 2.03    | 1.96    | 1.84    | 1.72    | 1.62    | 1.52    | 1.5     |
| 17  | 1.76    | 1.61    | 1.5     | 1.45    | 1.37    | 1.36    | 1.35    |
| 18  | 1.72    | 1.59    | 1.47    | 1.39    | 1.36    | 1.51    | 1.34    |
| 19  | 1.96    | 1.74    | 1.63    | 1.45    | 1.43    | 1.36    | 1.32    |
| 20  | 1.79    | 1.68    | 1.5     | 1.4     | 1.36    | 1.36    | 1.39    |
| 21  | 1.97    | 1.82    | 1.64    | 1.49    | 1.48    | 1.42    | 1.3     |
| 22  | 1.92    | 1.64    | 1.59    | 1.51    | 1.45    | 1.45    | 1.4     |
| 23  | 2.31    | 1.79    | 1.66    | 1.5     | 1.51    | 1.4     | 1.36    |
| 24  | 1.75    | 1.74    | 1.51    | 1.46    | 1.41    | 1.36    | 1.38    |
| 25  | 1.7     | 1.47    | 1.38    | 1.32    | 1.11    | 1.19    | 1.22    |
| 26  | 1.63    | 1.45    | 1.41    | 1.32    | 1.12    | 1.2     | 1.18    |
| 27  | 1.79    | 1.52    | 1.41    | 1.39    | 1.15    | 1.21    | 1.31    |
| 28  | 1.65    | 1.48    | 1.37    | 1.36    | 1.15    | 1.22    | 1.2     |
| 29  | 1.85    | 1.56    | 1.49    | 1.41    | 1.18    | 1.25    | 1.35    |
| 30  | 1.87    | 1.55    | 1.45    | 1.43    | 1.2     | 1.26    | 1.33    |
| 31  | 2.12    | 1.5     | 1.46    | 1.41    | 1.16    | 1.24    | 1.34    |
| 32  | 1.73    | 1.51    | 1.44    | 1.4     | 1.15    | 1.23    | 1.22    |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.5     | 1.65    | 1.62    | 1.57    | 1.53    | 1.5     | 1.47    |
| 4   | 1.7     | 1.66    | 1.6     | 1.51    | 1.52    | 1.41    | 1.47    |
| 5   | 1.73    | 1.61    | 1.58    | 1.56    | 1.54    | 1.49    | 1.49    |
| 6   | 1.55    | 1.65    | 1.62    | 1.59    | 1.55    | 1.48    | 1.48    |
| 7   | 1.72    | 1.64    | 1.62    | 1.55    | 1.53    | 1.48    | 1.49    |
| 8   | 1.66    | 1.62    | 1.58    | 1.5     | 1.5     | 1.45    | 1.44    |
| 9   | 1.45    | 1.46    | 1.4     | 1.37    | 1.35    | 1.12    | 1.16    |
| 10  | 1.42    | 1.4     | 1.35    | 1.31    | 1.3     | 1.11    | 1.17    |
| 11  | 1.55    | 1.55    | 1.46    | 1.49    | 1.38    | 1.24    | 1.29    |
| 12  | 1.43    | 1.35    | 1.37    | 1.33    | 1.33    | 1.11    | 1.17    |
| 13  | 1.63    | 1.53    | 1.49    | 1.5     | 1.42    | 1.28    | 1.32    |
| 14  | 1.62    | 1.58    | 1.49    | 1.51    | 1.43    | 1.27    | 1.32    |
| 15  | 1.62    | 1.6     | 1.47    | 1.5     | 1.45    | 1.28    | 1.32    |
| 16  | 1.45    | 1.41    | 1.38    | 1.32    | 1.45    | 1.18    | 1.21    |
| 17  | 1.13    | 1.22    | 1.25    | 1.26    | 1.32    | 1.16    | 1.16    |
| 18  | 1.12    | 1.2     | 1.24    | 1.23    | 1.31    | 1.15    | 1.14    |
| 19  | 1.13    | 1.21    | 1.24    | 1.24    | 1.31    | 1.2     | 1.21    |
| 20  | 1.16    | 1.23    | 1.25    | 1.27    | 1.34    | 1.16    | 1.17    |
| 21  | 1.15    | 1.25    | 1.27    | 1.26    | 1.33    | 1.22    | 1.23    |
| 22  | 1.17    | 1.25    | 1.27    | 1.26    | 1.33    | 1.23    | 1.23    |
| 23  | 1.16    | 1.26    | 1.27    | 1.27    | 1.33    | 1.25    | 1.23    |
| 24  | 1.22    | 1.26    | 1.25    | 1.26    | 1.33    | 1.18    | 1.16    |
| 25  | 1.29    | 1.18    | 1.12    | 1.05    | 1.15    | 1.12    | 1.09    |
| 26  | 1.21    | 1.18    | 1.11    | 1.03    | 1.18    | 1.08    | 1.08    |
| 27  | 1.28    | 1.26    | 1.19    | 1.1     | 1.13    | 1.1     | 1.1     |
| 28  | 1.2     | 1.17    | 1.12    | 1.05    | 1.21    | 1.12    | 1.1     |
| 29  | 1.29    | 1.29    | 1.19    | 1.11    | 1.2     | 1.11    | 1.1     |
| 30  | 1.31    | 1.29    | 1.19    | 1.13    | 1.2     | 1.13    | 1.11    |
| 31  | 1.3     | 1.28    | 1.19    | 1.11    | 1.21    | 1.11    | 1.1     |
| 32  | 1.22    | 1.21    | 1.15    | 1.06    | 1.27    | 1.12    | 1.13    |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.42    | 1.41    | 1.35    | 1.36    | 1.35    | 1.34    | 1.32    |
| 4   | 1.43    | 1.42    | 1.37    | 1.35    | 1.35    | 1.34    | 1.27    |
| 5   | 1.46    | 1.38    | 1.37    | 1.38    | 1.36    | 1.36    | 1.34    |
| 6   | 1.42    | 1.41    | 1.39    | 1.37    | 1.33    | 1.32    | 1.31    |
| 7   | 1.42    | 1.4     | 1.38    | 1.38    | 1.36    | 1.34    | 1.31    |
| 8   | 1.39    | 1.39    | 1.32    | 1.32    | 1.33    | 1.32    | 1.32    |
| 9   | 1.28    | 1.24    | 1.23    | 1.23    | 1.2     | 1.21    | 1.21    |
| 10  | 1.19    | 1.2     | 1.17    | 1.2     | 1.19    | 1.19    | 1.22    |
| 11  | 1.33    | 1.34    | 1.34    | 1.32    | 1.34    | 1.21    | 1.31    |
| 12  | 1.2     | 1.22    | 1.19    | 1.16    | 1.19    | 1.2     | 1.19    |
| 13  | 1.37    | 1.37    | 1.38    | 1.36    | 1.35    | 1.36    | 1.35    |
| 14  | 1.37    | 1.36    | 1.39    | 1.36    | 1.35    | 1.35    | 1.25    |
| 15  | 1.37    | 1.37    | 1.37    | 1.37    | 1.38    | 1.35    | 1.36    |
| 16  | 1.21    | 1.2     | 1.19    | 1.18    | 1.18    | 1.17    | 1.15    |
| 17  | 1.2     | 1.09    | 1.01    | 1.08    | 1.07    | 1.1     | 1.06    |
| 18  | 1.19    | 1.08    | 1.0     | 1.03    | 1.05    | 1.05    | 1.05    |
| 19  | 1.21    | 1.13    | 1.04    | 1.07    | 1.09    | 1.08    | 1.08    |
| 20  | 1.19    | 1.1     | 1.02    | 1.05    | 1.06    | 1.07    | 1.07    |
| 21  | 1.22    | 1.14    | 1.06    | 1.08    | 1.11    | 1.11    | 1.14    |
| 22  | 1.24    | 1.14    | 1.06    | 1.09    | 1.15    | 1.11    | 1.13    |
| 23  | 1.23    | 1.14    | 1.06    | 1.09    | 1.12    | 1.12    | 1.12    |
| 24  | 1.19    | 1.14    | 1.06    | 1.09    | 1.09    | 1.09    | 1.09    |
| 25  | 1.09    | 1.12    | 1.07    | 1.1     | 1.09    | 1.1     | 1.1     |
| 26  | 1.08    | 1.11    | 1.05    | 1.08    | 1.1     | 1.09    | 1.08    |
| 27  | 1.05    | 1.11    | 1.07    | 1.06    | 1.07    | 1.08    | 1.12    |
| 28  | 1.1     | 1.12    | 1.06    | 1.07    | 1.08    | 1.06    | 1.06    |
| 29  | 1.09    | 1.1     | 1.04    | 1.07    | 1.09    | 1.09    | 1.08    |
| 30  | 1.11    | 1.14    | 1.07    | 1.08    | 1.05    | 1.09    | 1.12    |
| 31  | 1.11    | 1.12    | 1.07    | 1.09    | 1.11    | 1.11    | 1.11    |
| 32  | 1.1     | 1.1     | 1.09    | 1.09    | 1.09    | 1.11    | 1.09    |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.32    | 1.31    | 1.28    | 1.24    | 1.29    | 1.27    | 1.24    |
| 4   | 1.3     | 1.28    | 1.31    | 1.23    | 1.27    | 1.26    | 1.25    |
| 5   | 1.3     | 1.29    | 1.31    | 1.27    | 1.27    | 1.21    | 1.25    |
| 6   | 1.32    | 1.32    | 1.31    | 1.27    | 1.24    | 1.27    | 1.26    |
| 7   | 1.32    | 1.3     | 1.12    | 1.09    | 1.25    | 1.24    | 1.27    |
| 8   | 1.3     | 1.26    | 1.29    | 1.27    | 1.27    | 1.27    | 1.25    |
| 9   | 1.2     | 1.19    | 1.18    | 1.19    | 1.1     | 1.02    | 1.04    |
| 10  | 1.15    | 1.16    | 1.17    | 1.16    | 1.08    | 0.948   | 1.01    |
| 11  | 1.28    | 1.3     | 1.31    | 1.29    | 1.2     | 1.09    | 1.1     |
| 12  | 1.19    | 1.18    | 1.19    | 1.17    | 1.1     | 1.02    | 1.03    |
| 13  | 1.32    | 1.32    | 1.33    | 1.33    | 1.23    | 1.12    | 1.13    |
| 14  | 1.33    | 1.35    | 1.33    | 1.32    | 1.24    | 1.12    | 1.14    |
| 15  | 1.37    | 1.35    | 1.34    | 1.33    | 1.24    | 1.13    | 1.13    |
| 16  | 1.16    | 1.19    | 1.18    | 1.19    | 1.14    | 1.06    | 1.07    |
| 17  | 1.06    | 1.08    | 1.07    | 1.09    | 1.02    | 1.06    | 1.08    |
| 18  | 1.04    | 1.07    | 1.04    | 1.04    | 1.01    | 1.02    | 1.03    |
| 19  | 1.09    | 1.04    | 1.1     | 1.14    | 1.08    | 1.14    | 1.11    |
| 20  | 1.06    | 1.07    | 1.05    | 1.11    | 1.01    | 1.03    | 1.03    |
| 21  | 1.12    | 1.13    | 1.11    | 1.16    | 1.1     | 1.13    | 1.13    |
| 22  | 1.11    | 1.12    | 1.06    | 1.16    | 1.1     | 1.13    | 1.11    |
| 23  | 1.11    | 1.12    | 1.11    | 1.18    | 1.08    | 1.12    | 1.12    |
| 24  | 1.08    | 1.1     | 1.08    | 1.09    | 1.04    | 1.06    | 1.07    |
| 25  | 1.09    | 1.05    | 1.06    | 1.11    | 1.1     | 1.08    | 1.06    |
| 26  | 1.08    | 1.05    | 1.02    | 1.1     | 1.09    | 1.08    | 1.06    |
| 27  | 1.07    | 1.03    | 1.02    | 1.04    | 1.08    | 1.07    | 1.06    |
| 28  | 1.09    | 1.03    | 1.04    | 1.08    | 1.06    | 1.05    | 1.05    |
| 29  | 1.08    | 1.04    | 1.05    | 1.12    | 1.12    | 1.11    | 1.07    |
| 30  | 1.06    | 1.04    | 1.04    | 1.14    | 1.08    | 1.08    | 1.07    |
| 31  | 1.09    | 1.05    | 1.07    | 1.11    | 1.05    | 1.07    | 1.08    |
| 32  | 1.1     | 1.05    | 1.07    | 1.1     | 1.11    | 1.1     | 1.08    |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 1.12    | 1.14    |
| 4   | 1.1     | 1.12    |
| 5   | 1.15    | 1.17    |
| 6   | 1.13    | 1.16    |
| 7   | 1.15    | 1.16    |
| 8   | 1.1     | 1.13    |
| 9   | 1.07    | 1.08    |
| 10  | 1.04    | 1.05    |
| 11  | 1.07    | 1.14    |
| 12  | 1.07    | 1.07    |
| 13  | 1.15    | 1.18    |
| 14  | 1.18    | 1.19    |
| 15  | 1.16    | 1.17    |
| 16  | 1.08    | 1.11    |
| 17  | 1.05    | 1.06    |
| 18  | 1.03    | 1.04    |
| 19  | 1.09    | 1.1     |
| 20  | 1.03    | 1.04    |
| 21  | 1.13    | 1.13    |
| 22  | 1.12    | 1.12    |
| 23  | 1.13    | 1.13    |
| 24  | 1.06    | 1.05    |
| 25  | 1.03    | 1.05    |
| 26  | 0.98    | 1.04    |
| 27  | 1.02    | 1.02    |
| 28  | 1.02    | 1.04    |
| 29  | 0.997   | 1.05    |
| 30  | 1.07    | 1.06    |
| 31  | 1.04    | 1.04    |
| 32  | 1.05    | 1.08    |

While still well behind unpadded PaddedMatrices, it takes an easy second place for the case where we do not have padding. It was actually slightly (negligibly?) faster for the cases of $M=10,N=29$, $M=26,N=31$, and $M=29,N=31$ (taking about 5%, 2%, and 0.3% less time), and within 20% for many of the remaining sizes.
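To read these tables as percentages, a ratio can be converted to a signed percent difference. This is a minimal sketch; `percent_change` is a hypothetical helper introduced here, not part of the benchmark code, and the inputs are example ratios (0.98 and 0.948 appear in the MKL JIT table above, and 1.2 is the "within 20%" bound).

```julia
# Convert a relative time (ratio of two minimum run times) into a signed
# percent difference. Negative means the library in the numerator took
# less time than the baseline; positive means it took more.
percent_change(ratio) = round((ratio - 1) * 100, digits = 1)

for r in (0.948, 0.98, 1.2)
    println("ratio ", r, " -> ", percent_change(r), "% relative to the baseline")
end
```

Rounding to one decimal place matches the three significant digits that the tables print.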

Now, let us take a look at the padded case. Because Eigen didn’t do particularly well when we had multiples of 8 or 16 rows, we’d naturally guess that padding wouldn’t do it much good:

In [25]:
print_results("eigen", "native julia", minimum_times, padded = true)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 5.64    | 4.06    | 4.72    | 4.98    | 4.76    | 3.96    | 4.26    |
| 4   | 5.6     | 4.12    | 4.76    | 4.9     | 5.04    | 3.7     | 4.35    |
| 5   | 5.89    | 4.35    | 5.04    | 5.28    | 5.44    | 4.06    | 4.57    |
| 6   | 5.81    | 4.43    | 5.04    | 5.29    | 5.61    | 4.3     | 4.65    |
| 7   | 5.75    | 4.43    | 4.65    | 5.37    | 5.43    | 4.07    | 4.56    |
| 8   | 5.94    | 4.09    | 5.16    | 5.25    | 5.21    | 4.35    | 4.42    |
| 9   | 8.1     | 6.09    | 6.48    | 6.43    | 6.55    | 4.91    | 5.29    |
| 10  | 8.06    | 6.41    | 6.58    | 6.43    | 6.57    | 4.81    | 5.23    |
| 11  | 8.14    | 6.33    | 6.49    | 6.44    | 6.55    | 5.09    | 5.1     |
| 12  | 8.22    | 6.21    | 6.45    | 6.54    | 6.57    | 5.1     | 5.37    |
| 13  | 8.1     | 6.38    | 6.5     | 6.43    | 6.54    | 5.08    | 5.36    |
| 14  | 8.16    | 6.22    | 6.64    | 6.46    | 6.56    | 5.07    | 5.35    |
| 15  | 8.19    | 5.97    | 6.63    | 6.44    | 6.54    | 5.03    | 5.38    |
| 16  | 7.65    | 6.16    | 6.55    | 6.45    | 6.54    | 4.96    | 5.23    |
| 17  | 6.96    | 5.1     | 4.96    | 4.91    | 4.86    | 3.89    | 4.05    |
| 18  | 6.71    | 5.0     | 4.91    | 5.07    | 5.0     | 4.0     | 4.13    |
| 19  | 6.65    | 4.89    | 4.81    | 4.84    | 4.99    | 3.98    | 4.12    |
| 20  | 6.96    | 5.01    | 4.76    | 5.07    | 4.85    | 3.98    | 4.08    |
| 21  | 6.95    | 5.07    | 4.92    | 5.07    | 4.99    | 4.0     | 3.95    |
| 22  | 6.68    | 4.93    | 4.78    | 4.84    | 4.96    | 3.99    | 4.14    |
| 23  | 6.56    | 5.01    | 4.9     | 5.09    | 4.84    | 3.98    | 4.15    |
| 24  | 6.91    | 5.01    | 4.88    | 4.7     | 4.75    | 4.03    | 4.15    |
| 25  | 7.44    | 5.09    | 5.31    | 5.05    | 4.56    | 3.8     | 4.12    |
| 26  | 7.45    | 5.06    | 5.26    | 5.21    | 4.54    | 3.83    | 4.08    |
| 27  | 8.01    | 5.09    | 5.0     | 5.07    | 4.55    | 3.81    | 4.07    |
| 28  | 7.46    | 5.05    | 5.23    | 5.17    | 4.56    | 3.79    | 4.02    |
| 29  | 7.69    | 5.09    | 5.1     | 5.06    | 4.57    | 3.84    | 4.14    |
| 30  | 7.68    | 5.09    | 5.27    | 5.1     | 4.57    | 3.78    | 4.05    |
| 31  | 7.65    | 4.73    | 5.15    | 5.13    | 4.55    | 3.81    | 4.11    |
| 32  | 7.7     | 4.83    | 4.98    | 5.18    | 4.57    | 3.74    | 4.14    |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 4.2     | 4.43    | 3.6     | 3.69    | 3.94    | 4.01    | 3.27    |
| 4   | 3.99    | 4.43    | 3.6     | 3.83    | 3.87    | 4.01    | 3.42    |
| 5   | 4.63    | 4.82    | 3.93    | 4.16    | 4.32    | 4.4     | 3.62    |
| 6   | 4.73    | 4.86    | 3.97    | 4.15    | 4.3     | 4.39    | 3.79    |
| 7   | 4.7     | 4.81    | 3.94    | 4.17    | 4.13    | 4.38    | 3.78    |
| 8   | 4.73    | 4.8     | 3.84    | 4.14    | 4.22    | 4.33    | 3.59    |
| 9   | 5.29    | 5.42    | 4.5     | 4.77    | 4.57    | 4.37    | 3.93    |
| 10  | 5.23    | 5.36    | 4.51    | 4.79    | 4.59    | 4.31    | 3.93    |
| 11  | 5.19    | 5.4     | 4.42    | 4.8     | 4.52    | 4.35    | 3.94    |
| 12  | 5.04    | 5.33    | 4.39    | 4.69    | 4.51    | 4.35    | 3.94    |
| 13  | 5.31    | 5.2     | 4.37    | 4.68    | 4.46    | 4.36    | 3.94    |
| 14  | 5.31    | 5.46    | 4.3     | 4.66    | 4.51    | 4.34    | 3.95    |
| 15  | 5.27    | 5.44    | 4.51    | 4.58    | 4.45    | 4.34    | 3.96    |
| 16  | 5.31    | 5.46    | 4.51    | 4.53    | 4.52    | 4.34    | 3.94    |
| 17  | 3.82    | 4.05    | 3.57    | 3.71    | 3.79    | 3.81    | 3.5     |
| 18  | 3.84    | 4.04    | 3.56    | 3.66    | 3.79    | 3.86    | 3.54    |
| 19  | 3.79    | 4.03    | 3.45    | 3.66    | 3.77    | 3.85    | 3.53    |
| 20  | 3.69    | 4.03    | 3.51    | 3.69    | 3.77    | 3.8     | 3.46    |
| 21  | 3.82    | 4.0     | 3.5     | 3.69    | 3.77    | 3.85    | 3.49    |
| 22  | 3.77    | 4.01    | 3.47    | 3.64    | 3.77    | 3.84    | 3.54    |
| 23  | 3.84    | 4.04    | 3.57    | 3.66    | 3.77    | 3.82    | 3.48    |
| 24  | 3.83    | 4.01    | 3.51    | 3.69    | 3.79    | 3.79    | 3.47    |
| 25  | 4.26    | 4.46    | 3.52    | 3.53    | 3.77    | 3.98    | 3.4     |
| 26  | 4.18    | 4.44    | 3.51    | 3.56    | 3.77    | 3.99    | 3.43    |
| 27  | 4.25    | 4.45    | 3.5     | 3.53    | 3.75    | 3.98    | 3.44    |
| 28  | 4.25    | 4.48    | 3.51    | 3.55    | 3.74    | 3.96    | 3.41    |
| 29  | 4.22    | 4.54    | 3.51    | 3.54    | 3.76    | 3.95    | 3.45    |
| 30  | 4.24    | 4.45    | 3.51    | 3.55    | 3.77    | 4.06    | 3.46    |
| 31  | 4.25    | 4.44    | 3.52    | 3.53    | 3.77    | 3.98    | 3.42    |
| 32  | 4.25    | 4.56    | 3.51    | 3.53    | 3.76    | 3.97    | 3.42    |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 3.54    | 3.42    | 3.78    | 3.22    | 3.4     | 3.51    | 3.5     |
| 4   | 3.59    | 3.57    | 3.71    | 3.18    | 3.4     | 3.42    | 3.53    |
| 5   | 3.93    | 3.94    | 4.02    | 3.58    | 3.71    | 3.72    | 3.93    |
| 6   | 3.69    | 3.94    | 4.05    | 3.69    | 3.72    | 3.75    | 3.91    |
| 7   | 3.88    | 3.94    | 4.0     | 3.52    | 3.71    | 3.74    | 3.9     |
| 8   | 4.05    | 4.01    | 4.06    | 3.58    | 3.69    | 3.7     | 3.92    |
| 9   | 4.04    | 4.25    | 4.34    | 3.93    | 4.09    | 4.16    | 4.37    |
| 10  | 4.07    | 4.18    | 4.35    | 3.99    | 4.09    | 4.16    | 4.45    |
| 11  | 4.08    | 4.24    | 4.36    | 3.92    | 4.1     | 4.13    | 4.35    |
| 12  | 4.15    | 4.25    | 4.29    | 3.85    | 4.1     | 4.17    | 4.32    |
| 13  | 4.08    | 4.23    | 4.35    | 3.76    | 4.05    | 4.16    | 4.33    |
| 14  | 4.08    | 4.24    | 4.36    | 3.9     | 4.05    | 4.12    | 4.3     |
| 15  | 4.15    | 4.22    | 4.35    | 3.93    | 4.2     | 4.13    | 4.25    |
| 16  | 4.18    | 4.16    | 4.4     | 3.9     | 4.11    | 4.16    | 4.33    |
| 17  | 3.62    | 3.54    | 3.48    | 3.25    | 3.47    | 3.54    | 3.64    |
| 18  | 3.6     | 3.49    | 3.49    | 3.28    | 3.43    | 3.54    | 3.61    |
| 19  | 3.6     | 3.53    | 3.51    | 3.27    | 3.42    | 3.56    | 3.61    |
| 20  | 3.57    | 3.52    | 3.48    | 3.29    | 3.43    | 3.54    | 3.64    |
| 21  | 3.52    | 3.53    | 3.56    | 3.28    | 3.41    | 3.54    | 3.65    |
| 22  | 3.57    | 3.61    | 3.5     | 3.3     | 3.45    | 3.55    | 3.61    |
| 23  | 3.63    | 3.51    | 3.47    | 3.27    | 3.43    | 3.53    | 3.66    |
| 24  | 3.58    | 3.53    | 3.48    | 3.27    | 3.48    | 3.53    | 3.64    |
| 25  | 3.68    | 3.87    | 3.88    | 3.41    | 3.67    | 3.81    | 3.93    |
| 26  | 3.65    | 3.87    | 3.88    | 3.46    | 3.75    | 3.84    | 3.98    |
| 27  | 3.7     | 3.89    | 3.77    | 3.49    | 3.68    | 3.87    | 3.94    |
| 28  | 3.69    | 3.88    | 3.91    | 3.47    | 3.67    | 3.79    | 3.92    |
| 29  | 3.68    | 3.82    | 3.83    | 3.45    | 3.68    | 3.86    | 3.89    |
| 30  | 3.68    | 3.88    | 3.88    | 3.43    | 3.64    | 3.87    | 4.04    |
| 31  | 3.7     | 3.93    | 3.85    | 3.46    | 3.7     | 3.85    | 3.93    |
| 32  | 3.66    | 3.87    | 3.81    | 3.48    | 3.7     | 3.86    | 3.94    |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 3.15    | 3.33    | 3.24    | 3.36    | 3.09    | 3.22    | 3.27    |
| 4   | 3.12    | 3.29    | 3.33    | 3.3     | 3.22    | 3.24    | 3.36    |
| 5   | 3.46    | 3.62    | 3.67    | 3.86    | 3.34    | 3.48    | 3.58    |
| 6   | 3.48    | 3.63    | 3.8     | 3.63    | 3.36    | 3.5     | 3.56    |
| 7   | 3.33    | 3.61    | 3.77    | 3.85    | 3.34    | 3.49    | 3.6     |
| 8   | 3.47    | 3.42    | 3.51    | 3.75    | 3.44    | 3.56    | 3.74    |
| 9   | 3.93    | 4.12    | 4.12    | 4.32    | 3.91    | 3.68    | 3.86    |
| 10  | 3.9     | 4.08    | 4.33    | 4.31    | 3.88    | 3.69    | 3.86    |
| 11  | 3.91    | 4.08    | 4.16    | 4.34    | 3.85    | 3.67    | 3.85    |
| 12  | 3.91    | 4.09    | 4.16    | 4.33    | 3.82    | 3.76    | 3.92    |
| 13  | 3.91    | 4.18    | 4.2     | 4.34    | 3.91    | 3.69    | 3.81    |
| 14  | 3.93    | 4.08    | 4.22    | 4.34    | 3.89    | 3.69    | 3.87    |
| 15  | 3.88    | 4.08    | 4.15    | 4.34    | 3.83    | 3.69    | 3.85    |
| 16  | 3.88    | 4.19    | 4.28    | 4.47    | 3.87    | 3.67    | 3.84    |
| 17  | 3.37    | 3.48    | 3.59    | 3.6     | 3.26    | 3.41    | 3.55    |
| 18  | 3.39    | 3.51    | 3.55    | 3.63    | 3.15    | 3.46    | 3.56    |
| 19  | 3.38    | 3.49    | 3.62    | 3.66    | 3.18    | 3.54    | 3.57    |
| 20  | 3.43    | 3.48    | 3.55    | 3.64    | 3.25    | 3.42    | 3.55    |
| 21  | 3.24    | 3.5     | 3.55    | 3.62    | 3.17    | 3.47    | 3.56    |
| 22  | 3.38    | 3.49    | 3.53    | 3.66    | 3.33    | 3.5     | 3.5     |
| 23  | 3.38    | 3.5     | 3.56    | 3.65    | 3.27    | 3.45    | 3.63    |
| 24  | 3.33    | 3.47    | 3.62    | 3.63    | 3.34    | 3.47    | 3.54    |
| 25  | 3.53    | 3.51    | 3.71    | 3.9     | 3.39    | 3.67    | 3.66    |
| 26  | 3.53    | 3.57    | 3.72    | 3.85    | 3.39    | 3.63    | 3.66    |
| 27  | 3.48    | 3.56    | 3.71    | 3.91    | 3.4     | 3.68    | 3.69    |
| 28  | 3.53    | 3.5     | 3.68    | 3.88    | 3.42    | 3.73    | 3.68    |
| 29  | 3.52    | 3.57    | 3.73    | 3.88    | 3.55    | 3.6     | 3.74    |
| 30  | 3.54    | 3.54    | 3.71    | 3.87    | 3.38    | 3.66    | 3.68    |
| 31  | 3.6     | 3.57    | 3.67    | 3.81    | 3.49    | 3.64    | 3.68    |
| 32  | 3.53    | 3.52    | 3.68    | 3.88    | 3.48    | 3.68    | 3.76    |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 2.94    | 2.76    |
| 4   | 2.95    | 2.79    |
| 5   | 3.27    | 3.1     |
| 6   | 3.27    | 3.09    |
| 7   | 3.26    | 3.18    |
| 8   | 3.27    | 3.1     |
| 9   | 4.02    | 3.77    |
| 10  | 4.01    | 3.64    |
| 11  | 3.99    | 3.75    |
| 12  | 4.0     | 3.78    |
| 13  | 3.97    | 3.86    |
| 14  | 4.0     | 3.75    |
| 15  | 4.0     | 3.77    |
| 16  | 4.01    | 3.88    |
| 17  | 3.69    | 3.45    |
| 18  | 3.72    | 3.4     |
| 19  | 3.67    | 3.41    |
| 20  | 3.67    | 3.47    |
| 21  | 3.66    | 3.48    |
| 22  | 3.68    | 3.42    |
| 23  | 3.7     | 3.47    |
| 24  | 3.69    | 3.34    |
| 25  | 3.62    | 3.35    |
| 26  | 3.63    | 3.34    |
| 27  | 3.59    | 3.44    |
| 28  | 3.63    | 3.34    |
| 29  | 3.61    | 3.36    |
| 30  | 3.59    | 3.36    |
| 31  | 3.62    | 3.37    |
| 32  | 3.59    | 3.34    |

Padding did not do it much good.

Blaze, on the other hand, implements automatic padding. Between that (meaning we expect it to be optimized for $M$ being a multiple of 8) and its good performance earlier when $M=8$, we expect it to do well:

In [26]:
print_results("blaze", "native julia", minimum_times, padded = true)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 0.992   | 0.957   | 1.26    | 1.27    | 1.11    | 1.22    | 1.44    |
| 4   | 2.68    | 2.8     | 3.44    | 3.41    | 3.75    | 3.52    | 4.11    |
| 5   | 0.78    | 0.841   | 1.16    | 1.19    | 1.06    | 1.08    | 1.29    |
| 6   | 0.783   | 0.855   | 1.18    | 1.2     | 1.11    | 1.14    | 1.29    |
| 7   | 0.782   | 0.858   | 1.08    | 1.2     | 1.06    | 1.08    | 1.28    |
| 8   | 0.801   | 0.828   | 1.22    | 1.19    | 1.06    | 1.19    | 1.27    |
| 9   | 1.23    | 1.11    | 1.33    | 1.21    | 1.25    | 1.11    | 1.22    |
| 10  | 1.23    | 1.14    | 1.27    | 1.21    | 1.31    | 1.08    | 1.2     |
| 11  | 1.22    | 1.12    | 1.29    | 1.21    | 1.24    | 1.18    | 1.15    |
| 12  | 1.24    | 1.11    | 1.27    | 1.22    | 1.25    | 1.17    | 1.25    |
| 13  | 1.21    | 1.13    | 1.28    | 1.21    | 1.24    | 1.16    | 1.24    |
| 14  | 1.23    | 1.1     | 1.3     | 1.21    | 1.3     | 1.17    | 1.24    |
| 15  | 1.24    | 1.06    | 1.28    | 1.22    | 1.24    | 1.17    | 1.26    |
| 16  | 1.22    | 1.21    | 1.45    | 1.26    | 1.24    | 1.17    | 1.22    |
| 17  | 1.52    | 1.32    | 1.46    | 1.3     | 1.38    | 1.28    | 1.37    |
| 18  | 1.47    | 1.34    | 1.46    | 1.33    | 1.43    | 1.32    | 1.4     |
| 19  | 1.45    | 1.31    | 1.43    | 1.33    | 1.42    | 1.32    | 1.39    |
| 20  | 1.52    | 1.34    | 1.43    | 1.33    | 1.38    | 1.31    | 1.4     |
| 21  | 1.52    | 1.34    | 1.45    | 1.33    | 1.42    | 1.33    | 1.34    |
| 22  | 1.44    | 1.3     | 1.42    | 1.33    | 1.46    | 1.31    | 1.4     |
| 23  | 1.43    | 1.34    | 1.46    | 1.32    | 1.37    | 1.32    | 1.4     |
| 24  | 1.55    | 1.34    | 1.48    | 1.31    | 1.37    | 1.34    | 1.42    |
| 25  | 1.35    | 1.15    | 1.28    | 1.11    | 1.09    | 1.08    | 1.19    |
| 26  | 1.35    | 1.16    | 1.27    | 1.13    | 1.07    | 1.08    | 1.16    |
| 27  | 1.39    | 1.14    | 1.25    | 1.1     | 1.08    | 1.08    | 1.18    |
| 28  | 1.35    | 1.16    | 1.27    | 1.12    | 1.08    | 1.07    | 1.17    |
| 29  | 1.38    | 1.17    | 1.24    | 1.1     | 1.07    | 1.08    | 1.21    |
| 30  | 1.4     | 1.16    | 1.28    | 1.11    | 1.08    | 1.07    | 1.18    |
| 31  | 1.39    | 1.09    | 1.25    | 1.12    | 1.07    | 1.07    | 1.18    |
| 32  | 1.41    | 1.12    | 1.24    | 1.13    | 1.09    | 1.07    | 1.2     |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.35    | 1.29    | 1.28    | 1.32    | 1.36    | 1.3     | 1.22    |
| 4   | 3.79    | 4.23    | 4.04    | 4.23    | 4.5     | 4.25    | 4.11    |
| 5   | 1.26    | 1.2     | 1.18    | 1.29    | 1.25    | 1.2     | 1.12    |
| 6   | 1.26    | 1.19    | 1.19    | 1.29    | 1.27    | 1.21    | 1.19    |
| 7   | 1.26    | 1.19    | 1.19    | 1.29    | 1.22    | 1.28    | 1.18    |
| 8   | 1.26    | 1.23    | 1.18    | 1.32    | 1.28    | 1.22    | 1.15    |
| 9   | 1.19    | 1.2     | 1.12    | 1.18    | 1.16    | 1.01    | 1.05    |
| 10  | 1.14    | 1.18    | 1.13    | 1.17    | 1.15    | 1.01    | 1.04    |
| 11  | 1.13    | 1.18    | 1.12    | 1.17    | 1.15    | 1.04    | 1.04    |
| 12  | 1.1     | 1.16    | 1.11    | 1.16    | 1.15    | 1.02    | 1.05    |
| 13  | 1.16    | 1.15    | 1.09    | 1.15    | 1.13    | 1.01    | 1.07    |
| 14  | 1.17    | 1.2     | 1.09    | 1.14    | 1.12    | 1.0     | 1.05    |
| 15  | 1.2     | 1.18    | 1.12    | 1.12    | 1.16    | 1.0     | 1.05    |
| 16  | 1.18    | 1.17    | 1.16    | 1.12    | 1.14    | 1.02    | 1.07    |
| 17  | 1.21    | 1.32    | 1.26    | 1.32    | 1.26    | 1.37    | 1.34    |
| 18  | 1.19    | 1.31    | 1.26    | 1.31    | 1.31    | 1.32    | 1.3     |
| 19  | 1.19    | 1.31    | 1.27    | 1.3     | 1.26    | 1.32    | 1.35    |
| 20  | 1.15    | 1.37    | 1.24    | 1.31    | 1.32    | 1.3     | 1.32    |
| 21  | 1.19    | 1.31    | 1.24    | 1.39    | 1.26    | 1.33    | 1.28    |
| 22  | 1.19    | 1.3     | 1.23    | 1.36    | 1.26    | 1.32    | 1.3     |
| 23  | 1.19    | 1.31    | 1.26    | 1.31    | 1.26    | 1.3     | 1.28    |
| 24  | 1.2     | 1.32    | 1.25    | 1.32    | 1.28    | 1.3     | 1.28    |
| 25  | 1.12    | 1.2     | 1.08    | 1.07    | 1.05    | 1.18    | 1.07    |
| 26  | 1.1     | 1.24    | 1.09    | 1.07    | 1.06    | 1.13    | 1.09    |
| 27  | 1.12    | 1.24    | 1.08    | 1.06    | 1.05    | 1.18    | 1.07    |
| 28  | 1.17    | 1.2     | 1.08    | 1.06    | 1.06    | 1.11    | 1.07    |
| 29  | 1.12    | 1.21    | 1.08    | 1.06    | 1.05    | 1.11    | 1.08    |
| 30  | 1.11    | 1.18    | 1.09    | 1.06    | 1.05    | 1.13    | 1.07    |
| 31  | 1.12    | 1.19    | 1.15    | 1.06    | 1.06    | 1.12    | 1.07    |
| 32  | 1.13    | 1.22    | 1.14    | 1.16    | 1.06    | 1.12    | 1.14    |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.34    | 1.27    | 1.28    | 1.27    | 1.34    | 1.34    | 1.28    |
| 4   | 4.26    | 4.15    | 4.6     | 4.09    | 4.26    | 4.15    | 4.24    |
| 5   | 1.26    | 1.21    | 1.17    | 1.19    | 1.26    | 1.25    | 1.21    |
| 6   | 1.18    | 1.22    | 1.18    | 1.19    | 1.26    | 1.25    | 1.21    |
| 7   | 1.24    | 1.29    | 1.17    | 1.15    | 1.26    | 1.33    | 1.21    |
| 8   | 1.3     | 1.24    | 1.22    | 1.21    | 1.26    | 1.26    | 1.23    |
| 9   | 1.09    | 1.08    | 1.1     | 1.08    | 1.17    | 1.11    | 1.13    |
| 10  | 1.1     | 1.07    | 1.1     | 1.07    | 1.17    | 1.12    | 1.12    |
| 11  | 1.1     | 1.08    | 1.12    | 1.15    | 1.13    | 1.12    | 1.12    |
| 12  | 1.12    | 1.08    | 1.12    | 1.06    | 1.13    | 1.11    | 1.12    |
| 13  | 1.1     | 1.08    | 1.11    | 1.04    | 1.12    | 1.11    | 1.12    |
| 14  | 1.1     | 1.11    | 1.1     | 1.07    | 1.11    | 1.09    | 1.16    |
| 15  | 1.12    | 1.08    | 1.1     | 1.08    | 1.13    | 1.11    | 1.11    |
| 16  | 1.12    | 1.07    | 1.12    | 1.1     | 1.13    | 1.11    | 1.17    |
| 17  | 1.33    | 1.22    | 1.21    | 1.23    | 1.27    | 1.24    | 1.42    |
| 18  | 1.32    | 1.27    | 1.21    | 1.21    | 1.26    | 1.24    | 1.26    |
| 19  | 1.33    | 1.21    | 1.21    | 1.2     | 1.26    | 1.23    | 1.27    |
| 20  | 1.36    | 1.21    | 1.2     | 1.27    | 1.26    | 1.24    | 1.28    |
| 21  | 1.3     | 1.21    | 1.21    | 1.22    | 1.26    | 1.24    | 1.28    |
| 22  | 1.38    | 1.22    | 1.2     | 1.21    | 1.26    | 1.23    | 1.27    |
| 23  | 1.33    | 1.22    | 1.2     | 1.2     | 1.26    | 1.24    | 1.28    |
| 24  | 1.32    | 1.23    | 1.21    | 1.21    | 1.33    | 1.24    | 1.28    |
| 25  | 1.12    | 1.09    | 1.1     | 1.06    | 1.12    | 1.07    | 1.18    |
| 26  | 1.12    | 1.09    | 1.09    | 1.08    | 1.13    | 1.13    | 1.12    |
| 27  | 1.12    | 1.09    | 1.12    | 1.07    | 1.12    | 1.09    | 1.21    |
| 28  | 1.12    | 1.09    | 1.09    | 1.07    | 1.11    | 1.12    | 1.13    |
| 29  | 1.12    | 1.07    | 1.08    | 1.07    | 1.12    | 1.09    | 1.22    |
| 30  | 1.11    | 1.14    | 1.06    | 1.06    | 1.16    | 1.09    | 1.13    |
| 31  | 1.12    | 1.09    | 1.08    | 1.12    | 1.12    | 1.08    | 1.12    |
| 32  | 1.12    | 1.09    | 1.07    | 1.07    | 1.13    | 1.09    | 1.13    |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.26    | 1.34    | 1.3     | 1.27    | 1.29    | 1.35    | 1.35    |
| 4   | 4.12    | 4.23    | 4.08    | 4.4     | 4.11    | 4.08    | 4.55    |
| 5   | 1.18    | 1.26    | 1.23    | 1.19    | 1.18    | 1.25    | 1.23    |
| 6   | 1.19    | 1.25    | 1.23    | 1.15    | 1.19    | 1.26    | 1.22    |
| 7   | 1.14    | 1.26    | 1.23    | 1.19    | 1.18    | 1.26    | 1.24    |
| 8   | 1.21    | 1.2     | 1.2     | 1.21    | 1.23    | 1.29    | 1.26    |
| 9   | 1.1     | 1.14    | 1.11    | 1.12    | 1.13    | 1.03    | 1.03    |
| 10  | 1.09    | 1.15    | 1.13    | 1.11    | 1.07    | 1.02    | 1.03    |
| 11  | 1.09    | 1.15    | 1.13    | 1.13    | 1.07    | 1.02    | 1.02    |
| 12  | 1.1     | 1.14    | 1.12    | 1.13    | 1.07    | 1.02    | 1.03    |
| 13  | 1.17    | 1.13    | 1.13    | 1.14    | 1.07    | 1.03    | 1.03    |
| 14  | 1.09    | 1.14    | 1.13    | 1.13    | 1.08    | 1.1     | 1.03    |
| 15  | 1.11    | 1.2     | 1.13    | 1.14    | 1.07    | 1.03    | 1.04    |
| 16  | 1.1     | 1.15    | 1.15    | 1.14    | 1.15    | 1.0     | 1.04    |
| 17  | 1.24    | 1.3     | 1.24    | 1.28    | 1.21    | 1.24    | 1.24    |
| 18  | 1.24    | 1.34    | 1.24    | 1.28    | 1.2     | 1.25    | 1.33    |
| 19  | 1.24    | 1.27    | 1.24    | 1.29    | 1.21    | 1.26    | 1.23    |
| 20  | 1.25    | 1.27    | 1.31    | 1.28    | 1.21    | 1.24    | 1.24    |
| 21  | 1.3     | 1.28    | 1.24    | 1.28    | 1.27    | 1.26    | 1.25    |
| 22  | 1.24    | 1.27    | 1.24    | 1.29    | 1.22    | 1.24    | 1.29    |
| 23  | 1.24    | 1.28    | 1.29    | 1.28    | 1.25    | 1.25    | 1.24    |
| 24  | 1.27    | 1.27    | 1.25    | 1.28    | 1.22    | 1.26    | 1.24    |
| 25  | 1.09    | 1.07    | 1.15    | 1.12    | 1.09    | 1.11    | 1.07    |
| 26  | 1.09    | 1.08    | 1.13    | 1.11    | 1.14    | 1.11    | 1.08    |
| 27  | 1.08    | 1.13    | 1.07    | 1.12    | 1.09    | 1.11    | 1.08    |
| 28  | 1.06    | 1.06    | 1.06    | 1.2     | 1.08    | 1.11    | 1.08    |
| 29  | 1.08    | 1.09    | 1.16    | 1.11    | 1.09    | 1.09    | 1.07    |
| 30  | 1.09    | 1.08    | 1.07    | 1.1     | 1.09    | 1.12    | 1.09    |
| 31  | 1.09    | 1.09    | 1.08    | 1.11    | 1.07    | 1.11    | 1.08    |
| 32  | 1.09    | 1.16    | 1.07    | 1.18    | 1.17    | 1.19    | 1.09    |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 1.14    | 1.16    |
| 4   | 3.67    | 3.71    |
| 5   | 1.07    | 1.1     |
| 6   | 1.07    | 1.1     |
| 7   | 1.07    | 1.1     |
| 8   | 1.08    | 1.11    |
| 9   | 1.12    | 1.08    |
| 10  | 1.06    | 1.05    |
| 11  | 1.06    | 1.05    |
| 12  | 1.06    | 1.05    |
| 13  | 1.05    | 1.13    |
| 14  | 1.07    | 1.05    |
| 15  | 1.06    | 1.05    |
| 16  | 1.08    | 1.09    |
| 17  | 1.26    | 1.25    |
| 18  | 1.28    | 1.26    |
| 19  | 1.32    | 1.32    |
| 20  | 1.26    | 1.26    |
| 21  | 1.26    | 1.26    |
| 22  | 1.27    | 1.24    |
| 23  | 1.27    | 1.26    |
| 24  | 1.28    | 1.26    |
| 25  | 1.07    | 1.07    |
| 26  | 1.08    | 1.06    |
| 27  | 1.06    | 1.06    |
| 28  | 1.08    | 1.06    |
| 29  | 1.07    | 1.07    |
| 30  | 1.07    | 1.16    |
| 31  | 1.07    | 1.07    |
| 32  | 1.07    | 1.07    |

Blaze now beats PaddedMatrices when $M\leq8$ (with $M\neq4$) and $N=3$ or $N=4$. I’ll look into that later to improve performance for that case.
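
To make that pattern easier to see, here is a small sketch that takes the ratios from the top-left corner of the table above (rows $M=3,\ldots,8$, columns $N=3$ and $N=4$) and lists the sizes where the ratio falls below 1. It assumes the table reports the Blaze time divided by the Julia time, so values under 1 mean Blaze was faster. The variable names are mine and are not part of the benchmark code.

```julia
# Ratios copied by hand from the Blaze table above:
# rows are M = 3:8, columns are N = 3:4.
Ms = 3:8
Ns = 3:4
ratios = [0.992 0.957;
          2.68  2.8;
          0.78  0.841;
          0.783 0.855;
          0.782 0.858;
          0.801 0.828]

# Collect the (M, N) pairs where the ratio is below 1
# (assumed to mean that Blaze was the faster of the two).
blaze_wins = Tuple{Int,Int}[]
for (i, M) in enumerate(Ms), (j, N) in enumerate(Ns)
    ratios[i, j] < 1 && push!(blaze_wins, (M, N))
end

println(blaze_wins)
```

Every pair with $M=4$ is missing from the result, which is the exception noted above.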

Other than this case, Blaze comes close but lags behind.
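
As a rough illustration of why padding makes multiples of 8 the interesting boundaries for $M$, the sketch below rounds a row count up to the next multiple of an assumed vector width of 8 doubles. Here padded_rows is a hypothetical helper written for this post, not a function from Blaze or PaddedMatrices, and the width of 8 is an assumption taken from the "multiple of 8" expectation above.

```julia
# Hypothetical helper: round the row count M up to the next multiple of W.
# W = 8 is an assumed vector width (8 Float64 values per vector).
padded_rows(M::Integer, W::Integer = 8) = cld(M, W) * W

for M in (3, 8, 9, 16, 17, 24, 25, 32)
    println("M = ", M, " is stored with ", padded_rows(M), " rows")
end
```

Under this rounding, $M=3$ through $8$ all occupy 8 rows, $M=9$ through $16$ occupy 16, and so on, which lines up with the bands of similar ratios visible in the rows of the tables above.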

gfortran did worse than Julia across the board, so we’re not expecting any changes there.
The only thing I’m looking for is gfortran with tree-cunrolli doing better when $M\leq8$, and gfortran without it doing better when $9\leq M\leq16$. Beyond that, they should perform similarly.

In [27]:
print_results("gfortran", "native julia", minimum_times, padded = true)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 2.8     | 3.09    | 3.4     | 3.73    | 3.46    | 3.75    | 3.65    |
| 4   | 2.78    | 3.13    | 3.43    | 3.65    | 3.67    | 3.44    | 3.79    |
| 5   | 3.03    | 3.74    | 3.89    | 4.22    | 4.23    | 4.22    | 4.29    |
| 6   | 3.08    | 3.76    | 3.93    | 4.25    | 4.38    | 4.46    | 4.28    |
| 7   | 3.03    | 3.76    | 3.58    | 4.28    | 4.21    | 4.23    | 4.29    |
| 8   | 3.04    | 3.4     | 3.93    | 4.16    | 4.04    | 4.47    | 4.15    |
| 9   | 3.2     | 3.62    | 3.54    | 3.44    | 3.41    | 3.28    | 3.28    |
| 10  | 3.11    | 3.65    | 3.45    | 3.45    | 3.41    | 3.21    | 3.24    |
| 11  | 3.15    | 3.63    | 3.48    | 3.44    | 3.4     | 3.37    | 3.22    |
| 12  | 3.19    | 3.61    | 3.51    | 3.44    | 3.42    | 3.36    | 3.33    |
| 13  | 3.13    | 3.66    | 3.53    | 3.44    | 3.44    | 3.37    | 3.32    |
| 14  | 3.16    | 3.57    | 3.54    | 3.45    | 3.41    | 3.36    | 3.33    |
| 15  | 3.16    | 3.47    | 3.51    | 3.44    | 3.41    | 3.36    | 3.33    |
| 16  | 2.95    | 3.59    | 3.46    | 3.45    | 3.4     | 3.32    | 3.26    |
| 17  | 2.69    | 2.72    | 2.73    | 2.64    | 2.62    | 2.56    | 2.57    |
| 18  | 2.59    | 2.73    | 2.72    | 2.67    | 2.65    | 2.63    | 2.62    |
| 19  | 2.58    | 2.69    | 2.67    | 2.67    | 2.67    | 2.63    | 2.62    |
| 20  | 2.7     | 2.72    | 2.66    | 2.68    | 2.61    | 2.62    | 2.61    |
| 21  | 2.69    | 2.73    | 2.71    | 2.68    | 2.65    | 2.63    | 2.51    |
| 22  | 2.57    | 2.68    | 2.67    | 2.68    | 2.66    | 2.66    | 2.62    |
| 23  | 2.54    | 2.73    | 2.72    | 2.67    | 2.59    | 2.62    | 2.61    |
| 24  | 2.68    | 2.72    | 2.72    | 2.6     | 2.56    | 2.63    | 2.62    |
| 25  | 2.43    | 2.5     | 2.53    | 2.47    | 2.24    | 2.39    | 2.48    |
| 26  | 2.43    | 2.52    | 2.53    | 2.53    | 2.22    | 2.4     | 2.43    |
| 27  | 2.51    | 2.5     | 2.4     | 2.47    | 2.23    | 2.42    | 2.45    |
| 28  | 2.43    | 2.51    | 2.52    | 2.52    | 2.22    | 2.39    | 2.44    |
| 29  | 2.51    | 2.51    | 2.46    | 2.46    | 2.22    | 2.39    | 2.47    |
| 30  | 2.51    | 2.52    | 2.53    | 2.48    | 2.22    | 2.38    | 2.45    |
| 31  | 2.6     | 2.35    | 2.48    | 2.5     | 2.21    | 2.4     | 2.46    |
| 32  | 2.51    | 2.44    | 2.4     | 2.52    | 2.22    | 2.37    | 2.47    |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 3.8     | 3.85    | 3.85    | 3.73    | 3.89    | 3.88    | 3.71    |
| 4   | 3.61    | 3.86    | 3.85    | 3.87    | 3.81    | 3.88    | 3.88    |
| 5   | 4.28    | 4.31    | 4.22    | 4.3     | 4.29    | 4.27    | 4.03    |
| 6   | 4.27    | 4.3     | 4.24    | 4.29    | 4.29    | 4.31    | 4.25    |
| 7   | 4.28    | 4.3     | 4.22    | 4.3     | 4.1     | 4.3     | 4.25    |
| 8   | 4.28    | 4.3     | 4.1     | 4.3     | 4.27    | 4.25    | 4.03    |
| 9   | 3.33    | 3.3     | 3.29    | 3.29    | 3.28    | 2.87    | 3.01    |
| 10  | 3.27    | 3.31    | 3.36    | 3.29    | 3.29    | 2.88    | 3.02    |
| 11  | 3.3     | 3.25    | 3.24    | 3.29    | 3.33    | 2.88    | 3.03    |
| 12  | 3.2     | 3.29    | 3.31    | 3.24    | 3.28    | 2.87    | 3.04    |
| 13  | 3.33    | 3.21    | 3.21    | 3.24    | 3.24    | 2.87    | 3.02    |
| 14  | 3.37    | 3.36    | 3.15    | 3.2     | 3.25    | 2.87    | 3.04    |
| 15  | 3.37    | 3.36    | 3.29    | 3.16    | 3.23    | 2.87    | 3.04    |
| 16  | 3.32    | 3.36    | 3.29    | 3.13    | 3.24    | 2.87    | 3.03    |
| 17  | 2.38    | 2.51    | 2.52    | 2.5     | 2.53    | 2.47    | 2.51    |
| 18  | 2.38    | 2.49    | 2.52    | 2.49    | 2.5     | 2.49    | 2.53    |
| 19  | 2.38    | 2.49    | 2.45    | 2.47    | 2.5     | 2.49    | 2.53    |
| 20  | 2.3     | 2.49    | 2.48    | 2.5     | 2.5     | 2.46    | 2.48    |
| 21  | 2.38    | 2.49    | 2.48    | 2.49    | 2.53    | 2.52    | 2.49    |
| 22  | 2.38    | 2.49    | 2.46    | 2.48    | 2.51    | 2.49    | 2.55    |
| 23  | 2.41    | 2.49    | 2.52    | 2.49    | 2.54    | 2.47    | 2.49    |
| 24  | 2.38    | 2.49    | 2.48    | 2.49    | 2.5     | 2.44    | 2.52    |
| 25  | 2.52    | 2.53    | 2.43    | 2.29    | 2.38    | 2.41    | 2.42    |
| 26  | 2.48    | 2.52    | 2.42    | 2.27    | 2.37    | 2.41    | 2.43    |
| 27  | 2.5     | 2.52    | 2.43    | 2.27    | 2.37    | 2.41    | 2.42    |
| 28  | 2.51    | 2.53    | 2.43    | 2.28    | 2.37    | 2.4     | 2.42    |
| 29  | 2.48    | 2.52    | 2.43    | 2.27    | 2.37    | 2.4     | 2.43    |
| 30  | 2.5     | 2.52    | 2.43    | 2.27    | 2.39    | 2.41    | 2.42    |
| 31  | 2.52    | 2.52    | 2.43    | 2.27    | 2.37    | 2.41    | 2.42    |
| 32  | 2.53    | 2.53    | 2.43    | 2.27    | 2.37    | 2.4     | 2.41    |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 3.66    | 3.57    | 3.72    | 3.71    | 3.71    | 3.69    | 3.64    |
| 4   | 3.72    | 3.72    | 3.72    | 3.66    | 3.71    | 3.69    | 3.67    |
| 5   | 4.31    | 4.17    | 4.15    | 4.18    | 4.17    | 4.19    | 4.22    |
| 6   | 3.95    | 4.2     | 4.19    | 4.18    | 4.17    | 4.17    | 4.22    |
| 7   | 4.16    | 4.17    | 4.15    | 4.1     | 4.17    | 4.18    | 4.22    |
| 8   | 4.25    | 4.25    | 4.23    | 4.19    | 4.13    | 4.17    | 4.22    |
| 9   | 3.0     | 3.04    | 3.04    | 3.08    | 3.09    | 3.07    | 3.07    |
| 10  | 3.03    | 3.01    | 3.04    | 3.09    | 3.07    | 3.08    | 3.04    |
| 11  | 3.03    | 3.04    | 3.03    | 3.05    | 3.08    | 3.05    | 3.04    |
| 12  | 3.11    | 3.05    | 2.99    | 3.02    | 3.09    | 3.07    | 3.03    |
| 13  | 3.03    | 3.05    | 3.05    | 2.94    | 3.05    | 3.07    | 3.04    |
| 14  | 3.08    | 3.08    | 3.07    | 3.07    | 3.03    | 3.03    | 3.01    |
| 15  | 3.08    | 3.04    | 3.05    | 3.06    | 3.07    | 3.03    | 3.01    |
| 16  | 3.08    | 3.03    | 3.05    | 3.05    | 3.07    | 3.06    | 3.03    |
| 17  | 2.5     | 2.36    | 2.26    | 2.32    | 2.36    | 2.38    | 2.4     |
| 18  | 2.48    | 2.36    | 2.29    | 2.37    | 2.36    | 2.38    | 2.37    |
| 19  | 2.51    | 2.36    | 2.29    | 2.35    | 2.35    | 2.36    | 2.38    |
| 20  | 2.47    | 2.36    | 2.26    | 2.37    | 2.38    | 2.38    | 2.39    |
| 21  | 2.43    | 2.36    | 2.26    | 2.35    | 2.36    | 2.38    | 2.39    |
| 22  | 2.48    | 2.37    | 2.26    | 2.35    | 2.35    | 2.36    | 2.38    |
| 23  | 2.5     | 2.36    | 2.26    | 2.34    | 2.35    | 2.38    | 2.4     |
| 24  | 2.47    | 2.37    | 2.29    | 2.37    | 2.35    | 2.38    | 2.39    |
| 25  | 2.43    | 2.48    | 2.37    | 2.4     | 2.45    | 2.42    | 2.48    |
| 26  | 2.42    | 2.46    | 2.37    | 2.44    | 2.47    | 2.43    | 2.44    |
| 27  | 2.43    | 2.46    | 2.29    | 2.4     | 2.46    | 2.46    | 2.47    |
| 28  | 2.42    | 2.46    | 2.37    | 2.43    | 2.44    | 2.4     | 2.48    |
| 29  | 2.43    | 2.42    | 2.34    | 2.41    | 2.43    | 2.48    | 2.45    |
| 30  | 2.42    | 2.46    | 2.32    | 2.41    | 2.41    | 2.45    | 2.47    |
| 31  | 2.43    | 2.44    | 2.35    | 2.42    | 2.45    | 2.45    | 2.48    |
| 32  | 2.42    | 2.44    | 2.32    | 2.41    | 2.46    | 2.48    | 2.49    |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 3.63    | 3.66    | 3.6     | 3.63    | 3.61    | 3.61    | 3.61    |
| 4   | 3.64    | 3.66    | 3.59    | 3.5     | 3.61    | 3.61    | 3.61    |
| 5   | 4.16    | 4.17    | 4.14    | 4.14    | 4.02    | 4.07    | 4.07    |
| 6   | 4.19    | 4.17    | 4.15    | 4.02    | 4.04    | 4.08    | 4.04    |
| 7   | 4.02    | 4.17    | 4.14    | 4.14    | 4.02    | 4.07    | 4.09    |
| 8   | 4.16    | 3.96    | 4.02    | 4.14    | 4.1     | 4.15    | 4.12    |
| 9   | 3.08    | 3.09    | 3.05    | 3.04    | 2.95    | 2.73    | 2.75    |
| 10  | 3.05    | 3.08    | 3.09    | 3.0     | 2.96    | 2.74    | 2.74    |
| 11  | 3.05    | 3.07    | 3.07    | 3.08    | 2.94    | 2.73    | 2.73    |
| 12  | 3.06    | 3.07    | 3.07    | 3.09    | 2.93    | 2.72    | 2.73    |
| 13  | 3.06    | 3.08    | 3.1     | 3.05    | 2.93    | 2.74    | 2.71    |
| 14  | 3.07    | 3.08    | 3.1     | 3.05    | 2.93    | 2.75    | 2.75    |
| 15  | 3.04    | 3.07    | 3.09    | 3.05    | 2.93    | 2.74    | 2.76    |
| 16  | 3.05    | 3.06    | 3.09    | 3.05    | 2.94    | 2.66    | 2.73    |
| 17  | 2.4     | 2.4     | 2.41    | 2.41    | 2.34    | 2.35    | 2.41    |
| 18  | 2.41    | 2.38    | 2.41    | 2.42    | 2.32    | 2.41    | 2.43    |
| 19  | 2.42    | 2.4     | 2.39    | 2.44    | 2.34    | 2.43    | 2.37    |
| 20  | 2.4     | 2.4     | 2.4     | 2.43    | 2.34    | 2.39    | 2.44    |
| 21  | 2.41    | 2.41    | 2.4     | 2.43    | 2.33    | 2.39    | 2.42    |
| 22  | 2.4     | 2.4     | 2.4     | 2.44    | 2.39    | 2.37    | 2.38    |
| 23  | 2.4     | 2.44    | 2.4     | 2.46    | 2.35    | 2.38    | 2.44    |
| 24  | 2.37    | 2.38    | 2.43    | 2.45    | 2.35    | 2.42    | 2.43    |
| 25  | 2.46    | 2.35    | 2.44    | 7.35    | 6.99    | 7.04    | 6.87    |
| 26  | 2.47    | 2.41    | 2.45    | 7.26    | 7.08    | 6.92    | 7.13    |
| 27  | 2.43    | 2.38    | 2.45    | 7.27    | 7.06    | 7.08    | 6.92    |
| 28  | 2.42    | 2.34    | 2.43    | 7.3     | 6.95    | 7.03    | 6.96    |
| 29  | 2.47    | 2.41    | 2.44    | 8.12    | 7.21    | 7.13    | 6.88    |
| 30  | 2.46    | 2.38    | 2.43    | 7.18    | 7.75    | 6.98    | 7.77    |
| 31  | 2.46    | 2.41    | 2.46    | 7.55    | 6.89    | 6.96    | 7.18    |
| 32  | 2.47    | 2.38    | 2.43    | 7.24    | 7.46    | 7.55    | 6.92    |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 3.18    | 3.25    |
| 4   | 3.17    | 3.25    |
| 5   | 3.61    | 3.7     |
| 6   | 3.61    | 3.7     |
| 7   | 3.62    | 3.7     |
| 8   | 3.61    | 3.7     |
| 9   | 2.8     | 2.88    |
| 10  | 2.79    | 2.91    |
| 11  | 2.78    | 2.9     |
| 12  | 2.79    | 2.88    |
| 13  | 2.79    | 2.91    |
| 14  | 2.82    | 2.87    |
| 15  | 2.8     | 2.92    |
| 16  | 2.79    | 2.88    |
| 17  | 2.44    | 2.47    |
| 18  | 2.48    | 2.45    |
| 19  | 2.41    | 2.44    |
| 20  | 2.42    | 2.48    |
| 21  | 2.44    | 2.49    |
| 22  | 2.45    | 2.41    |
| 23  | 2.46    | 2.48    |
| 24  | 2.46    | 2.46    |
| 25  | 6.7     | 6.57    |
| 26  | 6.69    | 6.78    |
| 27  | 6.63    | 6.57    |
| 28  | 6.73    | 6.76    |
| 29  | 6.91    | 6.58    |
| 30  | 6.57    | 6.57    |
| 31  | 6.71    | 6.62    |
| 32  | 6.62    | 6.77    |

In [28]:
print_results("gfortran tc", "native julia", minimum_times, padded = true)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.5     | 1.71    | 1.77    | 1.9     | 1.77    | 1.87    | 1.83    |
| 4   | 1.5     | 1.59    | 1.77    | 1.88    | 1.89    | 1.74    | 1.9     |
| 5   | 2.82    | 2.99    | 3.25    | 3.45    | 3.43    | 3.33    | 3.29    |
| 6   | 2.85    | 3.05    | 3.31    | 3.52    | 3.57    | 3.53    | 3.27    |
| 7   | 2.82    | 3.04    | 3.02    | 3.46    | 3.42    | 3.34    | 3.27    |
| 8   | 2.82    | 2.75    | 3.3     | 3.37    | 3.28    | 3.54    | 3.25    |
| 9   | 8.26    | 9.45    | 9.83    | 9.94    | 10.0    | 9.8     | 9.94    |
| 10  | 8.16    | 9.74    | 9.74    | 9.94    | 10.1    | 9.59    | 9.81    |
| 11  | 8.25    | 9.64    | 9.85    | 9.92    | 10.0    | 10.1    | 9.56    |
| 12  | 8.35    | 9.44    | 9.75    | 9.94    | 10.0    | 10.0    | 10.1    |
| 13  | 8.27    | 9.71    | 9.82    | 9.92    | 10.0    | 10.1    | 10.1    |
| 14  | 8.31    | 9.48    | 9.86    | 9.96    | 10.1    | 10.0    | 10.1    |
| 15  | 8.32    | 9.06    | 9.77    | 9.94    | 10.1    | 10.1    | 10.1    |
| 16  | 7.77    | 9.36    | 9.77    | 9.95    | 10.0    | 9.86    | 9.84    |
| 17  | 2.69    | 2.72    | 2.73    | 2.61    | 2.58    | 2.56    | 2.57    |
| 18  | 2.59    | 2.76    | 2.72    | 2.67    | 2.65    | 2.63    | 2.62    |
| 19  | 2.58    | 2.66    | 2.67    | 2.67    | 2.66    | 2.63    | 2.62    |
| 20  | 2.69    | 2.72    | 2.66    | 2.68    | 2.58    | 2.62    | 2.61    |
| 21  | 2.69    | 2.76    | 2.71    | 2.68    | 2.68    | 2.63    | 2.51    |
| 22  | 2.57    | 2.65    | 2.67    | 2.7     | 2.66    | 2.63    | 2.62    |
| 23  | 2.54    | 2.73    | 2.72    | 2.67    | 2.62    | 2.62    | 2.64    |
| 24  | 2.68    | 2.72    | 2.72    | 2.63    | 2.53    | 2.63    | 2.62    |
| 25  | 2.43    | 2.51    | 2.53    | 2.47    | 2.22    | 2.41    | 2.48    |
| 26  | 2.44    | 2.56    | 2.53    | 2.56    | 2.22    | 2.4     | 2.43    |
| 27  | 2.51    | 2.51    | 2.4     | 2.47    | 2.22    | 2.4     | 2.45    |
| 28  | 2.43    | 2.51    | 2.52    | 2.52    | 2.22    | 2.39    | 2.44    |
| 29  | 2.51    | 2.51    | 2.46    | 2.48    | 2.23    | 2.39    | 2.47    |
| 30  | 2.51    | 2.52    | 2.53    | 2.48    | 2.22    | 2.37    | 2.45    |
| 31  | 2.5     | 2.35    | 2.48    | 2.5     | 2.21    | 2.41    | 2.46    |
| 32  | 2.51    | 2.4     | 2.4     | 2.52    | 2.22    | 2.36    | 2.47    |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.91    | 1.9     | 1.88    | 1.98    | 1.9     | 1.85    | 1.78    |
| 4   | 1.82    | 1.93    | 1.88    | 1.89    | 2.02    | 2.04    | 1.88    |
| 5   | 3.24    | 3.25    | 3.28    | 3.19    | 3.2     | 3.19    | 3.03    |
| 6   | 3.27    | 3.25    | 3.3     | 3.19    | 3.31    | 3.2     | 3.15    |
| 7   | 3.35    | 3.27    | 3.19    | 3.19    | 3.17    | 3.15    | 3.15    |
| 8   | 3.24    | 3.25    | 3.08    | 3.19    | 3.18    | 3.12    | 2.99    |
| 9   | 10.1    | 10.1    | 10.0    | 10.1    | 10.1    | 8.83    | 9.32    |
| 10  | 9.89    | 9.88    | 10.1    | 10.1    | 10.1    | 8.82    | 9.3     |
| 11  | 9.82    | 9.91    | 9.9     | 10.1    | 10.1    | 8.83    | 9.35    |
| 12  | 9.52    | 9.85    | 9.9     | 9.92    | 10.0    | 8.82    | 9.37    |
| 13  | 10.1    | 9.61    | 9.77    | 9.87    | 9.93    | 8.82    | 9.33    |
| 14  | 10.0    | 10.1    | 9.61    | 9.79    | 9.91    | 8.81    | 9.37    |
| 15  | 10.0    | 10.1    | 10.1    | 9.68    | 9.88    | 8.85    | 9.36    |
| 16  | 10.1    | 10.1    | 10.1    | 9.56    | 9.91    | 8.82    | 9.37    |
| 17  | 2.38    | 2.51    | 2.52    | 2.51    | 2.5     | 2.46    | 2.51    |
| 18  | 2.38    | 2.49    | 2.52    | 2.49    | 2.5     | 2.52    | 2.53    |
| 19  | 2.38    | 2.48    | 2.45    | 2.47    | 2.5     | 2.49    | 2.53    |
| 20  | 2.3     | 2.49    | 2.48    | 2.5     | 2.5     | 2.46    | 2.48    |
| 21  | 2.38    | 2.49    | 2.48    | 2.49    | 2.5     | 2.49    | 2.5     |
| 22  | 2.38    | 2.49    | 2.46    | 2.48    | 2.5     | 2.5     | 2.53    |
| 23  | 2.38    | 2.49    | 2.52    | 2.49    | 2.54    | 2.5     | 2.5     |
| 24  | 2.38    | 2.53    | 2.48    | 2.49    | 2.5     | 2.45    | 2.49    |
| 25  | 2.52    | 2.53    | 2.43    | 2.28    | 2.37    | 2.41    | 2.42    |
| 26  | 2.48    | 2.52    | 2.42    | 2.27    | 2.37    | 2.43    | 2.43    |
| 27  | 2.51    | 2.52    | 2.44    | 2.29    | 2.37    | 2.41    | 2.42    |
| 28  | 2.51    | 2.53    | 2.43    | 2.28    | 2.39    | 2.4     | 2.42    |
| 29  | 2.48    | 2.52    | 2.42    | 2.27    | 2.37    | 2.41    | 2.42    |
| 30  | 2.5     | 2.52    | 2.42    | 2.27    | 2.37    | 2.41    | 2.42    |
| 31  | 2.51    | 2.52    | 2.42    | 2.27    | 2.37    | 2.41    | 2.42    |
| 32  | 2.52    | 2.53    | 2.43    | 2.27    | 2.37    | 2.4     | 2.41    |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.68    | 1.64    | 1.72    | 1.7     | 1.89    | 1.7     | 1.66    |
| 4   | 1.7     | 1.71    | 1.72    | 1.67    | 1.7     | 1.73    | 1.67    |
| 5   | 3.16    | 3.09    | 3.09    | 3.12    | 3.15    | 3.12    | 3.16    |
| 6   | 2.94    | 3.09    | 3.11    | 3.23    | 3.14    | 3.15    | 3.14    |
| 7   | 3.12    | 3.14    | 3.12    | 3.05    | 3.14    | 3.15    | 3.16    |
| 8   | 3.16    | 3.15    | 3.18    | 3.12    | 3.11    | 3.11    | 3.13    |
| 9   | 9.28    | 9.48    | 9.55    | 9.66    | 9.72    | 9.76    | 9.89    |
| 10  | 9.39    | 9.39    | 9.59    | 9.59    | 9.75    | 9.78    | 9.81    |
| 11  | 9.38    | 9.51    | 9.47    | 9.66    | 9.79    | 9.68    | 9.85    |
| 12  | 9.57    | 9.51    | 9.38    | 9.52    | 9.72    | 9.78    | 9.82    |
| 13  | 9.37    | 9.48    | 9.53    | 9.22    | 9.64    | 9.78    | 9.81    |
| 14  | 9.39    | 9.45    | 9.57    | 9.59    | 9.61    | 9.68    | 9.75    |
| 15  | 9.53    | 9.45    | 9.57    | 9.66    | 9.74    | 9.65    | 9.72    |
| 16  | 9.53    | 9.31    | 9.56    | 9.59    | 9.77    | 9.76    | 9.8     |
| 17  | 2.5     | 2.36    | 2.26    | 2.32    | 2.36    | 2.38    | 2.4     |
| 18  | 2.48    | 2.36    | 2.26    | 2.35    | 2.36    | 2.38    | 2.37    |
| 19  | 2.51    | 2.36    | 2.26    | 2.35    | 2.35    | 2.36    | 2.38    |
| 20  | 2.47    | 2.36    | 2.25    | 2.35    | 2.35    | 2.38    | 2.42    |
| 21  | 2.43    | 2.36    | 2.28    | 2.35    | 2.36    | 2.38    | 2.39    |
| 22  | 2.48    | 2.37    | 2.26    | 2.34    | 2.35    | 2.36    | 2.38    |
| 23  | 2.5     | 2.36    | 2.26    | 2.35    | 2.35    | 2.38    | 2.4     |
| 24  | 2.48    | 2.37    | 2.26    | 2.37    | 2.35    | 2.38    | 2.39    |
| 25  | 2.44    | 2.46    | 2.37    | 2.42    | 2.45    | 2.44    | 2.48    |
| 26  | 2.42    | 2.46    | 2.39    | 2.42    | 2.48    | 2.43    | 2.44    |
| 27  | 2.43    | 2.48    | 2.29    | 2.4     | 2.46    | 2.48    | 2.49    |
| 28  | 2.42    | 2.46    | 2.37    | 2.43    | 2.43    | 2.43    | 2.46    |
| 29  | 2.45    | 2.42    | 2.34    | 2.41    | 2.43    | 2.46    | 2.43    |
| 30  | 2.42    | 2.46    | 2.32    | 2.39    | 2.41    | 2.46    | 2.47    |
| 31  | 2.43    | 2.44    | 2.37    | 2.42    | 2.45    | 2.46    | 2.47    |
| 32  | 2.42    | 2.44    | 2.32    | 2.4     | 2.45    | 2.48    | 2.47    |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.66    | 1.67    | 1.67    | 1.69    | 1.68    | 1.69    | 1.65    |
| 4   | 1.71    | 1.68    | 1.84    | 1.61    | 1.65    | 1.66    | 1.86    |
| 5   | 3.1     | 3.12    | 3.23    | 3.09    | 3.0     | 3.05    | 3.06    |
| 6   | 3.22    | 3.11    | 3.23    | 3.03    | 3.02    | 3.06    | 3.02    |
| 7   | 2.99    | 3.12    | 3.1     | 3.09    | 3.01    | 3.05    | 3.06    |
| 8   | 3.09    | 2.96    | 3.0     | 3.12    | 3.07    | 3.15    | 3.1     |
| 9   | 9.89    | 9.96    | 9.85    | 9.96    | 9.66    | 9.0     | 9.12    |
| 10  | 9.85    | 9.93    | 10.0    | 9.87    | 9.69    | 9.01    | 9.16    |
| 11  | 9.82    | 9.93    | 9.98    | 10.0    | 9.68    | 9.0     | 9.08    |
| 12  | 9.9     | 9.94    | 9.98    | 10.0    | 9.64    | 8.97    | 9.09    |
| 13  | 9.85    | 9.93    | 10.0    | 10.0    | 9.6     | 9.04    | 9.05    |
| 14  | 9.83    | 9.93    | 10.0    | 10.0    | 9.64    | 9.03    | 9.17    |
| 15  | 9.79    | 9.93    | 9.97    | 10.0    | 9.64    | 9.04    | 9.16    |
| 16  | 9.77    | 9.93    | 10.1    | 10.0    | 9.67    | 8.77    | 9.11    |
| 17  | 2.4     | 2.41    | 2.41    | 2.41    | 2.35    | 2.35    | 2.41    |
| 18  | 2.4     | 2.38    | 2.43    | 2.42    | 2.35    | 2.41    | 2.42    |
| 19  | 2.39    | 2.4     | 2.4     | 2.44    | 2.37    | 2.4     | 2.38    |
| 20  | 2.4     | 2.4     | 2.4     | 2.43    | 2.34    | 2.36    | 2.44    |
| 21  | 2.38    | 2.43    | 2.4     | 2.43    | 2.33    | 2.39    | 2.46    |
| 22  | 2.4     | 2.4     | 2.4     | 2.46    | 2.39    | 2.4     | 2.37    |
| 23  | 2.4     | 2.41    | 2.4     | 2.43    | 2.35    | 2.41    | 2.44    |
| 24  | 2.37    | 2.38    | 2.43    | 2.42    | 2.37    | 2.39    | 2.4     |
| 25  | 2.46    | 2.35    | 2.44    | 7.29    | 7.0     | 7.04    | 6.95    |
| 26  | 2.47    | 2.41    | 2.44    | 7.17    | 7.12    | 6.9     | 6.86    |
| 27  | 2.43    | 2.4     | 2.45    | 7.27    | 7.02    | 7.82    | 6.99    |
| 28  | 2.43    | 2.36    | 2.43    | 8.11    | 6.97    | 7.4     | 6.96    |
| 29  | 2.47    | 2.41    | 2.44    | 7.33    | 7.02    | 6.93    | 6.88    |
| 30  | 2.48    | 2.38    | 2.46    | 7.23    | 7.57    | 6.98    | 7.78    |
| 31  | 2.46    | 2.41    | 2.44    | 7.25    | 6.88    | 7.19    | 7.11    |
| 32  | 2.47    | 2.38    | 2.44    | 7.28    | 7.34    | 6.95    | 6.9     |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 1.64    | 1.5     |
| 4   | 1.49    | 1.53    |
| 5   | 2.82    | 2.79    |
| 6   | 2.72    | 2.8     |
| 7   | 2.75    | 2.79    |
| 8   | 2.72    | 2.83    |
| 9   | 9.39    | 9.6     |
| 10  | 9.35    | 9.58    |
| 11  | 9.34    | 9.55    |
| 12  | 9.33    | 9.57    |
| 13  | 9.3     | 9.56    |
| 14  | 9.36    | 9.51    |
| 15  | 9.38    | 9.55    |
| 16  | 9.35    | 9.58    |
| 17  | 2.44    | 2.44    |
| 18  | 2.48    | 2.45    |
| 19  | 2.41    | 2.44    |
| 20  | 2.46    | 2.48    |
| 21  | 2.44    | 2.45    |
| 22  | 2.45    | 2.45    |
| 23  | 2.43    | 2.48    |
| 24  | 2.43    | 2.43    |
| 25  | 6.73    | 6.58    |
| 26  | 6.69    | 6.57    |
| 27  | 6.64    | 6.51    |
| 28  | 6.73    | 6.52    |
| 29  | 6.67    | 6.57    |
| 30  | 6.64    | 6.65    |
| 31  | 6.68    | 6.59    |
| 32  | 6.83    | 6.56    |

MKL JIT is still behind PaddedMatrices, but not by much: it takes about twice as long at the smallest sizes and only about 5 to 10% longer at the largest:

In [29]:
print_results("mkl jit", "native julia", minimum_times, padded = true)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 2.17    | 2.06    | 1.98    | 2.0     | 1.73    | 1.79    | 1.69    |
| 4   | 2.14    | 2.09    | 2.0     | 1.96    | 1.84    | 1.65    | 1.76    |
| 5   | 2.08    | 2.01    | 1.94    | 1.92    | 1.79    | 1.68    | 1.72    |
| 6   | 2.11    | 2.05    | 1.97    | 1.93    | 1.86    | 1.78    | 1.72    |
| 7   | 2.09    | 2.05    | 1.81    | 1.93    | 1.78    | 1.68    | 1.73    |
| 8   | 2.09    | 1.85    | 1.96    | 1.88    | 1.71    | 1.78    | 1.67    |
| 9   | 2.05    | 1.96    | 1.85    | 1.72    | 1.62    | 1.55    | 1.47    |
| 10  | 2.03    | 2.02    | 1.88    | 1.73    | 1.62    | 1.48    | 1.46    |
| 11  | 2.05    | 2.0     | 1.89    | 1.72    | 1.61    | 1.55    | 1.42    |
| 12  | 2.07    | 1.96    | 1.83    | 1.72    | 1.62    | 1.55    | 1.5     |
| 13  | 2.04    | 2.01    | 1.85    | 1.72    | 1.62    | 1.59    | 1.49    |
| 14  | 2.06    | 1.97    | 1.84    | 1.72    | 1.63    | 1.56    | 1.5     |
| 15  | 2.06    | 1.88    | 1.83    | 1.72    | 1.62    | 1.55    | 1.5     |
| 16  | 1.92    | 1.94    | 1.83    | 1.72    | 1.62    | 1.52    | 1.47    |
| 17  | 1.85    | 1.7     | 1.58    | 1.45    | 1.38    | 1.32    | 1.35    |
| 18  | 1.8     | 1.7     | 1.57    | 1.48    | 1.41    | 1.36    | 1.38    |
| 19  | 1.78    | 1.66    | 1.54    | 1.48    | 1.41    | 1.36    | 1.38    |
| 20  | 1.86    | 1.7     | 1.54    | 1.48    | 1.38    | 1.35    | 1.37    |
| 21  | 1.86    | 1.68    | 1.57    | 1.48    | 1.41    | 1.36    | 1.32    |
| 22  | 1.77    | 1.65    | 1.54    | 1.48    | 1.42    | 1.36    | 1.38    |
| 23  | 1.75    | 1.68    | 1.57    | 1.48    | 1.38    | 1.35    | 1.38    |
| 24  | 1.85    | 1.7     | 1.57    | 1.44    | 1.35    | 1.36    | 1.38    |
| 25  | 1.68    | 1.54    | 1.44    | 1.38    | 1.18    | 1.23    | 1.23    |
| 26  | 1.68    | 1.55    | 1.44    | 1.42    | 1.18    | 1.23    | 1.21    |
| 27  | 1.73    | 1.54    | 1.37    | 1.38    | 1.18    | 1.24    | 1.22    |
| 28  | 1.67    | 1.54    | 1.43    | 1.41    | 1.17    | 1.23    | 1.21    |
| 29  | 1.73    | 1.59    | 1.4     | 1.36    | 1.17    | 1.23    | 1.23    |
| 30  | 1.73    | 1.55    | 1.47    | 1.37    | 1.18    | 1.22    | 1.21    |
| 31  | 1.72    | 1.45    | 1.44    | 1.39    | 1.18    | 1.24    | 1.22    |
| 32  | 1.73    | 1.48    | 1.36    | 1.39    | 1.17    | 1.22    | 1.23    |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.69    | 1.65    | 1.6     | 1.47    | 1.52    | 1.49    | 1.4     |
| 4   | 1.6     | 1.66    | 1.6     | 1.52    | 1.49    | 1.49    | 1.46    |
| 5   | 1.66    | 1.63    | 1.57    | 1.5     | 1.51    | 1.46    | 1.37    |
| 6   | 1.66    | 1.62    | 1.58    | 1.5     | 1.51    | 1.47    | 1.44    |
| 7   | 1.66    | 1.63    | 1.57    | 1.5     | 1.44    | 1.47    | 1.44    |
| 8   | 1.66    | 1.62    | 1.52    | 1.5     | 1.5     | 1.45    | 1.37    |
| 9   | 1.45    | 1.41    | 1.38    | 1.36    | 1.47    | 1.18    | 1.21    |
| 10  | 1.42    | 1.39    | 1.38    | 1.36    | 1.48    | 1.18    | 1.21    |
| 11  | 1.41    | 1.39    | 1.36    | 1.36    | 1.47    | 1.18    | 1.21    |
| 12  | 1.37    | 1.38    | 1.36    | 1.34    | 1.47    | 1.18    | 1.22    |
| 13  | 1.45    | 1.35    | 1.34    | 1.33    | 1.45    | 1.17    | 1.21    |
| 14  | 1.48    | 1.41    | 1.32    | 1.32    | 1.46    | 1.18    | 1.21    |
| 15  | 1.45    | 1.41    | 1.41    | 1.3     | 1.45    | 1.18    | 1.22    |
| 16  | 1.45    | 1.41    | 1.38    | 1.29    | 1.45    | 1.18    | 1.21    |
| 17  | 1.22    | 1.26    | 1.26    | 1.25    | 1.33    | 1.17    | 1.17    |
| 18  | 1.22    | 1.26    | 1.26    | 1.25    | 1.33    | 1.19    | 1.18    |
| 19  | 1.22    | 1.28    | 1.23    | 1.24    | 1.33    | 1.19    | 1.18    |
| 20  | 1.18    | 1.26    | 1.25    | 1.26    | 1.33    | 1.17    | 1.16    |
| 21  | 1.22    | 1.25    | 1.24    | 1.25    | 1.33    | 1.17    | 1.16    |
| 22  | 1.22    | 1.28    | 1.23    | 1.24    | 1.33    | 1.19    | 1.2     |
| 23  | 1.22    | 1.26    | 1.26    | 1.25    | 1.33    | 1.18    | 1.19    |
| 24  | 1.22    | 1.26    | 1.24    | 1.25    | 1.33    | 1.16    | 1.16    |
| 25  | 1.23    | 1.23    | 1.16    | 1.06    | 1.27    | 1.14    | 1.12    |
| 26  | 1.2     | 1.21    | 1.15    | 1.06    | 1.27    | 1.14    | 1.12    |
| 27  | 1.22    | 1.21    | 1.15    | 1.07    | 1.27    | 1.13    | 1.12    |
| 28  | 1.23    | 1.22    | 1.17    | 1.06    | 1.27    | 1.13    | 1.13    |
| 29  | 1.21    | 1.21    | 1.16    | 1.06    | 1.27    | 1.12    | 1.14    |
| 30  | 1.22    | 1.21    | 1.17    | 1.06    | 1.27    | 1.14    | 1.13    |
| 31  | 1.22    | 1.21    | 1.15    | 1.06    | 1.27    | 1.14    | 1.13    |
| 32  | 1.23    | 1.22    | 1.15    | 1.07    | 1.27    | 1.13    | 1.11    |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.41    | 1.36    | 1.39    | 1.35    | 1.35    | 1.34    | 1.31    |
| 4   | 1.43    | 1.41    | 1.39    | 1.34    | 1.35    | 1.34    | 1.32    |
| 5   | 1.41    | 1.37    | 1.36    | 1.34    | 1.34    | 1.32    | 1.32    |
| 6   | 1.32    | 1.38    | 1.37    | 1.34    | 1.34    | 1.32    | 1.32    |
| 7   | 1.39    | 1.38    | 1.36    | 1.31    | 1.34    | 1.32    | 1.32    |
| 8   | 1.42    | 1.39    | 1.39    | 1.33    | 1.33    | 1.32    | 1.32    |
| 9   | 1.18    | 1.2     | 1.19    | 1.18    | 1.19    | 1.18    | 1.18    |
| 10  | 1.21    | 1.19    | 1.2     | 1.18    | 1.19    | 1.18    | 1.17    |
| 11  | 1.21    | 1.2     | 1.19    | 1.18    | 1.19    | 1.16    | 1.17    |
| 12  | 1.22    | 1.19    | 1.17    | 1.16    | 1.19    | 1.17    | 1.17    |
| 13  | 1.21    | 1.2     | 1.19    | 1.13    | 1.17    | 1.17    | 1.17    |
| 14  | 1.21    | 1.2     | 1.2     | 1.17    | 1.17    | 1.16    | 1.16    |
| 15  | 1.22    | 1.19    | 1.2     | 1.18    | 1.19    | 1.16    | 1.16    |
| 16  | 1.23    | 1.17    | 1.2     | 1.18    | 1.18    | 1.18    | 1.17    |
| 17  | 1.19    | 1.11    | 1.09    | 1.07    | 1.1     | 1.1     | 1.09    |
| 18  | 1.19    | 1.11    | 1.05    | 1.11    | 1.09    | 1.09    | 1.08    |
| 19  | 1.2     | 1.11    | 1.06    | 1.09    | 1.08    | 1.08    | 1.08    |
| 20  | 1.18    | 1.11    | 1.07    | 1.11    | 1.09    | 1.1     | 1.09    |
| 21  | 1.16    | 1.13    | 1.07    | 1.09    | 1.09    | 1.1     | 1.09    |
| 22  | 1.19    | 1.14    | 1.06    | 1.08    | 1.09    | 1.09    | 1.08    |
| 23  | 1.2     | 1.11    | 1.07    | 1.09    | 1.09    | 1.1     | 1.09    |
| 24  | 1.18    | 1.12    | 1.06    | 1.09    | 1.09    | 1.09    | 1.09    |
| 25  | 1.13    | 1.12    | 1.08    | 1.08    | 1.11    | 1.08    | 1.1     |
| 26  | 1.12    | 1.12    | 1.08    | 1.09    | 1.11    | 1.09    | 1.09    |
| 27  | 1.13    | 1.12    | 1.04    | 1.08    | 1.09    | 1.1     | 1.12    |
| 28  | 1.13    | 1.12    | 1.08    | 1.09    | 1.09    | 1.08    | 1.1     |
| 29  | 1.13    | 1.1     | 1.07    | 1.09    | 1.09    | 1.1     | 1.09    |
| 30  | 1.12    | 1.12    | 1.06    | 1.08    | 1.08    | 1.1     | 1.11    |
| 31  | 1.13    | 1.11    | 1.07    | 1.09    | 1.1     | 1.1     | 1.1     |
| 32  | 1.12    | 1.12    | 1.06    | 1.08    | 1.1     | 1.1     | 1.11    |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.3     | 1.31    | 1.29    | 1.27    | 1.27    | 1.26    | 1.25    |
| 4   | 1.3     | 1.31    | 1.29    | 1.24    | 1.27    | 1.26    | 1.25    |
| 5   | 1.3     | 1.3     | 1.3     | 1.27    | 1.23    | 1.24    | 1.24    |
| 6   | 1.3     | 1.3     | 1.31    | 1.23    | 1.24    | 1.25    | 1.23    |
| 7   | 1.25    | 1.3     | 1.31    | 1.27    | 1.23    | 1.24    | 1.24    |
| 8   | 1.3     | 1.23    | 1.26    | 1.27    | 1.26    | 1.27    | 1.26    |
| 9   | 1.17    | 1.2     | 1.17    | 1.18    | 1.14    | 1.06    | 1.06    |
| 10  | 1.17    | 1.19    | 1.19    | 1.18    | 1.14    | 1.06    | 1.06    |
| 11  | 1.17    | 1.19    | 1.18    | 1.2     | 1.14    | 1.06    | 1.06    |
| 12  | 1.17    | 1.19    | 1.18    | 1.19    | 1.14    | 1.06    | 1.05    |
| 13  | 1.17    | 1.19    | 1.19    | 1.19    | 1.14    | 1.06    | 1.05    |
| 14  | 1.17    | 1.19    | 1.19    | 1.2     | 1.14    | 1.06    | 1.06    |
| 15  | 1.16    | 1.19    | 1.18    | 1.2     | 1.14    | 1.06    | 1.07    |
| 16  | 1.16    | 1.19    | 1.19    | 1.19    | 1.14    | 1.03    | 1.07    |
| 17  | 1.08    | 1.1     | 1.08    | 1.08    | 1.04    | 1.04    | 1.06    |
| 18  | 1.08    | 1.09    | 1.08    | 1.09    | 1.03    | 1.05    | 1.07    |
| 19  | 1.1     | 1.1     | 1.07    | 1.09    | 1.04    | 1.06    | 1.05    |
| 20  | 1.08    | 1.1     | 1.08    | 1.09    | 1.03    | 1.05    | 1.06    |
| 21  | 1.07    | 1.1     | 1.08    | 1.09    | 1.05    | 1.06    | 1.07    |
| 22  | 1.08    | 1.1     | 1.08    | 1.09    | 1.05    | 1.05    | 1.05    |
| 23  | 1.08    | 1.1     | 1.08    | 1.09    | 1.03    | 1.06    | 1.06    |
| 24  | 1.06    | 1.12    | 1.08    | 1.08    | 1.04    | 1.06    | 1.06    |
| 25  | 1.1     | 1.05    | 1.06    | 1.11    | 1.11    | 1.11    | 1.08    |
| 26  | 1.1     | 1.07    | 1.07    | 1.1     | 1.11    | 1.1     | 1.09    |
| 27  | 1.09    | 1.07    | 1.07    | 1.11    | 1.11    | 1.11    | 1.1     |
| 28  | 1.08    | 1.05    | 1.06    | 1.1     | 1.1     | 1.11    | 1.09    |
| 29  | 1.1     | 1.07    | 1.07    | 1.1     | 1.11    | 1.09    | 1.08    |
| 30  | 1.1     | 1.07    | 1.07    | 1.1     | 1.11    | 1.12    | 1.1     |
| 31  | 1.1     | 1.07    | 1.07    | 1.11    | 1.1     | 1.1     | 1.09    |
| 32  | 1.1     | 1.05    | 1.06    | 1.1     | 1.09    | 1.11    | 1.09    |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 1.1     | 1.12    |
| 4   | 1.1     | 1.12    |
| 5   | 1.1     | 1.13    |
| 6   | 1.1     | 1.13    |
| 7   | 1.1     | 1.13    |
| 8   | 1.1     | 1.13    |
| 9   | 1.08    | 1.11    |
| 10  | 1.08    | 1.11    |
| 11  | 1.08    | 1.11    |
| 12  | 1.08    | 1.12    |
| 13  | 1.07    | 1.11    |
| 14  | 1.08    | 1.11    |
| 15  | 1.08    | 1.11    |
| 16  | 1.08    | 1.11    |
| 17  | 1.06    | 1.07    |
| 18  | 1.08    | 1.08    |
| 19  | 1.06    | 1.06    |
| 20  | 1.06    | 1.07    |
| 21  | 1.08    | 1.08    |
| 22  | 1.06    | 1.06    |
| 23  | 1.07    | 1.07    |
| 24  | 1.07    | 1.07    |
| 25  | 1.06    | 1.07    |
| 26  | 1.06    | 1.07    |
| 27  | 1.05    | 1.08    |
| 28  | 1.07    | 1.07    |
| 29  | 1.05    | 1.07    |
| 30  | 1.05    | 1.07    |
| 31  | 1.07    | 1.08    |
| 32  | 1.05    | 1.07    |

MKL JIT is now behind Blaze for most of the smaller sizes, but when $M \geq 17$ and $N \geq 7$ it is level with Blaze or slightly ahead, and when $M = 4$ it is well ahead:

In [35]:
print_results("mkl jit", "blaze", minimum_times, padded = true)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 2.18    | 2.16    | 1.57    | 1.57    | 1.56    | 1.48    | 1.17    |
| 4   | 0.801   | 0.746   | 0.583   | 0.573   | 0.49    | 0.467   | 0.428   |
| 5   | 2.67    | 2.4     | 1.67    | 1.61    | 1.68    | 1.56    | 1.34    |
| 6   | 2.69    | 2.4     | 1.67    | 1.61    | 1.68    | 1.56    | 1.34    |
| 7   | 2.67    | 2.39    | 1.67    | 1.62    | 1.68    | 1.56    | 1.35    |
| 8   | 2.61    | 2.23    | 1.6     | 1.58    | 1.61    | 1.49    | 1.31    |
| 9   | 1.66    | 1.77    | 1.39    | 1.43    | 1.3     | 1.39    | 1.21    |
| 10  | 1.65    | 1.78    | 1.48    | 1.43    | 1.24    | 1.37    | 1.21    |
| 11  | 1.68    | 1.78    | 1.46    | 1.42    | 1.31    | 1.32    | 1.23    |
| 12  | 1.67    | 1.77    | 1.45    | 1.41    | 1.3     | 1.33    | 1.2     |
| 13  | 1.68    | 1.77    | 1.44    | 1.43    | 1.31    | 1.37    | 1.2     |
| 14  | 1.67    | 1.78    | 1.42    | 1.42    | 1.25    | 1.33    | 1.21    |
| 15  | 1.67    | 1.78    | 1.44    | 1.42    | 1.3     | 1.32    | 1.19    |
| 16  | 1.58    | 1.61    | 1.26    | 1.37    | 1.31    | 1.3     | 1.2     |
| 17  | 1.22    | 1.28    | 1.08    | 1.12    | 0.997   | 1.03    | 0.986   |
| 18  | 1.23    | 1.27    | 1.08    | 1.12    | 0.992   | 1.03    | 0.984   |
| 19  | 1.22    | 1.27    | 1.08    | 1.11    | 0.998   | 1.03    | 0.988   |
| 20  | 1.23    | 1.27    | 1.08    | 1.12    | 1.0     | 1.03    | 0.981   |
| 21  | 1.22    | 1.25    | 1.08    | 1.12    | 0.993   | 1.02    | 0.986   |
| 22  | 1.23    | 1.27    | 1.08    | 1.11    | 0.971   | 1.04    | 0.986   |
| 23  | 1.22    | 1.25    | 1.08    | 1.12    | 1.01    | 1.02    | 0.985   |
| 24  | 1.19    | 1.27    | 1.06    | 1.1     | 0.986   | 1.02    | 0.971   |
| 25  | 1.24    | 1.34    | 1.13    | 1.24    | 1.08    | 1.13    | 1.04    |
| 26  | 1.24    | 1.33    | 1.13    | 1.25    | 1.1     | 1.14    | 1.04    |
| 27  | 1.25    | 1.35    | 1.09    | 1.25    | 1.09    | 1.14    | 1.03    |
| 28  | 1.24    | 1.33    | 1.13    | 1.25    | 1.08    | 1.15    | 1.03    |
| 29  | 1.26    | 1.36    | 1.13    | 1.24    | 1.09    | 1.15    | 1.01    |
| 30  | 1.24    | 1.34    | 1.15    | 1.24    | 1.09    | 1.14    | 1.03    |
| 31  | 1.24    | 1.33    | 1.15    | 1.24    | 1.09    | 1.16    | 1.03    |
| 32  | 1.23    | 1.32    | 1.1     | 1.23    | 1.07    | 1.14    | 1.02    |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.25    | 1.28    | 1.25    | 1.11    | 1.12    | 1.14    | 1.15    |
| 4   | 0.423   | 0.392   | 0.396   | 0.361   | 0.332   | 0.35    | 0.356   |
| 5   | 1.32    | 1.36    | 1.33    | 1.16    | 1.2     | 1.22    | 1.22    |
| 6   | 1.32    | 1.37    | 1.32    | 1.16    | 1.18    | 1.22    | 1.21    |
| 7   | 1.31    | 1.36    | 1.32    | 1.16    | 1.18    | 1.14    | 1.22    |
| 8   | 1.31    | 1.32    | 1.29    | 1.14    | 1.17    | 1.19    | 1.19    |
| 9   | 1.21    | 1.17    | 1.23    | 1.15    | 1.27    | 1.16    | 1.15    |
| 10  | 1.25    | 1.18    | 1.22    | 1.16    | 1.28    | 1.16    | 1.16    |
| 11  | 1.26    | 1.18    | 1.21    | 1.16    | 1.28    | 1.14    | 1.15    |
| 12  | 1.25    | 1.2     | 1.22    | 1.16    | 1.28    | 1.16    | 1.16    |
| 13  | 1.25    | 1.17    | 1.23    | 1.16    | 1.28    | 1.16    | 1.13    |
| 14  | 1.26    | 1.17    | 1.21    | 1.16    | 1.3     | 1.17    | 1.15    |
| 15  | 1.21    | 1.2     | 1.26    | 1.16    | 1.25    | 1.17    | 1.16    |
| 16  | 1.23    | 1.2     | 1.19    | 1.15    | 1.27    | 1.16    | 1.13    |
| 17  | 1.01    | 0.958   | 0.998   | 0.95    | 1.05    | 0.854   | 0.871   |
| 18  | 1.03    | 0.956   | 0.999   | 0.955   | 1.01    | 0.9     | 0.909   |
| 19  | 1.03    | 0.975   | 0.968   | 0.953   | 1.06    | 0.9     | 0.874   |
| 20  | 1.03    | 0.914   | 1.01    | 0.963   | 1.01    | 0.897   | 0.877   |
| 21  | 1.03    | 0.958   | 0.999   | 0.904   | 1.05    | 0.885   | 0.911   |
| 22  | 1.02    | 0.978   | 0.996   | 0.912   | 1.05    | 0.902   | 0.923   |
| 23  | 1.03    | 0.963   | 1.0     | 0.957   | 1.06    | 0.902   | 0.932   |
| 24  | 1.02    | 0.951   | 0.991   | 0.948   | 1.04    | 0.898   | 0.905   |
| 25  | 1.1     | 1.03    | 1.07    | 0.995   | 1.21    | 0.962   | 1.04    |
| 26  | 1.1     | 0.98    | 1.05    | 0.994   | 1.2     | 1.01    | 1.03    |
| 27  | 1.09    | 0.981   | 1.07    | 1.0     | 1.2     | 0.959   | 1.05    |
| 28  | 1.05    | 1.01    | 1.08    | 1.0     | 1.2     | 1.02    | 1.05    |
| 29  | 1.08    | 1.0     | 1.07    | 1.0     | 1.2     | 1.01    | 1.06    |
| 30  | 1.09    | 1.02    | 1.08    | 1.0     | 1.21    | 1.0     | 1.06    |
| 31  | 1.1     | 1.02    | 1.0     | 1.0     | 1.2     | 1.02    | 1.06    |
| 32  | 1.1     | 1.0     | 1.01    | 0.916   | 1.2     | 1.01    | 0.976   |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.05    | 1.07    | 1.09    | 1.07    | 1.01    | 1.0     | 1.03    |
| 4   | 0.336   | 0.341   | 0.303   | 0.326   | 0.318   | 0.322   | 0.313   |
| 5   | 1.12    | 1.13    | 1.16    | 1.13    | 1.07    | 1.06    | 1.09    |
| 6   | 1.12    | 1.13    | 1.17    | 1.13    | 1.06    | 1.06    | 1.09    |
| 7   | 1.12    | 1.07    | 1.16    | 1.14    | 1.07    | 0.998   | 1.09    |
| 8   | 1.09    | 1.12    | 1.14    | 1.1     | 1.05    | 1.05    | 1.07    |
| 9   | 1.09    | 1.11    | 1.09    | 1.09    | 1.01    | 1.06    | 1.04    |
| 10  | 1.1     | 1.11    | 1.09    | 1.1     | 1.01    | 1.05    | 1.04    |
| 11  | 1.1     | 1.11    | 1.06    | 1.03    | 1.05    | 1.04    | 1.05    |
| 12  | 1.09    | 1.1     | 1.05    | 1.1     | 1.05    | 1.06    | 1.05    |
| 13  | 1.1     | 1.11    | 1.08    | 1.09    | 1.04    | 1.06    | 1.05    |
| 14  | 1.1     | 1.08    | 1.09    | 1.1     | 1.05    | 1.06    | 1.0     |
| 15  | 1.1     | 1.1     | 1.08    | 1.1     | 1.05    | 1.04    | 1.05    |
| 16  | 1.1     | 1.09    | 1.07    | 1.07    | 1.05    | 1.05    | 1.0     |
| 17  | 0.901   | 0.914   | 0.899   | 0.87    | 0.867   | 0.886   | 0.765   |
| 18  | 0.899   | 0.875   | 0.872   | 0.914   | 0.866   | 0.883   | 0.852   |
| 19  | 0.903   | 0.915   | 0.877   | 0.903   | 0.858   | 0.881   | 0.852   |
| 20  | 0.867   | 0.918   | 0.888   | 0.876   | 0.863   | 0.888   | 0.852   |
| 21  | 0.897   | 0.937   | 0.886   | 0.895   | 0.864   | 0.888   | 0.853   |
| 22  | 0.863   | 0.935   | 0.884   | 0.9     | 0.864   | 0.888   | 0.855   |
| 23  | 0.899   | 0.912   | 0.891   | 0.903   | 0.865   | 0.889   | 0.856   |
| 24  | 0.899   | 0.912   | 0.871   | 0.9     | 0.821   | 0.88    | 0.85    |
| 25  | 1.0     | 1.03    | 0.986   | 1.02    | 0.993   | 1.01    | 0.935   |
| 26  | 1.01    | 1.03    | 0.989   | 1.01    | 0.978   | 0.961   | 0.971   |
| 27  | 1.01    | 1.03    | 0.932   | 1.01    | 0.975   | 1.01    | 0.924   |
| 28  | 1.0     | 1.03    | 0.989   | 1.02    | 0.982   | 0.96    | 0.976   |
| 29  | 1.01    | 1.03    | 0.993   | 1.02    | 0.978   | 1.01    | 0.892   |
| 30  | 1.01    | 0.984   | 0.996   | 1.02    | 0.934   | 1.01    | 0.981   |
| 31  | 1.01    | 1.03    | 0.993   | 0.971   | 0.98    | 1.02    | 0.977   |
| 32  | 0.995   | 1.03    | 0.987   | 1.01    | 0.973   | 1.01    | 0.978   |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.03    | 0.973   | 0.99    | 1.0     | 0.983   | 0.937   | 0.927   |
| 4   | 0.317   | 0.309   | 0.316   | 0.281   | 0.308   | 0.309   | 0.276   |
| 5   | 1.1     | 1.03    | 1.06    | 1.07    | 1.05    | 0.991   | 1.0     |
| 6   | 1.1     | 1.03    | 1.06    | 1.07    | 1.04    | 0.991   | 1.0     |
| 7   | 1.1     | 1.03    | 1.06    | 1.07    | 1.04    | 0.99    | 1.0     |
| 8   | 1.08    | 1.02    | 1.06    | 1.05    | 1.02    | 0.982   | 0.997   |
| 9   | 1.07    | 1.04    | 1.05    | 1.06    | 1.01    | 1.03    | 1.02    |
| 10  | 1.07    | 1.03    | 1.05    | 1.06    | 1.07    | 1.04    | 1.04    |
| 11  | 1.07    | 1.03    | 1.05    | 1.06    | 1.07    | 1.03    | 1.03    |
| 12  | 1.07    | 1.05    | 1.06    | 1.06    | 1.06    | 1.04    | 1.02    |
| 13  | 0.995   | 1.05    | 1.06    | 1.05    | 1.07    | 1.03    | 1.02    |
| 14  | 1.07    | 1.04    | 1.06    | 1.05    | 1.05    | 0.963   | 1.03    |
| 15  | 1.05    | 0.99    | 1.05    | 1.05    | 1.07    | 1.03    | 1.03    |
| 16  | 1.05    | 1.04    | 1.03    | 1.05    | 0.99    | 1.03    | 1.02    |
| 17  | 0.87    | 0.848   | 0.867   | 0.847   | 0.858   | 0.841   | 0.853   |
| 18  | 0.869   | 0.811   | 0.871   | 0.849   | 0.864   | 0.844   | 0.806   |
| 19  | 0.887   | 0.862   | 0.869   | 0.848   | 0.861   | 0.841   | 0.854   |
| 20  | 0.866   | 0.864   | 0.822   | 0.85    | 0.853   | 0.841   | 0.853   |
| 21  | 0.824   | 0.862   | 0.87    | 0.847   | 0.827   | 0.842   | 0.856   |
| 22  | 0.872   | 0.864   | 0.869   | 0.847   | 0.858   | 0.843   | 0.81    |
| 23  | 0.868   | 0.863   | 0.836   | 0.849   | 0.824   | 0.847   | 0.858   |
| 24  | 0.837   | 0.878   | 0.864   | 0.847   | 0.854   | 0.842   | 0.85    |
| 25  | 1.01    | 0.983   | 0.92    | 0.991   | 1.02    | 0.999   | 1.01    |
| 26  | 1.02    | 0.984   | 0.947   | 0.991   | 0.976   | 0.989   | 1.01    |
| 27  | 1.0     | 0.945   | 1.0     | 0.987   | 1.02    | 0.998   | 1.01    |
| 28  | 1.02    | 0.984   | 1.0     | 0.918   | 1.02    | 0.997   | 1.01    |
| 29  | 1.01    | 0.981   | 0.922   | 0.992   | 1.02    | 0.997   | 1.01    |
| 30  | 1.01    | 0.982   | 1.0     | 0.998   | 1.02    | 0.997   | 1.01    |
| 31  | 1.02    | 0.983   | 0.996   | 0.997   | 1.02    | 0.991   | 1.01    |
| 32  | 1.01    | 0.906   | 0.99    | 0.934   | 0.937   | 0.936   | 1.01    |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 0.965   | 0.964   |
| 4   | 0.3     | 0.302   |
| 5   | 1.02    | 1.03    |
| 6   | 1.02    | 1.03    |
| 7   | 1.02    | 1.03    |
| 8   | 1.01    | 1.02    |
| 9   | 0.967   | 1.03    |
| 10  | 1.02    | 1.06    |
| 11  | 1.02    | 1.05    |
| 12  | 1.02    | 1.06    |
| 13  | 1.02    | 0.982   |
| 14  | 1.01    | 1.06    |
| 15  | 1.02    | 1.05    |
| 16  | 1.0     | 1.02    |
| 17  | 0.841   | 0.857   |
| 18  | 0.838   | 0.853   |
| 19  | 0.804   | 0.803   |
| 20  | 0.845   | 0.854   |
| 21  | 0.856   | 0.853   |
| 22  | 0.839   | 0.853   |
| 23  | 0.841   | 0.854   |
| 24  | 0.837   | 0.85    |
| 25  | 0.985   | 1.0     |
| 26  | 0.988   | 1.01    |
| 27  | 0.984   | 1.01    |
| 28  | 0.986   | 1.01    |
| 29  | 0.985   | 1.01    |
| 30  | 0.982   | 0.923   |
| 31  | 0.995   | 1.01    |
| 32  | 0.976   | 0.996   |

All in all, PaddedMatrices is clearly the fastest over this range of matrix sizes. If your matrices can be padded, or you’re working in C++ and want to be able to compile into an executable or shared library, Blaze is a great choice.

MKL JIT also did well (and has C bindings), and like PaddedMatrices, the JIT doesn’t force you to specify sizes ahead of time.
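
For anyone wanting to summarize ratio tables like the ones above, here is a minimal sketch of how I read them. It assumes each entry is the first library's minimum time divided by the second's, so values below 1 mean the first library was faster. The timings and the names `times_a` and `times_b` are made up for illustration, and this does not reproduce `print_results`:

```julia
# Hypothetical minimum runtimes in nanoseconds (made-up values, not measurements).
# Rows index M, columns index N.
times_a = [120.0 150.0 210.0;
           130.0 140.0 190.0;
           200.0 230.0 260.0]
times_b = [100.0 160.0 200.0;
           135.0 120.0 200.0;
           180.0 250.0 270.0]

# Elementwise ratio, rounded to three significant digits like the tables.
ratio = round.(times_a ./ times_b, sigdigits = 3)

# Positions (as CartesianIndex values) where library A was faster.
a_faster = findall(<(1), ratio)

# Fraction of tested sizes where library A was faster.
frac_a_faster = count(<(1), ratio) / length(ratio)

println("A is faster at ", length(a_faster), " of ", length(ratio), " sizes")
```

Counting the entries below 1 gives a quick check on statements like "X has the edge when $M \geq 17$" without scanning every row by eye.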

While padding made a large difference for a few of the tested libraries, it actually made little difference to PaddedMatrices itself:

In [30]:
pad_test = round.(minimum_times[end,:,32,:,1] ./ minimum_times[end,:,32,:,2], sigdigits = 3).parent |>
    DataFrame
print_results(pad_test, 2, 2)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 0.952   | 0.946   | 0.98    | 1.02    | 1.1     | 1.02    | 1.01    |
| 4   | 1.0     | 0.996   | 0.986   | 1.01    | 0.921   | 1.08    | 1.01    |
| 5   | 0.9     | 0.867   | 0.903   | 0.953   | 0.915   | 0.962   | 0.905   |
| 6   | 0.946   | 0.934   | 0.883   | 0.96    | 0.914   | 0.843   | 0.937   |
| 7   | 0.943   | 0.836   | 0.955   | 0.917   | 0.942   | 1.03    | 0.92    |
| 8   | 0.997   | 0.983   | 0.924   | 1.03    | 1.1     | 0.991   | 1.0     |
| 9   | 0.867   | 0.883   | 0.977   | 0.989   | 0.987   | 0.898   | 0.997   |
| 10  | 0.878   | 0.917   | 0.972   | 0.972   | 0.964   | 1.01    | 0.951   |
| 11  | 0.867   | 0.919   | 0.953   | 0.956   | 0.945   | 0.916   | 0.972   |
| 12  | 0.911   | 0.963   | 0.982   | 0.99    | 0.979   | 0.987   | 0.988   |
| 13  | 0.898   | 0.919   | 0.958   | 0.964   | 0.929   | 0.917   | 0.931   |
| 14  | 0.922   | 0.947   | 0.952   | 0.963   | 0.948   | 0.846   | 0.943   |
| 15  | 0.906   | 0.89    | 0.966   | 0.965   | 0.932   | 0.827   | 0.935   |
| 16  | 1.05    | 1.01    | 1.01    | 0.999   | 1.0     | 1.0     | 1.02    |
| 17  | 0.934   | 0.951   | 0.938   | 0.963   | 0.973   | 1.0     | 0.996   |
| 18  | 0.945   | 0.928   | 0.922   | 0.924   | 0.942   | 0.97    | 0.969   |
| 19  | 0.947   | 0.961   | 0.975   | 0.916   | 0.942   | 0.943   | 0.93    |
| 20  | 0.956   | 0.962   | 0.971   | 0.932   | 0.976   | 0.989   | 0.985   |
| 21  | 0.91    | 0.942   | 0.912   | 0.919   | 0.936   | 0.951   | 0.933   |
| 22  | 0.925   | 0.884   | 0.911   | 0.924   | 0.926   | 0.977   | 0.972   |
| 23  | 0.964   | 0.927   | 0.923   | 0.924   | 0.978   | 0.951   | 0.944   |
| 24  | 0.949   | 1.0     | 0.961   | 1.01    | 1.05    | 1.0     | 1.0     |
| 25  | 0.978   | 0.933   | 0.934   | 0.962   | 0.944   | 0.97    | 0.995   |
| 26  | 0.962   | 0.92    | 0.941   | 0.933   | 0.95    | 0.968   | 0.986   |
| 27  | 0.902   | 0.916   | 0.973   | 0.981   | 0.93    | 0.951   | 0.977   |
| 28  | 0.983   | 0.95    | 0.947   | 0.986   | 0.983   | 0.998   | 0.995   |
| 29  | 0.927   | 0.916   | 0.97    | 0.978   | 0.929   | 0.955   | 0.98    |
| 30  | 0.94    | 0.917   | 0.933   | 0.994   | 0.952   | 0.973   | 0.989   |
| 31  | 0.942   | 0.944   | 0.951   | 0.98    | 0.927   | 0.953   | 0.978   |
| 32  | 1.0     | 1.02    | 1.05    | 1.0     | 0.98    | 1.01    | 0.997   |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 0.867   | 0.969   | 0.989   | 1.02    | 0.978   | 0.982   | 1.02    |
| 4   | 1.06    | 1.0     | 1.0     | 0.987   | 1.02    | 0.95    | 1.0     |
| 5   | 0.979   | 0.92    | 0.955   | 0.941   | 0.957   | 0.95    | 1.02    |
| 6   | 0.88    | 0.95    | 0.978   | 0.969   | 0.971   | 0.947   | 0.971   |
| 7   | 0.972   | 0.933   | 0.98    | 0.935   | 0.997   | 0.939   | 0.971   |
| 8   | 1.0     | 1.0     | 1.03    | 1.0     | 1.0     | 1.0     | 1.05    |
| 9   | 0.98    | 0.989   | 0.992   | 0.993   | 0.986   | 0.97    | 0.97    |
| 10  | 0.982   | 0.989   | 0.959   | 0.956   | 0.958   | 0.964   | 0.966   |
| 11  | 0.964   | 0.965   | 0.962   | 0.968   | 0.932   | 0.959   | 0.959   |
| 12  | 1.03    | 0.962   | 0.994   | 0.989   | 0.986   | 0.972   | 0.969   |
| 13  | 0.957   | 0.957   | 0.961   | 0.964   | 0.951   | 0.961   | 0.962   |
| 14  | 0.967   | 0.956   | 0.983   | 0.989   | 0.956   | 0.965   | 0.961   |
| 15  | 0.96    | 0.955   | 0.925   | 0.979   | 0.976   | 0.961   | 0.958   |
| 16  | 1.0     | 0.999   | 1.0     | 1.02    | 1.0     | 1.0     | 0.996   |
| 17  | 0.933   | 0.964   | 1.0     | 1.01    | 0.993   | 0.989   | 0.976   |
| 18  | 0.918   | 0.959   | 0.994   | 0.997   | 0.981   | 0.968   | 0.956   |
| 19  | 0.912   | 0.953   | 1.01    | 1.01    | 0.974   | 0.968   | 0.964   |
| 20  | 0.986   | 0.982   | 1.02    | 1.02    | 1.01    | 0.992   | 1.0     |
| 21  | 0.91    | 0.951   | 0.999   | 0.98    | 0.983   | 0.966   | 0.98    |
| 22  | 0.93    | 0.965   | 1.0     | 0.997   | 0.985   | 0.972   | 0.96    |
| 23  | 0.911   | 0.949   | 0.981   | 0.982   | 0.976   | 1.0     | 0.983   |
| 24  | 0.997   | 0.999   | 1.01    | 1.01    | 1.0     | 1.01    | 0.997   |
| 25  | 0.991   | 0.971   | 0.963   | 0.969   | 0.925   | 0.974   | 0.967   |
| 26  | 1.0     | 0.967   | 0.959   | 0.959   | 0.955   | 0.967   | 0.951   |
| 27  | 0.964   | 0.957   | 0.956   | 0.953   | 0.902   | 0.955   | 0.961   |
| 28  | 0.984   | 0.967   | 0.965   | 0.975   | 0.968   | 0.987   | 0.981   |
| 29  | 0.973   | 0.961   | 0.953   | 0.941   | 0.948   | 0.963   | 0.961   |
| 30  | 0.982   | 0.97    | 0.955   | 0.96    | 0.954   | 0.959   | 0.968   |
| 31  | 0.968   | 0.957   | 0.951   | 0.946   | 0.949   | 0.955   | 0.937   |
| 32  | 0.997   | 0.996   | 0.999   | 0.999   | 1.0     | 1.0     | 1.0     |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 0.974   | 1.01    | 0.953   | 0.963   | 0.967   | 0.973   | 0.98    |
| 4   | 1.0     | 1.0     | 0.985   | 1.01    | 0.998   | 1.0     | 0.955   |
| 5   | 0.968   | 0.943   | 0.942   | 0.954   | 0.939   | 0.96    | 0.945   |
| 6   | 1.01    | 0.971   | 0.959   | 0.954   | 0.929   | 0.945   | 0.931   |
| 7   | 0.955   | 0.954   | 0.949   | 0.97    | 0.939   | 0.943   | 0.922   |
| 8   | 0.979   | 1.0     | 0.948   | 0.987   | 1.0     | 1.0     | 1.0     |
| 9   | 1.03    | 1.04    | 1.03    | 1.03    | 1.01    | 1.02    | 1.01    |
| 10  | 1.0     | 1.02    | 0.984   | 1.02    | 1.0     | 1.0     | 1.0     |
| 11  | 1.0     | 1.01    | 1.02    | 0.995   | 1.0     | 0.926   | 0.991   |
| 12  | 0.995   | 1.03    | 1.02    | 1.0     | 1.0     | 1.02    | 1.01    |
| 13  | 1.01    | 1.01    | 1.02    | 1.04    | 1.0     | 1.0     | 0.994   |
| 14  | 1.01    | 1.01    | 1.03    | 1.02    | 1.01    | 1.02    | 0.936   |
| 15  | 0.986   | 1.01    | 1.01    | 1.01    | 1.01    | 1.01    | 1.01    |
| 16  | 0.985   | 1.02    | 1.0     | 1.01    | 1.0     | 0.999   | 0.987   |
| 17  | 0.991   | 0.979   | 0.955   | 0.979   | 0.968   | 0.967   | 0.971   |
| 18  | 0.994   | 0.972   | 0.942   | 0.945   | 0.955   | 0.959   | 0.966   |
| 19  | 0.973   | 0.973   | 0.938   | 0.946   | 0.952   | 0.951   | 0.949   |
| 20  | 1.0     | 0.983   | 0.962   | 0.967   | 0.971   | 0.97    | 0.973   |
| 21  | 1.0     | 0.972   | 0.939   | 0.941   | 0.951   | 0.95    | 0.95    |
| 22  | 1.0     | 0.974   | 0.947   | 0.957   | 0.962   | 0.97    | 0.94    |
| 23  | 0.976   | 0.966   | 0.942   | 0.95    | 0.955   | 0.934   | 0.937   |
| 24  | 1.0     | 0.996   | 0.998   | 1.0     | 1.0     | 1.0     | 0.999   |
| 25  | 0.968   | 0.985   | 0.984   | 0.991   | 0.972   | 0.989   | 0.98    |
| 26  | 0.97    | 0.982   | 0.956   | 0.966   | 0.967   | 0.983   | 0.951   |
| 27  | 0.916   | 0.969   | 0.981   | 0.935   | 0.94    | 0.945   | 0.944   |
| 28  | 0.98    | 0.976   | 0.97    | 0.961   | 0.981   | 0.975   | 0.961   |
| 29  | 0.952   | 0.97    | 0.948   | 0.954   | 0.966   | 0.956   | 0.967   |
| 30  | 0.969   | 0.981   | 0.982   | 0.959   | 0.935   | 0.962   | 0.955   |
| 31  | 0.943   | 0.958   | 0.952   | 0.949   | 0.959   | 0.963   | 0.958   |
| 32  | 0.988   | 0.988   | 1.02    | 1.0     | 0.99    | 1.01    | 0.986   |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 0.982   | 0.97    | 0.979   | 0.946   | 0.986   | 0.981   | 0.958   |
| 4   | 0.998   | 0.978   | 1.02    | 0.999   | 1.0     | 1.0     | 0.999   |
| 5   | 0.932   | 0.925   | 0.931   | 0.929   | 0.966   | 0.912   | 0.94    |
| 6   | 0.953   | 0.95    | 0.937   | 0.97    | 0.94    | 0.96    | 0.966   |
| 7   | 0.98    | 0.927   | 0.797   | 0.798   | 0.944   | 0.934   | 0.95    |
| 8   | 1.0     | 1.02    | 1.02    | 1.0     | 1.01    | 1.0     | 1.0     |
| 9   | 1.01    | 1.0     | 1.0     | 1.01    | 0.962   | 0.959   | 0.962   |
| 10  | 0.973   | 0.987   | 0.975   | 0.998   | 0.945   | 0.893   | 0.943   |
| 11  | 0.97    | 0.98    | 0.985   | 0.976   | 0.939   | 0.922   | 0.926   |
| 12  | 1.01    | 1.01    | 1.0     | 0.995   | 0.962   | 0.966   | 0.966   |
| 13  | 0.974   | 0.968   | 0.978   | 0.981   | 0.937   | 0.921   | 0.933   |
| 14  | 0.98    | 0.986   | 0.983   | 0.985   | 0.953   | 0.937   | 0.934   |
| 15  | 1.0     | 0.983   | 0.987   | 0.986   | 0.943   | 0.929   | 0.917   |
| 16  | 1.0     | 1.0     | 0.993   | 0.999   | 1.01    | 1.03    | 1.0     |
| 17  | 0.97    | 0.972   | 0.972   | 0.994   | 0.983   | 0.996   | 0.992   |
| 18  | 0.955   | 0.966   | 0.96    | 0.957   | 0.959   | 0.949   | 0.942   |
| 19  | 0.952   | 0.906   | 0.936   | 0.947   | 0.939   | 0.948   | 0.971   |
| 20  | 0.97    | 0.972   | 0.969   | 0.982   | 0.977   | 0.985   | 0.963   |
| 21  | 0.949   | 0.94    | 0.942   | 0.961   | 0.952   | 0.959   | 0.954   |
| 22  | 0.94    | 0.946   | 0.912   | 0.963   | 0.948   | 0.976   | 0.967   |
| 23  | 0.933   | 0.937   | 0.943   | 0.965   | 0.938   | 0.952   | 0.951   |
| 24  | 1.02    | 1.01    | 1.0     | 1.01    | 0.999   | 1.0     | 1.01    |
| 25  | 0.981   | 0.978   | 0.977   | 0.974   | 0.981   | 0.972   | 0.986   |
| 26  | 0.968   | 0.953   | 0.935   | 0.983   | 0.973   | 0.988   | 0.977   |
| 27  | 0.96    | 0.933   | 0.923   | 0.908   | 0.949   | 0.953   | 0.956   |
| 28  | 0.982   | 0.982   | 0.976   | 0.975   | 0.972   | 0.958   | 0.97    |
| 29  | 0.953   | 0.942   | 0.944   | 0.962   | 0.962   | 0.983   | 0.962   |
| 30  | 0.936   | 0.95    | 0.942   | 0.96    | 0.959   | 0.966   | 0.969   |
| 31  | 0.945   | 0.947   | 0.953   | 0.957   | 0.931   | 0.958   | 0.965   |
| 32  | 1.0     | 0.999   | 1.01    | 0.998   | 1.02    | 0.99    | 0.99    |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 0.988   | 0.99    |
| 4   | 1.0     | 1.0     |
| 5   | 0.979   | 0.982   |
| 6   | 0.977   | 0.98    |
| 7   | 0.979   | 0.982   |
| 8   | 1.0     | 1.0     |
| 9   | 0.97    | 0.97    |
| 10  | 0.952   | 0.942   |
| 11  | 0.88    | 0.922   |
| 12  | 0.967   | 0.964   |
| 13  | 0.927   | 0.932   |
| 14  | 0.945   | 0.948   |
| 15  | 0.919   | 0.924   |
| 16  | 1.0     | 1.0     |
| 17  | 0.974   | 0.97    |
| 18  | 0.941   | 0.948   |
| 19  | 0.945   | 0.96    |
| 20  | 0.965   | 0.965   |
| 21  | 0.974   | 0.954   |
| 22  | 0.963   | 0.976   |
| 23  | 0.962   | 0.961   |
| 24  | 0.993   | 0.986   |
| 25  | 0.98    | 0.989   |
| 26  | 0.931   | 0.983   |
| 27  | 0.96    | 0.945   |
| 28  | 0.973   | 0.983   |
| 29  | 0.929   | 0.965   |
| 30  | 0.976   | 0.976   |
| 31  | 0.959   | 0.952   |
| 32  | 0.999   | 1.01    |

These are ratios of padded to unpadded runtimes. Since most of the ratios are below 1, padding did provide a slight performance advantage. Padding’s advantage is likely to be bigger on pre-AVX-512 architectures, because the unpadded AVX-512 code uses bitmasks to handle row counts that are not integer multiples of the vector width. E.g., when the matrices are padded and $M = 14$, there will be two rows of padding so that we can pretend $M=16$ when vectorizing the code. When matrices are not padded, PaddedMatrices will instead “mask” the last two elements of a vector, which achieves the same effect.
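To make the $M = 14$ example concrete, here is a minimal sketch of the two strategies in plain Julia. It assumes 8 `Float64` lanes per vector (one 512-bit register); `LANES`, `padded_rows`, and `tail_mask` are hypothetical names used only for illustration and are not part of PaddedMatrices.jl.

```julia
# Assumption: 8 Float64 lanes per 512-bit AVX-512 vector.
LANES = 8

# Padded strategy: round the row count up to the next multiple of LANES.
padded_rows(M) = cld(M, LANES) * LANES

# Masked strategy: keep M rows and build a bitmask for the last, partial vector.
# Bit i (counting from the least significant bit) is set when lane i holds real data.
function tail_mask(M)
    r = M % LANES
    return r == 0 ? 0xff : UInt8((1 << r) - 1)
end

M = 14
padding = padded_rows(M) - M      # rows of padding added
mask = tail_mask(M)               # lanes kept in the final vector
println("padded rows: ", padded_rows(M), ", padding: ", padding)
println("tail mask: ", bitstring(mask))
```

For $M = 14$, the padded version works on 16 rows, while the masked version disables the last two lanes of the second vector.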

I’ve found the advantages of not padding are:

  • It lets you efficiently operate on larger matrices blockwise. You can’t treat those blockwise pieces as padded, or you’ll end up overwriting parts of the full matrix that are outside of your submatrix!
  • Easier for other libraries to interface, because many won’t be using matrices with padding.
  • You need masking anyway when performing reductions (eg, summing all values of the matrix), because you don’t want the junk padding to contaminate results.
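The blockwise and reduction points can be illustrated with ordinary arrays. This is only a sketch under assumed sizes ($M = 14$ padded to 16 rows, $N = 3$); the variable names are made up for the example, and nothing here uses PaddedMatrices.jl itself.

```julia
LANES = 8                              # assumed vector width in Float64 lanes
M, N = 14, 3
Mpad = cld(M, LANES) * LANES           # 16 rows once padded

# Reductions: junk in the padding rows contaminates a sum over the raw storage.
storage = fill(NaN, Mpad, N)           # NaN stands in for junk padding
storage[1:M, :] .= 1.0                 # the real 14×3 data
naive_sum  = sum(storage)              # includes the padding rows
masked_sum = sum(@view storage[1:M, :])  # only the real rows

# Blockwise updates: rows 15:16 of the full matrix belong to a neighbouring block.
full = zeros(20, N)
full[15:16, :] .= 7.0
clobbered = copy(full)

block = @view full[1:M, :]
block .= 1.0                           # unpadded store: touches rows 1:14 only
clobbered[1:Mpad, :] .= 1.0            # padded store: also overwrites rows 15:16

println("naive sum: ", naive_sum, ", masked sum: ", masked_sum)
println("row 15 after unpadded store: ", full[15, 1])
println("row 15 after padded store:   ", clobbered[15, 1])
```

The naive sum comes out as `NaN`, and the padded store silently replaces the neighbouring block’s rows, which is why both cases need masking or unpadded handling.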

But there are many situations where these don’t apply, so I do think padding is a great approach in general.

For all my above comparisons I used minimum times instead of median. Many argue (and I agree) that minimum runtimes are less biased when you don’t have a running garbage collector (Julia’s garbage collector wasn’t being triggered, because we weren’t using Julia to allocate memory when multiplying our matrices).

If you do have a garbage collector, it is a consistent source of noise that I think should be averaged in. Other sources of noise, however, are random events that only ever add to the runtime. Regardless, median times were hardly higher than minimum times:
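As a toy illustration of why additive noise favors the minimum, consider a made-up set of timings where every sample is a fixed "true" runtime plus a non-negative disturbance. The numbers below are invented for the example and are not taken from the benchmarks.

```julia
using Statistics

true_time = 100.0                      # hypothetical noise-free runtime (ns)
# Invented non-negative disturbances: mostly small, with a couple of large spikes.
noise = [0.0, 0.5, 0.2, 3.0, 0.1, 12.0, 0.3, 0.0, 0.4, 1.5]
samples = true_time .+ noise

min_time = minimum(samples)            # recovers the noise-free runtime here
med_time = median(samples)             # sits slightly above it
avg_time = mean(samples)               # pulled up further by the spikes

println("minimum: ", min_time)
println("median / minimum: ", round(med_time / min_time, sigdigits = 3))
println("mean / minimum:   ", round(avg_time / min_time, sigdigits = 3))
```

Because the noise can only add time, the minimum is the sample closest to the undisturbed runtime, while the median and mean sit above it.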

In [31]:
# Ratio of median to minimum runtime, rounded to 3 significant digits.
pad_test = round.(median_times[end,:,32,:,2] ./ minimum_times[end,:,32,:,2], sigdigits = 3).parent |>
    DataFrame
print_results(pad_test, 2, 2)
30×30 DataFrame
│ Row │ NCols3  │ NCols4  │ NCols5  │ NCols6  │ NCols7  │ NCols8  │ NCols9  │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.06    | 1.01    | 1.03    | 1.02    | 1.0     | 1.0     | 1.0     |
| 4   | 1.01    | 1.01    | 1.01    | 1.02    | 1.0     | 1.0     | 1.01    |
| 5   | 1.0     | 1.01    | 1.02    | 1.0     | 1.01    | 1.01    | 1.0     |
| 6   | 1.0     | 1.01    | 1.01    | 1.0     | 1.02    | 1.02    | 1.0     |
| 7   | 1.0     | 1.02    | 1.01    | 1.03    | 1.01    | 1.0     | 1.0     |
| 8   | 1.0     | 1.0     | 1.02    | 1.01    | 1.02    | 1.01    | 1.01    |
| 9   | 1.01    | 1.03    | 1.0     | 1.01    | 1.01    | 1.05    | 1.01    |
| 10  | 1.01    | 1.0     | 1.03    | 1.02    | 1.01    | 1.04    | 1.0     |
| 11  | 1.0     | 1.0     | 1.04    | 1.03    | 1.01    | 1.0     | 1.01    |
| 12  | 1.02    | 1.02    | 1.02    | 1.03    | 1.01    | 1.01    | 1.01    |
| 13  | 1.0     | 1.0     | 1.0     | 1.03    | 1.02    | 1.0     | 1.02    |
| 14  | 1.01    | 1.0     | 1.01    | 1.02    | 1.02    | 1.02    | 1.02    |
| 15  | 1.03    | 1.01    | 1.0     | 1.03    | 1.02    | 1.01    | 1.02    |
| 16  | 1.01    | 1.01    | 1.02    | 1.0     | 1.0     | 1.01    | 1.0     |
| 17  | 1.01    | 1.01    | 1.02    | 1.01    | 1.0     | 1.0     | 1.01    |
| 18  | 1.01    | 1.01    | 1.03    | 1.0     | 1.02    | 1.01    | 1.02    |
| 19  | 1.0     | 1.0     | 1.01    | 1.0     | 1.03    | 1.01    | 1.01    |
| 20  | 1.01    | 1.01    | 1.02    | 1.0     | 1.01    | 1.02    | 1.01    |
| 21  | 1.01    | 1.03    | 1.01    | 1.01    | 1.02    | 1.02    | 1.0     |
| 22  | 1.01    | 1.01    | 1.03    | 1.0     | 1.0     | 1.01    | 1.01    |
| 23  | 1.0     | 1.01    | 1.01    | 1.01    | 1.05    | 1.01    | 1.01    |
| 24  | 1.01    | 1.0     | 1.02    | 1.0     | 1.0     | 1.0     | 1.0     |
| 25  | 1.02    | 1.01    | 1.02    | 1.01    | 1.0     | 1.0     | 1.07    |
| 26  | 1.02    | 1.02    | 1.01    | 1.01    | 1.01    | 1.0     | 1.02    |
| 27  | 1.02    | 1.01    | 1.0     | 1.01    | 1.0     | 1.0     | 1.01    |
| 28  | 1.03    | 1.02    | 1.01    | 1.0     | 1.0     | 1.01    | 1.01    |
| 29  | 1.01    | 1.01    | 1.0     | 1.01    | 1.0     | 1.0     | 1.01    |
| 30  | 1.01    | 1.01    | 1.0     | 1.0     | 1.0     | 1.0     | 1.01    |
| 31  | 1.0     | 1.02    | 1.03    | 1.0     | 1.0     | 1.0     | 1.01    |
| 32  | 1.02    | 1.0     | 1.01    | 1.0     | 1.0     | 1.0     | 1.0     |

│ Row │ NCols10 │ NCols11 │ NCols12 │ NCols13 │ NCols14 │ NCols15 │ NCols16 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.01    | 1.01    | 1.01    | 1.01    | 1.0     | 1.01    | 1.01    |
| 4   | 1.01    | 1.02    | 1.0     | 1.0     | 1.03    | 1.0     | 1.0     |
| 5   | 1.0     | 1.0     | 1.0     | 1.02    | 1.01    | 1.13    | 1.0     |
| 6   | 1.03    | 1.0     | 1.18    | 1.01    | 1.0     | 1.01    | 1.01    |
| 7   | 1.01    | 1.01    | 1.01    | 1.04    | 1.01    | 1.01    | 1.0     |
| 8   | 1.02    | 1.01    | 1.02    | 1.0     | 1.01    | 1.06    | 1.0     |
| 9   | 1.01    | 1.01    | 1.0     | 1.0     | 1.01    | 1.01    | 1.0     |
| 10  | 1.0     | 1.03    | 1.01    | 1.04    | 1.01    | 1.01    | 1.0     |
| 11  | 1.01    | 1.02    | 1.01    | 1.01    | 1.02    | 1.0     | 1.0     |
| 12  | 1.02    | 1.01    | 1.0     | 1.0     | 1.08    | 1.01    | 1.01    |
| 13  | 1.03    | 1.0     | 1.0     | 1.02    | 1.02    | 1.0     | 1.0     |
| 14  | 1.02    | 1.01    | 1.0     | 1.0     | 1.01    | 1.02    | 1.0     |
| 15  | 1.03    | 1.03    | 1.02    | 1.0     | 1.01    | 1.0     | 1.04    |
| 16  | 1.0     | 1.03    | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 17  | 1.0     | 1.0     | 1.0     | 1.0     | 1.02    | 1.01    | 1.0     |
| 18  | 1.0     | 1.0     | 1.0     | 1.04    | 1.02    | 1.02    | 1.01    |
| 19  | 1.0     | 1.0     | 1.01    | 1.01    | 1.01    | 1.01    | 1.01    |
| 20  | 1.0     | 1.0     | 1.0     | 1.03    | 1.02    | 1.02    | 1.01    |
| 21  | 1.0     | 1.0     | 1.0     | 1.01    | 1.01    | 1.01    | 1.03    |
| 22  | 1.0     | 1.0     | 1.01    | 1.01    | 1.02    | 1.02    | 1.02    |
| 23  | 1.0     | 1.0     | 1.0     | 1.01    | 1.01    | 1.02    | 1.02    |
| 24  | 1.04    | 1.02    | 1.03    | 1.01    | 1.01    | 1.0     | 1.01    |
| 25  | 1.01    | 1.01    | 1.02    | 1.0     | 1.0     | 1.0     | 1.0     |
| 26  | 1.02    | 1.02    | 1.0     | 1.0     | 1.0     | 1.01    | 1.0     |
| 27  | 1.02    | 1.01    | 1.0     | 1.0     | 1.01    | 1.0     | 1.0     |
| 28  | 1.02    | 1.02    | 1.0     | 1.0     | 1.01    | 1.01    | 1.0     |
| 29  | 1.03    | 1.02    | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 30  | 1.04    | 1.02    | 1.0     | 1.0     | 1.0     | 1.0     | 1.01    |
| 31  | 1.01    | 1.01    | 1.0     | 1.01    | 1.0     | 1.01    | 1.05    |
| 32  | 1.01    | 1.01    | 1.02    | 1.0     | 1.0     | 1.0     | 1.01    |

│ Row │ NCols17 │ NCols18 │ NCols19 │ NCols20 │ NCols21 │ NCols22 │ NCols23 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.03    | 1.0     | 1.1     | 1.01    | 1.01    | 1.0     | 1.0     |
| 4   | 1.0     | 1.0     | 1.01    | 1.0     | 1.0     | 1.04    | 1.0     |
| 5   | 1.01    | 1.01    | 1.01    | 1.01    | 1.0     | 1.02    | 1.12    |
| 6   | 1.12    | 1.01    | 1.0     | 1.01    | 1.01    | 1.0     | 1.0     |
| 7   | 1.02    | 1.02    | 1.01    | 1.0     | 1.01    | 1.02    | 1.0     |
| 8   | 1.0     | 1.0     | 1.0     | 1.0     | 1.01    | 1.0     | 1.0     |
| 9   | 1.0     | 1.0     | 1.01    | 1.01    | 1.0     | 1.0     | 1.0     |
| 10  | 1.0     | 1.0     | 1.01    | 1.01    | 1.01    | 1.01    | 1.01    |
| 11  | 1.0     | 1.07    | 1.01    | 1.01    | 1.0     | 1.02    | 1.02    |
| 12  | 1.0     | 1.03    | 1.01    | 1.01    | 1.01    | 1.01    | 1.0     |
| 13  | 1.0     | 1.0     | 1.03    | 1.0     | 1.01    | 1.01    | 1.02    |
| 14  | 1.0     | 1.02    | 1.04    | 1.02    | 1.03    | 1.01    | 1.01    |
| 15  | 1.0     | 1.01    | 1.02    | 1.01    | 1.02    | 1.01    | 1.01    |
| 16  | 1.02    | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 17  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 18  | 1.02    | 1.0     | 1.0     | 1.0     | 1.01    | 1.01    | 1.0     |
| 19  | 1.02    | 1.0     | 1.0     | 1.06    | 1.02    | 1.01    | 1.01    |
| 20  | 1.02    | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 21  | 1.02    | 1.01    | 1.01    | 1.01    | 1.01    | 1.01    | 1.02    |
| 22  | 1.02    | 1.0     | 1.0     | 1.0     | 1.07    | 1.0     | 1.0     |
| 23  | 1.02    | 1.0     | 1.0     | 1.01    | 1.02    | 1.01    | 1.01    |
| 24  | 1.01    | 1.0     | 1.02    | 1.0     | 1.0     | 1.0     | 1.0     |
| 25  | 1.03    | 1.0     | 1.0     | 1.0     | 1.01    | 1.0     | 1.0     |
| 26  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 27  | 1.0     | 1.0     | 1.0     | 1.03    | 1.01    | 1.0     | 1.0     |
| 28  | 1.0     | 1.0     | 1.01    | 1.0     | 1.0     | 1.0     | 1.0     |
| 29  | 1.01    | 1.01    | 1.0     | 1.0     | 1.01    | 1.01    | 1.0     |
| 30  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.01    | 1.0     |
| 31  | 1.0     | 1.0     | 1.0     | 1.01    | 1.01    | 1.01    | 1.01    |
| 32  | 1.0     | 1.0     | 1.02    | 1.0     | 1.0     | 1.01    | 1.0     |

│ Row │ NCols24 │ NCols25 │ NCols26 │ NCols27 │ NCols28 │ NCols29 │ NCols30 │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
| 3   | 1.09    | 1.0     | 1.01    | 1.01    | 1.0     | 1.0     | 1.1     |
| 4   | 1.03    | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     |
| 5   | 1.0     | 1.01    | 1.01    | 1.0     | 1.18    | 1.01    | 1.0     |
| 6   | 1.01    | 1.03    | 1.01    | 1.01    | 1.02    | 1.02    | 1.01    |
| 7   | 1.15    | 1.0     | 1.02    | 1.02    | 1.19    | 1.0     | 1.01    |
| 8   | 1.0     | 1.01    | 1.0     | 1.0     | 1.0     | 1.0     | 1.01    |
| 9   | 1.0     | 1.0     | 1.01    | 1.0     | 1.0     | 1.0     | 1.0     |
| 10  | 1.01    | 1.03    | 1.09    | 1.01    | 1.0     | 1.01    | 1.0     |
| 11  | 1.01    | 1.01    | 1.01    | 1.09    | 1.02    | 1.01    | 1.05    |
| 12  | 1.0     | 1.07    | 1.0     | 1.0     | 1.0     | 1.05    | 1.04    |
| 13  | 1.01    | 1.1     | 1.02    | 1.01    | 1.0     | 1.01    | 1.05    |
| 14  | 1.08    | 1.02    | 1.01    | 1.01    | 1.0     | 1.01    | 1.0     |
| 15  | 1.01    | 1.01    | 1.1     | 1.01    | 1.01    | 1.02    | 1.01    |
| 16  | 1.0     | 1.01    | 1.0     | 1.0     | 1.02    | 1.0     | 1.0     |
| 17  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.01    |
| 18  | 1.0     | 1.01    | 1.0     | 1.0     | 1.01    | 1.01    | 1.0     |
| 19  | 1.02    | 1.0     | 1.01    | 1.0     | 1.05    | 1.0     | 1.01    |
| 20  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.01    |
| 21  | 1.01    | 1.01    | 1.02    | 1.0     | 1.0     | 1.01    | 1.01    |
| 22  | 1.0     | 1.0     | 1.0     | 1.0     | 1.04    | 1.0     | 1.01    |
| 23  | 1.02    | 1.02    | 1.02    | 1.0     | 1.01    | 1.0     | 1.01    |
| 24  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0     | 1.01    | 1.01    |
| 25  | 1.0     | 1.0     | 1.01    | 1.01    | 1.01    | 1.0     | 1.0     |
| 26  | 1.01    | 1.03    | 1.0     | 1.0     | 1.01    | 1.0     | 1.03    |
| 27  | 1.0     | 1.04    | 1.0     | 1.0     | 1.0     | 1.01    | 1.01    |
| 28  | 1.0     | 1.0     | 1.0     | 1.01    | 1.01    | 1.0     | 1.03    |
| 29  | 1.0     | 1.0     | 1.0     | 1.0     | 1.04    | 1.01    | 1.01    |
| 30  | 1.0     | 1.0     | 1.01    | 1.01    | 1.01    | 1.01    | 1.01    |
| 31  | 1.0     | 1.01    | 1.01    | 1.0     | 1.01    | 1.01    | 1.01    |
| 32  | 1.0     | 1.01    | 1.01    | 1.02    | 1.01    | 1.03    | 1.01    |

│ Row │ NCols31 │ NCols32 │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
| 3   | 1.01    | 1.0     |
| 4   | 1.0     | 1.0     |
| 5   | 1.0     | 1.0     |
| 6   | 1.01    | 1.0     |
| 7   | 1.0     | 1.0     |
| 8   | 1.0     | 1.0     |
| 9   | 1.0     | 1.0     |
| 10  | 1.0     | 1.01    |
| 11  | 1.0     | 1.05    |
| 12  | 1.0     | 1.0     |
| 13  | 1.01    | 1.01    |
| 14  | 1.0     | 1.0     |
| 15  | 1.0     | 1.0     |
| 16  | 1.0     | 1.0     |
| 17  | 1.0     | 1.0     |
| 18  | 1.01    | 1.01    |
| 19  | 1.01    | 1.01    |
| 20  | 1.0     | 1.01    |
| 21  | 1.0     | 1.01    |
| 22  | 1.0     | 1.0     |
| 23  | 1.0     | 1.02    |
| 24  | 1.01    | 1.01    |
| 25  | 1.01    | 1.01    |
| 26  | 1.01    | 1.0     |
| 27  | 1.0     | 1.01    |
| 28  | 1.02    | 1.01    |
| 29  | 1.0     | 1.01    |
| 30  | 1.0     | 1.0     |
| 31  | 1.01    | 1.0     |
| 32  | 1.01    | 1.01    |