LoopSet Structure
The loopsets define loops as a set of operations that depend on one another, and also on loops. Cycles are not allowed, making it a directed acyclic graph. Let's use a set of nested loops performing matrix multiplication as an example. We can create a naive LoopSet
from an expression (naive due to being created without access to any type information):
julia> using LoopVectorization
julia> AmulBq = :(for m ∈ 1:M, n ∈ 1:N
C[m,n] = zero(eltype(B))
for k ∈ 1:K
C[m,n] += A[m,k] * B[k,n]
end
end);
julia> lsAmulB = LoopVectorization.LoopSet(AmulBq);
This LoopSet consists of seven operations that define the relationships within the loop:
julia> LoopVectorization.operations(lsAmulB)
7-element Array{LoopVectorization.Operation,1}:
var"##RHS#256" = var"##zero#257"
C[m, n] = var"##RHS#256"
var"##tempload#258" = A[m, k]
var"##tempload#259" = B[k, n]
var"##reduction#260" = var"##reductzero#261"
var"##reduction#260" = LoopVectorization.vfmadd_fast(var"##tempload#258", var"##tempload#259", var"##reduction#260")
var"##RHS#256" = LoopVectorization.reduce_to_add(var"##reduction#260", var"##RHS#256")
The act of performing a "reduction" across a loop introduces a few extra operations that manage creating a "zero" with respect to the reduction, and then combining with the specified value using reduce_to_add
, which performs any necessary type conversions, such as from an Vec
vector-type to a scalar, if necessary. This simplifies code generation, by making the functions agnostic with respect to the actual vectorization decisions the library makes.
Each operation is listed as depending on a set of loop iteration symbols:
julia> LoopVectorization.loopdependencies.(LoopVectorization.operations(lsAmulB))
7-element Array{Array{Symbol,1},1}:
[:m, :n]
[:m, :n]
[:m, :k]
[:k, :n]
[:m, :n]
[:m, :k, :n]
[:m, :n]
We can also see which of the operations each of these operations depend on:
julia> LoopVectorization.operations(lsAmulB)[6]
var"##reduction#260" = LoopVectorization.vfmadd_fast(var"##tempload#258", var"##tempload#259", var"##reduction#260")
julia> LoopVectorization.parents(ans)
3-element Array{LoopVectorization.Operation,1}:
var"##tempload#258" = A[m, k]
var"##tempload#259" = B[k, n]
var"##reduction#260" = var"##reductzero#261"
References to arrays are represented with an ArrayReferenceMeta
data structure:
julia> LoopVectorization.operations(lsAmulB)[3].ref
LoopVectorization.ArrayReferenceMeta(LoopVectorization.ArrayReference(:A, [:m, :k], Int8[0, 0]), Bool[1, 1], Symbol("##vptr##_A"))
It contains the name of the parent array (:A
), the indices [:m,:k]
, and a boolean vector (Bool[1, 1]
) indicating whether these indices are loop iterables. Note that the optimizer assumes arrays are column-major, and thus that it is efficient to read contiguous elements from the first index. In lower level terms, it means that high-throughput vmov instructions can be used rather than low-throughput gathers. Similar story for storing elements. When no axis has unit stride, the first given index will be the dummy Symbol("##DISCONTIGUOUSSUBARRAY##")
.
Currently, only single return values are supported (tuple destructuring is not supported in assignments).