# LoopSet Structure

The loopsets define loops as a set of operations that depend on one another, and also on loops. Cycles are not allowed, making it a directed acyclic graph. Let's use a set of nested loops performing matrix multiplication as an example. We can create a naive `LoopSet`

from an expression (naive due to being created without access to any type information):

```
julia> using LoopVectorization
julia> AmulBq = :(for m ∈ 1:M, n ∈ 1:N
C[m,n] = zero(eltype(B))
for k ∈ 1:K
C[m,n] += A[m,k] * B[k,n]
end
end);
julia> lsAmulB = LoopVectorization.LoopSet(AmulBq);
```

This LoopSet consists of seven operations that define the relationships within the loop:

```
julia> LoopVectorization.operations(lsAmulB)
7-element Array{LoopVectorization.Operation,1}:
var"##RHS#256" = var"##zero#257"
C[m, n] = var"##RHS#256"
var"##tempload#258" = A[m, k]
var"##tempload#259" = B[k, n]
var"##reduction#260" = var"##reductzero#261"
var"##reduction#260" = LoopVectorization.vfmadd_fast(var"##tempload#258", var"##tempload#259", var"##reduction#260")
var"##RHS#256" = LoopVectorization.reduce_to_add(var"##reduction#260", var"##RHS#256")
```

The act of performing a "reduction" across a loop introduces a few extra operations that manage creating a "zero" with respect to the reduction, and then combining with the specified value using `reduce_to_add`

, which performs any necessary type conversions, such as from an `Vec`

vector-type to a scalar, if necessary. This simplifies code generation, by making the functions agnostic with respect to the actual vectorization decisions the library makes.

Each operation is listed as depending on a set of loop iteration symbols:

```
julia> LoopVectorization.loopdependencies.(LoopVectorization.operations(lsAmulB))
7-element Array{Array{Symbol,1},1}:
[:m, :n]
[:m, :n]
[:m, :k]
[:k, :n]
[:m, :n]
[:m, :k, :n]
[:m, :n]
```

We can also see which of the operations each of these operations depend on:

```
julia> LoopVectorization.operations(lsAmulB)[6]
var"##reduction#260" = LoopVectorization.vfmadd_fast(var"##tempload#258", var"##tempload#259", var"##reduction#260")
julia> LoopVectorization.parents(ans)
3-element Array{LoopVectorization.Operation,1}:
var"##tempload#258" = A[m, k]
var"##tempload#259" = B[k, n]
var"##reduction#260" = var"##reductzero#261"
```

References to arrays are represented with an `ArrayReferenceMeta`

data structure:

```
julia> LoopVectorization.operations(lsAmulB)[3].ref
LoopVectorization.ArrayReferenceMeta(LoopVectorization.ArrayReference(:A, [:m, :k], Int8[0, 0]), Bool[1, 1], Symbol("##vptr##_A"))
```

It contains the name of the parent array (`:A`

), the indices `[:m,:k]`

, and a boolean vector (`Bool[1, 1]`

) indicating whether these indices are loop iterables. Note that the optimizer assumes arrays are column-major, and thus that it is efficient to read contiguous elements from the first index. In lower level terms, it means that high-throughput vmov instructions can be used rather than low-throughput gathers. Similar story for storing elements. When no axis has unit stride, the first given index will be the dummy `Symbol("##DISCONTIGUOUSSUBARRAY##")`

.

Currently, only single return values are supported (tuple destructuring is not supported in assignments).