Composite Types: DateTime Arrays
Currently, loops over some types are easier to work with than others. Here we show: (1) a sequential loop over Vector{DateTime} that cannot have @turbo applied directly, and (2) a solution that reinterprets the DateTime values as their underlying integer representation.
The same approach may apply to other composite types that can be represented by a primitive type.
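As a rough illustration of what "represented with primitive types" can mean, the sketch below wraps a single Float64 in a struct and views a vector of those structs as plain floats. The `Meters` type and the variable names are hypothetical and exist only for this example; the checks shown are one reasonable way to test suitability, not a complete rule.

```julia
# `Meters` is a hypothetical wrapper type used only for illustration.
struct Meters
    value::Float64
end

lengths = [Meters(1.5), Meters(2.0), Meters(4.25)]

# Two conditions that make a reinterpret to Float64 plausible:
# the type is plain bits, and it has the same size as the target type.
is_plain_bits = isbitstype(Meters)
same_size = sizeof(Meters) == sizeof(Float64)

# View the same memory as a vector of Float64 (no copy is made).
raw = reinterpret(Float64, lengths)
```

If either check fails for your own type, this approach probably does not carry over directly.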
Setting up the Problem
Here's a simple problem involving timestamps:
Problem statement:
- Given: a vector of strictly increasing timestamps.
- Output: a vector of the same length starting at 0.0 and ending at 1.0. Each intermediate element is scaled proportionally to the length of time since the beginning.
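Written as a formula, the scaling described above could be expressed as follows, where the symbols t_1, ..., t_n (the input timestamps) and out_i (the i-th output element) are notation introduced here for illustration:

```latex
\mathrm{out}_i = \frac{t_i - t_1}{t_n - t_1}, \qquad i = 1, \dots, n
```

With strictly increasing timestamps the denominator is positive, so out_1 = 0 and out_n = 1.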
Sample input and expected output:
using Dates
sample_input = [
    Dates.DateTime(2021, 5, 5, 10, 0, 0),
    Dates.DateTime(2021, 5, 5, 10, 5, 15),
    Dates.DateTime(2021, 5, 6, 10, 0, 0),
    Dates.DateTime(2021, 5, 6, 10, 5, 15),
    Dates.DateTime(2021, 5, 7, 10, 0, 20),
]
expected_output = [
    0.0,
    0.0018227057053581761,
    0.499942136326814,
    0.5017648420321722,
    1.0,
]
First Attempt: Sequential version of the loop
This implementation satisfies the problem statement by iterating over the timestamps one at a time:
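using Dates
function scale_timeseries_sequential(data::Vector{Dates.DateTime})
    out = similar(data, Float64)
    # Total time span in milliseconds; `.value` extracts the integer from the Millisecond period
    ϕ = (data[lastindex(data)] - data[1]).value
    @inbounds for i ∈ eachindex(data)
        # Elapsed time since the first timestamp, as a fraction of the total span
        out[i] = (data[i] - data[1]).value / ϕ
    end
    return out
end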
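For comparison, the same scaling can be sketched with broadcasting instead of an explicit loop. This is not the version benchmarked on this page, and the name `scale_timeseries_broadcast` is hypothetical; it is mainly a compact way to check the arithmetic on the sample input.

```julia
using Dates

# Hypothetical helper name; a broadcast form of the same scaling.
function scale_timeseries_broadcast(data::Vector{DateTime})
    t0 = first(data)
    span = Dates.value(last(data) - t0)      # total span in milliseconds (Int64)
    # data .- t0 yields Millisecond periods; Dates.value extracts their integers
    return Dates.value.(data .- t0) ./ span
end

sample_input = [
    DateTime(2021, 5, 5, 10, 0, 0),
    DateTime(2021, 5, 5, 10, 5, 15),
    DateTime(2021, 5, 6, 10, 0, 0),
    DateTime(2021, 5, 6, 10, 5, 15),
    DateTime(2021, 5, 7, 10, 0, 20),
]

scaled = scale_timeseries_broadcast(sample_input)
```

The second element corresponds to 315 seconds out of a total span of 172,820 seconds, which agrees with the expected output listed earlier.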
Second Attempt: Turbo Loop
Our Vector{Dates.DateTime} has an integer interpretation which we can take advantage of here. We'll reinterpret our vector as Int, make the needed adjustments, then apply the @turbo macro to our loop:
using LoopVectorization, Dates
function scale_timeseries_turbo(data::Vector{Dates.DateTime})
    # Interpret our DateTime vector as Int
    tsi = reinterpret(Int, data)
    out = similar(data, Float64)
    # We've interpreted our data as integers, so we no longer need `.value`
    ϕ = tsi[lastindex(tsi)] - tsi[1]
    @turbo for i ∈ eachindex(tsi)
        out[i] = (tsi[i] - tsi[1]) / ϕ
    end
    return out
end
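To see what the reinterpret step produces, here is a small sketch using only the Dates standard library. It uses Int64 explicitly, which is the same type as Int on 64-bit systems; the variable names are illustrative.

```julia
using Dates

data = [DateTime(2021, 5, 5, 10, 0, 0), DateTime(2021, 5, 5, 10, 5, 15)]

# A DateTime is stored as a single Int64 count of milliseconds,
# so the reinterpreted vector has the same length as the original.
tsi = reinterpret(Int64, data)

same_length = length(tsi) == length(data)
same_values = tsi == Dates.value.(data)   # matches the values Dates.value reports
elapsed_ms = tsi[2] - tsi[1]              # 5 min 15 s expressed in milliseconds
```

Because the differences are already plain integers, the loop body no longer needs any Dates arithmetic.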
Benchmarks
We'll benchmark with randomly generated data:
using Dates
function generate_timestamps(N::Int64)
    data = Vector{Dates.DateTime}(undef, N)
    v = DateTime(1990, 1, 1, 0, 0, 0)
    for i in 1:N
        # Advance by a random 1 to 5 seconds so the timestamps are strictly increasing
        v += Second(rand(1:5))
        data[i] = v
    end
    return data
end
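A loop-free variant of the generator can be sketched with cumsum, along with a quick check that the result meets the "strictly increasing" precondition from the problem statement. The name `generate_timestamps_cumsum` is hypothetical, and this variant is not the one used for the benchmark numbers on this page.

```julia
using Dates

# Hypothetical helper name; same idea as the loop-based generator.
function generate_timestamps_cumsum(N::Integer)
    steps = rand(1:5, N)                  # random gaps of 1 to 5 seconds
    # Running totals of the gaps, added to the base timestamp
    return DateTime(1990, 1, 1) .+ Second.(cumsum(steps))
end

data = generate_timestamps_cumsum(1_000)

# Every consecutive difference should be a positive amount of time
strictly_increasing = all(diff(data) .> Millisecond(0))
```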
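Briefly, the benchmarks suggest that the turbo version is about 3x faster than the sequential version in mean time (roughly 332.5 μs vs. 100.1 μs for 100,000 elements, and 694.3 μs vs. 230.5 μs for 200,000 elements), with the same memory estimate and allocation count: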
julia> using BenchmarkTools
julia> data_100000 = generate_timestamps(100000);
julia> data_200000 = generate_timestamps(200000);
julia> @benchmark scale_timeseries_sequential(data_100000)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 318.864 μs … 967.760 μs ┊ GC (min … max): 0.00% … 40.41%
Time (median): 321.291 μs ┊ GC (median): 0.00%
Time (mean ± σ): 332.503 μs ± 52.040 μs ┊ GC (mean ± σ): 1.97% ± 6.98%
█▆▅▂▂▂▁ ▁
█████████▆▆▆▅▅▅▅▅▄▄▄▄▁▁▄▁▁▃▁▁▄▄▃▃▁▃▄▁▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅█▇ █
319 μs Histogram: log(frequency) by time 701 μs <
Memory estimate: 781.33 KiB, allocs estimate: 2.
julia> @benchmark scale_timeseries_turbo(data_100000)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 71.942 μs … 933.400 μs ┊ GC (min … max): 0.00% … 71.93%
Time (median): 87.926 μs ┊ GC (median): 0.00%
Time (mean ± σ): 100.082 μs ± 89.095 μs ┊ GC (mean ± σ): 11.63% ± 11.43%
▄█▃▁ ▁ ▁
████▇▄▁▁▁▁▁▁▁▁▃▄▆▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ █
71.9 μs Histogram: log(frequency) by time 764 μs <
Memory estimate: 781.33 KiB, allocs estimate: 2.
julia> @benchmark scale_timeseries_sequential(data_200000)
BenchmarkTools.Trial: 7153 samples with 1 evaluation.
Range (min … max): 637.692 μs … 2.277 ms ┊ GC (min … max): 0.00% … 65.01%
Time (median): 640.729 μs ┊ GC (median): 0.00%
Time (mean ± σ): 694.282 μs ± 184.965 μs ┊ GC (mean ± σ): 3.69% ± 8.68%
█▆▅▃▂▁ ▁
███████▇▅▆▆▄▄▅▁▁▁▁▁▁▆██▇▅▄▄▁▃▁▃▃▄▁▁▁▁▁▁▁▁▁▁▁▇█▇▇▅▅▄▃▄▄▁▁▁▄▄█▇ █
638 μs Histogram: log(frequency) by time 1.71 ms <
Memory estimate: 1.53 MiB, allocs estimate: 2.
julia> @benchmark scale_timeseries_turbo(data_200000)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 159.023 μs … 2.092 ms ┊ GC (min … max): 0.00% … 50.30%
Time (median): 176.559 μs ┊ GC (median): 0.00%
Time (mean ± σ): 230.513 μs ± 189.542 μs ┊ GC (mean ± σ): 11.86% ± 12.80%
█▇▅▄▄▃▂▂▁ ▁▁ ▂
██████████▇▅▅▃▁▄▁▃▁██▇▆▄▅▄▄▅▄▅▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁███▇▅▅▅▅▄▅▄▅█▇▇ █
159 μs Histogram: log(frequency) by time 1.22 ms <
Memory estimate: 1.53 MiB, allocs estimate: 2.