CirculantAttention
Documentation for CirculantAttention.
CirculantAttention.DistanceSimilarity
CirculantAttention.DotSimilarity
CirculantAttention.circulant_adjacency
CirculantAttention.circulant_attention
CirculantAttention.circulant_attention
CirculantAttention.circulant_mh_attention
CirculantAttention.circulant_mh_attention
CirculantAttention.circulant_similarity
NNlib.softmax
CirculantAttention.DistanceSimilarity — Type
DistanceSimilarity()
Used in circulant_attention, circulant_similarity, and circulant_adjacency to indicate use of distance similarity:
$S_{ij} = \frac{1}{2}\mathrm{sum}(\mathrm{abs2}, q[i] - k[j]).$
See also DotSimilarity.
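As a minimal sketch of the formula above, the following evaluates it for a single pair of channel vectors in plain Julia. The helper name `distance_similarity` is hypothetical and is not part of CirculantAttention; it only mirrors the documented expression.

```julia
# Hypothetical helper, not part of CirculantAttention: evaluates the
# documented formula (1/2) * sum(abs2, q[i] - k[j]) for one pair of vectors.
distance_similarity(qi, kj) = sum(abs2, qi .- kj) / 2

qi = [1.0, 2.0, 3.0]
kj = [1.0, 0.0, 1.0]
s = distance_similarity(qi, kj)   # (0 + 4 + 4) / 2 = 4.0
```

Identical vectors give a value of zero under this formula.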
CirculantAttention.DotSimilarity — Type
DotSimilarity()
Used in circulant_attention, circulant_similarity, and circulant_adjacency to indicate use of dot-product similarity:
$S_{ij} = \mathrm{Real}(q[i]^H k[j]).$
See also DistanceSimilarity.
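The formula above can be sketched for one pair of (possibly complex) channel vectors using the standard library. The helper name `dot_similarity` is hypothetical and is not part of CirculantAttention; `LinearAlgebra.dot` conjugates its first argument, which matches the Hermitian transpose in the formula.

```julia
using LinearAlgebra: dot

# Hypothetical helper, not part of CirculantAttention: real part of the
# conjugated inner product q[i]^H k[j].
dot_similarity(qi, kj) = real(dot(qi, kj))

qi = [1.0 + 1.0im, 2.0 + 0.0im]
kj = [0.0 + 1.0im, 1.0 + 0.0im]
s = dot_similarity(qi, kj)   # real((1 - im) * im + 2 * 1) = real(3 + im) = 3.0
```

For real inputs this reduces to the ordinary dot product.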
CirculantAttention.circulant_adjacency — Method
circulant_adjacency(simfun::AbstractSimilarity, x, y, W::Int)
Equivalent to (softmax ∘ circulant_similarity)(simfun, x, y, W).
See also circulant_similarity, NNlib.softmax.
CirculantAttention.circulant_attention — Method
y = circulant_attention(A::Circulant, x::AbstractArray)
y = A ⊗ x # \otimes
Applies circulant matrix A to x. See also circulant_adjacency.
CirculantAttention.circulant_attention — Method
y, A = circulant_attention(simfun::AbstractSimilarity, q, k, v, W::Int)
Performs circulant attention, y = Av, where A is a row-softmax normalized circulant-sparse attention matrix (A = rowsoftmax(S)). Each non-zero entry $S_{ij}$ is generated by the similarity function acting on the channel representations of q and k at (linearly indexed) pixels i, j ($S_{ij} = \mathrm{simfun}(q_i, k_j)$). The adjacency matrix $A$ is generated internally and returned as the second return value. Note: q and k are internally scaled by sqrt(sqrt(channels)) before being passed to circulant_adjacency.
See also circulant_adjacency, circulant_similarity, DotSimilarity, DistanceSimilarity.
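The computation y = Av with A = rowsoftmax(S) can be illustrated on small dense matrices. This is only a sketch under stated assumptions, not the package's implementation: the function `dense_attention` is hypothetical, inputs are real and stored as channels × pixels, the window is taken to cover every pixel (so no entries are dropped), and "scaled by" is assumed to mean division by sqrt(sqrt(channels)), as in standard scaled dot-product attention.

```julia
# Hypothetical dense sketch of y = A v with dot-product similarity.
# Assumptions: real inputs stored as channels × pixels, a window covering
# every pixel, and "scaled by" meaning division by sqrt(sqrt(channels)).
function dense_attention(q, k, v)
    s = sqrt(sqrt(size(q, 1)))
    S = transpose(q ./ s) * (k ./ s)        # S[i, j] = <q_i, k_j> / sqrt(channels)
    E = exp.(S .- maximum(S; dims=2))       # subtract row maxima for stability
    A = E ./ sum(E; dims=2)                 # row-softmax: each row sums to one
    y = v * transpose(A)                    # y[:, i] = sum_j A[i, j] * v[:, j]
    return y, A
end

q = [1.0 0.0; 0.0 1.0; 0.0 0.0; 0.0 0.0]   # 4 channels, 2 pixels
k = copy(q)
v = [1.0 3.0]                               # 1 channel, 2 pixels
y, A = dense_attention(q, k, v)
```

Each output pixel is a convex combination of the values in v, weighted by the corresponding row of A.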
CirculantAttention.circulant_mh_attention — Method
y = circulant_mh_attention(A::Circulant, x::AbstractArray)
y = A ⨷ x # \Otimes
Applies circulant matrix A (with channel dimension > 1) to x. See also circulant_attention, circulant_adjacency.
CirculantAttention.circulant_mh_attention — Method
y, A = circulant_mh_attention(simfun::AbstractSimilarity, q, k, v, W::Int, nheads::Int)
Performs circulant multi-head attention, i.e., performs circulant attention on each of nheads channel groups separately and concatenates the results along the channel dimension. The number of channels in q, k, v must be divisible by nheads. The returned adjacency matrix A will have size(A, 3) == nheads.
See also circulant_attention, DotSimilarity, DistanceSimilarity.
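The divisibility requirement can be made concrete with a small sketch that partitions channel indices into heads. The helper `head_ranges` is hypothetical, and the choice of contiguous channel groups is an assumption; the page does not state how the package assigns channels to heads.

```julia
# Hypothetical helper, not part of CirculantAttention.
# Assumption: each head owns a contiguous block of channels.
function head_ranges(channels::Int, nheads::Int)
    channels % nheads == 0 ||
        throw(ArgumentError("channels must be divisible by nheads"))
    c = channels ÷ nheads                       # channels per head
    return [((h - 1) * c + 1):(h * c) for h in 1:nheads]
end

ranges = head_ranges(8, 4)   # 1:2, 3:4, 5:6, 7:8
```

Calling `head_ranges(7, 2)` throws an `ArgumentError`, since 7 channels cannot be split evenly into 2 heads.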
CirculantAttention.circulant_similarity — Method
circulant_similarity(simfun::AbstractSimilarity, x, y, W::Int)
Returns a Circulant matrix with circulant-sparse data. Each non-zero S[i,j,b] is populated by simfun evaluated at the linearized pixel locations of x and y, i.e. S[i,j,b] = simfun(x[...,i,b], y[...,j,b], W) for max(i⃗, j⃗) ≤ W. The locations of the non-zero entries are determined by the window size W and the number of spatial dimensions in x and y.
See also DotSimilarity, DistanceSimilarity.
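To give a feel for what a circulant-sparse pattern looks like, here is a dense 1D illustration. It is a sketch only: the function `windowed_similarity_1d` is hypothetical, and it assumes a single spatial dimension, no batch dimension, inputs stored as channels × pixels, and an odd window of W pixels centered on each pixel with wrap-around. The package itself stores only the non-zero entries, and its exact windowing convention may differ.

```julia
# Hypothetical 1D dense illustration (the package stores only the non-zeros).
# Assumption: a window of W pixels (W odd) centered on pixel i, with wrap-around.
function windowed_similarity_1d(simfun, x::AbstractMatrix, y::AbstractMatrix, W::Int)
    N = size(x, 2)                      # columns are pixels, rows are channels
    r = (W - 1) ÷ 2                     # window radius
    S = zeros(Float64, N, N)
    for i in 1:N, d in -r:r
        j = mod1(i + d, N)              # circular (periodic) neighbor index
        S[i, j] = simfun(view(x, :, i), view(y, :, j))
    end
    return S
end

dotsim(a, b) = sum(a .* b)              # real-valued dot product stand-in
x = [1.0 2.0 3.0 4.0 5.0]               # 1 channel, 5 pixels
S = windowed_similarity_1d(dotsim, x, x, 3)
```

With W = 3, every row of `S` has exactly three populated entries, and the first row wraps around to include the last pixel.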
NNlib.softmax — Method
NNlib.softmax(A::Circulant)
Row-wise softmax of Circulant matrix A.
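For reference, a row-wise softmax on an ordinary dense matrix can be sketched as below. The function `rowsoftmax` is a hypothetical stand-in, not the package method; in particular, how the Circulant method treats entries outside the window is not stated on this page.

```julia
# Hypothetical dense stand-in for a row-wise softmax; not the package method.
function rowsoftmax(S::AbstractMatrix)
    E = exp.(S .- maximum(S; dims=2))  # subtract row maxima for numerical stability
    return E ./ sum(E; dims=2)         # normalize each row to sum to one
end

S = [0.0 1.0 2.0;
     3.0 3.0 3.0]
A = rowsoftmax(S)
```

Each row of `A` sums to one, and a row with equal entries maps to a uniform distribution.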