CirculantAttention
Documentation for CirculantAttention.
Index:
CirculantAttention.DistanceSimilarity
CirculantAttention.DotSimilarity
CirculantAttention.circulant_adjacency
CirculantAttention.circulant_attention
CirculantAttention.circulant_attention
CirculantAttention.circulant_mh_attention
CirculantAttention.circulant_mh_attention
CirculantAttention.circulant_similarity
NNlib.softmax
CirculantAttention.DistanceSimilarity — Type

DistanceSimilarity()

Used in circulant_attention, circulant_similarity, and circulant_adjacency to indicate use of distance similarity:
$S_{ij} = \frac{1}{2}\mathrm{sum}(\mathrm{abs2}, q[i] - k[j]).$
See also DotSimilarity.
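A minimal worked example of the formula above, using hypothetical per-pixel feature vectors qi and kj (illustrative names, not part of the API):

```julia
# Hypothetical channel representations of q and k at pixels i and j.
qi = randn(Float32, 16)
kj = randn(Float32, 16)

# Distance similarity as defined above: S_ij = 1/2 * sum(abs2, q[i] - k[j]).
Sij = sum(abs2, qi - kj) / 2
```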
CirculantAttention.DotSimilarity — Type

DotSimilarity()

Used in circulant_attention, circulant_similarity, and circulant_adjacency to indicate use of dot-product similarity:
$S_{ij} = \mathrm{Real}(q[i]^H k[j]).$
See also DistanceSimilarity.
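A minimal worked example of the dot-product similarity, again with hypothetical per-pixel feature vectors (complex-valued here to show the Hermitian product):

```julia
# Hypothetical channel representations of q and k at pixels i and j.
qi = randn(ComplexF32, 16)
kj = randn(ComplexF32, 16)

# Dot-product similarity as defined above: S_ij = Re(q[i]^H k[j]).
Sij = real(qi' * kj)
```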
CirculantAttention.circulant_adjacency — Method

circulant_adjacency(simfun::AbstractSimilarity, x, y, W::Int)

Equivalent to (softmax ∘ circulant_similarity)(simfun, x, y, W).
See also circulant_similarity, NNlib.softmax.
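A sketch of the stated equivalence, assuming image-shaped inputs laid out as (height, width, channels, batch); the shapes and window size here are illustrative assumptions, not requirements stated in the docstring:

```julia
using CirculantAttention, NNlib

x = randn(Float32, 32, 32, 8, 1)   # assumed (H, W, C, B) layout
y = randn(Float32, 32, 32, 8, 1)
W = 7                              # window size

# By definition, these two adjacency matrices should agree.
A1 = circulant_adjacency(DotSimilarity(), x, y, W)
A2 = softmax(circulant_similarity(DotSimilarity(), x, y, W))
```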
CirculantAttention.circulant_attention — Method

y = circulant_attention(A::Circulant, x::AbstractArray)
y = A ⊗ x # \otimes

Applies circulant matrix A to x. See also circulant_adjacency.
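A usage sketch of the operator form, with an adjacency built via circulant_adjacency; the (H, W, C, B) layout and window size are assumptions:

```julia
using CirculantAttention

x = randn(Float32, 32, 32, 8, 1)   # assumed (H, W, C, B) layout
A = circulant_adjacency(DotSimilarity(), x, x, 7)

y1 = circulant_attention(A, x)
y2 = A ⊗ x                         # operator form (\otimes); same result
```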
CirculantAttention.circulant_attention — Method

y, A = circulant_attention(simfun::AbstractSimilarity, q, k, v, W::Int)

Performs circulant attention y = A v, where A is a row-softmax-normalized circulant-sparse attention matrix (A = rowsoftmax(S)). Each non-zero entry $S_{ij}$ is generated by the similarity function acting on the channel representations of q and k at (linearly indexed) pixels i, j ($S_{ij} = \mathrm{simfun}(q_i, k_j)$). The adjacency matrix $A$ is generated internally and returned as the second output. Note: q and k are internally scaled by sqrt(sqrt(channels)) before being passed to circulant_adjacency.
See also circulant_adjacency, circulant_similarity, DotSimilarity, DistanceSimilarity.
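A usage sketch under the same assumed (H, W, C, B) layout; window size 7 is an arbitrary choice:

```julia
using CirculantAttention

q = randn(Float32, 32, 32, 8, 1)
k = randn(Float32, 32, 32, 8, 1)
v = randn(Float32, 32, 32, 8, 1)

y, A = circulant_attention(DistanceSimilarity(), q, k, v, 7)

# The returned adjacency can be reused on a new value array
# without recomputing similarities.
y2 = circulant_attention(A, v)
```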
CirculantAttention.circulant_mh_attention — Method

y = circulant_mh_attention(A::Circulant, x::AbstractArray)
y = A ⨷ x # \Otimes

Applies circulant matrix A (with channel dimension > 1) to x. See also circulant_attention, circulant_adjacency.
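A usage sketch of the multi-head operator form, with the same assumed (H, W, C, B) layout and an arbitrary nheads = 4:

```julia
using CirculantAttention

q = randn(Float32, 32, 32, 8, 1)   # assumed (H, W, C, B) layout
v = randn(Float32, 32, 32, 8, 1)

_, A = circulant_mh_attention(DotSimilarity(), q, q, v, 7, 4)

y1 = circulant_mh_attention(A, v)
y2 = A ⨷ v                         # operator form (\Otimes); same result
```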
CirculantAttention.circulant_mh_attention — Method

y, A = circulant_mh_attention(simfun::AbstractSimilarity, q, k, v, W::Int, nheads::Int)

Performs circulant multi-head attention, i.e., circulant attention is performed on nheads groups of channels separately and the results are concatenated along the channel dimension. The number of channels in q, k, v must be divisible by nheads. The returned adjacency matrix A will have size(A, 3) == nheads.
See also circulant_attention, DotSimilarity, DistanceSimilarity.
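A usage sketch: 8 channels split across nheads = 4 heads (2 channels per head); shapes and values are illustrative assumptions:

```julia
using CirculantAttention

q = randn(Float32, 32, 32, 8, 1)   # 8 channels, divisible by nheads = 4
k = randn(Float32, 32, 32, 8, 1)
v = randn(Float32, 32, 32, 8, 1)

y, A = circulant_mh_attention(DistanceSimilarity(), q, k, v, 7, 4)

size(A, 3) == 4                    # one adjacency slice per head
```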
CirculantAttention.circulant_similarity — Method

circulant_similarity(simfun::AbstractSimilarity, x, y, W::Int)

Returns a Circulant matrix with circulant-sparse data. Each non-zero S[i,j,b] is populated by simfun evaluated at the linearized pixel locations of x and y, i.e. S[i,j,b] = simfun(x[...,i,b], y[...,j,b], W) for max(i⃗, j⃗) ≤ W. The locations of the non-zero entries are determined by the window size W and the number of spatial dimensions of x and y.
See also DotSimilarity, DistanceSimilarity.
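A sketch combining circulant_similarity with the row-wise softmax documented below and the operator form above; the (H, W, C, B) layout and window size are assumptions:

```julia
using CirculantAttention, NNlib

x = randn(Float32, 32, 32, 8, 1)
v = randn(Float32, 32, 32, 8, 1)

S = circulant_similarity(DistanceSimilarity(), x, x, 7)   # unnormalized similarities
A = softmax(S)                                            # row-wise normalization
y = circulant_attention(A, v)                             # apply to a value array
```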
NNlib.softmax — Method

NNlib.softmax(A::Circulant)

Row-wise softmax of the Circulant matrix A.