CirculantAttention

Documentation for CirculantAttention.

CirculantAttention.circulant_attention - Method
y, A = circulant_attention(simfun::AbstractSimilarity, q, k, v, W::Int)

Computes circulant attention y = Av, where A is a row-softmax-normalized, circulant-sparse attention matrix (A = rowsoftmax(S)). Each non-zero entry $S_{ij}$ is produced by the similarity function acting on the channel representations of q and k at the (linearly indexed) pixels i and j ($S_{ij} = \mathrm{simfun}(q_i, k_j)$). The adjacency matrix $A$ is generated internally and returned as the second return value. Note: q and k are internally scaled by sqrt(sqrt(channels)) before being passed to circulant_adjacency.
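
To make the definition concrete, here is a minimal dense sketch for a 1D signal written in plain Julia. It is not the package implementation: `dense_circulant_attention_1d` is a hypothetical helper, the `(channels, N)` array layout is an assumption, the window is assumed to be centered on pixel i with odd W ≤ N, the scaling is assumed to mean division by channels^(1/4), and a dot product stands in for simfun.

```julia
# Dense 1D reference sketch (hypothetical helper, not the package API).
# q, k, v are (channels, N) matrices; W is an odd window size with W <= N.
function dense_circulant_attention_1d(q, k, v, W::Int)
    C, N = size(q)
    scale = sqrt(sqrt(C))                 # assumption: q and k are each divided by C^(1/4)
    qs, ks = q ./ scale, k ./ scale
    r = (W - 1) ÷ 2                       # assumption: window centered on pixel i
    A = zeros(float(eltype(q)), N, N)
    for i in 1:N
        js = [mod1(i + d, N) for d in -r:r]            # circular neighbours of pixel i
        s = [sum(qs[:, i] .* ks[:, j]) for j in js]    # dot-product similarity S[i, j]
        e = exp.(s .- maximum(s))                      # numerically stable softmax
        A[i, js] .= e ./ sum(e)                        # row i sums to 1 over its window
    end
    y = v * transpose(A)                  # y[:, i] = sum_j A[i, j] * v[:, j]
    return y, A
end

q = reshape(collect(1.0:32.0), 4, 8) ./ 32
k = reverse(q; dims = 2)
v = q .+ 1
y, A = dense_circulant_attention_1d(q, k, v, 3)
```

Under these assumptions each row of A has exactly W non-zero entries, which is what makes a sparse circulant storage format worthwhile compared with a dense N×N matrix.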

See also circulant_adjacency, circulant_similarity, DotSimilarity, DistanceSimilarity.

CirculantAttention.circulant_mh_attention - Method
y, A = circulant_mh_attention(simfun::AbstractSimilarity, q, k, v, W::Int, nheads::Int)

Performs circulant multi-head attention, i.e., circulant attention is applied to each of the nheads channel groups separately and the results are concatenated along the channel dimension. The number of channels in q, k, and v must be divisible by nheads. The returned adjacency matrix A satisfies size(A, 3) == nheads.
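
The split-and-concatenate pattern can be sketched in plain Julia as follows. This is an illustration under assumptions, not the package code: `multihead_sketch` and `toy_attend` are hypothetical names, arrays are assumed to be `(channels, N)` matrices, and `toy_attend` uses a dense (non-windowed) softmax purely as a stand-in for the per-head attention.

```julia
# Stand-in per-head attention (hypothetical): dense row-softmax attention.
function toy_attend(q, k, v)
    S = transpose(q) * k                       # S[i, j] = dot(q[:, i], k[:, j])
    E = exp.(S .- maximum(S; dims = 2))
    A = E ./ sum(E; dims = 2)                  # each row sums to 1
    return v * transpose(A), A
end

# Hypothetical helper: run `attend` on each of the nheads channel groups and
# concatenate the outputs along channels; stack the matrices along dimension 3.
function multihead_sketch(attend, q, k, v, nheads::Int)
    all(size(x, 1) % nheads == 0 for x in (q, k, v)) ||
        throw(ArgumentError("channels must be divisible by nheads"))
    # rows belonging to head h of array x
    group(x, h) = x[((h - 1) * (size(x, 1) ÷ nheads) + 1):(h * (size(x, 1) ÷ nheads)), :]
    results = [attend(group(q, h), group(k, h), group(v, h)) for h in 1:nheads]
    y = reduce(vcat, first.(results))          # concatenate along channels
    A = cat(last.(results)...; dims = 3)       # size(A, 3) == nheads
    return y, A
end

q = reshape(collect(1.0:32.0), 4, 8) ./ 32
k = reverse(q; dims = 2)
v = q .+ 1
y, A = multihead_sketch(toy_attend, q, k, v, 2)
```

In this sketch every head gets its own attention matrix, which is why the result carries one slice per head along the third dimension.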

See also circulant_attention, DotSimilarity, DistanceSimilarity.

CirculantAttention.circulant_similarity - Method
circulant_similarity(simfun::AbstractSimilarity, x, y, W::Int)

Returns a Circulant matrix with circulant-sparse data. Each non-zero S[i,j,b] is populated by simfun evaluated at the linearized pixel locations of x and y, i.e. S[i,j,b] = simfun(x[...,i,b], y[...,j,b], W) for max(i⃗, j⃗) ≤ W. The locations of the non-zero entries are determined by the window size W and the number of spatial dimensions in x and y.
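
As an illustration of how a window size and the spatial dimensions can fix the sparsity pattern, the sketch below lists the column indices j that would be non-zero in row i for a 2D image. It rests on assumptions that the source does not confirm: `window_neighbours` is a hypothetical helper, the window is taken to be a W×W square centered on pixel i (odd W), and indices wrap around circularly at the image border.

```julia
# Hypothetical helper: linear indices of the pixels inside a W×W window
# centered on linear pixel index i of an image with size `dims`,
# wrapping circularly at the borders.
function window_neighbours(i::Int, dims::NTuple{2,Int}, W::Int)
    r = (W - 1) ÷ 2                         # assumption: centered window, odd W
    ci = CartesianIndices(dims)[i]          # linear index -> (row, col)
    li = LinearIndices(dims)                # (row, col) -> linear index
    wrap(c, d, n) = mod1(c + d, n)          # circular shift of a 1-based coordinate
    return vec([li[wrap(ci[1], d1, dims[1]), wrap(ci[2], d2, dims[2])] for d1 in -r:r, d2 in -r:r])
end

js = window_neighbours(1, (4, 4), 3)
sort(js)   # → [1, 2, 4, 5, 6, 8, 13, 14, 16]
```

With this convention every row holds W^d non-zero entries for d spatial dimensions, here 3^2 = 9 out of 16 columns.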

See also DotSimilarity, DistanceSimilarity.

NNlib.softmax - Method
NNlib.softmax(A::Circulant)

Row-wise softmax of Circulant matrix A.
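
A row-wise softmax on a sparse matrix is usually understood to normalize over the stored entries only, so that structural zeros stay zero instead of contributing exp(0) = 1. The dense sketch below illustrates that reading; it is an assumption about the intended semantics, and `masked_row_softmax` with its Boolean `mask` argument is a hypothetical stand-in for the Circulant storage.

```julia
# Hypothetical dense stand-in: S is an N×N matrix and mask[i, j] is true
# where an entry is stored. Each row needs at least one stored entry.
function masked_row_softmax(S, mask)
    A = zeros(float(eltype(S)), size(S))
    for i in axes(S, 1)
        js = findall(mask[i, :])                  # stored columns of row i
        e = exp.(S[i, js] .- maximum(S[i, js]))   # numerically stable exponentials
        A[i, js] .= e ./ sum(e)                   # normalize over stored entries only
    end
    return A
end

S = [0.0 1.0 2.0; 1.0 0.0 1.0; 2.0 1.0 0.0]
mask = Bool[1 1 0; 0 1 1; 1 0 1]
A = masked_row_softmax(S, mask)
```

Each row of A then sums to 1 over its stored entries, and the positions outside the mask remain exactly zero.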
