Index

class scio.scores.utils.Index(*, dim, metric)

Bases: object

Practical faiss index wrapper.

Eases the use of nearest neighbors search library faiss.

Parameters:
  • dim (int) – Dimensionality of the data space.

  • metric (IndexMetricLike) – See IndexMetric.

Notes

The current implementation uses faiss-cpu, so data living on a GPU has to be moved to the CPU for the search and back to the GPU afterwards (GPU → CPU → GPU), which may be a bottleneck.

To avoid a dtype conversion bottleneck, use float32 data.

Current implementation only supports flat indexes.
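
As an illustration of what "flat" means here, the following is a minimal pure-Python sketch of an exhaustive search, in which each query is compared against every reference sample. It is not the code used by this class or by faiss; the helper name flat_search_l2 and the toy data are hypothetical.

```python
def flat_search_l2(references, query, k):
    """Exhaustive k-nearest-neighbors search using squared L2 distance.

    Hypothetical helper for illustration only: it scans every reference
    sample, which is what makes the index "flat".
    """
    scored = []
    for idx, ref in enumerate(references):
        sq_dist = sum((q - r) ** 2 for q, r in zip(query, ref))
        scored.append((sq_dist, idx))
    scored.sort()  # smallest squared distance first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Toy reference population in a 2-dimensional data space.
references = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
D, I = flat_search_l2(references, [0.0, 0.0], k=2)
print(D, I)
```

The cost of such a search grows linearly with the number of reference samples, since no sample is ever skipped.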

Useful methods defined here

add(samples)

Add reference samples to the index.

remove_ids(ids)

Remove selected reference samples from the index.

search(samples, k, *[, self_query])

Run nearest neighbors search.

Useful attributes defined here

D_is_similarity

Whether D is a similarity measure.

dim

Dimensionality of the data space.

metric

See IndexMetric.

ntotal

Number of samples in the reference population.

add(samples)

Add reference samples to the index.

remove_ids(ids)

Remove selected reference samples from the index.

Parameters:

ids (Tensor) – Indices to remove from the reference population.

search(samples, k, *, self_query=False)

Run nearest neighbors search.

Parameters:
  • samples (Tensor) – The query samples.

  • k (int) – Number of neighbors to look up.

  • self_query (bool) – Set to True if the index was built from exactly samples, in the same order. In this case, if a sample is among its own closest neighbors (this is not always the case, e.g. with inner product and no prior normalization), it is excluded from the result. This requires looking up k + 1 neighbors. Defaults to False.

Returns:

D, I (tuple[Tensor, Tensor]) – Result of faiss.Index.search() after postprocessing: the outputs are converted back to torch tensors on the device of samples (D also takes the dtype of samples), and self matches are removed if self_query is True.
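
The self removal described for self_query can be pictured as follows. This is only a sketch of one plausible postprocessing on a single result row, written in plain Python; the helper drop_self is hypothetical and is not claimed to match the actual implementation.

```python
def drop_self(ids, dists, self_id, k):
    """Trim one row of k + 1 candidates down to k, excluding self_id if present.

    Hypothetical helper: `ids` and `dists` are assumed to be sorted from
    best to worst match, as returned by a nearest neighbors search.
    """
    assert len(ids) == len(dists) == k + 1
    kept = [(i, d) for i, d in zip(ids, dists) if i != self_id]
    kept = kept[:k]  # if self was absent, this drops the worst candidate
    return [i for i, _ in kept], [d for _, d in kept]


# Query number 0 finds itself first among its k + 1 = 3 candidates.
ids_a, dists_a = drop_self([0, 4, 7], [0.0, 1.5, 2.0], self_id=0, k=2)

# Query number 0 is not among its own candidates: the last one is dropped.
ids_b, dists_b = drop_self([4, 7, 9], [1.5, 2.0, 2.5], self_id=0, k=2)

print(ids_a, dists_a)
print(ids_b, dists_b)
```

In both cases exactly k neighbors remain, which is why k + 1 candidates are needed in the first place.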

Note

D is the squared distance for \(L^2\) metric, and (decreasing) similarity for inner product metric.
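
To make the two conventions for D concrete, here is a small pure-Python sketch that ranks the same toy references under both metrics. The data and helper names are hypothetical; the example also shows why a sample is not necessarily its own nearest neighbor under inner product without normalization.

```python
def squared_l2(a, b):
    """Squared Euclidean distance: smaller means closer."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a, b):
    """Inner product similarity: larger means closer."""
    return sum(x * y for x, y in zip(a, b))


references = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
query = references[0]  # the query is itself a reference sample
indices = range(len(references))

# L2 metric: neighbors sorted by increasing squared distance.
l2_order = sorted(indices, key=lambda i: squared_l2(query, references[i]))
l2_values = [squared_l2(query, references[i]) for i in l2_order]

# Inner product metric: neighbors sorted by decreasing similarity.
ip_order = sorted(
    indices, key=lambda i: inner_product(query, references[i]), reverse=True
)
ip_values = [inner_product(query, references[i]) for i in ip_order]

print(l2_order, l2_values)
print(ip_order, ip_values)
```

Under squared L2 the query ranks first with distance zero, whereas under inner product the longer vector [3.0, 0.0] ranks ahead of the query itself.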