neighbors

KNN lookup management

MuyGPyS.neighbors.NN_Wrapper is an API for tasking several KNN libraries with constructing the lookup indices that enable fast training and inference. The wrapper constructor expects the training features, the number of nearest neighbors, a method string specifying which algorithm to use, and any additional kwargs used by the chosen method. Currently supported implementations are exact KNN using sklearn (“exact”) and approximate KNN using hnswlib (“hnsw”).

class MuyGPyS.neighbors.NN_Wrapper(train, nn_count, nn_method='exact', **kwargs)[source]

Nearest neighbors lookup data structure wrapper.

Wraps the logic driving nearest neighbor data structure training and querying. Currently supports sklearn.neighbors.NearestNeighbors for exact computation and hnswlib.Index for approximate nearest neighbors.

The following example constructs exact and approximate KNN lookups with k = 10.

Example

>>> from MuyGPyS.neighbors import NN_Wrapper
>>> train_features = load_train_features()
>>> nn_count = 10
>>> exact_nbrs_lookup = NN_Wrapper(
...         train_features, nn_count, nn_method="exact", algorithm="ball_tree"
... )
>>> approx_nbrs_lookup = NN_Wrapper(
...         train_features, nn_count, nn_method="hnsw", space="l2", M=16
... )
Parameters:
  • train (ndarray) – The full training data of shape (train_count, feature_count) used to construct the nearest neighbor query data structure.

  • nn_count (int) – The number of nearest neighbors to return in queries.

  • nn_method (str) – Indicates which nearest neighbor algorithm should be used. Currently “exact” indicates sklearn.neighbors.NearestNeighbors, while “hnsw” indicates hnswlib.Index (requires installing MuyGPyS with the “hnswlib” extras flag).

  • kwargs – Additional kwargs used for lookup data structure construction. nn_method="exact" supports “radius”, “algorithm”, “leaf_size”, “metric”, “p”, “metric_params”, and “n_jobs” kwargs. nn_method="hnsw" supports “space”, “ef_construction”, “M”, and “random_seed” kwargs.
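The kwargs listed above differ by method, so it can help to check them before building a lookup. The following is a minimal sketch of such a pre-check; SUPPORTED_KWARGS and check_nn_kwargs are hypothetical names that are not part of MuyGPyS, and whether NN_Wrapper itself rejects unlisted kwargs is not stated here.

```python
# Hypothetical pre-check, not part of MuyGPyS: the kwarg names are copied
# from the parameter description above.
SUPPORTED_KWARGS = {
    "exact": {
        "radius", "algorithm", "leaf_size", "metric",
        "p", "metric_params", "n_jobs",
    },
    "hnsw": {"space", "ef_construction", "M", "random_seed"},
}


def check_nn_kwargs(nn_method, **kwargs):
    """Return kwargs unchanged, or raise if they do not fit nn_method."""
    if nn_method not in SUPPORTED_KWARGS:
        raise ValueError(f"unsupported nn_method: {nn_method!r}")
    unknown = sorted(set(kwargs) - SUPPORTED_KWARGS[nn_method])
    if unknown:
        raise ValueError(f"{nn_method!r} does not list kwargs: {unknown}")
    return kwargs


# The two keyword sets used in the constructor example above both pass.
exact_kwargs = check_nn_kwargs("exact", algorithm="ball_tree")
hnsw_kwargs = check_nn_kwargs("hnsw", space="l2", M=16)
```

The checked dictionaries could then be forwarded as NN_Wrapper(train_features, nn_count, nn_method="exact", **exact_kwargs).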

get_batch_nns(batch_indices)[source]

Get the non-self nearest neighbors for indices into the training data.

Find the nearest neighbors and associated distances for each specified index into the training data.

Example

>>> from MuyGPyS.neighbors import NN_Wrapper
>>> from numpy.random import choice
>>> train_features = load_train_features()
>>> nn_count = 10
>>> nbrs_lookup = NN_Wrapper(
...         train_features, nn_count, nn_method="exact", algorithm="ball_tree"
... )
>>> train_count, _ = train_features.shape
>>> batch_count = 50
>>> batch_indices = choice(train_count, batch_count, replace=False)
>>> batch_nn_indices, batch_nn_dists = nbrs_lookup.get_batch_nns(batch_indices)
Parameters:

batch_indices (ndarray) – Indices into the training data of shape (batch_count,).

Return type:

Tuple[ndarray, ndarray]

Returns:

  • batch_nn_indices – Matrix of nearest neighbor indices of shape (batch_count, nn_count). Each row lists the nearest neighbor indices (self excluded) of the corresponding batch element.

  • batch_nn_dists – Matrix of distances of shape (batch_count, nn_count). Each row lists the distance to the batch element of the corresponding element in batch_nn_indices.
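The "self excluded" behavior can be illustrated with a brute-force NumPy stand-in. This is a sketch only, not the MuyGPyS implementation: brute_force_nns and brute_force_batch_nns are hypothetical helpers, and the approach of querying nn_count + 1 neighbors and dropping the first column assumes Euclidean distance and no duplicate rows in the training data.

```python
import numpy as np


def brute_force_nns(train, queries, k):
    """Return (indices, dists) of the k nearest training rows per query row."""
    # Pairwise Euclidean distances, shape (query_count, train_count).
    diffs = queries[:, None, :] - train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Sort each row by distance and keep the k closest columns.
    order = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return order, np.take_along_axis(dists, order, axis=1)


def brute_force_batch_nns(train, batch_indices, nn_count):
    """Non-self neighbors: ask for nn_count + 1, then drop the first column."""
    indices, dists = brute_force_nns(train, train[batch_indices], nn_count + 1)
    # Column 0 is the query point itself (distance 0) when rows are unique.
    return indices[:, 1:], dists[:, 1:]


rng = np.random.default_rng(0)
train_features = rng.normal(size=(100, 3))
batch_indices = rng.choice(100, size=5, replace=False)
batch_nn_indices, batch_nn_dists = brute_force_batch_nns(
    train_features, batch_indices, nn_count=10
)
```

Both outputs have shape (batch_count, nn_count), and no row of batch_nn_indices contains the index of its own batch element.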

get_nns(test)[source]

Get the nearest neighbors for each row of the test dataset.

Find the nearest neighbors and associated distances for each element of the given test dataset. Here we assume that the test dataset is distinct from the train dataset used in the construction of the nearest neighbor lookup data structure.

Example

>>> from MuyGPyS.neighbors import NN_Wrapper
>>> train_features = load_train_features()
>>> test_features = load_test_features()
>>> nn_count = 10
>>> nbrs_lookup = NN_Wrapper(
...         train_features, nn_count, nn_method="exact", algorithm="ball_tree"
... )
>>> nn_indices, nn_dists = nbrs_lookup.get_nns(test_features)
Parameters:

test (ndarray) – Testing data matrix of shape (test_count, feature_count).

Return type:

Tuple[ndarray, ndarray]

Returns:

  • nn_indices – Matrix of nearest neighbor indices of shape (test_count, nn_count). Each row lists the nearest neighbor indices of the corresponding test element.

  • nn_dists – Matrix of distances of shape (test_count, nn_count). Each row lists the distance to the test element of the corresponding element in nn_indices.
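Because nn_indices indexes rows of the training data, it can be used directly to gather each test point's neighborhood. The sketch below replaces get_nns with a brute-force Euclidean search in NumPy so that it runs without MuyGPyS; the data are random stand-ins, and the Euclidean metric is an assumption rather than a statement about what every nn_method returns.

```python
import numpy as np

rng = np.random.default_rng(0)
train_features = rng.normal(size=(200, 4))  # (train_count, feature_count)
test_features = rng.normal(size=(6, 4))     # (test_count, feature_count)
nn_count = 10

# Stand-in for nbrs_lookup.get_nns(test_features): brute-force search.
all_dists = np.linalg.norm(
    test_features[:, None, :] - train_features[None, :, :], axis=-1
)
nn_indices = np.argsort(all_dists, axis=1)[:, :nn_count]
nn_dists = np.take_along_axis(all_dists, nn_indices, axis=1)

# Integer-array indexing gathers the neighbor rows for every test point,
# giving shape (test_count, nn_count, feature_count).
nn_features = train_features[nn_indices]

# Row i, column j of nn_dists is the distance from test point i to the
# training row nn_indices[i, j]; recomputing from the gathered rows agrees.
recomputed = np.linalg.norm(nn_features - test_features[:, None, :], axis=-1)
```

The same indexing pattern applies to any per-row training array, such as a matrix of training responses.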