neighbors
KNN lookup management
MuyGPyS.neighbors.NN_Wrapper is an API for tasking several KNN libraries with
the construction of lookup indexes that enable fast training and inference.
The wrapper constructor expects the training features, the number of nearest
neighbors, and a method string specifying which algorithm to use, as well as any
additional kwargs used by the methods.
Currently supported implementations include exact KNN using sklearn (“exact”)
and approximate KNN using hnsw (“hnsw”).
- class MuyGPyS.neighbors.NN_Wrapper(train, nn_count, nn_method='exact', **kwargs)[source]
Nearest Neighbors lookup data structure wrapper.
Wraps the logic driving nearest neighbor data structure training and querying. Currently supports sklearn.neighbors.NearestNeighbors for exact computation and hnswlib.Index for approximate nearest neighbors.
An example constructing exact and approximate KNN data lookups with k = 10.
Example
>>> from MuyGPyS.neighbors import NN_Wrapper
>>> train_features = load_train_features()
>>> nn_count = 10
>>> exact_nbrs_lookup = NN_Wrapper(
...     train_features, nn_count, nn_method="exact", algorithm="ball_tree"
... )
>>> approx_nbrs_lookup = NN_Wrapper(
...     train_features, nn_count, nn_method="hnsw", space="l2", M=16
... )
- Parameters:
train (ndarray) – The full training data of shape (train_count, feature_count) that will construct the nearest neighbor query data structure.
nn_count (int) – The number of nearest neighbors to return in queries.
nn_method (str) – Indicates which nearest neighbor algorithm should be used. Currently “exact” indicates sklearn.neighbors.NearestNeighbors, while “hnsw” indicates hnswlib.Index (requires installing MuyGPyS with the “hnswlib” extras flag).
kwargs – Additional kwargs used for lookup data structure construction. nn_method="exact" supports “radius”, “algorithm”, “leaf_size”, “metric”, “p”, “metric_params”, and “n_jobs” kwargs. nn_method="hnsw" supports “space”, “ef_construction”, “M”, and “random_seed” kwargs.
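To make the "exact" option more concrete, the following is a minimal sketch of using sklearn.neighbors.NearestNeighbors directly, which is the class the text above says the wrapper drives. It is not the wrapper's actual implementation. The synthetic data, the variable names, and the query set are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for real training features (an assumption for this sketch).
rng = np.random.default_rng(0)
train_features = rng.normal(size=(200, 5))
nn_count = 10

# Build the exact index; "algorithm" is one of the kwargs listed above.
exact_index = NearestNeighbors(n_neighbors=nn_count, algorithm="ball_tree")
exact_index.fit(train_features)

# Query a few new points. sklearn returns (distances, indices), in that order.
queries = rng.normal(size=(3, 5))
dists, indices = exact_index.kneighbors(queries)
```

Note that sklearn returns distances first, whereas the NN_Wrapper query methods documented here return indices first.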
- get_batch_nns(batch_indices)[source]
Get the non-self nearest neighbors for indices into the training data.
Find the nearest neighbors and associated distances for each specified index into the training data.
Example
>>> from MuyGPyS.neighbors import NN_Wrapper
>>> from numpy.random import choice
>>> train_features = load_train_features()
>>> nn_count = 10
>>> nbrs_lookup = NN_Wrapper(
...     train_features, nn_count, nn_method="exact", algorithm="ball_tree"
... )
>>> train_count, _ = train_features.shape
>>> batch_count = 50
>>> batch_indices = choice(train_count, batch_count, replace=False)
>>> nn_indices, nn_dists = nbrs_lookup.get_batch_nns(batch_indices)
- Parameters:
batch_indices (ndarray) – Indices into the training data of shape (batch_count,).
- Return type:
Tuple[ndarray, ndarray]
- Returns:
batch_nn_indices – Matrix of nearest neighbor indices of shape (batch_count, nn_count). Each row lists the nearest neighbor indices (self excluded) of the corresponding batch element.
batch_nn_dists – Matrix of distances of shape (batch_count, nn_count). Each row lists the distance to the batch element of the corresponding element in batch_nn_indices.
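The meaning of "self excluded" can be illustrated with a brute-force sketch in plain NumPy. This is not how NN_Wrapper computes its result. The helper name brute_force_batch_nns and the synthetic data are hypothetical, and Euclidean distance is assumed.

```python
import numpy as np

def brute_force_batch_nns(train, batch_indices, nn_count):
    """Hypothetical helper: exact non-self neighbors from dense distances."""
    batch = train[batch_indices]  # (batch_count, feature_count)
    # Pairwise Euclidean distances, shape (batch_count, train_count).
    dists = np.linalg.norm(batch[:, None, :] - train[None, :, :], axis=-1)
    # Mask each batch element's distance to itself so it is never selected.
    dists[np.arange(len(batch_indices)), batch_indices] = np.inf
    # Keep the nn_count closest remaining training points per row.
    nn_indices = np.argsort(dists, axis=1)[:, :nn_count]
    nn_dists = np.take_along_axis(dists, nn_indices, axis=1)
    return nn_indices, nn_dists

# Synthetic stand-in data (an assumption for this sketch).
rng = np.random.default_rng(0)
train_features = rng.normal(size=(100, 4))
batch_indices = rng.choice(100, size=20, replace=False)
batch_nn_indices, batch_nn_dists = brute_force_batch_nns(
    train_features, batch_indices, nn_count=10
)
```

Both returned arrays have shape (batch_count, nn_count), and no row of batch_nn_indices contains the index of its own batch element.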
- get_nns(test)[source]
Get the nearest neighbors for each row of the test dataset.
Find the nearest neighbors and associated distances for each element of the given test dataset. Here we assume that the test dataset is distinct from the train dataset used in the construction of the nearest neighbor lookup data structure.
Example
>>> from MuyGPyS.neighbors import NN_Wrapper
>>> train_features = load_train_features()
>>> test_features = load_test_features()
>>> nn_count = 10
>>> nbrs_lookup = NN_Wrapper(
...     train_features, nn_count, nn_method="exact", algorithm="ball_tree"
... )
>>> nn_indices, nn_dists = nbrs_lookup.get_nns(test_features)
- Parameters:
test (ndarray) – Testing data matrix of shape (test_count, feature_count).
- Return type:
Tuple[ndarray, ndarray]
- Returns:
nn_indices – Matrix of nearest neighbor indices of shape (test_count, nn_count). Each row lists the nearest neighbor indices of the corresponding test element.
nn_dists – Matrix of distances of shape (test_count, nn_count). Each row lists the distance to the test element of the corresponding element in nn_indices.
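A common next step is to use nn_indices to gather each test point's neighborhood from the training data. The sketch below shows the shapes involved using plain NumPy. The brute-force search stands in for the lookup and is an assumption of this example, as are the synthetic data and the name nn_features.

```python
import numpy as np

# Synthetic stand-in data (an assumption for this sketch).
rng = np.random.default_rng(0)
train_features = rng.normal(size=(100, 4))
test_features = rng.normal(size=(8, 4))
nn_count = 10

# Stand-in for the lookup: dense Euclidean distances, (test_count, train_count).
all_dists = np.linalg.norm(
    test_features[:, None, :] - train_features[None, :, :], axis=-1
)
nn_indices = np.argsort(all_dists, axis=1)[:, :nn_count]
nn_dists = np.take_along_axis(all_dists, nn_indices, axis=1)

# Integer-array indexing gathers every neighborhood in one step.
nn_features = train_features[nn_indices]  # (test_count, nn_count, feature_count)
```

Row i of nn_features holds the training points listed in row i of nn_indices, so the distances from test_features[i] to those points reproduce row i of nn_dists.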