distance
Distance functions
Compute pairwise and crosswise distance tensors for the purposes of kernel construction.
See the following example computing the pairwise and crosswise distances between a batch of training data and their nearest neighbors.
Example
>>> from MuyGPyS.neighbors import NN_Wrapper
>>> from MuyGPyS.optimize.batch import sample_batch
>>> from MuyGPyS.gp.distance import crosswise_distances, pairwise_distances
>>> train_features = load_train_features()  # placeholder for your own loader
>>> nn_count = 10
>>> nbrs_lookup = NN_Wrapper(
... train_features, nn_count, nn_method="exact", algorithm="ball_tree"
... )
>>> train_count, _ = train_features.shape
>>> batch_count = 50
>>> # Sample a batch of training points along with their nearest neighbors.
>>> batch_indices, batch_nn_indices = sample_batch(
... nbrs_lookup, batch_count, train_count
... )
>>> pairwise_dists = pairwise_distances(
... train_features, batch_nn_indices, metric="l2"
... )
>>> crosswise_dists = crosswise_distances(
... train_features,
... train_features,
... batch_indices,
... batch_nn_indices,
... metric="l2",
... )
See also the following example computing the crosswise distances between a test dataset and their nearest neighbors in the training data.
Example
>>> import numpy as np
>>> from MuyGPyS.neighbors import NN_Wrapper
>>> from MuyGPyS.gp.distance import crosswise_distances, pairwise_distances
>>> train_features = load_train_features()  # placeholder for your own loader
>>> test_features = load_test_features()  # placeholder for your own loader
>>> nn_count = 10
>>> nbrs_lookup = NN_Wrapper(
... train_features, nn_count, nn_method="exact", algorithm="ball_tree"
... )
>>> # Find the nearest training neighbors of each test point.
>>> nn_indices, _ = nbrs_lookup.get_nns(test_features)
>>> test_count, _ = test_features.shape
>>> indices = np.arange(test_count)
>>> pairwise_dists = pairwise_distances(
... train_features, nn_indices, metric="l2"
... )
>>> crosswise_dists = crosswise_distances(
... test_features,
... train_features,
... indices,
... nn_indices,
... metric="l2",
... )
The helper functions MuyGPyS.gp.distance.make_regress_tensors() and MuyGPyS.gp.distance.make_train_tensors() wrap these distance tensors and also return the training targets of the nearest neighbor sets and (in the latter case) the training targets of the training batch. These functions are convenient because the distance and target tensors are usually needed together.
- MuyGPyS.gp.distance.crosswise_distances(data, nn_data, data_indices, nn_indices, metric='l2')
Compute a matrix of distances between data and their nearest neighbors.
Takes full datasets of records of interest data and neighbor candidates nn_data, and produces the distances between each element of data indicated by data_indices and each of its nearest neighbors in nn_data, as indicated by the corresponding rows of nn_indices. data and nn_data can refer to the same dataset. See the first example above for computing the crosswise distances between a batch of training data and their nearest neighbors.
- Parameters
data (ndarray) – The data matrix of shape (data_count, feature_count) containing batch elements.
nn_data (ndarray) – The data matrix of shape (candidate_count, feature_count) containing the universe of candidate neighbors for the batch elements. Might be the same as data.
data_indices (ndarray) – An integral vector of shape (batch_count,) containing the indices of the batch.
nn_indices (ndarray) – An integral matrix of shape (batch_count, nn_count) listing the nearest neighbor indices for the batch of data points.
metric (str) – The name of the metric to use in order to form distances. Supported values are l2, F2, ip (inner product, a distance only if data is normalized to the unit hypersphere), and cosine.
- Return type
ndarray
- Returns
A matrix of shape (batch_count, nn_count) whose rows list the distance of the corresponding batch element to each of its nearest neighbors.
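To make the shape bookkeeping concrete, here is a minimal NumPy sketch of what a crosswise distance computation looks like for the l2 and F2 metrics. It is not the MuyGPyS implementation: the name crosswise_sketch is hypothetical, F2 is assumed to mean the squared Euclidean distance, and the ip and cosine metrics are omitted.

```python
import numpy as np


def crosswise_sketch(data, nn_data, data_indices, nn_indices, metric="l2"):
    """Hypothetical sketch of crosswise distances (not the MuyGPyS code).

    Handles only "l2" (Euclidean) and "F2" (assumed: squared Euclidean).
    """
    # One query row per batch element: (batch_count, 1, feature_count)
    queries = data[data_indices][:, None, :]
    # The neighbors of each query: (batch_count, nn_count, feature_count)
    neighbors = nn_data[nn_indices]
    # Broadcast subtraction, then sum over features: (batch_count, nn_count)
    sq_dists = np.sum((queries - neighbors) ** 2, axis=-1)
    if metric == "F2":
        return sq_dists
    if metric == "l2":
        return np.sqrt(sq_dists)
    raise ValueError(f"unsupported metric in this sketch: {metric}")


# Toy data: 4 points in 2-D; a batch of 2 elements with 2 neighbors each.
data = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
data_indices = np.array([0, 1])
nn_indices = np.array([[1, 2], [0, 3]])
# data doubles as nn_data here, as in the training-batch case.
dists = crosswise_sketch(data, data, data_indices, nn_indices)
```

Row i of the result holds the distances from data[data_indices[i]] to each of the points listed in nn_indices[i], matching the (batch_count, nn_count) return shape documented above.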
- MuyGPyS.gp.distance.make_regress_tensors(metric, batch_indices, batch_nn_indices, test_features, train_features, train_targets)
Create the distance and target tensors for regression.
Creates the crosswise_dists, pairwise_dists and batch_nn_targets tensors required by MuyGPyS.gp.MuyGPyS.regress().
- Parameters
metric (str) – The metric to be used to compute distances.
batch_indices (ndarray) – A vector of integers of shape (batch_count,) identifying the batch of test observations (rows of test_features) to be approximated.
batch_nn_indices (ndarray) – A matrix of integers of shape (batch_count, nn_count) listing the nearest neighbor indices for all observations in the batch.
test_features (ndarray) – The full floating point testing data matrix of shape (test_count, feature_count).
train_features (ndarray) – The full floating point training data matrix of shape (train_count, feature_count).
train_targets (ndarray) – A matrix of shape (train_count, response_count) whose rows are vector-valued responses for each training element.
- Return type
Tuple[ndarray, ndarray, ndarray]
- Returns
crosswise_dists – A matrix of shape (batch_count, nn_count) whose rows list the distance of the corresponding batch element to each of its nearest neighbors.
pairwise_dists – A tensor of shape (batch_count, nn_count, nn_count) whose latter two dimensions hold square matrices of the pairwise distances between the nearest neighbors of each batch element.
batch_nn_targets – Tensor of floats of shape (batch_count, nn_count, response_count) containing the expected response for each nearest neighbor of each batch element.
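The three outputs can be pictured with the NumPy sketch below, which gathers the neighbor sets once and derives all three tensors from them. This is an illustration under stated assumptions, not the library's code: regress_tensors_sketch and _to_metric are hypothetical names, only l2 and F2 (assumed to be squared Euclidean) are handled, and batch_indices is taken to index rows of test_features while batch_nn_indices indexes rows of train_features.

```python
import numpy as np


def _to_metric(sq_dists, metric):
    # Hypothetical helper: turn squared Euclidean distances into the metric.
    if metric == "F2":
        return sq_dists
    if metric == "l2":
        return np.sqrt(sq_dists)
    raise ValueError(f"unsupported metric in this sketch: {metric}")


def regress_tensors_sketch(
    metric, batch_indices, batch_nn_indices,
    test_features, train_features, train_targets,
):
    # Neighbor sets gathered from the training data: (batch, nn, feature)
    nn_points = train_features[batch_nn_indices]
    # One query row per batch element: (batch, 1, feature)
    queries = test_features[batch_indices][:, None, :]
    crosswise_dists = _to_metric(np.sum((queries - nn_points) ** 2, axis=-1), metric)
    # All neighbor-to-neighbor differences: (batch, nn, nn, feature)
    diffs = nn_points[:, :, None, :] - nn_points[:, None, :, :]
    pairwise_dists = _to_metric(np.sum(diffs**2, axis=-1), metric)
    # Responses of each neighbor: (batch, nn, response)
    batch_nn_targets = train_targets[batch_nn_indices]
    return crosswise_dists, pairwise_dists, batch_nn_targets


train_features = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
train_targets = np.array([[10.0], [20.0], [30.0], [40.0]])  # response_count = 1
test_features = np.array([[0.0, 1.0]])
crosswise_dists, pairwise_dists, batch_nn_targets = regress_tensors_sketch(
    "l2", np.array([0]), np.array([[0, 2]]),
    test_features, train_features, train_targets,
)
```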
- MuyGPyS.gp.distance.make_train_tensors(metric, batch_indices, batch_nn_indices, train_features, train_targets)
Create the distance and target tensors needed for training.
Similar to make_regress_tensors() but returns the additional batch_targets matrix, which is only defined for a batch of training data.
- Parameters
metric (str) – The metric to be used to compute distances.
batch_indices (ndarray) – A vector of integers of shape (batch_count,) identifying the training batch of observations to be approximated.
batch_nn_indices (ndarray) – A matrix of integers of shape (batch_count, nn_count) listing the nearest neighbor indices for all observations in the batch.
train_features (ndarray) – The full floating point training data matrix of shape (train_count, feature_count).
train_targets (ndarray) – A matrix of shape (train_count, response_count) whose rows are vector-valued responses for each training element.
- Return type
Tuple[ndarray, ndarray, ndarray, ndarray]
- Returns
crosswise_dists – A matrix of shape (batch_count, nn_count) whose rows list the distance of the corresponding batch element to each of its nearest neighbors.
pairwise_dists – A tensor of shape (batch_count, nn_count, nn_count) whose latter two dimensions hold square matrices of the pairwise distances between the nearest neighbors of each batch element.
batch_targets – Matrix of floats of shape (batch_count, response_count) whose rows give the expected response for each batch element.
batch_nn_targets – Tensor of floats of shape (batch_count, nn_count, response_count) containing the expected response for each nearest neighbor of each batch element.
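The additional batch_targets output differs from batch_nn_targets only in which indices are used to gather rows of train_targets. The NumPy sketch below shows the two gathers and their shapes on toy values; it assumes train_targets has shape (train_count, response_count) and is an illustration of the indexing, not the library's code. Since the batch is itself drawn from the training data, the distance tensors here are presumably formed as for regression, with train_features playing the role of test_features.

```python
import numpy as np

# Toy responses: train_count = 4, response_count = 2.
train_targets = np.array([[10.0, 1.0], [20.0, 2.0], [30.0, 3.0], [40.0, 4.0]])
# A training batch of 2 elements, each with nn_count = 2 neighbors.
batch_indices = np.array([1, 3])
batch_nn_indices = np.array([[0, 2], [2, 0]])

# Responses of the batch elements themselves: (batch_count, response_count)
batch_targets = train_targets[batch_indices]
# Responses of each element's neighbors: (batch_count, nn_count, response_count)
batch_nn_targets = train_targets[batch_nn_indices]
```

Indexing a 2-D array with an integer matrix appends the row dimension, which is why batch_nn_targets gains the extra nn_count axis.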
- MuyGPyS.gp.distance.pairwise_distances(data, nn_indices, metric='l2')
Compute a tensor of pairwise distances among sets of nearest neighbors.
Takes a full dataset of records of interest data and produces the pairwise distances between the elements indicated by each row of nn_indices.
- Parameters
data (ndarray) – The data matrix of shape (data_count, feature_count) containing the full dataset into which nn_indices indexes.
nn_indices (ndarray) – An integral matrix of shape (batch_count, nn_count) listing the nearest neighbor indices for the batch of data points.
metric (str) – The name of the metric to use in order to form distances. Supported values are l2, F2, ip (inner product, a distance only if data is normalized to the unit hypersphere), and cosine.
- Return type
ndarray
- Returns
A tensor of shape (batch_count, nn_count, nn_count) whose latter two dimensions hold square matrices of the pairwise distances between the nearest neighbors of each batch element.
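A minimal NumPy sketch of the pairwise computation for the l2 and F2 metrics follows. The name pairwise_sketch is hypothetical, F2 is assumed to be the squared Euclidean distance, and ip and cosine are omitted; the sketch illustrates the output shape rather than reproducing the MuyGPyS implementation.

```python
import numpy as np


def pairwise_sketch(data, nn_indices, metric="l2"):
    """Hypothetical sketch of pairwise distances (not the MuyGPyS code)."""
    # Neighbor sets gathered from data: (batch_count, nn_count, feature_count)
    nn_points = data[nn_indices]
    # Every neighbor minus every other neighbor in the same set:
    # (batch_count, nn_count, nn_count, feature_count)
    diffs = nn_points[:, :, None, :] - nn_points[:, None, :, :]
    sq_dists = np.sum(diffs**2, axis=-1)  # (batch_count, nn_count, nn_count)
    if metric == "F2":
        return sq_dists
    if metric == "l2":
        return np.sqrt(sq_dists)
    raise ValueError(f"unsupported metric in this sketch: {metric}")


data = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
nn_indices = np.array([[1, 2], [0, 3]])
pdists = pairwise_sketch(data, nn_indices)
```

Each (nn_count, nn_count) slice is symmetric with a zero diagonal. The broadcast materializes a (batch_count, nn_count, nn_count, feature_count) array, which is simple but memory-hungry for large feature counts, so a production implementation may well use a different strategy.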