classify

Resources and high-level API for a simple classification workflow.

make_classifier() is a high-level API for creating and training MuyGPyS.gp.muygps.MuyGPS objects for classification. make_multivariate_classifier() is a high-level API for creating and training MuyGPyS.gp.muygps.MultivariateMuyGPS objects for classification.

do_classify() is a high-level API for executing a simple, generic classification workflow given data. It calls the maker APIs above and classify_any().

MuyGPyS.examples.classify.classify_any(surrogate, test_features, train_features, train_nbrs_lookup, train_labels)

Simultaneously predicts the surrogate regression means for each test item.

Parameters

surrogate (Union[MuyGPS, MultivariateMuyGPS]) – Surrogate regressor.
test_features (ndarray) – Test observations of shape (test_count, feature_count).
train_features (ndarray) – Train observations of shape (train_count, feature_count).
train_nbrs_lookup (NN_Wrapper) – Trained nearest neighbor query data structure.
train_labels (ndarray) – One-hot encoding of class labels for all training data of shape (train_count, class_count).

Return type

Tuple[ndarray, Dict[str, float]]

Returns

predictions – The surrogate predictions of shape (test_count, class_count) for each test observation.
timing – Timing for the subroutines of this function.
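The label and prediction shapes above follow a one-hot convention: one column per class. The following NumPy sketch illustrates that convention under the assumption that the labels start out as integers. The variable names and the stand-in prediction values are illustrative only and are not produced by MuyGPyS.

```python
import numpy as np

# Hypothetical integer class labels for six training items and three classes.
int_labels = np.array([0, 2, 1, 1, 0, 2])
class_count = 3

# One-hot encode: row i has a 1 in column int_labels[i] and 0 elsewhere.
train_labels = np.eye(class_count)[int_labels]  # shape (train_count, class_count)

# Stand-in for surrogate predictions of shape (test_count, class_count).
predictions = np.array([[0.1, 0.7, 0.2], [0.9, 0.0, 0.1]])

# The predicted class of each test item is the column with the largest value.
predicted_classes = np.argmax(predictions, axis=1)
```

Taking the argmax of each row of train_labels recovers the original integer labels, which is the same decoding step applied to the predictions.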

MuyGPyS.examples.classify.do_classify(test_features, train_features, train_labels, nn_count=30, batch_count=200, loss_method='log', kern=None, k_kwargs={}, nn_kwargs={}, return_distances=False, verbose=False)

Convenience function for initializing a model and performing surrogate classification.

Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.

Example

>>> import numpy as np
>>> from MuyGPyS.testing.test_utils import _make_gaussian_data
>>> from MuyGPyS.examples.classify import do_classify
>>> train, test = _make_gaussian_data(10000, 100, 100, 10, categorical=True)
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_kwargs = {
...         "kern": "rbf",
...         "metric": "F2",
...         "eps": {"val": 1e-5},
...         "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)},
... }
>>> muygps, nbrs_lookup, surrogate_predictions = do_classify(
...         test['input'],
...         train['input'],
...         train['output'],
...         nn_count=30,
...         batch_count=200,
...         loss_method="log",
...         k_kwargs=k_kwargs,
...         nn_kwargs=nn_kwargs,
...         verbose=False,
... )
>>> # Alternatively, also return the distance tensors for reuse
>>> muygps, nbrs_lookup, surrogate_predictions, crosswise_dists, pairwise_dists = do_classify(
...         test['input'],
...         train['input'],
...         train['output'],
...         nn_count=30,
...         batch_count=200,
...         loss_method="log",
...         k_kwargs=k_kwargs,
...         nn_kwargs=nn_kwargs,
...         return_distances=True,
...         verbose=False,
... )
>>> predicted_labels = np.argmax(surrogate_predictions, axis=1)
>>> true_labels = np.argmax(test['output'], axis=1)
>>> acc = np.mean(predicted_labels == true_labels)
>>> print(f"obtained accuracy: {acc}")
obtained accuracy: 0.973...

Parameters

test_features (ndarray) – A matrix of shape (test_count, feature_count) whose rows consist of observation vectors of the test data.
train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.
train_labels (ndarray) – A matrix of shape (train_count, response_count) whose rows consist of label vectors for the training data.
nn_count (int) – The number of nearest neighbors to employ.
batch_count (int) – The batch size for hyperparameter optimization.
loss_method (str) – The loss method to use in hyperparameter optimization. Ignored if all of the parameters specified by k_kwargs are fixed. Currently supports only "log" (also known as "cross_entropy") and "mse" for classification.
kern (Optional[str]) – The kernel function to be used. Only relevant for multivariate case where k_kwargs is a list of hyperparameter dicts. Currently supports only "matern" and "rbf".
k_kwargs (Union[Dict, List[Dict], Tuple[Dict, …]]) – Parameters for the kernel, possibly including kernel type, distance metric, epsilon and sigma hyperparameter specifications, and specifications for kernel hyperparameters. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur. If "kern" is specified and "k_kwargs" is a list of such dicts, a multivariate model is created.
nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.
return_distances (bool) – If True and any training occurs, returns a (batch_count, nn_count) matrix containing the crosswise distances between the batch’s elements and their nearest neighbor sets and a (batch_count, nn_count, nn_count) matrix containing the pairwise distances between the batch’s nearest neighbor sets.
verbose (bool) – If True, print summary statistics.
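The kern and k_kwargs descriptions above imply a rule for choosing between a univariate and a multivariate model. The sketch below restates that rule as a small predicate. The helper name wants_multivariate is hypothetical and is not part of MuyGPyS; the library's actual dispatch logic may differ.

```python
from typing import Dict, List, Optional, Tuple, Union

KernelSpec = Union[Dict, List[Dict], Tuple[Dict, ...]]


def wants_multivariate(kern: Optional[str], k_kwargs: KernelSpec) -> bool:
    """Hypothetical helper: True when the arguments describe a multivariate model."""
    # A multivariate model needs a kernel name plus one dict per response.
    return kern is not None and isinstance(k_kwargs, (list, tuple))


# A single dict (with the kernel named inside it) describes a univariate model.
single = {"kern": "rbf", "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)}}

# A list of dicts plus an explicit kernel name describes a multivariate model.
per_response = [
    {"length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)}, "eps": {"val": 1e-5}},
    {"length_scale": {"val": 1.5, "bounds": (1e-2, 1e2)}, "eps": {"val": 1e-5}},
]

print(wants_multivariate(None, single))         # False
print(wants_multivariate("rbf", per_response))  # True
```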

Return type

Union[Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray], Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray, ndarray, ndarray]]

Returns

muygps – A (possibly trained) MuyGPs object.
nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.
surrogate_predictions – A matrix of shape (test_count, response_count) whose rows indicate the surrogate predictions of the model. The predicted classes are given by the indices of the largest elements of each row.

MuyGPyS.examples.classify.make_classifier(train_features, train_labels, nn_count=30, batch_count=200, loss_method='log', k_kwargs={}, nn_kwargs={}, return_distances=False, verbose=False)

Convenience function for creating MuyGPyS functor and neighbor lookup data structure.

Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.

Example

>>> from MuyGPyS.testing.test_utils import _make_gaussian_dict
>>> from MuyGPyS.examples.classify import make_classifier
>>> train = _make_gaussian_dict(10000, 100, 10, categorical=True)
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_kwargs = {
...         "kern": "rbf",
...         "metric": "F2",
...         "eps": {"val": 1e-5},
...         "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)},
... }
>>> muygps, nbrs_lookup = make_classifier(
...         train['input'],
...         train['output'],
...         nn_count=30,
...         batch_count=200,
...         loss_method="log",
...         k_kwargs=k_kwargs,
...         nn_kwargs=nn_kwargs,
...         verbose=False,
... )
>>> # Alternatively, also return the distance tensors for reuse
>>> muygps, nbrs_lookup, crosswise_dists, pairwise_dists = make_classifier(
...         train['input'],
...         train['output'],
...         nn_count=30,
...         batch_count=200,
...         loss_method="log",
...         k_kwargs=k_kwargs,
...         nn_kwargs=nn_kwargs,
...         return_distances=True,
...         verbose=False,
... )

Parameters

train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.
train_labels (ndarray) – A matrix of shape (train_count, class_count) whose rows consist of one-hot class label vectors of the train data.
nn_count (int) – The number of nearest neighbors to employ.
batch_count (int) – The number of batch elements to sample for hyperparameter optimization.
loss_method (str) – The loss method to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed. Currently supports only "log" (or "cross-entropy") and "mse" for classification.
k_kwargs (Dict) – Parameters for the kernel, possibly including kernel type, distance metric, epsilon and sigma hyperparameter specifications, and specifications for kernel hyperparameters. See kernels for examples and requirements. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur.
nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.
return_distances (bool) – If True and any training occurs, returns a (batch_count, nn_count) matrix containing the crosswise distances between the batch’s elements and their nearest neighbor sets and a (batch_count, nn_count, nn_count) matrix containing the pairwise distances between the batch’s nearest neighbor sets.
verbose (bool) – If True, print summary statistics.
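The loss_method parameter chooses between a log (cross-entropy) loss and a mean squared error loss. The NumPy sketch below gives generic textbook versions of both, evaluated on one-hot targets. It assumes the predictions are softmax-normalized before the cross-entropy is taken; that normalization and the function names are assumptions for illustration, not a description of the MuyGPyS implementation.

```python
import numpy as np


def mse_loss(predictions: np.ndarray, targets: np.ndarray) -> float:
    """Mean squared error over all entries."""
    return float(np.mean((predictions - targets) ** 2))


def log_loss(predictions: np.ndarray, targets: np.ndarray) -> float:
    """Cross-entropy between one-hot targets and softmax-normalized predictions."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = predictions - predictions.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.sum(targets * log_probs, axis=1)))


# Toy one-hot targets with one good and one bad set of predictions.
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
good = np.array([[0.9, 0.1], [0.2, 0.8]])
bad = np.array([[0.1, 0.9], [0.8, 0.2]])

# Both losses rank the good predictions below the bad ones.
print(mse_loss(good, targets) < mse_loss(bad, targets))
print(log_loss(good, targets) < log_loss(bad, targets))
```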

Return type

Union[Tuple[MuyGPS, NN_Wrapper], Tuple[MuyGPS, NN_Wrapper, ndarray, ndarray]]

Returns

muygps – A (possibly trained) MuyGPs object.
nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.
crosswise_dists – A matrix of shape (batch_count, nn_count) whose rows list the distance of the corresponding batch element to each of its nearest neighbors. Only returned if return_distances is True.
pairwise_dists – A tensor of shape (batch_count, nn_count, nn_count) whose latter two dimensions contain square matrices containing the pairwise distances between the nearest neighbors of the batch elements. Only returned if return_distances is True.
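To make the shapes of crosswise_dists and pairwise_dists concrete, the following NumPy sketch builds both tensors by brute force for a small random data set. It assumes plain Euclidean distance and an exhaustive neighbor search; the library's own metric (for example "F2") and its NN_Wrapper may compute these differently.

```python
import numpy as np

rng = np.random.default_rng(0)
train_count, feature_count = 50, 4
batch_count, nn_count = 5, 3
train_features = rng.normal(size=(train_count, feature_count))

# Sample a batch of training indices without replacement.
batch_indices = rng.choice(train_count, size=batch_count, replace=False)

# Brute-force search: distances from each batch element to every training point.
diffs = train_features[batch_indices, None, :] - train_features[None, :, :]
all_dists = np.linalg.norm(diffs, axis=-1)  # (batch_count, train_count)
# Exclude each batch element from its own neighbor set.
all_dists[np.arange(batch_count), batch_indices] = np.inf
nn_indices = np.argsort(all_dists, axis=1)[:, :nn_count]  # (batch_count, nn_count)

# Crosswise: distance from each batch element to each of its neighbors.
crosswise_dists = np.take_along_axis(all_dists, nn_indices, axis=1)

# Pairwise: distances among the neighbors of each batch element.
nbr_points = train_features[nn_indices]  # (batch_count, nn_count, feature_count)
nbr_diffs = nbr_points[:, :, None, :] - nbr_points[:, None, :, :]
pairwise_dists = np.linalg.norm(nbr_diffs, axis=-1)

print(crosswise_dists.shape, pairwise_dists.shape)
```

Each pairwise_dists[i] is a symmetric matrix with a zero diagonal, since it holds the distances of a neighbor set to itself.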

MuyGPyS.examples.classify.make_multivariate_classifier(train_features, train_labels, nn_count=30, batch_count=200, loss_method='mse', kern='matern', k_args=[], nn_kwargs={}, return_distances=False, verbose=False)

Convenience function for creating MuyGPyS functor and neighbor lookup data structure.

Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.

Example

>>> from MuyGPyS.testing.test_utils import _make_gaussian_dict
>>> from MuyGPyS.examples.classify import make_multivariate_classifier
>>> train = _make_gaussian_dict(10000, 100, 10, categorical=True)
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_args = [
...         {
...             "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)},
...             "eps": {"val": 1e-5},
...         },
...         {
...             "length_scale": {"val": 1.5, "bounds": (1e-2, 1e2)},
...             "eps": {"val": 1e-5},
...         },
... ]
>>> mmuygps, nbrs_lookup = make_multivariate_classifier(
...         train['input'],
...         train['output'],
...         nn_count=30,
...         batch_count=200,
...         loss_method="mse",
...         kern="rbf",
...         k_args=k_args,
...         nn_kwargs=nn_kwargs,
...         verbose=False,
... )
>>> # Alternatively, also return the distance tensors for reuse
>>> mmuygps, nbrs_lookup, crosswise_dists, pairwise_dists = make_multivariate_classifier(
...         train['input'],
...         train['output'],
...         nn_count=30,
...         batch_count=200,
...         loss_method="mse",
...         kern="rbf",
...         k_args=k_args,
...         nn_kwargs=nn_kwargs,
...         return_distances=True,
...         verbose=False,
... )

Parameters

train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.
train_labels (ndarray) – A matrix of shape (train_count, class_count) whose rows consist of one-hot encoded label vectors of the train data.
nn_count (int) – The number of nearest neighbors to employ.
batch_count (int) – The number of batch elements to sample for hyperparameter optimization.
loss_method (str) – The loss method to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_args are fixed. Currently supports only "mse" for regression.
kern (str) – The kernel function to be used. See kernels for details.
k_args (Union[List[Dict], Tuple[Dict, …]]) – A list of response_count dicts containing kernel initialization keyword arguments. Each dict specifies parameters for the kernel, possibly including epsilon and sigma hyperparameter specifications and specifications for specific kernel hyperparameters. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur.
nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.
return_distances (bool) – If True and any training occurs, returns a (batch_count, nn_count) matrix containing the crosswise distances between the batch’s elements and their nearest neighbor sets and a (batch_count, nn_count, nn_count) matrix containing the pairwise distances between the batch’s nearest neighbor sets.
verbose (bool) – If True, print summary statistics.
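Because k_args must contain one dict per response dimension, it can be convenient to build it programmatically. The sketch below does so in plain Python, reusing the "length_scale" and "eps" keys from the example above. The response_count value and the per-response initial length scales are arbitrary choices made for illustration.

```python
response_count = 3  # hypothetical number of classes

# One kernel specification per response dimension. Each length scale carries
# optimization bounds, while eps is given a value only and no bounds.
k_args = [
    {
        "length_scale": {"val": 1.0 + 0.5 * i, "bounds": (1e-2, 1e2)},
        "eps": {"val": 1e-5},
    }
    for i in range(response_count)
]

print(len(k_args))
```

The resulting list would be passed as k_args together with a kernel name such as kern="rbf".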

Return type

Union[Tuple[MultivariateMuyGPS, NN_Wrapper], Tuple[MultivariateMuyGPS, NN_Wrapper, ndarray, ndarray]]

Returns

muygps – A (possibly trained) MultivariateMuyGPS object.
nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.
crosswise_dists – A matrix of shape (batch_count, nn_count) whose rows list the distance of the corresponding batch element to each of its nearest neighbors. Only returned if return_distances is True.
pairwise_dists – A tensor of shape (batch_count, nn_count, nn_count) whose latter two dimensions contain square matrices containing the pairwise distances between the nearest neighbors of the batch elements. Only returned if return_distances is True.