classify

Resources and high-level API for a simple classification workflow.

make_classifier() is a high-level API for creating and training MuyGPyS.gp.muygps.MuyGPS objects for classification. make_multivariate_classifier() is a high-level API for creating and training MuyGPyS.gp.muygps.MultivariateMuyGPS objects for classification.

do_classify() is a high-level API for executing a simple, generic classification workflow given data. It calls the maker APIs above and classify_any().

MuyGPyS.examples.classify.classify_any(surrogate, test_features, train_features, train_nbrs_lookup, train_labels)

Simultaneously predicts the surrogate regression means for each test item.

Parameters:
  • surrogate (Union[MuyGPS, MultivariateMuyGPS]) – Surrogate regressor.

  • test_features (ndarray) – Test observations of shape (test_count, feature_count).

  • train_features (ndarray) – Train observations of shape (train_count, feature_count).

  • train_nbrs_lookup (NN_Wrapper) – Trained nearest neighbor query data structure.

  • train_labels (ndarray) – One-hot encoding of class labels for all training data of shape (train_count, class_count).

Return type:

Tuple[ndarray, Dict[str, float]]

Returns:

  • predictions – The surrogate predictions of shape (test_count, class_count) for each test observation.

  • timing – Timing for the subroutines of this function.
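
The shapes above can be illustrated with a small NumPy-only sketch. It does not call MuyGPyS; the helper one_hot() and the numbers in predictions are made up for illustration, and the 0/1 encoding is an assumption about what "one-hot" means here.

```python
import numpy as np


def one_hot(labels, class_count):
    """Encode integer class labels as a (count, class_count) one-hot matrix."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], class_count))
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


# Hypothetical integer labels for five training items and three classes.
int_labels = np.array([0, 2, 1, 2, 0])
train_labels = one_hot(int_labels, class_count=3)  # shape (5, 3)

# A made-up prediction matrix of shape (test_count, class_count).
predictions = np.array([[0.9, 0.1, -0.2], [0.2, 0.3, 0.8]])

# The predicted class of each test item is the column with the largest value.
predicted_classes = np.argmax(predictions, axis=1)
```

Because the predictions are regression means rather than probabilities, rows need not sum to one and entries may be negative; only the position of the largest entry is used.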

MuyGPyS.examples.classify.do_classify(test_features, train_features, train_labels, nn_count=30, batch_count=200, loss_fn=<MuyGPyS.optimize.loss.LossFn object>, opt_fn=<MuyGPyS.optimize.chassis.OptimizeFn object>, k_kwargs={}, nn_kwargs={}, opt_kwargs={}, verbose=False)

Convenience function for initializing a model and performing surrogate classification.

Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.

Example

>>> import numpy as np
>>> from MuyGPyS.examples.classify import do_classify
>>> from MuyGPyS.gp.deformation import F2, Isotropy
>>> from MuyGPyS.gp.hyperparameter import Parameter
>>> from MuyGPyS.gp.kernels import RBF
>>> from MuyGPyS.gp.noise import HomoscedasticNoise
>>> from MuyGPyS.optimize import Bayes_optimize
>>> from MuyGPyS.optimize.loss import cross_entropy_fn
>>> train_features, train_responses = make_train()  # stand-in function
>>> test_features, test_responses = make_test()  # stand-in function
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_kwargs = {
...     "kernel": RBF(
...         deformation=Isotropy(
...             metric=F2,
...             length_scale=Parameter(0.5, (0.01, 1)),
...         ),
...     ),
...     "noise": HomoscedasticNoise(1e-5),
... }
>>> muygps, nbrs_lookup, surrogate_predictions = do_classify(
...         test_features,
...         train_features,
...         train_responses,
...         nn_count=30,
...         batch_count=200,
...         loss_fn=cross_entropy_fn,
...         opt_fn=Bayes_optimize,
...         k_kwargs=k_kwargs,
...         nn_kwargs=nn_kwargs,
...         verbose=False,
... )
>>> predicted_labels = np.argmax(surrogate_predictions, axis=1)
>>> true_labels = np.argmax(test_responses, axis=1)
>>> acc = np.mean(predicted_labels == true_labels)
>>> print(f"obtained accuracy: {acc}")
obtained accuracy: 0.973...

Parameters:
  • test_features (ndarray) – A matrix of shape (test_count, feature_count) whose rows consist of observation vectors of the test data.

  • train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.

  • train_labels (ndarray) – A matrix of shape (train_count, response_count) whose rows consist of label vectors for the training data.

  • nn_count (int) – The number of nearest neighbors to employ.

  • batch_count (int) – The batch size for hyperparameter optimization.

  • loss_fn (LossFn) – The loss functor to use in hyperparameter optimization. Ignored if all of the parameters specified by k_kwargs are fixed.

  • opt_fn (OptimizeFn) – The optimization functor to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed.

  • k_kwargs (Union[Dict, List[Dict], Tuple[Dict, ...]]) – Parameters for the kernel, possibly including kernel type, deformation function, noise and scale hyperparameter specifications, and specifications for kernel hyperparameters. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur. If k_kwargs is a list of such dicts, a multivariate classifier model is created.

  • nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.

  • opt_kwargs (Dict) – Parameters for the wrapped optimizer. See the docs of the corresponding library for supported parameters.

  • verbose (bool) – If True, print summary statistics.

Return type:

Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray]

Returns:

  • muygps – A (possibly trained) MuyGPs object.

  • nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.

  • surrogate_predictions – A matrix of shape (test_count, response_count) whose rows indicate the surrogate predictions of the model. The predicted classes are given by the indices of the largest elements of each row.
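
The example for do_classify() relies on the stand-in functions make_train() and make_test(). A hypothetical NumPy-only version is sketched below so the input shapes are concrete. The helper make_split(), the blob layout, and the sample counts are all assumptions chosen for illustration and are not part of MuyGPyS.

```python
import numpy as np


def make_split(count, feature_count=2, class_count=3, seed=0):
    """Hypothetical stand-in: Gaussian blobs with one-hot responses."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, class_count, size=count)
    # One center per class, spread out along the first feature axis.
    centers = np.zeros((class_count, feature_count))
    centers[:, 0] = 5.0 * np.arange(class_count)
    noise = rng.normal(scale=0.5, size=(count, feature_count))
    features = centers[labels] + noise
    # Row i of the identity matrix is the one-hot vector for class i.
    responses = np.eye(class_count)[labels]
    return features, responses


def make_train():
    return make_split(1000, seed=0)


def make_test():
    return make_split(200, seed=1)


train_features, train_responses = make_train()  # (1000, 2), (1000, 3)
test_features, test_responses = make_test()  # (200, 2), (200, 3)
```

With data in this form, train_responses plays the role of train_labels, and the one-hot test_responses can be decoded with np.argmax(..., axis=1) to obtain the true labels for scoring.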

MuyGPyS.examples.classify.make_classifier(train_features, train_labels, nn_count=30, batch_count=200, loss_fn=<MuyGPyS.optimize.loss.LossFn object>, opt_fn=<MuyGPyS.optimize.chassis.OptimizeFn object>, k_kwargs={}, nn_kwargs={}, opt_kwargs={}, verbose=False)

Convenience function for creating a MuyGPyS functor and a neighbor lookup data structure.

Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.

Example

>>> from MuyGPyS.gp.deformation import F2, Isotropy
>>> from MuyGPyS.gp.hyperparameter import Parameter
>>> from MuyGPyS.gp.kernels import RBF
>>> from MuyGPyS.gp.noise import HomoscedasticNoise
>>> from MuyGPyS.optimize import Bayes_optimize
>>> from MuyGPyS.optimize.loss import cross_entropy_fn
>>> from MuyGPyS.examples.classify import make_classifier
>>> train_features, train_responses = make_train()  # stand-in function
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_kwargs = {
...     "kernel": RBF(
...         deformation=Isotropy(
...             metric=F2,
...             length_scale=Parameter(1.0, (1e-2, 1e2))
...         )
...     ),
...     "noise": HomoscedasticNoise(1e-5),
... }
>>> muygps, nbrs_lookup = make_classifier(
...         train_features,
...         train_responses,
...         nn_count=30,
...         batch_count=200,
...         loss_fn=cross_entropy_fn,
...         opt_fn=Bayes_optimize,
...         k_kwargs=k_kwargs,
...         nn_kwargs=nn_kwargs,
...         verbose=False,
... )
Parameters:
  • train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.

  • train_labels (ndarray) – A matrix of shape (train_count, class_count) whose rows consist of one-hot class label vectors of the train data.

  • nn_count (int) – The number of nearest neighbors to employ.

  • batch_count (int) – The number of elements to sample for the batch used in hyperparameter optimization.

  • loss_fn (LossFn) – The loss functor to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed.

  • opt_fn (OptimizeFn) – The optimization functor to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed.

  • k_kwargs (Dict) – Parameters for the kernel, possibly including kernel type, deformation function, noise and scale hyperparameter specifications, and specifications for kernel hyperparameters. See kernels for examples and requirements. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur.

  • nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.

  • opt_kwargs (Dict) – Parameters for the wrapped optimizer. See the docs of the corresponding library for supported parameters.

  • verbose (bool) – If True, print summary statistics.

Return type:

Tuple[MuyGPS, NN_Wrapper]

Returns:

  • muygps – A (possibly trained) MuyGPs object.

  • nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.
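
To make the roles of nn_count, batch_count, and the neighbor lookup concrete, the following brute-force NumPy sketch samples a batch of training points and finds each one's nearest neighbors. It is only an illustration of the two ideas under simple assumptions (Euclidean distance, uniform sampling); brute_force_neighbors() is a hypothetical helper and this is not how NN_Wrapper or the library's batching is implemented.

```python
import numpy as np


def brute_force_neighbors(queries, train_features, nn_count):
    """Return indices of the nn_count nearest training rows for each query."""
    # Pairwise squared Euclidean distances, shape (query_count, train_count).
    diffs = queries[:, None, :] - train_features[None, :, :]
    sq_dists = np.sum(diffs**2, axis=-1)
    # Sort each row by distance and keep the nn_count closest indices.
    return np.argsort(sq_dists, axis=1)[:, :nn_count]


rng = np.random.default_rng(0)
train_features = rng.normal(size=(500, 4))

# Sample a batch of distinct training indices.
batch_count = 50
batch_indices = rng.choice(
    train_features.shape[0], size=batch_count, replace=False
)

# Each batch point is its own nearest neighbor in the training set, so
# query one extra neighbor and drop the first column.
nn_count = 10
nn_indices = brute_force_neighbors(
    train_features[batch_indices], train_features, nn_count + 1
)[:, 1:]
```

The brute-force search costs O(query_count × train_count) distance evaluations, which is why a dedicated lookup structure configured through nn_kwargs is preferable for large training sets.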

MuyGPyS.examples.classify.make_multivariate_classifier(train_features, train_labels, nn_count=30, batch_count=200, loss_fn=<MuyGPyS.optimize.loss.LossFn object>, opt_fn=<MuyGPyS.optimize.chassis.OptimizeFn object>, k_args=[], nn_kwargs={}, opt_kwargs={}, verbose=False)

Convenience function for creating a multivariate MuyGPyS functor and a neighbor lookup data structure.

Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.

Example

>>> from MuyGPyS.examples.classify import make_multivariate_classifier
>>> from MuyGPyS.gp.deformation import F2, Isotropy
>>> from MuyGPyS.gp.hyperparameter import Parameter
>>> from MuyGPyS.gp.kernels import RBF
>>> from MuyGPyS.gp.noise import HomoscedasticNoise
>>> from MuyGPyS.optimize import Bayes_optimize
>>> from MuyGPyS.optimize.loss import cross_entropy_fn
>>> train_features, train_responses = make_train()  # stand-in function
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_args = [
...     {
...         "kernel": RBF(
...             deformation=Isotropy(
...                 metric=F2,
...                 length_scale=Parameter(0.5, (0.01, 1)),
...             ),
...         ),
...         "noise": HomoscedasticNoise(1e-5),
...     },
...     {
...         "kernel": RBF(
...             deformation=Isotropy(
...                 metric=F2,
...                 length_scale=Parameter(0.5, (0.01, 1)),
...             ),
...         ),
...         "noise": HomoscedasticNoise(1e-5),
...     },
... ]
>>> mmuygps, nbrs_lookup = make_multivariate_classifier(
...         train_features,
...         train_responses,
...         nn_count=30,
...         batch_count=200,
...         loss_fn=cross_entropy_fn,
...         opt_fn=Bayes_optimize,
...         k_args=k_args,
...         nn_kwargs=nn_kwargs,
...         verbose=False,
... )
Parameters:
  • train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.

  • train_labels (ndarray) – A matrix of shape (train_count, class_count) whose rows consist of one-hot encoded label vectors of the train data.

  • nn_count (int) – The number of nearest neighbors to employ.

  • batch_count (int) – The number of elements to sample for the batch used in hyperparameter optimization.

  • loss_fn (LossFn) – The loss functor to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_args are fixed.

  • opt_fn (OptimizeFn) – The optimization functor to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_args are fixed.

  • k_args (Union[List[Dict], Tuple[Dict, ...]]) – A list of response_count dicts containing kernel initialization keyword arguments. Each dict specifies parameters for the kernel, possibly including noise and scale hyperparameter specifications and specifications for specific kernel hyperparameters. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur.

  • nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.

  • opt_kwargs (Dict) – Parameters for the wrapped optimizer. See the docs of the corresponding library for supported parameters.

  • verbose (bool) – If True, print summary statistics.

Return type:

Tuple[MultivariateMuyGPS, NN_Wrapper]

Returns:

  • muygps – A (possibly trained) MultivariateMuyGPS object.

  • nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.
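
Since k_args is described as a list of response_count dicts, it can be worth checking that the list length matches the number of label columns before building the model. The helper below is a hypothetical pre-flight check, not part of MuyGPyS, and the kernel entries are placeholder strings standing in for real kernel objects.

```python
import numpy as np


def check_k_args(k_args, train_labels):
    """Hypothetical helper: require one kernel-spec dict per response column."""
    response_count = train_labels.shape[1]
    if len(k_args) != response_count:
        raise ValueError(
            f"expected {response_count} kernel dicts, got {len(k_args)}"
        )
    for i, spec in enumerate(k_args):
        if not isinstance(spec, dict):
            raise TypeError(f"k_args[{i}] is not a dict")


# Four training items, two classes, one-hot encoded: shape (4, 2).
train_labels = np.eye(2)[[0, 1, 1, 0]]

# Placeholder specs; real entries would hold kernel and noise objects.
k_args = [{"kernel": "placeholder"}, {"kernel": "placeholder"}]

check_k_args(k_args, train_labels)  # passes silently for matching lengths
```

Whether the library performs an equivalent check internally is not stated here, so treating this as a cheap guard in calling code is the safer assumption.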