regress

Resources and high-level API for a simple regression workflow.

make_regressor() is a high-level API for creating and training MuyGPyS.gp.muygps.MuyGPS objects for regression. make_multivariate_regressor() is a high-level API for creating and training MuyGPyS.gp.muygps.MultivariateMuyGPS objects for regression.

do_regress() is a high-level API for executing a simple, generic regression workflow given data. It calls the maker APIs above and regress_any().

MuyGPyS.examples.regress.do_regress(test_features, train_features, train_targets, nn_count=30, batch_count=200, loss_fn=<MuyGPyS.optimize.loss.LossFn object>, opt_fn=<MuyGPyS.optimize.chassis.OptimizeFn object>, k_kwargs={}, nn_kwargs={}, opt_kwargs={}, verbose=False)[source]

Convenience function initializing a model and performing regression.

Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.

do_regress() also supports workflows that rely on multivariate models. To create a multivariate model, pass a list of hyperparameter dicts to k_kwargs.

Example

>>> from MuyGPyS.examples.regress import do_regress
>>> from MuyGPyS.gp.deformation import F2, Isotropy
>>> from MuyGPyS.gp.hyperparameter import Parameter
>>> from MuyGPyS.gp.hyperparameter import AnalyticScale
>>> from MuyGPyS.gp.kernels import RBF
>>> from MuyGPyS.gp.noise import HomoscedasticNoise
>>> from MuyGPyS.optimize.loss import lool_fn
>>> from MuyGPyS.optimize import Bayes_optimize
>>> from MuyGPyS.optimize.loss import mse_fn
>>> train_features, train_responses = make_train()  # stand-in function
>>> test_features, test_responses = make_test()  # stand-in function
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_kwargs = {
...     "kernel": RBF(
...         deformation=Isotropy(
...             metric=F2,
...             length_scale=Parameter(1.0, (1e-2, 1e2))
...         )
...     ),
...     "noise": HomoscedasticNoise(1e-5),
...     "scale": AnalyticScale(),
... }
>>> muygps, nbrs_lookup, predictions, variance = do_regress(
...         test_features,
...         train_features,
...         train_responses,
...         nn_count=30,
...         batch_count=200,
...         loss_fn=lool_fn,
...         opt_fn=Bayes_optimize,
...         k_kwargs=k_kwargs,
...         nn_kwargs=nn_kwargs,
...         verbose=False,
... )
>>> mse = mse_fn(test_responses, predictions)
>>> print(f"obtained mse: {mse}")
obtained mse: 0.20842...
Parameters:
  • test_features (ndarray) – A matrix of shape (test_count, feature_count) whose rows consist of observation vectors of the test data.

  • train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.

  • train_targets (ndarray) – A matrix of shape (train_count, response_count) whose rows consist of response vectors of the train data.

  • nn_count (int) – The number of nearest neighbors to employ.

  • batch_count (int) – The number of batch elements to sample for hyperparameter optimization.

  • loss_fn (LossFn) – The loss functor to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed.

  • opt_fn (OptimizeFn) – The optimization functor to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed.

  • k_kwargs (Union[Dict, List[Dict], Tuple[Dict, ...]]) – If given a list or tuple of length response_count, assume that the elements are dicts containing kernel initialization keyword arguments for the creation of a multivariate model (see make_multivariate_regressor()). If given a dict, assume that the elements are keyword arguments to a MuyGPs model (see make_regressor()).

  • nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.

  • opt_kwargs (Dict) – Parameters for the wrapped optimizer. See the docs of the corresponding library for supported parameters.

  • verbose (bool) – If True, print summary statistics.

Return type:

Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray, ndarray]

Returns:

  • muygps – A (possibly trained) MuyGPs object.

  • nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.

  • predictions – The predicted response associated with each test observation.

  • variance – Estimated (test_count, response_count) posterior variance of each test prediction.
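The rule stated for k_kwargs (a dict selects the univariate path, a list or tuple of length response_count selects the multivariate path) can be sketched in plain Python. This is a minimal illustration of the documented rule, not the library's implementation; choose_maker is a hypothetical helper that only returns the name of the maker function that would apply, and the "rbf" strings are placeholders for real kernel objects.

```python
from typing import Dict, List, Tuple, Union

KernelSpec = Union[Dict, List[Dict], Tuple[Dict, ...]]


def choose_maker(k_kwargs: KernelSpec, response_count: int) -> str:
    """Return the name of the maker API that matches the shape of k_kwargs.

    Hypothetical helper for illustration only; it is not part of MuyGPyS.
    """
    if isinstance(k_kwargs, dict):
        # A single dict describes one MuyGPS model.
        return "make_regressor"
    if isinstance(k_kwargs, (list, tuple)):
        # A sequence must provide one dict per response dimension.
        if len(k_kwargs) != response_count:
            raise ValueError(
                f"expected {response_count} kernel dicts, got {len(k_kwargs)}"
            )
        if not all(isinstance(spec, dict) for spec in k_kwargs):
            raise TypeError("every element of k_kwargs must be a dict")
        return "make_multivariate_regressor"
    raise TypeError(f"unsupported k_kwargs type: {type(k_kwargs).__name__}")


print(choose_maker({"kernel": "rbf"}, response_count=2))
print(choose_maker([{"kernel": "rbf"}, {"kernel": "rbf"}], response_count=2))
```

Checking the sequence length up front is a design choice of this sketch: it turns a mismatch between k_kwargs and the number of response columns into an immediate, readable error.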

MuyGPyS.examples.regress.make_multivariate_regressor(train_features, train_targets, nn_count=30, batch_count=200, loss_fn=<MuyGPyS.optimize.loss.LossFn object>, opt_fn=<MuyGPyS.optimize.chassis.OptimizeFn object>, k_args=[], nn_kwargs={}, opt_kwargs={}, verbose=False)[source]

Convenience function for creating a MultivariateMuyGPS functor and neighbor lookup data structure.

Expected parameters include a list of keyword argument dicts specifying kernel parameters and a dict listing nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.

Example

>>> from MuyGPyS.examples.regress import make_multivariate_regressor
>>> from MuyGPyS.gp.deformation import F2, Isotropy
>>> from MuyGPyS.gp.hyperparameter import Parameter
>>> from MuyGPyS.gp.hyperparameter import AnalyticScale
>>> from MuyGPyS.gp.kernels import RBF
>>> from MuyGPyS.gp.noise import HomoscedasticNoise
>>> from MuyGPyS.optimize import Bayes_optimize
>>> from MuyGPyS.optimize.loss import lool_fn
>>> train_features, train_responses = make_train()  # stand-in function
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_args = [
...         {
...             "kernel": RBF(
...                 deformation=Isotropy(
...                     metric=F2,
...                     length_scale=Parameter(1.0, (1e-2, 1e2))
...                 )
...             ),
...             "noise": HomoscedasticNoise(1e-5),
...             "scale": AnalyticScale(),
...         },
...         {
...             "kernel": RBF(
...                 deformation=Isotropy(
...                     metric=F2,
...                     length_scale=Parameter(1.0, (1e-2, 1e2))
...                 )
...             ),
...             "noise": HomoscedasticNoise(1e-5),
...             "scale": AnalyticScale(),
...         },
... ]
>>> mmuygps, nbrs_lookup = make_multivariate_regressor(
...         train_features,
...         train_responses,
...         nn_count=30,
...         batch_count=200,
...         loss_fn=lool_fn,
...         opt_fn=Bayes_optimize,
...         k_args=k_args,
...         nn_kwargs=nn_kwargs,
...         verbose=False,
... )
Parameters:
  • train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.

  • train_targets (ndarray) – A matrix of shape (train_count, response_count) whose rows consist of response vectors of the train data.

  • nn_count (int) – The number of nearest neighbors to employ.

  • batch_count (int) – The number of batch elements to sample for hyperparameter optimization.

  • loss_fn (LossFn) – The loss functor to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_args are fixed.

  • opt_fn (OptimizeFn) – The optimization functor to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_args are fixed.

  • k_args (Union[List[Dict], Tuple[Dict, ...]]) – A list of response_count dicts containing kernel initialization keyword arguments. Each dict specifies parameters for the kernel, possibly including noise and scale hyperparameter specifications and specifications for specific kernel hyperparameters. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur.

  • nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.

  • opt_kwargs (Dict) – Parameters for the wrapped optimizer. See the docs of the corresponding library for supported parameters.

  • verbose (bool) – If True, print summary statistics.

Return type:

Tuple[MultivariateMuyGPS, NN_Wrapper]

Returns:

  • mmuygps – A Multivariate MuyGPs object with a separate (possibly trained) kernel function associated with each response dimension.

  • nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.
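Since k_args needs one dict per response dimension, the list can be built from a factory instead of being written out by hand. The sketch below is an illustration under assumptions: build_k_args and placeholder_spec are hypothetical names that are not part of MuyGPyS, and the placeholder values stand in for the RBF, HomoscedasticNoise, and AnalyticScale objects used in the example above. It assumes each response dimension should get its own objects rather than sharing one instance.

```python
from typing import Callable, Dict, List


def build_k_args(response_count: int, make_spec: Callable[[], Dict]) -> List[Dict]:
    """Call make_spec once per response dimension and collect the results.

    Hypothetical helper, not part of MuyGPyS. Calling the factory each time
    gives every dimension its own dict (and its own objects inside it)
    instead of response_count references to a single shared dict.
    """
    if response_count < 1:
        raise ValueError("response_count must be at least 1")
    return [make_spec() for _ in range(response_count)]


def placeholder_spec() -> Dict:
    # Stand-in values only; a real factory would construct the kernel,
    # noise, and scale objects here.
    return {"kernel": "rbf", "noise": 1e-5}


k_args = build_k_args(2, placeholder_spec)
print(len(k_args))
print(k_args[0] is k_args[1])
```

Writing `[spec] * response_count` instead would repeat one dict, so any object mutated during optimization would be shared across all response dimensions.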

MuyGPyS.examples.regress.make_regressor(train_features, train_targets, nn_count=30, batch_count=200, loss_fn=<MuyGPyS.optimize.loss.LossFn object>, opt_fn=<MuyGPyS.optimize.chassis.OptimizeFn object>, k_kwargs={}, nn_kwargs={}, opt_kwargs={}, verbose=False)[source]

Convenience function for creating a MuyGPS functor and neighbor lookup data structure.

Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.

Example

>>> from MuyGPyS.examples.regress import make_regressor
>>> from MuyGPyS.gp.deformation import F2, Isotropy
>>> from MuyGPyS.gp.hyperparameter import Parameter
>>> from MuyGPyS.gp.hyperparameter import AnalyticScale
>>> from MuyGPyS.gp.kernels import RBF
>>> from MuyGPyS.gp.noise import HomoscedasticNoise
>>> from MuyGPyS.optimize import Bayes_optimize
>>> from MuyGPyS.optimize.loss import lool_fn
>>> train_features, train_responses = make_train()  # stand-in function
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_kwargs = {
...     "kernel": RBF(
...         deformation=Isotropy(
...             metric=F2,
...             length_scale=Parameter(1.0, (1e-2, 1e2))
...         )
...     ),
...     "noise": HomoscedasticNoise(1e-5),
...     "scale": AnalyticScale(),
... }
>>> muygps, nbrs_lookup = make_regressor(
...         train_features,
...         train_responses,
...         nn_count=30,
...         batch_count=200,
...         loss_fn=lool_fn,
...         opt_fn=Bayes_optimize,
...         k_kwargs=k_kwargs,
...         nn_kwargs=nn_kwargs,
...         verbose=False,
... )
Parameters:
  • train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.

  • train_targets (ndarray) – A matrix of shape (train_count, response_count) whose rows consist of response vectors of the train data.

  • nn_count (int) – The number of nearest neighbors to employ.

  • batch_count (int) – The number of batch elements to sample for hyperparameter optimization.

  • loss_fn (LossFn) – The loss functor to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed.

  • opt_fn (OptimizeFn) – The optimization functor to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed.

  • k_kwargs (Dict) – Parameters for the kernel, possibly including kernel type, deformation function, noise and scale hyperparameter specifications, and specifications for kernel hyperparameters. See kernels for examples and requirements. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur.

  • nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.

  • opt_kwargs (Dict) – Parameters for the wrapped optimizer. See the docs of the corresponding library for supported parameters.

  • verbose (bool) – If True, print summary statistics.

Return type:

Tuple[MuyGPS, NN_Wrapper]

Returns:

  • muygps – A (possibly trained) MuyGPs object.

  • nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.

MuyGPyS.examples.regress.regress_any(regressor, test_features, train_features, train_nbrs_lookup, train_targets)[source]

Simultaneously predicts the response for each test item.

Parameters:
  • regressor (Union[MuyGPS, MultivariateMuyGPS]) – Regressor object.

  • test_features (ndarray) – Test observations of shape (test_count, feature_count).

  • train_features (ndarray) – Train observations of shape (train_count, feature_count).

  • train_nbrs_lookup (NN_Wrapper) – Trained nearest neighbor query data structure.

  • train_targets (ndarray) – Observed response for all training data of shape (train_count, response_count).

Return type:

Tuple[ndarray, ndarray, Dict[str, float]]

Returns:

  • means – The predicted response of shape (test_count, response_count) for each of the test examples.

  • variances – The independent posterior variances for each of the test examples. Of shape (test_count,) if the argument regressor is an instance of MuyGPyS.gp.muygps.MuyGPS, and of shape (test_count, response_count) if regressor is an instance of MuyGPyS.gp.muygps.MultivariateMuyGPS.

  • timing (dict) – Timing for the subroutines of this function.
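The two-step workflow that do_regress() wraps (build a model with a maker function, then predict with regress_any()) can be sketched as a small composition. This is a hedged illustration rather than the library's code: regress_by_hand is a hypothetical name, and the maker and prediction callables are passed in as arguments so the sketch relies only on the signatures and return tuples documented on this page.

```python
from typing import Any, Callable, Dict, Tuple


def regress_by_hand(
    make_fn: Callable[..., Tuple[Any, Any]],
    regress_fn: Callable[..., Tuple[Any, Any, Dict[str, float]]],
    test_features: Any,
    train_features: Any,
    train_targets: Any,
    **make_kwargs: Any,
) -> Tuple[Any, Any, Any, Any]:
    """Build a model, then predict with it (hypothetical helper)."""
    # Step 1: the maker returns the model and the neighbor lookup structure.
    regressor, nbrs_lookup = make_fn(train_features, train_targets, **make_kwargs)
    # Step 2: predict; the positional order follows regress_any's signature.
    means, variances, _timing = regress_fn(
        regressor, test_features, train_features, nbrs_lookup, train_targets
    )
    # The timing dict is dropped so the result has the four values
    # documented for do_regress.
    return regressor, nbrs_lookup, means, variances
```

With MuyGPyS installed, make_fn would be make_regressor or make_multivariate_regressor and regress_fn would be regress_any, all from MuyGPyS.examples.regress. Keyword arguments such as nn_count, k_kwargs (or k_args for the multivariate maker), and nn_kwargs pass through make_kwargs unchanged. Calling the steps separately is mainly useful when the timing dict or the trained lookup structure is needed for further queries.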