regress

Resources and high-level API for a simple regression workflow.

make_regressor() is a high-level API for creating and training MuyGPyS.gp.muygps.MuyGPS objects for regression. make_multivariate_regressor() is a high-level API for creating and training MuyGPyS.gp.muygps.MultivariateMuyGPS objects for regression.

do_regress() is a high-level API for executing a simple, generic regression workflow given data. It calls the maker APIs above and regress_any().

MuyGPyS.examples.regress.do_regress(test_features, train_features, train_targets, nn_count=30, batch_count=200, loss_method='mse', sigma_method='analytic', variance_mode=None, kern=None, k_kwargs={}, nn_kwargs={}, apply_sigma_sq=True, return_distances=False, verbose=False)

Convenience function initializing a model and performing regression.

Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.

Also supports workflows relying upon multivariate models. In order to create a multivariate model, specify the kern argument and pass a list of hyperparameter dicts to k_kwargs.

Example

>>> from MuyGPyS.testing.test_utils import _make_gaussian_data
>>> from MuyGPyS.examples.regress import do_regress
>>> from MuyGPyS.optimize.objective import mse_fn
>>> train, test = _make_gaussian_data(10000, 1000, 100, 10)
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_kwargs = {
...         "kern": "rbf",
...         "metric": "F2",
...         "eps": {"val": 1e-5},
...         "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)}
... }
>>> muygps, nbrs_lookup, predictions, variance = do_regress(
...         test['input'],
...         train['input'],
...         train['output'],
...         nn_count=30,
...         batch_count=200,
...         loss_method="mse",
...         variance_mode="diagonal",
...         k_kwargs=k_kwargs,
...         nn_kwargs=nn_kwargs,
...         verbose=False,
... )
>>> # Can alternately return distance tensors for reuse
>>> muygps, nbrs_lookup, predictions, variance, crosswise_dists, pairwise_dists = do_regress(
...         test['input'],
...         train['input'],
...         train['output'],
...         nn_count=30,
...         batch_count=200,
...         loss_method="mse",
...         variance_mode="diagonal",
...         k_kwargs=k_kwargs,
...         nn_kwargs=nn_kwargs,
...         return_distances=True,
...         verbose=False,
... )
>>> mse = mse_fn(test['output'], predictions)
>>> print(f"obtained mse: {mse}")
obtained mse: 0.20842...
Parameters
  • test_features (ndarray) – A matrix of shape (test_count, feature_count) whose rows consist of observation vectors of the test data.

  • train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.

  • train_targets (ndarray) – A matrix of shape (train_count, response_count) whose rows consist of response vectors of the train data.

  • nn_count (int) – The number of nearest neighbors to employ.

  • batch_count (int) – The number of batch elements to sample for hyperparameter optimization.

  • loss_method (str) – The loss method to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed. Currently supports only "mse" for regression.

  • sigma_method (Optional[str]) – The optimization method to be employed to learn the sigma_sq hyperparameter. Currently supports only "analytic" and None. If the value is not None, the returned MuyGPyS.gp.muygps.MuyGPS object will possess a sigma_sq member whose value, invoked via muygps.sigma_sq(), is a (response_count,) vector to be used for scaling posterior variances.

  • variance_mode (Optional[str]) – Specifies the type of variance to return. Currently supports diagonal and None. If None, report no variance term.

  • kern (Optional[str]) – The kernel function to be used. See kernels for details. Only used in the multivariate case. If None, assume that we are not using a multivariate model.

  • k_kwargs (Union[Dict, List[Dict], Tuple[Dict, …]]) – If given a list or tuple of length response_count, assume that the elements are dicts containing kernel initialization keyword arguments for the creation of a multivariate model (see make_multivariate_regressor()). If given a dict, assume that the elements are keyword arguments to a MuyGPs model (see make_regressor()).

  • nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.

  • apply_sigma_sq (bool) – If True and variance_mode is not None, automatically scale the posterior variances by sigma_sq.

  • return_distances (bool) – If True, returns a (test_count, nn_count) matrix containing the crosswise distances between the test elements and their nearest neighbor sets and a (test_count, nn_count, nn_count) tensor containing the pairwise distances between the test’s nearest neighbor sets.

  • verbose (bool) – If True, print summary statistics.
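
The k_kwargs convention described above can be pictured as a type dispatch. The following is a minimal sketch under stated assumptions, not the library's implementation: the helper name select_maker is hypothetical, it only reports which maker API the parameter description points to, and the length check is an illustrative guess at how a mismatch might be handled.

```python
from typing import Dict, List, Tuple, Union

KernelSpec = Union[Dict, List[Dict], Tuple[Dict, ...]]


def select_maker(k_kwargs: KernelSpec, response_count: int) -> str:
    """Hypothetical helper: name the maker API implied by the type of k_kwargs."""
    if isinstance(k_kwargs, dict):
        # A single dict holds the keyword arguments for one MuyGPS model.
        return "make_regressor"
    if isinstance(k_kwargs, (list, tuple)):
        # A list or tuple is expected to hold one kernel dict per response dimension.
        if len(k_kwargs) != response_count:
            raise ValueError(
                f"expected {response_count} kernel dicts, got {len(k_kwargs)}"
            )
        return "make_multivariate_regressor"
    raise TypeError("k_kwargs must be a dict, or a list or tuple of dicts")
```

Recall that the multivariate path also expects the kern argument to be set, since the per-response dicts do not carry the kernel name themselves.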

Return type

Union[Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray], Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray, ndarray], Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray, ndarray, ndarray], Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray, ndarray, ndarray, ndarray]]

Returns

  • muygps – A (possibly trained) MuyGPs object.

  • nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.

  • predictions – The predicted response associated with each test observation.

  • variance – Estimated posterior variance of each test prediction. If variance_mode == "diagonal", returns a (test_count, response_count) matrix where each row holds the posterior variances of one test prediction. If sigma_method is not None and apply_sigma_sq is True, each column of the variance is automatically scaled by the corresponding sigma_sq parameter.

  • crosswise_dists – A matrix of shape (test_count, nn_count) whose rows list the distance of the corresponding test element to each of its nearest neighbors. Only returned if return_distances is True.

  • pairwise_dists – A tensor of shape (test_count, nn_count, nn_count,) whose latter two dimensions contain square matrices containing the pairwise distances between the nearest neighbors of the test elements. Only returned if return_distances is True.
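
Because the length of the returned tuple depends on variance_mode and return_distances, it can help to attach names to the values. The sketch below is one way to do that, assuming the values arrive in the order listed under Returns; the helper label_do_regress_result is hypothetical and is not part of MuyGPyS.

```python
from typing import Any, Dict, Optional, Tuple


def label_do_regress_result(
    result: Tuple[Any, ...],
    variance_mode: Optional[str] = None,
    return_distances: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: attach names to the tuple returned by do_regress."""
    names = ["muygps", "nbrs_lookup", "predictions"]
    if variance_mode is not None:
        # A variance term is only reported when a variance mode is requested.
        names.append("variance")
    if return_distances:
        # The two distance tensors are appended as a pair.
        names.extend(["crosswise_dists", "pairwise_dists"])
    if len(result) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(result)}")
    return dict(zip(names, result))
```

Pass the same variance_mode and return_distances values that were given to do_regress, so that the expected tuple length (3, 4, 5, or 6) matches the call.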

MuyGPyS.examples.regress.make_multivariate_regressor(train_features, train_targets, nn_count=30, batch_count=200, loss_method='mse', sigma_method='analytic', kern='matern', k_args=[], nn_kwargs={}, return_distances=False, verbose=False)

Convenience function for creating a Multivariate MuyGPyS functor and neighbor lookup data structure.

Expected parameters include a list of keyword argument dicts specifying kernel parameters and a dict listing nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.

Example

>>> from MuyGPyS.testing.test_utils import _make_gaussian_data
>>> from MuyGPyS.examples.regress import make_multivariate_regressor
>>> train_features, train_responses = make_train()  # stand-in function
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_args = [
...         {
...             "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)},
...             "eps": {"val": 1e-5},
...         },
...         {
...             "length_scale": {"val": 1.5, "bounds": (1e-2, 1e2)},
...             "eps": {"val": 1e-5},
...         },
... ]
>>> mmuygps, nbrs_lookup = make_multivariate_regressor(
...         train_features,
...         train_responses,
...         nn_count=30,
...         batch_count=200,
...         loss_method="mse",
...         sigma_method="analytic",
...         kern="rbf",
...         k_args=k_args,
...         nn_kwargs=nn_kwargs,
...         verbose=False,
... )
>>> # Can alternately return distance tensors for reuse
>>> mmuygps, nbrs_lookup, crosswise_dists, pairwise_dists = make_multivariate_regressor(
...         train_features,
...         train_responses,
...         nn_count=30,
...         batch_count=200,
...         loss_method="mse",
...         sigma_method="analytic",
...         kern="rbf",
...         k_args=k_args,
...         nn_kwargs=nn_kwargs,
...         return_distances=True,
...         verbose=False,
... )
Parameters
  • train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.

  • train_targets (ndarray) – A matrix of shape (train_count, response_count) whose rows consist of response vectors of the train data.

  • nn_count (int) – The number of nearest neighbors to employ.

  • batch_count (int) – The number of batch elements to sample for hyperparameter optimization.

  • loss_method (str) – The loss method to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_args are fixed. Currently supports only "mse" for regression.

  • sigma_method (Optional[str]) – The optimization method to be employed to learn the sigma_sq hyperparameter. Currently supports only "analytic" and None. If the value is not None, the returned MuyGPyS.gp.muygps.MultivariateMuyGPS object will possess a sigma_sq member whose value, invoked via mmuygps.sigma_sq(), is a (response_count,) vector to be used for scaling posterior variances.

  • kern (str) – The kernel function to be used. See kernels for details.

  • k_args (Union[List[Dict], Tuple[Dict, …]]) – A list of response_count dicts containing kernel initialization keyword arguments. Each dict specifies parameters for the kernel, possibly including epsilon and sigma hyperparameter specifications and specifications for specific kernel hyperparameters. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur.

  • nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.

  • return_distances (bool) – If True and any training occurs, returns a (batch_count, nn_count) matrix containing the crosswise distances between the batch’s elements and their nearest neighbor sets and a (batch_count, nn_count, nn_count) tensor containing the pairwise distances between the batch’s nearest neighbor sets.

  • verbose (bool) – If True, print summary statistics.

Return type

Union[Tuple[MultivariateMuyGPS, NN_Wrapper], Tuple[MultivariateMuyGPS, NN_Wrapper, ndarray, ndarray]]

Returns

  • mmuygps – A Multivariate MuyGPs object with a separate (possibly trained) kernel function associated with each response dimension.

  • nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.

  • crosswise_dists – A matrix of shape (batch_count, nn_count) whose rows list the distance of the corresponding batch element to each of its nearest neighbors. Only returned if return_distances is True.

  • pairwise_dists – A tensor of shape (batch_count, nn_count, nn_count,) whose latter two dimensions contain square matrices containing the pairwise distances between the nearest neighbors of the batch elements. Only returned if return_distances is True.
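
The shapes of crosswise_dists and pairwise_dists can be illustrated with a brute-force neighbor search in NumPy. This is only a sketch of the shapes involved, under several assumptions that are not taken from the library: the data are random, the metric is plain Euclidean distance, the batch is drawn uniformly from the training set, and each batch element is excluded from its own neighbor set. The library's NN_Wrapper and distance metrics may behave differently.

```python
import numpy as np

rng = np.random.default_rng(0)
train_count, feature_count = 500, 4
batch_count, nn_count = 20, 5

train_features = rng.normal(size=(train_count, feature_count))
batch_indices = rng.choice(train_count, size=batch_count, replace=False)
batch_features = train_features[batch_indices]

# Brute-force distances from every batch element to every training point.
all_dists = np.linalg.norm(
    batch_features[:, None, :] - train_features[None, :, :], axis=-1
)
# Exclude each batch element from its own neighbor set (an assumption).
all_dists[np.arange(batch_count), batch_indices] = np.inf

# Indices of the nn_count nearest training points per batch element.
nn_indices = np.argsort(all_dists, axis=1)[:, :nn_count]

# Crosswise: distance from each batch element to each of its neighbors.
crosswise_dists = np.take_along_axis(all_dists, nn_indices, axis=1)

# Pairwise: distances among the neighbors of each batch element.
nn_features = train_features[nn_indices]  # (batch_count, nn_count, feature_count)
pairwise_dists = np.linalg.norm(
    nn_features[:, :, None, :] - nn_features[:, None, :, :], axis=-1
)

print(crosswise_dists.shape)  # (20, 5)
print(pairwise_dists.shape)   # (20, 5, 5)
```

Each slice pairwise_dists[i] is a symmetric matrix with a zero diagonal, which is what "square matrices containing the pairwise distances" refers to above.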

MuyGPyS.examples.regress.make_regressor(train_features, train_targets, nn_count=30, batch_count=200, loss_method='mse', sigma_method='analytic', k_kwargs={}, nn_kwargs={}, return_distances=False, verbose=False)

Convenience function for creating a MuyGPyS functor and neighbor lookup data structure.

Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.

Example

>>> from MuyGPyS.testing.test_utils import _make_gaussian_data
>>> from MuyGPyS.examples.regress import make_regressor
>>> train_features, train_responses = make_train()  # stand-in function
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_kwargs = {
...         "kern": "rbf",
...         "metric": "F2",
...         "eps": {"val": 1e-5},
...         "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)}
... }
>>> muygps, nbrs_lookup = make_regressor(
...         train_features,
...         train_responses,
...         nn_count=30,
...         batch_count=200,
...         loss_method="mse",
...         sigma_method="analytic",
...         k_kwargs=k_kwargs,
...         nn_kwargs=nn_kwargs,
...         verbose=False,
... )
>>> # Can alternately return distance tensors for reuse
>>> muygps, nbrs_lookup, crosswise_dists, pairwise_dists = make_regressor(
...         train_features,
...         train_responses,
...         nn_count=30,
...         batch_count=200,
...         loss_method="mse",
...         sigma_method="analytic",
...         k_kwargs=k_kwargs,
...         nn_kwargs=nn_kwargs,
...         return_distances=True,
...         verbose=False,
... )
Parameters
  • train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.

  • train_targets (ndarray) – A matrix of shape (train_count, response_count) whose rows consist of response vectors of the train data.

  • nn_count (int) – The number of nearest neighbors to employ.

  • batch_count (int) – The number of batch elements to sample for hyperparameter optimization.

  • loss_method (str) – The loss method to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed. Currently supports only "mse" for regression.

  • sigma_method (Optional[str]) – The optimization method to be employed to learn the sigma_sq hyperparameter. Currently supports only "analytic" and None. If the value is not None, the returned MuyGPyS.gp.muygps.MuyGPS object will possess a sigma_sq member whose value, invoked via muygps.sigma_sq(), is a (response_count,) vector to be used for scaling posterior variances.

  • k_kwargs (Dict) – Parameters for the kernel, possibly including kernel type, distance metric, epsilon and sigma hyperparameter specifications, and specifications for kernel hyperparameters. See kernels for examples and requirements. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur.

  • nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.

  • return_distances (bool) – If True and any training occurs, returns a (batch_count, nn_count) matrix containing the crosswise distances between the batch’s elements and their nearest neighbor sets and a (batch_count, nn_count, nn_count) tensor containing the pairwise distances between the batch’s nearest neighbor sets.

  • verbose (bool) – If True, print summary statistics.
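
The k_kwargs description notes that no optimization occurs when no hyperparameter is given optimization bounds. A quick way to see which entries of a k_kwargs dict would qualify is sketched below. The helper free_hyperparameters is hypothetical, and it assumes that the presence of a "bounds" key, as in the example above, is what marks a hyperparameter as free; the library may apply additional rules.

```python
from typing import Any, Dict, List


def free_hyperparameters(k_kwargs: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list hyperparameters that carry optimization bounds."""
    free = []
    for name, spec in k_kwargs.items():
        # Entries such as "kern" or "metric" are plain strings, not hyperparameters.
        if isinstance(spec, dict) and "bounds" in spec:
            free.append(name)
    return free


k_kwargs = {
    "kern": "rbf",
    "metric": "F2",
    "eps": {"val": 1e-5},
    "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)},
}
print(free_hyperparameters(k_kwargs))  # ['length_scale']
```

An empty list under this check would correspond to the case in which loss_method is ignored and the model is returned without training.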

Return type

Union[Tuple[MuyGPS, NN_Wrapper], Tuple[MuyGPS, NN_Wrapper, ndarray, ndarray]]

Returns

  • muygps – A (possibly trained) MuyGPs object.

  • nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.

  • crosswise_dists – A matrix of shape (batch_count, nn_count) whose rows list the distance of the corresponding batch element to each of its nearest neighbors. Only returned if return_distances is True.

  • pairwise_dists – A tensor of shape (batch_count, nn_count, nn_count,) whose latter two dimensions contain square matrices containing the pairwise distances between the nearest neighbors of the batch elements. Only returned if return_distances is True.

MuyGPyS.examples.regress.regress_any(regressor, test_features, train_features, train_nbrs_lookup, train_targets, variance_mode=None, apply_sigma_sq=True, return_distances=False)

Simultaneously predicts the response for each test item.

Parameters
  • regressor (Union[MuyGPS, MultivariateMuyGPS]) – Regressor object.

  • test_features (ndarray) – Test observations of shape (test_count, feature_count).

  • train_features (ndarray) – Train observations of shape (train_count, feature_count).

  • train_nbrs_lookup (NN_Wrapper) – Trained nearest neighbor query data structure.

  • train_targets (ndarray) – Observed response for all training data of shape (train_count, response_count).

  • variance_mode (Optional[str]) – Specifies the type of variance to return. Currently supports "diagonal" and None. If None, report no variance term.

  • apply_sigma_sq (bool) – If True and variance_mode is not None, automatically scale the posterior variances by sigma_sq.

  • return_distances (bool) – If True, returns a (test_count, nn_count) matrix containing the crosswise distances between the test elements and their nearest neighbor sets and a (test_count, nn_count, nn_count) tensor containing the pairwise distances between the test data’s nearest neighbor sets.
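
The effect of apply_sigma_sq can be pictured as a column-wise rescaling of the posterior variances. The sketch below uses stand-in arrays rather than values produced by MuyGPyS, and assumes a (test_count, response_count) variance matrix together with a (response_count,) sigma_sq vector, as described for do_regress.

```python
import numpy as np

test_count, response_count = 4, 2

# Stand-in unscaled posterior variances, one column per response dimension.
unscaled_variance = np.full((test_count, response_count), 0.5)
# Stand-in for the (response_count,) vector described for sigma_sq.
sigma_sq = np.array([2.0, 10.0])

# Broadcasting multiplies column j of the variance by sigma_sq[j].
scaled_variance = unscaled_variance * sigma_sq

print(scaled_variance[0])  # [1. 5.]
```

With apply_sigma_sq=False the caller would receive the unscaled values and could apply this multiplication later.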

Return type

Union[Tuple[ndarray, Dict[str, float]], Tuple[Tuple[ndarray, ndarray], Dict[str, float]], Tuple[Tuple[ndarray, ndarray, ndarray], Dict[str, float]], Tuple[Tuple[ndarray, ndarray, ndarray, ndarray], Dict[str, float]]]

Returns

  • means – The predicted response of shape (test_count, response_count,) for each of the test examples.

  • variances – The independent posterior variances for each of the test examples. Of shape (test_count,) if the argument regressor is an instance of MuyGPyS.gp.muygps.MuyGPS, and of shape (test_count, response_count) if regressor is an instance of MuyGPyS.gp.muygps.MultivariateMuyGPS. Returned only when variance_mode == "diagonal".

  • crosswise_dists – A matrix of shape (test_count, nn_count) whose rows list the distance of the corresponding test element to each of its nearest neighbors. Only returned if return_distances is True.

  • pairwise_dists – A tensor of shape (test_count, nn_count, nn_count,) whose latter two dimensions contain square matrices containing the pairwise distances between the nearest neighbors of the test elements. Only returned if return_distances is True.

  • timing (dict) – Timing for the subroutines of this function.
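
According to the return type above, regress_any returns a pair whose first element is either a bare prediction array or a tuple of arrays, followed by the timing dict. The sketch below shows one way to give the outputs names. The helper label_regress_any_result is hypothetical, and it assumes the outputs appear in the order listed under Returns.

```python
from typing import Any, Dict, Optional, Tuple


def label_regress_any_result(
    result: Tuple[Any, Dict[str, float]],
    variance_mode: Optional[str] = None,
    return_distances: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: attach names to the pair returned by regress_any."""
    outputs, timing = result
    names = ["means"]
    if variance_mode is not None:
        names.append("variances")
    if return_distances:
        names.extend(["crosswise_dists", "pairwise_dists"])
    # Per the documented return type, a lone prediction array is not wrapped.
    values = (outputs,) if len(names) == 1 else tuple(outputs)
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} outputs, got {len(values)}")
    labeled = dict(zip(names, values))
    labeled["timing"] = timing
    return labeled
```

Pass the same variance_mode and return_distances values that were given to regress_any, so that the expected number of outputs matches the call.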