regress
Resources and high-level API for a simple regression workflow.
make_regressor() is a high-level API for creating and training MuyGPyS.gp.muygps.MuyGPS objects for regression.
make_multivariate_regressor() is a high-level API for creating and training MuyGPyS.gp.muygps.MultivariateMuyGPS objects for regression.
do_regress() is a high-level API for executing a simple, generic regression workflow given data. It calls the maker APIs above and regress_any().
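The choice between the two maker APIs, as this page documents it (a single dict of kernel arguments for a univariate model; `kern` plus a list or tuple of dicts for a multivariate model), can be sketched as follows. `choose_maker` is a hypothetical helper written for illustration. It mirrors the prose on this page and is not taken from the library's source, so the exact conditions the library checks are an assumption.

```python
from typing import Any, Dict, List, Optional, Tuple, Union

KernelSpec = Union[Dict[str, Any], List[Dict[str, Any]], Tuple[Dict[str, Any], ...]]


def choose_maker(kern: Optional[str], k_kwargs: KernelSpec) -> str:
    """Name the maker API that the documented rules point to.

    Hypothetical helper: it follows the documentation text only.
    """
    # A kernel name plus a list/tuple of dicts signals a multivariate model.
    if kern is not None and isinstance(k_kwargs, (list, tuple)):
        return "make_multivariate_regressor"
    # A single dict of kernel arguments signals a univariate model.
    if isinstance(k_kwargs, dict):
        return "make_regressor"
    raise ValueError(
        "k_kwargs must be a dict, or a list/tuple of dicts together with kern"
    )


single = choose_maker(None, {"kern": "rbf", "eps": {"val": 1e-5}})
multi = choose_maker("rbf", [{"eps": {"val": 1e-5}}, {"eps": {"val": 1e-5}}])
print(single)
print(multi)
```

Returning a name rather than calling the maker keeps the sketch runnable without the library installed.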
- MuyGPyS.examples.regress.do_regress(test_features, train_features, train_targets, nn_count=30, batch_count=200, loss_method='mse', sigma_method='analytic', variance_mode=None, kern=None, k_kwargs={}, nn_kwargs={}, apply_sigma_sq=True, return_distances=False, verbose=False)
Convenience function initializing a model and performing regression.
Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.
Also supports workflows relying upon multivariate models. In order to create a multivariate model, specify the kern argument and pass a list of hyperparameter dicts to k_kwargs.
Example
>>> from MuyGPyS.testing.test_utils import _make_gaussian_data
>>> from MuyGPyS.examples.regress import do_regress
>>> from MuyGPyS.optimize.objective import mse_fn
>>> train, test = _make_gaussian_data(10000, 1000, 100, 10)
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_kwargs = {
...     "kern": "rbf",
...     "metric": "F2",
...     "eps": {"val": 1e-5},
...     "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)}
... }
>>> muygps, nbrs_lookup, predictions, variance = do_regress(
...     test['input'],
...     train['input'],
...     train['output'],
...     nn_count=30,
...     batch_count=200,
...     loss_method="mse",
...     variance_mode="diagonal",
...     k_kwargs=k_kwargs,
...     nn_kwargs=nn_kwargs,
...     verbose=False,
... )
>>> # Can alternately return distance tensors for reuse
>>> muygps, nbrs_lookup, predictions, variance, crosswise_dists, pairwise_dists = do_regress(
...     test['input'],
...     train['input'],
...     train['output'],
...     nn_count=30,
...     batch_count=200,
...     loss_method="mse",
...     variance_mode="diagonal",
...     k_kwargs=k_kwargs,
...     nn_kwargs=nn_kwargs,
...     return_distances=True,
...     verbose=False,
... )
>>> mse = mse_fn(test['output'], predictions)
>>> print(f"obtained mse: {mse}")
obtained mse: 0.20842...
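The example above scores its predictions with mse_fn. For readers without the library at hand, a plain NumPy stand-in for a mean squared error looks like the sketch below. `mean_squared_error` is a hypothetical name, and whether the library's mse_fn normalizes in exactly this way is an assumption.

```python
import numpy as np


def mean_squared_error(targets: np.ndarray, predictions: np.ndarray) -> float:
    """Average squared difference over every test row and response column."""
    targets = np.asarray(targets, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    if targets.shape != predictions.shape:
        raise ValueError("targets and predictions must have the same shape")
    return float(np.mean((targets - predictions) ** 2))


# Toy data with shape (test_count, response_count) = (3, 2).
toy_targets = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
toy_predictions = np.array([[1.0, 2.0], [3.0, 5.0], [4.0, 6.0]])
toy_mse = mean_squared_error(toy_targets, toy_predictions)
print(f"toy mse: {toy_mse}")
```

Two of the six entries are off by one, so the toy error is 2/6.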
- Parameters
test_features (ndarray) – A matrix of shape (test_count, feature_count) whose rows consist of observation vectors of the test data.
train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.
train_targets (ndarray) – A matrix of shape (train_count, response_count) whose rows consist of response vectors of the train data.
nn_count (int) – The number of nearest neighbors to employ.
batch_count (int) – The number of batch elements to sample for hyperparameter optimization.
loss_method (str) – The loss method to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed. Currently supports only "mse" for regression.
sigma_method (Optional[str]) – The optimization method to be employed to learn the sigma_sq hyperparameter. Currently supports only "analytic" and None. If the value is not None, the returned MuyGPyS.gp.muygps.MuyGPS object will possess a sigma_sq member whose value, invoked via muygps.sigma_sq(), is a (response_count,) vector to be used for scaling posterior variances.
variance_mode (Optional[str]) – Specifies the type of variance to return. Currently supports "diagonal" and None. If None, report no variance term.
kern (Optional[str]) – The kernel function to be used. See kernels for details. Only used in the multivariate case. If None, assume that we are not using a multivariate model.
k_kwargs (Union[Dict, List[Dict], Tuple[Dict, …]]) – If given a list or tuple of length response_count, assume that the elements are dicts containing kernel initialization keyword arguments for the creation of a multivariate model (see make_multivariate_regressor()). If given a dict, assume that the elements are keyword arguments to a MuyGPs model (see make_regressor()).
nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.
apply_sigma_sq (bool) – If True and variance_mode is not None, automatically scale the posterior variances by sigma_sq.
return_distances (bool) – If True, returns a (test_count, nn_count) matrix containing the crosswise distances between the test elements and their nearest neighbor sets and a (test_count, nn_count, nn_count) tensor containing the pairwise distances between the test elements’ nearest neighbor sets.
verbose (bool) – If True, print summary statistics.
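The scaling controlled by apply_sigma_sq pairs a (test_count, response_count) variance matrix with a (response_count,) sigma_sq vector. A minimal NumPy sketch of that column-wise scaling follows. `scale_variances` is a hypothetical helper, and the sketch assumes the scaling is a plain elementwise product per response column.

```python
import numpy as np


def scale_variances(variances: np.ndarray, sigma_sq: np.ndarray) -> np.ndarray:
    """Scale column j of `variances` by sigma_sq[j] via broadcasting."""
    variances = np.asarray(variances, dtype=float)
    sigma_sq = np.asarray(sigma_sq, dtype=float)
    if variances.ndim != 2 or sigma_sq.shape != (variances.shape[1],):
        raise ValueError(
            "expected shapes (test_count, response_count) and (response_count,)"
        )
    # Broadcasting aligns sigma_sq with the trailing (response) axis.
    return variances * sigma_sq


unscaled = np.array([[0.5, 0.5], [0.25, 0.25], [1.0, 1.0]])  # shape (3, 2)
sigma_sq = np.array([2.0, 10.0])  # one scale per response dimension
scaled = scale_variances(unscaled, sigma_sq)
print(scaled)
```

Each response dimension keeps its own scale, so the two columns of the toy input diverge after scaling even though they start out identical.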
- Return type
Union[Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray], Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray, ndarray], Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray, ndarray, ndarray], Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray, ndarray, ndarray, ndarray]]
- Returns
muygps – A (possibly trained) MuyGPs object.
nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.
predictions – The predicted response associated with each test observation.
variance – Estimated posterior variance of each test prediction. If variance_mode == "diagonal", return a (test_count, response_count) matrix where each row is the posterior variance. If sigma_method is not None and apply_sigma_sq is True, each column of the variance is automatically scaled by the corresponding sigma_sq parameter.
crosswise_dists – A matrix of shape (test_count, nn_count) whose rows list the distance of the corresponding test element to each of its nearest neighbors. Only returned if return_distances is True.
pairwise_dists – A tensor of shape (test_count, nn_count, nn_count) whose latter two dimensions contain square matrices containing the pairwise distances between the nearest neighbors of the test elements. Only returned if return_distances is True.
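Because the length of the returned tuple depends on variance_mode and return_distances, calling code may find it convenient to attach the documented names to whatever comes back. The sketch below does that with a hypothetical helper, `label_do_regress_result`. It assumes the values arrive in the order listed under Returns, with variance present only when variance_mode is not None.

```python
from typing import Any, Dict, Optional, Sequence


def label_do_regress_result(
    result: Sequence[Any],
    variance_mode: Optional[str] = None,
    return_distances: bool = False,
) -> Dict[str, Any]:
    """Attach the documented names to a do_regress-style return tuple."""
    names = ["muygps", "nbrs_lookup", "predictions"]
    if variance_mode is not None:
        names.append("variance")
    if return_distances:
        names += ["crosswise_dists", "pairwise_dists"]
    if len(result) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(result)}")
    return dict(zip(names, result))


# Strings stand in for the real model, lookup structure, and arrays.
labeled = label_do_regress_result(
    ("model", "lookup", "preds", "var", "cross", "pair"),
    variance_mode="diagonal",
    return_distances=True,
)
print(sorted(labeled))
```

The length check catches a mismatch between the flags passed to the helper and the flags used in the original call.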
- MuyGPyS.examples.regress.make_multivariate_regressor(train_features, train_targets, nn_count=30, batch_count=200, loss_method='mse', sigma_method='analytic', kern='matern', k_args=[], nn_kwargs={}, return_distances=False, verbose=False)
Convenience function for creating a Multivariate MuyGPyS functor and neighbor lookup data structure.
Expected parameters include a list of keyword argument dicts specifying kernel parameters and a dict listing nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.
Example
>>> from MuyGPyS.testing.test_utils import _make_gaussian_data
>>> from MuyGPyS.examples.regress import make_multivariate_regressor
>>> train_features, train_responses = make_train()  # stand-in function
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_args = [
...     {
...         "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)},
...         "eps": {"val": 1e-5},
...     },
...     {
...         "length_scale": {"val": 1.5, "bounds": (1e-2, 1e2)},
...         "eps": {"val": 1e-5},
...     },
... ]
>>> mmuygps, nbrs_lookup = make_multivariate_regressor(
...     train_features,
...     train_responses,
...     nn_count=30,
...     batch_count=200,
...     loss_method="mse",
...     sigma_method="analytic",
...     kern="rbf",
...     k_args=k_args,
...     nn_kwargs=nn_kwargs,
...     verbose=False,
... )
>>> # Can alternately return distance tensors for reuse
>>> mmuygps, nbrs_lookup, crosswise_dists, pairwise_dists = make_multivariate_regressor(
...     train_features,
...     train_responses,
...     nn_count=30,
...     batch_count=200,
...     loss_method="mse",
...     sigma_method="analytic",
...     kern="rbf",
...     k_args=k_args,
...     nn_kwargs=nn_kwargs,
...     return_distances=True,
...     verbose=False,
... )
- Parameters
train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.
train_targets (ndarray) – A matrix of shape (train_count, response_count) whose rows consist of response vectors of the train data.
nn_count (int) – The number of nearest neighbors to employ.
batch_count (int) – The number of batch elements to sample for hyperparameter optimization.
loss_method (str) – The loss method to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_args are fixed. Currently supports only "mse" for regression.
sigma_method (Optional[str]) – The optimization method to be employed to learn the sigma_sq hyperparameter. Currently supports only "analytic" and None. If the value is not None, the returned MuyGPyS.gp.muygps.MultivariateMuyGPS object will possess a sigma_sq member whose value, invoked via mmuygps.sigma_sq(), is a (response_count,) vector to be used for scaling posterior variances.
kern (str) – The kernel function to be used. See kernels for details.
k_args (Union[List[Dict], Tuple[Dict, …]]) – A list of response_count dicts containing kernel initialization keyword arguments. Each dict specifies parameters for the kernel, possibly including epsilon and sigma hyperparameter specifications and specifications for specific kernel hyperparameters. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur.
nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.
return_distances (bool) – If True and any training occurs, returns a (batch_count, nn_count) matrix containing the crosswise distances between the batch’s elements and their nearest neighbor sets and a (batch_count, nn_count, nn_count) tensor containing the pairwise distances between the batch’s nearest neighbor sets.
verbose (bool) – If True, print summary statistics.
- Return type
Union[Tuple[MultivariateMuyGPS, NN_Wrapper], Tuple[MultivariateMuyGPS, NN_Wrapper, ndarray, ndarray]]
- Returns
mmuygps – A Multivariate MuyGPs object with a separate (possibly trained) kernel function associated with each response dimension.
nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.
crosswise_dists – A matrix of shape (batch_count, nn_count) whose rows list the distance of the corresponding batch element to each of its nearest neighbors. Only returned if return_distances is True.
pairwise_dists – A tensor of shape (batch_count, nn_count, nn_count) whose latter two dimensions contain square matrices containing the pairwise distances between the nearest neighbors of the batch elements. Only returned if return_distances is True.
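Since k_args is expected to hold one kernel specification per response dimension, a quick check before calling the maker can catch a mismatched list early. The sketch below uses a hypothetical helper, `check_k_args`. Whether the library performs a similar check internally is not stated on this page.

```python
from typing import Any, Dict, List, Sequence


def check_k_args(
    k_args: Sequence[Dict[str, Any]], response_count: int
) -> List[Dict[str, Any]]:
    """Return k_args as a list after checking one dict exists per response."""
    if not isinstance(k_args, (list, tuple)):
        raise TypeError("k_args must be a list or tuple of dicts")
    if len(k_args) != response_count:
        raise ValueError(
            f"expected {response_count} kernel specs, got {len(k_args)}"
        )
    for index, spec in enumerate(k_args):
        if not isinstance(spec, dict):
            raise TypeError(f"k_args[{index}] is not a dict")
    return list(k_args)


# Two response dimensions, so two kernel specifications.
k_args = [
    {"length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)}, "eps": {"val": 1e-5}},
    {"length_scale": {"val": 1.5, "bounds": (1e-2, 1e2)}, "eps": {"val": 1e-5}},
]
checked = check_k_args(k_args, response_count=2)
print(len(checked))
```

In practice response_count would come from train_targets.shape[1].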
- MuyGPyS.examples.regress.make_regressor(train_features, train_targets, nn_count=30, batch_count=200, loss_method='mse', sigma_method='analytic', k_kwargs={}, nn_kwargs={}, return_distances=False, verbose=False)
Convenience function for creating a MuyGPyS functor and neighbor lookup data structure.
Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.
Example
>>> from MuyGPyS.testing.test_utils import _make_gaussian_data
>>> from MuyGPyS.examples.regress import make_regressor
>>> train_features, train_responses = make_train()  # stand-in function
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_kwargs = {
...     "kern": "rbf",
...     "metric": "F2",
...     "eps": {"val": 1e-5},
...     "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)}
... }
>>> muygps, nbrs_lookup = make_regressor(
...     train_features,
...     train_responses,
...     nn_count=30,
...     batch_count=200,
...     loss_method="mse",
...     sigma_method="analytic",
...     k_kwargs=k_kwargs,
...     nn_kwargs=nn_kwargs,
...     verbose=False,
... )
>>> # Can alternately return distance tensors for reuse
>>> muygps, nbrs_lookup, crosswise_dists, pairwise_dists = make_regressor(
...     train_features,
...     train_responses,
...     nn_count=30,
...     batch_count=200,
...     loss_method="mse",
...     sigma_method="analytic",
...     k_kwargs=k_kwargs,
...     nn_kwargs=nn_kwargs,
...     return_distances=True,
...     verbose=False,
... )
- Parameters
train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.
train_targets (ndarray) – A matrix of shape (train_count, response_count) whose rows consist of response vectors of the train data.
nn_count (int) – The number of nearest neighbors to employ.
batch_count (int) – The number of batch elements to sample for hyperparameter optimization.
loss_method (str) – The loss method to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed. Currently supports only "mse" for regression.
sigma_method (Optional[str]) – The optimization method to be employed to learn the sigma_sq hyperparameter. Currently supports only "analytic" and None. If the value is not None, the returned MuyGPyS.gp.muygps.MuyGPS object will possess a sigma_sq member whose value, invoked via muygps.sigma_sq(), is a (response_count,) vector to be used for scaling posterior variances.
k_kwargs (Dict) – Parameters for the kernel, possibly including kernel type, distance metric, epsilon and sigma hyperparameter specifications, and specifications for kernel hyperparameters. See kernels for examples and requirements. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur.
nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.
return_distances (bool) – If True and any training occurs, returns a (batch_count, nn_count) matrix containing the crosswise distances between the batch’s elements and their nearest neighbor sets and a (batch_count, nn_count, nn_count) tensor containing the pairwise distances between the batch’s nearest neighbor sets.
verbose (bool) – If True, print summary statistics.
- Return type
Union[Tuple[MuyGPS, NN_Wrapper], Tuple[MuyGPS, NN_Wrapper, ndarray, ndarray]]
- Returns
muygps – A (possibly trained) MuyGPs object.
nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.
crosswise_dists – A matrix of shape (batch_count, nn_count) whose rows list the distance of the corresponding batch element to each of its nearest neighbors. Only returned if return_distances is True.
pairwise_dists – A tensor of shape (batch_count, nn_count, nn_count) whose latter two dimensions contain square matrices containing the pairwise distances between the nearest neighbors of the batch elements. Only returned if return_distances is True.
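The k_kwargs description notes that no optimization occurs when every hyperparameter is fixed or lacks bounds. A small sketch for inspecting a kernel dict ahead of time is given below. `free_hyperparameters` is a hypothetical helper, and it assumes that a hyperparameter counts as free only when its spec carries a two-element "bounds" pair, as in the example above. Other bound encodings the library may accept are not handled.

```python
from typing import Any, Dict, List


def free_hyperparameters(k_kwargs: Dict[str, Any]) -> List[str]:
    """List the hyperparameters whose specs carry a (lower, upper) bounds pair."""
    free = []
    for name, spec in k_kwargs.items():
        # Plain entries such as "kern" or "metric" are strings, not specs.
        bounds = spec.get("bounds") if isinstance(spec, dict) else None
        if isinstance(bounds, (tuple, list)) and len(bounds) == 2:
            free.append(name)
    return free


k_kwargs = {
    "kern": "rbf",
    "metric": "F2",
    "eps": {"val": 1e-5},
    "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)},
}
to_optimize = free_hyperparameters(k_kwargs)
fixed_only = free_hyperparameters({"kern": "rbf", "length_scale": {"val": 1.0}})
print(to_optimize)
print(fixed_only)
```

An empty list suggests that loss_method and batch_count would have no effect on the resulting model.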
- MuyGPyS.examples.regress.regress_any(regressor, test_features, train_features, train_nbrs_lookup, train_targets, variance_mode=None, apply_sigma_sq=True, return_distances=False)
Simultaneously predicts the response for each test item.
- Parameters
regressor (Union[MuyGPS, MultivariateMuyGPS]) – Regressor object.
test_features (ndarray) – Test observations of shape (test_count, feature_count).
train_features (ndarray) – Train observations of shape (train_count, feature_count).
train_nbrs_lookup (NN_Wrapper) – Trained nearest neighbor query data structure.
train_targets (ndarray) – Observed response for all training data of shape (train_count, response_count).
variance_mode (Optional[str]) – Specifies the type of variance to return. Currently supports "diagonal" and None. If None, report no variance term.
apply_sigma_sq (bool) – If True and variance_mode is not None, automatically scale the posterior variances by sigma_sq.
return_distances (bool) – If True, returns a (test_count, nn_count) matrix containing the crosswise distances between the test elements and their nearest neighbor sets and a (test_count, nn_count, nn_count) tensor containing the pairwise distances between the test data’s nearest neighbor sets.
- Return type
Union[Tuple[ndarray, Dict[str, float]], Tuple[Tuple[ndarray, ndarray], Dict[str, float]], Tuple[Tuple[ndarray, ndarray, ndarray], Dict[str, float]], Tuple[Tuple[ndarray, ndarray, ndarray, ndarray], Dict[str, float]]]
- Returns
means – The predicted response of shape (test_count, response_count) for each of the test examples.
variances – The independent posterior variances for each of the test examples. Of shape (test_count,) if the argument regressor is an instance of MuyGPyS.gp.muygps.MuyGPS, and of shape (test_count, response_count) if regressor is an instance of MuyGPyS.gp.muygps.MultivariateMuyGPS. Returned only when variance_mode == "diagonal".
crosswise_dists – A matrix of shape (test_count, nn_count) whose rows list the distance of the corresponding test element to each of its nearest neighbors. Only returned if return_distances is True.
pairwise_dists – A tensor of shape (test_count, nn_count, nn_count) whose latter two dimensions contain square matrices containing the pairwise distances between the nearest neighbors of the test elements. Only returned if return_distances is True.
timing (dict) – Timing for the subroutines of this function.
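To make the shapes of crosswise_dists and pairwise_dists concrete, the sketch below builds both with brute-force NumPy. It is an illustration only: `neighbor_distances` is a hypothetical function, the Euclidean metric is an assumption (the library's configured metric may differ), and an exhaustive search stands in for the NN_Wrapper lookup.

```python
import numpy as np


def neighbor_distances(test: np.ndarray, train: np.ndarray, nn_count: int):
    """Brute-force nearest neighbors plus crosswise and pairwise distances."""
    test = np.asarray(test, dtype=float)
    train = np.asarray(train, dtype=float)
    # (test_count, train_count) distances from each test row to each train row.
    all_dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    # Indices of the nn_count closest training rows, nearest first.
    nn_indices = np.argsort(all_dists, axis=1)[:, :nn_count]
    # (test_count, nn_count): distance from each test row to its neighbors.
    crosswise = np.take_along_axis(all_dists, nn_indices, axis=1)
    # Neighbor coordinates, shape (test_count, nn_count, feature_count).
    nbrs = train[nn_indices]
    # (test_count, nn_count, nn_count): distances among each neighbor set.
    pairwise = np.linalg.norm(nbrs[:, :, None, :] - nbrs[:, None, :, :], axis=-1)
    return nn_indices, crosswise, pairwise


rng = np.random.default_rng(0)
train_features = rng.normal(size=(50, 3))
test_features = rng.normal(size=(4, 3))
nn_indices, crosswise_dists, pairwise_dists = neighbor_distances(
    test_features, train_features, nn_count=5
)
print(crosswise_dists.shape)  # (4, 5)
print(pairwise_dists.shape)  # (4, 5, 5)
```

Each slice pairwise_dists[i] is a symmetric matrix with a zero diagonal, one per test element.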