classify
Resources and high-level API for a simple classification workflow.
make_classifier() is a high-level API for creating and training MuyGPyS.gp.muygps.MuyGPS objects for classification. make_multivariate_classifier() is a high-level API for creating and training MuyGPyS.gp.muygps.MultivariateMuyGPS objects for classification.
do_classify() is a high-level API for executing a simple, generic classification workflow given data. It calls the maker APIs above and classify_any().
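The workflow functions above expect training labels as one-hot matrices of shape (train_count, class_count). The plain-Python sketch below shows that encoding for integer class labels, assuming the common 0/1 convention. `one_hot` is a hypothetical helper, not part of MuyGPyS, and the library's own test utilities may use a different label encoding.

```python
from typing import List, Sequence


def one_hot(labels: Sequence[int], class_count: int) -> List[List[float]]:
    """Hypothetical helper: encode integer labels as a (len(labels), class_count) matrix.

    Assumes the 0/1 one-hot convention; MuyGPyS itself may encode labels differently.
    """
    encoded: List[List[float]] = []
    for label in labels:
        if not 0 <= label < class_count:
            raise ValueError(f"label {label} is outside [0, {class_count})")
        row = [0.0] * class_count
        row[label] = 1.0  # the single "hot" entry marks the class
        encoded.append(row)
    return encoded


train_labels = one_hot([2, 0, 1, 2], class_count=3)
print(train_labels)
# [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

With NumPy available, the same matrix is usually built by indexing an identity matrix with the label array.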
- MuyGPyS.examples.classify.classify_any(surrogate, test_features, train_features, train_nbrs_lookup, train_labels)
Simultaneously predicts the surrogate regression means for each test item.
- Parameters
  - surrogate (Union[MuyGPS, MultivariateMuyGPS]) – Surrogate regressor.
  - test_features (ndarray) – Test observations of shape (test_count, feature_count).
  - train_features (ndarray) – Train observations of shape (train_count, feature_count).
  - train_nbrs_lookup (NN_Wrapper) – Trained nearest neighbor query data structure.
  - train_labels (ndarray) – One-hot encoding of class labels for all training data, of shape (train_count, class_count).
- Return type
  Tuple[ndarray, Dict[str, float]]
- Returns
  - predictions – The surrogate predictions of shape (test_count, class_count) for each test observation.
  - timing – Timing for the subroutines of this function.
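To illustrate the shapes that classify_any() consumes and produces, here is a toy stand-in written in plain Python. It is not the MuyGPyS implementation: `toy_classify_any` is hypothetical, it replaces train_nbrs_lookup with a brute-force squared-Euclidean neighbor search, it replaces the Gaussian process posterior mean with an unweighted average of the neighbors' one-hot labels, and the timing keys "nn" and "pred" are invented for this example.

```python
import time
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_classify_any(
    test_features: Matrix,
    train_features: Matrix,
    train_labels: Matrix,
    nn_count: int,
) -> Tuple[Matrix, Dict[str, float]]:
    """Hypothetical kNN stand-in returning (predictions, timing) like classify_any."""
    timing: Dict[str, float] = {}

    # Brute-force neighbor query (stands in for train_nbrs_lookup).
    start = time.perf_counter()
    nbr_indices = [
        sorted(
            range(len(train_features)),
            key=lambda j: squared_distance(x, train_features[j]),
        )[:nn_count]
        for x in test_features
    ]
    timing["nn"] = time.perf_counter() - start

    # Average the neighbors' one-hot rows (stands in for the GP posterior mean).
    start = time.perf_counter()
    class_count = len(train_labels[0])
    predictions = [
        [sum(train_labels[j][c] for j in nbrs) / len(nbrs) for c in range(class_count)]
        for nbrs in nbr_indices
    ]
    timing["pred"] = time.perf_counter() - start
    return predictions, timing


train_features = [[0.0], [0.1], [1.0], [1.1]]
train_labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
test_features = [[0.05], [1.05]]
predictions, timing = toy_classify_any(test_features, train_features, train_labels, nn_count=2)
print(predictions)  # [[1.0, 0.0], [0.0, 1.0]]
```

The output has one row per test item and one column per class, matching the (test_count, class_count) shape documented for predictions.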
- MuyGPyS.examples.classify.do_classify(test_features, train_features, train_labels, nn_count=30, batch_count=200, loss_method='log', kern=None, k_kwargs={}, nn_kwargs={}, return_distances=False, verbose=False)
Convenience function for initializing a model and performing surrogate classification.
Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.
Example
>>> import numpy as np
>>> from MuyGPyS.testing.test_utils import _make_gaussian_data
>>> from MuyGPyS.examples.classify import do_classify
>>> train, test = _make_gaussian_data(10000, 100, 100, 10, categorical=True)
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_kwargs = {
...     "kern": "rbf",
...     "metric": "F2",
...     "eps": {"val": 1e-5},
...     "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)},
... }
>>> muygps, nbrs_lookup, surrogate_predictions = do_classify(
...     test['input'],
...     train['input'],
...     train['output'],
...     nn_count=30,
...     batch_count=200,
...     loss_method="log",
...     k_kwargs=k_kwargs,
...     nn_kwargs=nn_kwargs,
...     verbose=False,
... )
>>> # Can alternately return distance tensors for reuse
>>> (
...     muygps,
...     nbrs_lookup,
...     surrogate_predictions,
...     crosswise_dists,
...     pairwise_dists,
... ) = do_classify(
...     test['input'],
...     train['input'],
...     train['output'],
...     nn_count=30,
...     batch_count=200,
...     loss_method="log",
...     k_kwargs=k_kwargs,
...     nn_kwargs=nn_kwargs,
...     return_distances=True,
...     verbose=False,
... )
>>> predicted_labels = np.argmax(surrogate_predictions, axis=1)
>>> true_labels = np.argmax(test['output'], axis=1)
>>> acc = np.mean(predicted_labels == true_labels)
>>> print(f"obtained accuracy: {acc}")
obtained accuracy: 0.973...
- Parameters
  - test_features (ndarray) – A matrix of shape (test_count, feature_count) whose rows consist of observation vectors of the test data.
  - train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.
  - train_labels (ndarray) – A matrix of shape (train_count, response_count) whose rows consist of label vectors for the training data.
  - nn_count (int) – The number of nearest neighbors to employ.
  - batch_count (int) – The batch size for hyperparameter optimization.
  - loss_method (str) – The loss method to use in hyperparameter optimization. Ignored if all of the parameters specified by k_kwargs are fixed. Currently supports only "log" (also known as "cross_entropy") and "mse" for classification.
  - kern (Optional[str]) – The kernel function to be used. Only relevant for the multivariate case, where k_kwargs is a list of hyperparameter dicts. Currently supports only "matern" and "rbf".
  - k_kwargs (Union[Dict, List[Dict], Tuple[Dict, …]]) – Parameters for the kernel, possibly including kernel type, distance metric, epsilon and sigma hyperparameter specifications, and specifications for kernel hyperparameters. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur. If kern is specified and k_kwargs is a list of such dicts, the function creates a multivariate (MultivariateMuyGPS) model.
  - nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.
  - return_distances (bool) – If True and any training occurs, also returns a (batch_count, nn_count) matrix containing the crosswise distances between the batch’s elements and their nearest neighbor sets and a (batch_count, nn_count, nn_count) tensor containing the pairwise distances between the batch’s nearest neighbor sets.
  - verbose (bool) – If True, print summary statistics.
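- Return type
  Union[Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray], Tuple[Union[MuyGPS, MultivariateMuyGPS], NN_Wrapper, ndarray, ndarray, ndarray]]
- Returns
  - muygps – A (possibly trained) MuyGPS or MultivariateMuyGPS object.
  - nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.
  - surrogate_predictions – A matrix of shape (test_count, response_count) whose rows indicate the surrogate predictions of the model. The predicted class of each test observation is the index of the largest element in its row.
  - crosswise_dists – A matrix of shape (batch_count, nn_count) whose rows list the distance of the corresponding batch element to each of its nearest neighbors. Only returned if return_distances is True.
  - pairwise_dists – A tensor of shape (batch_count, nn_count, nn_count) whose last two dimensions hold square matrices of the pairwise distances between the nearest neighbors of each batch element. Only returned if return_distances is True.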
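Decoding surrogate predictions into class labels is a row-wise argmax, as in the np.argmax calls of the example above. The dependency-free sketch below reproduces that step and the accuracy computation on made-up numbers; `argmax`, `predicted_classes`, and `accuracy` are hypothetical helpers, not MuyGPyS functions.

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    # Index of the largest entry; ties go to the first maximum, matching np.argmax.
    return max(range(len(row)), key=lambda i: row[i])


def predicted_classes(surrogate_predictions: Sequence[Sequence[float]]) -> List[int]:
    return [argmax(row) for row in surrogate_predictions]


def accuracy(
    surrogate_predictions: Sequence[Sequence[float]],
    one_hot_labels: Sequence[Sequence[float]],
) -> float:
    predicted = predicted_classes(surrogate_predictions)
    true = predicted_classes(one_hot_labels)
    return sum(p == t for p, t in zip(predicted, true)) / len(true)


# Made-up surrogate outputs for four test points and three classes.
surrogate_predictions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.2, 0.6],
]
test_labels = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
acc = accuracy(surrogate_predictions, test_labels)
print(f"obtained accuracy: {acc}")  # obtained accuracy: 0.75
```

Only the third row is misclassified (class 1 predicted, class 0 true), giving 3 of 4 correct.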
- MuyGPyS.examples.classify.make_classifier(train_features, train_labels, nn_count=30, batch_count=200, loss_method='log', k_kwargs={}, nn_kwargs={}, return_distances=False, verbose=False)
Convenience function for creating a MuyGPS functor and a neighbor lookup data structure.
Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.
Example
>>> from MuyGPyS.testing.test_utils import _make_gaussian_dict
>>> from MuyGPyS.examples.classify import make_classifier
>>> train = _make_gaussian_dict(10000, 100, 10, categorical=True)
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_kwargs = {
...     "kern": "rbf",
...     "metric": "F2",
...     "eps": {"val": 1e-5},
...     "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)},
... }
>>> muygps, nbrs_lookup = make_classifier(
...     train['input'],
...     train['output'],
...     nn_count=30,
...     batch_count=200,
...     loss_method="log",
...     k_kwargs=k_kwargs,
...     nn_kwargs=nn_kwargs,
...     verbose=False,
... )
>>> # Can alternately return distance tensors for reuse
>>> muygps, nbrs_lookup, crosswise_dists, pairwise_dists = make_classifier(
...     train['input'],
...     train['output'],
...     nn_count=30,
...     batch_count=200,
...     loss_method="log",
...     k_kwargs=k_kwargs,
...     nn_kwargs=nn_kwargs,
...     return_distances=True,
...     verbose=False,
... )
- Parameters
  - train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.
  - train_labels (ndarray) – A matrix of shape (train_count, class_count) whose rows consist of one-hot class label vectors of the train data.
  - nn_count (int) – The number of nearest neighbors to employ.
  - batch_count (int) – The number of elements to sample as a batch for hyperparameter optimization.
  - loss_method (str) – The loss method to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_kwargs are fixed. Currently supports only "log" (or "cross-entropy") and "mse" for classification.
  - k_kwargs (Dict) – Parameters for the kernel, possibly including kernel type, distance metric, epsilon and sigma hyperparameter specifications, and specifications for kernel hyperparameters. See kernels for examples and requirements. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur.
  - nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.
  - return_distances (bool) – If True and any training occurs, also returns a (batch_count, nn_count) matrix containing the crosswise distances between the batch’s elements and their nearest neighbor sets and a (batch_count, nn_count, nn_count) tensor containing the pairwise distances between the batch’s nearest neighbor sets.
  - verbose (bool) – If True, print summary statistics.
- Return type
  Union[Tuple[MuyGPS, NN_Wrapper], Tuple[MuyGPS, NN_Wrapper, ndarray, ndarray]]
- Returns
  - muygps – A (possibly trained) MuyGPS object.
  - nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.
  - crosswise_dists – A matrix of shape (batch_count, nn_count) whose rows list the distance of the corresponding batch element to each of its nearest neighbors. Only returned if return_distances is True.
  - pairwise_dists – A tensor of shape (batch_count, nn_count, nn_count) whose last two dimensions hold square matrices of the pairwise distances between the nearest neighbors of each batch element. Only returned if return_distances is True.
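The crosswise and pairwise distance tensors described above can be sketched in plain Python to make their shapes concrete. This sketch assumes squared Euclidean distance (what the "F2" metric in the examples is assumed to denote), a uniformly sampled batch of distinct training indices, and neighbor sets that exclude the batch element itself. `batch_distances` is a hypothetical helper, not the MuyGPyS batch sampler.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance (assumed here to correspond to the "F2" metric).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batch_distances(
    train_features: Matrix, batch_count: int, nn_count: int, seed: int = 0
) -> Tuple[List[int], Matrix, List[Matrix]]:
    """Hypothetical helper returning batch indices, crosswise and pairwise distances."""
    rng = random.Random(seed)
    train_count = len(train_features)
    # Sample distinct batch elements uniformly, never more than train_count.
    batch_indices = rng.sample(range(train_count), min(batch_count, train_count))
    crosswise: Matrix = []
    pairwise: List[Matrix] = []
    for i in batch_indices:
        x = train_features[i]
        # Brute-force neighbor search that excludes the batch element itself.
        others = [j for j in range(train_count) if j != i]
        nbrs = sorted(others, key=lambda j: sq_dist(x, train_features[j]))[:nn_count]
        # (nn_count,) row: distance from the batch element to each neighbor.
        crosswise.append([sq_dist(x, train_features[j]) for j in nbrs])
        # (nn_count, nn_count) block: distances among the neighbors themselves.
        pairwise.append(
            [[sq_dist(train_features[a], train_features[b]) for b in nbrs] for a in nbrs]
        )
    return batch_indices, crosswise, pairwise


train_features = [[float(k)] for k in range(10)]
batch_indices, crosswise, pairwise = batch_distances(train_features, batch_count=4, nn_count=3)
print(len(crosswise), len(crosswise[0]))  # 4 3
print(len(pairwise), len(pairwise[0]), len(pairwise[0][0]))  # 4 3 3
```

Each pairwise block is symmetric with a zero diagonal, which is why the documentation describes its last two dimensions as square matrices.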
- MuyGPyS.examples.classify.make_multivariate_classifier(train_features, train_labels, nn_count=30, batch_count=200, loss_method='mse', kern='matern', k_args=[], nn_kwargs={}, return_distances=False, verbose=False)
Convenience function for creating a MultivariateMuyGPS functor and a neighbor lookup data structure.
Expected parameters include keyword argument dicts specifying kernel parameters and nearest neighbor parameters. See the docstrings of the appropriate functions for specifics.
Example
>>> from MuyGPyS.testing.test_utils import _make_gaussian_dict
>>> from MuyGPyS.examples.classify import make_multivariate_classifier
>>> train = _make_gaussian_dict(10000, 100, 2, categorical=True)
>>> nn_kwargs = {"nn_method": "exact", "algorithm": "ball_tree"}
>>> k_args = [
...     {
...         "length_scale": {"val": 1.0, "bounds": (1e-2, 1e2)},
...         "eps": {"val": 1e-5},
...     },
...     {
...         "length_scale": {"val": 1.5, "bounds": (1e-2, 1e2)},
...         "eps": {"val": 1e-5},
...     },
... ]
>>> mmuygps, nbrs_lookup = make_multivariate_classifier(
...     train['input'],
...     train['output'],
...     nn_count=30,
...     batch_count=200,
...     loss_method="mse",
...     kern="rbf",
...     k_args=k_args,
...     nn_kwargs=nn_kwargs,
...     verbose=False,
... )
>>> # Can alternately return distance tensors for reuse
>>> mmuygps, nbrs_lookup, crosswise_dists, pairwise_dists = make_multivariate_classifier(
...     train['input'],
...     train['output'],
...     nn_count=30,
...     batch_count=200,
...     loss_method="mse",
...     kern="rbf",
...     k_args=k_args,
...     nn_kwargs=nn_kwargs,
...     return_distances=True,
...     verbose=False,
... )
- Parameters
  - train_features (ndarray) – A matrix of shape (train_count, feature_count) whose rows consist of observation vectors of the train data.
  - train_labels (ndarray) – A matrix of shape (train_count, class_count) whose rows consist of one-hot encoded label vectors of the train data.
  - nn_count (int) – The number of nearest neighbors to employ.
  - batch_count (int) – The number of elements to sample as a batch for hyperparameter optimization.
  - loss_method (str) – The loss method to use in hyperparameter optimization. Ignored if all of the parameters specified by argument k_args are fixed. Currently supports only "mse".
  - kern (str) – The kernel function to be used. See kernels for details.
  - k_args (Union[List[Dict], Tuple[Dict, …]]) – A list of response_count dicts containing kernel initialization keyword arguments. Each dict specifies parameters for the kernel, possibly including epsilon and sigma hyperparameter specifications and specifications for specific kernel hyperparameters. If all of the hyperparameters are fixed or are not given optimization bounds, no optimization will occur.
  - nn_kwargs (Dict) – Parameters for the nearest neighbors wrapper. See MuyGPyS.neighbors.NN_Wrapper for the supported methods and their parameters.
  - return_distances (bool) – If True and any training occurs, also returns a (batch_count, nn_count) matrix containing the crosswise distances between the batch’s elements and their nearest neighbor sets and a (batch_count, nn_count, nn_count) tensor containing the pairwise distances between the batch’s nearest neighbor sets.
  - verbose (bool) – If True, print summary statistics.
- Return type
  Union[Tuple[MultivariateMuyGPS, NN_Wrapper], Tuple[MultivariateMuyGPS, NN_Wrapper, ndarray, ndarray]]
- Returns
  - muygps – A (possibly trained) MultivariateMuyGPS object.
  - nbrs_lookup – A data structure supporting nearest neighbor queries into train_features.
  - crosswise_dists – A matrix of shape (batch_count, nn_count) whose rows list the distance of the corresponding batch element to each of its nearest neighbors. Only returned if return_distances is True.
  - pairwise_dists – A tensor of shape (batch_count, nn_count, nn_count) whose last two dimensions hold square matrices of the pairwise distances between the nearest neighbors of each batch element. Only returned if return_distances is True.
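Because k_args must hold exactly one dict per response (class) column, it can help to build and check it programmatically. The sketch below mirrors the dict layout used in the example above; `make_k_args` and `check_k_args` are hypothetical helpers, not MuyGPyS functions, and they cover only the length_scale and eps entries shown there.

```python
from typing import Dict, List, Sequence, Tuple


def make_k_args(
    length_scales: Sequence[float],
    eps: float = 1e-5,
    bounds: Tuple[float, float] = (1e-2, 1e2),
) -> List[Dict]:
    """Hypothetical helper: one kernel-argument dict per response (class) column.

    Mirrors the dict layout of the k_args example; other hyperparameters are omitted.
    """
    return [
        {"length_scale": {"val": ls, "bounds": bounds}, "eps": {"val": eps}}
        for ls in length_scales
    ]


def check_k_args(k_args: Sequence[Dict], train_labels: Sequence[Sequence[float]]) -> None:
    # k_args needs exactly response_count entries, one per label column.
    response_count = len(train_labels[0])
    if len(k_args) != response_count:
        raise ValueError(f"expected {response_count} k_args dicts, got {len(k_args)}")


train_labels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
k_args = make_k_args([1.0, 1.5])
check_k_args(k_args, train_labels)  # passes: two dicts for two classes
```

Checking the count before calling make_multivariate_classifier() surfaces a mismatch between k_args and the label matrix early, instead of deep inside model construction.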