Stree

class stree.Stree(C: float = 1.0, kernel: str = 'linear', max_iter: int = 100000.0, random_state: Optional[int] = None, max_depth: Optional[int] = None, tol: float = 0.0001, degree: int = 3, gamma='scale', split_criteria: str = 'impurity', criterion: str = 'entropy', min_samples_split: int = 0, max_features=None, splitter: str = 'random', multiclass_strategy: str = 'ovo', normalize: bool = False)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

Estimator based on binary trees of SVM nodes. It can deal with sample_weights in predict, as used by sklearn boosting methods. Inheriting from BaseEstimator provides the get_params and set_params methods; inheriting from ClassifierMixin provides the attribute _estimator_type with “classifier” as its value.

C : float, optional

Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. By default 1.0.

kernel : str, optional

Specifies the kernel type to be used in the algorithm. It must be one of ‘liblinear’, ‘linear’, ‘poly’ or ‘rbf’. ‘liblinear’ uses the [liblinear](https://www.csie.ntu.edu.tw/~cjlin/liblinear/) library and the rest use the [libsvm](https://www.csie.ntu.edu.tw/~cjlin/libsvm/) library through scikit-learn. By default “linear”.

max_iter : int, optional

Hard limit on iterations within the solver, or -1 for no limit. By default 1e5.

random_state : int, optional

Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. By default None.

max_depth : int, optional

Specifies the maximum depth of the tree. By default None.

tol : float, optional

Tolerance for stopping. By default 1e-4.

degree : int, optional

Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels. By default 3.

gamma : str, optional

Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. If gamma=‘scale’ (default) is passed, it uses 1 / (n_features * X.var()) as the value of gamma; if ‘auto’, it uses 1 / n_features. By default “scale”.
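The two string values of gamma can be worked out by hand. The following is a minimal sketch with NumPy, not the library's own code; `resolve_gamma` is a hypothetical helper name introduced only for illustration, and the pass-through of numeric values is an assumption.

```python
import numpy as np


def resolve_gamma(gamma, X):
    """Hypothetical helper: turn 'scale' or 'auto' into a numeric gamma."""
    n_features = X.shape[1]
    if gamma == "scale":
        # 1 / (n_features * X.var()), the variance taken over all entries of X
        return 1.0 / (n_features * X.var())
    if gamma == "auto":
        # 1 / n_features
        return 1.0 / n_features
    # Assumption: a numeric gamma is used unchanged
    return float(gamma)


# Four samples, two features; the entries are 0 or 4, so X.var() == 4
X = np.array([[0.0, 4.0], [4.0, 0.0], [0.0, 4.0], [4.0, 0.0]])
gamma_scale = resolve_gamma("scale", X)  # 1 / (2 * 4)
gamma_auto = resolve_gamma("auto", X)    # 1 / 2
```

With ‘scale’, gamma shrinks as the overall variance of the data grows, whereas ‘auto’ depends only on the number of features.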

split_criteria : str, optional

Decides (only for multiclass classification) which column (class) to use to split the dataset in a node. ‘max_samples’ is incompatible with the ‘ovo’ multiclass_strategy. By default “impurity”.

criterion : str, optional

The function to measure the quality of a split (only used if max_features != num_features). Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. By default “entropy”.
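As an illustration of the two criteria, the sketch below computes the Gini impurity, the entropy and the information gain of a binary split with NumPy. The function names are hypothetical and the base-2 logarithm is an assumption; the library may compute these quantities differently.

```python
import numpy as np


def gini(y):
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def entropy(y):
    """Shannon entropy in bits (base-2 logarithm is an assumption)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def information_gain(y, y_left, y_right, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(y)
    children = (len(y_left) / n) * impurity(y_left) \
        + (len(y_right) / n) * impurity(y_right)
    return impurity(y) - children


labels = np.array([0, 0, 1, 1])
# A split that separates the two classes perfectly recovers all the impurity
gain_perfect = information_gain(labels, labels[:2], labels[2:])
```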

min_samples_split : int, optional

The minimum number of samples required to split an internal node; 0 means no minimum. By default 0.

max_features : optional

The number of features to consider when looking for the split: If int, then consider max_features features at each split. If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split. If “auto”, then max_features=sqrt(n_features). If “sqrt”, then max_features=sqrt(n_features). If “log2”, then max_features=log2(n_features). If None, then max_features=n_features. By default None.
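The rules above can be summarized in a few lines. This is a sketch under stated assumptions rather than the library's implementation: `resolve_max_features` is a hypothetical name, and truncating the square root and the logarithm with int() is an assumption, since the text only specifies int() for the float case.

```python
import math


def resolve_max_features(max_features, n_features):
    """Hypothetical helper mapping max_features to a number of features."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            # Assumption: the square root is truncated to an integer
            return int(math.sqrt(n_features))
        if max_features == "log2":
            # Assumption: the logarithm is truncated to an integer
            return int(math.log2(n_features))
        raise ValueError(f"unknown max_features value: {max_features!r}")
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        # Fraction of the features, as described in the text
        return int(max_features * n_features)
    raise ValueError(f"unsupported max_features type: {type(max_features)}")


resolved = [resolve_max_features(v, 16) for v in (None, "sqrt", "log2", 3, 0.5)]
```

For a dataset with 16 features, the values None, “sqrt”, “log2”, 3 and 0.5 resolve to 16, 4, 4, 3 and 8 features respectively.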

splitter : str, optional

The strategy used to choose the feature set at each node (only used if max_features < num_features). Supported strategies are: “best”: the sklearn SelectKBest algorithm is used in every node to choose the max_features best features. “random”: the algorithm generates 5 candidates and chooses the best (max. info. gain) of them. “trandom”: the algorithm generates only one random combination. “mutual”: chooses the best features w.r.t. their mutual info with the label. “cfs”: applies Correlation-based Feature Selection. “fcbf”: applies Fast Correlation-Based Filter. By default “random”.

multiclass_strategy : str, optional

Strategy to use with multiclass datasets: “ovo” (one versus one) or “ovr” (one versus rest). By default “ovo”.
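To make the difference between the two strategies concrete, the sketch below enumerates the binary problems each one produces for a set of classes. It illustrates the general one-versus-one and one-versus-rest schemes only; `binary_problems` is a hypothetical helper and says nothing about how the library builds its nodes.

```python
from itertools import combinations


def binary_problems(classes, strategy="ovo"):
    """Hypothetical helper listing the binary problems of each strategy."""
    if strategy == "ovo":
        # One versus one: every unordered pair of classes
        return list(combinations(classes, 2))
    if strategy == "ovr":
        # One versus rest: each class against all the others
        return [(c, "rest") for c in classes]
    raise ValueError(f"unknown strategy: {strategy!r}")


classes = [0, 1, 2, 3]
ovo = binary_problems(classes, "ovo")  # n * (n - 1) / 2 problems
ovr = binary_problems(classes, "ovr")  # n problems
```

With four classes, one versus one yields six pairwise problems and one versus rest yields four.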

normalize : bool, optional

Whether standardization of features should be applied at each node, using the samples that reach it. By default False.
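Standardizing at each node means the mean and standard deviation are computed from the samples that reach that node, not from the whole training set. A minimal NumPy sketch follows; `standardize_node` is a hypothetical name, and leaving constant columns at zero is an assumption about how a zero standard deviation would be handled.

```python
import numpy as np


def standardize_node(X_node):
    """Hypothetical helper: zero mean and unit variance per feature."""
    mean = X_node.mean(axis=0)
    std = X_node.std(axis=0)
    # Assumption: avoid dividing by zero on constant columns
    std = np.where(std == 0, 1.0, std)
    return (X_node - mean) / std


# Samples reaching one node: the second feature is constant there
X_node = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
Z = standardize_node(X_node)
```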

classes_ : ndarray of shape (n_classes,)

The class labels.

n_classes_ : int

The number of classes.

n_iter_ : int

Max number of iterations in classifier.

depth_ : int

Max depth of the tree.

n_features_ : int

The number of features when fit is performed.

n_features_in_ : int

Number of features seen during fit.

max_features_ : int

Number of features to use in hyperplane computation.

tree_ : Node

Root of the tree.

X_ : ndarray

Points to the input dataset.

y_ : ndarray

Points to the input labels.

R. Montañana, J. A. Gámez, J. M. Puerta, “STree: a single multi-class oblique decision tree based on support vector machines.”, 2021 LNAI 12882

_build_clf()

Build the right classifier for the node

_initialize_max_features() -> int
_more_tags() -> dict

Required by sklearn to supply features of the classifier; makes the labels array mandatory.

Returns

the tag required

Return type

dict

static _reorder_results(y: numpy.array, indices: numpy.array) -> numpy.array

Reorder an array based on the array of indices passed

y : np.array

unordered data

indices : np.array

indices used to set the order

np.array

array y reordered
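One plausible reading of this reordering is sketched below, assuming that indices is a permutation and that element y[i] belongs at position indices[i] of the result. That interpretation is an assumption; the description above does not state which direction the mapping takes.

```python
import numpy as np


def reorder_results(y, indices):
    """Sketch: place y[i] at position indices[i] (assumed semantics)."""
    y_ordered = np.empty_like(y)
    # Fancy-index assignment scatters every value to its target position
    y_ordered[indices.astype(int)] = y
    return y_ordered


y = np.array([30, 10, 20])
indices = np.array([2, 0, 1])
restored = reorder_results(y, indices)  # 30 goes to slot 2, 10 to 0, 20 to 1
```

This kind of scatter is useful when predictions are collected leaf by leaf and must be returned in the order of the input samples.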

_train(X: numpy.ndarray, y: numpy.ndarray, sample_weight: numpy.ndarray, depth: int, title: str) -> Optional[stree.Splitter.Snode]

Recursive function to split the original dataset into predictor nodes (leaves)

X : np.ndarray

samples dataset

y : np.ndarray

samples labels

sample_weight : np.ndarray

weight of samples. Rescale C per sample.

depth : int

current depth in the tree

title : str

description of the node

Optional[Snode]

binary tree

fit(X: numpy.ndarray, y: numpy.ndarray, sample_weight: Optional[numpy.array] = None) -> stree.Strees.Stree

Build the tree based on the dataset of samples and its labels

Stree

the estimator itself, so that actions can be chained: fit().predict() …

ValueError

if C < 0

ValueError

if max_depth < 1

ValueError

if all samples have 0 or negative weights
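The three error conditions listed above can be expressed as simple checks. The sketch below is hypothetical: `check_fit_arguments` is not part of the library, and the error messages are invented for illustration; only the conditions themselves come from the list above.

```python
import numpy as np


def check_fit_arguments(C, max_depth, sample_weight):
    """Hypothetical validation mirroring the documented ValueError cases."""
    if C < 0:
        raise ValueError("C must be non-negative")
    if max_depth is not None and max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    if sample_weight is not None and np.all(np.asarray(sample_weight) <= 0):
        raise ValueError("at least one sample must have a positive weight")


# Valid arguments pass silently
check_fit_arguments(C=1.0, max_depth=None, sample_weight=np.array([0.0, 1.0]))
```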

nodes_leaves() -> tuple

Compute the number of nodes and leaves in the built tree

tuple

tuple with the number of nodes and the number of leaves
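Counting nodes and leaves amounts to a recursive traversal of a binary tree. The sketch below uses a hypothetical `Node` dataclass as a stand-in for the library's node type, and it assumes that leaves are included in the node count.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a tree node with two optional children."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def nodes_leaves(root: Optional[Node]) -> Tuple[int, int]:
    """Return (number of nodes, number of leaves) of the tree under root."""
    if root is None:
        return 0, 0
    if root.is_leaf():
        # Assumption: a leaf counts as a node too
        return 1, 1
    left_nodes, left_leaves = nodes_leaves(root.left)
    right_nodes, right_leaves = nodes_leaves(root.right)
    return 1 + left_nodes + right_nodes, left_leaves + right_leaves


# Root with a leaf on the left and an internal node with two leaves on the right
tree = Node(Node(), Node(Node(), Node()))
counts = nodes_leaves(tree)
```

For this small tree the traversal finds five nodes, three of which are leaves.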

predict(X: numpy.array) -> numpy.array

Predict labels for each sample in the dataset passed

X : np.array

dataset of samples

np.array

array of labels

ValueError

if the dataset has an inconsistent number of features

NotFittedError

if the model is not fitted