ML and sklearn: a detailed guide to the code and usage of make_pipeline, RobustScaler, KFold, and cross_val_score in sklearn (Part 1)

2023-11-07 13:46:34

make_pipeline in sklearn: code walkthrough and usage

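To simplify building chains of transformations and models, Scikit-Learn provides the Pipeline class, which combines multiple processing steps into a single Scikit-Learn estimator. The Pipeline class itself has fit, predict, and score methods, and it behaves like any other model in Scikit-Learn.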

Code walkthrough of the make_pipeline function

def make_pipeline(*steps, **kwargs):
    """Construct a Pipeline from the given estimators.

    This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set to the lowercase of their types automatically.

    Parameters
    ----------
    *steps : list of estimators,

    memory : None, str or object with the joblib.Memory interface, optional
        Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute ``named_steps`` or ``steps`` to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.

    Examples
    --------
    >>> from sklearn.naive_bayes import GaussianNB
    >>> from sklearn.preprocessing import StandardScaler
    >>> make_pipeline(StandardScaler(), GaussianNB(priors=None))
    ...     # doctest: +NORMALIZE_WHITESPACE
    Pipeline(memory=None,
             steps=[('standardscaler',
                     StandardScaler(copy=True, with_mean=True, with_std=True)),
                    ('gaussiannb', GaussianNB(priors=None))])

    Returns
    -------
    p : Pipeline
    """
    memory = kwargs.pop('memory', None)
    if kwargs:
        raise TypeError('Unknown keyword arguments: "{}"'
                        .format(list(kwargs.keys())[0]))
    return Pipeline(_name_estimators(steps), memory=memory)
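The effect of the memory argument described in the docstring can be observed directly. The following is a minimal sketch, not part of the library source: the toy data, the temporary cache directory, and the choice of StandardScaler with GaussianNB are all assumptions made for illustration.

```python
import tempfile

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (assumed for illustration): 20 samples, 2 features, 2 classes.
rng = np.random.RandomState(0)
X = rng.normal(size=(20, 2))
y = np.array([0, 1] * 10)

scaler = StandardScaler()

with tempfile.TemporaryDirectory() as cache_dir:
    # A string passed as `memory` is used as the path of the cache directory.
    pipe = make_pipeline(scaler, GaussianNB(), memory=cache_dir)
    pipe.fit(X, y)

    # With caching enabled, the transformer is cloned before fitting, so the
    # instance passed to make_pipeline carries no fitted attributes.
    original_is_fitted = hasattr(scaler, "mean_")
    # The fitted clone is reachable through `named_steps`.
    cached_is_fitted = hasattr(pipe.named_steps["standardscaler"], "mean_")

print(original_is_fitted, cached_is_fitted)
```

This is why the docstring recommends inspecting named_steps or steps rather than the object that was originally passed in.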

Using the make_pipeline function

1. Use the Pipeline class to express the workflow of scaling the data with MinMaxScaler and then training an SVM

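from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# X_train, y_train, X_test, y_test are assumed to be defined already
pipe = Pipeline([("scaler", MinMaxScaler()), ("svm", SVC())])
pipe.fit(X_train, y_train)
pipe.score(X_test, y_test)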
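The snippet above relies on a train/test split that already exists. A self-contained sketch might look like the following; the breast cancer dataset bundled with scikit-learn and the fixed random_state are choices made here for illustration, not part of the original workflow. It also checks that the pipeline gives the same score as scaling and fitting by hand.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Example data (an assumption for this sketch).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pipeline: fit() scales the training data and then trains the SVM on it;
# score() applies the same fitted scaling to the test data before scoring.
pipe = Pipeline([("scaler", MinMaxScaler()), ("svm", SVC())])
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)

# The equivalent manual steps, for comparison.
scaler = MinMaxScaler().fit(X_train)
svm = SVC().fit(scaler.transform(X_train), y_train)
manual_accuracy = svm.score(scaler.transform(X_test), y_test)

print(test_accuracy, manual_accuracy)
```

The scaler is fitted on the training split only, which is the behaviour the pipeline enforces automatically.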

2. Create a pipeline with the make_pipeline function

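Building a pipeline with the Pipeline class is somewhat verbose, and we usually do not need a user-specified name for each step. In that case, make_pipeline can create the pipeline for us and name each step automatically after the class it belongs to.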

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

pipe = make_pipeline(MinMaxScaler(), SVC())
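The automatic naming can be inspected through the steps attribute. The sketch below is illustrative only; the second pipeline with two StandardScaler steps is a hypothetical case added to show how repeated classes are named.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

pipe = make_pipeline(MinMaxScaler(), SVC())

# Each step is named after the lowercased class name.
step_names = [name for name, _ in pipe.steps]
print(step_names)  # ['minmaxscaler', 'svc']

# When the same class appears more than once, a numeric suffix is appended.
dup = make_pipeline(StandardScaler(), StandardScaler(), SVC())
dup_names = [name for name, _ in dup.steps]
print(dup_names)  # ['standardscaler-1', 'standardscaler-2', 'svc']

# The generated names are used to address step parameters as <name>__<param>.
pipe.set_params(svc__C=10)
svc_C = pipe.named_steps["svc"].C
```

The generated names matter in practice because parameter grids for tools such as GridSearchCV refer to steps by these names.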

RobustScaler in sklearn: code walkthrough and usage

Code walkthrough of the RobustScaler class

class RobustScaler(BaseEstimator, TransformerMixin):
    """Scale features using statistics that are robust to outliers.

    This Scaler removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range).

    The IQR is the range between the 1st quartile (25th quantile) and the 3rd quartile (75th quantile). Centering and scaling happen independently on each feature (or each sample, depending on the ``axis`` argument) by computing the relevant statistics on the samples in the training set. Median and interquartile range are then stored to be used on later data using the ``transform`` method.

    Standardization of a dataset is a common requirement for many machine learning estimators. Typically this is done by removing the mean and scaling to unit variance. However, outliers can often influence the sample mean / variance in a negative way. In such cases, the median and the interquartile range often give better results.

    .. versionadded:: 0.17

    Read more in the :ref:`User Guide <preprocessing_scaler>`.

    Parameters
    ----------
    with_centering : boolean, True by default
        If True, center the data before scaling. This will cause ``transform`` to raise an exception when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.

    with_scaling : boolean, True by default
        If True, scale the data to interquartile range.

    quantile_range : tuple (q_min, q_max), 0.0 < q_min < q_max < 100.0
        Default: (25.0, 75.0) = (1st quantile, 3rd quantile) = IQR
        Quantile range used to calculate ``scale_``.

        .. versionadded:: 0.18

    copy : boolean, optional, default is True
        If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.

    Attributes
    ----------
    center_ : array of floats
        The median value for each feature in the training set.

    scale_ : array of floats
        The (scaled) interquartile range for each feature in the training set.

        .. versionadded:: 0.17
           *scale_* attribute.
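The docstring excerpt above says that RobustScaler centers on the median and scales by the interquartile range. A minimal sketch of that behaviour follows; the single-column toy data with one outlier is an assumption made for illustration, and the manual computation with NumPy is included only as a cross-check.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Toy feature column with one large outlier (assumed for illustration).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Default settings: with_centering=True, with_scaling=True,
# quantile_range=(25.0, 75.0).
robust = RobustScaler()
X_robust = robust.fit_transform(X)

# Manual equivalent: subtract the median, divide by the interquartile range.
median = np.median(X, axis=0)
q25, q75 = np.percentile(X, [25, 75], axis=0)
iqr = q75 - q25
X_manual = (X - median) / iqr

print(robust.center_, robust.scale_)

# For contrast, the mean used by StandardScaler is pulled toward the outlier.
standard_mean = StandardScaler().fit(X).mean_
print(standard_mean)
```

On this data the median stays near the bulk of the values while the mean does not, which is the motivation the docstring gives for using the median and interquartile range.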
