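ML: LiR & LassoR: evaluating house-price prediction models built with linear regression and Lasso regression on the boston house-price dataset (with PCA processing)

Output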


   Id  MSSubClass MSZoning  ...  SaleType  SaleCondition SalePrice

0   1          60       RL  ...        WD         Normal    208500

1   2          20       RL  ...        WD         Normal    181500

2   3          60       RL  ...        WD         Normal    223500

3   4          70       RL  ...        WD        Abnorml    140000

4   5          60       RL  ...        WD         Normal    250000

[5 rows x 81 columns]

numeric_columns 36 ['LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold', 'SalePrice']

(1460, 36)

  LotFrontage  LotArea  OverallQual  ...  MoSold  YrSold  SalePrice

0         65.0     8450            7  ...       2    2008     208500

1         80.0     9600            6  ...       5    2007     181500

2         68.0    11250            7  ...       9    2008     223500

3         60.0     9550            7  ...       2    2006     140000

4         84.0    14260            8  ...      12    2008     250000

Number of missing values in each column, in column order:

36 [259, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Missing_data_Per_dict_0: (33, 0.9167, {'LotArea': 0.0, 'OverallQual': 0.0, 'OverallCond': 0.0, 'YearBuilt': 0.0, 'YearRemodAdd': 0.0, 'BsmtFinSF1': 0.0, 'BsmtFinSF2': 0.0, 'BsmtUnfSF': 0.0, 'TotalBsmtSF': 0.0, '1stFlrSF': 0.0, '2ndFlrSF': 0.0, 'LowQualFinSF': 0.0, 'GrLivArea': 0.0, 'BsmtFullBath': 0.0, 'BsmtHalfBath': 0.0, 'FullBath': 0.0, 'HalfBath': 0.0, 'BedroomAbvGr': 0.0, 'KitchenAbvGr': 0.0, 'TotRmsAbvGrd': 0.0, 'Fireplaces': 0.0, 'GarageCars': 0.0, 'GarageArea': 0.0, 'WoodDeckSF': 0.0, 'OpenPorchSF': 0.0, 'EnclosedPorch': 0.0, '3SsnPorch': 0.0, 'ScreenPorch': 0.0, 'PoolArea': 0.0, 'MiscVal': 0.0, 'MoSold': 0.0, 'YrSold': 0.0, 'SalePrice': 0.0})

Missing_data_Per_dict_Not0: (3, 0.0833, {'LotFrontage': 0.177397, 'MasVnrArea': 0.005479, 'GarageYrBlt': 0.055479})

Missing_data_Per_dict_under01: (2, 0.0556, {'MasVnrArea': 0.005479, 'GarageYrBlt': 0.055479})

Fraction of missing values in each column that has missing data: {'LotFrontage': 0.177397, 'MasVnrArea': 0.005479, 'GarageYrBlt': 0.055479}

data_Missing_dict {'LotFrontage': 0.1773972602739726, 'LotArea': 0.0, 'OverallQual': 0.0, 'OverallCond': 0.0, 'YearBuilt': 0.0, 'YearRemodAdd': 0.0, 'MasVnrArea': 0.005479452054794521, 'BsmtFinSF1': 0.0, 'BsmtFinSF2': 0.0, 'BsmtUnfSF': 0.0, 'TotalBsmtSF': 0.0, '1stFlrSF': 0.0, '2ndFlrSF': 0.0, 'LowQualFinSF': 0.0, 'GrLivArea': 0.0, 'BsmtFullBath': 0.0, 'BsmtHalfBath': 0.0, 'FullBath': 0.0, 'HalfBath': 0.0, 'BedroomAbvGr': 0.0, 'KitchenAbvGr': 0.0, 'TotRmsAbvGrd': 0.0, 'Fireplaces': 0.0, 'GarageYrBlt': 0.05547945205479452, 'GarageCars': 0.0, 'GarageArea': 0.0, 'WoodDeckSF': 0.0, 'OpenPorchSF': 0.0, 'EnclosedPorch': 0.0, '3SsnPorch': 0.0, 'ScreenPorch': 0.0, 'PoolArea': 0.0, 'MiscVal': 0.0, 'MoSold': 0.0, 'YrSold': 0.0, 'SalePrice': 0.0}

after dropna (1121, 36)
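The missing-value statistics above (per-column counts, per-column fractions such as 259/1460 ≈ 0.177397 for LotFrontage, and the row count after `dropna`) can be reproduced along the following lines. This is a minimal sketch, not the author's code: the toy DataFrame and the variable names `missing_counts`, `missing_frac`, and `missing_nonzero` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the numeric house-price columns.
df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, 60.0, np.nan],
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "MasVnrArea": [196.0, 0.0, np.nan, 0.0, 350.0],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

# Number of missing values per column, in column order.
missing_counts = df.isnull().sum().tolist()

# Fraction of missing values per column (count / number of rows).
missing_frac = df.isnull().mean().to_dict()

# Keep only the columns that actually contain missing values.
missing_nonzero = {k: round(v, 6) for k, v in missing_frac.items() if v > 0}

# Drop every row that has at least one missing value.
df_clean = df.dropna()

print(len(missing_counts), missing_counts)
print(missing_nonzero)
print("after dropna", df_clean.shape)
```

Dropping rows is the simplest option; with about 18% of LotFrontage missing it discards a sizeable share of the data (1460 rows become 1121 in the output above), so imputation may be worth considering as an alternative.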

<class 'numpy.ndarray'>

     LotFrontage   LotArea  OverallQual  ...    MiscVal    MoSold    YrSold

0       -0.233570 -0.205885     0.570704  ...  -0.141407 -1.615345  0.153084

1        0.384834 -0.064358    -0.153825  ...  -0.141407 -0.498715 -0.596291

2       -0.109889  0.138702     0.570704  ...  -0.141407  0.990125  0.153084

3       -0.439705 -0.070512     0.570704  ...  -0.141407 -1.615345 -1.345665

4        0.549742  0.509132     1.295234  ...  -0.141407  2.106755  0.153084

...           ...       ...          ...  ...        ...       ...       ...

1116    -0.357251 -0.271480    -0.153825  ...  -0.141407  0.617915 -0.596291

1117     0.590968  0.375605    -0.153825  ...  -0.141407 -1.615345  1.651832

1118    -0.192343 -0.133030     0.570704  ...  14.947388 -0.498715  1.651832

1119    -0.109889 -0.049960    -0.878355  ...  -0.141407 -0.870925  1.651832

1120     0.178699 -0.022885    -0.878355  ...  -0.141407 -0.126505  0.153084

[1121 rows x 35 columns]
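The table above shows the 35 feature columns after standardization, and the `<class 'numpy.ndarray'>` line suggests the scaler returned a plain array that was then wrapped back into a DataFrame. A minimal sketch of that step, assuming scikit-learn's `StandardScaler` was used (the source does not show the call) and using synthetic data in place of the real features:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for three of the numeric feature columns.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(loc=[70, 10000, 6], scale=[20, 4000, 1.5], size=(200, 3)),
    columns=["LotFrontage", "LotArea", "OverallQual"],
)

# fit_transform returns a NumPy array: each column gets zero mean and unit variance.
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
print(type(X_std))

# Wrap the array back into a DataFrame to keep the column names.
X_std_df = pd.DataFrame(X_std, columns=X.columns)
print(X_std_df.head())
```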

The first 10 principal components explain 63.80% of the variance in the data

After PCA, loadings of the first principal component-------------------------------------
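A figure like the 63.80% above is the sum of the explained-variance ratios of the retained components. The sketch below shows one way to compute it with scikit-learn's `PCA`; the synthetic data and the choice of 3 components are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data with some correlated columns so that PCA has structure to find.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
X[:, 1] += 0.8 * X[:, 0]
X[:, 2] += 0.5 * X[:, 0]

# Standardize first so that no column dominates purely because of its scale.
X_std = StandardScaler().fit_transform(X)

# Keep the leading components and project the data onto them.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_std)

# Share of the total variance captured by the retained components.
explained = pca.explained_variance_ratio_.sum()
print(f"The first {pca.n_components_} principal components explain {explained:.2%} of the variance")
```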

[(0.16970682313415306, 'LotFrontage'), (0.1211669980146095, 'LotArea'), (0.3008665261375608, 'OverallQual'), (-0.1017783758120348, 'OverallCond'), (0.23754113423286216, 'YearBuilt'), (0.21067267847804322, 'YearRemodAdd'), (0.19125461510335365, 'MasVnrArea'), (0.14136511574315347, 'BsmtFinSF1'), (-0.013552848692716916, 'BsmtFinSF2'), (0.11439764110410199, 'BsmtUnfSF'), (0.259354275741638, 'TotalBsmtSF'), (0.2591780447881022, '1stFlrSF'), (0.11504305093601253, '2ndFlrSF'), (0.004231304806602964, 'LowQualFinSF'), (0.2877802164879641, 'GrLivArea'), (0.08317879411803167, 'BsmtFullBath'), (-0.02114280846249704, 'BsmtHalfBath'), (0.25499633884283257, 'FullBath'), (0.11080279874459822, 'HalfBath'), (0.1017767099777179, 'BedroomAbvGr'), (-0.01012145139988125, 'KitchenAbvGr'), (0.23572236584667458, 'TotRmsAbvGrd'), (0.17611466785004926, 'Fireplaces'), (0.23726651555979883, 'GarageYrBlt'), (0.2831568046802727, 'GarageCars'), (0.279827792756442, 'GarageArea'), (0.13036585867815073, 'WoodDeckSF'), (0.16664693092097654, 'OpenPorchSF'), (-0.08602539908222213, 'EnclosedPorch'), (0.010532579475601184, '3SsnPorch'), (0.02556170369869493, 'ScreenPorch'), (0.06246570190310543, 'PoolArea'), (-0.015493399959318557, 'MiscVal'), (0.028399126033275164, 'MoSold'), (-0.011129722622237775, 'YrSold')]

[(0.3008665261375608, 'OverallQual'), (0.2877802164879641, 'GrLivArea'), (0.2831568046802727, 'GarageCars'), (0.279827792756442, 'GarageArea'), (0.259354275741638, 'TotalBsmtSF'), (0.2591780447881022, '1stFlrSF'), (0.25499633884283257, 'FullBath'), (0.23754113423286216, 'YearBuilt'), (0.23726651555979883, 'GarageYrBlt'), (0.23572236584667458, 'TotRmsAbvGrd'), (0.21067267847804322, 'YearRemodAdd'), (0.19125461510335365, 'MasVnrArea'), (0.17611466785004926, 'Fireplaces'), (0.16970682313415306, 'LotFrontage'), (0.16664693092097654, 'OpenPorchSF'), (0.14136511574315347, 'BsmtFinSF1'), (0.13036585867815073, 'WoodDeckSF'), (0.1211669980146095, 'LotArea'), (0.11504305093601253, '2ndFlrSF'), (0.11439764110410199, 'BsmtUnfSF'), (0.11080279874459822, 'HalfBath'), (0.1017767099777179, 'BedroomAbvGr'), (0.08317879411803167, 'BsmtFullBath'), (0.06246570190310543, 'PoolArea'), (0.028399126033275164, 'MoSold'), (0.02556170369869493, 'ScreenPorch'), (0.010532579475601184, '3SsnPorch'), (0.004231304806602964, 'LowQualFinSF'), (-0.01012145139988125, 'KitchenAbvGr'), (-0.011129722622237775, 'YrSold'), (-0.013552848692716916, 'BsmtFinSF2'), (-0.015493399959318557, 'MiscVal'), (-0.02114280846249704, 'BsmtHalfBath'), (-0.08602539908222213, 'EnclosedPorch'), (-0.1017783758120348, 'OverallCond')]

After PCA, loadings of the second principal component-------------------------------------

[(0.037140668512444255, 'LotFrontage'), (0.005762269875424171, 'LotArea'), (-0.02265545744738413, 'OverallQual'), (0.06797580738610676, 'OverallCond'), (-0.22034458100877843, 'YearBuilt'), (-0.11769773674122082, 'YearRemodAdd'), (-0.02330741979867707, 'MasVnrArea'), (-0.26830830083400875, 'BsmtFinSF1'), (-0.06776753790369254, 'BsmtFinSF2'), (0.10349973537774373, 'BsmtUnfSF'), (-0.2014230745261159, 'TotalBsmtSF'), (-0.14501101153644946, '1stFlrSF'), (0.43960496790131565, '2ndFlrSF'), (0.11932040000909688, 'LowQualFinSF'), (0.2706724094458561, 'GrLivArea'), (-0.2741406761479087, 'BsmtFullBath'), (-0.001880261013674545, 'BsmtHalfBath'), (0.12608264523927462, 'FullBath'), (0.23358978781221817, 'HalfBath'), (0.3864399252645517, 'BedroomAbvGr'), (0.12179545892853964, 'KitchenAbvGr'), (0.3371810668951179, 'TotRmsAbvGrd'), (0.06581774146310777, 'Fireplaces'), (-0.1834261688794573, 'GarageYrBlt'), (-0.04640661259007604, 'GarageCars'), (-0.08613653500685643, 'GarageArea'), (-0.047991361825782064, 'WoodDeckSF'), (0.03130768246434415, 'OpenPorchSF'), (0.13376424222015906, 'EnclosedPorch'), (-0.02564456693744644, '3SsnPorch'), (0.04211790221668751, 'ScreenPorch'), (0.03032238859229474, 'PoolArea'), (0.04968459727862472, 'MiscVal'), (0.02754218343139985, 'MoSold'), (-0.04555808126996797, 'YrSold')]

[(0.43960496790131565, '2ndFlrSF'), (0.3864399252645517, 'BedroomAbvGr'), (0.3371810668951179, 'TotRmsAbvGrd'), (0.2706724094458561, 'GrLivArea'), (0.23358978781221817, 'HalfBath'), (0.13376424222015906, 'EnclosedPorch'), (0.12608264523927462, 'FullBath'), (0.12179545892853964, 'KitchenAbvGr'), (0.11932040000909688, 'LowQualFinSF'), (0.10349973537774373, 'BsmtUnfSF'), (0.06797580738610676, 'OverallCond'), (0.06581774146310777, 'Fireplaces'), (0.04968459727862472, 'MiscVal'), (0.04211790221668751, 'ScreenPorch'), (0.037140668512444255, 'LotFrontage'), (0.03130768246434415, 'OpenPorchSF'), (0.03032238859229474, 'PoolArea'), (0.02754218343139985, 'MoSold'), (0.005762269875424171, 'LotArea'), (-0.001880261013674545, 'BsmtHalfBath'), (-0.02265545744738413, 'OverallQual'), (-0.02330741979867707, 'MasVnrArea'), (-0.02564456693744644, '3SsnPorch'), (-0.04555808126996797, 'YrSold'), (-0.04640661259007604, 'GarageCars'), (-0.047991361825782064, 'WoodDeckSF'), (-0.06776753790369254, 'BsmtFinSF2'), (-0.08613653500685643, 'GarageArea'), (-0.11769773674122082, 'YearRemodAdd'), (-0.14501101153644946, '1stFlrSF'), (-0.1834261688794573, 'GarageYrBlt'), (-0.2014230745261159, 'TotalBsmtSF'), (-0.22034458100877843, 'YearBuilt'), (-0.26830830083400875, 'BsmtFinSF1'), (-0.2741406761479087, 'BsmtFullBath')]
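Each list above pairs a component's weights with the feature names, first in column order and then sorted from largest to smallest weight. A minimal sketch of how such lists could be produced from `pca.components_` follows; the feature names are borrowed from the output, but the data is synthetic and the variable names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Feature names borrowed from the output above; the values are synthetic.
feature_names = ["LotArea", "OverallQual", "GrLivArea", "GarageCars", "YrSold"]
rng = np.random.default_rng(0)
X = rng.normal(size=(300, len(feature_names)))
X[:, 2] += X[:, 1]  # make GrLivArea correlate with OverallQual
X[:, 3] += X[:, 1]  # make GarageCars correlate with OverallQual

X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_std)

ranked_by_component = []
for i, component in enumerate(pca.components_, start=1):
    # Pair every weight (loading) with its feature name.
    loadings = [(float(w), name) for w, name in zip(component, feature_names)]
    # Sort from the largest to the smallest weight.
    ranked = sorted(loadings, key=lambda pair: pair[0], reverse=True)
    ranked_by_component.append(ranked)
    print(f"component {i}:", ranked)
```

The overall sign of a principal component is arbitrary, so when judging which features matter most for a component it is usually the absolute value of a weight that is informative, not its sign.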

MSE of linear regression without PCA: 1644140595.6636596

MSE of linear regression on the first 10 PCA components: 1836601962.4751632
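The two MSE values above compare a linear regression on all standardized features with one fitted on the first 10 principal components. The source does not show how the data was split or scored, so the sketch below assumes a simple hold-out split and uses synthetic regression data; only the structure of the comparison is meant to carry over.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 35 numeric features and the sale price.
X, y = make_regression(n_samples=500, n_features=35, n_informative=20,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Model 1: standardize, then ordinary least squares on all features.
full = make_pipeline(StandardScaler(), LinearRegression())
mse_full = mean_squared_error(y_test, full.fit(X_train, y_train).predict(X_test))

# Model 2: standardize, keep the first 10 principal components, then OLS.
reduced = make_pipeline(StandardScaler(), PCA(n_components=10), LinearRegression())
mse_pca = mean_squared_error(y_test, reduced.fit(X_train, y_train).predict(X_test))

print("MSE of linear regression without PCA:", mse_full)
print("MSE of linear regression on the first 10 PCA components:", mse_pca)
```

As in the output above, discarding components generally costs some accuracy: the 10 retained components explain only part of the feature variance, so the reduced model has less information to work with.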

[1e-10, 1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1]

[1642818822.3530025, 1642818822.3529558, 1642818822.3524888, 1642818822.3471866, 1642818822.3005185, 1642818821.7415214, 1642818817.1179569, 1642818756.7038794, 1642818283.0732899, 1642813588.5752773]

[1e-10, 1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1]

[1836601962.4751682, 1836601962.4752123, 1836601962.475657, 1836601962.480097, 1836601962.5245085, 1836601962.9652405, 1836601967.4063494, 1836602011.8174434, 1836602455.9288514, 1836606882.1034737]
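The four lists above are a sweep over Lasso `alpha` values from 1e-10 to 0.1, with one MSE per value, first for the full feature set and then for the 10 PCA components. In both sweeps the MSE changes only from about the sixth significant digit onward, which suggests these penalties are too small to matter for a target on the scale of house prices. A minimal sketch of such a sweep, assuming a hold-out split and synthetic data (neither is confirmed by the source):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the house-price features and target.
X, y = make_regression(n_samples=500, n_features=35, n_informative=20,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Ten penalty strengths on a log scale, from 1e-10 up to 0.1.
alphas = [10.0 ** k for k in range(-10, 0)]

mses = []
for alpha in alphas:
    # max_iter is raised because very small penalties can converge slowly.
    model = make_pipeline(StandardScaler(), Lasso(alpha=alpha, max_iter=10000))
    model.fit(X_train, y_train)
    mses.append(mean_squared_error(y_test, model.predict(X_test)))

best_alpha = alphas[int(np.argmin(mses))]
print(alphas)
print(mses)
print("best alpha:", best_alpha)
```

To see the penalty take effect, the grid would need to extend to much larger `alpha` values, since the L1 term is weighed against a squared error that scales with the square of the target.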




Core code

PCA

class TruncatedSVD Found at: sklearn.decomposition._truncated_svd

class TruncatedSVD(TransformerMixin, BaseEstimator):

   """Dimensionality reduction using truncated SVD (aka LSA).

   

   This transformer performs linear dimensionality reduction by means of

   truncated singular value decomposition (SVD). Contrary to PCA, this

   estimator does not center the data before computing the singular value

   decomposition. This means it can work with sparse matrices

   efficiently.

   

   In particular, truncated SVD works on term count/tf-idf matrices as

   returned by the vectorizers in :mod:`sklearn.feature_extraction.text`. In

   that context, it is known as latent semantic analysis (LSA).

   

   This estimator supports two algorithms: a fast randomized SVD solver, and

   a "naive" algorithm that uses ARPACK as an eigensolver on `X * X.T` or

   `X.T * X`, whichever is more efficient.
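The docstring above notes that, contrary to PCA, TruncatedSVD does not center the data. The sketch below illustrates the practical consequence on synthetic data: once the columns are centered by hand, TruncatedSVD recovers the same singular values as PCA, while on the raw data it does not. The data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

# Synthetic data whose columns have a clearly non-zero mean.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(200, 6))
X_centered = X - X.mean(axis=0)

# PCA centers the data internally before the decomposition.
pca = PCA(n_components=3, svd_solver="full").fit(X)

# TruncatedSVD decomposes whatever matrix it is given, without centering.
svd_centered = TruncatedSVD(n_components=3, algorithm="arpack").fit(X_centered)
svd_raw = TruncatedSVD(n_components=3, algorithm="arpack").fit(X)

same_when_centered = np.allclose(pca.singular_values_, svd_centered.singular_values_)
same_when_raw = np.allclose(pca.singular_values_, svd_raw.singular_values_)
print("matches PCA on centered data:", same_when_centered)
print("matches PCA on raw data:", same_when_raw)
```

For data that has already been standardized, as in the output earlier in this post, the columns are centered, so the two estimators should agree on the leading components up to sign.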

   

LinearRegression

class LinearRegression Found at: sklearn.linear_model._base

class LinearRegression(MultiOutputMixin, RegressorMixin, LinearModel):

   """

   Ordinary least squares Linear Regression.

   

   LinearRegression fits a linear model with coefficients w = (w1, ..., wp)

   to minimize the residual sum of squares between the observed targets in

   the dataset, and the targets predicted by the linear approximation.
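The docstring above states that LinearRegression minimizes the residual sum of squares. A small check of that claim, under the assumption of synthetic data with known coefficients: the fitted intercept and coefficients should coincide with the least-squares solution returned by `numpy.linalg.lstsq` on a design matrix with an added column of ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 3 + 2*x1 - 1*x2 + 0.5*x3 + small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 3.0 + rng.normal(scale=0.1, size=100)

# Fit with scikit-learn (an intercept is fitted by default).
model = LinearRegression().fit(X, y)

# Solve the same least-squares problem directly; the column of ones
# plays the role of the intercept.
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

print("sklearn:", model.intercept_, model.coef_)
print("lstsq:  ", beta[0], beta[1:])
```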

   

Lasso

class Lasso Found at: sklearn.linear_model._coordinate_descent

class Lasso(ElasticNet):

   """Linear Model trained with L1 prior as regularizer (aka the Lasso)

   

   The optimization objective for Lasso is::

   

   (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

   

   Technically the Lasso model is optimizing the same objective function as

   the Elastic Net with ``l1_ratio=1.0`` (no L2 penalty).

   

   Read more in the :ref:`User Guide <lasso>`.
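The objective quoted in the docstring above can be evaluated directly. The sketch below writes it as a small helper (the function name `lasso_objective` is hypothetical) and checks, on synthetic data with no intercept, that the coefficients found by `Lasso` give a lower objective value than the ordinary least-squares coefficients do.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression


def lasso_objective(X, y, w, alpha):
    """(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1"""
    n_samples = X.shape[0]
    residual = y - X @ w
    return residual @ residual / (2 * n_samples) + alpha * np.sum(np.abs(w))


# Synthetic data with two irrelevant features (true weight 0).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)

alpha = 0.1
# fit_intercept=False keeps the fitted model identical to the formula above.
lasso = Lasso(alpha=alpha, fit_intercept=False).fit(X, y)
ols = LinearRegression(fit_intercept=False).fit(X, y)

obj_lasso = lasso_objective(X, y, lasso.coef_, alpha)
obj_ols = lasso_objective(X, y, ols.coef_, alpha)
print("objective at Lasso solution:", obj_lasso)
print("objective at OLS solution:  ", obj_ols)
print("Lasso coefficients:", lasso.coef_)
```

The L1 term is what pulls coefficients toward zero; with a large enough `alpha`, the weights of irrelevant features are typically set exactly to zero.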

