Implementing Bagging in Code (SVM, KNN)


Introduction to Bagging

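The basic idea is this: given a learning algorithm and a training set of N samples, train the algorithm several times. Each round's training set consists of n samples drawn at random, with replacement, from the original training set, so an original sample may appear several times in a given round's training set or not at all. Training yields a sequence of prediction functions. For classification, the label of an unseen sample is decided by voting; for regression, the average of the predictions is used.

In both cases the combined model usually ends up somewhat more accurate than a single model.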
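The resampling and aggregation steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used later in this post: the helper names `majority_vote` and `average` are made up for the example, and the vote assumes labels are non-negative integers.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

# Bootstrap: draw N indices from 0..N-1 with replacement.
idx = rng.integers(0, N, size=N)

# Share of original samples that appear at least once in the bootstrap sample.
# For large N this tends towards 1 - 1/e, roughly 0.632.
unique_fraction = len(np.unique(idx)) / N


def majority_vote(predictions):
    """Classification: predictions is (n_models, n_samples), integer labels >= 0."""
    predictions = np.asarray(predictions)
    return np.array([np.bincount(col).argmax() for col in predictions.T])


def average(predictions):
    """Regression: mean of the model outputs for each sample."""
    return np.asarray(predictions, dtype=float).mean(axis=0)


votes = majority_vote([[0, 1, 1], [0, 1, 0], [1, 1, 0]])
means = average([[1.0, 2.0], [3.0, 4.0]])
print(unique_fraction, votes, means)
```

Because sampling is done with replacement, some original samples are repeated and others are left out entirely, which is what makes each round's training set different.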

Code implementation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.base import clone
from sklearn.datasets import make_hastie_10_2
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier as KNN


class bagging():
    def __init__(self, basic_clf):
        self.basic_clf = basic_clf
        self._clf = []

    def train(self, iters, train_x, train_y):
        self._clf = []
        for i in range(iters):
            # Bootstrap sample: 80% of the training-set size, drawn with replacement.
            length = int(0.8 * len(train_x))
            index = np.random.randint(0, len(train_x), length)
            # Fit a fresh copy each round. Writing `clf = self.basic_clf` would
            # refit one shared object, so every stored model would be the same.
            clf = clone(self.basic_clf)
            clf.fit(train_x[index], train_y[index])
            self._clf.append(clf)

    def predict(self, test_x, index=0):
        # index == 0: majority vote over all models; index == k: the k-th model alone.
        if index == 0:
            com_pred = [clf.predict(test_x) for clf in self._clf]
            pred_y = []
            for i in range(len(test_x)):
                pred = [com_pred[j][i] for j in range(len(self._clf))]
                pred = self.vote(pred)
                pred_y.append(pred)
            return np.array(pred_y)
        else:
            pred_y = self._clf[index - 1].predict(test_x)
            return pred_y

    def vote(self, y):
        # Return the most frequent label in the list y (ties go to the smaller label).
        unique = np.unique(y)
        num = -np.inf
        label = 0
        for i in unique:
            cnt = y.count(i)
            if cnt > num:
                num = cnt
                label = i
        return label


if __name__ == '__main__':
    M = 7
    sonar_datas = pd.read_csv(r'D:\microsoft\sonar.csv', header=None)
    # Encode the label column: 'M' -> 1, everything else -> 0.
    sonar_datas[61] = 0
    sonar_datas.loc[np.where(sonar_datas[60] == 'M')[0], 61] = 1
    del sonar_datas[60]
    sonar_datas = np.array(sonar_datas)
    x = sonar_datas[:, :60]
    y = sonar_datas[:, 60]
    print(y.dtype)
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
    svm_clf = svm.SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
                      decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
                      max_iter=-1, probability=False, random_state=None, shrinking=True,
                      tol=0.001, verbose=False)
    knn_clf = KNN()
    bagging_clf = bagging(knn_clf)  # choose the clf
    # Train on the training split only; passing the full x, y would leak the test set.
    bagging_clf.train(M, X_train, y_train)
    for i in range(M):
        pred_y = bagging_clf.predict(X_test, i + 1)
        error = np.mean(pred_y != y_test)
        print('Error rate of the model from training round {}: {:.4f}'.format(i + 1, error))
    pred_y = bagging_clf.predict(X_test, 0)
    error = np.mean(pred_y != y_test)
    print('Error rate of the classifier combined from {} rounds: {:.4f}'.format(M, error))
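As a cross-check for the hand-written class, scikit-learn provides its own `BaggingClassifier`. The sketch below configures it to resemble the setup above (7 rounds, each on a bootstrap sample of 80% of the training-set size, KNN as the base learner). The sonar CSV is not bundled here, so a synthetic dataset from `make_classification` stands in for it; that substitution, and the sample and feature counts, are assumptions made only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption: the real post uses sonar.csv instead).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

ensemble = BaggingClassifier(
    KNeighborsClassifier(),  # base learner, passed positionally
    n_estimators=7,          # number of bagging rounds (M above)
    max_samples=0.8,         # each round draws 80% of the training-set size
    bootstrap=True,          # sample with replacement
    random_state=0,
)
ensemble.fit(X_train, y_train)

ensemble_error = 1.0 - ensemble.score(X_test, y_test)
print('ensemble error rate: {:.4f}'.format(ensemble_error))
```

The error rate printed here applies only to the synthetic data and says nothing about the sonar results.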

Model results and analysis

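This is the classification result obtained with KNN. In my run, however, the error rate was the same for every model, which looked wrong to me, although I could not say why. If anyone knows what is going on, please help take a look.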
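One likely cause, offered as a guess rather than a confirmed diagnosis: if every round stores a reference to the same estimator object instead of a fresh copy, the list ends up holding one model several times, namely the one fitted last, so every "individual" error rate and the voted error rate coincide. The sketch below shows the difference between reusing one object and using `sklearn.base.clone`; the data is random and serves only as a placeholder.

```python
import numpy as np
from sklearn.base import clone
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

base = KNeighborsClassifier()
shared, cloned = [], []
for _ in range(3):
    idx = rng.integers(0, len(X), size=80)
    # fit() returns the estimator itself, so this appends `base` every time.
    shared.append(base.fit(X[idx], y[idx]))
    # clone() builds a new, unfitted estimator with the same parameters.
    cloned.append(clone(base).fit(X[idx], y[idx]))

same_object = all(m is shared[0] for m in shared)
distinct_objects = len({id(m) for m in cloned}) == 3
print(same_object, distinct_objects)
```

With the shared list, predicting with "model 1", "model 2", or "model 3" always calls the same fitted object, which would explain an error rate that never changes.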

My own understanding of bagging

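According to the "watermelon book" (Zhou Zhihua's Machine Learning), the key point of bagging is to give the samples or the classifiers variety (diversity). This diversity lets the classifiers, taken together, attend to as many characteristics of the data as possible, which helps improve accuracy on the task.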
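One rough way to look at this variety is to train a few models on different bootstrap samples and measure how often each pair disagrees on held-out points. The sketch below does this with KNN on synthetic data. The dataset, the split sizes, and the use of pairwise disagreement as a diversity measure are all assumptions made for illustration, not something taken from the book.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data: first 200 points for fitting, last 100 for comparison.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_fit, y_fit, X_eval = X[:200], y[:200], X[200:]

rng = np.random.default_rng(0)
models = []
for _ in range(5):
    # Each model sees its own bootstrap sample of the fitting set.
    idx = rng.integers(0, len(X_fit), size=len(X_fit))
    models.append(KNeighborsClassifier().fit(X_fit[idx], y_fit[idx]))

preds = np.array([m.predict(X_eval) for m in models])  # shape (5, 100)

# disagreement[a, b] = fraction of evaluation points where models a and b differ.
disagreement = (preds[:, None, :] != preds[None, :, :]).mean(axis=2)
print(np.round(disagreement, 3))
```

Off-diagonal entries of zero would mean two models are interchangeable, in which case voting between them adds nothing.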

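Besides randomly resampling the training set, we can also vary the learning algorithm so that each round trains a different kind of classifier. Strictly speaking, bagging refers to bootstrap resampling plus aggregation, so this variant is better described as a heterogeneous voting ensemble, but it pursues the same goal of diversity and can likewise help improve accuracy on the task.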
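Varying the learning algorithm, as described above, can be sketched with scikit-learn's `VotingClassifier`, which combines differently typed classifiers by majority vote. The choice of the three members (SVM and KNN from this post, plus a decision tree added purely for illustration) and the synthetic dataset are assumptions of this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the sonar dataset).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

voter = VotingClassifier(
    estimators=[
        ('svm', SVC()),
        ('knn', KNeighborsClassifier()),
        ('tree', DecisionTreeClassifier(random_state=0)),
    ],
    voting='hard',  # majority vote on predicted labels
)
voter.fit(X_train, y_train)

voting_error = 1.0 - voter.score(X_test, y_test)
print('voting ensemble error rate: {:.4f}'.format(voting_error))
```

Here every member is trained on the full training split; the diversity comes from the algorithms rather than from resampling.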


Published: 2024-09-22 11:21:57

Article link: https://www.17tex.com/xueshu/326251.html
