Handling Missing Data Values with the Mean in NumPy/Pandas

System Administrator 2023-06-27 08:52
# -*- coding: utf-8 -*-
#-----------------------------------------------------------------------------------------------------------------------
__Author__ = 'assasin'
__DateTime__ = '2020/1/5 15:13'
#-----------------------------------------------------------------------------------------------------------------------
'''
Handling missing data values
NumPy: fill missing values with the column mean
Pandas: load the data and rebuild the matrix with missing rows
Pandas: mean filling
Pandas ways to handle missing values: scalar fill, drop, skip, forward/backward fill
'''
import numpy as np
import pandas as pd


def loadDataSet(filepath, delim='\t'):
    # Read a delimited text file of numbers; float() parses the string 'NaN' as a missing value
    with open(filepath) as fr:
        stringArr = [line.strip().split(delim) for line in fr]
    dataArr = [list(map(float, line)) for line in stringArr]
    return np.array(dataArr)


def replaceNanwithMean(dataArr):
    numfeat = dataArr.shape[1]
    # As in the original listing, the last column is left untouched
    for i in range(numfeat - 1):
        missing = np.isnan(dataArr[:, i])
        # Mean of the non-missing entries of column i
        meanVal = dataArr[~missing, i].mean()
        dataArr[missing, i] = meanVal
    return dataArr


if __name__ == '__main__':
    # Load the data set
    dataArr = loadDataSet(r'../xxx.txt', ' ')
    # Fill missing values with the column mean (modifies dataArr in place)
    replaceNanwithMean(dataArr)

    datamat = loadDataSet(r'../xxx.txt', ' ')
    df = pd.DataFrame(datamat)
    # Rebuild the matrix: reindexing appends 5 extra rows that are entirely NaN
    df = df.reindex(range(datamat.shape[0] + 5))
    # Series.mean() skips NaN by default, so each mean covers only the observed values
    loassVs = [df[col].mean() for col in range(datamat.shape[1])]
    lists = [list(df[i].fillna(loassVs[i])) for i in range(len(loassVs))]
    print(np.array(lists).T)
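
The docstring above lists four Pandas approaches (scalar fill, drop, skip, forward/backward fill), but the listing only demonstrates mean filling. The following is a minimal sketch of what those four approaches might look like. The toy DataFrame and all variable names are made up for illustration and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the original post)
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, 2.0, 6.0, 4.0],
})

# Scalar fill: replace every NaN with a fixed value
filled_scalar = df.fillna(0)

# Drop: remove every row that contains at least one NaN
dropped = df.dropna()

# Skip: reductions such as sum() and mean() skip NaN by default (skipna=True)
col_sums = df.sum()

# Forward / backward fill: propagate the previous / next observed value
filled_forward = df.ffill()
filled_backward = df.bfill()

print(filled_scalar, dropped, col_sums, filled_forward, filled_backward, sep="\n\n")
```

Note that a forward fill cannot fill a NaN in the first row and a backward fill cannot fill one in the last row, so the two are sometimes chained.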

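As an alternative to looping over columns, the NumPy mean fill can probably be written without an explicit loop. This is a sketch under assumptions: the function name `fill_nan_with_column_mean` and the sample array are hypothetical, and unlike the listing above it fills every column, including the last one, and returns a new array instead of modifying its input.

```python
import numpy as np


def fill_nan_with_column_mean(data):
    """Return a copy of `data` with each NaN replaced by its column mean."""
    data = np.asarray(data, dtype=float)
    # nanmean ignores NaN entries when averaging each column
    col_means = np.nanmean(data, axis=0)
    # col_means broadcasts across rows; NaN positions take the column mean
    return np.where(np.isnan(data), col_means, data)


# Hypothetical sample data (not from the original post)
sample = np.array([[1.0, np.nan],
                   [3.0, 4.0],
                   [np.nan, 8.0]])
filled = fill_nan_with_column_mean(sample)
print(filled)
```

A column that consists entirely of NaN has no defined mean, so `np.nanmean` emits a warning and that column stays NaN.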