1. pandas: selecting and printing the rows where a given column is missing

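While working with a spreadsheet I needed to pull out the list of people whose value in a given column is null, so that each of them could be notified individually. Here is the code: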

    import pandas as pd

    def query():
        df = pd.read_excel("manifest.xlsx")
        # Keep only the rows whose "备注" (remarks) column is null
        select = df[df["备注"].isnull()]
        # "考生姓名" is the candidate-name column
        data = select["考生姓名"]
        # Then print the names, separated by "、"
        if not data.empty:
            for name in data:
                print(name, end="、")
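The same selection can be tried without an Excel file. The sketch below is a minimal illustration: the sample rows are made up, and only the column names come from the code above.

```python
import pandas as pd

# Hypothetical sample data standing in for manifest.xlsx
df = pd.DataFrame({
    "考生姓名": ["Alice", "Bob", "Carol"],    # candidate name
    "备注": ["confirmed", None, None],        # remarks
})

# Boolean mask: True where the remarks column is missing
mask = df["备注"].isnull()

# Select the name column for just those rows and turn it into a list
names = df.loc[mask, "考生姓名"].tolist()

# Join with "、" so there is no trailing separator
print("、".join(names))
```

Using `df.loc[mask, column]` does the row filter and the column selection in one step, and `str.join` avoids the dangling "、" that a `print(..., end="、")` loop leaves behind.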

Dropping rows that have a missing value in a given column

    # Keep only the rows where "colname" is not null
    newdf = df[df["colname"].notnull()]
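An equivalent way to express this is `DataFrame.dropna` with the `subset` argument. The sketch below uses made-up data to show that both forms keep the same rows, and that missing values in other columns are left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "colname" has one missing value, "other" has another
df = pd.DataFrame({
    "colname": [1.0, np.nan, 3.0],
    "other": ["a", "b", None],
})

# Boolean-mask form
via_mask = df[df["colname"].notnull()]

# dropna form: only "colname" is checked for missing values
via_dropna = df.dropna(subset=["colname"])

print(via_dropna)
```

Both results keep the original index labels; call `reset_index(drop=True)` afterwards if a fresh 0..n-1 index is wanted.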

Reading a dataset with a variable number of columns

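    # Pass names explicitly, i.e. one name per column: 1, 2, 3, ..., n
    # (n must be at least the width of the widest row)
    names = list(range(1, n + 1))
    df = pd.read_csv(file, names=names)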
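One way to obtain `n` is a first pass over the raw text to find the widest row. The sketch below assumes a simple comma-separated file with no quoted fields (a plain `split(",")` would miscount those), and uses an in-memory string in place of a real file.

```python
import io
import pandas as pd

# Hypothetical ragged data: rows have 2, 4 and 1 fields
raw = "a,b\nc,d,e,f\ng\n"

# First pass: count the fields in each line and take the maximum
n = max(len(line.split(",")) for line in raw.splitlines())

# Second pass: give pandas one name per column so short rows are padded with NaN
names = list(range(1, n + 1))
df = pd.read_csv(io.StringIO(raw), header=None, names=names)

print(df)
```

Rows shorter than `n` come back with `NaN` in the trailing columns, so the result is a regular rectangular DataFrame.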

Replacing specific values in a dataset with another value

    df = pd.read_table("a.txt", header=None, sep=" ")
    # The file has no column names, so df[1] refers to the second column
    labelss = [("Film & Animation", "Animation"), ("News & Politics", "Politics"),
               ("Autos & Vehicles", "Vehicles"), ("People & Blogs", "Blogs"),
               (" UNA ", "UNA"), ("Gadgets & Games", "Games"), ("Howto & DIY", "DIY"),
               ("Travel & Places", "Travel"), ("Pets & Animals", "Animals")]
    for label, label2 in labelss:
        # Select column 1 explicitly; without it the whole matching row is overwritten
        df.loc[df[1] == label, 1] = label2
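The loop can also be written as a single `Series.replace` call with a dict. This is a minimal sketch on made-up rows, using two of the label pairs from the list above.

```python
import pandas as pd

# Hypothetical data with integer column labels, as produced by header=None
df = pd.DataFrame({
    0: ["v1", "v2", "v3"],
    1: ["Film & Animation", "Music", "Pets & Animals"],
})

# Old value -> new value; values not listed here are left unchanged
mapping = {
    "Film & Animation": "Animation",
    "Pets & Animals": "Animals",
}

# Replace only within column 1
df[1] = df[1].replace(mapping)

print(df)
```

Because the replacement is applied to one column only, the other columns of a matching row stay untouched.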

Time series forecasting

Reading time series data

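    # Note: infer_datetime_format is deprecated since pandas 2.0, and combining
    # columns through a dict in parse_dates is deprecated since pandas 2.2.
    data = pd.read_csv("household_power_consumption.txt",
                       sep=";",
                       parse_dates={'dt': ['Date', 'Time']},  # merge Date and Time into one 'dt' column
                       infer_datetime_format=True,
                       na_values=['?'],                       # '?' marks a missing value
                       index_col='dt')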
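On recent pandas versions the same result can be reached by reading the columns as text and combining them with `pd.to_datetime`. The sketch below is an illustration on two made-up rows; the day/month/year format string and the `Global_active_power` column are assumptions about the file layout, so adjust them to the real data.

```python
import io
import pandas as pd

# Hypothetical excerpt in the same ';'-separated layout, with '?' as a missing value
raw = (
    "Date;Time;Global_active_power\n"
    "16/12/2006;17:24:00;4.216\n"
    "16/12/2006;17:25:00;?\n"
)

data = pd.read_csv(io.StringIO(raw), sep=";", na_values=["?"])

# Build the timestamp from the two text columns (assumed format: dd/mm/yyyy hh:mm:ss)
data["dt"] = pd.to_datetime(
    data["Date"] + " " + data["Time"],
    format="%d/%m/%Y %H:%M:%S",
)

# Drop the source columns and index the frame by timestamp
data = data.drop(columns=["Date", "Time"]).set_index("dt")

print(data)
```

Giving an explicit `format` avoids any ambiguity between day-first and month-first dates.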

Resampling data with df.resample

    # Downsample to one row per day, taking the mean of each day's values
    df.resample('D').mean()
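To see what the daily mean does, here is a small sketch on made-up readings taken every 12 hours. `resample` needs a datetime-like index; the `power` column name is hypothetical.

```python
import pandas as pd

# Hypothetical readings: two per day over two days
idx = pd.to_datetime([
    "2006-12-16 00:00", "2006-12-16 12:00",
    "2006-12-17 00:00", "2006-12-17 12:00",
])
df = pd.DataFrame({"power": [1.0, 3.0, 5.0, 7.0]}, index=idx)

# Group the rows into calendar days and average each group
daily = df.resample("D").mean()

print(daily)
```

Other aggregations such as `.sum()` or `.max()` can be used in place of `.mean()` in the same way.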

Handling non-numeric features

Reference link:
Dummy data: handling dummy variables with get_dummies
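As a rough illustration of dummy-variable encoding, the sketch below applies `pd.get_dummies` to a made-up frame; the `weather` and `temp` columns are hypothetical.

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column
df = pd.DataFrame({
    "weather": ["sunny", "rain", "sunny"],
    "temp": [25, 18, 27],
})

# Expand "weather" into one indicator column per category;
# numeric columns are passed through unchanged
encoded = pd.get_dummies(df, columns=["weather"])

print(encoded)
```

Each category becomes its own `weather_<value>` indicator column, which most numeric models can consume directly.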

Splitting one column into multiple columns

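    # Split "天气状况" (weather) on "/" into "白天天气" (daytime) and "晚上天气" (night-time)
    df_new = pd.DataFrame(df['天气状况'].str.split('/').tolist(),
                          columns=['白天天气', '晚上天气'],
                          index=df.index)  # reuse df's index so the merge lines up row by row
    # Attach the new columns to the original frame by index
    df_merge = pd.merge(df, df_new, how='left', left_index=True, right_index=True)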
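A shorter route to the same result is `str.split` with `expand=True`, which returns the pieces as columns directly. The sketch below uses two made-up rows and assumes every value contains exactly one "/".

```python
import pandas as pd

# Hypothetical data: "daytime/night-time" weather in a single column
df = pd.DataFrame({"天气状况": ["晴/多云", "小雨/阴"]})

# expand=True yields a DataFrame with one column per piece,
# which is assigned straight into two new columns
df[["白天天气", "晚上天气"]] = df["天气状况"].str.split("/", expand=True)

print(df)
```

Because the split result shares the index of `df`, no separate merge step is needed.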