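1. Selecting and printing the rows where a given pandas column is missing
While processing a spreadsheet, I needed to pull out the people whose value in a given column is null so that each of them could be notified individually. Here is the code: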
import pandas as pd

def query():
    df = pd.read_excel("manifest.xlsx")
    # Keep only the rows whose '备注' (remarks) column is missing
    select = df[df['备注'].isnull()]
    # '考生姓名' is the candidate-name column
    data = select["考生姓名"]
    # Print the names, separated by "、"
    if not data.empty:
        for name in data:
            print(name, end="、")
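The same filter can be tried without an Excel file. The following is a minimal sketch that assumes a small in-memory DataFrame: the rows are made up for illustration, and only the column names come from the code above. It joins the names with str.join, which also avoids the trailing separator that print(..., end="、") leaves behind.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the example above.
df = pd.DataFrame({
    "考生姓名": ["Alice", "Bob", "Carol"],
    "备注": ["paid", None, None],
})

# Boolean mask: True where the '备注' column is missing (None/NaN).
missing = df[df["备注"].isnull()]

# Join the names with the same separator, without a trailing "、".
names = "、".join(missing["考生姓名"].astype(str))
print(names)
```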
Dropping rows that have a missing value in a given column
newdf = df[df["colname"].notnull()]
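pandas also offers DataFrame.dropna with a subset argument for this task. The sketch below uses made-up data (the "other" column is hypothetical) and suggests that the boolean-mask form and dropna give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in "colname".
df = pd.DataFrame({"colname": [1.0, np.nan, 3.0], "other": ["a", "b", "c"]})

# Variant 1: keep the rows where "colname" is not null.
by_mask = df[df["colname"].notnull()]

# Variant 2: drop the rows where "colname" is null; other columns are not checked.
by_dropna = df.dropna(subset=["colname"])

print(by_dropna)
```

Both variants keep the original index labels, so call reset_index(drop=True) afterwards if a contiguous index is needed.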
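Reading a dataset whose number of columns is not fixed
# Pass names explicitly, one label per column, where n is the largest number of columns in any row
names = list(range(1, n + 1))
df = pd.read_csv(file, names=names)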
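To see what the names argument does with ragged rows, here is a small sketch that reads from an in-memory string instead of a file. The input text and the value n = 4 are assumptions chosen for illustration.

```python
import io
import pandas as pd

# Hypothetical ragged input: the rows have 2, 4 and 3 fields.
raw = "a,b\nc,d,e,f\ng,h,i\n"

# Assumption: the largest number of fields in any row is known in advance.
n = 4
names = list(range(1, n + 1))

# Rows with fewer than n fields are padded with NaN on the right.
df = pd.read_csv(io.StringIO(raw), header=None, names=names)
print(df)
```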
Replacing specific values in a dataset with another value
df = pd.read_table("a.txt", header=None, sep=" ")
# The file has no header row, so df[1] refers to the second column
labelss = [("Film & Animation", "Animation"), ("News & Politics", "Politics"), ("Autos & Vehicles", "Vehicles"),
           ("People & Blogs", "Blogs"), (" UNA ", "UNA"), ("Gadgets & Games", "Games"), ("Howto & DIY", "DIY"),
           ("Travel & Places", "Travel"), ("Pets & Animals", "Animals")]
for label, label2 in labelss:
    # Select column 1 explicitly; without the column selector the whole matching row is overwritten
    df.loc[df[1] == label, 1] = label2
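The loop can also be expressed as a single Series.replace call with a dictionary. This is a sketch on made-up data, using only two of the pairs listed above; it is an alternative formulation, not necessarily what the original script did.

```python
import pandas as pd

# Hypothetical data without a header, so the columns are labelled 0 and 1.
df = pd.DataFrame({
    0: [10, 20, 30],
    1: ["Film & Animation", "Music", "Pets & Animals"],
})

# Old value -> new value; a subset of the pairs from the list above.
mapping = {"Film & Animation": "Animation", "Pets & Animals": "Animals"}

# replace swaps exact matches and leaves every other value untouched.
df[1] = df[1].replace(mapping)
print(df)
```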
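Time series forecasting
Reading time series data
# Combine the Date and Time columns into one datetime index named 'dt'; treat '?' as missing
data = pd.read_csv("household_power_consumption.txt",
                   sep=";",
                   parse_dates={'dt' : ['Date','Time']},
                   infer_datetime_format=True,
                   na_values=['?'],
                   index_col='dt')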
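Recent pandas releases emit deprecation warnings for infer_datetime_format and for the dictionary form of parse_dates, so it may be worth building the index explicitly. The sketch below does that on two made-up rows; the day/month/year format string and the sample values are assumptions, so check them against the real file.

```python
import io
import pandas as pd

# Made-up rows in a similar layout: ';' separated, '?' marks a missing value.
raw = (
    "Date;Time;Global_active_power\n"
    "16/12/2006;17:24:00;4.216\n"
    "16/12/2006;17:25:00;?\n"
)

data = pd.read_csv(io.StringIO(raw), sep=";", na_values=["?"])

# Build the timestamp from the two text columns (assumed format: day/month/year).
data["dt"] = pd.to_datetime(
    data["Date"] + " " + data["Time"], format="%d/%m/%Y %H:%M:%S"
)

# Drop the source columns and use the timestamp as the index.
data = data.drop(columns=["Date", "Time"]).set_index("dt")
print(data)
```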
Resampling data with df.resample
# Aggregate to daily means; df needs a datetime index
df.resample('D').mean()
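As a small illustration of what the daily resample produces, the sketch below averages four made-up readings taken twelve hours apart. The column name "power" and the values are hypothetical.

```python
import pandas as pd

# Hypothetical readings: two per day over two days.
idx = pd.to_datetime([
    "2006-12-16 00:00", "2006-12-16 12:00",
    "2006-12-17 00:00", "2006-12-17 12:00",
])
df = pd.DataFrame({"power": [1.0, 3.0, 2.0, 6.0]}, index=idx)

# Group the rows into calendar days and take the mean of each day.
daily = df.resample("D").mean()
print(daily)
```

Other aggregations such as sum() or max() can be used in place of mean() in the same way.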
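Handling non-numeric features
Splitting one column into multiple columns
# Split '天气状况' (weather) on '/' into daytime and night-time weather; reuse df's index so the merge lines up
df_new = pd.DataFrame(df['天气状况'].str.split('/').tolist(),
                      columns=['白天天气', '晚上天气'],
                      index=df.index)
df_merge = pd.merge(df, df_new, how='left', left_index=True, right_index=True)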
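A shorter route to the same result is str.split with expand=True, which returns a DataFrame that already shares the original index. The sketch below uses made-up weather values and a deliberately non-default index to show that the rows still line up.

```python
import pandas as pd

# Hypothetical data; the index is non-default on purpose.
df = pd.DataFrame({"天气状况": ["sunny/cloudy", "rain/overcast"]}, index=[10, 11])

# expand=True yields one column per piece and keeps df's index.
parts = df["天气状况"].str.split("/", expand=True)
parts.columns = ["白天天气", "晚上天气"]

# join aligns on the index, so no explicit merge keys are needed.
df_merge = df.join(parts)
print(df_merge)
```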