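import pandas as pd

position = pd.read_csv('position.csv')
company = pd.read_csv('company_sql.csv', encoding='gbk')

# On pandas 2.0+, pass numeric_only=True to mean() if position contains non-numeric columns
position.groupby(by=['city','education']).mean()

How do we extract the average salary for PhD holders ('博士') in Shanghai ('上海')?

1. (1) Using the index
- position.groupby(by=['city','education']).mean().avg selects the avg column and returns a Series.
- A Series can be indexed directly by label, so appending ['上海']['博士'] picks out the value.

position.groupby(by=['city','education']).mean().avg
position.groupby(by=['city','education']).mean().avg['上海']['博士']

(2) Using loc: loc accepts labels for several index levels at once
- When a DataFrame has a MultiIndex and you want the value under one specific index entry, loc is the most convenient tool. To go down level by level, pass several labels to loc.
- Taking a Series first and then looking up labels on it is also convenient.

position.groupby(by=['city','education']).mean().loc['上海']['avg']   # works
position.groupby(by=['city','education']).mean().loc['上海']['博士']  # raises KeyError: after .loc['上海'], ['博士'] is looked up as a column name
position.groupby(by=['city','education']).mean().loc['上海','博士']   # corrected version of the failing line

2. How do we build a MultiIndex without groupby?
- set_index turns columns into the index
- reset_index turns the index back into ordinary columns
- After that, filtering is straightforward

position.set_index(['city','education'])  # builds the MultiIndex, but the rows are not sorted
# Sort first, then set the index
position.sort_values(['city','education']).set_index(['city','education'])
# reset_index() turns the index levels into columns; then filter with query or a boolean mask
position.groupby(by=['city','education']).mean().reset_index()

(1) Using query to find the average salary for Shanghai PhDs (the following forms are equivalent):
- position.groupby(by=['city','education']).mean().reset_index().query("(city=='上海') and (education=='博士')")
- position.groupby(by=['city','education']).mean().reset_index().query("city=='上海'").query("education=='博士'")
- position.groupby(by=['city','education']).mean().reset_index().query("(city=='上海') & (education=='博士')")

position.groupby(by=['city','education']).mean().reset_index().query('city=="上海" and education=="博士"')

(2) Using a boolean mask to find the average salary for Shanghai PhDs:

a = position.groupby(by=['city','education']).mean().reset_index()
a[(a.city=='上海') & (a.education=='博士')]
a[(a.city=='上海') & (a.education=='博士')].avg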
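
The approaches above can be checked side by side on a small table. The sketch below is only an illustration: the DataFrame is a hypothetical stand-in for position.csv with made-up salary values, and it assumes the salary column is named avg as in the notes. Passing numeric_only=True to mean() is harmless here and avoids errors when other text columns are present.

```python
import pandas as pd

# Hypothetical stand-in for position.csv; the values are invented for illustration.
position = pd.DataFrame({
    "city": ["上海", "上海", "上海", "北京", "北京"],
    "education": ["博士", "博士", "本科", "博士", "本科"],
    "avg": [30.0, 40.0, 15.0, 32.0, 18.0],
})

grouped = position.groupby(by=["city", "education"]).mean(numeric_only=True)

# (1) Take the avg column as a Series, then index one level at a time.
via_series = grouped["avg"]["上海"]["博士"]

# (2) Pass both index labels to loc as a tuple, plus the column name.
via_loc = grouped.loc[("上海", "博士"), "avg"]

# Chaining ['博士'] after .loc['上海'] fails, because the result of
# .loc['上海'] is a DataFrame and ['博士'] is treated as a column name.
try:
    grouped.loc["上海"]["博士"]
    chained_ok = True
except KeyError:
    chained_ok = False

# (3) Flatten the index back into columns, then filter.
flat = grouped.reset_index()
via_query = flat.query("city == '上海' and education == '博士'")["avg"].iloc[0]
mask = (flat["city"] == "上海") & (flat["education"] == "博士")
via_mask = flat.loc[mask, "avg"].iloc[0]

print(via_series, via_loc, via_query, via_mask, chained_ok)
```

All four lookups return the same number, so the choice is mostly about readability: loc with a tuple is the shortest when the MultiIndex is kept, while reset_index plus query reads more like a SQL filter.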