The example uses two tables, operator and house, joined on operator.id = house.operator_id.
operator.head()
id email create_time update_time
0 10004 qd3@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28
1 10005 alo@vip.qq.com 2018-01-10 20:56:46 2018-01-11 10:42:22
2 10008 qd3@163.com 2018-02-23 15:01:58 2018-02-23 15:02:38
3 10015 343148@qq.com 2018-01-11 10:46:22 2018-01-11 16:03:43
4 10029 123345@qq.com 2018-02-01 16:51:24 2018-02-01 16:51:24
house.head()
ID operator_id
0 32 10004
1 33 10004
2 34 10004
3 35 10004
4 36 10008
df.merge(): joins on column names
#merge joins on columns; the join type defaults to inner
operator.merge(house,how='inner',left_on='id',right_on='operator_id')
id email create_time update_time ID operator_id
0 10004 qdcs1234@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 32 10004
1 10004 qdcs1233@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 33 10004
2 10004 qdcs123a@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 34 10004
3 10004 qdcs123@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 35 10004
4 10004 qdcs123@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 39 10004
... ... ... ... ... ... ...
2953 10265 asl@123.com 2019-05-20 14:47:35 2019-05-21 11:12:19 3079 10265
2954 10265 asl@123.com 2019-05-20 14:47:35 2019-05-21 11:12:19 3080 10265
2955 10265 asl@123.com 2019-05-20 14:47:35 2019-05-21 11:12:19 3081 10265
2956 10269 NaN 2019-05-31 15:15:10 2019-05-31 15:15:10 3032 10269
2957 10270 NaN 2019-06-03 11:39:52 2019-06-03 11:39:52 3073 10270
2958 rows × 6 columns
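Besides the default inner join, the how parameter also accepts 'left', 'right' and 'outer'. A minimal sketch below, using two made-up toy DataFrames (users and orders are illustrative names and data, not the operator/house tables above), shows which keys survive under each option:
import pandas as pd

#toy tables, for illustration only
users = pd.DataFrame({'id': [1, 2, 3], 'email': ['a@x.com', 'b@x.com', 'c@x.com']})
orders = pd.DataFrame({'user_id': [1, 1, 4], 'amount': [10, 20, 30]})

#inner: only keys present in both tables (id 1)
print(users.merge(orders, how='inner', left_on='id', right_on='user_id'))
#left: every row of users is kept; unmatched order columns become NaN (ids 1, 2, 3)
print(users.merge(orders, how='left', left_on='id', right_on='user_id'))
#outer: keys from either side are kept (ids 1, 2, 3 plus user_id 4)
print(users.merge(orders, how='outer', left_on='id', right_on='user_id'))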
df.join(): joins on the index
#join joins on the index; the join type defaults to left. Not recommended here, because rows are matched by index position rather than by the key columns
operator.join(house)
id email create_time update_time ID operator_id
0 10004 qdcsl2013@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 32 10004
1 10005 a123@vip.qq.com 2018-01-10 20:56:46 2018-01-11 10:42:22 33 10004
2 10008 qdcsl2013@163.com 2018-02-23 15:01:58 2018-02-23 15:02:38 34 10004
3 10015 343123@qq.com 2018-01-11 10:46:22 2018-01-11 16:03:43 35 10004
4 10029 123345@qq.com 2018-02-01 16:51:24 2018-02-01 16:51:24 36 10008
... ... ... ... ... ... ...
229 10266 NaN 2019-05-20 17:41:23 2019-05-28 16:11:14 266 10008
230 10267 NaN 2019-05-20 17:55:52 2019-05-28 16:11:10 267 10008
231 10268 NaN 2019-05-27 14:40:23 2019-05-28 16:11:05 268 10004
232 10269 NaN 2019-05-31 15:15:10 2019-05-31 15:15:10 270 10004
233 10270 NaN 2019-06-03 11:39:52 2019-06-03 11:39:52 271 10004
234 rows × 6 columns
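Note how the result above simply pairs row 0 of operator with row 0 of house, so id and operator_id do not actually correspond. If you do want a key-based join via join(), the usual pattern is to move the keys into the index first. A minimal sketch, assuming the operator and house DataFrames from above are already loaded (this is not output from the source data):
#move the join keys into the index, then join(); how='inner' reproduces the merge() result above
operator.set_index('id').join(house.set_index('operator_id'), how='inner')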
pd.concat(): the equivalent of UNION; columns with the same name are merged into one column
#concat works like MySQL's UNION and is a top-level pd function; by default it stacks frames vertically (rows), while axis=1 concatenates them horizontally (columns)
pd.concat([operator,house],axis=0,sort=True)
ID create_time email id operator_id update_time
0 NaN 2018-02-23 15:01:46 qdc@163.com 10004.0 NaN 2018-02-23 15:02:28
1 NaN 2018-01-10 20:56:46 al@vip.qq.com 10005.0 NaN 2018-01-11 10:42:22
2 NaN 2018-02-23 15:01:58 qd13@163.com 10008.0 NaN 2018-02-23 15:02:38
3 NaN 2018-01-11 10:46:22 3448@qq.com 10015.0 NaN 2018-01-11 16:03:43
4 NaN 2018-02-01 16:51:24 123345@qq.com 10029.0 NaN 2018-02-01 16:51:24
... ... ... ... ... ... ...
2965 3077.0 NaN NaN NaN 10265.0 NaN
2966 3078.0 NaN NaN NaN 10265.0 NaN
2967 3079.0 NaN NaN NaN 10265.0 NaN
2968 3080.0 NaN NaN NaN 10265.0 NaN
2969 3081.0 NaN NaN NaN 10265.0 NaN
3204 rows × 6 columns
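For the axis=1 case mentioned in the comment above, frames are placed side by side and rows are aligned on the index; labels missing from one side become NaN. A minimal sketch with made-up frames (left and right are illustrative names, not part of the case study):
import pandas as pd

left = pd.DataFrame({'a': [1, 2]}, index=[0, 1])
right = pd.DataFrame({'b': [3, 4]}, index=[1, 2])

#axis=1 aligns rows on the index; unmatched labels yield NaN
print(pd.concat([left, right], axis=1))
#     a    b
#0  1.0  NaN
#1  2.0  3.0
#2  NaN  4.0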
concat is most convenient for vertical merging: for example, January, February and March data with identical columns can be concatenated directly into a single "Q1" summary table.
#concat is convenient for stacking vertically, e.g. monthly data with identical columns can be merged into a "Q1" summary table
#January sales data
df1=pd.DataFrame({
    '品类':list('abcde'),
    '销售额':[100,200,300,400,500]
})
#February sales data
df2=pd.DataFrame({
    '品类':list('abcde'),
    '销售额':[100,200,300,400,500]
})
#March sales data
df3=pd.DataFrame({
    '品类':list('qpwor'),
    '销售额':[100,200,300,400,500]
})
pd.concat([df1,df2,df3],sort=True)
品类 销售额
0 a 100
1 b 200
2 c 300
3 d 400
4 e 500
0 a 100
1 b 200
2 c 300
3 d 400
4 e 500
0 q 100
1 p 200
2 w 300
3 o 400
4 r 500
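In the result above the original 0-4 indexes repeat. Two common refinements, sketched here using the df1/df2/df3 frames just defined (the variable names and the 'Jan'/'Feb'/'Mar' labels are illustrative):
#ignore_index=True rebuilds a clean 0..n-1 index for the combined Q1 table
q1 = pd.concat([df1, df2, df3], ignore_index=True, sort=True)

#keys adds an outer index level recording which month each block came from
q1_labeled = pd.concat([df1, df2, df3], keys=['Jan', 'Feb', 'Mar'], sort=True)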
