案例中的2个表,operator和house,连接的字段为operator.id与house.operator_id
operator.head()
id email create_time update_time
0 10004 qd3@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28
1 10005 alo@vip.qq.com 2018-01-10 20:56:46 2018-01-11 10:42:22
2 10008 qd3@163.com 2018-02-23 15:01:58 2018-02-23 15:02:38
3 10015 343148@qq.com 2018-01-11 10:46:22 2018-01-11 16:03:43
4 10029 123345@qq.com 2018-02-01 16:51:24 2018-02-01 16:51:24
house.head()
ID operator_id
0 32 10004
1 33 10004
2 34 10004
3 35 10004
4 36 10008
df.merge(),基于字段列名连接
#merge,按列连接,连接方式默认是inner
operator.merge(house,how='inner',left_on='id',right_on='operator_id')
id email create_time update_time ID operator_id
0 10004 qdcs1234@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 32 10004
1 10004 qdcs1233@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 33 10004
2 10004 qdcs123a@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 34 10004
3 10004 qdcs123@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 35 10004
4 10004 qdcs123@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 39 10004
... ... ... ... ... ... ...
2953 10265 asl@123.com 2019-05-20 14:47:35 2019-05-21 11:12:19 3079 10265
2954 10265 asl@123.com 2019-05-20 14:47:35 2019-05-21 11:12:19 3080 10265
2955 10265 asl@123.com 2019-05-20 14:47:35 2019-05-21 11:12:19 3081 10265
2956 10269 NaN 2019-05-31 15:15:10 2019-05-31 15:15:10 3032 10269
2957 10270 NaN 2019-06-03 11:39:52 2019-06-03 11:39:52 3073 10270
2958 rows × 6 columns
df.join(),基于索引连接
#join,按索引连接,连接方式默认是left,一般不推荐
operator.join(house)
id email create_time update_time ID operator_id
0 10004 qdcsl2013@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 32 10004
1 10005 a123@vip.qq.com 2018-01-10 20:56:46 2018-01-11 10:42:22 33 10004
2 10008 qdcsl2013@163.com 2018-02-23 15:01:58 2018-02-23 15:02:38 34 10004
3 10015 343123@qq.com 2018-01-11 10:46:22 2018-01-11 16:03:43 35 10004
4 10029 123345@qq.com 2018-02-01 16:51:24 2018-02-01 16:51:24 36 10008
... ... ... ... ... ... ...
229 10266 NaN 2019-05-20 17:41:23 2019-05-28 16:11:14 266 10008
230 10267 NaN 2019-05-20 17:55:52 2019-05-28 16:11:10 267 10008
231 10268 NaN 2019-05-27 14:40:23 2019-05-28 16:11:05 268 10004
232 10269 NaN 2019-05-31 15:15:10 2019-05-31 15:15:10 270 10004
233 10270 NaN 2019-06-03 11:39:52 2019-06-03 11:39:52 271 10004
234 rows × 6 columns
pd.concat(),相当于UNION,当字段名一致时会并入一列
#concat,相当于MYSQL的UNION,方法属于pd级别,默认是上下连接,axis=1则是左右连接
pd.concat([operator,house],axis=0,sort=True)
ID create_time email id operator_id update_time
0 NaN 2018-02-23 15:01:46 qdc@163.com 10004.0 NaN 2018-02-23 15:02:28
1 NaN 2018-01-10 20:56:46 al@vip.qq.com 10005.0 NaN 2018-01-11 10:42:22
2 NaN 2018-02-23 15:01:58 qd13@163.com 10008.0 NaN 2018-02-23 15:02:38
3 NaN 2018-01-11 10:46:22 3448@qq.com 10015.0 NaN 2018-01-11 16:03:43
4 NaN 2018-02-01 16:51:24 123345@qq.com 10029.0 NaN 2018-02-01 16:51:24
... ... ... ... ... ... ...
2965 3077.0 NaN NaN NaN 10265.0 NaN
2966 3078.0 NaN NaN NaN 10265.0 NaN
2967 3079.0 NaN NaN NaN 10265.0 NaN
2968 3080.0 NaN NaN NaN 10265.0 NaN
2969 3081.0 NaN NaN NaN 10265.0 NaN
3204 rows × 6 columns
concat用于上下合并会较为方便,比如1、2、3多个月份的数据,字段相同,则可直接合并为”第一季度”的汇总表
#concat用于上下合并会较为方便,比如1、2、3多个月份的数据,字段相同,则可直接合并为"第一季度"的汇总表
#1月销售数据
df1=pd.DataFrame({
'品类':list('abcde'),
'销售额':[100,200,300,400,500]
})
#2月销售数据
df2=pd.DataFrame({
'品类':list('abcde'),
'销售额':[100,200,300,400,500]
})
#3月销售数据
df3=pd.DataFrame({
'品类':list('qpwor'),
'销售额':[100,200,300,400,500]
})
pd.concat([df1,df2,df3],sort=True)
品类 销售额
0 a 100
1 b 200
2 c 300
3 d 400
4 e 500
0 a 100
1 b 200
2 c 300
3 d 400
4 e 500
0 q 100
1 p 200
2 w 300
3 o 400
4 r 500