点击查看【bilibili】

  1. 案例中的2个表,operatorhouse,连接的字段为operator.idhouse.operator_id
  2. operator.head()
  3. id email create_time update_time
  4. 0 10004 qd3@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28
  5. 1 10005 alo@vip.qq.com 2018-01-10 20:56:46 2018-01-11 10:42:22
  6. 2 10008 qd3@163.com 2018-02-23 15:01:58 2018-02-23 15:02:38
  7. 3 10015 343148@qq.com 2018-01-11 10:46:22 2018-01-11 16:03:43
  8. 4 10029 123345@qq.com 2018-02-01 16:51:24 2018-02-01 16:51:24
  9. house.head()
  10. ID operator_id
  11. 0 32 10004
  12. 1 33 10004
  13. 2 34 10004
  14. 3 35 10004
  15. 4 36 10008

df.merge(),基于字段列名连接

  1. #merge,按列连接,连接方式默认是inner
  2. operator.merge(house,how='inner',left_on='id',right_on='operator_id')
  3. id email create_time update_time ID operator_id
  4. 0 10004 qdcs1234@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 32 10004
  5. 1 10004 qdcs1233@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 33 10004
  6. 2 10004 qdcs123a@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 34 10004
  7. 3 10004 qdcs123@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 35 10004
  8. 4 10004 qdcs123@163.com 2018-02-23 15:01:46 2018-02-23 15:02:28 39 10004
  9. ... ... ... ... ... ... ...
  10. 2953 10265 asl@123.com 2019-05-20 14:47:35 2019-05-21 11:12:19 3079 10265
  11. 2954 10265 asl@123.com 2019-05-20 14:47:35 2019-05-21 11:12:19 3080 10265
  12. 2955 10265 asl@123.com 2019-05-20 14:47:35 2019-05-21 11:12:19 3081 10265
  13. 2956 10269 NaN 2019-05-31 15:15:10 2019-05-31 15:15:10 3032 10269
  14. 2957 10270 NaN 2019-06-03 11:39:52 2019-06-03 11:39:52 3073 10270
  15. 2958 rows × 6 columns

df.join(),基于索引连接

#join,按索引连接,连接方式默认是left,一般不推荐
operator.join(house)

    id        email                create_time            update_time            ID    operator_id
0    10004    qdcsl2013@163.com    2018-02-23 15:01:46    2018-02-23 15:02:28    32    10004
1    10005    a123@vip.qq.com        2018-01-10 20:56:46    2018-01-11 10:42:22    33    10004
2    10008    qdcsl2013@163.com    2018-02-23 15:01:58    2018-02-23 15:02:38    34    10004
3    10015    343123@qq.com        2018-01-11 10:46:22    2018-01-11 16:03:43    35    10004
4    10029    123345@qq.com        2018-02-01 16:51:24    2018-02-01 16:51:24    36    10008
...    ...    ...    ...    ...    ...    ...
229    10266    NaN                    2019-05-20 17:41:23    2019-05-28 16:11:14    266    10008
230    10267    NaN                    2019-05-20 17:55:52    2019-05-28 16:11:10    267    10008
231    10268    NaN                    2019-05-27 14:40:23    2019-05-28 16:11:05    268    10004
232    10269    NaN                    2019-05-31 15:15:10    2019-05-31 15:15:10    270    10004
233    10270    NaN                    2019-06-03 11:39:52    2019-06-03 11:39:52    271    10004
234 rows × 6 columns

pd.concat(),相当于UNION,当字段名一致时会并入一列

#concat,相当于MYSQL的UNION,方法属于pd级别,默认是上下连接,axis=1则是左右连接
pd.concat([operator,house],axis=0,sort=True)

    ID    create_time            email                id        operator_id    update_time
0    NaN    2018-02-23 15:01:46    qdc@163.com        10004.0    NaN    2018-02-23 15:02:28
1    NaN    2018-01-10 20:56:46    al@vip.qq.com    10005.0    NaN    2018-01-11 10:42:22
2    NaN    2018-02-23 15:01:58    qd13@163.com    10008.0    NaN    2018-02-23 15:02:38
3    NaN    2018-01-11 10:46:22    3448@qq.com        10015.0    NaN    2018-01-11 16:03:43
4    NaN    2018-02-01 16:51:24    123345@qq.com    10029.0    NaN    2018-02-01 16:51:24
...    ...    ...    ...    ...    ...    ...
2965    3077.0    NaN    NaN    NaN    10265.0    NaN
2966    3078.0    NaN    NaN    NaN    10265.0    NaN
2967    3079.0    NaN    NaN    NaN    10265.0    NaN
2968    3080.0    NaN    NaN    NaN    10265.0    NaN
2969    3081.0    NaN    NaN    NaN    10265.0    NaN
3204 rows × 6 columns

concat用于上下合并会较为方便,比如1、2、3多个月份的数据,字段相同,则可直接合并为”第一季度”的汇总表

#concat用于上下合并会较为方便,比如1、2、3多个月份的数据,字段相同,则可直接合并为"第一季度"的汇总表
#1月销售数据
df1=pd.DataFrame({
    '品类':list('abcde'),
    '销售额':[100,200,300,400,500]
})
#2月销售数据
df2=pd.DataFrame({
    '品类':list('abcde'),
    '销售额':[100,200,300,400,500]
})
#3月销售数据
df3=pd.DataFrame({
    '品类':list('qpwor'),
    '销售额':[100,200,300,400,500]
})

pd.concat([df1,df2,df3],sort=True)

    品类    销售额
0    a    100
1    b    200
2    c    300
3    d    400
4    e    500
0    a    100
1    b    200
2    c    300
3    d    400
4    e    500
0    q    100
1    p    200
2    w    300
3    o    400
4    r    500