DataFrame.merge
DataFrame.merge(right, how=’inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=(‘_x’, ‘_y’), copy=True, indicator=False, validate=None)
使用数据库样式的连接合并DataFrame或命名Series对象。
Parameters
right | 要合并的对象。 |
---|---|
how | 要执行的合并方式: - left:类似于SQL的左外连接 - right:类似于SQL的右外连接 - outer:类似于SQL的全外连接 - inner:类似于SQL的内连接 - cross:创建笛卡尔积 |
on | 要连接的列或索引级别名称。 |
left_on | 要在左DataFrame中联接的列或索引级别名称 |
right_on | 要在右DataFrame中联接的列或索引级别名称 |
left_index | 使用左DataFrame的索引作为连接键 |
right_index | 使用右DataFrame的索引作为连接键 |
sort | 在结果中按字典顺序排序连接键。 |
suffixes | 长度为2的序列,其中每个元素都是可选的字符串,指示要分别添加到左侧和右侧重叠列名的后缀。 |
copy | False:尽可能避免复制 |
indicator | 如果为True,则在输出DataFrame中添加一个名为“_merge”的列,其中包含关于每一行的源的信息。 对于合并键只出现在左边DataFrame的值为“left_only”; 对于合并键只出现在右边DataFrame的值为“right_only”; 对于合并键出现在左右两边DataFrame的值为“both”; |
validate | 检查merge是否属于指定的类型: - “one_to_one” or “1:1”:检查合并键是否在左右数据集都是唯一的; - “one_to_many” or “1:m”:检查合并键在左数据集是否唯一; - “many_to_one” or “m:1”:检查合并键在右数据集是否唯一; - “many_to_many” or “m:m”:允许,但不导致检查 |
举例:内连接
import pandas as pd
df1 = pd.DataFrame({'key':['google', 'baidu', 'wiki'],
'age':[18, 39, 22]})
df2 = pd.DataFrame({'key':['google', 'baidu', 'safa'],
'size':[11, 44, 55]})
df1.merge(df2, how='inner', on='key')
------------------------------------------------------------
key age size
0 google 18 11
1 baidu 39 44
举例:左连接
import pandas as pd
df1 = pd.DataFrame({'key':['google', 'baidu', 'wiki'],
'age':[18, 39, 22]})
df2 = pd.DataFrame({'key':['google', 'baidu', 'safa'],
'size':[11, 44, 55]})
df1.merge(df2, how='left', on='key')
-----------------------------------------------------------------------
key age size
0 google 18 11.0
1 baidu 39 44.0
2 wiki 22 NaN
举例:右连接
import pandas as pd
df1 = pd.DataFrame({'key':['google', 'baidu', 'wiki'],
'age':[18, 39, 22]})
df2 = pd.DataFrame({'key':['google', 'baidu', 'safa'],
'size':[11, 44, 55]})
df1.merge(df2, how='right', on='key')
--------------------------------------------------------
key age size
0 google 18.0 11
1 baidu 39.0 44
2 safa NaN 55
举例:外连接
import pandas as pd
df1 = pd.DataFrame({'key':['google', 'baidu', 'wiki'],
'age':[18, 39, 22]})
df2 = pd.DataFrame({'key':['google', 'baidu', 'safa'],
'size':[11, 44, 55]})
df1.merge(df2, how='outer', on='key')
-----------------------------------------------------------
key age size
0 google 18.0 11.0
1 baidu 39.0 44.0
2 wiki 22.0 NaN
3 safa NaN 55.0
举例:外连接 + indicator=True
import pandas as pd
df1 = pd.DataFrame({'key':['google', 'baidu', 'wiki'],
'age':[18, 39, 22]})
df2 = pd.DataFrame({'key':['google', 'baidu', 'safa'],
'size':[11, 44, 55]})
df1.merge(df2, how='outer', on='key', indicator=True)
-------------------------------------------------------------
key age size _merge
0 google 18.0 11.0 both
1 baidu 39.0 44.0 both
2 wiki 22.0 NaN left_only
3 safa NaN 55.0 right_only
举例:笛卡尔积
import pandas as pd
df1 = pd.DataFrame({"site":["google", "baidu", "wiki", "pandas"]})
df2 = pd.DataFrame({"plant": ["P1", "P2", "P3", "P4"]})
df1["temp"] = 1
df2["temp"] = 1
df = df1.merge(df2, how="outer", on="temp")
df = df.drop(columns=["temp"])
df
---------------------------------------------------------
site plant
0 google P1
1 google P2
2 google P3
3 google P4
4 baidu P1
5 baidu P2
6 baidu P3
7 baidu P4
8 wiki P1
9 wiki P2
10 wiki P3
11 wiki P4
12 pandas P1
13 pandas P2
14 pandas P3
15 pandas P4