DataFrame.merge
DataFrame.merge(right, how=’inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=(‘_x’, ‘_y’), copy=True, indicator=False, validate=None)
使用数据库样式的连接合并DataFrame或命名Series对象。
Parameters
| right | 要合并的对象。 |
|---|---|
| how | 要执行的合并方式: - left:类似于SQL的左外连接 - right:类似于SQL的右外连接 - outer:类似于SQL的全外连接 - inner:类似于SQL的内连接 - cross:创建笛卡尔积 |
| on | 要连接的列或索引级别名称。 |
| left_on | 要在左DataFrame中联接的列或索引级别名称 |
| right_on | 要在右DataFrame中联接的列或索引级别名称 |
| left_index | 使用左DataFrame的索引作为连接键 |
| right_index | 使用右DataFrame的索引作为连接键 |
| sort | 在结果中按字典顺序排序连接键。 |
| suffixes | 长度为2的序列,其中每个元素都是可选的字符串,指示要分别添加到左侧和右侧重叠列名的后缀。 |
| copy | False:尽可能避免复制 |
| indicator | 如果为True,则在输出DataFrame中添加一个名为“_merge”的列,其中包含关于每一行的源的信息。 对于合并键只出现在左边DataFrame的值为“left_only”; 对于合并键只出现在右边DataFrame的值为“right_only”; 对于合并键出现在左右两边DataFrame的值为“both”; |
| validate | 检查merge是否属于指定的类型: - “one_to_one” or “1:1”:检查合并键是否在左右数据集都是唯一的; - “one_to_many” or “1:m”:检查合并键在左数据集是否唯一; - “many_to_one” or “m:1”:检查合并键在右数据集是否唯一; - “many_to_many” or “m:m”:允许,但不导致检查 |
举例:内连接
import pandas as pddf1 = pd.DataFrame({'key':['google', 'baidu', 'wiki'],'age':[18, 39, 22]})df2 = pd.DataFrame({'key':['google', 'baidu', 'safa'],'size':[11, 44, 55]})df1.merge(df2, how='inner', on='key')------------------------------------------------------------key age size0 google 18 111 baidu 39 44
举例:左连接
import pandas as pddf1 = pd.DataFrame({'key':['google', 'baidu', 'wiki'],'age':[18, 39, 22]})df2 = pd.DataFrame({'key':['google', 'baidu', 'safa'],'size':[11, 44, 55]})df1.merge(df2, how='left', on='key')-----------------------------------------------------------------------key age size0 google 18 11.01 baidu 39 44.02 wiki 22 NaN
举例:右连接
import pandas as pddf1 = pd.DataFrame({'key':['google', 'baidu', 'wiki'],'age':[18, 39, 22]})df2 = pd.DataFrame({'key':['google', 'baidu', 'safa'],'size':[11, 44, 55]})df1.merge(df2, how='right', on='key')--------------------------------------------------------key age size0 google 18.0 111 baidu 39.0 442 safa NaN 55
举例:外连接
import pandas as pddf1 = pd.DataFrame({'key':['google', 'baidu', 'wiki'],'age':[18, 39, 22]})df2 = pd.DataFrame({'key':['google', 'baidu', 'safa'],'size':[11, 44, 55]})df1.merge(df2, how='outer', on='key')-----------------------------------------------------------key age size0 google 18.0 11.01 baidu 39.0 44.02 wiki 22.0 NaN3 safa NaN 55.0
举例:外连接 + indicator=True
import pandas as pddf1 = pd.DataFrame({'key':['google', 'baidu', 'wiki'],'age':[18, 39, 22]})df2 = pd.DataFrame({'key':['google', 'baidu', 'safa'],'size':[11, 44, 55]})df1.merge(df2, how='outer', on='key', indicator=True)-------------------------------------------------------------key age size _merge0 google 18.0 11.0 both1 baidu 39.0 44.0 both2 wiki 22.0 NaN left_only3 safa NaN 55.0 right_only
举例:笛卡尔积
import pandas as pddf1 = pd.DataFrame({"site":["google", "baidu", "wiki", "pandas"]})df2 = pd.DataFrame({"plant": ["P1", "P2", "P3", "P4"]})df1["temp"] = 1df2["temp"] = 1df = df1.merge(df2, how="outer", on="temp")df = df.drop(columns=["temp"])df---------------------------------------------------------site plant0 google P11 google P22 google P33 google P44 baidu P15 baidu P26 baidu P37 baidu P48 wiki P19 wiki P210 wiki P311 wiki P412 pandas P113 pandas P214 pandas P315 pandas P4
