8. Joining Multiple Tables - 《pandas》

1.merge
2.join
3.concat
4.append()

1.merge

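merge supports four join types through the how parameter: 'left', 'right', 'outer' and 'inner'. The default is how='inner'.
indicator=True adds a column named _merge that records where each row came from ('left_only', 'right_only' or 'both'). Passing a string instead, e.g. indicator='indicator_column', sets the name of that column.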
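A minimal sketch of how the four join types and indicator behave, using two small hypothetical in-memory tables (the names students and scores and their values are illustrative only, not the workbook used below):

```python
import pandas as pd

# Two small hypothetical tables that share an ID key column
students = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Ann', 'Bob', 'Cid']})
scores = pd.DataFrame({'ID': [2, 3, 4], 'Score': [80, 90, 70]})

# Number of rows each join type keeps
for how in ['inner', 'left', 'right', 'outer']:
    print(how, len(students.merge(scores, how=how, on='ID')))

# indicator='source' adds a column recording where each row came from
merged = students.merge(scores, how='outer', on='ID', indicator='source')
print(merged)
```

inner keeps only IDs 2 and 3, left keeps IDs 1 to 3, right keeps IDs 2 to 4, and outer keeps all four IDs, filling the missing Name or Score cells with NaN.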

[Figure 1]

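import pandas as pd
students = pd.read_excel('tmp1/Student_Score.xlsx', sheet_name='Students')
scores = pd.read_excel('tmp1/Student_Score.xlsx', sheet_name='Scores')
tables = students.merge(scores, how='left', on='ID').fillna(0)
# how defaults to 'inner' (an inner join): rows whose key has no match in the other table are dropped
# if the key columns have different names in the two tables, use left_on= and right_on= instead of on=
# how='left' keeps every row of the left table (students), whether or not a matching score exists
# fillna(0) replaces the NaN values left in unmatched rows with 0
tables['Score'] = tables['Score'].astype(int)  # the NaN values made Score a float column; convert it back to int
print(tables)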
"""
    ID         Name  Score
0    1  Student_001     81
1    3  Student_003     83
2    5  Student_005     85
3    7  Student_007     87
4    9  Student_009     89
5   11  Student_011     91
6   13  Student_013     93
7   15  Student_015     95
8   17  Student_017     97
9   19  Student_019     99
10  21  Student_021      0
11  23  Student_023      0
12  25  Student_025      0
13  27  Student_027      0
14  29  Student_029      0
15  31  Student_031      0
16  33  Student_033      0
17  35  Student_035      0
18  37  Student_037      0
19  39  Student_039      0
"""
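The comment about left_on= and right_on= can be illustrated with a short sketch. It assumes a hypothetical scores table whose key column is called StudentID rather than ID; the data is made up for illustration:

```python
import pandas as pd

# Hypothetical tables whose key columns have different names
students = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Ann', 'Bob', 'Cid']})
scores = pd.DataFrame({'StudentID': [1, 3], 'Score': [81, 83]})

# left_on / right_on name the key column in each table
tables = students.merge(scores, how='left', left_on='ID', right_on='StudentID')
# Both key columns are kept in the result; drop the redundant one
tables = tables.drop(columns='StudentID')
# Fill the missing score with 0 and restore an integer dtype
tables['Score'] = tables['Score'].fillna(0).astype(int)
print(tables)
```

Unlike on=, left_on/right_on keep both key columns, which is why the sketch drops StudentID afterwards.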

2.join

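join matches the two tables on their index by default. join has an on parameter, which names a column of the calling DataFrame to match against the other table's index, but it has no left_on or right_on parameters.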

import pandas as pd
students = pd.read_excel('tmp1/Student_Score.xlsx', sheet_name='Students', index_col='ID')
scores = pd.read_excel('tmp1/Student_Score.xlsx', sheet_name='Scores', index_col='ID')
tables = students.join(scores, how='left').fillna(0)
# unlike merge, join defaults to how='left', so how='left' is optional here
tables['Score'] = tables['Score'].astype(int)  # convert Score back to int after filling NaN
print(tables)
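A minimal sketch of the two ways join matches rows, index to index and column to index via on=. The tables here are hypothetical stand-ins indexed by ID, in the same spirit as the index_col='ID' example above; roster and Class are illustrative names only:

```python
import pandas as pd

# Hypothetical tables indexed by ID
students = pd.DataFrame({'Name': ['Ann', 'Bob', 'Cid']}, index=pd.Index([1, 2, 3], name='ID'))
scores = pd.DataFrame({'Score': [81, 83]}, index=pd.Index([1, 3], name='ID'))

# With no arguments, join matches index to index and defaults to how='left'
by_index = students.join(scores)

# on= names a column of the caller that is matched against the other table's index
roster = pd.DataFrame({'ID': [3, 1], 'Class': ['A', 'B']})
by_column = roster.join(scores, on='ID')

print(by_index)
print(by_column)
```

In by_index, ID 2 has no score and gets NaN; in by_column, every ID in roster is found in the index of scores.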

3.concat

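concat joins tables end to end, stacking them top to bottom (axis=0) by default, and can combine data along either axis.
Parameters:
objs: a sequence (such as a list) of Series or DataFrame objects (older pandas versions also accepted Panel, which has since been removed)
axis: the axis to concatenate along; 0 stacks rows (the default), 1 places the objects side by side as columns
join: how to handle the other axis, 'inner' or 'outer'. The default is join='outer'. When stacking rows, columns are matched by name: columns the tables share are stacked together, columns found in only one table are kept as separate columns, and cells with no value are filled with NaN.

join='inner' keeps only the columns common to all the tables and discards the rest.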
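The join and axis parameters described above can be compared side by side in a small sketch. The tables a and b are hypothetical and deliberately have one column each that the other lacks:

```python
import pandas as pd

# Hypothetical tables that share ID and Name but differ in their last column
a = pd.DataFrame({'ID': [1, 2], 'Name': ['Ann', 'Bob'], 'Score': [90, 85]})
b = pd.DataFrame({'ID': [3], 'Name': ['Cid'], 'Age': [20]})

# join='outer' (default): union of the columns, missing cells become NaN
outer = pd.concat([a, b])
# join='inner': only the columns present in both tables are kept
inner = pd.concat([a, b], join='inner')
# axis=1: place the tables side by side, aligning rows on the index
side = pd.concat([a, b], axis=1)

print(outer)
print(inner)
print(side.shape)
```

With axis=1 the rows are aligned on index labels, so b's single row lines up with a's row 0 and a's row 1 gets NaN in b's columns.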

Sample workbook: Students16.xlsx (sheets Page_001 and Page_002)

import pandas as pd
page_001 = pd.read_excel('tmp1/Students16.xlsx', sheet_name='Page_001')
page_002 = pd.read_excel('tmp1/Students16.xlsx', sheet_name='Page_002')
# concat stacks the two tables top to bottom (axis=0) by default
# reset_index() renumbers the rows; drop=True discards the old index instead of keeping it as a column
students = pd.concat([page_001, page_002]).reset_index(drop=True)
print(students)
"""
    ID         Name  Score
0    1  Student_001     90
1    2  Student_002     90
2    3  Student_003     90
3    4  Student_004     90
4    5  Student_005     90
5    6  Student_006     90
6    7  Student_007     90
7    8  Student_008     90
8    9  Student_009     90
9   10  Student_010     90
10  11  Student_011     90
11  12  Student_012     90
12  13  Student_013     90
13  14  Student_014     90
14  15  Student_015     90
15  16  Student_016     90
16  17  Student_017     90
17  18  Student_018     90
18  19  Student_019     90
19  20  Student_020     90
20  21  Student_021     80
21  22  Student_022     80
22  23  Student_023     80
23  24  Student_024     80
24  25  Student_025     80
25  26  Student_026     80
26  27  Student_027     80
27  28  Student_028     80
28  29  Student_029     80
29  30  Student_030     80
30  31  Student_031     80
31  32  Student_032     80
32  33  Student_033     80
33  34  Student_034     80
34  35  Student_035     80
35  36  Student_036     80
36  37  Student_037     80
37  38  Student_038     80
38  39  Student_039     80
39  40  Student_040     80
"""
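Why reset_index(drop=True) is needed: each sheet arrives with its own 0-based index, so the stacked result repeats labels. A minimal sketch with two hypothetical mini pages; it also shows concat's ignore_index=True, which produces the same renumbered result in one step:

```python
import pandas as pd

# Two hypothetical pages, each with its own 0-based index
page_001 = pd.DataFrame({'ID': [1, 2], 'Score': [90, 90]})
page_002 = pd.DataFrame({'ID': [3, 4], 'Score': [80, 80]})

stacked = pd.concat([page_001, page_002])
print(stacked.index.tolist())  # [0, 1, 0, 1]: the original labels repeat

# Renumber the rows, discarding the old index
renumbered = stacked.reset_index(drop=True)
# ignore_index=True builds the same 0..n-1 index during concatenation
direct = pd.concat([page_001, page_002], ignore_index=True)
print(renumbered.equals(direct))
```

Without drop=True, reset_index would keep the old labels as a new column named index.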

4.append()

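append is a method of Series and DataFrame that concatenates along axis 0: it adds the rows of the other object to the end and aligns columns by name. DataFrame.append and Series.append were deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat instead.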

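# data1.append(data2)  # returns a new frame with the rows of data2 added after those of data1; columns are aligned by name, and a column missing from either frame is filled with NaN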
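Since append no longer exists in pandas 2.x, a minimal sketch of the pd.concat equivalent may help. data1 and data2 reuse the names from the comment above, with hypothetical contents; data2 deliberately has an extra Age column to show that the column counts do not have to match:

```python
import pandas as pd

# Hypothetical frames; data2 has a column that data1 lacks
data1 = pd.DataFrame({'ID': [1, 2], 'Score': [90, 85]})
data2 = pd.DataFrame({'ID': [3], 'Score': [70], 'Age': [20]})

# Equivalent of data1.append(data2, ignore_index=True); also works in pandas 2.x
result = pd.concat([data1, data2], ignore_index=True)
print(result)
```

The rows from data1 get NaN in the Age column, exactly as append would have produced.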

Finding the extreme scores (the lowest and the highest score):

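sorted_scores = data.sort_values('Score')  # data: a DataFrame with ID, Name, Age and Score columns (not defined in this section)
print(sorted_scores.head(1).append(sorted_scores.tail(1)))  # pandas < 2.0 only
# pandas >= 2.0: print(pd.concat([sorted_scores.head(1), sorted_scores.tail(1)]))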
"""
    ID         Name  Age  Score
10  11  Student_011   22     50
2    3  Student_003   33    100
"""
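A self-contained sketch of the same lowest/highest lookup that runs on pandas 2.x. Because data is not defined in this section, the frame below is a hypothetical stand-in with only three rows and no Age column. It also shows an alternative that swaps sorting for idxmin/idxmax, which look up the row labels of the minimum and maximum directly:

```python
import pandas as pd

# Hypothetical stand-in for `data`
data = pd.DataFrame({'ID': [3, 7, 11],
                     'Name': ['Student_003', 'Student_007', 'Student_011'],
                     'Score': [100, 75, 50]})

sorted_scores = data.sort_values('Score')
# pandas 2.x replacement for head(1).append(tail(1))
extremes = pd.concat([sorted_scores.head(1), sorted_scores.tail(1)])

# Alternative without sorting: select the rows holding the min and max Score
extremes_alt = data.loc[[data['Score'].idxmin(), data['Score'].idxmax()]]

print(extremes)
print(extremes_alt)
```

If several rows share the lowest or highest score, both approaches return only one of them.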