DataFrame.drop_duplicates
DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
Returns a DataFrame with duplicate rows removed.
Index labels, including time-based indexes, are ignored when rows are compared.
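As a minimal sketch of what ignoring the index means in practice, the snippet below builds a small frame in which two rows hold the same values under different index labels. The data and the labels "a", "b", "c" are made up for illustration.

```python
import pandas as pd

# Rows "a" and "b" hold identical values but carry different index labels.
df = pd.DataFrame(
    {"site": ["google", "google", "pandas"], "age": [18, 18, 45]},
    index=["a", "b", "c"],
)

# Index labels are not compared, so row "b" is treated as a duplicate of "a".
deduped = df.drop_duplicates()
print(deduped.index.tolist())
```

Only the column values decide whether two rows are duplicates; the surviving rows keep their original labels.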
Parameters
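Parameter | Description |
---|---|
subset | Column label or sequence of labels to consider when identifying duplicates; by default, all columns are used |
keep | - 'first': drop duplicates except for the first occurrence; - 'last': drop duplicates except for the last occurrence; - False: drop all duplicates |
inplace | False: return a copy; True: perform the operation in place and return None |
ignore_index | True: the resulting index is relabeled 0, 1, 2, …, n-1 |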
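The examples on this page do not exercise `inplace` or `ignore_index`, so here is a short sketch of both, reusing the same sample data. It assumes a pandas version that supports `ignore_index` (1.0 or later); the variable names `relabeled` and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"site": ["google", "google", "google", "pandas"],
                   "age": [18, 39, 22, 45]})

# ignore_index=True relabels the surviving rows 0..n-1 instead of keeping 0 and 3.
relabeled = df.drop_duplicates(subset=["site"], ignore_index=True)
print(relabeled.index.tolist())

# inplace=True modifies df itself; the call returns None.
result = df.drop_duplicates(subset=["site"], inplace=True)
print(result is None, len(df))
```

Because the in-place call returns None, assigning its result back to `df` would discard the frame; use either the returned copy or `inplace=True`, not both.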
Example
import pandas as pd
df = pd.DataFrame({'site':['google', 'google', 'google', 'pandas'],
'age':[18, 39, 22, 45]})
df.drop_duplicates(subset=['site'])
----------------------------
     site  age
0  google   18
3  pandas   45
Example
import pandas as pd
df = pd.DataFrame({'site':['google', 'google', 'google', 'pandas'],
'age':[18, 39, 22, 45]})
df.drop_duplicates(subset=['site'], keep='last')
--------------------------------------
     site  age
2  google   22
3  pandas   45
Example
import pandas as pd
df = pd.DataFrame({'site':['google', 'google', 'google', 'pandas'],
'age':[18, 39, 22, 45]})
df.drop_duplicates(subset=['site'], keep=False)
-------------------------------------
     site  age
3  pandas   45
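All three examples pass `subset=['site']`. For contrast, the following sketch shows what happens with the default `subset`, where every column is compared, and with a single column label passed as a plain string. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"site": ["google", "google", "google", "pandas"],
                   "age": [18, 39, 22, 45]})

# Default subset: all columns are compared. No two rows match on both
# site and age, so no row is dropped.
all_cols = df.drop_duplicates()

# Listing every column explicitly gives the same result as the default.
both = df.drop_duplicates(subset=["site", "age"])

# A single column label may also be passed as a string instead of a list.
by_site = df.drop_duplicates(subset="site")

print(len(all_cols), len(both), len(by_site))
```

With this data, deduplicating on all columns keeps every row, while deduplicating on `site` alone collapses the three `google` rows into one.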