DataFrame.drop_duplicates
DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
Returns a DataFrame with duplicate rows removed.
Indexes, including time indexes, are ignored when identifying duplicates.
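As a small illustration of the index being ignored, the sketch below builds a DataFrame with a time index in which two rows hold the same values under different index labels. The dates and the variable names `idx` and `deduped` are made up for this example.

```python
import pandas as pd

# Hypothetical data: rows 1 and 2 have identical values but different dates.
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"])
df = pd.DataFrame(
    {"site": ["google", "google", "pandas"], "age": [18, 18, 45]},
    index=idx,
)

# Only column values are compared, so the 2024-01-02 row counts as a duplicate.
deduped = df.drop_duplicates()
print(deduped)
```

The surviving rows keep their original index labels, so the result is indexed by 2024-01-01 and 2024-01-03.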
Parameters
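| Parameter | Description |
|---|---|
| subset | Only consider certain columns when identifying duplicates; by default all columns are used |
| keep | first: drop duplicates except for the first occurrence; last: drop duplicates except for the last occurrence; False: drop all duplicates |
| inplace | False: return a copy; True: perform the operation in place and return None |
| ignore_index | True: the resulting index is reset to 0, 1, 2, …, n-1 |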
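The `ignore_index` and `inplace` parameters can be sketched as follows. This assumes a pandas version that supports `ignore_index` (1.0 or later, as far as I know); the names `relabeled`, `work`, and `result` are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "site": ["google", "google", "google", "pandas"],
    "age": [18, 39, 22, 45],
})

# ignore_index=True relabels the surviving rows 0, 1, ..., n-1.
relabeled = df.drop_duplicates(subset=["site"], ignore_index=True)
print(relabeled.index.tolist())

# inplace=True modifies the DataFrame itself and returns None.
work = df.copy()  # work on a copy so df stays intact
result = work.drop_duplicates(subset=["site"], keep="last", inplace=True)
print(result)
print(work.index.tolist())
```

With `inplace=True` the original index labels are kept unless `ignore_index=True` is passed as well.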
Example
    import pandas as pd

    df = pd.DataFrame({'site': ['google', 'google', 'google', 'pandas'],
                       'age': [18, 39, 22, 45]})
    df.drop_duplicates(subset=['site'])
    ----------------------------
         site  age
    0  google   18
    3  pandas   45
Example
    import pandas as pd

    df = pd.DataFrame({'site': ['google', 'google', 'google', 'pandas'],
                       'age': [18, 39, 22, 45]})
    df.drop_duplicates(subset=['site'], keep='last')
    --------------------------------------
         site  age
    2  google   22
    3  pandas   45
Example
    import pandas as pd

    df = pd.DataFrame({'site': ['google', 'google', 'google', 'pandas'],
                       'age': [18, 39, 22, 45]})
    df.drop_duplicates(subset=['site'], keep=False)
    -------------------------------------
         site  age
    3  pandas   45
