DataFrame.drop_duplicates

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
Return a DataFrame with duplicate rows removed.
Index values, including time indexes, are ignored when identifying duplicates.

Parameters

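subset Only consider certain columns when identifying duplicates; by default all columns are used
keep Determines which duplicates (if any) to keep; default 'first'
- 'first': drop duplicates except for the first occurrence;
- 'last': drop duplicates except for the last occurrence;
- False: drop all duplicates
inplace False (default): return a copy; True: perform the operation in place and return None
ignore_index False (default): keep the original index labels; True: relabel the resulting index 0, 1, 2, …, n-1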
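
The existing examples on this page only exercise subset and keep. As a minimal sketch of the other two parameters, the snippet below reuses the same sample frame; the variable name relabeled is chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'site': ['google', 'google', 'google', 'pandas'],
                   'age': [18, 39, 22, 45]})

# ignore_index=True relabels the surviving rows 0, 1, ..., n-1
# instead of keeping their original labels (0 and 3 here).
relabeled = df.drop_duplicates(subset=['site'], ignore_index=True)
print(relabeled.index.tolist())   # [0, 1]

# inplace=True modifies df itself instead of handing back a new DataFrame,
# so the call is used as a statement rather than assigned.
df.drop_duplicates(subset=['site'], inplace=True)
print(df.index.tolist())          # [0, 3]
```

Without inplace=True the original df is left untouched, which is usually the safer choice when the unfiltered data is still needed later.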

Example

  import pandas as pd
  df = pd.DataFrame({'site': ['google', 'google', 'google', 'pandas'],
                     'age': [18, 39, 22, 45]})
  # keep='first' (the default): keep the first row for each distinct 'site'
  print(df.drop_duplicates(subset=['site']))
  ----------------------------
       site  age
  0  google   18
  3  pandas   45

Example

  import pandas as pd
  df = pd.DataFrame({'site': ['google', 'google', 'google', 'pandas'],
                     'age': [18, 39, 22, 45]})
  # keep='last': keep the last row for each distinct 'site'
  print(df.drop_duplicates(subset=['site'], keep='last'))
  --------------------------------------
       site  age
  2  google   22
  3  pandas   45

Example

  import pandas as pd
  df = pd.DataFrame({'site': ['google', 'google', 'google', 'pandas'],
                     'age': [18, 39, 22, 45]})
  # keep=False: drop every row whose 'site' value occurs more than once
  print(df.drop_duplicates(subset=['site'], keep=False))
  -------------------------------------
       site  age
  3  pandas   45
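
The examples above all pass subset. As a small sketch of the default behaviour described at the top (all columns compared, index ignored), the frame below is made up for illustration: rows 'a' and 'c' hold identical values and differ only in their index labels.

```python
import pandas as pd

# Rows 'a' and 'c' carry the same values; only their index labels differ.
df = pd.DataFrame({'site': ['google', 'pandas', 'google'],
                   'age': [18, 45, 18]},
                  index=['a', 'b', 'c'])

# With no subset, every column is compared and index labels are ignored,
# so row 'c' counts as a duplicate of row 'a' and is dropped.
deduped = df.drop_duplicates()
print(deduped.index.tolist())   # ['a', 'b']
```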