Students_Duplicates.xlsx
1. Removing duplicate rows
import pandas as pd

# Read the sample workbook (raw string so the Windows backslash is not treated as an escape)
students = pd.read_excel(r'tmp1\Students_Duplicates.xlsx')
# Remove duplicate rows, judging duplicates by the Name column only.
# keep='first' (the default) keeps the first row of each group of duplicates;
# keep='last' keeps the last row of each group instead.
students.drop_duplicates(subset='Name', inplace=True, keep='first')
print(students)
"""
    ID         Name  Test_1  Test_2  Test_3
0    1  Student_001      62      86      83
1    2  Student_002      77      97      78
2    3  Student_003      57      96      46
3    4  Student_004      57      87      80
4    5  Student_005      95      59      87
5    6  Student_006      56      97      61
6    7  Student_007      64      91      67
7    8  Student_008      96      70      48
8    9  Student_009      77      73      48
9   10  Student_010      90      94      67
10  11  Student_011      62      55      63
11  12  Student_012      83      76      81
12  13  Student_013      68      60      90
13  14  Student_014      82      68      98
14  15  Student_015      61      67      91
15  16  Student_016      59      63      46
16  17  Student_017      62      83      93
17  18  Student_018      90      75      80
18  19  Student_019     100      95      55
19  20  Student_020      61      87     100
"""
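To see what the keep parameter changes without needing the Excel file, here is a minimal sketch on a hypothetical four-row DataFrame (the data is invented for illustration and is not taken from Students_Duplicates.xlsx). It compares keep='first', keep='last' and keep=False, where keep=False drops every row whose name occurs more than once.

```python
import pandas as pd

# Hypothetical toy data standing in for Students_Duplicates.xlsx:
# Student_001 appears twice, with IDs 1 and 4.
students = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Name": ["Student_001", "Student_002", "Student_003", "Student_001"],
    "Test_1": [62, 77, 57, 62],
})

# keep='first' (default): keep the first row of each group of duplicate names
first = students.drop_duplicates(subset="Name", keep="first")
# keep='last': keep the last row of each group instead
last = students.drop_duplicates(subset="Name", keep="last")
# keep=False: drop every row whose name occurs more than once
only_unique = students.drop_duplicates(subset="Name", keep=False)

print(first["ID"].tolist())        # [1, 2, 3]
print(last["ID"].tolist())         # [2, 3, 4]
print(only_unique["ID"].tolist())  # [2, 3]
```

Without inplace=True, drop_duplicates returns a new DataFrame and leaves the original untouched, so students = students.drop_duplicates(subset='Name') is an equivalent way to write the in-place call.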
2. Retrieving the duplicate rows
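import pandas as pd

students = pd.read_excel(r'tmp1\Students_Duplicates.xlsx')
# Flag each row whose Name already appeared earlier (the first occurrence is False)
dupe = students.duplicated(subset='Name')
# print(dupe.any())  # True means at least one duplicate exists
dupe1 = dupe[dupe]  # keep only the True entries, same as dupe[dupe == True]
# .loc selects rows by index label; .iloc would treat the labels as positions,
# which only happens to work here because the index is the default 0..n-1
print(students.loc[dupe1.index])
"""
    ID         Name  Test_1  Test_2  Test_3
20  21  Student_001      62      86      83
21  22  Student_002      77      97      78
22  23  Student_003      57      96      46
23  24  Student_004      57      87      80
24  25  Student_005      95      59      87
"""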
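A minimal sketch on a hypothetical DataFrame (invented data, with a non-default index chosen on purpose) illustrates two points about retrieving duplicates: boolean indexing with the mask selects the flagged rows directly, and .loc, not .iloc, is the accessor that matches the labels in dupe[dupe].index. It also shows keep=False, which flags every member of a duplicate group rather than only the later copies.

```python
import pandas as pd

# Hypothetical toy data; the index starts at 10 on purpose,
# so label-based (.loc) and position-based (.iloc) selection differ.
students = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "Name": ["Student_001", "Student_002", "Student_001",
                 "Student_003", "Student_002"],
    },
    index=[10, 11, 12, 13, 14],
)

# duplicated() marks every occurrence after the first as True
dupe = students.duplicated(subset="Name")
print(dupe.any())  # True

# Boolean indexing selects the flagged rows directly, no index lookup needed
later_copies = students[dupe]
print(later_copies["ID"].tolist())  # [3, 5]

# .loc looks rows up by the labels 12 and 14; .iloc[[12, 14]] would raise IndexError
same_rows = students.loc[dupe[dupe].index]

# keep=False flags every member of a duplicate group, including the first one
all_copies = students[students.duplicated(subset="Name", keep=False)]
print(all_copies["ID"].tolist())  # [1, 2, 3, 5]
```

keep=False is useful when you want to inspect both copies of each duplicated name side by side before deciding which one to drop.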