介绍

我们处理的数据经常会带有列名、索引名或我们不满意的其他按惯例进行的命名。在这部分教程中,您将学习如何使用pandas函数将违规条目的名称更改为更好的名称。
还将探索如何合并来自多个 DataFrames a和/或 Series的数据。

重命名

这里我们将要介绍的第一个函数是rename(),它可被用于更改索引名和列名。例如,要将数据集中的 points 列更改为score,我们将执行以下操作:

  1. import pandas as pd
  2. pd.set_option('display.max_columns', None) # 显示所有列
  3. pd.set_option('display.width', None) # 显示宽度是无限
  4. reviews = pd.read_csv("../data/winemag-data-130k-v2.csv", index_col=0)
  5. print(reviews.rename(columns={'points': 'score'})) # 更改 points 列名为 score
  1. country description designation score price province region_1 region_2 taster_name taster_twitter_handle title variety winery
  2. 0 Italy Aromas include tropical fruit, broom, brimston... Vulkà Bianco 87 NaN Sicily & Sardinia Etna NaN Kerin OKeefe @kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) White Blend Nicosia
  3. 1 Portugal This is ripe and fruity, a wine that is smooth... Avidagos 87 15.0 Douro NaN NaN Roger Voss @vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red Quinta dos Avidagos
  4. 2 US Tart and snappy, the flavors of lime flesh and... NaN 87 14.0 Oregon Willamette Valley Willamette Valley Paul Gregutt @paulgwine Rainstorm 2013 Pinot Gris (Willamette Valley) Pinot Gris Rainstorm
  5. 3 US Pineapple rind, lemon pith and orange blossom ... Reserve Late Harvest 87 13.0 Michigan Lake Michigan Shore NaN Alexander Peartree NaN St. Julian 2013 Reserve Late Harvest Riesling ... Riesling St. Julian
  6. 4 US Much like the regular bottling from 2012, this... Vintner's Reserve Wild Child Block 87 65.0 Oregon Willamette Valley Willamette Valley Paul Gregutt @paulgwine Sweet Cheeks 2012 Vintner's Reserve Wild Child... Pinot Noir Sweet Cheeks
  7. ... ... ... ... ... ... ... ... ... ... ... ... ... ...
  8. 129966 Germany Notes of honeysuckle and cantaloupe sweeten th... Brauneberger Juffer-Sonnenuhr Spätlese 90 28.0 Mosel NaN NaN Anna Lee C. Iijima NaN Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ... Riesling Dr. H. Thanisch (Erben Müller-Burggraef)
  9. 129967 US Citation is given as much as a decade of bottl... NaN 90 75.0 Oregon Oregon Oregon Other Paul Gregutt @paulgwine Citation 2004 Pinot Noir (Oregon) Pinot Noir Citation
  10. 129968 France Well-drained gravel soil gives this wine its c... Kritt 90 30.0 Alsace Alsace NaN Roger Voss @vossroger Domaine Gresser 2013 Kritt Gewurztraminer (Als... Gewürztraminer Domaine Gresser
  11. 129969 France A dry style of Pinot Gris, this is crisp with ... NaN 90 32.0 Alsace Alsace NaN Roger Voss @vossroger Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Pinot Gris Domaine Marcel Deiss
  12. 129970 France Big, rich and off-dry, this is powered by inte... Lieu-dit Harth Cuvée Caroline 90 21.0 Alsace Alsace NaN Roger Voss @vossroger Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... Gewürztraminer Domaine Schoffit
  13. [129971 rows x 13 columns]

rename() 允许通过分别指定 index 或 column 关键字参数来重命名索引或列。它支持多种输入格式,通常来说,使用 Python 中的字典数据类型是最方便的。下面是一个使用它重命名索引中某些元素的示例。

  1. import pandas as pd
  2. reviews = pd.read_csv("../data/winemag-data-130k-v2.csv", index_col=0)
  3. print(reviews.rename(index={0: 'firstEntry', 1: 'secondEntry'})) # 修改行索引
  1. country ... winery
  2. firstEntry Italy ... Nicosia
  3. secondEntry Portugal ... Quinta dos Avidagos
  4. 2 US ... Rainstorm
  5. 3 US ... St. Julian
  6. 4 US ... Sweet Cheeks
  7. ... ... ... ...
  8. 129966 Germany ... Dr. H. Thanisch (Erben Müller-Burggraef)
  9. 129967 US ... Citation
  10. 129968 France ... Domaine Gresser
  11. 129969 France ... Domaine Marcel Deiss
  12. 129970 France ... Domaine Schoffit
  13. [129971 rows x 13 columns]

您可能会经常重命名列,但很少重命名索引值,set_index() 通常可以更方便的重命名索引。
行索引和列索引都可以有自己的 name 属性。可以使用 rename_axis() 方法更改这些名称。例如:

  1. import pandas as pd
  2. pd.set_option('display.max_columns', None) # 显示所有列
  3. reviews = pd.read_csv("../data/winemag-data-130k-v2.csv", index_col=0)
  4. print(reviews)
  5. print(reviews.rename_axis("wines", axis='rows').rename_axis("fields", axis='columns'))
  6. # 第一行的标签修改为"wines" 第一列的标签修改为"fields"
  1. country ... winery
  2. 0 Italy ... Nicosia
  3. 1 Portugal ... Quinta dos Avidagos
  4. 2 US ... Rainstorm
  5. 3 US ... St. Julian
  6. 4 US ... Sweet Cheeks
  7. ... ... ... ...
  8. 129966 Germany ... Dr. H. Thanisch (Erben Müller-Burggraef)
  9. 129967 US ... Citation
  10. 129968 France ... Domaine Gresser
  11. 129969 France ... Domaine Marcel Deiss
  12. 129970 France ... Domaine Schoffit
  13. [129971 rows x 13 columns]
  1. fields country ... winery
  2. wines ...
  3. 0 Italy ... Nicosia
  4. 1 Portugal ... Quinta dos Avidagos
  5. 2 US ... Rainstorm
  6. 3 US ... St. Julian
  7. 4 US ... Sweet Cheeks
  8. ... ... ... ...
  9. 129966 Germany ... Dr. H. Thanisch (Erben Müller-Burggraef)
  10. 129967 US ... Citation
  11. 129968 France ... Domaine Gresser
  12. 129969 France ... Domaine Marcel Deiss
  13. 129970 France ... Domaine Schoffit
  14. [129971 rows x 13 columns]

合并数据

在对数据集执行操作时,我们有时需要以某种有意义的方式组合不同的DataFrames 和 Series 中的数据。为提高便利性,pandas提供了三种方法完成数据的合并,它们是concat()、join() 和merge()。merge() 所能做的大部分工作也可以使用 join() 更简单地来完成,因此我们将省略 join() 并将重点放在前两个函数上。
最简单的合并数据方法是 concat()。假定有一个元素列表,此函数将沿着一个轴将这些元素合并在一起。
这个方法在我们的数据来自不同的DataFrame或Series对象,但却具有相同的字段(列)名时非常有用。
例如:YouTube视频数据集,它根据来源国(如本例中的加拿大和英国)对数据进行拆分。如果我们想同时研究多个国家,我们可以使用 concat() 将它们混合在一起:

  1. import pandas as pd
  2. canadian_youtube = pd.read_csv("../data/CAvideos.csv")
  3. print(canadian_youtube) # 来自加拿大的数据
  1. video_id ... description
  2. 0 n1WpP7iowLc ... Eminem's new track Walk on Water ft. Beyoncé i...
  3. 1 0dBIkQ4Mz1M ... STill got a lot of packages. Probably will las...
  4. 2 5qpjK5DgCt4 ... WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http...
  5. 3 d380meD0W0M ... I know it's been a while since we did this sho...
  6. 4 2Vv-BfVoq4g ... 🎧: https://ad.gt/yt-perfect\n💰: https://atlant...
  7. ... ... ... ...
  8. 40876 sGolxsMSGfQ ... 🚨 NEW MERCH! http://amzn.to/annoyingorange 🚨➤ ...
  9. 40877 8HNuRNi8t70 ... Retrouvez vos programmes préférés : https://...
  10. 40878 GWlKEM3m2EE ... Find out more about Kingdom Hearts 3: https://...
  11. 40879 lbMKLzQ4cNQ ... Peter Navarro isnt talking so tough now. Ana ...
  12. 40880 POTgw38-m58 ... 藝人:李妍瑾、玉兔、班傑、LaLa、小優、少少專家:陳筱屏(律師)、Wendy(心理師)、羅...
  13. [40881 rows x 16 columns]
  1. import pandas as pd
  2. british_youtube = pd.read_csv("../data/GBvideos.csv")
  3. print(british_youtube) # 来自英国的数据
  1. video_id ... description
  2. 0 Jw1Y-zhQURU ... Click here to continue the story and make your...
  3. 1 3s1rvMFUweQ ... Musical guest Taylor Swift performs Ready for...
  4. 2 n1WpP7iowLc ... Eminem's new track Walk on Water ft. Beyoncé i...
  5. 3 PUTEiSjKwJU ... Salford drew 4-4 against the Class of 92 and F...
  6. 4 rHwDegptbI4 ... Dashcam captures truck's near miss with child ...
  7. ... ... ... ...
  8. 38911 l884wKofd54 ... NEW SONG - MOVE TO MIAMI feat. Pitbull (Click ...
  9. 38912 IP8k2xkhOdI ... THE OFFICIAL UP WITH IT MUSIC VIDEO!Get my new...
  10. 38913 Il-an3K9pjg ... Get 2002 by Anne-Marie HERE http://ad.gt/200...
  11. 38914 -DRsfNObKIQ ... Eleni Foureira represented Cyprus at the first...
  12. 38915 4YFo4bdMO8Q ... Debut album 'Light of Mine' out now: http://ky...
  13. [38916 rows x 16 columns]
  1. import pandas as pd
  2. canadian_youtube = pd.read_csv("../data/CAvideos.csv")
  3. british_youtube = pd.read_csv("../data/GBvideos.csv")
  4. print(pd.concat([canadian_youtube, british_youtube])) # 合并加拿大和英国的数据
  1. video_id ... description
  2. 0 n1WpP7iowLc ... Eminem's new track Walk on Water ft. Beyoncé i...
  3. 1 0dBIkQ4Mz1M ... STill got a lot of packages. Probably will las...
  4. 2 5qpjK5DgCt4 ... WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http...
  5. 3 d380meD0W0M ... I know it's been a while since we did this sho...
  6. 4 2Vv-BfVoq4g ... 🎧: https://ad.gt/yt-perfect\n💰: https://atlant...
  7. ... ... ... ...
  8. 38911 l884wKofd54 ... NEW SONG - MOVE TO MIAMI feat. Pitbull (Click ...
  9. 38912 IP8k2xkhOdI ... THE OFFICIAL UP WITH IT MUSIC VIDEO!Get my new...
  10. 38913 Il-an3K9pjg ... Get 2002 by Anne-Marie HERE http://ad.gt/200...
  11. 38914 -DRsfNObKIQ ... Eleni Foureira represented Cyprus at the first...
  12. 38915 4YFo4bdMO8Q ... Debut album 'Light of Mine' out now: http://ky...
  13. [79797 rows x 16 columns]

复杂性居中间的组合器是join(),join() 允许组合具有共同索引的来自不同 DataFrame 的对象。例如,要删除加拿大和英国同一天的视频,我们可以执行以下操作:

  1. import pandas as pd
  2. canadian_youtube = pd.read_csv("../data/CAvideos.csv")
  3. british_youtube = pd.read_csv("../data/GBvideos.csv")
  4. left = canadian_youtube.set_index(['title', 'trending_date'])
  5. right = british_youtube.set_index(['title', 'trending_date'])
  6. print(left.join(right, lsuffix='_CAN', rsuffix='_UK'))
  1. video_id_CAN channel_title_CAN category_id_CAN publish_time_CAN tags_CAN views_CAN likes_CAN dislikes_CAN comment_count_CAN thumbnail_link_CAN comments_disabled_CAN ratings_disabled_CAN video_error_or_removed_CAN description_CAN video_id_UK channel_title_UK category_id_UK publish_time_UK tags_UK views_UK likes_UK dislikes_UK comment_count_UK thumbnail_link_UK comments_disabled_UK ratings_disabled_UK video_error_or_removed_UK description_UK
  2. title trending_date
  3. !! THIS VIDEO IS NOTHING BUT PAIN !! | Getting ... 18.04.01 PNn8sECd7io Markiplier 20 2018-01-03T19:33:53.000Z getting over it|"markiplier"|"funny moments"|"... 835930 47058 1023 8250 https://i.ytimg.com/vi/PNn8sECd7io/default.jpg False False False Getting Over It continues with RAGE BEYOND ALL... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
  4. #1 Fortnite World Rank - 2,323 Solo Wins! 18.09.03 DvPW66IFhMI AlexRamiGaming 20 2018-03-09T07:15:52.000Z PS4 Battle Royale|"PS4 Pro Battle Royale"|"Bat... 212838 5199 542 11 https://i.ytimg.com/vi/DvPW66IFhMI/default.jpg False False False Discord For EVERYONE - https://discord.gg/nhud... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
  5. #1 Fortnite World Rank - 2,330 Solo Wins! 18.10.03 EXEaMjFeiEk AlexRamiGaming 20 2018-03-10T06:26:17.000Z PS4 Battle Royale|"PS4 Pro Battle Royale"|"Bat... 200764 5620 537 45 https://i.ytimg.com/vi/EXEaMjFeiEk/default.jpg False False False Discord For EVERYONE - https://discord.gg/nhud... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
  6. #1 MOST ANTICIPATED VIDEO (Timber Frame House R... 17.20.12 bYvQmusLaxw Pure Living for Life 24 2017-12-20T02:49:11.000Z timber frame|"timber framing"|"timber frame ra... 79152 7761 159 1965 https://i.ytimg.com/vi/bYvQmusLaxw/default.jpg False False False Shelter Institute: http://bit.ly/2iwXj8B\nFull... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
  7. 17.21.12 bYvQmusLaxw Pure Living for Life 24 2017-12-20T02:49:11.000Z timber frame|"timber framing"|"timber frame ra... 232762 15515 329 3601 https://i.ytimg.com/vi/bYvQmusLaxw/default.jpg False False False Shelter Institute: http://bit.ly/2iwXj8B\nFull... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
  8. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
  9. 😲She Is So Nervous But BLOWS The ROOF After Tak... 18.02.05 WttN1Z0XF4k How Talented 24 2018-04-28T19:40:58.000Z bgt|"bgt 2018"|"britain got talent"|"britain´s... 713400 4684 260 266 https://i.ytimg.com/vi/WttN1Z0XF4k/default.jpg False False False Amy Marie Borg - Britain´s Got Talent 2018\n\n... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
  10. 18.29.04 WttN1Z0XF4k How Talented 24 2018-04-28T19:40:58.000Z bgt|"bgt 2018"|"britain got talent"|"britain´s... 231906 1924 78 146 https://i.ytimg.com/vi/WttN1Z0XF4k/default.jpg False False False Amy Marie Borg - Britain´s Got Talent 2018\n\n... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
  11. 18.30.04 WttN1Z0XF4k How Talented 24 2018-04-28T19:40:58.000Z bgt|"bgt 2018"|"britain got talent"|"britain´s... 476253 3417 176 240 https://i.ytimg.com/vi/WttN1Z0XF4k/default.jpg False False False Amy Marie Borg - Britain´s Got Talent 2018\n\n... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
  12. 🚨 BREAKING NEWS 🔴 Raja Live all Slot Channels W... 18.07.05 Wt9Gkpmbt44 TheBigJackpot 24 2018-05-07T06:58:59.000Z Slot Machine|"win"|"Gambling"|"Big Win"|"raja"... 28973 2167 175 10 https://i.ytimg.com/vi/Wt9Gkpmbt44/default.jpg False False False The Raja takes matters in to his own hands ton... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
  13. 🚨Active Shooter at YouTube Headquarters - LIVE ... 18.04.04 Az72jrKbANA Right Side Broadcasting Network 25 2018-04-03T23:12:37.000Z YouTube shooter|"YouTube active shooter"|"acti... 103513 1722 181 76 https://i.ytimg.com/vi/Az72jrKbANA/default.jpg False False False An active shooter has been reported at the You... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
  14. [40900 rows x 28 columns]

这里需要lsuffix和rsuffix参数,因为数据在英国和加拿大数据集中具有相同的列名。如果不是这样(例如,我们已经事先给它们改名了),就不需要它们了。