介绍

我们处理的数据经常会带有列名、索引名或我们不满意的其他按惯例进行的命名。在这部分教程中,您将学习如何使用pandas函数将违规条目的名称更改为更好的名称。
还将探索如何合并来自多个 DataFrames a和/或 Series的数据。

重命名

这里我们将要介绍的第一个函数是rename(),它可被用于更改索引名和列名。例如,要将数据集中的 points 列更改为score,我们将执行以下操作:

  1. import pandas as pd
  2. pd.set_option('display.max_columns', None) # 显示所有列
  3. pd.set_option('display.width', None) # 显示宽度是无限
  4. reviews = pd.read_csv("../data/winemag-data-130k-v2.csv", index_col=0)
  5. print(reviews.rename(columns={'points': 'score'})) # 更改 points 列名为 score
         country                                        description                             designation  score  price           province             region_1           region_2         taster_name taster_twitter_handle                                              title         variety                                    winery
0          Italy  Aromas include tropical fruit, broom, brimston...                            Vulkà Bianco     87    NaN  Sicily & Sardinia                 Etna                NaN       Kerin O’Keefe          @kerinokeefe                  Nicosia 2013 Vulkà Bianco  (Etna)     White Blend                                   Nicosia
1       Portugal  This is ripe and fruity, a wine that is smooth...                                Avidagos     87   15.0              Douro                  NaN                NaN          Roger Voss            @vossroger      Quinta dos Avidagos 2011 Avidagos Red (Douro)  Portuguese Red                       Quinta dos Avidagos
2             US  Tart and snappy, the flavors of lime flesh and...                                     NaN     87   14.0             Oregon    Willamette Valley  Willamette Valley        Paul Gregutt           @paulgwine       Rainstorm 2013 Pinot Gris (Willamette Valley)      Pinot Gris                                 Rainstorm
3             US  Pineapple rind, lemon pith and orange blossom ...                    Reserve Late Harvest     87   13.0           Michigan  Lake Michigan Shore                NaN  Alexander Peartree                   NaN  St. Julian 2013 Reserve Late Harvest Riesling ...        Riesling                                St. Julian
4             US  Much like the regular bottling from 2012, this...      Vintner's Reserve Wild Child Block     87   65.0             Oregon    Willamette Valley  Willamette Valley        Paul Gregutt           @paulgwine   Sweet Cheeks 2012 Vintner's Reserve Wild Child...      Pinot Noir                              Sweet Cheeks
...          ...                                                ...                                     ...    ...    ...                ...                  ...                ...                 ...                   ...                                                ...             ...                                       ...
129966   Germany  Notes of honeysuckle and cantaloupe sweeten th...  Brauneberger Juffer-Sonnenuhr Spätlese     90   28.0              Mosel                  NaN                NaN  Anna Lee C. Iijima                   NaN  Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...        Riesling  Dr. H. Thanisch (Erben Müller-Burggraef)
129967        US  Citation is given as much as a decade of bottl...                                     NaN     90   75.0             Oregon               Oregon       Oregon Other        Paul Gregutt           @paulgwine                   Citation 2004 Pinot Noir (Oregon)      Pinot Noir                                  Citation
129968    France  Well-drained gravel soil gives this wine its c...                                   Kritt     90   30.0             Alsace               Alsace                NaN          Roger Voss            @vossroger  Domaine Gresser 2013 Kritt Gewurztraminer (Als...  Gewürztraminer                           Domaine Gresser
129969    France  A dry style of Pinot Gris, this is crisp with ...                                     NaN     90   32.0             Alsace               Alsace                NaN          Roger Voss            @vossroger      Domaine Marcel Deiss 2012 Pinot Gris (Alsace)      Pinot Gris                      Domaine Marcel Deiss
129970    France  Big, rich and off-dry, this is powered by inte...           Lieu-dit Harth Cuvée Caroline     90   21.0             Alsace               Alsace                NaN          Roger Voss            @vossroger  Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...  Gewürztraminer                          Domaine Schoffit

[129971 rows x 13 columns]

rename() 允许通过分别指定 index 或 column 关键字参数来重命名索引或列。它支持多种输入格式,通常来说,使用 Python 中的字典数据类型是最方便的。下面是一个使用它重命名索引中某些元素的示例。

import pandas as pd

reviews = pd.read_csv("../data/winemag-data-130k-v2.csv", index_col=0)
print(reviews.rename(index={0: 'firstEntry', 1: 'secondEntry'}))   # 修改行索引
              country  ...                                    winery
firstEntry      Italy  ...                                   Nicosia
secondEntry  Portugal  ...                       Quinta dos Avidagos
2                  US  ...                                 Rainstorm
3                  US  ...                                St. Julian
4                  US  ...                              Sweet Cheeks
...               ...  ...                                       ...
129966        Germany  ...  Dr. H. Thanisch (Erben Müller-Burggraef)
129967             US  ...                                  Citation
129968         France  ...                           Domaine Gresser
129969         France  ...                      Domaine Marcel Deiss
129970         France  ...                          Domaine Schoffit

[129971 rows x 13 columns]

您可能会经常重命名列,但很少重命名索引值,set_index() 通常可以更方便的重命名索引。
行索引和列索引都可以有自己的 name 属性。可以使用 rename_axis() 方法更改这些名称。例如:

import pandas as pd

pd.set_option('display.max_columns', None)   # 显示所有列
reviews = pd.read_csv("../data/winemag-data-130k-v2.csv", index_col=0)
print(reviews)
print(reviews.rename_axis("wines", axis='rows').rename_axis("fields", axis='columns'))
# 第一行的标签修改为"wines"   第一列的标签修改为"fields"
         country  ...                                    winery
0          Italy  ...                                   Nicosia
1       Portugal  ...                       Quinta dos Avidagos
2             US  ...                                 Rainstorm
3             US  ...                                St. Julian
4             US  ...                              Sweet Cheeks
...          ...  ...                                       ...
129966   Germany  ...  Dr. H. Thanisch (Erben Müller-Burggraef)
129967        US  ...                                  Citation
129968    France  ...                           Domaine Gresser
129969    France  ...                      Domaine Marcel Deiss
129970    France  ...                          Domaine Schoffit

[129971 rows x 13 columns]
fields   country  ...                                    winery
wines             ...                                          
0          Italy  ...                                   Nicosia
1       Portugal  ...                       Quinta dos Avidagos
2             US  ...                                 Rainstorm
3             US  ...                                St. Julian
4             US  ...                              Sweet Cheeks
...          ...  ...                                       ...
129966   Germany  ...  Dr. H. Thanisch (Erben Müller-Burggraef)
129967        US  ...                                  Citation
129968    France  ...                           Domaine Gresser
129969    France  ...                      Domaine Marcel Deiss
129970    France  ...                          Domaine Schoffit

[129971 rows x 13 columns]

合并数据

在对数据集执行操作时,我们有时需要以某种有意义的方式组合不同的DataFrames 和 Series 中的数据。为提高便利性,pandas提供了三种方法完成数据的合并,它们是concat()、join() 和merge()。merge() 所能做的大部分工作也可以使用 join() 更简单地来完成,因此我们将省略 join() 并将重点放在前两个函数上。
最简单的合并数据方法是 concat()。假定有一个元素列表,此函数将沿着一个轴将这些元素合并在一起。
这个方法在我们的数据来自不同的DataFrame或Series对象,但却具有相同的字段(列)名时非常有用。
例如:YouTube视频数据集,它根据来源国(如本例中的加拿大和英国)对数据进行拆分。如果我们想同时研究多个国家,我们可以使用 concat() 将它们混合在一起:

import pandas as pd

canadian_youtube = pd.read_csv("../data/CAvideos.csv")
print(canadian_youtube)   # 来自加拿大的数据
          video_id  ...                                        description
0      n1WpP7iowLc  ...  Eminem's new track Walk on Water ft. Beyoncé i...
1      0dBIkQ4Mz1M  ...  STill got a lot of packages. Probably will las...
2      5qpjK5DgCt4  ...  WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http...
3      d380meD0W0M  ...  I know it's been a while since we did this sho...
4      2Vv-BfVoq4g  ...  🎧: https://ad.gt/yt-perfect\n💰: https://atlant...
...            ...  ...                                                ...
40876  sGolxsMSGfQ  ...  🚨 NEW MERCH! http://amzn.to/annoyingorange 🚨➤ ...
40877  8HNuRNi8t70  ...  ► Retrouvez vos programmes préférés : https://...
40878  GWlKEM3m2EE  ...  Find out more about Kingdom Hearts 3: https://...
40879  lbMKLzQ4cNQ  ...  Peter Navarro isn’t talking so tough now. Ana ...
40880  POTgw38-m58  ...  藝人:李妍瑾、玉兔、班傑、LaLa、小優、少少專家:陳筱屏(律師)、Wendy(心理師)、羅...

[40881 rows x 16 columns]
import pandas as pd

british_youtube = pd.read_csv("../data/GBvideos.csv")
print(british_youtube)  # 来自英国的数据
          video_id  ...                                        description
0      Jw1Y-zhQURU  ...  Click here to continue the story and make your...
1      3s1rvMFUweQ  ...  Musical guest Taylor Swift performs …Ready for...
2      n1WpP7iowLc  ...  Eminem's new track Walk on Water ft. Beyoncé i...
3      PUTEiSjKwJU  ...  Salford drew 4-4 against the Class of 92 and F...
4      rHwDegptbI4  ...  Dashcam captures truck's near miss with child ...
...            ...  ...                                                ...
38911  l884wKofd54  ...  NEW SONG - MOVE TO MIAMI feat. Pitbull (Click ...
38912  IP8k2xkhOdI  ...  THE OFFICIAL UP WITH IT MUSIC VIDEO!Get my new...
38913  Il-an3K9pjg  ...  Get 2002 by Anne-Marie HERE ▶ http://ad.gt/200...
38914  -DRsfNObKIQ  ...  Eleni Foureira represented Cyprus at the first...
38915  4YFo4bdMO8Q  ...  Debut album 'Light of Mine' out now: http://ky...

[38916 rows x 16 columns]
import pandas as pd

canadian_youtube = pd.read_csv("../data/CAvideos.csv")
british_youtube = pd.read_csv("../data/GBvideos.csv")
print(pd.concat([canadian_youtube, british_youtube])) # 合并加拿大和英国的数据
          video_id  ...                                        description
0      n1WpP7iowLc  ...  Eminem's new track Walk on Water ft. Beyoncé i...
1      0dBIkQ4Mz1M  ...  STill got a lot of packages. Probably will las...
2      5qpjK5DgCt4  ...  WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http...
3      d380meD0W0M  ...  I know it's been a while since we did this sho...
4      2Vv-BfVoq4g  ...  🎧: https://ad.gt/yt-perfect\n💰: https://atlant...
...            ...  ...                                                ...
38911  l884wKofd54  ...  NEW SONG - MOVE TO MIAMI feat. Pitbull (Click ...
38912  IP8k2xkhOdI  ...  THE OFFICIAL UP WITH IT MUSIC VIDEO!Get my new...
38913  Il-an3K9pjg  ...  Get 2002 by Anne-Marie HERE ▶ http://ad.gt/200...
38914  -DRsfNObKIQ  ...  Eleni Foureira represented Cyprus at the first...
38915  4YFo4bdMO8Q  ...  Debut album 'Light of Mine' out now: http://ky...

[79797 rows x 16 columns]

复杂性居中间的组合器是join(),join() 允许组合具有共同索引的来自不同 DataFrame 的对象。例如,要删除加拿大和英国同一天的视频,我们可以执行以下操作:

import pandas as pd

canadian_youtube = pd.read_csv("../data/CAvideos.csv")
british_youtube = pd.read_csv("../data/GBvideos.csv")
left = canadian_youtube.set_index(['title', 'trending_date'])
right = british_youtube.set_index(['title', 'trending_date'])
print(left.join(right, lsuffix='_CAN', rsuffix='_UK'))
                                                                 video_id_CAN                channel_title_CAN  category_id_CAN          publish_time_CAN                                           tags_CAN  views_CAN  likes_CAN  dislikes_CAN  comment_count_CAN                              thumbnail_link_CAN  comments_disabled_CAN  ratings_disabled_CAN  video_error_or_removed_CAN                                    description_CAN video_id_UK channel_title_UK  category_id_UK publish_time_UK tags_UK  views_UK  likes_UK  dislikes_UK  comment_count_UK thumbnail_link_UK comments_disabled_UK ratings_disabled_UK video_error_or_removed_UK description_UK
title                                              trending_date                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
!! THIS VIDEO IS NOTHING BUT PAIN !! | Getting ... 18.04.01       PNn8sECd7io                       Markiplier               20  2018-01-03T19:33:53.000Z  getting over it|"markiplier"|"funny moments"|"...     835930      47058          1023               8250  https://i.ytimg.com/vi/PNn8sECd7io/default.jpg                  False                 False                       False  Getting Over It continues with RAGE BEYOND ALL...         NaN              NaN             NaN             NaN     NaN       NaN       NaN          NaN               NaN               NaN                  NaN                 NaN                       NaN            NaN
#1 Fortnite World Rank - 2,323 Solo Wins!          18.09.03       DvPW66IFhMI                   AlexRamiGaming               20  2018-03-09T07:15:52.000Z  PS4 Battle Royale|"PS4 Pro Battle Royale"|"Bat...     212838       5199           542                 11  https://i.ytimg.com/vi/DvPW66IFhMI/default.jpg                  False                 False                       False  Discord For EVERYONE - https://discord.gg/nhud...         NaN              NaN             NaN             NaN     NaN       NaN       NaN          NaN               NaN               NaN                  NaN                 NaN                       NaN            NaN
#1 Fortnite World Rank - 2,330 Solo Wins!          18.10.03       EXEaMjFeiEk                   AlexRamiGaming               20  2018-03-10T06:26:17.000Z  PS4 Battle Royale|"PS4 Pro Battle Royale"|"Bat...     200764       5620           537                 45  https://i.ytimg.com/vi/EXEaMjFeiEk/default.jpg                  False                 False                       False  Discord For EVERYONE - https://discord.gg/nhud...         NaN              NaN             NaN             NaN     NaN       NaN       NaN          NaN               NaN               NaN                  NaN                 NaN                       NaN            NaN
#1 MOST ANTICIPATED VIDEO (Timber Frame House R... 17.20.12       bYvQmusLaxw             Pure Living for Life               24  2017-12-20T02:49:11.000Z  timber frame|"timber framing"|"timber frame ra...      79152       7761           159               1965  https://i.ytimg.com/vi/bYvQmusLaxw/default.jpg                  False                 False                       False  Shelter Institute: http://bit.ly/2iwXj8B\nFull...         NaN              NaN             NaN             NaN     NaN       NaN       NaN          NaN               NaN               NaN                  NaN                 NaN                       NaN            NaN
                                                   17.21.12       bYvQmusLaxw             Pure Living for Life               24  2017-12-20T02:49:11.000Z  timber frame|"timber framing"|"timber frame ra...     232762      15515           329               3601  https://i.ytimg.com/vi/bYvQmusLaxw/default.jpg                  False                 False                       False  Shelter Institute: http://bit.ly/2iwXj8B\nFull...         NaN              NaN             NaN             NaN     NaN       NaN       NaN          NaN               NaN               NaN                  NaN                 NaN                       NaN            NaN
...                                                                       ...                              ...              ...                       ...                                                ...        ...        ...           ...                ...                                             ...                    ...                   ...                         ...                                                ...         ...              ...             ...             ...     ...       ...       ...          ...               ...               ...                  ...                 ...                       ...            ...
😲She Is So Nervous But BLOWS The ROOF After Tak... 18.02.05       WttN1Z0XF4k                     How Talented               24  2018-04-28T19:40:58.000Z  bgt|"bgt 2018"|"britain got talent"|"britain´s...     713400       4684           260                266  https://i.ytimg.com/vi/WttN1Z0XF4k/default.jpg                  False                 False                       False  Amy Marie Borg - Britain´s Got Talent 2018\n\n...         NaN              NaN             NaN             NaN     NaN       NaN       NaN          NaN               NaN               NaN                  NaN                 NaN                       NaN            NaN
                                                   18.29.04       WttN1Z0XF4k                     How Talented               24  2018-04-28T19:40:58.000Z  bgt|"bgt 2018"|"britain got talent"|"britain´s...     231906       1924            78                146  https://i.ytimg.com/vi/WttN1Z0XF4k/default.jpg                  False                 False                       False  Amy Marie Borg - Britain´s Got Talent 2018\n\n...         NaN              NaN             NaN             NaN     NaN       NaN       NaN          NaN               NaN               NaN                  NaN                 NaN                       NaN            NaN
                                                   18.30.04       WttN1Z0XF4k                     How Talented               24  2018-04-28T19:40:58.000Z  bgt|"bgt 2018"|"britain got talent"|"britain´s...     476253       3417           176                240  https://i.ytimg.com/vi/WttN1Z0XF4k/default.jpg                  False                 False                       False  Amy Marie Borg - Britain´s Got Talent 2018\n\n...         NaN              NaN             NaN             NaN     NaN       NaN       NaN          NaN               NaN               NaN                  NaN                 NaN                       NaN            NaN
🚨 BREAKING NEWS 🔴 Raja Live all Slot Channels W... 18.07.05       Wt9Gkpmbt44                    TheBigJackpot               24  2018-05-07T06:58:59.000Z  Slot Machine|"win"|"Gambling"|"Big Win"|"raja"...      28973       2167           175                 10  https://i.ytimg.com/vi/Wt9Gkpmbt44/default.jpg                  False                 False                       False  The Raja takes matters in to his own hands ton...         NaN              NaN             NaN             NaN     NaN       NaN       NaN          NaN               NaN               NaN                  NaN                 NaN                       NaN            NaN
🚨Active Shooter at YouTube Headquarters - LIVE ... 18.04.04       Az72jrKbANA  Right Side Broadcasting Network               25  2018-04-03T23:12:37.000Z  YouTube shooter|"YouTube active shooter"|"acti...     103513       1722           181                 76  https://i.ytimg.com/vi/Az72jrKbANA/default.jpg                  False                 False                       False  An active shooter has been reported at the You...         NaN              NaN             NaN             NaN     NaN       NaN       NaN          NaN               NaN               NaN                  NaN                 NaN                       NaN            NaN

[40900 rows x 28 columns]

这里需要lsuffix和rsuffix参数,因为数据在英国和加拿大数据集中具有相同的列名。如果不是这样(例如,我们已经事先给它们改名了),就不需要它们了。

转载自三味书屋