Read all the data and display the first 5 rows

    import pandas as pd


    def csv_to_df(file):
        pd.set_option('display.max_columns', None)  # show all columns
        pd.set_option('display.max_rows', None)  # show all rows
        pd.set_option('display.width', None)  # None: auto-detect the terminal width
        wine_df = pd.read_csv(file)
        print(wine_df.head())  # head() returns the first 5 rows by default


    if __name__ == '__main__':
        filename = '../data/winemag-data-130k-v2.csv'
        csv_to_df(filename)

Output:

    number country description designation points price province region_1 region_2 taster_name taster_twitter_handle title variety winery
    0 0 Italy Aromas include tropical fruit, broom, brimston... Vulkà Bianco 87 NaN Sicily & Sardinia Etna NaN Kerin OKeefe @kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) White Blend Nicosia
    1 1 Portugal This is ripe and fruity, a wine that is smooth... Avidagos 87 15.0 Douro NaN NaN Roger Voss @vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red Quinta dos Avidagos
    2 2 US Tart and snappy, the flavors of lime flesh and... NaN 87 14.0 Oregon Willamette Valley Willamette Valley Paul Gregutt @paulgwine Rainstorm 2013 Pinot Gris (Willamette Valley) Pinot Gris Rainstorm
    3 3 US Pineapple rind, lemon pith and orange blossom ... Reserve Late Harvest 87 13.0 Michigan Lake Michigan Shore NaN Alexander Peartree NaN St. Julian 2013 Reserve Late Harvest Riesling ... Riesling St. Julian
    4 4 US Much like the regular bottling from 2012, this... Vintner's Reserve Wild Child Block 87 65.0 Oregon Willamette Valley Willamette Valley Paul Gregutt @paulgwine Sweet Cheeks 2012 Vintner's Reserve Wild Child... Pinot Noir Sweet Cheeks
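
The `pd.set_option` calls above change the display settings for the rest of the session. As a minimal sketch of a scoped alternative, `pd.option_context` applies the same options only inside a `with` block. The three-row CSV text below is a made-up stand-in for the wine file, not the real data.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the wine CSV.
csv_text = """number,country,points,price
0,Italy,87,
1,Portugal,87,15.0
2,US,87,14.0
"""

wine_df = pd.read_csv(io.StringIO(csv_text))

max_columns_before = pd.get_option('display.max_columns')

# The display options apply only inside the with-block.
with pd.option_context('display.max_columns', None, 'display.width', None):
    print(wine_df.head())

# Outside the block the option is back to its previous value.
max_columns_after = pd.get_option('display.max_columns')
print(max_columns_after == max_columns_before)
```

This keeps wide-table printing local to one function without affecting how other DataFrames are displayed later.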

Customize the displayed column names

    import pandas as pd


    def csv_to_df(file):
        pd.set_option('display.max_columns', None)  # show all columns
        pd.set_option('display.max_rows', None)  # show all rows
        pd.set_option('display.width', None)  # None: auto-detect the terminal width
        # skiprows=1 drops the original header line; names supplies the new labels
        wine_df = pd.read_csv(file, skiprows=1, names=['编号', '国家', '描述', '标识', '评分', '价格', '省份', '地区1', '地区2', '品酒师', 'twitter帐号', '名称', '品种', '酒厂'])
        print(wine_df.head())


    if __name__ == '__main__':
        filename = '../data/winemag-test.csv'
        csv_to_df(filename)

Output:

    编号 国家 描述 标识 评分 价格 省份 地区1 地区2 品酒师 twitter帐号 名称 品种 酒厂
    0 0 Italy Aromas include tropical fruit, broom, brimston... Vulkà Bianco 87.0 NaN Sicily & Sardinia Etna NaN Kerin OKeefe @kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) White Blend Nicosia
    1 1 Portugal This is ripe and fruity, a wine that is smooth... Avidagos 87.0 15.0 Douro NaN NaN Roger Voss @vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red Quinta dos Avidagos
    2 2 US Tart and snappy, the flavors of lime flesh and... NaN 87.0 14.0 Oregon Willamette Valley Willamette Valley Paul Gregutt @paulgwine Rainstorm 2013 Pinot Gris (Willamette Valley) Pinot Gris Rainstorm
    3 3 NaN Pineapple rind, lemon pith and orange blossom ... Reserve Late Harvest NaN 13.0 Michigan Lake Michigan Shore NaN Alexander Peartree NaN St. Julian 2013 Reserve Late Harvest Riesling ... Riesling St. Julian
    4 4 US Much like the regular bottling from 2012, this... Vintner's Reserve Wild Child Block 87.0 65.0 Oregon Willamette Valley Willamette Valley Paul Gregutt @paulgwine Sweet Cheeks 2012 Vintner's Reserve Wild Child... Pinot Noir Sweet Cheeks
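
The listing above replaces the header by skipping the first line with `skiprows=1`. A roughly equivalent way, sketched here on a made-up two-row CSV, is to pass `header=0` so that pandas treats the first line as a header to be replaced by `names`. The labels in `new_names` are hypothetical and chosen only for this illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample with a header line.
csv_text = """number,country,points,price
0,Italy,87,
1,Portugal,87,15.0
"""

new_names = ['id', 'country_name', 'score', 'price_usd']  # hypothetical labels

# header=0 marks the first line as a header that names should replace.
by_header = pd.read_csv(io.StringIO(csv_text), header=0, names=new_names)

# skiprows=1 drops the first line, then names supplies the labels.
by_skiprows = pd.read_csv(io.StringIO(csv_text), skiprows=1, names=new_names)

print(list(by_header.columns))
print(by_header.equals(by_skiprows))
```

Either form works for a file with exactly one header line; `header=0` states the intent a little more directly.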

Select and output only the specified columns

    import pandas as pd


    def csv_to_df(file):
        pd.set_option('display.max_columns', None)  # show all columns
        pd.set_option('display.max_rows', None)  # show all rows
        pd.set_option('display.width', None)  # None: auto-detect the terminal width
        # usecols loads only the listed columns from the file
        wine_df = pd.read_csv(file, usecols=['number', 'country', 'points', 'price', 'variety'])
        print(wine_df.head(10))


    if __name__ == '__main__':
        filename = '../data/winemag-test.csv'
        csv_to_df(filename)

Output:

       number   country  points  price         variety
    0       0     Italy    87.0    NaN     White Blend
    1       1  Portugal    87.0   15.0  Portuguese Red
    2       2        US    87.0   14.0      Pinot Gris
    3       3       NaN     NaN   13.0        Riesling
    4       4        US    87.0   65.0      Pinot Noir
    5       5     Spain    87.0   15.0             NaN
    6       6     Italy    87.0   16.0        Frappato
    7       7    France    87.0   24.0  Gewürztraminer
    8       8   Germany    87.0   12.0  Gewürztraminer
    9       9    France    87.0   27.0      Pinot Gris
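
One detail of `usecols` that is easy to miss: the order of the names in the list is ignored, and the columns come back in file order. The sketch below, on a made-up two-row CSV, shows this and reorders the columns afterwards with ordinary column selection.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the wine CSV.
csv_text = """number,country,points,price
0,Italy,87,
1,Portugal,87,15.0
"""

# The list asks for price first, but the file order (country, price) wins.
df = pd.read_csv(io.StringIO(csv_text), usecols=['price', 'country'])
print(list(df.columns))

# Select with a list of names to get the order you want.
ordered = df[['price', 'country']]
print(list(ordered.columns))
```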

Inspect the missing data

    import pandas as pd


    def csv_to_df(file):
        wine_df = pd.read_csv(file, usecols=['number', 'country', 'points', 'price', 'variety'])
        print(wine_df.head(10))
        # Row 3 has no country and no points, so both lookups print nan
        print(wine_df.country[3])
        print(wine_df.points[3])
        print(wine_df.iloc[3, 2])  # same cell as points[3], selected by position
        # A missing numeric value is stored as a floating-point NaN
        print(type(wine_df.iloc[3, 2]))
        print(type(wine_df.points[3]))


    if __name__ == '__main__':
        filename = '../data/winemag-test.csv'
        csv_to_df(filename)

Output:

       number   country  points  price         variety
    0       0     Italy    87.0    NaN     White Blend
    1       1  Portugal    87.0   15.0  Portuguese Red
    2       2        US    87.0   14.0      Pinot Gris
    3       3       NaN     NaN   13.0        Riesling
    4       4        US    87.0   65.0      Pinot Noir
    5       5     Spain    87.0   15.0             NaN
    6       6     Italy    87.0   16.0        Frappato
    7       7    France    87.0   24.0  Gewürztraminer
    8       8   Germany    87.0   12.0  Gewürztraminer
    9       9    France    87.0   27.0      Pinot Gris
    nan
    nan
    nan
    <class 'numpy.float64'>
    <class 'numpy.float64'>
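
Printing individual cells works for spot checks, but it does not scale to a whole file. As a minimal sketch on a made-up four-row sample, `isna()` combined with `sum()` or `any()` gives a per-column count of missing values and the rows that contain them.

```python
import io

import pandas as pd

# Hypothetical four-row sample with the same gaps as rows 0 and 3 above.
csv_text = """number,country,points,price,variety
0,Italy,87,,White Blend
1,Portugal,87,15.0,Portuguese Red
2,US,87,14.0,Pinot Gris
3,,,13.0,Riesling
"""

wine_df = pd.read_csv(io.StringIO(csv_text))

# Number of missing values in each column.
missing_per_column = wine_df.isna().sum()
print(missing_per_column)

# Rows that have at least one missing value.
rows_with_missing = wine_df[wine_df.isna().any(axis=1)]
print(rows_with_missing)

# pd.isna tests a single cell without relying on how nan prints.
print(pd.isna(wine_df.loc[3, 'country']))
```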

Fill missing values

    DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)

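    inplace=False: the default; filling produces a new DataFrame, so assign the result to a name in order to use it
    inplace=True: fills in place, directly modifying the data of the current DataFrame
    value: scalar, dict, Series, or DataFrame
    The value used to fill missing entries (for example 0). A dict, Series, or DataFrame specifies which value to use for each index (for a Series) or each column (for a DataFrame). Entries not present in the dict, Series, or DataFrame are not filled. This value cannot be a list.
    method: {'backfill', 'bfill', 'pad', 'ffill', None}, default None
    The method used to fill holes in a reindexed Series.
    pad / ffill fills a missing value with the value from the preceding row/column;
    backfill / bfill fills a missing value with the value from the following row/column.
    "Preceding" and "following" are read vertically (down the rows) by default; to fill from the value to the left or right instead, pass axis=1.
    axis: {0 or 'index', 1 or 'columns'}
    The axis along which missing values are filled: 0 or 'index' fills vertically, 1 or 'columns' fills horizontally.
    inplace: bool, default False
    By default the current DataFrame is left unchanged and a new, filled object is created. If set to True, the data is filled directly in the current DataFrame. With True the return value is None; with False the return value is the new, filled DataFrame.
    limit: int, default None
    Limits how many NaN values may be replaced; for example, limit=1 replaces at most one NaN per column. If method is specified, then within each run of consecutive missing values at most the first limit values are filled (if there are several such runs, each one gets at most limit values filled). If method is not specified, at most the first limit missing values along the axis are filled, whether or not they are consecutive.
    downcast: dict, default None
    A dict of item -> dtype rules for downcasting, or the string 'infer', which tries to downcast to an appropriate equivalent type (for example float64 to int64).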
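
As a rough illustration of the `value` and `limit` parameters described above, the sketch below uses a small made-up DataFrame rather than the wine data. It shows a dict that fills only one column, and `limit=1` without `method`, which fills at most one missing entry per column.

```python
import numpy as np
import pandas as pd

# Made-up data: two NaNs in points, three NaNs in price.
df = pd.DataFrame({
    'points': [87.0, np.nan, np.nan, 90.0],
    'price': [np.nan, 15.0, np.nan, np.nan],
})

# A dict maps column name -> fill value; columns not in the dict stay untouched.
filled_by_dict = df.fillna({'price': 0})
print(filled_by_dict)

# Without method, limit caps the number of filled entries in each column.
filled_limited = df.fillna(0, limit=1)
print(filled_limited)

# Neither call used inplace=True, so df itself still has all of its NaNs.
print(df.isna().sum())
```

Only the first NaN in each column is replaced in `filled_limited`; the later ones remain missing.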

    import pandas as pd


    def csv_to_df(file):
        wine_df = pd.read_csv(file, usecols=['number', 'country', 'points', 'price', 'variety'])
        # Replace every missing value in every column with 0, modifying wine_df itself
        wine_df.fillna(0, inplace=True)
        print(wine_df.head(10))


    if __name__ == '__main__':
        filename = '../data/winemag-test.csv'
        csv_to_df(filename)

Output:

       number   country  points  price         variety
    0       0     Italy    87.0    0.0     White Blend
    1       1  Portugal    87.0   15.0  Portuguese Red
    2       2        US    87.0   14.0      Pinot Gris
    3       3         0     0.0   13.0        Riesling
    4       4        US    87.0   65.0      Pinot Noir
    5       5     Spain    87.0   15.0               0
    6       6     Italy    87.0   16.0        Frappato
    7       7    France    87.0   24.0  Gewürztraminer
    8       8   Germany    87.0   12.0  Gewürztraminer
    9       9    France    87.0   27.0      Pinot Gris

fillna can fill with a fixed value, or with a computed value such as df.mean() or df.sum().
You can also restrict the fill to a single column, for example only the price column:

    import pandas as pd


    def csv_to_df(file):
        wine_df = pd.read_csv(file, usecols=['number', 'country', 'points', 'price', 'variety'])
        # Assign the filled column back; inplace=True on a selected column
        # may not update wine_df in newer pandas versions
        wine_df['price'] = wine_df['price'].fillna(0)
        print(wine_df.head(10))


    if __name__ == '__main__':
        filename = '../data/winemag-test.csv'
        csv_to_df(filename)

Output:

       number   country  points  price         variety
    0       0     Italy    87.0    0.0     White Blend
    1       1  Portugal    87.0   15.0  Portuguese Red
    2       2        US    87.0   14.0      Pinot Gris
    3       3       NaN     NaN   13.0        Riesling
    4       4        US    87.0   65.0      Pinot Noir
    5       5     Spain    87.0   15.0             NaN
    6       6     Italy    87.0   16.0        Frappato
    7       7    France    87.0   24.0  Gewürztraminer
    8       8   Germany    87.0   12.0  Gewürztraminer
    9       9    France    87.0   27.0      Pinot Gris

Fill the price column with the mean price:

    import pandas as pd


    def csv_to_df(file):
        wine_df = pd.read_csv(file, usecols=['number', 'country', 'points', 'price', 'variety'])
        # mean() skips NaN, so the average is taken over the known prices only
        wine_df['price'] = wine_df['price'].fillna(wine_df['price'].mean())
        print(wine_df.head(10))


    if __name__ == '__main__':
        filename = '../data/winemag-test.csv'
        csv_to_df(filename)

Output:

       number   country  points      price         variety
    0       0     Italy    87.0  35.363389     White Blend
    1       1  Portugal    87.0  15.000000  Portuguese Red
    2       2        US    87.0  14.000000      Pinot Gris
    3       3       NaN     NaN  13.000000        Riesling
    4       4        US    87.0  65.000000      Pinot Noir
    5       5     Spain    87.0  15.000000             NaN
    6       6     Italy    87.0  16.000000        Frappato
    7       7    France    87.0  24.000000  Gewürztraminer
    8       8   Germany    87.0  12.000000  Gewürztraminer
    9       9    France    87.0  27.000000      Pinot Gris
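
The same idea extends to several columns at once. As a sketch on made-up data, passing the Series returned by `df.mean(numeric_only=True)` to `fillna` fills each numeric column with its own mean and leaves the text column alone, because that column has no entry in the Series.

```python
import numpy as np
import pandas as pd

# Made-up data with one gap in each column.
df = pd.DataFrame({
    'country': ['Italy', None, 'US'],
    'points': [80.0, np.nan, 90.0],
    'price': [np.nan, 10.0, 20.0],
})

# One mean per numeric column, indexed by column name.
column_means = df.mean(numeric_only=True)
print(column_means)

# Each column is filled with the value stored under its own name.
filled = df.fillna(column_means)
print(filled)
```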

Fill with the next value

    import pandas as pd


    def csv_to_df(file):
        wine_df = pd.read_csv(file, usecols=['number', 'country', 'points', 'price', 'variety'])
        # bfill copies the next valid value upward into the gap.
        # Note: method= is deprecated since pandas 2.1; Series.bfill() replaces it.
        wine_df['points'] = wine_df['points'].fillna(method='bfill')
        print(wine_df.head(10))


    if __name__ == '__main__':
        filename = '../data/winemag-test.csv'
        csv_to_df(filename)

Output:

       number   country  points  price         variety
    0       0     Italy    83.0    NaN     White Blend
    1       1  Portugal    87.0   15.0  Portuguese Red
    2       2        US    89.0   14.0      Pinot Gris
    3       3       NaN    85.0   13.0        Riesling
    4       4        US    85.0   65.0      Pinot Noir
    5       5     Spain    87.0   15.0             NaN
    6       6     Italy    97.0   16.0        Frappato
    7       7    France    87.0   24.0  Gewürztraminer
    8       8   Germany    80.0   12.0  Gewürztraminer
    9       9    France    87.0   27.0      Pinot Gris

Fill with the previous value

    import pandas as pd


    def csv_to_df(file):
        wine_df = pd.read_csv(file, usecols=['number', 'country', 'points', 'price', 'variety'])
        # ffill copies the last valid value downward into the gap.
        # Note: method= is deprecated since pandas 2.1; Series.ffill() replaces it.
        wine_df['points'] = wine_df['points'].fillna(method='ffill')
        print(wine_df.head(10))


    if __name__ == '__main__':
        filename = '../data/winemag-test.csv'
        csv_to_df(filename)

Output:

       number   country  points  price         variety
    0       0     Italy    83.0    NaN     White Blend
    1       1  Portugal    87.0   15.0  Portuguese Red
    2       2        US    89.0   14.0      Pinot Gris
    3       3       NaN    89.0   13.0        Riesling
    4       4        US    85.0   65.0      Pinot Noir
    5       5     Spain    87.0   15.0             NaN
    6       6     Italy    97.0   16.0        Frappato
    7       7    France    87.0   24.0  Gewürztraminer
    8       8   Germany    80.0   12.0  Gewürztraminer
    9       9    France    87.0   27.0      Pinot Gris
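
Recent pandas versions provide dedicated `ffill()` and `bfill()` methods for the two fills shown above. The sketch below applies them to a made-up Series that mirrors the first five points values, and also shows two edge cases: a gap at the very start or end has no neighbor to copy from, and `axis=1` fills across a row instead of down a column.

```python
import numpy as np
import pandas as pd

# Made-up values mirroring the first five points entries, with a gap at index 3.
points = pd.Series([83.0, 87.0, 89.0, np.nan, 85.0])

forward = points.ffill()    # gap takes the previous value, 89.0
backward = points.bfill()   # gap takes the next value, 85.0
print(forward.tolist())
print(backward.tolist())

# A leading NaN has nothing before it; a trailing NaN has nothing after it.
edges = pd.Series([np.nan, 1.0, np.nan])
print(edges.ffill().tolist())  # the first entry stays NaN
print(edges.bfill().tolist())  # the last entry stays NaN

# axis=1 fills from the column on the left instead of the row above.
grid = pd.DataFrame({'a': [1.0, np.nan], 'b': [np.nan, 4.0]})
across = grid.ffill(axis=1)
print(across)
```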