1. Creating Series objects

1) Creating a Series from a list

import pandas as pd
import numpy as np
a = pd.Series([3, 2, 4, 1, 5])
print(a)
print(a.values)
print(a.index)
0    3
1    2
2    4
3    1
4    5
dtype: int64
[3 2 4 1 5]
RangeIndex(start=0, stop=5, step=1)
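As a small follow-up sketch (the variable names are illustrative, not from the original), the snippet below checks that `values` is a NumPy array and that, with the default RangeIndex, labels coincide with positions.

```python
import numpy as np
import pandas as pd

a = pd.Series([3, 2, 4, 1, 5])

# .values exposes the underlying data as a NumPy array
values_are_ndarray = isinstance(a.values, np.ndarray)

# With the default RangeIndex, label 2 is also position 2
third = a[2]

# .iloc always selects by position; the stop position is excluded
middle = a.iloc[1:4]

print(values_are_ndarray, third, list(middle))
```

Using `.iloc` for positional slices keeps the intent explicit even when the index is later replaced by custom labels.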

2) Specifying the index when creating a Series

b = pd.Series([3, 2, 4, 1, 5], index=('b', 'a', 'd', 'w', 'y'))
print(b)
print(b.values)
print(b.index)
b    3
a    2
d    4
w    1
y    5
dtype: int64
[3 2 4 1 5]
Index(['b', 'a', 'd', 'w', 'y'], dtype='object')

3) Creating a Series from a NumPy array

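The index defaults to an integer sequence. A Series generalizes a NumPy array: a NumPy array has an implicitly defined integer index, whereas a Series carries an explicit index alongside its values.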

a = np.arange(3)
b = pd.Series(a)
print(b)
0    0
1    1
2    2
dtype: int32
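To make the implicit versus explicit index distinction concrete, here is a minimal sketch. The labels 'x', 'y', 'z' are made up for illustration. Note also that the integer dtype printed for `np.arange(3)` can depend on the platform and NumPy version, so `int64` may appear instead of `int32`.

```python
import numpy as np
import pandas as pd

arr = np.arange(3)
s = pd.Series(arr, index=['x', 'y', 'z'])  # illustrative labels

by_position_in_array = arr[1]       # NumPy: implicit positional index
by_label_in_series = s['y']         # Series: explicit label
by_position_in_series = s.iloc[1]   # positional access is still available

print(by_position_in_array, by_label_in_series, by_position_in_series)
```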

4) Creating a Series from a scalar

When a Series is created from a scalar, the value is repeated to fill the specified index.

print(pd.Series(4, index=[100, 200, 300]))
100    4
200    4
300    4
dtype: int64
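The sketch below contrasts the two cases: with an index, the scalar is repeated for every label; without one, the result is a single-element Series with the default integer index. The variable names are illustrative.

```python
import pandas as pd

filled = pd.Series(4, index=[100, 200, 300])  # scalar repeated per label
single = pd.Series(4)                         # no index given

print(list(filled.index), list(filled))
print(list(single.index), list(single))
```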

5) Creating a Series from a dictionary

The dictionary keys become the index. A Series can be viewed as a specialized dictionary whose keys are the index labels.

population_dict = {'a': 12, 'b': 25, 'c': 56, 'd': 67, 'e': 42}
print("population_dict:\n", population_dict)
population = pd.Series(population_dict)
print("population:\n", population)
population_dict:
 {'a': 12, 'b': 25, 'c': 56, 'd': 67, 'e': 42}
population:
 a    12
b    25
c    56
d    67
e    42
dtype: int64
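The dictionary analogy can be illustrated with a few dictionary-style operations. This is only a sketch; it also shows label slicing, which a plain dictionary does not offer.

```python
import pandas as pd

population = pd.Series({'a': 12, 'b': 25, 'c': 56, 'd': 67, 'e': 42})

has_c = 'c' in population        # membership tests the index labels
value_b = population['b']        # lookup by label, as with a dict
keys = list(population.keys())   # same labels as population.index
b_to_d = population['b':'d']     # label slices include the end label

print(has_c, value_b, keys, list(b_to_d))
```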

6) Creating a Series with an explicitly specified index

print(pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 1, 2]))
print(pd.Series(population, index=['a', 'd']))
3    c
1    b
2    a
dtype: object
a    12
d    67
dtype: int64
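When an explicit index is passed together with a dictionary, only the requested labels are kept, in the requested order. The sketch below adds one case the examples above do not cover: a label that is missing from the data ('z' here, chosen for illustration) is filled with NaN, which also changes the dtype to float.

```python
import pandas as pd

source = {'a': 12, 'b': 25, 'c': 56}
picked = pd.Series(source, index=['c', 'a', 'z'])  # 'z' is not a key of source

print(picked)
```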

7) Getting a Series from a DataFrame column

# cities is the DataFrame built in section 2, item 6
print(cities['area'])
a    12
b    25
c    56
d    67
e    42
Name: area, dtype: int64
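A self-contained sketch of the same idea follows. It rebuilds a `cities` DataFrame from the population data used earlier and confirms that selecting one column returns a Series whose name is the column label.

```python
import pandas as pd

population = pd.Series({'a': 12, 'b': 25, 'c': 56, 'd': 67, 'e': 42})
cities = pd.DataFrame({'area': population, 'bb': population})

area = cities['area']                        # select one column by label
is_series = isinstance(area, pd.Series)
same_by_attribute = cities.area.equals(area) # attribute access returns the same column

print(is_series, area.name, same_by_attribute)
```

Bracket access is usually the safer choice, since attribute access fails for column names that clash with DataFrame methods or are not valid identifiers.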

2. Creating DataFrame objects

1) Creating from a single Series object

import numpy as np
import pandas as pd
# Create a DataFrame from a single Series object
a = np.arange(3)
b = pd.Series(a)
print(pd.DataFrame(b))
   0
0  0
1  1
2  2
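The column label 0 in the output above appears because the Series has no name. As a hedged sketch, giving the Series a name (the name 'value' is an assumption for illustration) produces a labeled column, and `to_frame()` is an equivalent shortcut.

```python
import numpy as np
import pandas as pd

b = pd.Series(np.arange(3), name='value')  # 'value' is an illustrative name

df = pd.DataFrame(b)   # the Series name becomes the column label
df2 = b.to_frame()     # equivalent conversion

print(df)
```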

2) Creating from a list of dictionaries: the keys always map to column names

data = [{'a': 12, 'b': 25}, {'a': 56, 'b': 67}]
print(pd.DataFrame(data))
    a   b
0  12  25
1  56  67

3) Creating from generated data (list comprehension)

data = [{'a': i, 'b': 2 * i} for i in range(3)]
print(pd.DataFrame(data))
   a  b
0  0  0
1  1  2
2  2  4

4) Handling missing values at creation

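If some keys are absent from the input records, the missing entries are filled with NaN (Not a Number). A column that contains NaN is stored as float, which is why 25 is shown as 25.0 below.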

data = [{'a': 12, 'b': 25}, {'a': 56, 'c': 67}]
print(pd.DataFrame(data))
    a     b     c
0  12  25.0   NaN
1  56   NaN  67.0
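Once the frame contains NaN, the usual next step is to detect or replace the gaps. The following is a minimal sketch using `isna` and `fillna`; replacing with 0 is just an example choice, not a recommendation from the original.

```python
import pandas as pd

data = [{'a': 12, 'b': 25}, {'a': 56, 'c': 67}]
df = pd.DataFrame(data)

mask = df.isna()                          # True where a value is missing
filled = df.fillna(0)                     # replace NaN with 0 (example choice)
dtypes = df.dtypes.astype(str).to_dict()  # columns with NaN are float64

print(mask)
print(filled)
print(dtypes)
```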

5) Creating from a Series built from a dictionary

population_dict = {'a': 12, 'b': 25, 'c': 56, 'd': 67, 'e': 42}
print("population_dict:\n", population_dict)
population = pd.Series(population_dict)
print(pd.DataFrame(population))
population_dict:
 {'a': 12, 'b': 25, 'c': 56, 'd': 67, 'e': 42}
    0
a  12
b  25
c  56
d  67
e  42
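The example above passes a single Series, so the column is labeled 0. A closely related form, sketched here, wraps the Series in a dictionary so that the dictionary key becomes the column name. The key 'population' is chosen for illustration.

```python
import pandas as pd

population = pd.Series({'a': 12, 'b': 25, 'c': 56, 'd': 67, 'e': 42})

# The dict key becomes the column label; the Series index becomes the row index
df = pd.DataFrame({'population': population})

print(df)
```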

6) Creating from mixed input types

One column is supplied as a plain dictionary and the other as a Series built from that dictionary.

print(population_dict)
print(population)
cities = pd.DataFrame({'area': population_dict, 'bb': population})
print(cities)
{'a': 12, 'b': 25, 'c': 56, 'd': 67, 'e': 42}
a    12
b    25
c    56
d    67
e    42
dtype: int64
   area  bb
a    12  12
b    25  25
c    56  56
d    67  67
e    42  42
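In the example above both inputs share the same labels. The sketch below uses made-up data with only partly overlapping labels to show that the columns are aligned on the union of the labels, with NaN wherever an input has no value.

```python
import pandas as pd

area = {'a': 12, 'b': 25, 'c': 56}                      # plain dict
population = pd.Series({'b': 250, 'c': 560, 'd': 670})  # Series with other labels

cities = pd.DataFrame({'area': area, 'population': population})

print(cities)
```

Alignment by label means the inputs do not need to be in the same order or of the same length.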

7) Creating from a two-dimensional array

# np.random.rand returns random values, so the numbers below differ on every run
print('Created from a 2D array')
aa = pd.DataFrame(np.random.rand(3, 2), columns=['foo', 'bar'], index=['a', 'b', 'c'])
print(aa)
        foo       bar
a  0.888702  0.962784
b  0.923443  0.935304
c  0.255582  0.105020
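Because the values above are random, they cannot be reproduced. One possible variation, sketched here, uses a seeded NumPy generator so that repeated runs give the same frame. The seed 0 is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for repeatable values

# 3 rows x 2 columns of floats in [0, 1)
aa = pd.DataFrame(rng.random((3, 2)), columns=['foo', 'bar'], index=['a', 'b', 'c'])

print(aa)
```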

8) Renaming columns

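# Rename while reading the file: new_columns is a list with one name per column,
# and header=0 replaces the header row already present in the file
df = pd.read_csv('xxx.csv', names=new_columns, header=0)
# Rename some of the columns. Without inplace=True, a new DataFrame is returned
# and the column names of df itself are unchanged.
df.rename(columns={'a':'A'})
# With inplace=True, df itself is modified.
df.rename(columns={'a':'A'}, inplace=True)
# Rename all columns: df.columns = new_columns
# new_columns can be a list or a tuple, but it must have the same length as the
# existing columns; otherwise a length-mismatch error is raised.
# This assignment changes the original DataFrame directly.
df.columns = ['a1', 'b1', 'c1', 'd1']
# Batch-rename columns with the str accessor:
# change 'a1', 'b1', ... to 'a2', 'b2', ...
df.columns = df.columns.str.replace('1','2')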
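The renaming steps can be tried without a file on disk. In this sketch an in-memory CSV string stands in for 'xxx.csv', and the names in `new_columns` are hypothetical.

```python
import io
import pandas as pd

csv_text = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"  # stand-in for the CSV file
new_columns = ['w', 'x', 'y', 'z']         # hypothetical replacement names

# header=0 drops the file's own header row; names supplies the new labels
df = pd.read_csv(io.StringIO(csv_text), names=new_columns, header=0)

# rename returns a new DataFrame; df keeps its labels unless inplace=True
renamed = df.rename(columns={'w': 'W'})
unchanged = list(df.columns)

# Assigning to df.columns replaces all labels at once
df.columns = ['a1', 'b1', 'c1', 'd1']
df.columns = df.columns.str.replace('1', '2')

print(unchanged, list(renamed.columns), list(df.columns))
```

Assigning the result of `rename` to a new variable is often preferred over `inplace=True`, since it keeps the original frame available.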

9) Creating a DataFrame from a dictionary

Dictionary keys as column names

import pandas as pd
dict_data={
    'Name':['Bambang'],
    'Gender':['Male'],
    'Age':[25]
}
df = pd.DataFrame.from_dict(dict_data)
df

Dictionary keys as the index

dict_data={
    'Name':'Bambang',
    'Gender':'Male',
    'Age':25
}
df = pd.Series(dict_data)
df = pd.DataFrame(df, columns=['index'])
df
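An alternative worth considering for the keys-as-index case is `DataFrame.from_dict` with `orient='index'`, which avoids the intermediate Series. This is a sketch; the column name 'value' is an assumption made for illustration.

```python
import pandas as pd

dict_data = {'Name': 'Bambang', 'Gender': 'Male', 'Age': 25}

# orient='index' turns each dictionary key into a row label
df = pd.DataFrame.from_dict(dict_data, orient='index', columns=['value'])

print(df)
```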

Reference: https://wenku.baidu.com/view/d914e2aaa3116c175f0e7cd184254b35eefd1aa4.html
