Pandas Core Data Structures - Series - Xiaoyu's Python Deep Learning Notes

Creating a Series
- Creating from a dictionary
- Specifying the index
Common Series attributes
Accessing Series elements
Series operations

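A Series is a one-dimensional array-like object. It consists of a one-dimensional array of values (of any NumPy data type) together with an associated array of data labels, called its index.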

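Creating a Series

A Series created as follows is automatically given a default integer index running from 0 to n-1:

import pandas as pd

s = pd.Series(['a','b','c','d','e'])
print(s)

Output: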

0    a
1    b
2    c
3    d
4    e
dtype: object
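The automatically generated labels are not stored as an explicit list of integers: in current pandas versions the default index is a RangeIndex. A quick check (a sketch, assuming pandas 1.x or 2.x):

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd', 'e'])

# The automatically generated index is a RangeIndex (0, 1, ..., n-1)
print(type(s.index).__name__)  # RangeIndex
print(list(s.index))           # [0, 1, 2, 3, 4]

# With the default index, labels and positions coincide
print(s[2])                    # c
```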

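Creating from a dictionary

A Series can also be created from a dictionary; the keys become the index labels, in insertion order:

d = {'b': 1, 'a': 0, 'c': 2}
print(pd.Series(d))

Output: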

b    1
a    0
c    2
dtype: int64
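When a dictionary is passed together with an explicit index, the index decides which keys are kept and in what order. A label that is not a key of the dictionary gets NaN, which also promotes the dtype to float64. A minimal sketch reusing the dictionary above; the label 'd' is a hypothetical key chosen to be missing:

```python
import pandas as pd

d = {'b': 1, 'a': 0, 'c': 2}

# The explicit index selects and orders the keys; 'd' is not in the dict
s = pd.Series(d, index=['b', 'c', 'd', 'a'])
print(s)
# b    1.0
# c    2.0
# d    NaN
# a    0.0
# dtype: float64
```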

Specifying the index

The index parameter sets the labels explicitly. Unlike dictionary keys, Series index labels may be **duplicated**:

s = pd.Series(['a','b','c','d','e'], index=[100,200,100,400,500])
print(s)

Output:

100    a
200    b
100    c
400    d
500    e
dtype: object
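Because labels can repeat, it can be worth checking for duplicates before assuming that a label identifies a single element. Index.is_unique and Index.duplicated() are standard pandas index members; a short sketch:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd', 'e'], index=[100, 200, 100, 400, 500])

# False, because label 100 appears twice
print(s.index.is_unique)

# Boolean mask that marks every repeat after the first occurrence of a label
print(s.index.duplicated())

# Keep only the first element for each label
deduped = s[~s.index.duplicated(keep='first')]
print(deduped)
```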

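Common Series attributes

The values and index attributes return the array representation of a Series and its index object, respectively:

s = pd.Series(['a','b','c','d','e'], index=[100,200,100,400,500])
print(s.values)
print(s.index)

Output (pandas 2.x; versions before 2.0 print Int64Index([100, 200, 100, 400, 500], dtype='int64') for the index):

['a' 'b' 'c' 'd' 'e']
Index([100, 200, 100, 400, 500], dtype='int64')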
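Besides values and index, a few other attributes are often handy. The sketch below shows some standard ones; the name 'letters' is a hypothetical label added only for illustration. The pandas documentation recommends to_numpy() (available since pandas 0.24) over .values for getting the underlying array.

```python
import pandas as pd

# 'letters' is an illustrative name, not part of the original example
s = pd.Series(['a', 'b', 'c', 'd', 'e'], index=[100, 200, 100, 400, 500], name='letters')

print(s.dtype)       # data type of the values
print(s.shape)       # (5,)
print(s.size)        # 5
print(s.name)        # letters
print(s.to_numpy())  # the values as a NumPy array
```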

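Accessing Series elements

Compared with a plain NumPy array, a Series lets you select a single value or a group of values by index label:

s = pd.Series(['a','b','c','d','e'], index=[100,200,100,400,500])

If several elements share the same label, all of them are returned:

print(s[100])

Output: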

100    a
100    c
dtype: object

Select several labels at once by passing a list:

print(s[[400, 500]])

Output:

400    d
500    e
dtype: object
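With an integer index like this one, s[...] looks up labels rather than positions, which is easy to confuse. The .loc (label-based) and .iloc (position-based) accessors make the intent explicit:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd', 'e'], index=[100, 200, 100, 400, 500])

# Label-based: every element labelled 100, returned as a Series
print(s.loc[100])

# Label-based: label 200 occurs only once, so a single value is returned
print(s.loc[200])   # b

# Position-based: first and last elements, regardless of their labels
print(s.iloc[0])    # a
print(s.iloc[-1])   # e

# Position-based slicing excludes the end position, as with Python lists
print(s.iloc[1:3])
```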

Series operations

Create the following Series:

import numpy as np

s = pd.Series(np.array([1,2,3,4,5]), index=['a', 'b', 'c', 'd', 'e'])
print(s)

a    1
b    2
c    3
d    4
e    5
dtype: int64

Element-wise addition

The + operator adds the elements that share the same index label:

print(s+s)

a     2
b     4
c     6
d     8
e    10
dtype: int64
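Because + pairs elements by label rather than by position, reversing the order of one operand does not change the result for any label, whereas plain NumPy arrays are paired by position. A small comparison:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([1, 2, 3, 4, 5]), index=['a', 'b', 'c', 'd', 'e'])

# Same labels in reverse order: e, d, c, b, a
reversed_s = s[::-1]

# Pandas pairs 'a' with 'a', 'b' with 'b', ... so every value is doubled
total = s + reversed_s
print(total.sort_index())

# NumPy arrays pair by position instead: 1+5, 2+4, ...
print(s.to_numpy() + reversed_s.to_numpy())  # [6 6 6 6 6]
```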

Multiplication by a scalar

Multiplying a Series by a scalar with * multiplies every element by that scalar:

print(s*3)

a     3
b     6
c     9
d    12
e    15
dtype: int64
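Multiplying two Series with *, or applying a NumPy universal function, also works element by element and keeps the index labels. A sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([1, 2, 3, 4, 5]), index=['a', 'b', 'c', 'd', 'e'])

# Element-wise product: values with the same label are multiplied
squares = s * s
print(squares)   # a 1, b 4, c 9, d 16, e 25

# A NumPy ufunc is applied to each element and returns a Series with the same index
roots = np.sqrt(squares)
print(roots)     # a 1.0, b 2.0, c 3.0, d 4.0, e 5.0
```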

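Alignment of different indexes

One of the most important features of a Series is that arithmetic operations automatically align data by index label. This is the main difference from a multidimensional NumPy array: you do not need to make sure beforehand that the Series involved have the same labels. A label that has a value in only one operand produces NaN in the result.

obj1 = pd.Series({"Ohio": 35000, "Oregon": 16000, "Texas": 71000, "Utah": 5000})
obj2 = pd.Series({"California": np.nan, "Ohio": 35000, "Oregon": 16000, "Texas": 71000})
print(obj1)
print(obj2)
print(obj1 + obj2)

Output: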

Ohio      35000
Oregon    16000
Texas     71000
Utah       5000
dtype: int64
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64
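The index of the result is the union of the two input indexes, and the NaN entries mark labels that could not be combined into a number (Utah is missing from obj2; California is missing from obj1 and is NaN in obj2). Both facts can be checked programmatically:

```python
import numpy as np
import pandas as pd

obj1 = pd.Series({"Ohio": 35000, "Oregon": 16000, "Texas": 71000, "Utah": 5000})
obj2 = pd.Series({"California": np.nan, "Ohio": 35000, "Oregon": 16000, "Texas": 71000})
result = obj1 + obj2

# The result index equals the (sorted) union of both indexes
print(result.index.equals(obj1.index.union(obj2.index)))  # True

# Labels whose result is NaN
print(result[result.isna()].index.tolist())  # ['California', 'Utah']
```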

Another example:

s = pd.Series(np.array([1,2,3,4,5]), index=['a', 'b', 'c', 'd', 'e'])
print(s[1:])
print(s[:-1])
print(s[1:] + s[:-1])

b    2
c    3
d    4
e    5
dtype: int64
a    1
b    2
c    3
d    4
dtype: int64
a    NaN
b    4.0
c    6.0
d    8.0
e    NaN
dtype: float64
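If a label missing on one side should count as 0 instead of producing NaN, the method form of the operator (Series.add here) accepts a fill_value argument. A label missing on both sides would still give NaN. A sketch with the same slices:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([1, 2, 3, 4, 5]), index=['a', 'b', 'c', 'd', 'e'])

# fill_value=0 substitutes 0 for the missing side before adding
result = s[1:].add(s[:-1], fill_value=0)
print(result)
# a    1.0
# b    4.0
# c    6.0
# d    8.0
# e    5.0
# dtype: float64
```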