2. numpy array - 《numpy》

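Differences between NumPy and Python lists
Advantages of NumPy
Creating data
- np.array(), array.ndim
Adding data
- np.concatenate(), np.expand_dims()
  - Appending elements to a 1-D array
  - Adding a dimension
Merging data
- np.concatenate(): merging matrices by row or column
- np.vstack(), np.hstack(): two convenient merge functions for 2-D data
Inspecting the shape of data
- array.size, array.shape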

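Differences between NumPy and Python lists

NumPy revolves around one data type, the array. A Python list and a NumPy array have much in common: both store values in order, both let you retrieve a value by its index, and both let you modify a single value in place with the same syntax.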

import numpy as np

my_list = [1,2,3]
print(my_list[0])
my_np_array = np.array([1,2,3])
print(my_np_array[0])

1
1

my_list[0] = -1
my_np_array[0] = -1
print(my_list)
print(my_np_array)

[-1, 2, 3]
[-1 2 3]
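
Where the two types diverge is arithmetic. The following is a minimal sketch (the names `repeated` and `doubled` are illustrative and not part of the original notes) showing that `*` repeats a list but multiplies an array element by element:

```python
import numpy as np

my_list = [1, 2, 3]
my_np_array = np.array([1, 2, 3])

# For a list, * means repetition
repeated = my_list * 2
# For an array, * is applied to every element
doubled = my_np_array * 2

print(repeated)  # [1, 2, 3, 1, 2, 3]
print(doubled)   # [2 4 6]
```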

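Advantages of NumPy

NumPy's core advantage is speed. A NumPy array stores its elements in one contiguous block of memory, so moving from one element to the next never requires a jump to a distant address. A Python list stores only references contiguously; the objects they point to can be scattered across memory, so a batch computation has to follow each reference before it can do any work. A NumPy array also restricts what it stores: every element has the same data type, which further helps batch computation. As a result, for batch computation over large amounts of data, NumPy is generally much faster than a native Python list.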

import time
import numpy as np

t0 = time.time()
# Python list: add 1 to every element, one element at a time
l = list(range(100))
for _ in range(10000):
    for i in range(len(l)):
        l[i] += 1
t1 = time.time()
# NumPy array: add 1 to every element in one vectorized operation
a = np.array(l)
for _ in range(10000):
    a += 1
print("Python list took {:.3f}s".format(t1-t0))
print("NumPy array took {:.3f}s".format(time.time()-t1))

Python list took 0.102s
NumPy array took 0.012s

NumPy arrays and Python lists are interchangeable in many situations, but when you process large amounts of data of a uniform type, NumPy is the clear choice, and the speedup is substantial.
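
The single-data-type restriction and the contiguous layout described above can be inspected directly. This is a small sketch under the assumption that the arrays are freshly created from lists (the names `ints` and `mixed` are illustrative):

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # a single float upcasts every element to float

# An integer type; the exact width depends on the platform
print(ints.dtype.kind)                               # i
print(mixed.dtype)                                   # float64
# A newly created array occupies one contiguous block of memory
print(mixed.flags["C_CONTIGUOUS"])                   # True
# Total bytes = number of elements * bytes per element
print(mixed.nbytes == mixed.size * mixed.itemsize)   # True
```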

Creating data

np.array(), array.ndim

1-D

>>> import numpy as np
>>>
>>> cars = np.array([5, 10, 12, 6])
>>> print("Data:", cars, "\nDimensions:", cars.ndim)
Data: [ 5 10 12  6]
Dimensions: 1

2-D

>>> cars = np.array([
... [5, 10, 12, 6],
... [5.1, 8.2, 11, 6.3],
... [4.4, 9.1, 10, 6.6]
... ])
>>>
>>> print("Data:\n", cars, "\nDimensions:", cars.ndim)
Data:
 [[ 5.  10.  12.   6. ]
 [ 5.1  8.2 11.   6.3]
 [ 4.4  9.1 10.   6.6]]
Dimensions: 2

3-D

cars = np.array([
[
    [5, 10, 12, 6],
    [5.1, 8.2, 11, 6.3],
    [4.4, 9.1, 10, 6.6]
],
[
    [6, 11, 13, 7],
    [6.1, 9.2, 12, 7.3],
    [5.4, 10.1, 11, 7.6]
],
])
print("Total dimensions:", cars.ndim)
print("Site 1 data:\n", cars[0], "\nSite 1 dimensions:", cars[0].ndim)
print("Site 2 data:\n", cars[1], "\nSite 2 dimensions:", cars[1].ndim)
print("Site 1 data, first row:", cars[0][0])
print("Site 1 data, first row, first element:", cars[0][0][0])

Total dimensions: 3
Site 1 data:
 [[ 5.  10.  12.   6. ]
 [ 5.1  8.2 11.   6.3]
 [ 4.4  9.1 10.   6.6]]
Site 1 dimensions: 2
Site 2 data:
 [[ 6.  11.  13.   7. ]
 [ 6.1  9.2 12.   7.3]
 [ 5.4 10.1 11.   7.6]]
Site 2 dimensions: 2
Site 1 data, first row: [ 5. 10. 12.  6.]
Site 1 data, first row, first element: 5.0
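
The chained indexing used above can also be written as a single comma-separated index. A brief sketch on the same 3-D data (the names `chained` and `direct` are illustrative):

```python
import numpy as np

cars = np.array([
    [[5, 10, 12, 6], [5.1, 8.2, 11, 6.3], [4.4, 9.1, 10, 6.6]],
    [[6, 11, 13, 7], [6.1, 9.2, 12, 7.3], [5.4, 10.1, 11, 7.6]],
])

chained = cars[0][0][0]  # index one dimension at a time
direct = cars[0, 0, 0]   # index all three dimensions at once

print(chained == direct)  # True
print(cars[1, 2, 3])      # 7.6 (site 2, third row, fourth element)
```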

Adding data

np.concatenate(), np.expand_dims()

Appending elements to a 1-D array

cars1 = np.array([5, 10, 12, 6])
cars2 = np.array([5.2, 4.2])
cars = np.concatenate([cars1, cars2])
print(cars)
[ 5.  10.  12.   6.   5.2  4.2]
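
It may help to note that np.concatenate() returns a new array and leaves its inputs untouched. The sketch below also compares it with np.append(), which for 1-D inputs gives the same result (the name `appended` is illustrative):

```python
import numpy as np

cars1 = np.array([5, 10, 12, 6])
cars2 = np.array([5.2, 4.2])

cars = np.concatenate([cars1, cars2])
appended = np.append(cars1, cars2)

# cars1 keeps its 4 elements; the result is a new 6-element array
print(cars1.size, cars.size)           # 4 6
print(np.array_equal(cars, appended))  # True
# Mixing integers and floats upcasts the result to float
print(cars.dtype)                      # float64
```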

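Adding a dimension

test1 = np.array([5, 10, 12, 6])
test2 = np.array([5.1, 8.2, 11, 6.3])
print("test1 before adding a dimension:", test1)
print("test2 before adding a dimension:", test2)
# First make both arrays 2-D; either of the two approaches below adds a dimension
test1 = np.expand_dims(test1, 0)
test2 = test2[np.newaxis, :]
print("test1 after adding a dimension:", test1)
print("test2 after adding a dimension:", test2)
# Then stack them along the first dimension
all_tests = np.concatenate([test1, test2])
print("After expansion\n", all_tests)

test1 before adding a dimension: [ 5 10 12  6]
test2 before adding a dimension: [ 5.1  8.2 11.   6.3]
test1 after adding a dimension: [[ 5 10 12  6]]
test2 after adding a dimension: [[ 5.1  8.2 11.   6.3]]
After expansion
 [[ 5.  10.  12.   6. ]
 [ 5.1  8.2 11.   6.3]]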
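
The position of the new dimension is controlled by the second argument of np.expand_dims(). A short sketch, with illustrative names `row`, `col`, and `col2`, comparing the resulting shapes:

```python
import numpy as np

test1 = np.array([5, 10, 12, 6])

row = np.expand_dims(test1, 0)  # new axis in front: one row of 4 values
col = np.expand_dims(test1, 1)  # new axis at the end: 4 rows of one value
col2 = test1[:, np.newaxis]     # np.newaxis form of the same column

print(row.shape, col.shape)       # (1, 4) (4, 1)
print(np.array_equal(col, col2))  # True
```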

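Merging data

np.concatenate(): merging matrices by row or column

To stack along the second dimension instead, pass the axis argument to np.concatenate.

print("Stacked along the first dimension:\n", np.concatenate([all_tests, all_tests], axis=0))
print("Stacked along the second dimension:\n", np.concatenate([all_tests, all_tests], axis=1))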
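
The effect of the axis argument is easiest to see in the resulting shapes. This self-contained sketch rebuilds the 2 x 4 array from the earlier example directly (the names `rows` and `cols` are illustrative):

```python
import numpy as np

all_tests = np.array([
    [5, 10, 12, 6],
    [5.1, 8.2, 11, 6.3],
])

rows = np.concatenate([all_tests, all_tests], axis=0)  # more rows
cols = np.concatenate([all_tests, all_tests], axis=1)  # more columns

print(all_tests.shape)  # (2, 4)
print(rows.shape)       # (4, 4)
print(cols.shape)       # (2, 8)
```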

Note that arrays whose dimensions do not line up cannot be merged.

a = np.array([
[1,2,3],
[4,5,6]
])
b = np.array([
[7,8],
[9,10]
])
print(np.concatenate([a,b], axis=1))  # works: both arrays have 2 rows
print(np.concatenate([a,b], axis=0))  # raises an error: 3 columns vs. 2 columns

[[ 1  2  3  7  8]
 [ 4  5  6  9 10]]
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2
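
One way to avoid the error is to compare shapes before merging. The helper `can_concatenate` below is hypothetical (it is not a NumPy function) and assumes a non-negative axis; it is only a sketch of the rule that every dimension except the concatenation axis must match:

```python
import numpy as np

def can_concatenate(x, y, axis):
    """Hypothetical helper: True if x and y match on every axis except `axis`."""
    if x.ndim != y.ndim:
        return False
    return all(
        sx == sy
        for i, (sx, sy) in enumerate(zip(x.shape, y.shape))
        if i != axis
    )

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8], [9, 10]])

print(can_concatenate(a, b, axis=1))  # True
print(can_concatenate(a, b, axis=0))  # False

# The mismatch can also be handled by catching the exception
try:
    np.concatenate([a, b], axis=0)
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```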

np.vstack(), np.hstack(): two convenient merge functions for 2-D data

>>> a = np.array([
... [1,2],
... [3,4]
... ])
>>> b = np.array([
... [5,6],
... [7,8]
... ])
>>> print("Vertical merge\n", np.vstack([a, b]))
Vertical merge
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
>>> print("Horizontal merge\n", np.hstack([a, b]))
Horizontal merge
 [[1 2 5 6]
 [3 4 7 8]]
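
These functions also accept 1-D arrays. As a sketch using the test arrays from the "Adding a dimension" example, np.vstack() treats each 1-D array as a single row, so it gives the same result as adding a dimension and concatenating (the names `stacked`, `manual`, and `joined` are illustrative):

```python
import numpy as np

test1 = np.array([5, 10, 12, 6])
test2 = np.array([5.1, 8.2, 11, 6.3])

# vstack turns each 1-D array into a row before stacking
stacked = np.vstack([test1, test2])
manual = np.concatenate([test1[np.newaxis, :], test2[np.newaxis, :]], axis=0)
# hstack on 1-D arrays simply joins them end to end
joined = np.hstack([test1, test2])

print(stacked.shape)                    # (2, 4)
print(np.array_equal(stacked, manual))  # True
print(joined.shape)                     # (8,)
```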

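Inspecting the shape of data

Besides array.ndim, which reports the number of dimensions, we often want finer detail about the data, such as its total size and its shape, so that it is easier to manage. For example, to find out how many car test measurements there are in total, you might think of counting them in a loop; array.size and array.shape give the answer directly.

array.size, array.shape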

cars = np.array([
[5, 10, 12, 6],
[5.1, 8.2, 11, 6.3],
[4.4, 9.1, 10, 6.6]
])
print("Total number of test measurements:", cars.size)
print("First dimension:", cars.shape[0])
print("Second dimension:", cars.shape[1])
print("All dimensions:", cars.shape)

Total number of test measurements: 12
First dimension: 3
Second dimension: 4
All dimensions: (3, 4)
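
The two attributes are related: size is the product of the entries in shape. A minimal sketch on the same data (the name `product` is illustrative), which also shows that len() reports only the first dimension:

```python
import numpy as np

cars = np.array([
    [5, 10, 12, 6],
    [5.1, 8.2, 11, 6.3],
    [4.4, 9.1, 10, 6.6],
])

# Multiply the lengths of all dimensions together
product = int(np.prod(cars.shape))

print(len(cars))             # 3 (length of the first dimension only)
print(product)               # 12
print(cars.size == product)  # True
```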