Data Science Basics (plt, numpy, pandas) - NumPy Scientific Computing Basics - 《玩物丧记》

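NumPy (Numerical Python) provides many high-level numerical programming tools, such as matrix data types, vectorized operations, and a sophisticated library of numerical routines. It was built for rigorous numerical processing.
Creating arrays (matrices)
Reading data
Generating random numbers
NaN and infinity
- Concepts
- Caveats
Array computations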

Matrices and determinants: https://blog.csdn.net/qq_37469992/article/details/56844407
When I have time, finish the 3Blue1Brown linear algebra course: https://www.bilibili.com/video/av6731067/?p=6

Creating arrays (matrices)

Data types

Creating an array with array

>>> numpy.array(range(10))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> numpy.arange(1,20,3)
array([ 1,  4,  7, 10, 13, 16, 19])

Use the dtype attribute to inspect the data type.
Use .astype('int8') to convert to another data type.
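The two attributes above can be tried out with a small sketch like the following. The variable names are illustrative only, and the default integer type printed on the first line depends on the platform.

```python
import numpy as np

t = np.array(range(10))
print(t.dtype)             # platform default integer type, e.g. int64

t8 = t.astype('int8')      # astype returns a converted copy
print(t8.dtype)            # int8
print(t.dtype == t8.dtype) # False: the original array keeps its dtype
```

Because astype returns a new array, the result has to be assigned to a name if it is to be used later.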

Rounding to 2 decimal places works the same way as in Python.
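The original snippet for rounding is not shown here, so the following is only a sketch that assumes numpy.round is the function meant. The sample values are made up for illustration.

```python
import numpy as np

t = np.array([0.1234, 1.5678, 3.14159])

# Round every element of the array to 2 decimal places
rounded = np.round(t, 2)
print(rounded)

# The built-in round does the same for a single Python float
print(round(1.5678, 2))
```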

Array shape

An array with two rows and three columns
Use .shape to inspect the shape
Use reshape to change the shape

>>> t = numpy.array([[1,2,3],[4,5,6]])
>>> t.shape
(2, 3)

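Converting a one-dimensional array to three rows and four columns

(reshape does not change the shape of the original array; it returns a reshaped result)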
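A minimal sketch of this conversion, with illustrative variable names, showing that the original array keeps its shape:

```python
import numpy as np

t = np.arange(12)          # one-dimensional, 12 elements
t2 = t.reshape((3, 4))     # three rows, four columns

print(t2.shape)  # (3, 4)
print(t.shape)   # (12,)  the original is untouched
```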

Reshaping into a three-dimensional array (2 blocks, 3 rows, 4 columns)

A three-dimensional array:
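As a rough illustration of the (2 blocks, 3 rows, 4 columns) layout, assuming 24 consecutive integers as sample data:

```python
import numpy as np

t3 = np.arange(24).reshape((2, 3, 4))

print(t3.shape)     # (2, 3, 4)
print(t3[0].shape)  # (3, 4): each block is a 3-row, 4-column array
print(t3[1])        # the second block, holding 12 through 23
```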

Two ways to convert to a one-dimensional array (flatten means "to make flat")

>>> t = numpy.array([[1,2,3],[4,5,6]])
>>> t2 = t.reshape((t.shape[0]*t.shape[1],))
>>> t2
array([1, 2, 3, 4, 5, 6])
>>> t3 = t.flatten()
>>> t3
array([1, 2, 3, 4, 5, 6])

Reading data

Reading local data

numpy.loadtxt(fname, dtype=float, delimiter=None, skiprows=0, usecols=None, unpack=False)

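Setting unpack to True is equivalent to transposing the matrix.
t = t.transpose() or t = t.T also transposes.
t = t.swapaxes(1,0) swaps the axes and has the same effect for a two-dimensional array: the axis order was (0,1) and becomes (1,0).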
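The following sketch checks that these transposition routes agree. To stay self-contained it reads from an in-memory string instead of a local file; the data and variable names are hypothetical.

```python
import io
import numpy as np

csv_text = "1,2,3\n4,5,6\n"   # hypothetical stand-in for a local CSV file

t = np.loadtxt(io.StringIO(csv_text), dtype=int, delimiter=",")
t_unpacked = np.loadtxt(io.StringIO(csv_text), dtype=int,
                        delimiter=",", unpack=True)

print(t.shape)           # (2, 3)
print(t_unpacked.shape)  # (3, 2)

# unpack=True, .T, transpose() and swapaxes(1, 0) all give the same result
print(np.array_equal(t_unpacked, t.T))
print(np.array_equal(t.transpose(), t.swapaxes(1, 0)))
```

With a real file, the first argument would simply be the file path.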
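Select the value at row index 2, column index 4 (the third row and fifth column, since indices start at 0):
t[2,4]

Once selected, the value can be modified directly.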

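Selecting rows

t = numpy.arange(24).reshape(4,6)
Select the third row:
t[2]
Select several consecutive rows:
t[2:]
Select several non-consecutive rows (every index must be smaller than the number of rows):
t[[0,2,3]]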

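Selecting columns

Select several consecutive columns:
t[:,2:5]
Select several non-consecutive columns (every index must be smaller than the number of columns):
t[:,[0,2,5]]
Select rows 3 through 5 and columns 2 through 4 (a slice silently stops at the last row if the array has fewer rows):
t[2:5,1:4]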
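The selections above can be run together on the 4-row, 6-column array. This is a sketch with indices chosen to fit that shape; the names are illustrative.

```python
import numpy as np

t = np.arange(24).reshape(4, 6)

row = t[2]                     # third row
rows = t[[0, 2, 3]]            # non-consecutive rows
col = t[:, 1]                  # second column
cols = t[:, 2:5]               # consecutive columns with index 2, 3, 4
picked_cols = t[:, [0, 2, 5]]  # non-consecutive columns
block = t[2:4, 1:4]            # rows 3-4 combined with columns 2-4

# A selected position can be assigned to directly (done on a copy here)
t_mod = t.copy()
t_mod[2, 4] = 100

print(block)
```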

Modifying values

Example 1: set every value in t below 15 to 10

>>> t
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> t<15
array([[ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True, False, False, False],
       [False, False, False, False, False, False]])
# set every value in t below 15 to 10
>>> t[t<15] = 10
>>> t
array([[10, 10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10, 10],
       [10, 10, 10, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

Select every row whose second column is smaller than 500,000
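The data behind this selection is not part of these notes, so the sketch below uses a small made-up array in which the second column (index 1) holds the values compared against 500,000.

```python
import numpy as np

# Hypothetical data: the second column is the one being filtered on
t = np.array([[1, 300000],
              [2, 700000],
              [3, 450000]])

mask = t[:, 1] < 500000   # one boolean per row
selected = t[mask]        # keeps only the rows where the mask is True

print(selected)
```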
Example 2: set values in t below 10 to 0 and values greater than or equal to 10 to 10 (numpy.where works like a ternary operator)

>>> t = numpy.arange(24).reshape(4,6)
>>> t1 = numpy.where(t<10,0,10)
>>> t1
array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0, 10, 10],
       [10, 10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10, 10]])

Alternatively, use clip:
values below 10 are replaced with 10, and values above 18 are replaced with 18

>>> t1 = t.clip(10,18)
>>> t1
array([[10, 10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 18, 18, 18, 18, 18]])

Replacing values with nan (nan is a float, so the array has to be converted to a float type first):
t1 = np.arange(24).reshape(4,6).astype('float')
t1[0,0] = np.nan
t1[t1==0] = np.nan

Concatenating arrays

Vertical concatenation: vstack
Horizontal concatenation: hstack
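A short sketch of both functions, using two made-up arrays of the same shape:

```python
import numpy as np

t1 = np.arange(6).reshape(2, 3)
t2 = np.arange(6, 12).reshape(2, 3)

v = np.vstack((t1, t2))  # stacked on top of each other: shape (4, 3)
h = np.hstack((t1, t2))  # placed side by side: shape (2, 6)

print(v.shape, h.shape)
```

vstack needs the column counts to match, and hstack needs the row counts to match.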

Swapping rows or columns
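The notes give no snippet for the swap, so this is one possible way to do it, using index lists on both sides of the assignment. The array is made up for illustration.

```python
import numpy as np

t = np.arange(12).reshape(3, 4)

# Swap rows 0 and 1: the right-hand side is copied before assignment
t[[0, 1], :] = t[[1, 0], :]

# Swap columns 0 and 2 in the same way
t[:, [0, 2]] = t[:, [2, 0]]

print(t)
```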

Other common functions: zeros, ones, argmax, argmin

Build an array filled entirely with 0 or entirely with 1
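A minimal sketch of the two constructors; the shapes are chosen arbitrarily.

```python
import numpy as np

z = np.zeros((2, 3))             # floats by default
o = np.ones((2, 3), dtype=int)   # dtype can be set explicitly

print(z)
print(o)
```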

np.argmax(t,axis=0)
axis selects the dimension. axis=0 runs down the rows, so this returns, for each column, the row index of its largest value.

>>> y = numpy.eye(3)
>>> y
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
>>> numpy.argmax(y,axis=0)
array([0, 1, 2])
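The identity matrix is symmetric, so it cannot show the difference between the two axes. A sketch with a made-up non-symmetric array makes the direction visible:

```python
import numpy as np

t = np.array([[1, 9, 3],
              [7, 2, 8]])

per_column = np.argmax(t, axis=0)   # row index of the max in each column
per_row = np.argmax(t, axis=1)      # column index of the max in each row
min_per_row = np.argmin(t, axis=1)  # column index of the min in each row

print(per_column)   # [1 0 1]
print(per_row)      # [1 2]
print(min_per_row)  # [0 1]
```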

Generating random numbers

numpy.random.randint(low, high, size)

a = numpy.random.rand(10,20) # a 10-row, 20-column array of floats in [0, 1)
>>> numpy.random.randint(10,20,(4,5)) # integers >= 10 and < 20
array([[13, 15, 17, 17, 12],
       [15, 11, 17, 16, 12],
       [10, 12, 17, 17, 19],
       [14, 18, 16, 17, 12]])
>>> numpy.random.uniform(10,20,(4,5)) # floats drawn uniformly from [10, 20)
array([[17.44760074, 18.28834687, 17.68752134, 13.64254043, 19.72146731],
       [14.92216006, 15.83619203, 15.37585188, 14.16668637, 14.19634753],
       [11.5405885 , 16.01012111, 13.70250036, 13.29277239, 12.55220888],
       [11.12925591, 11.6353873 , 15.38930101, 16.29085018, 15.33970246]])
>>> a = numpy.random.normal(1,2,(4,4)) # 1 is the mean (the axis of symmetry), 2 is the standard deviation
>>> a
array([[ 0.91765494,  2.14901645, -1.53539323,  0.69955653],
       [ 1.13330056,  2.69391375,  1.83874094,  0.86167836],
       [-2.21096195,  1.03397908,  1.46193918,  0.37835776],
       [-1.48510826, -4.54193375, -1.41579655,  4.25407471]])

NaN and infinity

Concepts

Caveats

np.count_nonzero(t) # counts the nonzero elements of the array (nan counts as nonzero)

>>> a = np.ones((3,4))
>>> a
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
>>> a[0,0] = 0
>>> a
array([[0., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
>>> a/0
array([[nan, inf, inf, inf],
       [inf, inf, inf, inf],
       [inf, inf, inf, inf]])
>>> a/0 == a/0
array([[False,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])
>>> b = a/0
>>> b != b
array([[ True, False, False, False],
       [False, False, False, False],
       [False, False, False, False]])
>>> np.count_nonzero(b != b) # counts the nan values, since nan != nan
1
>>> np.count_nonzero(np.isnan(b))
1

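Array computations

Broadcasting

Adding, subtracting, multiplying or dividing an array by a scalar applies the same operation to every element of the array.
0/0 = nan, meaning "not a number" (although its type is float); n/0 = inf for nonzero n, meaning infinity.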

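Of the two cases, case 1 cannot be computed and case 2 can (each block is combined with the 3-row, 2-column array).
By the same rule, (3,3,2) can be combined with (3,2), and (3,2,3) with (2,3). Combining (3,3,2) with (3,3), or (3,2,3) with (3,3), raises an error.
Shapes are compared starting from the trailing dimension, and every aligned pair of dimensions must either be equal or contain a 1.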
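The shape rule can be checked with a short sketch. The arrays hold placeholder values; only their shapes matter here.

```python
import numpy as np

a = np.zeros((3, 3, 2))
b = np.ones((3, 2))
c = np.ones((3, 3))

# (3, 3, 2) with (3, 2): trailing dimensions match, so each block is combined
result = a + b
print(result.shape)  # (3, 3, 2)

# (3, 3, 2) with (3, 3): trailing dimensions are 2 and 3, so this fails
failed = False
try:
    a + c
except ValueError as err:
    failed = True
    print("cannot broadcast:", err)
```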

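Computations between two arrays

When two arrays have the same shape, the operation is applied to elements at matching positions.
When the second array matches the first in its number of columns, e.g. shapes (m, n) and (n,) or (1, n), the operation is applied to every row. When the row counts match, e.g. (m, n) and (m, 1), it is applied to every column.
Sharing one dimension is not enough in general: shapes are aligned from the trailing dimension, and every aligned pair must either be equal or contain a 1.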
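A sketch of the row-wise and column-wise cases, with made-up arrays and illustrative names:

```python
import numpy as np

t = np.arange(12).reshape(3, 4)

row = np.arange(4)                # shape (4,): matches the column count
col = np.arange(3).reshape(3, 1)  # shape (3, 1): matches the row count

by_row = t - row   # row is subtracted from every row of t
by_col = t - col   # col is subtracted from every column of t

print(by_row)
print(by_col)
```

Note that a plain shape (3,) array would not work for the column-wise case here, because it would be aligned with the trailing dimension of size 4.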

Summation: sum

1. The sum of all elements

>>> a
array([[0., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
>>> np.sum(a)
11.0

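2. The sum along one direction, for example along the rows. Note that the row direction (axis=0) runs vertically downward, so the result holds one sum per column.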

>>> a
array([[0., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
>>> np.sum(a,axis=0)
array([2., 3., 3., 3.])

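If the array contains nan, adding anything to it gives nan, so the total is nan as well. This is why nan is often replaced with 0:
t[np.isnan(t)] = 0
Replacing nan with the mean is also common.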
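One way to replace nan with a mean is to fill each column with the mean of its own non-nan values. This is only a sketch of that idea; the per-column choice and the sample data are assumptions, not something the notes specify.

```python
import numpy as np

t = np.arange(12, dtype=float).reshape(3, 4)
t[0, 1] = np.nan

total_with_nan = np.sum(t)   # nan, because one element is nan
print(total_with_nan)

for i in range(t.shape[1]):
    col = t[:, i]                 # a view, so edits write through to t
    nan_mask = np.isnan(col)
    if nan_mask.any():
        # Fill with the mean of the remaining values in this column
        col[nan_mask] = col[~nan_mask].mean()

print(t)
```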

Mean: mean

An axis can be specified here as well.

>>> a
array([2., 3., 3., 3.])
>>> np.mean(a)
2.75
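For the axis form, a small sketch on the 3-row, 4-column array used in the sum examples (rebuilt here so the block runs on its own):

```python
import numpy as np

a = np.ones((3, 4))
a[0, 0] = 0

col_means = np.mean(a, axis=0)  # one mean per column
row_means = np.mean(a, axis=1)  # one mean per row

print(col_means)
print(row_means)
```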
