1. Basic Data Structures
1.1 numpy.ndarray
1D : list
2D : list of lists
…
1.1.1 Basic Attributes
- ndarray.shape
Gives the size of each dimension of the ndarray as a tuple.
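- ndarray[row_index, column_index]
In 2D, selects a single element; the comma separates the row index from the column index.
- ndarray[row_index], ndarray[[row1, row2, row3], :], ndarray[row_index_start : row_index_end]
In 2D, selects one row or several rows (in a slice, the end index is excluded).
- ndarray[:, col_index], ndarray[:, [col1, col2, col3]], ndarray[:, col_index_start : col_index_end]
In 2D, selects one column or several columns.
- ndarray[row, col_start : col_end]
In 2D, selects several columns of one row.
- ndarray[row_start : row_end, col_index]
In 2D, selects several rows of one column.
- ndarray[row_start : row_end, col_start : col_end]
In 2D, selects a block made of several rows and several columns.
- ndarray[:, 0] + ndarray[:, 1]
Vector addition: adds column 0 and column 1 element by element.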
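A small sketch of the indexing forms listed above, run against a hypothetical 3x3 array (the values are chosen only so the results are easy to check by eye):

```python
import numpy as np

# Hypothetical example array: 3 rows, 3 columns
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(a.shape)             # (3, 3): rows, columns
print(a[1, 2])             # element at row 1, column 2 -> 6
print(a[[0, 2], :])        # rows 0 and 2
print(a[:, 1])             # column 1 -> values 2, 5, 8
print(a[0, 1:3])           # row 0, columns 1 and 2 -> values 2, 3
print(a[0:2, 0])           # column 0, rows 0 and 1 -> values 1, 4
print(a[0:2, 1:3])         # rows 0..1 combined with columns 1..2
print(a[:, 0] + a[:, 1])   # element-wise sum of columns 0 and 1 -> values 3, 9, 15
```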
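1.1.2 Basic Methods
- numpy.ndarray.min()
- numpy.ndarray.max()
- numpy.ndarray.mean(): arithmetic mean
- numpy.median(a): median (a module-level function; ndarray has no median() method)
- numpy.ndarray.sum()
- numpy.ndarray.reshape(): returns the same data arranged into a newly specified shape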
——————————————————-
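The aggregation methods above (min, max, mean, sum, and numpy.median) accept an axis parameter, which selects the dimension the method operates along. Taking max as an example: for a 2D array, max(axis=0) returns the maximum of each column, max(axis=1) returns the maximum of each row, and max() with no axis returns the maximum over all elements.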
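A quick illustration of the axis parameter on a hypothetical 3x3 array. Note that the median is called as numpy.median(a, axis=...), since ndarray has no median method:

```python
import numpy as np

# Hypothetical example array
a = np.array([[1, 9, 5],
              [6, 4, 7],
              [8, 2, 3]])

print(a.max())               # no axis: maximum over all elements -> 9
print(a.max(axis=0))         # along axis 0: maximum of each column -> 8, 9, 7
print(a.max(axis=1))         # along axis 1: maximum of each row -> 9, 7, 8
print(np.median(a, axis=0))  # median of each column -> 6.0, 4.0, 5.0
print(a.reshape(1, 9))       # same nine values rearranged into 1 row x 9 columns
```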
1.2 numpy.dtype
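Besides the built-in data types, you can define your own record type with numpy.dtype and store records of that type in an ndarray.
A custom structured dtype lets an ndarray hold structured (tabular) data: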
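import numpy as np

# Structured dtype: a 32-byte string field, three int32 fields ('i') and one float32 field ('f')
persontype = np.dtype({
    'names': ['name', 'age', 'chinese', 'math', 'english'],
    'formats': ['S32', 'i', 'i', 'i', 'f']
})
peoples = np.array([
    ('Leo', 20, 90, 91, 90.5),
    ('Tom', 21, 91, 92, 92.5),
    ('Lucy', 22, 80, 85, 84.5),
    ('Lily', 25, 79, 49, 76.5)],
    dtype=persontype)
# Indexing by field name returns that field of every record as an ndarray
# (peoples[:]['name'] gives the same result)
names = peoples['name']
ages = peoples['age']
chineses = peoples['chinese']
maths = peoples['math']
englishs = peoples['english']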
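Because each field of a structured array is an ordinary ndarray, the methods from 1.1.2 apply to it directly. A minimal sketch, reusing the dtype and data defined above (repeated here so the block runs on its own):

```python
import numpy as np

persontype = np.dtype({
    'names': ['name', 'age', 'chinese', 'math', 'english'],
    'formats': ['S32', 'i', 'i', 'i', 'f'],
})
peoples = np.array([
    ('Leo', 20, 90, 91, 90.5),
    ('Tom', 21, 91, 92, 92.5),
    ('Lucy', 22, 80, 85, 84.5),
    ('Lily', 25, 79, 49, 76.5)],
    dtype=persontype)

# Per-subject mean, minimum and maximum, computed field by field
for field in ['chinese', 'math', 'english']:
    scores = peoples[field]
    print(field, scores.mean(), scores.min(), scores.max())

print('mean age:', peoples['age'].mean())  # (20 + 21 + 22 + 25) / 4 -> 22.0
```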
1.3 NumPy ufuncs
1.3.1 Creating Evenly Spaced Arrays
numpy.arange([start,] stop[, step], dtype=None): like Python's range; stop is excluded
numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
Return evenly spaced numbers over a specified interval.
Parameters
----------
num : int, optional
    Number of samples to generate. Default is 50. Must be non-negative.
endpoint : bool, optional
    If True, `stop` is the last sample. Otherwise, it is not included. Default is True.
retstep : bool, optional
    If True, return (`samples`, `step`), where `step` is the spacing between samples.
dtype : dtype, optional
    The type of the output array. If `dtype` is not given, infer the data type from the other input arguments.
axis : int, optional
    The axis in the result to store the samples. Relevant only if start or stop are array-like. By default (0), the samples will be along a new axis inserted at the beginning. Use -1 to get an axis at the end.
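A short comparison of arange and linspace, including the endpoint and retstep options described above (the start/stop values are arbitrary examples):

```python
import numpy as np

x1 = np.arange(1, 11, 2)                        # start=1, stop=11 (excluded), step=2
x2 = np.linspace(1, 9, 5)                       # 5 evenly spaced samples from 1 to 9, both ends included
x3, step = np.linspace(0, 1, 5, retstep=True)   # also return the spacing between samples
x4 = np.linspace(0, 1, 5, endpoint=False)       # stop (1) is not included

print(x1)        # integers 1, 3, 5, 7, 9
print(x2)        # floats 1.0, 3.0, 5.0, 7.0, 9.0
print(x3, step)  # 0.0, 0.25, 0.5, 0.75, 1.0 and a spacing of 0.25
print(x4)        # 0.0, 0.2, 0.4, 0.6, 0.8
```

arange is convenient when the step size matters; linspace is the better fit when the number of samples matters, since it avoids floating-point surprises about whether the stop value is reached.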
1.3.2 Basic Arithmetic
- numpy.add
- numpy.subtract
- numpy.multiply
- numpy.divide
- numpy.power
- numpy.remainder / numpy.mod: remainder (modulo)
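These functions work element by element and match the corresponding Python operators. A minimal sketch with two hypothetical integer arrays:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 2, 3, 3])

print(np.add(a, b))        # same as a + b  -> 3, 4, 6, 7
print(np.subtract(a, b))   # same as a - b  -> -1, 0, 0, 1
print(np.multiply(a, b))   # same as a * b  -> 2, 4, 9, 12
print(np.divide(a, b))     # same as a / b, true division with a float result
print(np.power(a, b))      # same as a ** b -> 1, 4, 27, 64
print(np.remainder(a, b))  # same as a % b  -> 1, 0, 0, 1
print(np.mod(a, b))        # numpy.mod gives the same result as numpy.remainder
```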
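1.3.3 Advanced Operations
- numpy.ptp(a, axis=None): range, the difference between the maximum and the minimum
- numpy.percentile(a, q, axis=None): the q-th percentile of the array; q = 0 gives the minimum, q = 50 the median, and q = 100 the maximum
q : array_like of float
    Percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.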
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(np.percentile(a, 50))          # 50th percentile (median) of all elements
print(np.percentile(a, 30, axis=0))  # 30th percentile of each column
print(np.percentile(a, 30, axis=1))  # 30th percentile of each row
# ---------OUTPUT----------
5.0
[2.8 3.8 4.8]
[1.6 4.6 7.6]
- numpy.average(a, axis=None, weights=None, returned=False): weighted average
Parameters
----------
weights : array_like, optional
    An array of weights associated with the values in a. Each value in a contributes to the average according to its associated weight. The weights array can either be 1-D (in which case its length must be the size of a along the given axis) or of the same shape as a. If weights=None, then all data in a are assumed to have a weight equal to one.
returned : bool, optional
    Default is False. If True, the tuple (average, sum_of_weights) is returned, otherwise only the average is returned. If weights=None, sum_of_weights is equivalent to the number of elements over which the average is taken.
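- numpy.var(): variance, the average of the squared differences between each value and the mean, i.e. mean((x - x.mean())**2)
- numpy.std(): standard deviation, the square root of the variance; it measures how widely the data are spread around the mean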
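The statistics listed above can be checked by hand on a small hypothetical array. The weights w are an assumption made up for illustration; the var line reproduces the definition mean((x - x.mean())**2):

```python
import numpy as np

x = np.array([1, 2, 3, 4])
w = np.array([4, 3, 2, 1])  # hypothetical weights, for illustration only

print(np.ptp(x))                      # max - min -> 3
print(np.average(x))                  # no weights: same as x.mean() -> 2.5
print(np.average(x, weights=w))       # (1*4 + 2*3 + 3*2 + 4*1) / (4+3+2+1) = 20/10 -> 2.0
avg, sum_w = np.average(x, weights=w, returned=True)
print(avg, sum_w)                     # weighted average 2.0, sum of weights 10.0

print(np.var(x))                      # 1.25
print(np.mean((x - x.mean()) ** 2))   # same value, computed from the definition
print(np.std(x))                      # square root of 1.25, about 1.118
print(np.sqrt(np.var(x)))             # same as np.std(x)
```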
1.3.4 Sorting in NumPy
numpy.sort(a, axis=-1, kind='quicksort', order=None)
Return a sorted copy of an array.
Parameters
----------
axis : int or None, optional
    Axis along which to sort. If None, the array is flattened before sorting. The default is -1, which sorts along the last axis.
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
    Sorting algorithm. Default is 'quicksort'.
order : str or list of str, optional
    When a is an array with fields defined, this argument specifies which fields to compare first, second, etc. A single field can be specified as a string, and not all fields need be specified, but unspecified fields will still be used, in the order in which they come up in the dtype, to break ties.
Returns
-------
sorted_array : ndarray
    Array of the same type and shape as a.
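>>> import numpy as np
>>> a = np.array([[1, 9, 5], [6, 4, 7], [8, 2, 3]])
>>> a
array([[1, 9, 5],
       [6, 4, 7],
       [8, 2, 3]])
>>> np.sort(a, axis=None)   # flatten, then sort
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.sort(a, axis=0)      # sort each column (top to bottom)
array([[1, 2, 3],
       [6, 4, 5],
       [8, 9, 7]])
>>> np.sort(a, axis=1)      # sort each row (same as the default axis=-1 here)
array([[1, 5, 9],
       [4, 6, 7],
       [2, 3, 8]])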
# Rank the people from Section 1.2 by total score (chinese + math + english), in descending order
rank_peoples = sorted(peoples, key=lambda x: x['chinese'] + x['math'] + x['english'], reverse=True)
print(rank_peoples)
# --------OUTPUT----------
[(b'Tom', 21, 91, 92, 92.5),
(b'Leo', 20, 90, 91, 90.5),
(b'Lucy', 22, 80, 85, 84.5),
(b'Lily', 25, 79, 49, 76.5)]
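The same ranking can also be done without a Python-level sorted() call. The sketch below computes the totals as one vectorized sum, uses np.argsort on the negated totals to get descending order, and shows the order parameter of np.sort, which sorts a structured array by a named field (ascending). The data repeats Section 1.2 so the block runs on its own:

```python
import numpy as np

persontype = np.dtype({
    'names': ['name', 'age', 'chinese', 'math', 'english'],
    'formats': ['S32', 'i', 'i', 'i', 'f'],
})
peoples = np.array([
    ('Leo', 20, 90, 91, 90.5),
    ('Tom', 21, 91, 92, 92.5),
    ('Lucy', 22, 80, 85, 84.5),
    ('Lily', 25, 79, 49, 76.5)],
    dtype=persontype)

# Total score of every person, computed in one vectorized expression
total = peoples['chinese'] + peoples['math'] + peoples['english']

# argsort sorts ascending, so sorting the negated totals yields descending order
ranked = peoples[np.argsort(-total)]
print(ranked['name'])   # Tom, Leo, Lucy, Lily

# order='math' sorts the records by the math field, lowest first
by_math = np.sort(peoples, order='math')
print(by_math['name'])  # Lily, Lucy, Leo, Tom
```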