1. Basic Data Structures

1.1 numpy.ndarray

1D : list
2D : list of lists
…

1.1.1 Basic Attributes

numpy.ndarray.shape

A tuple giving the size of the ndarray along each dimension.
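A minimal sketch of reading shape, assuming small example arrays (the names a and v are illustrative, not from the original notes):

```python
import numpy as np

# A 2D array built from a list of lists: 2 rows, 3 columns
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)  # (2, 3)
print(a.ndim)   # 2

# A 1D array built from a plain list: shape is a one-element tuple
v = np.array([1, 2, 3])
print(v.shape)  # (3,)
```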

ndarray[row_index,column_index]

Selects a single element of a 2D array; the comma separates the row index from the column index.

ndarray[row_index], ndarray[ [row1, row2, row3], : ], ndarray[row_index_start : row_index_end]

Selects one row, or several rows, of a 2D array.

ndarray[:, col_index], ndarray[:, [col1, col2, col3] ], ndarray[:, col_index_start : col_index_end]

Selects one column, or several columns, of a 2D array.

ndarray[row, col_start : col_end]

In a 2D array, selects several columns of one row.

ndarray[row_start : row_end, col_index]

In a 2D array, selects several rows of one column.

ndarray[ row_start : row_end, col_start : col_end ]

In a 2D array, selects several columns of several rows (a sub-block).

ndarray[ :, 0] + ndarray[ :, 1]

Vector addition: adds column 0 and column 1 element-wise.
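The indexing forms above can be tried on a small array. This is only a sketch; the array m and the variable names are made up for illustration:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

elem = m[1, 2]        # single element: row 1, column 2
row = m[1]            # one row
rows = m[[0, 2], :]   # rows 0 and 2
col = m[:, 1]         # one column
cols = m[:, [0, 3]]   # columns 0 and 3
block = m[0:2, 1:3]   # rows 0-1, columns 1-2 (end index excluded)
col_sum = m[:, 0] + m[:, 1]  # element-wise sum of column 0 and column 1

print(elem)     # 6
print(col_sum)  # [ 1  9 17]
```

Note that slice end indexes are exclusive, just as with Python lists.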

1.1.2 Basic Methods

numpy.ndarray.min()
numpy.ndarray.max()
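numpy.ndarray.mean(): arithmetic mean
numpy.median(a): median (ndarray has no median() method; use the module-level function)
numpy.ndarray.sum()
numpy.ndarray.reshape(): returns the same data arranged in a newly specified shape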

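The reduction methods above (min, max, mean, sum) accept an axis parameter, which selects the dimension along which the method is applied. Taking max on a 2D array as an example: with no axis, it returns the maximum of the whole array; with axis=0, it returns the maximum of each column; with axis=1, it returns the maximum of each row.
[Figures 2-4 of the original illustrated max along each axis.]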
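A short sketch of how axis changes the result of max, assuming a small 2x3 example array (not from the original notes). It also shows the median being taken through np.median, since ndarray itself has no median method:

```python
import numpy as np

a = np.array([[1, 5, 3],
              [4, 2, 6]])

whole = a.max()           # maximum over every element
per_col = a.max(axis=0)   # collapse the rows: one value per column
per_row = a.max(axis=1)   # collapse the columns: one value per row
med_row = np.median(a, axis=1)  # median of each row
reshaped = a.reshape(3, 2)      # same six values, 3 rows x 2 columns

print(whole)     # 6
print(per_col)   # [4 5 6]
print(per_row)   # [5 6]
print(med_row)   # [3. 4.]
```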

1.2 numpy.dtype

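Besides the built-in scalar types, you can define your own structured data type and store records of that type in an ndarray.

Structured data is implemented through such a custom dtype: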

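import numpy as np

# Structured dtype: each record holds a name, an age, and three scores
persontype = np.dtype(
    {
        'names': ['name', 'age', 'chinese', 'math', 'english'],
        # 'S32': byte string of up to 32 bytes; 'i': C int; 'f': single-precision float
        'formats': ['S32', 'i', 'i', 'i', 'f']
    }
)
peoples = np.array([
    ('Leo', 20, 90, 91, 90.5),
    ('Tom', 21, 91, 92, 92.5),
    ('Lucy', 22, 80, 85, 84.5),
    ('Lily', 25, 79, 49, 76.5)],
    dtype=persontype)
# Indexing by field name returns that field for every record
names = peoples['name']
ages = peoples['age']
chineses = peoples['chinese']
maths = peoples['math']
englishs = peoples['english']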
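Once the fields are available as arrays, the usual array operations apply to them. The sketch below uses a smaller, hypothetical record type (scoretype, with a Unicode 'U16' name field instead of 'S32') purely for illustration:

```python
import numpy as np

# Hypothetical two-field record type, smaller than persontype above
scoretype = np.dtype({'names': ['name', 'math'], 'formats': ['U16', 'i4']})
scores = np.array([('Leo', 91), ('Tom', 92), ('Lily', 49)], dtype=scoretype)

math_mean = scores['math'].mean()              # mean of one field
passed = scores[scores['math'] >= 60]['name']  # boolean mask on a field
first_name = scores[0]['name']                 # one record, then one field

print(math_mean)
print(passed)      # ['Leo' 'Tom']
print(first_name)  # Leo
```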

1.3 numpy ufuncs

1.3.1 Creating Evenly Spaced Arrays

numpy.arange([start,] stop[, step,], dtype=None): similar to range; stop is not included
numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)

Return evenly spaced numbers over a specified interval.
Parameters
----------
num : int, optional
    Number of samples to generate. Default is 50. Must be non-negative.
endpoint : bool, optional
    If True, `stop` is the last sample. Otherwise, it is not included.
    Default is True.
retstep : bool, optional
    If True, return (`samples`, `step`), where `step` is the spacing between samples.
dtype : dtype, optional
    The type of the output array. If `dtype` is not given, infer the data type from the other input arguments.
axis : int, optional
    The axis in the result to store the samples. Relevant only if start or stop are array-like. By default (0), the samples will be along a new axis inserted at the beginning. Use -1 to get an axis at the end.
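A brief sketch contrasting the two functions; the argument values are arbitrary examples:

```python
import numpy as np

# arange: fixed step, stop excluded (like range)
r = np.arange(1, 10, 2)
print(r)  # [1 3 5 7 9]

# linspace: fixed number of samples, stop included by default
l = np.linspace(0, 1, 5)
print(l)  # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False drops stop; retstep=True also returns the spacing
l2, step = np.linspace(0, 1, 5, endpoint=False, retstep=True)
print(l2, step)
```

As a rule of thumb, arange suits cases where the step is known, and linspace suits cases where the number of points is known.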

1.3.2 Basic Arithmetic

numpy.add
numpy.subtract
numpy.multiply
numpy.divide
numpy.power
numpy.remainder / numpy.mod: remainder of division
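Each of these functions works element-wise on arrays of matching shape. A minimal sketch with two made-up arrays x and y:

```python
import numpy as np

x = np.array([7, 8, 9])
y = np.array([2, 3, 4])

added = np.add(x, y)            # same as x + y
diff = np.subtract(x, y)        # same as x - y
prod = np.multiply(x, y)        # same as x * y
quot = np.divide(x, y)          # same as x / y (true division, float result)
powered = np.power(x, y)        # same as x ** y
rem = np.remainder(x, y)        # same as x % y

print(added)    # [ 9 11 13]
print(diff)     # [5 5 5]
print(prod)     # [14 24 36]
print(powered)  # [  49  512 6561]
print(rem)      # [1 2 1]
```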

1.3.3 Advanced Operations

numpy.ptp(a, axis=None): the range of the values, i.e. maximum minus minimum
numpy.percentile(a, q, axis=None): the q-th percentile of the array

q : array_like of float
    Percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(np.percentile(a, 50))          # median of all nine values
print(np.percentile(a, 30, axis=0))  # 30th percentile of each column
print(np.percentile(a, 30, axis=1))  # 30th percentile of each row
# ---------OUTPUT----------
5.0
[2.8 3.8 4.8]
[1.6 4.6 7.6]
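For numpy.ptp, listed above, a similar sketch on the same 3x3 array shows how axis affects the range:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

total_range = np.ptp(a)        # 9 - 1 over the whole array
col_range = np.ptp(a, axis=0)  # max - min of each column
row_range = np.ptp(a, axis=1)  # max - min of each row

print(total_range)  # 8
print(col_range)    # [6 6 6]
print(row_range)    # [2 2 2]
```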

numpy.average(a, axis=None, weights=None, returned=False): weighted average

Parameters
----------
weights : array_like, optional
    An array of weights associated with the values in a. Each value in
    a contributes to the average according to its associated weight.
    The weights array can either be 1-D (in which case its length must be
    the size of a along the given axis) or of the same shape as a.
    If weights=None, then all data in a are assumed to have a
    weight equal to one.
returned : bool, optional
    Default is False. If True, the tuple (average, sum_of_weights)
    is returned, otherwise only the average is returned.
    If weights=None, sum_of_weights is equivalent to the number of
    elements over which the average is taken.
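A small sketch of weights and returned, using made-up scores and integer weights:

```python
import numpy as np

scores = np.array([90, 80, 70])
weights = np.array([5, 3, 2])  # hypothetical weights; they need not sum to 1

plain = np.average(scores)                       # no weights: ordinary mean
weighted = np.average(scores, weights=weights)   # (90*5 + 80*3 + 70*2) / 10
avg, wsum = np.average(scores, weights=weights, returned=True)

print(plain)      # 80.0
print(weighted)   # 83.0
print(avg, wsum)  # 83.0 10.0
```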

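numpy.var(): variance, the mean of the squared deviations from the mean, i.e. mean((x - x.mean()) ** 2)
numpy.std(): standard deviation, the square root of the variance. It measures how widely the data are spread around the mean.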
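The definitions above can be checked directly against the formula. The sample data below is an arbitrary example chosen so the results are whole numbers:

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9])

var = np.var(x)                          # population variance (ddof=0 by default)
manual_var = np.mean((x - x.mean()) ** 2)  # the formula written out
std = np.std(x)                          # square root of the variance

print(var)         # 4.0
print(manual_var)  # 4.0
print(std)         # 2.0
```

Both functions divide by N by default; passing ddof=1 gives the sample (N - 1) versions.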

1.3.4 Sorting with NumPy

numpy.sort(a, axis=-1, kind='quicksort', order=None)
Return a sorted copy of an array.
Parameters
----------
axis : int or None, optional
    Axis along which to sort. If None, the array is flattened before
    sorting. The default is -1, which sorts along the last axis.
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
    Sorting algorithm. Default is 'quicksort'.
order : str or list of str, optional
    When a is an array with fields defined, this argument specifies
    which fields to compare first, second, etc. A single field can
    be specified as a string, and not all fields need be specified,
    but unspecified fields will still be used, in the order in which
    they come up in the dtype, to break ties.
Returns
-------
sorted_array : ndarray
    Array of the same type and shape as a.

>>> a = np.array([[1, 9, 5], [6, 4, 7], [8, 2, 3]])
>>> a
array([[1, 9, 5],
       [6, 4, 7],
       [8, 2, 3]])
>>> np.sort(a, axis=None)  # flatten, then sort
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.sort(a, axis=0)  # sort within each column (top to bottom)
array([[1, 2, 3],
       [6, 4, 5],
       [8, 9, 7]])
>>> np.sort(a, axis=1)  # sort within each row (left to right)
array([[1, 5, 9],
       [4, 6, 7],
       [2, 3, 8]])

# Sort the records from section 1.2 by total score, in descending order
rank_peoples = sorted(peoples, key=lambda x: x['chinese'] + x['math'] + x['english'], reverse=True)
print(rank_peoples)
# --------OUTPUT----------
[(b'Tom', 21, 91, 92, 92.5),
 (b'Leo', 20, 90, 91, 90.5),
 (b'Lucy', 22, 80, 85, 84.5),
 (b'Lily', 25, 79, 49, 76.5)]
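The same kind of ranking can also stay inside NumPy. The sketch below assumes a smaller, hypothetical record type (rec) rather than persontype; it uses the order parameter of np.sort for a single field, and np.argsort on a computed total for a descending ranking:

```python
import numpy as np

# Hypothetical three-field record type for illustration
rec = np.dtype([('name', 'U16'), ('math', 'i4'), ('english', 'f4')])
people = np.array([('Leo', 91, 90.5),
                   ('Tom', 92, 92.5),
                   ('Lily', 49, 76.5)], dtype=rec)

# Sort by one field, ascending, using the order parameter
by_math = np.sort(people, order='math')

# Rank by total score, descending: argsort the negated totals
total = people['math'] + people['english']
ranked = people[np.argsort(-total)]

print(by_math['name'])  # ['Lily' 'Leo' 'Tom']
print(ranked['name'])   # ['Tom' 'Leo' 'Lily']
```

Unlike sorted(), which returns a Python list of records, both results here are still structured ndarrays, so field indexing keeps working on them.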