1. Basic data structures

1.1 numpy.ndarray

1D: analogous to a Python list
2D: analogous to a list of lists

1.1.1 Basic attributes

  • ndarray.shape

The size of each dimension of the ndarray, as a tuple.

  • ndarray[row_index,column_index]

In 2D, selects a single element. The comma separates the row index from the column index.

  • ndarray[row_index], ndarray[ [row1, row2, row3], : ], ndarray[row_index_start : row_index_end]

In 2D, selects one row or several rows.

  • ndarray[:, col_index], ndarray[:, [col1, col2, col3] ], ndarray[:, col_index_start : col_index_end]

In 2D, selects one column or several columns.

  • ndarray[row, col_start : col_end]

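In 2D, selects several columns of one row.

  • ndarray[row_start : row_end, col_index]

In 2D, selects several rows of one column.

  • ndarray[ row_start : row_end, col_start : col_end ]

In 2D, selects several columns of several rows.

  • ndarray[ :, 0] + ndarray[ :, 1]

Vector addition: adds column 0 and column 1 element by element.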
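The indexing forms above can be tried on a small array. This is a minimal sketch: the array `arr` and its values are hypothetical, chosen only so that each entry is easy to locate.

```python
import numpy as np

# Hypothetical 3x4 array:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
arr = np.arange(12).reshape(3, 4)

shape = arr.shape                # (3, 4): 3 rows, 4 columns
element = arr[1, 2]              # row 1, column 2 -> 6
one_row = arr[1]                 # [4 5 6 7]
some_rows = arr[[0, 2], :]       # rows 0 and 2
row_range = arr[0:2]             # rows 0 and 1 (end index is excluded)
one_col = arr[:, 1]              # [1 5 9]
some_cols = arr[:, [0, 3]]       # columns 0 and 3
block = arr[0:2, 1:3]            # rows 0-1, columns 1-2 -> [[1 2] [5 6]]
col_sum = arr[:, 0] + arr[:, 1]  # element-wise sum -> [1 9 17]
```

Note that slices exclude the end index, exactly as with Python lists.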

1.1.2 Basic methods

  • numpy.ndarray.min()
  • numpy.ndarray.max()
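  • numpy.ndarray.mean(): arithmetic mean
  • numpy.median(a): median (ndarray has no median() method, so use the function numpy.median)
  • numpy.ndarray.sum()
  • numpy.ndarray.reshape(): rearranges the array into a newly specified shape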

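The reduction methods above (min, max, mean, sum, and numpy.median) accept an axis parameter that selects the dimension along which the method is applied. Taking max on a 2D array as an example: with no axis it returns the maximum over all elements, axis=0 returns the maximum of each column, and axis=1 returns the maximum of each row.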
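The effect of axis is easiest to see on a small example. The sketch below uses a hypothetical 2x3 array, and also shows numpy.median and reshape from the list above.

```python
import numpy as np

# Hypothetical 2x3 array used only for illustration.
arr = np.array([[1, 5, 3],
                [4, 2, 6]])

overall_max = arr.max()       # 6: maximum over all elements
col_max = arr.max(axis=0)     # [4 5 6]: one value per column
row_max = arr.max(axis=1)     # [5 6]: one value per row
row_mean = arr.mean(axis=1)   # [3. 4.]
median = np.median(arr)       # 3.5: a function, not an ndarray method
reshaped = arr.reshape(3, 2)  # [[1 5] [3 4] [2 6]]: same data, new shape
```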

1.2 numpy.dtype


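Besides the built-in data types, you can define your own structured type and store it in an ndarray.

A custom dtype is how NumPy represents structured data: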

    import numpy as np

    # Structured dtype: one named, typed field per column.
    persontype = np.dtype({
        'names': ['name', 'age', 'chinese', 'math', 'english'],
        'formats': ['S32', 'i', 'i', 'i', 'f'],
    })
    peoples = np.array([
        ('Leo', 20, 90, 91, 90.5),
        ('Tom', 21, 91, 92, 92.5),
        ('Lucy', 22, 80, 85, 84.5),
        ('Lily', 25, 79, 49, 76.5),
    ], dtype=persontype)

    # Indexing by field name returns that field for every record.
    names = peoples['name']
    ages = peoples['age']
    chineses = peoples['chinese']
    maths = peoples['math']
    englishs = peoples['english']
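Because each field comes back as an ordinary ndarray, the methods from section 1.1.2 apply to it directly. The sketch below repeats the definitions so that it runs on its own; the variable names `chinese_mean`, `math_mean`, and `oldest` are illustrative and not part of the original.

```python
import numpy as np

persontype = np.dtype({
    'names': ['name', 'age', 'chinese', 'math', 'english'],
    'formats': ['S32', 'i', 'i', 'i', 'f'],
})
peoples = np.array([
    ('Leo', 20, 90, 91, 90.5),
    ('Tom', 21, 91, 92, 92.5),
    ('Lucy', 22, 80, 85, 84.5),
    ('Lily', 25, 79, 49, 76.5),
], dtype=persontype)

# Per-field statistics: each field behaves like a 1D array.
chinese_mean = np.mean(peoples['chinese'])  # (90 + 91 + 80 + 79) / 4 = 85.0
math_mean = peoples['math'].mean()          # (91 + 92 + 85 + 49) / 4 = 79.25

# Record of the oldest person; 'S32' fields hold bytes, hence b'Lily'.
oldest = peoples[peoples['age'].argmax()]['name']
```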

1.3 NumPy ufuncs

1.3.1 Creating evenly spaced arrays

  • numpy.arange([start,] stop[, step,], dtype=None): similar to Python's range; stop is not included

  • numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)

Return evenly spaced numbers over a specified interval.

Parameters
----------
num : int, optional
    Number of samples to generate. Default is 50. Must be non-negative.
endpoint : bool, optional
    If True, `stop` is the last sample. Otherwise, it is not included. Default is True.
retstep : bool, optional
    If True, return (`samples`, `step`), where `step` is the spacing between samples.
dtype : dtype, optional
    The type of the output array. If `dtype` is not given, infer the data type from the other input arguments.
axis : int, optional
    The axis in the result to store the samples. Relevant only if start or stop are array-like. By default (0), the samples will be along a new axis inserted at the beginning. Use -1 to get an axis at the end.
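A short sketch of how arange and linspace differ, with arbitrary example values. arange takes a step and excludes stop, while linspace takes a sample count and includes stop unless endpoint=False.

```python
import numpy as np

evens = np.arange(0, 10, 2)                              # [0 2 4 6 8]; stop is excluded
with_stop = np.linspace(0, 1, num=5)                     # [0. 0.25 0.5 0.75 1.]
without_stop = np.linspace(0, 1, num=5, endpoint=False)  # approx. [0. 0.2 0.4 0.6 0.8]
samples, step = np.linspace(0, 1, num=5, retstep=True)   # step is 0.25
```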

1.3.2 Basic operations

  • numpy.add
  • numpy.subtract
  • numpy.multiply
  • numpy.divide
  • numpy.power
  • numpy.remainder / numpy.mod: remainder
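Each of these ufuncs works element by element on arrays of matching shape. A minimal sketch with two hypothetical arrays `x` and `y`:

```python
import numpy as np

x = np.array([7, 8, 9])
y = np.array([2, 3, 4])

added = np.add(x, y)            # [ 9 11 13]
subtracted = np.subtract(x, y)  # [5 5 5]
multiplied = np.multiply(x, y)  # [14 24 36]
divided = np.divide(x, y)       # true division, so the result is float
powered = np.power(x, y)        # [  49  512 6561]
remainders = np.remainder(x, y) # [1 2 1]; np.mod gives the same result
```

The operators +, -, *, /, ** and % on ndarrays call these same ufuncs.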

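1.3.3 Advanced operations

  • numpy.ptp(a, axis=None): the difference between the maximum and the minimum
  • numpy.percentile(a, q, axis=None): the q-th percentile(s) of the array; by default, values that fall between two data points are linearly interpolated

    q : array_like of float
        Percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.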
    import numpy as np

    a = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
    print(np.percentile(a, 50))          # over all elements
    print(np.percentile(a, 30, axis=0))  # along axis=0: one value per column
    print(np.percentile(a, 50, axis=1))  # along axis=1: one value per row
    # ---------OUTPUT----------
    # 5.0
    # [2.8 3.8 4.8]
    # [2. 5. 8.]
  • numpy.average(a, axis=None, weights=None, returned=False): weighted average

Parameters
----------
weights : array_like, optional
    An array of weights associated with the values in a. Each value in a contributes to the average according to its associated weight. The weights array can either be 1-D (in which case its length must be the size of a along the given axis) or of the same shape as a. If weights=None, then all data in a are assumed to have a weight equal to one.
returned : bool, optional
    Default is False. If True, the tuple (average, sum_of_weights) is returned, otherwise only the average is returned. If weights=None, sum_of_weights is equivalent to the number of elements over which the average is taken.

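  • numpy.var(): variance, the mean of the squared differences between each value and the mean, i.e. mean((x - x.mean()) ** 2)
  • numpy.std(): standard deviation, the square root of the variance; it measures how widely the data are spread around the mean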
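The functions in this section can be checked by hand on small inputs. The sketch below uses hypothetical arrays `a`, `scores`, and `weights` chosen so the arithmetic is easy to verify.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
spread = np.ptp(a)               # 9 - 1 = 8
spread_cols = np.ptp(a, axis=0)  # [6 6 6]: max - min of each column

scores = np.array([1, 2, 3, 4])
weights = np.array([1, 2, 3, 4])
plain_avg = np.average(scores)                      # 2.5, all weights equal
weighted_avg = np.average(scores, weights=weights)  # (1 + 4 + 9 + 16) / 10 = 3.0
avg, weight_sum = np.average(scores, weights=weights, returned=True)

variance = np.var(scores)  # mean((x - 2.5) ** 2) = 1.25
std_dev = np.std(scores)   # sqrt(1.25), about 1.118
```

Both var and std divide by the number of elements by default (ddof=0). Pass ddof=1 for the sample variance.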

1.3.4 Sorting with NumPy

numpy.sort(a, axis=-1, kind='quicksort', order=None)

Return a sorted copy of an array.

Parameters
----------
axis : int or None, optional
    Axis along which to sort. If None, the array is flattened before sorting. The default is -1, which sorts along the last axis.
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
    Sorting algorithm. Default is 'quicksort'.
order : str or list of str, optional
    When a is an array with fields defined, this argument specifies which fields to compare first, second, etc. A single field can be specified as a string, and not all fields need be specified, but unspecified fields will still be used, in the order in which they come up in the dtype, to break ties.

Returns
-------
sorted_array : ndarray
    Array of the same type and shape as a.

    >>> a = np.array([[1, 9, 5], [6, 4, 7], [8, 2, 3]])
    >>> a
    array([[1, 9, 5],
           [6, 4, 7],
           [8, 2, 3]])
    >>> np.sort(a, axis=None)  # flatten, then sort
    array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> np.sort(a, axis=0)  # sort vertically, within each column
    array([[1, 2, 3],
           [6, 4, 5],
           [8, 9, 7]])
    >>> np.sort(a, axis=1)  # sort horizontally, within each row
    array([[1, 5, 9],
           [4, 6, 7],
           [2, 3, 8]])
    # Rank the records from section 1.2 by total score, in descending order.
    rank_peoples = sorted(
        peoples,
        key=lambda x: x['chinese'] + x['math'] + x['english'],
        reverse=True,
    )
    print(rank_peoples)
    # --------OUTPUT----------
    # (the exact repr of each record depends on the NumPy version)
    # [(b'Tom', 21, 91, 92, 92.5),
    #  (b'Leo', 20, 90, 91, 90.5),
    #  (b'Lucy', 22, 80, 85, 84.5),
    #  (b'Lily', 25, 79, 49, 76.5)]
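The built-in sorted returns a Python list of records. As an alternative sketch (not the approach used above), the same ranking can stay inside NumPy: numpy.argsort on the total scores gives the ordering, and the order parameter of numpy.sort sorts by a named field. The definitions are repeated so the block runs on its own, and `totals`, `ranking`, `ranked`, and `by_math` are illustrative names.

```python
import numpy as np

persontype = np.dtype({
    'names': ['name', 'age', 'chinese', 'math', 'english'],
    'formats': ['S32', 'i', 'i', 'i', 'f'],
})
peoples = np.array([
    ('Leo', 20, 90, 91, 90.5),
    ('Tom', 21, 91, 92, 92.5),
    ('Lucy', 22, 80, 85, 84.5),
    ('Lily', 25, 79, 49, 76.5),
], dtype=persontype)

# Total score per person: 271.5, 275.5, 249.5, 204.5.
totals = peoples['chinese'] + peoples['math'] + peoples['english']

# argsort is ascending, so negate the totals to rank from highest to lowest.
ranking = np.argsort(-totals)
ranked = peoples[ranking]                # still a structured ndarray
ranked_names = ranked['name'].tolist()   # Tom, Leo, Lucy, Lily (as bytes)

# `order` sorts by a named field, ascending only.
by_math = np.sort(peoples, order='math')
by_math_names = by_math['name'].tolist() # Lily, Lucy, Leo, Tom (as bytes)
```

Keeping the result as an ndarray means field access such as `ranked['name']` still works, which a plain list of records does not support.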