
slice()

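The slice() function creates a slice object, mainly used to pass slicing parameters to operations that accept a slice.

slice(stop)
slice(start, stop[, step])

It returns a slice object. Example: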

invoice = """ 
0.....6................................40........52...55........
1909  Pimoroni PiBrella                 $17.50      3  $52.50
1489  6mm Tactile Switch x20            $4.95       2  $9.90
1510  Panavise Jr. - PV-201             $28.00      1  $28.00
1601  PiTFT Mini Kit 320x240            $34.95      1  $34.95
"""
SKU = slice(0, 6)
DESCRIPTION = slice(6, 40)
UNIT_PRICE = slice(40, 52)
QUANTITY = slice(52, 55)
ITEM_TOTAL = slice(55, None)
line_items = invoice.split('\n')[2:]
for item in line_items:
    print(item[UNIT_PRICE], item[DESCRIPTION])

$17.50       Pimoroni PiBrella                  
$4.95        6mm Tactile Switch x20             
$28.00       Panavise Jr. - PV-201              
$34.95       PiTFT Mini Kit 320x240

For flat text data like this, named slices are far more convenient than hard-coded numeric ranges. Note how readable the for loop in the example is.
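As a minimal sketch of the two call forms listed above, the snippet below shows that indexing with a slice object gives the same result as the usual colon syntax. The sample list `letters` is made up for illustration and is not part of the invoice example.

```python
# Hypothetical sample data, not taken from the invoice example.
letters = list("abcdefgh")

first_three = slice(3)        # slice(stop): same as [:3]
every_other = slice(1, 7, 2)  # slice(start, stop, step): same as [1:7:2]

print(letters[first_three])   # ['a', 'b', 'c']
print(letters[every_other])   # ['b', 'd', 'f']

# A slice object exposes its parts as attributes.
print(every_other.start, every_other.stop, every_other.step)  # 1 7 2
```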

glob

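Use glob.glob(path) to get the file paths that match a pattern.
For example, to list all PNG files under a directory:

import glob
glob.glob('input/*.png')

This returns a list containing the paths of all matching PNG files:
['input/vd006.png', 'input/vd025.png', 'input/vd026.png', 'input/vd034.png', 'input/vd051.png', 'input/vd070.png', 'input/vd092.png', 'input/vd102.png']
The pattern can be an absolute or a relative path.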
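The following sketch can be run without a real input/ folder: it creates a temporary directory, fills it with a few empty files (the file names are placeholders), and matches only the PNG files. glob.glob does not guarantee any ordering, so the result is sorted here.

```python
import glob
import os
import tempfile

# Build a throwaway directory so the example does not depend on a real 'input/' folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("vd006.png", "vd025.png", "notes.txt"):
        open(os.path.join(tmp_dir, name), "w").close()

    # glob.glob returns matches in arbitrary order, so sort for a stable result.
    png_paths = sorted(glob.glob(os.path.join(tmp_dir, "*.png")))
    png_names = [os.path.basename(p) for p in png_paths]

print(png_names)  # ['vd006.png', 'vd025.png']
```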

defaultdict()

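The point of defaultdict is that looking up a key that is not in the dictionary returns a default value instead of raising KeyError.
defaultdict takes a factory function as its argument: d = defaultdict(factory_function).
The factory_function can be list, set, str, and so on. When a key is missing, the value returned is whatever the factory produces when called with no arguments: [] for list, the empty string for str, set() for set, and 0 for int.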

from collections import defaultdict
dict1 = defaultdict(int)
dict2 = defaultdict(set)
dict3 = defaultdict(str)
dict4 = defaultdict(list)
dict1[2] ='two'
print(dict1[1])
print(dict2[1])
print(dict3[1])
print(dict4[1])

Output (the third line is empty because the default for str is the empty string):
0
set()

[]

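When the key comes from a list (and the values are collected in a list), convert the key list to a tuple first, because a list cannot be hashed.
dicts[tuple(list1)].append(value1)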
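A small sketch of that pattern, grouping labels under tuple keys. The `records` data is hypothetical. The last lines also show a side effect worth knowing: merely reading a missing key inserts it.

```python
from collections import defaultdict

# Hypothetical records: (coordinates as a list, label).
records = [([0, 1], "a"), ([2, 3], "b"), ([0, 1], "c")]

groups = defaultdict(list)
for coords, label in records:
    # Lists are unhashable, so convert to a tuple before using it as a key.
    groups[tuple(coords)].append(label)

print(dict(groups))  # {(0, 1): ['a', 'c'], (2, 3): ['b']}

counts = defaultdict(int)
_ = counts["missing"]       # looking up a missing key also inserts it
print("missing" in counts)  # True
```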

NumPy

The numpy.random family of functions

rand (d0, d1, …, dn)	Random values in a given shape.
randn (d0, d1, …, dn)	Return a sample (or samples) from the “standard normal” distribution.
randint (low[, high, size, dtype])	Return random integers from low (inclusive) to high (exclusive).
random_integers (low[, high, size])	Random integers of type np.int between low and high, inclusive.
random_sample ([size])	Return random floats in the half-open interval [0.0, 1.0).
random ([size])	Return random floats in the half-open interval [0.0, 1.0).
ranf ([size])	Return random floats in the half-open interval [0.0, 1.0).
sample ([size])	Return random floats in the half-open interval [0.0, 1.0).
choice (a[, size, replace, p])	Generates a random sample from a given 1-D array
bytes (length)	Return random bytes.

rand(d0,d1,…,dn)

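rand generates uniformly distributed values in [0, 1) for the given shape: 0 is included, 1 is excluded.
d0, d1, …, dn are the sizes of the individual dimensions.
The return value is an array of the specified shape.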

>>> np.random.rand(3,4)
array([[0.69187711, 0.31551563, 0.68650093, 0.83462567],
       [0.01828828, 0.75014431, 0.98886109, 0.74816565],
       [0.28044399, 0.78927933, 0.10322601, 0.44789353]])

randn(d0,d1,…,dn)

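randn returns a sample (or an array of samples) drawn from the standard normal distribution.
d0, d1, …, dn are the sizes of the individual dimensions.
The return value is an array of the specified shape.

Standard normal distribution

The standard normal distribution, also called the u-distribution, is the normal distribution with mean 0 and standard deviation 1, written N(0, 1).

>>> np.random.randn(3,4)
array([[ 0.98767654,  1.35288014, -1.1725845 , -0.19741466],
       [-0.81537812,  0.06791892,  1.63029447,  0.20268532],
       [ 1.82456323,  0.46829698, -0.66187397, -0.39171059]])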
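To sample from a normal distribution with another mean and standard deviation, a common approach is to scale and shift the standard normal draws. The sketch below assumes illustrative values for `mu` and `sigma` and checks the sample statistics only approximately.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

mu, sigma = 5.0, 2.0  # illustrative values, not from the text above
samples = mu + sigma * np.random.randn(100_000)

print(samples.shape)   # (100000,)
print(samples.mean())  # close to 5.0
print(samples.std())   # close to 2.0
```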

randint(low[, high, size, dtype])

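Returns random integers in the half-open interval [low, high): low is included, high is excluded.
Parameters: low is the lower bound, high is the upper bound, size is the output shape, and dtype is the data type, which defaults to int.
If high is omitted, values are drawn from [0, low).
```python
>>> np.random.randint(1, size=5)
array([0, 0, 0, 0, 0])
>>> np.random.randint(10, size=(3, 4))
array([[4, 5, 0, 9],
       [9, 3, 9, 1],
       [5, 1, 0, 3]])
```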

random_integers(low[, high, size])

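Returns random integers in the closed interval [low, high], including both low and high.
Parameters: low is the lower bound, high is the upper bound, and size is the output shape.
If high is omitted, values are drawn from [1, low].

This function is deprecated in recent NumPy versions; use randint instead (randint(low, high + 1) covers the same inclusive range).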
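A minimal sketch of getting an inclusive range from randint. The bounds are illustrative (the faces of a die).

```python
import numpy as np

low, high = 1, 6  # illustrative bounds, e.g. the faces of a die

# randint excludes its upper bound, so pass high + 1 to include `high`.
rolls = np.random.randint(low, high + 1, size=1000)

print(rolls.min() >= low, rolls.max() <= high)  # True True
```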

Generating floats in [0, 1)

numpy.random.random_sample(size=None)
numpy.random.random(size=None)
numpy.random.ranf(size=None)
numpy.random.sample(size=None)
```python
>>> np.random.random_sample()
0.5119501561674447
>>> np.random.random()
0.25626718074665134
>>> np.random.ranf()
0.9107140053852156
>>> np.random.sample()
0.10076977657160135
```
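These functions only cover [0, 1). To get floats in another half-open interval [a, b), the usual trick is to scale and shift the result. A sketch with illustrative bounds `a` and `b`:

```python
import numpy as np

a, b = -5.0, 0.0  # illustrative bounds
values = (b - a) * np.random.random_sample(size=(3, 2)) + a

print(values.shape)                          # (3, 2)
print(((values >= a) & (values < b)).all())  # True
```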

choice(a, size=None, replace=True, p=None)

Generates a random sample from a given 1-D array.
Parameters: a is a 1-D array-like or an integer; size is the output shape; replace controls whether sampling is done with replacement; p gives the probability of each entry of a.
If a is an integer, the sample is drawn from np.arange(a).
p must have the same length as a, and its entries must sum to 1.
```python
>>> np.random.choice(5, 3)
array([4, 1, 4])
>>> # With replace=False, the sampled values cannot repeat
>>> np.random.choice(5, 3, replace=False)
array([0, 3, 1])
>>> np.random.choice(5, size=(3, 2))
array([[1, 0],
       [4, 2],
       [3, 3]])
>>> demo_list = ['lenovo', 'samsung', 'moto', 'xiaomi', 'iphone']
>>> np.random.choice(demo_list, size=(3, 3))
array([['moto', 'iphone', 'xiaomi'],
       ['lenovo', 'xiaomi', 'xiaomi'],
       ['xiaomi', 'lenovo', 'iphone']], dtype='<U7')
>>> np.random.choice(demo_list, size=(3, 3), p=[0.1, 0.6, 0.1, 0.1, 0.1])
array([['samsung', 'samsung', 'iphone'],
       ['lenovo', 'lenovo', 'samsung'],
       ['samsung', 'lenovo', 'samsung']], dtype='<U7')
```
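One way to see the effect of p is to draw many samples and compare the observed frequencies with the requested probabilities. The sketch below reuses the same five names; the sample size and seed are arbitrary choices.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

brands = ["lenovo", "samsung", "moto", "xiaomi", "iphone"]
weights = [0.1, 0.6, 0.1, 0.1, 0.1]  # must sum to 1

draws = np.random.choice(brands, size=10_000, p=weights)

# Empirical frequency of each brand; 'samsung' should be near 0.6.
names, counts = np.unique(draws, return_counts=True)
freq = dict(zip(names.tolist(), (counts / draws.size).tolist()))
print(freq)
```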

seed()

np.random.seed() makes the generated random data reproducible.
Setting the same seed yields the same sequence of random numbers every time. Without a seed, each run produces different numbers.
```python
>>> np.random.seed(0)
>>> np.random.rand(5)
array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ])
>>> np.random.rand(5)
array([0.64589411, 0.43758721, 0.891773  , 0.96366276, 0.38344152])

>>> np.random.seed(0)
>>> np.random.rand(5)
array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ])
>>> np.random.rand(5)
array([0.64589411, 0.43758721, 0.891773  , 0.96366276, 0.38344152])
>>> np.random.rand(5)
array([0.79172504, 0.52889492, 0.56804456, 0.92559664, 0.07103606])
```
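The same behaviour can be checked programmatically instead of by eye. In this sketch the seed value 42 is arbitrary.

```python
import numpy as np

np.random.seed(42)
first = np.random.rand(3)

np.random.seed(42)          # reset to the same seed
second = np.random.rand(3)  # repeats the same draws

third = np.random.rand(3)   # continues the sequence, so it differs

print(np.array_equal(first, second))  # True
print(np.array_equal(second, third))  # False
```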

permutation(x)

Randomly permutes a sequence and returns the permuted copy. If x is an integer, np.arange(x) is permuted.
If x is a multi-dimensional array, it is only shuffled along its first index.
```python
>>> np.random.permutation(10)
array([1, 7, 4, 3, 0, 9, 2, 5, 8, 6])
>>> np.random.permutation([1, 4, 9, 12, 15])
array([15,  1,  9,  4, 12])
>>> arr = np.arange(9).reshape((3, 3))
>>> np.random.permutation(arr)
array([[6, 7, 8],
       [0, 1, 2],
       [3, 4, 5]])
```
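A typical use is shuffling samples and their labels together by permuting the row indices once and applying them to both arrays. The arrays `X` and `y` below are hypothetical stand-ins for a dataset.

```python
import numpy as np

X = np.arange(10).reshape(5, 2)  # 5 hypothetical samples, 2 features each
y = np.arange(5)                 # one label per sample

idx = np.random.permutation(len(X))  # a shuffled copy of [0, 1, 2, 3, 4]
X_shuffled, y_shuffled = X[idx], y[idx]

# Rows stay paired with their labels: the row with label i starts with 2 * i.
print((X_shuffled[:, 0] == 2 * y_shuffled).all())  # True
```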

Pandas

pd.get_dummies
pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
Convert categorical variable into dummy/indicator variables. In other words, it performs one-hot encoding.

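prefix can be a dict mapping column names to prefixes, or a list with one prefix per encoded column, as in the example below.
```python
>>> df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
...                    'C': [1, 2, 3]})
>>> df
   A  B  C
0  a  b  1
1  b  a  2
2  a  c  3
>>> pd.get_dummies(df, prefix=['col1', 'col2'])
   C  col1_a  col1_b  col2_a  col2_b  col2_c
0  1       1       0       0       1       0
1  2       0       1       1       0       0
2  3       1       0       0       0       1
```
Here columns A and B are encoded, while column C is left unchanged. In pandas 2.0 and later the indicator columns default to bool (True/False); pass dtype=int to get the 0/1 output shown here.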
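Every distinct value in a column becomes a new column. For example, column A (given the prefix col1) holds the two values a and b, so it expands into the two columns col1_a and col1_b. Rows where A was a get col1_a = 1, and all other rows get col1_a = 0. col1_b works the same way.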
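The dict form of prefix can be sketched as follows, using the same data frame. Passing dtype=int is an addition here so that the result is 0/1 regardless of the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["b", "a", "c"], "C": [1, 2, 3]})

# prefix as a dict mapping column name -> prefix; dtype=int forces 0/1 output.
encoded = pd.get_dummies(df, prefix={"A": "col1", "B": "col2"}, dtype=int)

print(list(encoded.columns))
# ['C', 'col1_a', 'col1_b', 'col2_a', 'col2_b', 'col2_c']
print(encoded["col1_a"].tolist())  # [1, 0, 1]
```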
