pandas Tutorial - Indexing - Python Programming Digital Tutorial

Object selection has had a number of user-requested additions in order to support more explicit location-based indexing. pandas provides two main accessors for multi-axis indexing, .loc and .iloc.

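.loc is explicit row indexing. It is primarily label based, but may also be used with a boolean array. .loc will raise KeyError when the items are not found. Allowed inputs are:
- A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, not as an integer position).
- A list or array of labels, e.g. ['a', 'b', 'c'].
- A slice object with labels, e.g. 'a':'f' (unlike ordinary Python slices, both the start and the stop are included).
- A boolean array (any NA values are treated as False).
- A callable with one argument (the calling Series or DataFrame) that returns valid output for row indexing (one of the above).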
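
The allowed .loc inputs listed above can be sketched as follows. The frame `demo` and its string row labels are made up for illustration and are not part of the tutorial's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string row labels, used only for illustration.
demo = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=list("abcdef"),
    columns=["A", "B", "C", "D"],
)

row_a = demo.loc["a"]                   # single label: the row 'a' as a Series
rows_list = demo.loc[["a", "c", "e"]]   # list of labels: a DataFrame
rows_slice = demo.loc["b":"d"]          # label slice: 'd' is included
rows_mask = demo.loc[demo["A"] > 8]     # boolean array: rows where A > 8
rows_call = demo.loc[lambda d: d["A"] % 8 == 0]  # callable returning a mask

# A label that is not in the index raises KeyError.
try:
    demo.loc["z"]
    missing = False
except KeyError:
    missing = True
```

Note that the label slice keeps rows 'b', 'c' and 'd', whereas a positional slice would drop the stop element.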
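.iloc is implicit (position-based) row indexing. It is primarily based on integer positions from 0 to length-1 of the axis, but may also be used with a boolean array. Allowed inputs are:
- An integer, e.g. 5.
- A list or array of integers, e.g. [4, 3, 0].
- A slice object with integers, e.g. 1:7.
- A boolean array.
- A callable with one argument (the calling Series or DataFrame) that returns valid output for row indexing (one of the above).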
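
A minimal sketch of the .iloc inputs listed above, again on a made-up six-row frame `demo` (an assumption for illustration only). Positions are counted from 0 regardless of the row labels.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the string labels show that .iloc ignores them.
demo = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=list("abcdef"),
    columns=["A", "B", "C", "D"],
)

row_5 = demo.iloc[5]               # single integer: the row at position 5 ('f')
rows_list = demo.iloc[[4, 3, 0]]   # list of integers: rows in the given order
rows_slice = demo.iloc[1:4]        # integer slice: positions 1, 2, 3 (stop excluded)
mask = np.array([True, False, True, False, False, False])
rows_mask = demo.iloc[mask]        # boolean array: positions 0 and 2
rows_call = demo.iloc[lambda d: [0, len(d) - 1]]  # callable: first and last row
```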
.loc, .iloc, and also [] indexing can accept a callable as indexer.
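
As a rough illustration of a callable used with plain [] indexing, the sketch below passes lambdas that receive the calling DataFrame. The frame `demo` and the derived column `C` are hypothetical. A callable indexer is mostly useful in method chains, where the intermediate result has no name to refer to.

```python
import pandas as pd

# Hypothetical data for illustration only.
demo = pd.DataFrame({"A": [1, -2, 3, -4], "B": [10, 20, 30, 40]})

# The callable returns a column label, so this selects column 'B'.
col_b = demo[lambda d: d.columns[1]]

# The callable returns a boolean mask, so this filters rows.
positive = demo[lambda d: d["A"] > 0]

# In a chain, the callable sees the frame produced by the previous step,
# including the newly assigned column 'C'.
chained = demo.assign(C=lambda d: d["A"] * d["B"])[lambda d: d["C"] > 0]
```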

Object Type	Selection	Return Value Type
Series	`series[label]`	scalar value
DataFrame	`frame[colname]`	`Series` corresponding to colname
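
The return types in the table above can be checked with a small sketch. The names `series` and `frame` follow the table; their contents are made up for illustration.

```python
import pandas as pd

# Hypothetical objects matching the names used in the table.
series = pd.Series([1.5, 2.5], index=["x", "y"])
frame = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

value = series["x"]   # series[label] gives a scalar value
column = frame["A"]   # frame[colname] gives the Series for that column

print(type(value).__name__)   # a NumPy float scalar type
print(type(column).__name__)  # Series
```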

import pandas as pd
import numpy as np
dates = pd.date_range('1/1/2000', periods=8)  # 8 consecutive daily dates starting at 2000-01-01
print(dates)

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08'],
              dtype='datetime64[ns]', freq='D')

df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
print(df)

The default row index is 0, 1, 2, …

          A         B         C         D
0  0.040279  1.530194 -0.839738  1.769272
1  1.364984 -1.607406  0.045436  1.390306
2  1.064731  0.852654  0.741311  0.488171
3 -1.193471 -0.205841  0.224680  1.799955
4  0.156241 -0.151240 -1.336287 -0.102478
5 -1.152899  0.497563  0.789621 -0.780824
6  2.301429 -0.711661  0.394633 -0.009994
7  0.509731  0.187269  0.134205  1.489733

The index keyword can be used to specify the row index.

df = pd.DataFrame(np.random.randn(8, 4),index=dates, columns=['A', 'B', 'C', 'D'])
print(df)

                   A         B         C         D
2000-01-01  0.542793  1.045460 -0.942148  0.187426
2000-01-02  0.516108  0.821478  0.227624  1.503220
2000-01-03  1.558611  1.042741  0.116858 -0.848084
2000-01-04 -0.228758 -0.935041  1.318462  0.002611
2000-01-05  0.420747  3.439259  0.912372  1.345009
2000-01-06 -0.597713  1.039117 -0.235674  0.010001
2000-01-07  0.380847  1.370491  0.715843 -1.356307
2000-01-08  0.114837  0.770705 -0.865508 -0.073762

A column of data can be selected from the DataFrame by its column label.

s = df['A']
print(s)

2000-01-01    0.542793
2000-01-02    0.516108
2000-01-03    1.558611
2000-01-04   -0.228758
2000-01-05    0.420747
2000-01-06   -0.597713
2000-01-07    0.380847
2000-01-08    0.114837
Freq: D, Name: A, dtype: float64

To select several columns from a DataFrame, put the column labels in a list. The columns do not need to be adjacent.

s = df[['A','B']]
print(s)

                   A         B 
2000-01-01  0.542793  1.045460 
2000-01-02  0.516108  0.821478 
2000-01-03  1.558611  1.042741 
2000-01-04 -0.228758 -0.935041 
2000-01-05  0.420747  3.439259  
2000-01-06 -0.597713  1.039117 
2000-01-07  0.380847  1.370491  
2000-01-08  0.114837  0.770705
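
Since the example above happens to pick two adjacent columns, here is a small sketch with non-adjacent ones. The frame `demo` is hypothetical. The result keeps the order given in the list, and a one-element list still yields a DataFrame rather than a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration only.
demo = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["A", "B", "C", "D"])

subset = demo[["D", "A"]]  # non-adjacent columns, in the order listed
single = demo[["A"]]       # a list with one label gives a one-column DataFrame
as_series = demo["A"]      # a bare label gives a Series
```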

s = df['A']               # select column A again (s was reassigned to two columns above)
print(s[dates[5]])        # equivalent to df['A'][dates[5]]
print(df['A'][dates[5]])  # same as above

-0.5977126285793471
-0.5977126285793471
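
The lookup above chains two [] operations: first the column, then the row label. As a sketch of an alternative, the same value can be fetched in a single step with .loc (by label) or .iloc (by position). The seeded random data here is an assumption for reproducibility and differs from the values printed in this tutorial.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=8)
rng = np.random.default_rng(0)  # fixed seed, chosen only for illustration
df = pd.DataFrame(rng.standard_normal((8, 4)), index=dates, columns=list("ABCD"))

chained = df["A"][dates[5]]      # column first, then row label
direct = df.loc[dates[5], "A"]   # row label and column label in one call
by_position = df.iloc[5, 0]      # row position 5, column position 0

print(chained == direct == by_position)
```

The single-step form is generally preferable for assignment, because assigning through a chained lookup may act on a temporary object instead of the original frame.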

df[['B', 'A']] = df[['A', 'B']]  # swap columns A and B
print(df)

                   A         B         C         D
2000-01-01  1.045460  0.542793 -0.942148  0.187426
2000-01-02  0.821478  0.516108  0.227624  1.503220
2000-01-03  1.042741  1.558611  0.116858 -0.848084
2000-01-04 -0.935041 -0.228758  1.318462  0.002611
2000-01-05  3.439259  0.420747  0.912372  1.345009
2000-01-06  1.039117 -0.597713 -0.235674  0.010001
2000-01-07  1.370491  0.380847  0.715843 -1.356307
2000-01-08  0.770705  0.114837 -0.865508 -0.073762
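
A short sketch of the same swap on a small made-up frame `demo`, shown with plain [] and with .loc. The pandas documentation notes that .loc aligns column labels before assigning, so the .loc variant below passes raw values via to_numpy() to avoid that alignment. Treat this as an illustration, not the only way to do it.

```python
import pandas as pd

# Hypothetical data for illustration only.
demo = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30], "C": [7, 8, 9]})

# Swap with [] as in the tutorial: B receives old A, A receives old B.
swapped = demo.copy()
swapped[["B", "A"]] = swapped[["A", "B"]]

# Swap with .loc: assign raw values so no label alignment takes place.
swapped_loc = demo.copy()
swapped_loc.loc[:, ["B", "A"]] = swapped_loc[["A", "B"]].to_numpy()

print(swapped)
```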