Object selection has gained a number of user-requested additions to support more explicit location-based indexing. pandas now supports three types of multi-axis indexing.
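`.loc` is primarily label based (it indexes rows explicitly, by label), but may also be used with a boolean array. `.loc` will raise `KeyError` when the items are not found. Allowed inputs are:
- A single label, e.g. `5` or `'a'` (note that `5` is interpreted as a label of the index, not as an integer position along it).
- A list or array of labels, e.g. `['a', 'b', 'c']`.
- A slice object with labels, e.g. `'a':'f'` (unlike ordinary Python slices, both the start and the stop are included).
- A boolean array (any `NA` values are treated as `False`).
- A callable that takes one argument and returns valid output for indexing.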
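The allowed `.loc` inputs listed above can be tried on a small Series. This is only a sketch: the Series `s` and `t` and their values are made up for illustration and do not come from the original text.

```python
import pandas as pd

# Hypothetical toy data, chosen only to make the results easy to read.
s = pd.Series([10, 20, 30, 40, 50, 60], index=list("abcdef"))

single = s.loc["a"]                # a single label returns a scalar
several = s.loc[["a", "b", "c"]]   # a list of labels returns a Series
sliced = s.loc["a":"c"]            # label slice: the stop label "c" is included
masked = s.loc[s > 40]             # boolean array with one entry per row
called = s.loc[lambda x: x > 40]   # callable that returns a boolean mask

# With an integer index, 5 is looked up as a label, not as a position.
t = pd.Series(["x", "y", "z"], index=[5, 6, 7])
by_label = t.loc[5]                # the row labelled 5, which sits at position 0

# A label that does not exist raises KeyError.
try:
    s.loc["z"]
    missing = False
except KeyError:
    missing = True
```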
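`.iloc` is primarily integer position based (the implicit row index, from 0 to length-1 of the axis), but may also be used with a boolean array. Allowed inputs are:
- An integer, e.g. `5`.
- A list or array of integers, e.g. `[4, 3, 0]`.
- A slice object with integers, e.g. `1:7`.
- A boolean array.
- A callable that takes one argument and returns valid output for indexing.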
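The `.iloc` inputs listed above can be sketched in the same way. The Series `s` is again hypothetical toy data, and the boolean mask is converted to a NumPy array first, which is an assumption made here to keep the example simple.

```python
import pandas as pd

# Hypothetical toy data: eight values with string labels.
s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80], index=list("abcdefgh"))

one = s.iloc[5]                      # a single integer position returns a scalar
picked = s.iloc[[4, 3, 0]]           # list of positions, returned in the order given
sliced = s.iloc[1:7]                 # positions 1 to 6; the stop position is excluded
mask = (s > 40).to_numpy()           # plain boolean NumPy array, one entry per row
masked = s.iloc[mask]                # boolean array
called = s.iloc[lambda x: [0, 1]]    # callable that returns a list of positions
```

Unlike the label slice accepted by `.loc`, the integer slice `1:7` follows the usual Python convention and leaves out position 7.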
`.loc`, `.iloc`, and also `[]` indexing can accept a callable as indexer.
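A minimal sketch of callable indexers with all three accessors follows. The DataFrame and its column values are invented for illustration; each callable receives the object being indexed as its single argument.

```python
import pandas as pd

# Hypothetical toy frame.
df = pd.DataFrame({"A": [1, -2, 3, -4], "B": [10, 20, 30, 40]})

# .loc with a callable that returns a boolean mask over the rows
positive_loc = df.loc[lambda d: d["A"] > 0]

# .iloc with a callable that returns integer positions
first_two = df.iloc[lambda d: [0, 1]]

# plain [] with a callable that returns a boolean mask
positive_getitem = df[lambda d: d["A"] > 0]
```

Callables are mostly useful in method chains, where the intermediate object has no name to refer to.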
Object Type | Selection | Return Value Type |
---|---|---|
Series | series[label] | scalar value |
DataFrame | frame[colname] | Series corresponding to colname |
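The return types in the table can be checked directly. This is a small sketch on a made-up two-column frame; the names `frame`, `column`, and `value` are illustrative only.

```python
import pandas as pd

# Hypothetical toy frame with string row labels.
frame = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]}, index=["x", "y"])

column = frame["A"]   # frame[colname] returns the Series for that column
value = column["x"]   # series[label] returns a scalar value

print(type(column).__name__)
print(value)
```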
import pandas as pd
import numpy as np
dates = pd.date_range('1/1/2000', periods=8)  # eight consecutive daily dates starting 2000-01-01
print(dates)
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08'],
              dtype='datetime64[ns]', freq='D')
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
print(df)
By default, the row index is 0, 1, 2, …
A B C D
0 0.040279 1.530194 -0.839738 1.769272
1 1.364984 -1.607406 0.045436 1.390306
2 1.064731 0.852654 0.741311 0.488171
3 -1.193471 -0.205841 0.224680 1.799955
4 0.156241 -0.151240 -1.336287 -0.102478
5 -1.152899 0.497563 0.789621 -0.780824
6 2.301429 -0.711661 0.394633 -0.009994
7 0.509731 0.187269 0.134205 1.489733
The row index can be set explicitly with the index keyword:
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
print(df)
A B C D
2000-01-01 0.542793 1.045460 -0.942148 0.187426
2000-01-02 0.516108 0.821478 0.227624 1.503220
2000-01-03 1.558611 1.042741 0.116858 -0.848084
2000-01-04 -0.228758 -0.935041 1.318462 0.002611
2000-01-05 0.420747 3.439259 0.912372 1.345009
2000-01-06 -0.597713 1.039117 -0.235674 0.010001
2000-01-07 0.380847 1.370491 0.715843 -1.356307
2000-01-08 0.114837 0.770705 -0.865508 -0.073762
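To compare the default row index with one passed through the `index` keyword without random numbers getting in the way, here is a sketch on fixed values. The array `values` and the frame names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data used in the text.
values = np.arange(6).reshape(3, 2)

# No index given: rows are labelled 0, 1, 2.
default_df = pd.DataFrame(values, columns=["A", "B"])

# index=dates: rows are labelled by the dates instead.
dates = pd.date_range("2000-01-01", periods=3)
dated_df = pd.DataFrame(values, index=dates, columns=["A", "B"])

print(default_df.index)
print(dated_df.index)
```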
A column of the DataFrame can be selected by its column label:
s = df['A']
print(s)
2000-01-01 0.542793
2000-01-02 0.516108
2000-01-03 1.558611
2000-01-04 -0.228758
2000-01-05 0.420747
2000-01-06 -0.597713
2000-01-07 0.380847
2000-01-08 0.114837
Freq: D, Name: A, dtype: float64
To select several columns of the DataFrame, pass their labels in a list; the columns do not have to be adjacent.
s = df[['A','B']]
print(s)
A B
2000-01-01 0.542793 1.045460
2000-01-02 0.516108 0.821478
2000-01-03 1.558611 1.042741
2000-01-04 -0.228758 -0.935041
2000-01-05 0.420747 3.439259
2000-01-06 -0.597713 1.039117
2000-01-07 0.380847 1.370491
2000-01-08 0.114837 0.770705
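The difference between a single label and a list of labels can be sketched as follows, again on deterministic toy data that is not part of the original example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=3)
# Hypothetical fixed values 0..11 laid out in 3 rows and 4 columns.
df = pd.DataFrame(np.arange(12).reshape(3, 4), index=dates, columns=list("ABCD"))

one_col = df["A"]            # single label: a Series
one_col_frame = df[["A"]]    # one-element list: a DataFrame with one column
reordered = df[["D", "A"]]   # non-adjacent columns, returned in the listed order
```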
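s = df['A']  # select column A again, because s was reassigned to a two-column DataFrame above
print(s[dates[5]])  # look up the row labelled dates[5]; equivalent to df['A'][dates[5]]
print(df['A'][dates[5]])  # same value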
-0.5977126285793471
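The lookup above chains two `[]` operations. As a hedged alternative, the same scalar can be read in one step with `.loc` or `.at`, both of which take the row label and the column label together. The frame below uses fixed values instead of the random data in the text.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=8)
# Hypothetical fixed values 0.0 .. 31.0 so the result is predictable.
df = pd.DataFrame(np.arange(32.0).reshape(8, 4), index=dates, columns=list("ABCD"))

chained = df["A"][dates[5]]      # two steps: select the column, then the row label
direct = df.loc[dates[5], "A"]   # one step: row label and column label together
fast = df.at[dates[5], "A"]      # scalar accessor for a single cell
```

All three return the same value when reading; the single-step forms are generally preferable when assigning.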
df[['B', 'A']] = df[['A', 'B']]  # swap the values of columns A and B
print(df)
A B C D
2000-01-01 1.045460 0.542793 -0.942148 0.187426
2000-01-02 0.821478 0.516108 0.227624 1.503220
2000-01-03 1.042741 1.558611 0.116858 -0.848084
2000-01-04 -0.935041 -0.228758 1.318462 0.002611
2000-01-05 3.439259 0.420747 0.912372 1.345009
2000-01-06 1.039117 -0.597713 -0.235674 0.010001
2000-01-07 1.370491 0.380847 0.715843 -1.356307
2000-01-08 0.770705 0.114837 -0.865508 -0.073762
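The swap is easier to verify on fixed values. In this sketch the frame and its numbers are invented; only the assignment pattern is taken from the text.

```python
import pandas as pd

# Hypothetical toy frame with three columns.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30], "C": [7, 8, 9]})

# The right-hand side is built first as a new two-column frame, so column B
# receives the old A values and column A receives the old B values.
df[["B", "A"]] = df[["A", "B"]]

print(df)
```

The column order stays `A, B, C`; only the values held under `A` and `B` change places.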