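In pandas, an Index object stores axis labels and other metadata. Index objects are immutable: once created, their elements cannot be modified in place.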
In [73]: obj = pd.Series(range(3), index=['a', 'b', 'c'])

In [74]: index = obj.index

In [75]: index
Out[75]: Index(['a', 'b', 'c'], dtype='object')

In [76]: index[1:]
Out[76]: Index(['b', 'c'], dtype='object')

In [77]: index[1] = 'f'  # raises TypeError

In [8]: index.size
Out[8]: 3

In [9]: index.shape
Out[9]: (3,)

In [10]: index.ndim
Out[10]: 1

In [11]: index.dtype
Out[11]: dtype('O')
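Since an Index cannot be changed in place, the usual workaround is to build a new Index. The snippet below is a minimal sketch of one way to do that with `Index.delete` and `Index.insert`; the variable names are illustrative only.

```python
import pandas as pd

index = pd.Index(['a', 'b', 'c'])

# Item assignment is rejected because Index objects are immutable.
raised_type_error = False
try:
    index[1] = 'f'
except TypeError:
    raised_type_error = True

# Build a new Index instead: drop position 1, then insert 'f' there.
# Both methods return new objects and leave the original untouched.
new_index = index.delete(1).insert(1, 'f')

print(raised_type_error)
print(list(new_index))
print(list(index))
```

The original `index` still holds its three labels afterwards, which is what makes sharing it between objects safe.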
Because Index objects are immutable, they can be shared safely among multiple data structures:
In [78]: labels = pd.Index(np.arange(3))

In [79]: labels
Out[79]: Int64Index([0, 1, 2], dtype='int64')

In [80]: obj2 = pd.Series([2, 3.5, 0], index=labels)

In [81]: obj2
Out[81]:
0    2.0
1    3.5
2    0.0
dtype: float64

In [82]: obj2.index is labels
Out[82]: True
An Index also behaves like a container, so Python's in operator works on it:
In [84]: f2
Out[84]:
key    year     state  pop  debt
order
a      2000   beijing  1.5   NaN
b      2001   beijing  1.7   NaN
c      2002   beijing  3.6   1.0
d      2001  shanghai  2.4   2.0
e      2002  shanghai  2.9   NaN
f      2003  shanghai  3.2   3.0

In [86]: 'c' in f2.index
Out[86]: True

In [88]: 'pop' in f2.columns
Out[88]: True
Most importantly, a pandas Index can contain duplicate labels:
In [89]: dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])

In [90]: dup_labels
Out[90]: Index(['foo', 'foo', 'bar', 'bar'], dtype='object')
So, can a DataFrame have duplicate column labels or duplicate index labels?
Yes, it can, but avoid doing so whenever possible:
In [91]: f2.index = ['a'] * 6

In [92]: f2
Out[92]:
key  year     state  pop  debt
a    2000   beijing  1.5   NaN
a    2001   beijing  1.7   NaN
a    2002   beijing  3.6   1.0
a    2001  shanghai  2.4   2.0
a    2002  shanghai  2.9   NaN
a    2003  shanghai  3.2   3.0

In [93]: f2.loc['a']
Out[93]:
key  year     state  pop  debt
a    2000   beijing  1.5   NaN
a    2001   beijing  1.7   NaN
a    2002   beijing  3.6   1.0
a    2001  shanghai  2.4   2.0
a    2002  shanghai  2.9   NaN
a    2003  shanghai  3.2   3.0

In [94]: f2.columns = ['year'] * 4

In [95]: f2
Out[95]:
   year      year  year  year
a  2000   beijing   1.5   NaN
a  2001   beijing   1.7   NaN
a  2002   beijing   3.6   1.0
a  2001  shanghai   2.4   2.0
a  2002  shanghai   2.9   NaN
a  2003  shanghai   3.2   3.0

In [96]: f2.index.is_unique  # use this attribute to check whether the index contains duplicate labels
Out[96]: False
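One practical reason to avoid duplicate labels is that the result type of a label lookup then depends on the label. The following is a small sketch on a made-up Series (not the `f2` frame above) that also shows `Index.duplicated` as a way to locate the repeats:

```python
import pandas as pd

# Hypothetical data: label 'a' appears twice, label 'b' once.
s = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

unique_hit = s.loc['b']   # a unique label yields a scalar
dup_hit = s.loc['a']      # a duplicated label yields a Series

has_dups = not s.index.is_unique
# Marks every occurrence after the first as a duplicate.
dup_mask = s.index.duplicated()

print(unique_hit)
print(type(dup_hit).__name__, len(dup_hit))
print(has_dups, dup_mask.tolist())
```

Code that expects a scalar from `loc` can therefore break silently once a duplicate label appears, so checking `is_unique` up front is a cheap safeguard.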
Index objects also support set operations (intersection, union, difference, and symmetric difference), much like Python's built-in set.
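As a brief illustration of these four operations, here is a sketch using two small example indexes; the names `left` and `right` are assumptions made for this example only.

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['b', 'c', 'd'])

common = left.intersection(right)              # labels present in both
combined = left.union(right)                   # labels present in either
only_left = left.difference(right)             # labels in left but not in right
exclusive = left.symmetric_difference(right)   # labels in exactly one of the two

print(list(common))     # ['b', 'c']
print(list(combined))   # ['a', 'b', 'c', 'd']
print(list(only_left))  # ['a']
print(list(exclusive))  # ['a', 'd']
```

Each method returns a new Index and leaves both operands unchanged, consistent with the immutability described earlier.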