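When you perform arithmetic on two Series or DataFrame objects, pandas first aligns them by label. The index of the result is the union of the two objects' indexes. Any label that appears in only one of the objects produces a missing value (NaN) in the result, and that NaN propagates into any later operations. This is similar to an outer join in a database.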
```
# Assumes numpy and pandas have been imported as np and pd
In [58]: s1 = pd.Series([4.2, 2.6, 5.4, -1.9], index=list('acde'))

In [60]: s2 = pd.Series([-2.3, 1.2, 5.6, 7.2, 3.4], index=list('acefg'))

In [61]: s1
Out[61]:
a    4.2
c    2.6
d    5.4
e   -1.9
dtype: float64

In [62]: s2
Out[62]:
a   -2.3
c    1.2
e    5.6
f    7.2
g    3.4
dtype: float64

In [63]: s1 + s2
Out[63]:
a    1.9
c    3.8
d    NaN
e    3.7
f    NaN
g    NaN
dtype: float64

In [64]: s1 - s2
Out[64]:
a    6.5
c    1.4
d    NaN
e   -7.5
f    NaN
g    NaN
dtype: float64

In [65]: s1 * s2
Out[65]:
a    -9.66
c     3.12
d      NaN
e   -10.64
f      NaN
g      NaN
dtype: float64

In [66]: df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('bcd'), index=['one', 'two', 'three'])

In [67]: df2 = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('bde'), index=['two', 'three', 'five', 'six'])

In [68]: df1
Out[68]:
       b  c  d
one    0  1  2
two    3  4  5
three  6  7  8

In [69]: df2
Out[69]:
       b   d   e
two    0   1   2
three  3   4   5
five   6   7   8
six    9  10  11

In [70]: df1 + df2
Out[70]:
         b   c     d   e
five   NaN NaN   NaN NaN
one    NaN NaN   NaN NaN
six    NaN NaN   NaN NaN
three  9.0 NaN  12.0 NaN
two    3.0 NaN   6.0 NaN
```
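To make the outer-join analogy concrete, here is a small sketch (not from the original) that aligns the two Series explicitly with `Series.align` before adding them. It assumes that aligning with `join='outer'` and then adding reproduces what `s1 + s2` does implicitly; the variable names `left`, `right` and `manual` are illustrative only.

```python
import pandas as pd

s1 = pd.Series([4.2, 2.6, 5.4, -1.9], index=list('acde'))
s2 = pd.Series([-2.3, 1.2, 5.6, 7.2, 3.4], index=list('acefg'))

# Align both Series on the union of their labels (an outer join);
# a label missing from one side becomes NaN on that side.
left, right = s1.align(s2, join='outer')
print(left.index.tolist())      # ['a', 'c', 'd', 'e', 'f', 'g']

# Adding the pre-aligned Series gives the same result as s1 + s2.
manual = left + right
print(manual.equals(s1 + s2))   # True

# For contrast, an inner join keeps only the labels both Series share.
inner_left, inner_right = s1.align(s2, join='inner')
print((inner_left + inner_right).index.tolist())  # ['a', 'c', 'e']
```

The inner-join variant is shown only to highlight that the `+` operator itself always uses the union of labels.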
To keep these NaN values from affecting later computations, we often supply a fill value instead:
```
In [71]: df1.add(df2, fill_value=0)
Out[71]:
         b    c     d     e
five   6.0  NaN   7.0   8.0
one    0.0  1.0   2.0   NaN
six    9.0  NaN  10.0  11.0
three  9.0  7.0  12.0   5.0
two    3.0  4.0   6.0   2.0

In [74]: df1.reindex(columns=df2.columns, fill_value=0)  # reindex also accepts fill_value for newly introduced labels
Out[74]:
       b  d  e
one    0  2  0
two    3  5  0
three  6  8  0
```
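Note what filling means here: when a value exists in one object but not in the other, the missing side is replaced with the given fill value before the operation is performed. It does not replace every NaN in the final result; a cell that is missing from both objects (for example, row `five`, column `c` above) is still NaN.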
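The difference between `fill_value` and filling the result afterwards can be checked directly. The sketch below (an illustration, not from the original) rebuilds `df1` and `df2` and compares `df1.add(df2, fill_value=0)` with `(df1 + df2).fillna(0)`; the names `with_fill` and `filled_after` are hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   columns=list('bcd'), index=['one', 'two', 'three'])
df2 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   columns=list('bde'), index=['two', 'three', 'five', 'six'])

# fill_value only stands in for a value missing on ONE side.
with_fill = df1.add(df2, fill_value=0)
print(with_fill.loc['five', 'b'])        # 6.0 (0 from df1 + 6 from df2)
print(with_fill.isna().sum().sum())      # 3 cells missing from both inputs stay NaN

# fillna on the result overwrites every NaN, including cells where
# one side actually had a value.
filled_after = (df1 + df2).fillna(0)
print(filled_after.loc['five', 'b'])     # 0.0
```

Filling after the fact therefore loses the values that existed in only one of the frames, which is usually not what you want.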
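Besides `add`, the other flexible arithmetic methods are `sub`, `mul`, `div`, `floordiv`, `mod` and `pow`. Each has a reversed counterpart prefixed with `r` (`radd`, `rsub`, `rmul`, `rdiv`, `rfloordiv`, `rmod`, `rpow`) that swaps the operands, and all of them accept `fill_value`.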
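A quick sketch of what "swaps the operands" means, using a small hypothetical frame `df` that is not part of the original examples: `df.rdiv(1)` computes `1 / df`, and `df.rsub(10)` computes `10 - df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 7).reshape(2, 3), columns=list('abc'))

# The r-prefixed methods put df on the right-hand side of the operator.
print(df.rdiv(1).equals(1 / df))    # True: 1 / df
print(df.rsub(10).equals(10 - df))  # True: 10 - df

# The plain methods match the ordinary operators.
print(df.pow(2).equals(df ** 2))    # True
```

The method forms are mainly useful when you need an extra argument such as `fill_value` or `axis`, which the bare operators cannot take.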
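A DataFrame can also be combined with a Series. This resembles operations between NumPy arrays of different dimensions and relies on broadcasting. Let's first look at how it works in NumPy: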
```
In [75]: a = np.arange(12).reshape(3, 4)

In [76]: a
Out[76]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [78]: a[0]  # the first row of a, a 1-D array
Out[78]: array([0, 1, 2, 3])

In [79]: a - a[0]  # 2-D minus 1-D: a[0] is broadcast down the rows and subtracted from every row
Out[79]:
array([[0, 0, 0, 0],
       [4, 4, 4, 4],
       [8, 8, 8, 8]])
```
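For comparison (a sketch, not from the original), broadcasting a column instead of a row in NumPy needs an explicit reshape: the 1-D column `a[:, 0]` has shape `(3,)`, whose trailing dimension does not match the 4 columns of `a`, so it must first become a `(3, 1)` array. The name `column_1d_failed` is just an illustrative flag.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# a[:, 0] is the first column as a 1-D array of shape (3,).
# Broadcasting compares trailing dimensions (4 vs 3), so this raises.
try:
    a - a[:, 0]
    column_1d_failed = False
except ValueError:
    column_1d_failed = True

# Adding an axis gives shape (3, 1), which broadcasts across the 4 columns.
col = a[:, 0][:, np.newaxis]
result = a - col
print(result)
# [[0 1 2 3]
#  [0 1 2 3]
#  [0 1 2 3]]
```

pandas avoids the reshape by letting you name the axis to match on, as the `axis='index'` example later in this section does.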
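Operations between a DataFrame and a Series are analogous. By default, the Series index is matched against the DataFrame's columns, and the Series is broadcast down the rows: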
```
In [80]: df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('bde'), index=['one', 'two', 'three', 'four'])

In [81]: s = df.iloc[0]  # the first row of df as a Series

In [82]: df
Out[82]:
       b   d   e
one    0   1   2
two    3   4   5
three  6   7   8
four   9  10  11

In [83]: s
Out[83]:
b    0
d    1
e    2
Name: one, dtype: int32

In [84]: df - s  # s is broadcast and subtracted from every row
Out[84]:
       b  d  e
one    0  0  0
two    3  3  3
three  6  6  6
four   9  9  9

In [85]: s2 = pd.Series(range(3), index=list('bef'))

In [86]: df + s2  # column labels that do not match the Series index produce NaN
Out[86]:
         b   d     e   f
one    0.0 NaN   3.0 NaN
two    3.0 NaN   6.0 NaN
three  6.0 NaN   9.0 NaN
four   9.0 NaN  12.0 NaN

In [87]: s3 = df['d']  # one column of df

In [88]: s3
Out[88]:
one       1
two       4
three     7
four     10
Name: d, dtype: int32

In [89]: df.sub(s3, axis='index')  # match on the row index and broadcast across the columns
Out[89]:
       b  d  e
one   -1  0  1
two   -1  0  1
three -1  0  1
four  -1  0  1
```
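In the last example, passing `axis='index'` (or equivalently `axis=0`) tells pandas to match the Series index against the DataFrame's row index instead of its columns, so the Series is broadcast across the columns, i.e. in the other direction.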
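As a sanity check (a sketch, not from the original), `axis='index'` and `axis=0` give identical results, and both match what you get by transposing, subtracting with the default column matching, and transposing back. The variable names `by_name`, `by_number` and `by_transpose` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  columns=list('bde'), index=['one', 'two', 'three', 'four'])
s3 = df['d']

by_name = df.sub(s3, axis='index')
by_number = df.sub(s3, axis=0)
# After .T the row labels become columns, so the default
# column-matching broadcast lines s3 up with them.
by_transpose = (df.T - s3).T

print(by_name.equals(by_number))     # True
print(by_name.equals(by_transpose))  # True
```

The transpose trick is shown only to explain what `axis='index'` does; in practice the `axis` argument is the clearer choice.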