Where there is reading, there must also be writing.
A DataFrame's to_csv method exports the data to a comma-separated file:
In [57]: result
Out[57]:
        one       two     three      four key
0  0.467976 -0.038649 -0.295344 -1.824726   L
1 -0.358893  1.404453  0.704965 -0.200638   B
2 -0.501840  0.659254 -0.421691 -0.057688   G
3  0.204886  1.074134  1.388361 -0.982404   R
4  0.354628 -0.133116  0.283763 -0.837063   Q

In [58]: result.to_csv('d:/out.csv')
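To check what actually lands on disk, the file can be written and read straight back. The sketch below is only an illustration: the small DataFrame and the temporary directory are stand-ins chosen here, not the `result` table or the `d:/out.csv` path used above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Illustrative data, not the tutorial's `result` DataFrame.
df = pd.DataFrame({"one": [0.5, -0.25], "two": [1.5, 2.0], "key": ["L", "B"]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "out.csv"   # hypothetical file name
    df.to_csv(out_path)                # comma-separated, index and header included
    text = out_path.read_text()        # raw file contents
    # index_col=0 turns the first (unnamed) column back into the row index
    restored = pd.read_csv(out_path, index_col=0)

print(text)
print(restored)
```

The first line of the file starts with a comma because the row index has no name, so its header cell is empty.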
You can also specify a different delimiter with the sep parameter, or write the output to sys.stdout instead of a file:
In [60]: result.to_csv(sys.stdout, sep='|')
|one|two|three|four|key
0|0.467976300189|-0.0386485396255|-0.295344251987|-1.82472622729|L
1|-0.358893469543|1.40445260007|0.704964644926|-0.20063830401500002|B
2|-0.50184039929|0.659253707223|-0.42169061931199997|-0.0576883018364|G
3|0.20488621220199998|1.07413396504|1.38836131252|-0.982404023494|R
4|0.354627914484|-0.13311585229599998|0.283762637978|-0.837062961653|Q
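sys.stdout is just one file-like object; any text buffer works the same way. As a minimal sketch with made-up data, the delimited output can be captured in an `io.StringIO` buffer, which is handy when the text needs to be inspected or passed on rather than printed.

```python
import io

import pandas as pd

# Illustrative data, not the tutorial's `result` DataFrame.
df = pd.DataFrame({"one": [1.5, 2.5], "key": ["L", "B"]})

buffer = io.StringIO()
df.to_csv(buffer, sep="|")   # write pipe-delimited text into the buffer
piped = buffer.getvalue()

print(piped)
```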
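By default, missing values are written as empty strings. You can mark them with a different sentinel through the na_rep parameter, for example 'NULL':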
In [70]: data = pd.DataFrame(np.random.randint(9,size=9).reshape(3,3))

In [71]: data
Out[71]:
   0  1  2
0  7  7  3
1  8  1  5
2  2  4  2

In [72]: data.iloc[2,2] = np.nan

In [73]: data.to_csv(sys.stdout, na_rep='NULL')
,0,1,2
0,7,7,3.0
1,8,1,5.0
2,2,4,NULL
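The values 3.0 and 5.0 in the output above appear because NaN is a floating-point value, so the integer column holding it is converted to float. The sketch below compares the default output with na_rep='NULL' side by side; it builds the float column directly instead of assigning NaN into an integer column, on the assumption that recent pandas releases may warn about or reject that kind of in-place upcast.

```python
import numpy as np
import pandas as pd

# Illustrative data; column "b" is float from the start because it holds NaN.
df = pd.DataFrame({"a": [7, 8, 2], "b": [3.0, 5.0, np.nan]})

default_text = df.to_csv()             # missing value becomes an empty field
null_text = df.to_csv(na_rep="NULL")   # missing value becomes the text NULL

print(default_text)
print(null_text)
```

Calling to_csv without a path or buffer returns the CSV text as a string, which keeps the comparison simple.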
When writing, you can also suppress the row index and the column header:
In [74]: result.to_csv(sys.stdout, index=False, header=False)
0.467976300189,-0.0386485396255,-0.295344251987,-1.82472622729,L
-0.358893469543,1.40445260007,0.704964644926,-0.20063830401500002,B
-0.50184039929,0.659253707223,,-0.0576883018364,G
0.20488621220199998,1.07413396504,1.38836131252,-0.982404023494,R
0.354627914484,-0.13311585229599998,0.283762637978,-0.837062961653,Q
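The two switches are independent of each other. A small sketch with made-up data shows each one alone and then both together:

```python
import pandas as pd

# Illustrative data, not the tutorial's `result` DataFrame.
df = pd.DataFrame({"one": [1.5, 2.5], "key": ["L", "B"]})

no_index = df.to_csv(index=False)             # header row kept, index dropped
no_header = df.to_csv(header=False)           # index kept, header row dropped
bare = df.to_csv(index=False, header=False)   # data rows only

print(no_index)
print(no_header)
print(bare)
```

A file written with index=False and header=False carries no labels at all, so whoever reads it has to supply column names separately.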
You can also select which columns to write:
In [75]: result.to_csv(sys.stdout, index=False, columns=['one','three','key'])
one,three,key
0.467976300189,-0.295344251987,L
-0.358893469543,0.704964644926,B
-0.50184039929,,G
0.20488621220199998,1.38836131252,R
0.354627914484,0.283762637978,Q
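The columns list controls the order of the output as well as the selection. The following sketch, again with made-up data, writes two of three columns in reverse of their order in the DataFrame:

```python
import pandas as pd

# Illustrative data, not the tutorial's `result` DataFrame.
df = pd.DataFrame({"one": [1.5, 2.5], "two": [0.1, 0.2], "key": ["L", "B"]})

# Only "key" and "one" are written, in the order given here.
subset = df.to_csv(index=False, columns=["key", "one"])

print(subset)
```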
A Series is written the same way:
In [76]: dates = pd.date_range('1/1/2019', periods=7)  # build a DatetimeIndex of 7 consecutive days

In [77]: dates
Out[77]:
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07'],
              dtype='datetime64[ns]', freq='D')

In [78]: ts = pd.Series(np.arange(7), index=dates)  # use the dates as the index

In [79]: ts
Out[79]:
2019-01-01    0
2019-01-02    1
2019-01-03    2
2019-01-04    3
2019-01-05    4
2019-01-06    5
2019-01-07    6
Freq: D, dtype: int32

In [80]: ts.to_csv('d:/tseries.csv')  # write it to a file
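Reading such a file back takes a little care, because the dates are stored as plain text. The sketch below is one possible round trip under a few assumptions: the Series is given the hypothetical name `value`, it is shortened to three days, and it is written to a string instead of `d:/tseries.csv`.

```python
import io

import numpy as np
import pandas as pd

dates = pd.date_range("2019-01-01", periods=3)
ts = pd.Series(np.arange(3), index=dates, name="value")  # "value" is a made-up name

csv_text = ts.to_csv()   # no path given, so the CSV text is returned

# index_col=0 restores the index; parse_dates=True converts it back to datetimes.
frame = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
restored = frame["value"]   # pick the single column to get a Series again

print(csv_text)
print(restored)
```

The frequency information (Freq: D) is not stored in the CSV, so the restored index holds the same timestamps without it.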