Suppose we have the following JSON file:
[{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}, {"a": 7, "b": 8, "c": 9}]
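The read_json function automatically converts a JSON dataset into a Series or DataFrame, laid out according to the specified orientation. By default, it assumes that each object in the JSON array is one row of the table: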
In [81]: data = pd.read_json('d:/example.json')

In [82]: data
Out[82]:
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
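The same behavior can be reproduced without a file on disk. The following is a minimal sketch that assumes the JSON text is held in a string and wrapped in io.StringIO; the variable names and the small {"x": 1, "y": 2} object are illustrative and not part of the original example.

```python
import io

import pandas as pd

# The same records as in the example file, held in memory instead of on disk.
json_text = '[{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}, {"a": 7, "b": 8, "c": 9}]'

# Each object in the JSON array becomes one row of the DataFrame.
data = pd.read_json(io.StringIO(json_text))
print(data)

# typ="series" asks read_json for a Series instead of a DataFrame.
# The flat object below is a made-up input used only for illustration.
series = pd.read_json(io.StringIO('{"x": 1, "y": 2}'), typ="series")
print(series)
```

Wrapping the text in io.StringIO gives read_json a file-like object, so the call works the same way as it does with a path.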
Conversely, the to_json function converts a pandas object to JSON:
In [83]: print(data.to_json())
{"a":{"0":1,"1":4,"2":7},"b":{"0":2,"1":5,"2":8},"c":{"0":3,"1":6,"2":9}}

In [84]: print(data.to_json(orient='records'))  # a different layout from the one above
[{"a":1,"b":2,"c":3},{"a":4,"b":5,"c":6},{"a":7,"b":8,"c":9}]
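Because the orient argument changes the layout of the output, it helps to pass the same orient back to read_json when reading the text again. The sketch below assumes an in-memory DataFrame with the same values as the example; the loop and the restored dictionary are illustrative names only.

```python
import io
import json

import pandas as pd

# A DataFrame with the same contents as the example above.
data = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

restored = {}
for orient in ("columns", "records", "split"):
    text = data.to_json(orient=orient)
    print(orient, "->", text)
    # Read the text back with the same orient that was used to write it.
    restored[orient] = pd.read_json(io.StringIO(text), orient=orient)

# With orient='records', each row is written as one JSON object.
first_record = json.loads(data.to_json(orient="records"))[0]
print(first_record)
```

Of the three layouts shown, 'records' drops the index labels, while 'columns' and 'split' keep them.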
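Python's standard library module pickle serializes objects to a binary format and reads them back, which makes it an efficient and convenient way to store data.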
pandas likewise provides read_pickle and to_pickle for reading and writing the pickle format.
In [85]: df = pd.read_csv('d:/ex1.csv')

In [86]: df
Out[86]:
   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo

In [87]: df.to_pickle('d:/df_pickle')

In [88]: new_df = pd.read_pickle('d:/df_pickle')

In [89]: new_df
Out[89]:
   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
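The round trip can also be checked programmatically. This is a minimal sketch that assumes the DataFrame is built in memory rather than read from d:/ex1.csv, and that writes to a temporary directory instead of a fixed path; both choices are illustrative.

```python
import os
import tempfile

import pandas as pd

# The same contents as ex1.csv above, built in memory for this sketch.
df = pd.DataFrame(
    {
        "a": [1, 5, 9],
        "b": [2, 6, 10],
        "c": [3, 7, 11],
        "d": [4, 8, 12],
        "message": ["hello", "world", "foo"],
    }
)

# Write the pickle into a temporary directory and read it straight back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df_pickle")
    df.to_pickle(path)
    new_df = pd.read_pickle(path)

# equals() compares values, dtypes, and labels of the two frames.
print(new_df.equals(df))
```

As with the pickle module itself, read_pickle should only be used on files from a trusted source, because loading a pickle can execute arbitrary code.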