Looking for simple ways to convert CSV to JSON in Python? We’ve got you covered. In this blog post, we will talk about CSV and JSON files, the advantages of each of these file formats, and finally the Python code you can use to convert a CSV file to JSON format.
CSV vs JSON
Both CSV and JSON are common formats for storing data. CSV is more popular with non-technical users, while JSON is widely used for data exchange between servers and applications.
CSV is simple and stores data in a tabular format, while JSON is flexible and easy to work with in most programming languages and web applications.
Let’s find out more about CSV and JSON formats before we jump into the code part.
What is CSV format?
- CSV (comma-separated values) is one of the most widely used file formats for storing data in an organized way.
- The data is organized in rows and columns, where the fields in a row are separated by a delimiter, typically a comma (hence the name) or a tab.
- CSV data can be opened and read by popular applications such as Google Sheets and Microsoft Excel, as well as any text editor.
Sample CSV Data:
Title,Released,Director,Language,imdbRating
Avatar,18 Dec 2009,James Cameron,"English, Spanish",7.9
I Am Legend,14 Dec 2007,Francis Lawrence,English,7.2
300,09 Mar 2007,Zack Snyder,English,7.7
The Avengers,04 May 2012,Joss Whedon,"English, Russian",8.1
This is pretty hard to read, isn’t it? Now check out the JSON format data.
What is JSON format?
- JSON (JavaScript Object Notation), on the other hand, is a structured data format widely used by applications to transfer data over the network.
- Even though the name suggests JavaScript Object Notation, it is actually language-independent.
- JSON supports storing complex data structures such as nested arrays and objects.
Sample JSON Data:
[{
"Title":"Avatar",
"Released":"18 Dec 2009",
"Director":"James Cameron",
"Language":"English, Spanish",
"imdbRating":"7.9"
},
{
"Title":"I Am Legend",
"Released":"14 Dec 2007",
"Director":"Francis Lawrence",
"Language":"English",
"imdbRating":"7.2"
},
{
"Title":"300",
"Released":"09 Mar 2007",
"Director":"Zack Snyder",
"Language":"English",
"imdbRating":"7.7"
},
{
"Title":"The Avengers",
"Released":"04 May 2012",
"Director":"Joss Whedon",
"Language":"English, Russian",
"imdbRating":"8.1"
}]
Complications with CSV data
JSON data is easy to read and flexible to work with compared to CSV. Also, most programming languages have good built-in support for working with JSON data.
- Most web applications exchange data as JSON, and most REST APIs respond with JSON. If you integrate with a third-party application, you will most probably end up communicating in JSON.
- Working with large data sets in CSV format is difficult because CSV carries no type information: every field is read back as plain text. JSON distinguishes strings, numbers, booleans, and null, so it can be parsed and serialized without extra conversion code.
- Nested data is hard to store in CSV. There is no straightforward way to represent nested objects in a flat table, whereas JSON supports them directly.
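To make the nested-data point concrete, here is a small sketch. The cast field and the flattened column names (cast_0, cast_1) are made up for illustration and are not part of the sample data above.

```python
import csv
import io
import json

# Hypothetical record with a nested list; "cast" is not in the sample data set.
movie = {
    "Title": "Avatar",
    "Director": "James Cameron",
    "cast": ["Sam Worthington", "Zoe Saldana"],
}

# JSON stores the nested list as-is and restores it on load.
as_json = json.dumps(movie)
restored = json.loads(as_json)

# CSV has no notion of nesting, so the list must be flattened into
# separate columns (or packed into one string) before writing.
flat = {"Title": movie["Title"], "Director": movie["Director"]}
for i, name in enumerate(movie["cast"]):
    flat[f"cast_{i}"] = name

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flat))
writer.writeheader()
writer.writerow(flat)
as_csv = buffer.getvalue()
```

The JSON round trip gives back the original structure, while the CSV version needs a naming scheme that every reader of the file has to know about.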
How do we convert CSV to JSON in Python?
There are different ways to get JSON data from CSV using Python. We are going to discuss two widely used approaches here.
- Using the csv and json modules
- Using the pandas library
Note
The second method requires you to install the pandas library, which has a huge collection of built-in functions for data processing. It is one of the must-have libraries in data science projects.
1. With csv and json modules
We are going to use the csv and json modules in this approach. Both are part of Python’s standard library, so we don’t need to install them separately.
The csv module lets you read a CSV file or write CSV data to a file, whereas the json module lets you convert Python data types to JSON format.
Here is the sample code for CSV to JSON in Python:
# import the csv and json modules
import csv
import json
# read the CSV file; each row becomes a dict keyed by the header row
with open('movies.csv', newline='') as movies_csv_file:
    data = csv.DictReader(movies_csv_file)
    movies_csv_data = list(data)
# write the rows to a JSON file
with open('movies_data.json', 'w') as movie_json:
    json.dump(movies_csv_data, movie_json)
Explanation:
After importing the csv and json modules:
- The first step is to read the data from the CSV file. We have used csv.DictReader in this example. It turns each row into a dictionary keyed by the column names from the header row, and list() then collects those dictionaries into a Python list.
- The next step is to write this data to a JSON file. The json module has a dump() function that converts the data to JSON and writes it to a file.
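One detail worth knowing is that DictReader returns every value as a string, which is why imdbRating appears in quotes in the sample JSON. The sketch below shows one way to convert a field before dumping. The convert_row helper is hypothetical, and the sample rows are inlined so the snippet runs without a file.

```python
import csv
import io
import json

# Two rows of the sample data, inlined so the sketch runs without a file.
SAMPLE = '''Title,Released,Director,Language,imdbRating
Avatar,18 Dec 2009,James Cameron,"English, Spanish",7.9
300,09 Mar 2007,Zack Snyder,English,7.7
'''

def convert_row(row):
    """Hypothetical helper: return a copy of the row with imdbRating as a float."""
    converted = dict(row)
    converted["imdbRating"] = float(converted["imdbRating"])
    return converted

rows = [convert_row(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
json_text = json.dumps(rows, indent=2)
print(json_text)
```

With this change the rating is written as a JSON number instead of a string. Which fields to convert, and to what type, depends on your data.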
Note
You can alternatively use csv.reader() instead of DictReader. It returns a reader object that you can iterate over, and each row comes back as a Python list of strings.
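A minimal sketch of the csv.reader() route could look like this. It assumes the first row of the file is the header, and it inlines two sample rows instead of reading movies.csv.

```python
import csv
import io
import json

SAMPLE = '''Title,Released,Director,Language,imdbRating
I Am Legend,14 Dec 2007,Francis Lawrence,English,7.2
The Avengers,04 May 2012,Joss Whedon,"English, Russian",8.1
'''

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)            # first row holds the column names
rows_as_lists = list(reader)     # remaining rows are plain lists of strings

# Pair each value with its column name to get the same shape DictReader gives.
rows_as_dicts = [dict(zip(header, row)) for row in rows_as_lists]
json_text = json.dumps(rows_as_dicts)
```

This does by hand what DictReader does for you, so it is mainly useful when you need control over the header row.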
2. Using pandas
Pandas is a powerful open-source library for data analysis and manipulation. If you have worked on data science projects, you have most likely used the pandas library.
In our use case as well, pandas gives a flexible way to convert CSV to JSON data. However, unlike the csv and json modules, this library is not part of Python’s standard library. We have to install it separately in the Python environment.
Pre-requisite:
Install the pandas library with this command from the terminal.
pip3 install pandas
Now that the pandas library is installed, you can access all of its data manipulation functions.
Here is the code that does the conversion.
# import the pandas library
import pandas as pd
# read the CSV data into a pandas DataFrame
movies_data = pd.read_csv('movies.csv')
# convert the DataFrame to a JSON string, one object per row
movies_json_data = movies_data.to_json(orient='records')
# write the JSON string to a file
with open('movies_data.json', 'w') as jsonfile:
    jsonfile.write(movies_json_data)
The pandas library has built-in functions to work with both CSV and JSON data. In the above code, we have used the read_csv() function to read the data from the CSV file into a pandas DataFrame.
Similarly, to convert this DataFrame to JSON data, we have used the to_json() method. It creates the JSON data, which we can then write to a file.
The to_json() method accepts several optional parameters that control how the DataFrame is converted to JSON. When no file path is passed, it returns the JSON string.
- orient: This parameter specifies the layout of the JSON string. For a DataFrame, the default is ‘columns’. The other accepted values include ‘index’, ‘split’, ‘records’, and ‘values’. They decide whether column labels, index labels, one object per row, or bare values sit at the top level of the JSON.
- date_format: As the name suggests, this specifies the date format. If the DataFrame has any date fields, you can choose between ‘epoch’ and ‘iso’ when writing it to a JSON string. The default value is ‘epoch’.
- double_precision: For float values, this parameter sets the number of decimal places to keep.
- force_ascii: This parameter takes True as the default value. When it is True, the output contains only ASCII characters, and all other characters are escaped.
- sep: This one is a parameter of read_csv() rather than to_json(). It specifies the separator used in the CSV file. The default value is ‘,’.
There are other useful parameters it accepts. You can use them depending on your requirement.
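To see what the orient values actually produce, here is a small sketch on a two-row table. The data is inlined through io.StringIO rather than read from movies.csv, and json.loads() is used only to make the resulting shapes easy to inspect.

```python
import io
import json

import pandas as pd

SAMPLE = '''Title,imdbRating
Avatar,7.9
300,7.7
'''

df = pd.read_csv(io.StringIO(SAMPLE))

# orient='records': a list with one object per row
records = json.loads(df.to_json(orient='records'))

# orient='columns' (the DataFrame default): column -> {row index -> value}
columns = json.loads(df.to_json(orient='columns'))

# orient='split': separate "columns", "index" and "data" keys
split = json.loads(df.to_json(orient='split'))

# orient='values': only the cell values, without any labels
values = json.loads(df.to_json(orient='values'))
```

For a list of row objects like the sample JSON earlier in this post, ‘records’ is the layout to pick. to_json() also accepts a file path as its first argument, in which case it writes the file itself instead of returning a string.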
CSV to JSON – only a few specified columns:
pandas gives you the flexibility of taking only specified columns from the data set and dropping the rest while converting to JSON format. Here is the code for this.
# import the pandas library
import pandas as pd
# read the CSV file into a DataFrame
df = pd.read_csv('movies.csv')
# keep only the selected columns and convert them to a JSON string
json_data = df[['Title', 'Released', 'Director']].to_json(orient='records')
With this code, only the Title, Released and Director fields are extracted from the data set, and the rest of the columns are dropped.
So, which one is the most efficient solution?
Both of the listed solutions are efficient at converting CSV data to JSON format in Python.
However, pandas is the more effective choice when working with large data sets. It is optimised for tabular data and provides many built-in functions. So if you want to apply data manipulation operations such as sorting or filtering, you can easily do it with the pandas library.
The only overhead with this method is that you need to install the pandas module explicitly. It doesn’t come with Python by default.
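As an illustration of filtering and sorting before the conversion, here is a short sketch on the sample data. The 7.5 rating threshold is an arbitrary value picked for the example.

```python
import io
import json

import pandas as pd

SAMPLE = '''Title,Released,Director,Language,imdbRating
Avatar,18 Dec 2009,James Cameron,"English, Spanish",7.9
I Am Legend,14 Dec 2007,Francis Lawrence,English,7.2
300,09 Mar 2007,Zack Snyder,English,7.7
The Avengers,04 May 2012,Joss Whedon,"English, Russian",8.1
'''

df = pd.read_csv(io.StringIO(SAMPLE))

# Keep movies rated above 7.5 (threshold chosen arbitrarily), best first.
top_rated = df[df['imdbRating'] > 7.5].sort_values('imdbRating', ascending=False)

json_data = top_rated.to_json(orient='records')
titles = [movie['Title'] for movie in json.loads(json_data)]
```

Doing the same with the csv and json modules is possible, but you would have to convert the rating to a number and write the filter and sort logic yourself.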
Performance Report:
We tested both of the listed methods of converting CSV to JSON in Python. We took an 850 MB CSV file and converted it to JSON format with each method. Below are the test results.
| Method | Time Taken (ms) | Memory Usage (MB) |
|---|---|---|
| Using pandas library | 3698 | 593 |
| Using csv and json libraries | 16688 | 1277 |
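If you want to run a similar comparison on your own data, one possible way to measure time and memory with the standard library is sketched below. This is not the setup used for the numbers above, which is not described in this post. The measure and csv_text_to_json helpers are hypothetical, and tracemalloc only traces memory allocated through Python’s allocator, so its figures can differ from process-level memory usage.

```python
import csv
import io
import json
import time
import tracemalloc

def measure(func, *args):
    """Return (result, elapsed milliseconds, peak traced memory in MB)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed_ms, peak_bytes / (1024 * 1024)

def csv_text_to_json(text):
    """Convert CSV text to a JSON string with the csv and json modules."""
    return json.dumps(list(csv.DictReader(io.StringIO(text))))

# Small synthetic input; a real benchmark would use a large file on disk.
sample = "id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(1000))
json_text, elapsed_ms, peak_mb = measure(csv_text_to_json, sample)
print(f"{elapsed_ms:.1f} ms, peak {peak_mb:.2f} MB")
```

Passing a pandas-based function to the same measure helper gives a like-for-like comparison on your machine.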
Problems you’ll encounter
One of the main problems we encounter while converting CSV to JSON is the format of the data. The CSV format is not strictly standardized: delimiters, quoting, and encodings vary between files. This makes parsing and conversion error-prone.
When we ran the CSV to JSON code on an 800 MB file downloaded from the internet, it did not run well because of inconsistent data types in one field of the CSV file. We had to specify the data type explicitly to fix this.
To make sure the conversion is successful, carefully review the structure and format of the data before converting it.
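With pandas, one way to specify a data type explicitly is the dtype parameter of read_csv(). The sketch below is only an illustration: the code column and its values are hypothetical, and they are not taken from the file mentioned above.

```python
import io
import json

import pandas as pd

# Hypothetical data: zero-padded codes that should stay text.
SAMPLE = '''id,code
1,007
2,042
3,100
'''

# Left to itself, pandas infers integers and the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(SAMPLE))

# Declaring the type up front keeps every value in "code" as a string.
explicit = pd.read_csv(io.StringIO(SAMPLE), dtype={'code': str})

records = json.loads(explicit.to_json(orient='records'))
```

Declaring the type for a column also means every row is treated the same way, no matter what the individual values look like.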
Frequently Asked Questions
- How to convert CSV to JSON in Python?
There are two ways to convert CSV data to JSON. The first is to use the csv and json modules from Python’s standard library, and the other is to use the pandas library. Both methods are listed with code examples in this article.
- What is the fastest way to create JSON from CSV data?
We tested the CSV to JSON conversion code with both the standard-library and pandas methods. The pandas method was by far the faster one. The detailed performance comparison is included in this article.