The JSON Output
Converting the example CSV produces:
[
  {
    "name": "Alice Chen",
    "email": "alice@example.com",
    "role": "admin",
    "active": true
  },
  {
    "name": "Bob Martinez",
    "email": "bob@example.com",
    "role": "editor",
    "active": true
  },
  {
    "name": "Carol Williams",
    "email": "carol@example.com",
    "role": "viewer",
    "active": false
  }
]
The first row becomes the keys. Each subsequent row becomes an object. The active column values true and false are converted to JSON booleans because they match the boolean literal strings exactly.
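A minimal sketch of that mapping with Python's csv module. The function name csv_to_objects is illustrative, and the input columns (name, email, role, active) are assumed, reconstructed from the JSON output above. Only the exact strings true and false are converted; everything else stays a string.

```python
import csv
import io
import json

def csv_to_objects(text):
    """Header row supplies the keys; each later row becomes one object.
    Only the exact strings "true" and "false" become booleans."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    objects = []
    for row in records:
        obj = {}
        # zip() stops at the shorter list, so a short row omits missing keys.
        for key, value in zip(header, row):
            if value == "true":
                obj[key] = True
            elif value == "false":
                obj[key] = False
            else:
                obj[key] = value
        objects.append(obj)
    return objects

# Hypothetical input, reconstructed from the JSON output above.
sample = (
    "name,email,role,active\n"
    "Alice Chen,alice@example.com,admin,true\n"
    "Bob Martinez,bob@example.com,editor,true\n"
    "Carol Williams,carol@example.com,viewer,false\n"
)
print(json.dumps(csv_to_objects(sample), indent=2))
```

With indent=2, the printed result has the same shape as the JSON shown above.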
CSV Parsing Rules (RFC 4180)
RFC 4180 is the closest thing to a formal CSV standard. The key rules:
- Fields are separated by commas
- Each record ends with CRLF (\r\n), though most parsers also accept LF (\n)
- Fields may be enclosed in double quotes, and fields containing commas, double quotes, or line breaks should be
- Inside a quoted field, a double quote character is escaped by doubling it ("")
- The first record may be a header row with column names
- All records must have the same number of fields (though parsers vary in how strictly they enforce this)
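To make these rules concrete, here is a minimal hand-written parser sketch. The name parse_csv is hypothetical, and the sketch is deliberately lenient about malformed input (for example, an unterminated quote simply runs to the end of the text). In practice the standard library's csv module is the better choice.

```python
def parse_csv(text):
    """Split CSV text into records (lists of strings).

    Commas separate fields, records end with CRLF or bare LF, and a quoted
    field may contain commas, line breaks, and "" (one literal quote).
    """
    records, record, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # doubled quote -> one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas and line breaks are literal here
        elif ch == '"':
            in_quotes = True  # lenient: opens a quoted section anywhere
        elif ch == ",":
            record.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as a single line break
            record.append("".join(field))
            records.append(record)
            record, field = [], []
        else:
            field.append(ch)
        i += 1
    # Last record when the text does not end with a line break.
    if field or record or (text and text[-1] not in "\r\n"):
        record.append("".join(field))
        records.append(record)
    return records
```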
A compliant CSV with quoted fields and embedded commas:
id,description,tags
1,"Converts JSON, YAML, and CSV","data,tools"
2,"Supports ""quoted"" values","parsing"
In row 1, both the second column (description) and the third column (tags) contain commas. This is valid because both fields are enclosed in double quotes. In row 2, each doubled quote inside the description represents a single literal quote, so the parsed value is Supports "quoted" values.
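One way to confirm this is to feed the example to Python's csv.reader, which applies these quoting rules. The variable names are illustrative; the CRLF line endings follow RFC 4180.

```python
import csv
import io

example = (
    'id,description,tags\r\n'
    '1,"Converts JSON, YAML, and CSV","data,tools"\r\n'
    '2,"Supports ""quoted"" values","parsing"\r\n'
)

# Commas inside quotes stay in the field; "" inside a quoted field becomes ".
rows = list(csv.reader(io.StringIO(example)))
for row in rows:
    print(row)
# ['id', 'description', 'tags']
# ['1', 'Converts JSON, YAML, and CSV', 'data,tools']
# ['2', 'Supports "quoted" values', 'parsing']
```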
Type Inference
CSV fields are always strings at the format level. A converter that produces typed JSON applies inference in order:
"42" -> 42 (valid integer)
"3.14" -> 3.14 (valid float)
"true" -> true (boolean literal)
"false" -> false (boolean literal)
"" -> null (empty field)
"hello" -> "hello" (stays as string)
This works for most data, but produces incorrect results for:
- ZIP codes and phone numbers with leading zeros (07302 becomes 7302)
- Large integers beyond JavaScript’s safe integer range (precision loss above 2^53)
- Scientific notation values you want as strings (1e5 becomes 100000)
- Fields intentionally storing the string "true" or "null"
For production data pipelines, do not rely on the converter’s type inference. Apply type coercion explicitly in your application based on the known schema.
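A minimal sketch of explicit, schema-driven coercion. The schema, its column names, and the helpers parse_bool and load_typed are all hypothetical. Columns not listed in the schema stay strings, and the sketch assumes rows are never longer than the header.

```python
import csv
import io

def parse_bool(value):
    # Accept only the exact literals; anything else is treated as a data error.
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"not a boolean literal: {value!r}")

# Hypothetical schema: column name -> converter function.
SCHEMA = {
    "id": int,
    "zip": str,           # keep leading zeros
    "active": parse_bool,
}

def load_typed(text, schema):
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        typed = {}
        for key, value in record.items():
            if value is None:  # short row: DictReader filled the gap with None
                typed[key] = None
            else:
                typed[key] = schema.get(key, str)(value)
        rows.append(typed)
    return rows

sample = "id,zip,active,note\n1,07302,true,true\n2,10001,false,null\n"
print(load_typed(sample, SCHEMA))
```

Because only the active column is declared boolean, the note column keeps the literal strings "true" and "null", and the ZIP code keeps its leading zero.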
CLI Alternatives
When you need to convert CSV to JSON in a script or pipeline:
csvjson (part of csvkit)
pip install csvkit
csvjson data.csv > data.json
csvjson --indent 2 data.csv > data.json
Python standard library
import csv
import json
# newline="" lets the csv module handle line endings itself
with open("data.csv", newline="") as f:
    reader = csv.DictReader(f)
    data = list(reader)
print(json.dumps(data, indent=2))
csv.DictReader uses the first row as keys automatically. All values remain strings (no type inference).
jq with @csv
jq can go the other direction (JSON to CSV) using @csv, but for CSV to JSON the cleanest path is Python or csvkit.
Edge Cases
BOM Characters
Excel on Windows exports CSV files with a UTF-8 BOM (the byte sequence EF BB BF at the start of the file). This invisible character attaches to the first column header. If your first key is \ufefffirst_name instead of first_name, strip the BOM before parsing:
with open("data.csv", encoding="utf-8-sig", newline="") as f:
    reader = csv.DictReader(f)
The utf-8-sig codec strips the BOM automatically.
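A small sketch comparing the two codecs on the same bytes. The helper first_header and the sample bytes are illustrative. Note that utf-8-sig also decodes files without a BOM unchanged, which makes it a safe default for reading input.

```python
import csv
import io

# UTF-8 BOM (EF BB BF) followed by a tiny CSV file.
raw = b"\xef\xbb\xbffirst_name,last_name\r\nAda,Lovelace\r\n"

def first_header(raw_bytes, encoding):
    """Decode raw_bytes with the given codec and return the first header cell."""
    text = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline="")
    return next(csv.reader(text))[0]

for encoding in ("utf-8", "utf-8-sig"):
    print(encoding, repr(first_header(raw, encoding)))
```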
Inconsistent Column Counts
Some CSV exporters produce rows with fewer columns than the header if trailing fields are empty. RFC 4180 requires consistent column counts, but real-world CSV is messy. When a row has fewer columns than the header, the missing keys either get null values or are omitted entirely depending on the parser.
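Python's csv.DictReader is one concrete example. By default it fills missing trailing keys with None (its restval parameter) and collects surplus fields in a list under the key None (its restkey parameter). The sketch below shows both, plus a hypothetical ragged_rows helper that reports rows whose width differs from the header, so you can reject them instead of guessing.

```python
import csv
import io

data = (
    "id,name,email\n"
    "1,Alice,alice@example.com\n"
    "2,Bob\n"                               # one field short
    "3,Carol,carol@example.com,extra\n"     # one field too many
)

# Defaults: restval=None fills missing keys, restkey=None holds extras.
rows = list(csv.DictReader(io.StringIO(data)))
for row in rows:
    print(row)

def ragged_rows(text):
    """Return (line number, field count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text))
    width = len(next(reader))
    return [(reader.line_num, len(row)) for row in reader if len(row) != width]

print(ragged_rows(data))
```

Passing restval="" to DictReader fills short rows with empty strings instead of None.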
Different Line Endings
CSV files from Windows use CRLF (\r\n); files from Unix and macOS use LF (\n). Some parsers treat the \r as part of the last field’s value in each row. If values in your last column end with a trailing carriage return, the source file has Windows line endings that the parser is not handling. To check, open the file in a hex editor or run file data.csv, which reports "with CRLF line terminators" for such files.
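A sketch of both the symptom and a check, in Python. The helper line_ending_counts is hypothetical; it counts line endings in raw bytes as an alternative to file. The naive split on "\n" shows how a stray \r ends up in the last field, while csv.reader handles CRLF correctly.

```python
import csv
import io

def line_ending_counts(raw):
    """Count CRLF, bare LF, and bare CR line endings in raw bytes."""
    crlf = raw.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": raw.count(b"\n") - crlf,
        "CR": raw.count(b"\r") - crlf,
    }

raw = b"name,active\r\nAlice,true\r\n"  # Windows-style file contents
text = raw.decode("utf-8")

# Naive parsing: splitting on "\n" leaves "\r" glued to the last field.
naive = [line.split(",") for line in text.split("\n") if line]

# csv.reader treats CRLF as the record terminator.
parsed = list(csv.reader(io.StringIO(text, newline="")))

print(line_ending_counts(raw))
print(naive[1], parsed[1])
```

For a file on disk, reading it in binary mode and passing the bytes to line_ending_counts gives the same information.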