How do I load multiple JSON files in Python?

One option is listing all files in a directory with os.listdir and then finding only those that end in '.json':

import os, json
import pandas as pd

path_to_json = 'somedir/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
print(json_files)  # for me this prints ['foo.json']

Now you can use pandas' DataFrame.from_dict to read the JSON (a Python dictionary at this point) into a pandas DataFrame:

montreal_json = pd.DataFrame.from_dict(many_jsons[0])
print(montreal_json['features'][0]['geometry'])

Prints:

{'type': 'Point', 'coordinates': [-73.6051013, 45.5115944]}

In this case I had appended the loaded JSONs to a list called many_jsons. The first JSON in my list is actually a GeoJSON with some geo data on Montreal. I'm already familiar with the content, so I print out the 'geometry', which gives me the lon/lat of Montreal.
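
If you want to build that list yourself from the files found above, something like this works (a small sketch; many_jsons simply collects each loaded dictionary):

many_jsons = []
for js in json_files:
    with open(os.path.join(path_to_json, js)) as json_file:
        many_jsons.append(json.load(json_file))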

The following code sums up everything above:

import os, json
import pandas as pd

# this finds our json files
path_to_json = 'json/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]

# here I define my pandas Dataframe with the columns I want to get from the json
jsons_data = pd.DataFrame(columns=['country', 'city', 'long/lat'])

# we need both the json and an index number so use enumerate()
for index, js in enumerate(json_files):
    with open(os.path.join(path_to_json, js)) as json_file:
        json_text = json.load(json_file)

        # here you need to know the layout of your json and each json has to have
        # the same structure (obviously not the structure I have here)
        country = json_text['features'][0]['properties']['country']
        city = json_text['features'][0]['properties']['name']
        lonlat = json_text['features'][0]['geometry']['coordinates']
        # here I push a list of data into a pandas DataFrame at row given by 'index'
        jsons_data.loc[index] = [country, city, lonlat]

# now that we have the pertinent json data in our DataFrame let's look at it
print(jsons_data)

for me this prints:

  country           city                   long/lat
0  Canada  Montreal city  [-73.6051013, 45.5115944]
1  Canada        Toronto  [-79.3849008, 43.6529206]

It may be helpful to know that for this code I had two geojsons in a directory named 'json'. Each JSON had the following structure:

{"features":
[{"properties":
{"osm_key":"boundary","extent":
[-73.9729016,45.7047897,-73.4734865,45.4100756],
"name":"Montreal city","state":"Quebec","osm_id":1634158,
"osm_type":"R","osm_value":"administrative","country":"Canada"},
"type":"Feature","geometry":
{"type":"Point","coordinates":
[-73.6051013,45.5115944]}}],
"type":"FeatureCollection"}

Comparing data from multiple JSON files can get unwieldy unless you leverage Python to pull out just the data you need.

I often monitor key page speed metrics by testing web pages with WebPagetest or Google Lighthouse via their CLI or Node tools. I save the test results as JSON, which is fine for looking at individual snapshots later. But I often end up with folders full of data that cannot really be analyzed manually:

working_directory
└───data
    ├───export1.json
    ├───export2.json
    ├───export3.json
    ├───...
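
Each of those exports is just a saved report. With the Lighthouse CLI, for example, a run can be written straight to that folder (the URL and file name here are placeholders):

lighthouse https://www.example.com --output=json --output-path=./data/export1.json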

For example, how to compare changes in those metrics over time? Or how to look for a peak in the data?

The following handy little Python 3 script is useful for sifting through a directory full of JSON files and exporting specific values to a CSV for an ad-hoc analysis. It only uses built-in Python modules. I just drop it in my working directory and run it via command line with python3 json-to-csv-exporter.py:

json-to-csv-exporter.py

#!/usr/bin/env python3

# Place this Python script in your working directory when you have JSON files in a subdirectory.
# To run the script via command line: "python3 json-to-csv-exporter.py"

import json
import glob
from datetime import datetime
import csv

# Place your JSON data in a directory named 'data/'
src = "data/"

date = datetime.now()
data = []

# Change the glob if you want to only look through files with specific names
files = glob.glob(src + '*', recursive=True)

# Loop through files

for single_file in files:
    with open(single_file, 'r') as f:

        # Use 'try-except' to skip files that may be missing data
        try:
            json_file = json.load(f)
            data.append([
                json_file['requestedUrl'],
                json_file['fetchTime'],
                json_file['categories']['performance']['score'],
                json_file['audits']['largest-contentful-paint']['numericValue'],
                json_file['audits']['speed-index']['numericValue'],
                json_file['audits']['max-potential-fid']['numericValue'],
                json_file['audits']['cumulative-layout-shift']['numericValue'],
                json_file['audits']['first-cpu-idle']['numericValue'],
                json_file['audits']['total-byte-weight']['numericValue']
            ])
        except KeyError:
            print(f'Skipping {single_file}')

# Sort the data
data.sort()

# Add headers
data.insert(0, ['Requested URL', 'Date', 'Performance Score', 'LCP', 'Speed Index', 'FID', 'CLS', 'CPU Idle', 'Total Byte Weight'])

# Export to CSV.
# Add the date to the file name to avoid overwriting it each time.
csv_filename = f'{str(date)}.csv'
with open(csv_filename, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)

print("Updated CSV")

That gives you a CSV that you can use to create charts or analyze to your heart’s content.

| Requested URL           | Date                     | Performance Score | LCP                | Speed Index        | FID | CLS                 | CPU Idle           | Total Byte Weight |
| ----------------------- | ------------------------ | ----------------- | ------------------ | ------------------ | --- | ------------------- | ------------------ | ----------------- |
| https://www.example.com | 2020-08-26T11:19:42.608Z | 0.96              | 1523.257           | 1311.5760337571400 | 153 | 0.5311671549479170  | 1419.257           | 301319            |
| https://www.example.com | 2020-08-26T11:32:16.197Z | 0.99              | 1825.5990000000000 | 2656.8016986395200 | 496 | 0.06589290364583330 | 1993.5990000000000 | 301282            |
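
If a spreadsheet isn't enough, the CSV is also easy to pull back into pandas for a quick chart (a sketch assuming pandas and matplotlib are installed; swap in the file name the script actually produced):

import pandas as pd
import matplotlib.pyplot as plt

# Replace with the CSV file generated by the exporter script
results = pd.read_csv('2020-08-26.csv', parse_dates=['Date'])
results = results.sort_values('Date')

# Chart the Lighthouse performance score over time
results.plot(x='Date', y='Performance Score', marker='o')
plt.show()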

How do I read all JSON files in a directory in Python?

To read a single JSON file in Python:

Import the json module.
Open the file by passing the JSON file's name to open().
Read the file with json.load() and store the JSON data in a variable.

Repeat those steps for every file in the directory; a sketch is shown below.
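
A directory-wide version of those steps might look like this (a minimal sketch using pathlib; point the glob at your own folder):

import json
from pathlib import Path

all_data = {}
for json_path in Path('somedir').glob('*.json'):
    with open(json_path) as json_file:
        all_data[json_path.name] = json.load(json_file)

print(list(all_data))  # names of the files that were loaded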

Can you import a JSON file in Python?

Python supports JSON through a built-in package called json. To use it, import the json package in your Python script. JSON text consists of quoted strings that hold values in key-value mappings within { }.

How do I read multiple JSON files in Pyspark?

When you use the format("json") method, you can also specify the data source by its fully qualified name, as below:

# Read JSON file into dataframe: df = spark.read. ...
# Read multiline JSON file: multiline_df = spark.read. ...
# Read multiple files: df2 = spark.read. ...
# Read all JSON files from a folder: df3 = spark.read. ...
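
As a concrete sketch (assuming an existing SparkSession and JSON files under data/; the paths are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-json").getOrCreate()

# Read one JSON file
df = spark.read.json("data/export1.json")

# Read several specific files by passing a list of paths
df2 = spark.read.json(["data/export1.json", "data/export2.json"])

# Read every JSON file in a folder with a glob pattern
df3 = spark.read.json("data/*.json")

df3.printSchema()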

How do you parse a JSON list in Python?

Parse JSON - convert from JSON to Python: if you have a JSON string, you can parse it by using the json.loads() method. The result will be a Python dictionary (or a Python list, if the top-level JSON value is an array).
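
For example, parsing a small JSON list (the string here is made up for illustration):

import json

cities_json = '[{"name": "Montreal"}, {"name": "Toronto"}]'
cities = json.loads(cities_json)

print(cities[0]['name'])  # prints: Montreal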