How do I load multiple JSON files in Python?

One option is listing all files in a directory with os.listdir and then finding only those that end in '.json':

import os, json
import pandas as pd

path_to_json = 'somedir/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
print(json_files)  # for me this prints ['foo.json']

Now you can use pandas DataFrame.from_dict to read the JSON (a Python dictionary at this point) into a pandas DataFrame:

montreal_json = pd.DataFrame.from_dict(many_jsons[0])
print(montreal_json['features'][0]['geometry'])

Prints:

{u'type': u'Point', u'coordinates': [-73.6051013, 45.5115944]}

In this case I had appended some JSONs to a list many_jsons. The first JSON in my list is actually a GeoJSON with some geo data on Montreal. I'm familiar with the content already, so I print out the 'geometry', which gives me the lon/lat of Montreal.
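
If you're wondering how a list like many_jsons gets filled in the first place, here is a minimal sketch under the same assumptions as above (the directory name and file layout are placeholders, not taken from the original answer):

import os, json

path_to_json = 'somedir/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]

many_jsons = []
for pos_json in json_files:
    # parse each file into a Python dictionary and collect them in a list
    with open(os.path.join(path_to_json, pos_json)) as json_file:
        many_jsons.append(json.load(json_file))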

The following code sums up everything above:

import os, json
import pandas as pd

# this finds our json files
path_to_json = 'json/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]

# here I define my pandas DataFrame with the columns I want to get from the json
jsons_data = pd.DataFrame(columns=['country', 'city', 'long/lat'])

# we need both the json and an index number so use enumerate()
for index, js in enumerate(json_files):
    with open(os.path.join(path_to_json, js)) as json_file:
        json_text = json.load(json_file)

        # here you need to know the layout of your json and each json has to have
        # the same structure (obviously not the structure I have here)
        country = json_text['features'][0]['properties']['country']
        city = json_text['features'][0]['properties']['name']
        lonlat = json_text['features'][0]['geometry']['coordinates']
        # here I push a list of data into a pandas DataFrame at row given by 'index'
        jsons_data.loc[index] = [country, city, lonlat]

# now that we have the pertinent json data in our DataFrame let's look at it
print(jsons_data)

for me this prints:

  country           city                   long/lat
0  Canada  Montreal city  [-73.6051013, 45.5115944]
1  Canada        Toronto  [-79.3849008, 43.6529206]

It may be helpful to know that for this code I had two GeoJSONs in a directory named 'json'. Each JSON had the following structure:

{"features":
[{"properties":
{"osm_key":"boundary","extent":
[-73.9729016,45.7047897,-73.4734865,45.4100756],
"name":"Montreal city","state":"Quebec","osm_id":1634158,
"osm_type":"R","osm_value":"administrative","country":"Canada"},
"type":"Feature","geometry":
{"type":"Point","coordinates":
[-73.6051013,45.5115944]}}],
"type":"FeatureCollection"}

Comparing data from multiple JSON files can get unwieldy – unless you leverage Python to give you the data you need.

I often monitor key page speed metrics by testing web pages with WebPagetest or Google Lighthouse via their CLI or Node tools. I save test results as JSON, which is fine for looking at individual snapshots at a later time. But I often end up with folders full of data that cannot really be analyzed manually:

working_directory
└───data
    ├───export1.json
    ├───export2.json
    ├───export3.json
    ├───...

For example, how do you compare changes in those metrics over time? Or how do you look for a peak in the data?

The following handy little Python 3 script is useful for sifting through a directory full of JSON files and exporting specific values to a CSV for an ad-hoc analysis. It only uses built-in Python modules. I just drop it in my working directory and run it via command line with python3 json-to-csv-exporter.py:

json-to-csv-exporter.py

#!/usr/bin/env python3

# Place this Python script in your working directory when you have JSON files in a subdirectory.
# To run the script via command line: "python3 json-to-csv-exporter.py"

import json
import glob
from datetime import datetime
import csv

# Place your JSON data in a directory named 'data/'
src = "data/"

date = datetime.now()
data = []

# Change the glob if you want to only look through files with specific names
files = glob.glob(src + '*', recursive=True)

# Loop through files

for single_file in files:
    with open(single_file, 'r') as f:

        # Use 'try-except' to skip files that may be missing data
        try:
            json_file = json.load(f)
            data.append([
                json_file['requestedUrl'],
                json_file['fetchTime'],
                json_file['categories']['performance']['score'],
                json_file['audits']['largest-contentful-paint']['numericValue'],
                json_file['audits']['speed-index']['numericValue'],
                json_file['audits']['max-potential-fid']['numericValue'],
                json_file['audits']['cumulative-layout-shift']['numericValue'],
                json_file['audits']['first-cpu-idle']['numericValue'],
                json_file['audits']['total-byte-weight']['numericValue']
            ])
        except KeyError:
            print(f'Skipping {single_file}')

# Sort the data
data.sort()

# Add headers
data.insert(0, ['Requested URL', 'Date', 'Performance Score', 'LCP', 'Speed Index', 'FID', 'CLS', 'CPU Idle', 'Total Byte Weight'])

# Export to CSV.
# Add the date to the file name to avoid overwriting it each time.
csv_filename = f'{str(date)}.csv'
with open(csv_filename, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)

print("Updated CSV")

That gives you a CSV that you can use to create charts or analyze to your heart’s content.

| Requested URL           | Date                     | Performance Score | LCP                | Speed Index        | FID | CLS                 | CPU Idle           | Total Byte Weight |
| ----------------------- | ------------------------ | ----------------- | ------------------ | ------------------ | --- | ------------------- | ------------------ | ----------------- |
| //www.example.com | 2020-08-26T11:19:42.608Z | 0.96              | 1523.257           | 1311.5760337571400 | 153 | 0.5311671549479170  | 1419.257           | 301319            |
| //www.example.com | 2020-08-26T11:32:16.197Z | 0.99              | 1825.5990000000000 | 2656.8016986395200 | 496 | 0.06589290364583330 | 1993.5990000000000 | 301282            |
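
If you'd rather explore the export programmatically than in a spreadsheet, here is a small sketch of loading it with pandas (pandas is an extra dependency the exporter itself does not use, and the file name below is a placeholder for whatever the script generated):

import pandas as pd

# load the exported CSV (replace the file name with the one the script produced)
df = pd.read_csv('your-export.csv', parse_dates=['Date'])

# e.g. track the performance score over time for each URL
print(df.sort_values('Date')[['Requested URL', 'Date', 'Performance Score']])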

How do I read all JSON files in a directory in Python?

Read a JSON file in Python (a minimal example follows these steps):
Import the json module.
Open the file using the name of the JSON file with the open() function.
Read the JSON file using json.load() and put the JSON data into a variable.
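
Put together, those steps look roughly like this (the file name is just an example):

import json

# open the file by name and parse it; 'export1.json' is a placeholder name
with open('export1.json') as f:
    data = json.load(f)  # data is now a Python dict (or list)

print(data)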

Can you import a JSON file in Python?

Python supports JSON through a built-in package called json. To use this feature, import the json package in your Python script. JSON text is made up of quoted strings that hold values in key-value mappings within { }.

How do I read multiple JSON files in Pyspark?

When you use the format("json") method, you can also specify the data source by its fully qualified name, as in these truncated snippets (a fuller sketch follows):
# Read JSON file into dataframe: df = spark.read. ...
# Read multiline json file: multiline_df = spark.read. ...
# Read multiple files: df2 = spark.read. ...
# Read all JSON files from a folder: df3 = spark.read. ...
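
A sketch of what those truncated lines typically expand to with PySpark's spark.read.json API (the paths and app name are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('read-json-example').getOrCreate()

# Read a single JSON file into a DataFrame
df = spark.read.json('data/export1.json')

# Read a multiline JSON file
multiline_df = spark.read.option('multiLine', 'true').json('data/multiline.json')

# Read multiple specific files
df2 = spark.read.json(['data/export1.json', 'data/export2.json'])

# Read all JSON files from a folder
df3 = spark.read.json('data/*.json')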

How do you parse a JSON list in Python?

Parse JSON - Convert from JSON to Python. If you have a JSON string, you can parse it by using the json.loads() method. The result will be a Python dictionary.
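
For example, parsing a JSON string that holds a list (a minimal illustration):

import json

json_string = '[{"city": "Montreal"}, {"city": "Toronto"}]'
cities = json.loads(json_string)  # result is a Python list of dicts
print(cities[0]['city'])          # prints: Montreal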
