How to read large json file in php

I'm working on a cron script that hits an API, receives a JSON file (a large array of objects), and stores it locally. Once that is complete, another script needs to parse the downloaded JSON file and insert each object into a MySQL database.


    I'm currently using file_get_contents() along with json_decode(). This attempts to read the whole file into memory before processing it. That would be fine except that my JSON files will usually range from 250 MB to 1 GB+. I know I can increase my PHP memory limit, but that doesn't seem like the best answer to me. I'm aware that I can use fopen() and fgets() to read the file in line by line, but I need to read the file in by each JSON object.
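
    In rough outline, the current approach looks something like this (the file path is just a placeholder):

    <?php

    // Naive approach: read the entire file into a string, then decode it
    // into one huge PHP array, so peak memory is a multiple of the file size.
    $json = file_get_contents('/path/to/feed.json');   // placeholder path
    $data = json_decode($json, true);

    foreach ($data as $object) {
        // insert $object into MySQL...
    }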

    Is there a way to read in the file per object, or is there another similar approach?



    asked Mar 12, 2013 at 22:30


    Try this lib: https://github.com/shevron/ext-jsonreader

    The existing ext/json which is shipped with PHP is very convenient and simple to use - but it is inefficient when working with large amounts of JSON data, as it requires reading the entire JSON data into memory (e.g. using file_get_contents()) and then converting it into a PHP variable at once - for large data sets, this takes up a lot of memory.

    JSONReader is designed for memory efficiency - it works on streams and can read JSON data from any PHP stream without loading the entire data into memory. It also allows the developer to extract specific values from a JSON stream without decoding and loading all data into memory.

    answered Mar 12, 2013 at 22:54

    Pawel Dubiel


    This really depends on what the json files contain.

    If loading the file into memory in one shot is not an option, your only other option, as you alluded to, is fopen/fgets.

    Reading line by line is possible, and if these JSON objects have a consistent structure, you can easily detect when a JSON object in the file starts and ends.

    Once you collect a whole object, you insert it into a db, then go on to the next one.

    There isn't much more to it. The algorithm to detect the beginning and end of a JSON object may get complicated depending on your data source, but I have done something like this before with a far more complex structure (XML) and it worked fine.
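
    A minimal sketch of that idea (the function name, chunk size, and callback are illustrative, and it assumes the file contains valid JSON whose top level is an array of objects):

    <?php

    // Scan the file in chunks, track brace depth (ignoring braces that
    // appear inside strings), and hand each complete top-level object
    // to a callback as soon as it closes.
    function streamJsonObjects(string $path, callable $handle): void
    {
        $fh = fopen($path, 'rb');
        $buffer = '';
        $depth = 0;
        $inString = false;
        $escaped = false;

        while (!feof($fh)) {
            $chunk = fread($fh, 8192);
            if ($chunk === false) {
                break;
            }

            for ($i = 0, $len = strlen($chunk); $i < $len; $i++) {
                $char = $chunk[$i];

                if ($depth > 0) {
                    $buffer .= $char;
                }

                if ($inString) {
                    if ($escaped) {
                        $escaped = false;
                    } elseif ($char === '\\') {
                        $escaped = true;
                    } elseif ($char === '"') {
                        $inString = false;
                    }
                    continue;
                }

                if ($char === '"') {
                    $inString = true;
                } elseif ($char === '{') {
                    if ($depth === 0) {
                        $buffer = '{';           // a new top-level object starts
                    }
                    $depth++;
                } elseif ($char === '}') {
                    $depth--;
                    if ($depth === 0) {
                        $handle(json_decode($buffer, true));
                        $buffer = '';
                    }
                }
            }
        }

        fclose($fh);
    }

    // Usage: decode one object at a time and insert it into the database.
    streamJsonObjects('big.json', function (array $object) {
        // INSERT INTO ... here
    });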

    answered Mar 12, 2013 at 22:36

    Kovo



    Best possible solution:

    Use some sort of delimiter (pagination, timestamp, object ID, etc.) that allows you to read the data in smaller chunks over multiple requests. This solution assumes that you have some sort of control over how these JSON files are generated. I'm basing my assumption on:

    This would be fine except for the fact that my JSON files will usually range from 250MB-1GB+.

    Reading in and processing 1GB of JSON data is simply ridiculous. A better approach is most definitely needed.
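
    A rough sketch of that idea, assuming the API accepted hypothetical page/per_page parameters (the endpoint and parameter names are made up for illustration):

    <?php

    // Pull the data down in pages instead of one giant file.
    $page = 1;
    $perPage = 5000;           // made-up chunk size

    do {
        $url = "https://api.example.com/items?page={$page}&per_page={$perPage}";
        $items = json_decode(file_get_contents($url), true);

        foreach ($items as $item) {
            // insert $item into MySQL here...
        }

        $page++;
    } while (count($items) === $perPage);   // a short page means we're done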

    answered Mar 12, 2013 at 22:43

    Wayne Whitty


    I'm working on a project now where I need to ingest and output large amounts of data between systems, without direct access to the databases. This has come up on past projects with CSV files, but in this case I am using JSON for various reasons, which changes things quite a bit.

    CSV files are somewhat easier to work with when dealing with large amounts of data because each record is on its own line. Thus it's easy to create a basic file parser that will do the job by just reading one line at a time. However, with JSON, the file could be formatted in multiple different ways, with a single object possibly spanning multiple lines, or there may be just one single massive line of data containing all the objects.
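
    For comparison, the CSV case really is just a loop over fgetcsv() (the file name is a placeholder):

    <?php

    // One record per line, so memory use is bounded by the longest row,
    // not by the size of the whole file.
    $fh = fopen('export.csv', 'rb');

    while (($row = fgetcsv($fh)) !== false) {
        // process one record at a time...
    }

    fclose($fh);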

    I could have tried to write my own tool to handle this issue, but luckily somebody else has already solved it for us. In this case I am going to demonstrate the usage of the JSON Machine PHP package to process an extremely large JSON file.

    Setup

    First we need to create an artificial large JSON file to simulate our issue. One could use something like an online JSON generator, but my browser would crash when I set a really high number of objects to create. Hence I used the following basic script to simulate my use-case of a massive array of items that have a depth of 1 (e.g. just name/value pairs).

    <?php

    // Generate a large array of flat "name => random hash" objects and
    // print it as JSON so the output can be redirected to a file.
    $items = [];

    for ($i = 0; $i < 1000000; $i++) {
        $items[] = [
            "id" => md5(rand()),        // "id" is an assumed key name
            "isActive" => md5(rand()),
            "balance" => md5(rand()),
            "picture" => md5(rand()),
            "age" => md5(rand()),
            "eyeColor" => md5(rand()),
            "name" => md5(rand()),
            "gender" => md5(rand()),
            "company" => md5(rand()),
            "email" => md5(rand()),
            "phone" => md5(rand()),
            "address" => md5(rand()),
            "about" => md5(rand()),
            "registered" => md5(rand()),
            "latitude" => md5(rand()),
            "longitude" => md5(rand()),
        ];
    }

    print json_encode($items, JSON_PRETTY_PRINT);
    

    I made sure to test this with and without the use of JSON_PRETTY_PRINT. This results in the generated file having different formatting, but the end result of this tutorial is exactly the same.

    This generated an 843 MB file of one million items, which I feel is large enough for stress testing.


    Running

    Now that we have a suitably large file, we need to process it.

    First we need to install the JSON Machine package:

    composer require halaxa/json-machine
    

    Then we can use it in a script like so:
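
    A minimal sketch of that step, assuming the generated data was saved to test-data.json; current JSON Machine releases expose the iterator through JsonMachine\Items::fromFile() (older releases used JsonMachine\JsonMachine::fromFile()):

    <?php

    require __DIR__ . '/vendor/autoload.php';

    use JsonMachine\Items;

    // The file is streamed and decoded one item at a time, so memory use
    // stays flat no matter how large the JSON array is.
    $items = Items::fromFile(__DIR__ . '/test-data.json');

    foreach ($items as $item) {
        // $item is a single object from the top-level array.
        print_r($item);
        print PHP_EOL;
    }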

    This doesn't actually do anything that useful; it just prints out each object one by one. However, it does demonstrate that we can safely loop over all the items in the JSON file one at a time without running out of memory. We could take this further and write some code to batch insert them 1,000 at a time into a database, or perform some other operation before outputting to another file.
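
    A rough sketch of that batching idea using PDO (the table, columns, and connection details are placeholders, not something specified here):

    <?php

    require __DIR__ . '/vendor/autoload.php';

    use JsonMachine\Items;

    // Placeholder connection details and schema; adjust to your setup.
    $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
    $insert = $pdo->prepare('INSERT INTO items (name, email) VALUES (?, ?)');

    // Insert a batch inside a single transaction to cut down on round trips.
    $flush = function (array $batch) use ($pdo, $insert): void {
        if ($batch === []) {
            return;
        }
        $pdo->beginTransaction();
        foreach ($batch as $item) {
            // JSON Machine decodes each item to an object by default.
            $insert->execute([$item->name, $item->email]);
        }
        $pdo->commit();
    };

    $batch = [];
    foreach (Items::fromFile(__DIR__ . '/test-data.json') as $item) {
        $batch[] = $item;
        if (count($batch) === 1000) {
            $flush($batch);
            $batch = [];
        }
    }
    $flush($batch);   // whatever is left over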

    Last updated: 18th March 2021
    First published: 18th March 2021