How do I count the number of rows in a CSV file in Python?


    CSV (Comma-Separated Values) is a simple file format used to store tabular data, such as a spreadsheet or database. A CSV file stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format.
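To make the record and field structure concrete, here is a minimal sketch that parses a small in-memory CSV with the standard-library csv module. The sample data is hypothetical and only serves as an illustration.

```python
import csv
import io

# Hypothetical sample data: a header line followed by two records.
sample = "name,city\nAda,London\nLinus,Helsinki\n"

# csv.reader yields one list of fields per record.
records = list(csv.reader(io.StringIO(sample)))

for record in records:
    print(record)
```

Each printed list corresponds to one line of the file, and each list element to one comma-separated field.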

    In this article, we are going to discuss various approaches to count the number of lines in a CSV file using Python.

    We will use the following dataset for all operations:

    Python3

    import pandas as pd

    # Load the CSV file into a DataFrame and display it
    results = pd.read_csv('Data.csv')
    print(results)


    To count the number of lines/rows present in a CSV file, we have two different types of methods:

    • Using the len() function.
    • Using a counter.

    Using the len() function

    With this method, we read the CSV file into a DataFrame using the pandas library and then pass the DataFrame to len(), which returns the number of rows as an int. Because pandas treats the first line as the header by default, the header is not included in the count.

    Python3

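    import pandas as pd

    # Read the file; the first line is used as the header by default
    results = pd.read_csv('Data.csv')

    # len() of a DataFrame is its number of data rows
    print("Number of lines present:-", len(results))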


    Using a counter

    With this approach, we initialize an integer rowcount to -1 (not 0, because iteration starts at the header line rather than the first data row), iterate through the whole file, and increment rowcount by one for each line. At the end, we print the value of rowcount.

    Python3

    # Start at -1 so that the header line is not counted as a data row
    rowcount = -1

    with open("Data.csv") as f:
        for row in f:
            rowcount += 1

    print("Number of lines present:-", rowcount)
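Starting the counter at -1 reports -1 for a completely empty file. As a hedged alternative, the sketch below skips the header explicitly and then counts the remaining lines. The function name count_data_rows and the sample data are hypothetical, and the sketch assumes one record per line.

```python
import io

def count_data_rows(lines):
    """Count the lines that follow the header in an iterable of CSV lines."""
    it = iter(lines)
    next(it, None)  # skip the header if there is one; no error on empty input
    return sum(1 for _ in it)

# Hypothetical in-memory data: one header line and two data rows.
sample = io.StringIO("id,value\n1,a\n2,b\n")
print(count_data_rows(sample))

# An empty input yields 0 rather than -1.
print(count_data_rows(io.StringIO("")))
```

The same function accepts an open file object, for example `with open("Data.csv") as f: count_data_rows(f)`.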



    2018-10-29 EDIT

    Thank you for the comments.

    I benchmarked several ways of getting the number of lines in a CSV file. The fastest method is below.

    with open(filename) as f:
        sum(1 for line in f)
    

    Here is the code tested.

    import timeit
    import csv
    import pandas as pd

    filename = './sample_submission.csv'

    def talktime(filename, funcname, func):
        # Report the mean time over 100 calls and the count the function returns
        print(f"# {funcname}")
        t = timeit.timeit(f'{funcname}("{filename}")', setup=f'from __main__ import {funcname}', number=100) / 100
        print('Elapsed time : ', t)
        print('n = ', func(filename))
        print('\n')

    def sum1forline(filename):
        with open(filename) as f:
            return sum(1 for line in f)
    talktime(filename, 'sum1forline', sum1forline)

    def lenopenreadlines(filename):
        with open(filename) as f:
            return len(f.readlines())
    talktime(filename, 'lenopenreadlines', lenopenreadlines)

    def lenpd(filename):
        # +1 adds back the header line, which pandas does not count as a row
        return len(pd.read_csv(filename)) + 1
    talktime(filename, 'lenpd', lenpd)

    def csvreaderfor(filename):
        cnt = 0
        with open(filename) as f:
            cr = csv.reader(f)
            for row in cr:
                cnt += 1
        return cnt
    talktime(filename, 'csvreaderfor', csvreaderfor)

    def openenum(filename):
        cnt = 0
        with open(filename) as f:
            for i, line in enumerate(f, 1):
                cnt += 1
        return cnt
    talktime(filename, 'openenum', openenum)
    

    The results were as follows.

    # sum1forline
    Elapsed time :  0.6327946722068599
    n =  2528244
    
    
    # lenopenreadlines
    Elapsed time :  0.655304473598555
    n =  2528244
    
    
    # lenpd
    Elapsed time :  0.7561274056295324
    n =  2528244
    
    
    # csvreaderfor
    Elapsed time :  1.5571560935772661
    n =  2528244
    
    
    # openenum
    Elapsed time :  0.773000013928679
    n =  2528244
    

    In conclusion, sum(1 for line in f) is the fastest, although the difference from len(f.readlines()) may not be significant.

    sample_submission.csv is 30.2MB and has 31 million characters.
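One caveat when choosing between these methods: counting physical lines and counting CSV records give the same number only when no quoted field contains a line break. The sketch below uses hypothetical in-memory data to show the two counts diverging. csv.reader treats a quoted multi-line field as part of a single record, while plain line iteration does not.

```python
import csv
import io

# Hypothetical data: the first data record has a quoted field with a line break.
sample = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Plain iteration counts physical lines, including the break inside the quotes.
physical_lines = sum(1 for _ in io.StringIO(sample))

# csv.reader counts logical records (the header plus two data records).
csv_records = sum(1 for _ in csv.reader(io.StringIO(sample, newline="")))

print(physical_lines, csv_records)
```

If the data may contain such fields, the slower csv.reader loop is the safer choice for a row count.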

    How do I count the number of rows in a CSV file?

    Read the CSV file with the pandas library and pass the resulting DataFrame to len(), which returns the number of rows in the file as an int.

    How do I find the number of rows and columns in a CSV file in Python?

    After reading the file into a pandas DataFrame df, len(df.axes[0]) gives the number of rows and len(df.axes[1]) gives the number of columns.

    How do you count rows in CSV using pandas?

    Get the number of rows in a pandas DataFrame:
    • Using .shape[0]. The .shape property gives you the shape of the DataFrame in the form of a (rows, columns) tuple. ...
    • Using the len() function. You can also use the built-in Python len() function to determine the number of rows.
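The pandas options mentioned above can be compared side by side. This is a minimal sketch that assumes pandas is installed. The in-memory CSV is hypothetical and stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV: one header line and three data rows.
csv_text = "name,score\nAda,90\nLinus,85\nGrace,97\n"
df = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = df.shape        # (rows, columns) tuple
rows_via_len = len(df)           # data rows only; the header is not counted
rows_via_axes = len(df.axes[0])  # axes[0] is the row index
cols_via_axes = len(df.axes[1])  # axes[1] is the column index

print(n_rows, n_cols, rows_via_len, rows_via_axes, cols_via_axes)
```

All three row counts agree, and none of them includes the header line.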

    How do you count lines in Python?

    Use readlines() to get the line count. This is the most straightforward way to count the number of lines in a text file in Python. The readlines() method reads all lines from a file and stores them in a list. Next, use the len() function to find the length of the list, which is the total number of lines in the file.
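As an illustration, the sketch below applies readlines() to a small temporary file and compares it with a streaming count that does not hold all lines in memory at once. The helper names and the file contents are hypothetical.

```python
import os
import tempfile

def count_lines_readlines(path):
    """Read every line into a list, then take the length of the list."""
    with open(path) as f:
        return len(f.readlines())

def count_lines_streaming(path):
    """Count lines one at a time without building a list."""
    with open(path) as f:
        return sum(1 for _ in f)

# Write a hypothetical four-line CSV to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,value\n1,a\n2,b\n3,c\n")
    path = tmp.name

try:
    n_readlines = count_lines_readlines(path)
    n_streaming = count_lines_streaming(path)
finally:
    os.remove(path)  # clean up the temporary file

print(n_readlines, n_streaming)
```

Both functions return the same count. The streaming version is usually preferable for very large files because it never stores the whole file in a list.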
