How do I count the number of rows in a CSV file in Python?

    CSV (Comma Separated Values) is a simple file format used to store tabular data, such as a spreadsheet or database. A CSV file stores tabular data (numbers and text) as plain text. Typically, each line of the file is a data record, and each record consists of one or more fields separated by commas. The use of the comma as a field separator is the source of the format's name.
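To make the format concrete, here is a small sketch that parses an in-memory CSV string with Python's built-in csv module. The sample data (a header record plus two data records) is made up for illustration.

```python
import csv
import io

# A tiny CSV document held in memory: one header record and two data records
text = "name,score\nAlice,90\nBob,85\n"

# csv.reader splits each record into its comma-separated fields
rows = list(csv.reader(io.StringIO(text)))
print(rows)  # [['name', 'score'], ['Alice', '90'], ['Bob', '85']]
```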

    In this article, we are going to discuss various approaches to count the number of lines in a CSV file using Python.

    We will use the dataset below for all the examples:

    Python3

    import pandas as pd

    results = pd.read_csv('Data.csv')

    print(results)

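The original dataset is not reproduced here. If you want to run the examples, the sketch below writes a small hypothetical Data.csv to the current directory and reads it back. The column names and values are invented, and running it will overwrite any existing Data.csv in that directory.

```python
import pandas as pd

# Hypothetical stand-in for Data.csv: the original dataset is not shown,
# so these columns and values are invented purely for illustration
sample = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dana"],
    "Age": [23, 31, 27, 45],
})
# index=False keeps the DataFrame index out of the file
sample.to_csv("Data.csv", index=False)

results = pd.read_csv("Data.csv")
print(results)
```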

    There are two approaches to counting the lines/rows in a CSV file:

    • Using len() function.
    • Using a counter.

    Using len() function

    In this method, we read the CSV file into a pandas DataFrame and call len() on it. This returns an int giving the number of data rows in the file. The header line is used for the column names and is not counted.

    Python3

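    import pandas as pd

    # Load the CSV file into a DataFrame (the header line becomes column names)
    results = pd.read_csv('Data.csv')

    # len() of a DataFrame is its number of data rows
    print("Number of lines present:-", len(results))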

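The sketch below makes the header behaviour visible, using made-up CSV text held in memory so that no file is needed. By default, read_csv treats the first line as column names. Passing header=None makes it read every line, including the header, as data.

```python
import io
import pandas as pd

# Hypothetical CSV: one header line plus three data rows
csv_text = "id,value\n1,10\n2,20\n3,30\n"

# By default read_csv treats the first line as column names,
# so len() counts only the three data rows
df = pd.read_csv(io.StringIO(csv_text))
print(len(df))  # 3

# With header=None the first line is read as ordinary data,
# so every line in the text becomes a row
df_all = pd.read_csv(io.StringIO(csv_text), header=None)
print(len(df_all))  # 4
```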

    Using a counter

    In this approach, we initialize an integer rowcount to -1, not 0, because the iteration starts at the header line rather than the first data row. We then iterate through the whole file, incrementing rowcount by one for each line. Finally, we print the value of rowcount.

    Python3

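    # Start at -1 so that the header line is not counted as a data row
    rowcount = -1

    # Iterate over the file line by line, adding one per line
    with open("Data.csv") as f:
        for row in f:
            rowcount += 1

    print("Number of lines present:-", rowcount)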

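Counting lines has a caveat: a quoted CSV field may legally contain a newline, so a file can have more lines than records. The sketch below uses made-up data to compare a plain line count with the number of records parsed by csv.reader.

```python
import csv
import io

# Hypothetical CSV in which one quoted field spans two physical lines
csv_text = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Counting physical lines, as the counter loop does
line_count = sum(1 for _ in io.StringIO(csv_text))

# Counting parsed records with csv.reader, skipping the header record
reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header
record_count = sum(1 for _ in reader)

print(line_count)    # 4
print(record_count)  # 2
```

When reading from a real file with the csv module, open it with newline='' as the csv module documentation recommends, so that embedded newlines are handled correctly.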


    2018-10-29 EDIT

    Thank you for the comments.

    I compared the speed of several ways to get the number of lines in a CSV file. The fastest was:

    with open(filename) as f:
        n = sum(1 for line in f)
    

    Here is the benchmark code:

    import timeit
    import csv
    import pandas as pd
    
    filename = './sample_submission.csv'
    
    def talktime(filename, funcname, func):
        # Time 100 calls of func, then report the average time per call and the count
        print(f"# {funcname}")
        t = timeit.timeit(f'{funcname}("{filename}")', setup=f'from __main__ import {funcname}', number=100) / 100
        print('Elapsed time : ', t)
        print('n = ', func(filename))
        print('\n')
    
    def sum1forline(filename):
        with open(filename) as f:
            return sum(1 for line in f)
    talktime(filename, 'sum1forline', sum1forline)
    
    def lenopenreadlines(filename):
        with open(filename) as f:
            return len(f.readlines())
    talktime(filename, 'lenopenreadlines', lenopenreadlines)
    
    def lenpd(filename):
        # read_csv consumes the header line, so add 1 to match the raw line count
        return len(pd.read_csv(filename)) + 1
    talktime(filename, 'lenpd', lenpd)
    
    def csvreaderfor(filename):
        cnt = 0
        with open(filename) as f:
            cr = csv.reader(f)
            for row in cr:
                cnt += 1
        return cnt
    talktime(filename, 'csvreaderfor', csvreaderfor)
    
    def openenum(filename):
        cnt = 0
        with open(filename) as f:
            for i, line in enumerate(f,1):
                cnt += 1
        return cnt
    talktime(filename, 'openenum', openenum)
    

    The results were:

    # sum1forline
    Elapsed time :  0.6327946722068599
    n =  2528244
    
    
    # lenopenreadlines
    Elapsed time :  0.655304473598555
    n =  2528244
    
    
    # lenpd
    Elapsed time :  0.7561274056295324
    n =  2528244
    
    
    # csvreaderfor
    Elapsed time :  1.5571560935772661
    n =  2528244
    
    
    # openenum
    Elapsed time :  0.773000013928679
    n =  2528244
    

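    In conclusion, sum(1 for line in f) was the fastest, although its lead over len(f.readlines()) is small (about 0.63 s vs 0.66 s per run). Note that readlines() loads every line into a list in memory at once, whereas the generator expression processes one line at a time.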

    The test file, sample_submission.csv, is 30.2 MB and contains 31 million characters.
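The benchmark above compares only text-mode approaches. A different technique, not part of the original benchmark and not timed here, is to read the file in binary chunks and count newline bytes with bytes.count(). The function name count_lines_chunked and the 1 MiB default chunk size are illustrative choices. The sketch assumes files with \n or \r\n line endings. Whether it is faster on a given machine would need to be measured, for example with the same talktime harness.

```python
import os
import tempfile

def count_lines_chunked(path, chunk_size=1 << 20):
    """Count lines by counting b'\\n' bytes in fixed-size binary chunks.

    A final line without a trailing newline is also counted, so for files
    with \\n or \\r\\n endings the result matches sum(1 for line in f).
    """
    count = 0
    last_byte = b"\n"  # an empty file then yields 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # Add one for a trailing line that has no newline at the end
    if last_byte != b"\n":
        count += 1
    return count

# Usage: write a small temporary file and count its lines
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,b\n1,2\n3,4")  # last line has no trailing newline
    tmp_path = tmp.name

print(count_lines_chunked(tmp_path))  # 3
os.remove(tmp_path)
```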

    How do I count the number of rows in a CSV file?

    Read the file into a pandas DataFrame with pd.read_csv() and call len() on it. The result is an int giving the number of data rows; the header line is not counted.

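    How do I find the number of rows and columns in a CSV file in Python?

    Load the file with pd.read_csv() to get a DataFrame df. len(df.axes[0]) gives the number of rows and len(df.axes[1]) gives the number of columns. df.shape returns both at once as a (rows, columns) tuple.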
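A minimal sketch of the axes approach. It uses a small hypothetical DataFrame in place of one loaded from a CSV file; df.axes is a list holding the row index and the column index.

```python
import pandas as pd

# Hypothetical DataFrame standing in for one loaded with pd.read_csv
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# df.axes is [row index, column index]
n_rows = len(df.axes[0])
n_cols = len(df.axes[1])
print(n_rows, n_cols)  # 3 2
```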

    How do you count rows in CSV using pandas?

    To get the number of rows in a pandas DataFrame:

    • Use .shape[0]. The .shape attribute gives the DataFrame's dimensions as a (rows, columns) tuple, so its first element is the row count.
    • Use the built-in len() function, which returns the number of rows.
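Both options are shown below on made-up CSV text read from memory. The text has a header plus two data rows, so both expressions give the same row count.

```python
import io
import pandas as pd

# Hypothetical CSV text: a header plus two data rows and three columns
df = pd.read_csv(io.StringIO("x,y,z\n1,2,3\n4,5,6\n"))

rows_from_shape = df.shape[0]  # first element of the (rows, columns) tuple
rows_from_len = len(df)        # len() of a DataFrame is its row count
print(rows_from_shape, rows_from_len)  # 2 2
```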

    How do you count lines in Python?

    Use readlines() to get the line count. This is the most straightforward way to count the lines in a text file in Python. The readlines() method reads every line of the file into a list, and len() of that list is the total number of lines in the file. Because the whole file is loaded into memory at once, sum(1 for line in f) is lighter on memory for large files.
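A minimal sketch of the readlines() approach. It writes a throwaway temporary file (its contents are hypothetical), counts its lines with len(f.readlines()), and then deletes the file.

```python
import os
import tempfile

# Create a small throwaway file; its contents are hypothetical
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

# readlines() returns a list with one string per line
with open(path) as f:
    lines = f.readlines()

print(len(lines))  # 3
os.remove(path)
```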