
How do I count the number of columns in a CSV file in Python?

My program needs to read CSV files which may have 1, 2 or 3 columns, and it needs to modify its behaviour accordingly. Is there a simple way to check the number of columns without "consuming" a row before the iterator runs? The following code is the most elegant I could manage, but I would prefer to run the check before the for loop starts:

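import csv

class CSVError(Exception):
    """Raised when the input file has an unexpected column layout."""

f = 'testfile.csv'
d = '\t'

# csv.reader needs an open file object, not a file name
with open(f, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=d)
    for row in reader:
        # The first row fixes the expected number of fields
        if reader.line_num == 1:
            fields = len(row)
        if len(row) != fields:
            raise CSVError("Number of fields should be %s: %s" % (fields, str(row)))
        if fields == 1:
            pass
        elif fields == 2:
            pass
        elif fields == 3:
            pass
        else:
            raise CSVError("Too many columns in input file.")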

Edit: I should have included more information about my data. If there is only one field, it must contain a name in scientific notation. If there are two fields, the first must contain a name, and the second a linking code. If there are three fields, the additional field contains a flag which specifies whether the name is currently valid. Therefore if any row has 1, 2 or 3 columns, all must have the same.
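
One way to learn the column count before the loop starts is to read the first row with next() and then chain it back in front of the remaining rows, so the loop still sees every row. The following is a minimal sketch under that assumption; the helper name peek_column_count and the sample data are hypothetical and not part of the original question.

```python
import csv
import io
from itertools import chain


def peek_column_count(csvfile, delimiter="\t"):
    """Return (column_count, rows); rows still yields the first row.

    Hypothetical helper: it reads one row to measure it, then chains that
    row back in front of the rest of the reader.
    """
    reader = csv.reader(csvfile, delimiter=delimiter)
    try:
        first_row = next(reader)
    except StopIteration:
        # Empty input: no columns and nothing to iterate over.
        return 0, iter(())
    return len(first_row), chain([first_row], reader)


# Made-up, tab-delimited sample standing in for testfile.csv.
sample = io.StringIO("Homo sapiens\tHS01\tyes\nPan troglodytes\tPT02\tno\n")
fields, rows = peek_column_count(sample)

# The column count is known here, before the loop starts.
if fields not in (1, 2, 3):
    raise ValueError("Unexpected number of columns in input file.")

all_rows = []
for row in rows:
    if len(row) != fields:
        raise ValueError("Number of fields should be %s: %s" % (fields, row))
    all_rows.append(row)
```

Because the first row is chained back in, the loop body does not need a special case for it.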

Suppose you are working on several data analytics projects and, as part of some of them, you are bringing data in from a CSV file.

One related area you may want to look at is How to Compare Column Headers in CSV to a List in Python, which can be combined with the output of this post.

If you are manipulating this data, you need to make sure that all of it was loaded without failure.

With this in mind, we will look at a possible automation task to ensure that:

(A) All rows and columns are counted when a CSV file is loaded.

(B) If the same dataset is exported, the totals on the export can be counted as well.

(C) All the required table rows and columns are always available.

Python Code that will help you with this

In the code below, there are a number of things to look at.

Let's look at the CSV file we will read in:

In total there are ten rows with data. The top row is not included in the count as it is treated as a header row. There are also seven columns.

This first bit just reads in the data. The first line is automatically used as the header, so it is not counted as a data row.

import pandas as pd

df = pd.read_csv("csv_import.csv") #===> reads in all the rows; the first line is used as the header, so it is not counted as a row.


Output with the first line used as the header:
Number of Rows: 10
Number of Columns: 7

Next, it creates two variables that count the number of rows and columns and prints them out.

Note that it uses df.axes, which holds the row index and the column index, so Python counts the labels on each axis rather than looking at the individual cells.

total_rows = len(df.axes[0]) #===> axes[0] is the row index
total_cols = len(df.axes[1]) #===> axes[1] is the column index
print("Number of Rows: " + str(total_rows))
print("Number of Columns: " + str(total_cols))

And bringing it all together

import pandas as pd

df = pd.read_csv("csv_import.csv") #===> reads in all the rows; the first line is used as the header, so it is not counted as a row.

total_rows = len(df.axes[0]) #===> axes[0] is the row index
total_cols = len(df.axes[1]) #===> axes[1] is the column index
print("Number of Rows: " + str(total_rows))
print("Number of Columns: " + str(total_cols))

Output:
Number of Rows: 10
Number of Columns: 7
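
Note that pd.read_csv treats the first line as a header by default. For a file without a header row, passing header=None makes pandas count the first line as data. The following is a minimal sketch of the difference, using made-up in-memory data in place of a real file:

```python
import io

import pandas as pd

# Made-up, tab-delimited data with no header row (an assumption for illustration).
raw = "Homo sapiens\tHS01\tyes\nPan troglodytes\tPT02\tno\n"

# Default behaviour: the first line becomes the column labels.
with_header = pd.read_csv(io.StringIO(raw), sep="\t")

# header=None: every line is a data row and the columns are numbered 0, 1, 2.
without_header = pd.read_csv(io.StringIO(raw), sep="\t", header=None)

print(len(with_header.axes[0]), len(with_header.axes[1]))
print(len(without_header.axes[0]), len(without_header.axes[1]))
```

The column count is the same either way; only the row count changes, by exactly one.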

In summary, this is very useful if you are trying to reduce the amount of manual effort involved in checking how a file is populated.

As a result, it helps with:

(A) Making sure that scripts that process data do not remove rows or columns unnecessarily.

(B) Batch runs that know the size of a dataset in advance of processing can make sure they have the data they need.

(C) Control logs: databases can store these counts to show that what was processed is correct.

(D) Where an automated run has to be paused, the counts can help with identifying the problem and fixing it quickly.

(E) Finally, if you are receiving agreed data from a third party, the counts can be used to alert them that too much or too little information was received.
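
The checks listed above could be automated along these lines. This is only a sketch: the function name check_shape, the expected counts, and the in-memory sample are assumptions made for illustration.

```python
import io

import pandas as pd


def check_shape(df, expected_rows, expected_cols):
    """Return a list of problems; an empty list means the counts match."""
    problems = []
    total_rows = len(df.axes[0])
    total_cols = len(df.axes[1])
    if total_rows != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {total_rows}")
    if total_cols != expected_cols:
        problems.append(f"expected {expected_cols} columns, got {total_cols}")
    return problems


# Made-up sample: a header line plus two data rows, three columns.
df = pd.read_csv(io.StringIO("id,name,score\n1,Ann,90\n2,Bob,85\n"))

matching = check_shape(df, expected_rows=2, expected_cols=3)
mismatched = check_shape(df, expected_rows=10, expected_cols=7)

# Export the data and read it back to confirm that the totals survive.
round_trip = pd.read_csv(io.StringIO(df.to_csv(index=False)))
after_export = check_shape(round_trip, *df.shape)

print(matching)
print(mismatched)
print(after_export)
```

Returning a list of messages rather than raising makes it easy to write the result to a control log or to pause a batch run, depending on what the surrounding process needs.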


Pandas allows us to get the shape of a DataFrame by counting its rows and columns. There are several approaches to counting the number of rows and columns in a Pandas DataFrame.

Example:

Input: {'name':          ['Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura'],
        'score':         [98, 80, 60, 85, 49, 92],
        'age':           [20, 25, 22, 24, 21, 20],
        'qualify_label': ['yes', 'yes', 'no','yes', 'no', 'yes']}

Output: Number of Rows: 6
        Number of Columns: 4

Count the number of rows and columns of Dataframe using len(df.axes[]) function

Let's take an example of a DataFrame that consists of data on the exam results of students. To get the number of rows and columns, we can use len(df.axes[0]) and len(df.axes[1]) in Python.

Python3

import pandas as pd

# Exam results for six students
result_data = {'name': ['Katherine', 'James', 'Emily',
                        'Michael', 'Matthew', 'Laura'],
               'score': [98, 80, 60, 85, 49, 92],
               'age': [20, 25, 22, 24, 21, 20],
               'qualify_label': ['yes', 'yes', 'no',
                                 'yes', 'no', 'yes']}

df = pd.DataFrame(result_data, index=None)

# axes[0] is the row index, axes[1] is the column index
rows = len(df.axes[0])
cols = len(df.axes[1])

print(df)
print("Number of Rows: ", rows)
print("Number of Columns: ", cols)

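Output (the DataFrame itself is printed first, followed by the counts):

Number of Rows:  6
Number of Columns:  4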

Count the number of rows and columns of Dataframe using info() function

The Pandas DataFrame.info() function is used to get a concise summary of the DataFrame. The summary includes the number of rows (the RangeIndex entries) and the number of columns.

Python3

import pandas as pd

df = pd.DataFrame({'name': ['Katherine', 'James', 'Emily',
                            'Michael', 'Matthew', 'Laura'],
                   'score': [98, 80, 60, 85, 49, 92],
                   'age': [20, 25, 22, 24, 21, 20],
                   'qualify_label': ['yes', 'yes', 'no',
                                     'yes', 'no', 'yes']})

# info() prints the summary itself and returns None, which print() then shows
print(df.info())

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   name           6 non-null      object
 1   score          6 non-null      int64
 2   age            6 non-null      int64
 3   qualify_label  6 non-null      object
dtypes: int64(2), object(2)
memory usage: 320.0+ bytes
None

Count the number of rows and columns of Dataframe using len() function.

The len() function returns the number of rows of the DataFrame; applying it to df.columns gives the number of columns.

Python3

import pandas as pd

df = pd.DataFrame({'name': ['Katherine', 'James', 'Emily',
                            'Michael', 'Matthew', 'Laura'],
                   'score': [98, 80, 60, 85, 49, 92],
                   'age': [20, 25, 22, 24, 21, 20],
                   'qualify_label': ['yes', 'yes', 'no',
                                     'yes', 'no', 'yes']})

print(len(df))          # number of rows
print(len(df.columns))  # number of columns

Output:

6
4

Count the number of rows and columns of Dataframe using shape.

Here, we will try a different approach: importing a CSV file into a DataFrame and counting its rows and columns using df.shape.
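
A minimal sketch of the df.shape approach, assuming a small in-memory CSV (made up here) in place of an imported file. df.shape is a (rows, columns) tuple, so both counts come from a single attribute:

```python
import io

import pandas as pd

# Made-up CSV text standing in for an imported file (an assumption for illustration).
csv_text = """name,score,age,qualify_label
Katherine,98,20,yes
James,80,25,yes
Emily,60,22,no
"""

df = pd.read_csv(io.StringIO(csv_text))

rows, cols = df.shape  # shape is a (number_of_rows, number_of_columns) tuple
print("Number of Rows:", rows)
print("Number of Columns:", cols)
```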


Count the number of rows and columns of Dataframe using the size

The size attribute returns the number of rows multiplied by the number of columns, i.e. the total number of cells. Here, the number of rows is 6 and the number of columns is 4, so the size is 6*4=24.

Python3

import pandas as pd

df = pd.DataFrame({'name': ['Katherine', 'James', 'Emily',
                            'Michael', 'Matthew', 'Laura'],
                   'score': [98, 80, 60, 85, 49, 92],
                   'age': [20, 25, 22, 24, 21, 20],
                   'qualify_label': ['yes', 'yes', 'no',
                                     'yes', 'no', 'yes']})

print(df.size)  # rows * columns

Output:

24

Count the number of rows of a Pandas Dataframe using count() and index.

Using count() on a column, or len() on df.index, we can get the number of rows present in the DataFrame.

Python3

import pandas as pd

df = pd.DataFrame({'name': ['Katherine', 'James', 'Emily',
                            'Michael', 'Matthew', 'Laura'],
                   'score': [98, 80, 60, 85, 49, 92],
                   'age': [20, 25, 22, 24, 21, 20],
                   'qualify_label': ['yes', 'yes', 'no',
                                     'yes', 'no', 'yes']})

print(df[df.columns[0]].count())  # non-null values in the first column
print(len(df.index))              # length of the row index

Output:

6
6
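
One caveat worth knowing: count() only counts non-null values, so it can report fewer rows than len(df.index) when a column has missing data. A small sketch, using a made-up DataFrame with one score deliberately missing:

```python
import pandas as pd

# Made-up data with one missing score (an assumption for illustration).
df = pd.DataFrame({'name': ['Katherine', 'James', 'Emily'],
                   'score': [98, None, 60]})

non_null_scores = int(df['score'].count())  # counts non-null values only
total_rows = len(df.index)                  # counts every row

print(non_null_scores)
print(total_rows)
```

When the goal is the number of rows rather than the number of filled-in values, len(df.index) is the safer of the two.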

How do I find the number of columns in a CSV file in Python?

To get the number of rows and columns, we can use len(df.axes[0]) for the rows and len(df.axes[1]) for the columns in Python.

How do I count columns in a CSV file?

All that is left is to use the wc command to count the number of characters. The file has 5 columns. In case you wonder why there are only 4 commas and wc -c returned 5 characters, it is because wc also counted the newline character (\n) as an extra character.
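
The same idea can be expressed in Python. Counting the delimiters in the first line and adding one works for simple files, but it miscounts when a quoted field contains a comma; parsing the line with the csv module avoids that. A sketch with a made-up first line:

```python
import csv
import io

# Made-up first line: five fields, one of which contains a quoted comma.
first_line = 'id,name,"Paris, France",score,flag\n'

# Naive approach, equivalent to counting commas: 5 commas give 6 "columns".
naive_count = first_line.count(",") + 1

# csv.reader respects the quoting and sees five fields.
parsed_count = len(next(csv.reader(io.StringIO(first_line))))

print(naive_count)
print(parsed_count)
```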

How do I count data in a CSV file in Python?

Use len() and list() on a CSV reader to count lines in a CSV file.

Open the CSV file within Python using the open(file) function, with file as the path to a CSV file.

Create a CSV reader by calling the function csv. ... .

Get a list representation of the CSV file by calling list(reader), where reader is the reader from the previous step. Calling len() on that list gives the number of lines.
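
The steps above can be sketched as follows. The elided call in the second step is assumed here to be csv.reader, and the file name sample.csv is hypothetical; the sketch writes that file first so that it can run on its own.

```python
import csv
import os
import tempfile

# Write a small made-up CSV file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="") as f:
    f.write("name,score\nKatherine,98\nJames,80\n")

# Step 1: open the file. Step 2: create a reader (assumed to be csv.reader).
with open(path, newline="") as f:
    reader = csv.reader(f)
    # Step 3: list() consumes the reader; len() then counts the rows.
    rows = list(reader)

line_count = len(rows)
print(line_count)
```

Note that a header line, if the file has one, is included in this count.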

How do I count data in a CSV file?

Using the len() function: under this method, we read the CSV file using the pandas library and then call len() on the imported data, which returns an int giving the number of rows present in the CSV file.
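
A minimal sketch of this method, using made-up in-memory data instead of a file on disk. Because read_csv treats the first line as a header by default, len() reports the data rows only:

```python
import io

import pandas as pd

# Made-up CSV text: one header line followed by three data rows.
csv_text = "name,score\nKatherine,98\nJames,80\nEmily,60\n"

df = pd.read_csv(io.StringIO(csv_text))
row_count = len(df)  # number of data rows, header excluded
print(row_count)
```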
