Guide: Python read multiline CSV

Raymond


2020-05-02


CSV is a common data format used in many applications. It is also a common task for data workers to read and parse CSV files and then save the data into another storage system such as an RDBMS (Teradata, SQL Server, MySQL). In my previous article, PySpark Read Multiple Lines Records from CSV, I demonstrated how to use PySpark to read CSV as a data frame. This article shows several approaches to reading CSV files directly with Python (without the Spark APIs).

CSV data file

The CSV file I'm going to load is the same as the one in the previous example. The file is named data.csv and has the following content:

ID,Text1,Text2
1,Record 1,Hello World!
2,Record 2,Hello Hadoop!
3,Record 3,"Hello 
Kontext!"
4,Record 4,Hello!

There are 4 records and 3 columns. One record's content spans multiple lines.

Environment 

All the following code snippets run on a Windows 10 machine with Python 3.8.2 (64-bit). They should work on other platforms, but I have not tested them there. Please bear this in mind.

Use built-in csv module

The built-in csv module can be used to both read and write CSV files directly.

Refer to the official documentation for this module.

Sample code

import csv

file_path = 'data.csv'

with open(file_path, newline='', encoding='utf-8') as f:
    reader = csv.reader(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        print(row)

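The options passed above (comma delimiter, double-quote quote character, csv.QUOTE_MINIMAL) are the csv module's defaults. Because the file is opened with newline='', the reader handles the multi-line record automatically; without it, newlines embedded inside quoted fields may not be interpreted correctly.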
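To see the multi-line handling in isolation, here is a minimal sketch that parses the same sample data from an in-memory string instead of a file. The DATA constant is an assumption that mirrors data.csv with Windows (\r\n) line endings, and io.StringIO stands in for the opened file. The quoted field that spans two physical lines comes back as one field of one row.

```python
import csv
import io

# Hypothetical in-memory copy of data.csv; record 3's Text2 field is quoted
# and contains a line break.
DATA = (
    'ID,Text1,Text2\r\n'
    '1,Record 1,Hello World!\r\n'
    '2,Record 2,Hello Hadoop!\r\n'
    '3,Record 3,"Hello \r\nKontext!"\r\n'
    '4,Record 4,Hello!\r\n'
)

# newline='' keeps the raw line endings so the csv module can tell that the
# line break inside the quotes belongs to the field, not to the record.
with io.StringIO(DATA, newline='') as f:
    rows = list(csv.reader(f))

physical_lines = DATA.count('\n')  # lines as seen by a text editor
print(physical_lines, len(rows))   # 6 physical lines, 5 rows (header + 4 records)
print(rows[3])
```

The row count is lower than the line count precisely because one record spans two lines.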

The output looks like this:

['ID', 'Text1', 'Text2']
['1', 'Record 1', 'Hello World!']
['2', 'Record 2', 'Hello Hadoop!']
['3', 'Record 3', 'Hello \r\nKontext!']
['4', 'Record 4', 'Hello!']

Use Pandas

Pandas has an API that reads a CSV file directly into a data frame.

See the pandas.read_csv documentation for all the parameters.

Sample code

import pandas as pd
file_path = 'data.csv'
pdf = pd.read_csv(file_path)
print(pdf)

With the default options, read_csv handles the sample CSV file correctly. If your CSV structure or content is different, you can customize the call through its parameters.
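As a hedged illustration of customizing the call, suppose (hypothetically) a variant of the file uses semicolons as the delimiter and you want the ID column as the index. sep and index_col are standard read_csv parameters; io.StringIO again stands in for a file on disk, and the data is made up for the sketch.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited variant of data.csv with a multi-line field.
DATA = 'ID;Text1;Text2\n1;Record 1;Hello World!\n3;Record 3;"Hello \nKontext!"\n'

# sep sets the delimiter; index_col turns the ID column into the row index.
pdf = pd.read_csv(io.StringIO(DATA), sep=';', index_col='ID')
print(pdf)
```

The quoted multi-line value is still parsed as a single cell; only the delimiter and index handling changed.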

The output looks like the following:

   ID     Text1               Text2
0   1  Record 1        Hello World!
1   2  Record 2       Hello Hadoop!
2   3  Record 3  Hello \r\nKontext!
3   4  Record 4              Hello!

A pandas data frame can also be written directly to a database with the to_sql function.
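A minimal sketch of to_sql, assuming an in-memory SQLite database as the target so the example is self-contained (the table name records is hypothetical). For server databases such as SQL Server or MySQL you would typically pass a SQLAlchemy engine as the connection instead. The data frame below is built by hand to match what read_csv returns for the sample file.

```python
import sqlite3

import pandas as pd

# Stand-in for the data frame produced by pd.read_csv('data.csv').
pdf = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Text1': ['Record 1', 'Record 2', 'Record 3', 'Record 4'],
    'Text2': ['Hello World!', 'Hello Hadoop!', 'Hello \r\nKontext!', 'Hello!'],
})

# pandas accepts a plain sqlite3 connection for SQLite targets.
conn = sqlite3.connect(':memory:')
pdf.to_sql('records', conn, index=False, if_exists='replace')

# Read the multi-line value back to confirm it survived the round trip.
text = conn.execute('SELECT Text2 FROM records WHERE ID = 3').fetchone()[0]
row_count = conn.execute('SELECT COUNT(*) FROM records').fetchone()[0]
print(repr(text), row_count)
conn.close()
```

index=False stops pandas from writing the data frame's numeric index as an extra column.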

In this short tutorial we will show you the steps required to create a CSV file that includes multi-line cells using Python.


In order to create a multiline cell within Excel (2010/2013), the following is required:

  1. newlines are represented with a carriage return/newline,
  2. the string should be wrapped in quotes.

One of the great things about the Python csv module is that it takes care of the quotes for you. Below is an example:

#!/usr/bin/env python3

import csv

multiline_string = "this\nis\nsome\ntext"                  # assign string
multiline_string = multiline_string.replace('\n', '\r\n')  # convert newlines to carriage return + newline

# In Python 3, open in text mode with newline='' so line endings are written exactly as given
with open('xyz.csv', 'w', newline='') as outfile:
    w = csv.writer(outfile)                        # create a csv writer
    w.writerow(['sometext', multiline_string])     # write one row; the writer quotes the multi-line field
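To check what the writer actually produces, here is a small sketch that writes the same kind of row to an in-memory buffer (io.StringIO in place of xyz.csv) and reads it back. The multi-line value is wrapped in double quotes on output, which is what Excel needs, and it comes back from csv.reader as a single field.

```python
import csv
import io

multiline_string = 'this\r\nis\r\nsome\r\ntext'

# Write one row; QUOTE_MINIMAL (the default) quotes fields containing line breaks.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerow(['sometext', multiline_string])
raw = buffer.getvalue()
print(repr(raw))

# Read it back: the quoted value is reassembled into one field.
buffer.seek(0)
rows = list(csv.reader(buffer))
print(rows)
```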


Hi, I am using this method to write a CSV file from another CSV file that contains hashed codes, but I only get the last row in the output. How can I add each row after the previous one?

import hashlib
import csv
d = dict()
result = ()
for i in range(0 , 9999) :
    n = hashlib.sha256(str(i).encode())
    d[n.hexdigest()] = str(i)
with open('/Users/MJ-Mac/Desktop/karname.txt') as f:
    file = csv.reader(f)
    for row in file :
        a = row[0]
        b = d[row[1]]
        result = (a , b)
        with open('/Users/MJ-Mac/Desktop/result3.txt', 'w') as f2:
            file2 = csv.writer(f2)
            file2.writerow(result)

asked Nov 26, 2021 at 17:32


Other answers have suggested replacing 'w' with 'a'; this is not necessary when working with csv.writer. It could also make your file grow every time you run the program.

Instead of reopening and closing result3.txt, keep it open and use just one writer:

import hashlib
import csv
d = dict()
result = ()
for i in range(0 , 9999) :
    n = hashlib.sha256(str(i).encode())
    d[n.hexdigest()] = str(i)
with open('/Users/MJ-Mac/Desktop/result3.txt', 'w') as result_file:
    result_writer = csv.writer(result_file)  # only create this once
    
    with open('/Users/MJ-Mac/Desktop/karname.txt') as f:
        file = csv.reader(f)
        for row in file :
            a = row[0]
            b = d[row[1]]
            result = (a , b)

            result_writer.writerow(result)  # use the already created writer

answered Nov 26, 2021 at 17:42

Scrapper142


Your code writes one line to the file, then opens the file again and writes the next line as the entire content of the file, because 'w' mode truncates the file every time it is opened. So after the second pass through the loop, only the second line will be in the file.
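A hedged reproduction of the problem, using a temporary directory and made-up rows in place of karname.txt: reopening the output in 'w' mode inside the loop leaves only the last row, while opening once with a single writer keeps every row. The file names and data are hypothetical.

```python
import csv
import os
import tempfile

rows = [('alice', '1'), ('bob', '2'), ('carol', '3')]  # hypothetical data

with tempfile.TemporaryDirectory() as tmp:
    buggy_path = os.path.join(tmp, 'buggy.csv')
    fixed_path = os.path.join(tmp, 'fixed.csv')

    # Buggy pattern: 'w' truncates the file each time it is reopened.
    for row in rows:
        with open(buggy_path, 'w', newline='') as f2:
            csv.writer(f2).writerow(row)

    # Fixed pattern: open once, create one writer, write every row.
    with open(fixed_path, 'w', newline='') as f2:
        writer = csv.writer(f2)
        for row in rows:
            writer.writerow(row)

    with open(buggy_path, newline='') as f:
        buggy_rows = list(csv.reader(f))
    with open(fixed_path, newline='') as f:
        fixed_rows = list(csv.reader(f))

print(buggy_rows)  # only the last row survives
print(fixed_rows)
```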

import hashlib
import csv
d = dict()
result = ()
for i in range(0 , 9999) :
    n = hashlib.sha256(str(i).encode())
    d[n.hexdigest()] = str(i)
with open('/Users/MJ-Mac/Desktop/karname.txt') as f:
    file = csv.reader(f)
    for row in file :
        a = row[0]
        b = d[row[1]]
        result = (a , b)
        with open('/Users/MJ-Mac/Desktop/result3.txt', 'a') as f2:
            file2 = csv.writer(f2)
            file2.writerow(result)

It might be better to open the output file once and then write everything to it:

import hashlib
import csv
d = dict()
result = ()
for i in range(0 , 9999) :
    n = hashlib.sha256(str(i).encode())
    d[n.hexdigest()] = str(i)
with open('/Users/MJ-Mac/Desktop/karname.txt') as f:
    file = csv.reader(f)
    with open('/Users/MJ-Mac/Desktop/result3.txt', 'w') as f2:
        file2 = csv.writer(f2)
        for row in file :
            a = row[0]
            b = d[row[1]]
            result = (a , b)
            
            file2.writerow(result)

answered Nov 26, 2021 at 17:35

jhylands


I can't run this myself, since your data is not included.

However, I think your problem is that with open('/Users/MJ-Mac/Desktop/result3.txt', 'w') uses the "w" flag (which stands for "write"), so your data is being overwritten. You might instead need the "a" flag (for "append"), so that each line is appended to the data you are exporting.
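The "a" flag does fix the loop, but as another answer points out, it also keeps whatever a previous run wrote. A minimal sketch of that side effect, where run_program is a hypothetical stand-in for one execution of the question's loop and the rows are made up:

```python
import csv
import os
import tempfile

rows = [('alice', '1'), ('bob', '2')]  # hypothetical data


def run_program(path):
    """Simulate one run of the question's loop using the 'a' flag."""
    for row in rows:
        with open(path, 'a', newline='') as f2:
            csv.writer(f2).writerow(row)


def count_rows(path):
    with open(path, newline='') as f:
        return len(list(csv.reader(f)))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'result3.txt')
    run_program(path)
    after_first_run = count_rows(path)
    run_program(path)  # a second run appends the same rows again
    after_second_run = count_rows(path)

print(after_first_run, after_second_run)
```

If you do use "a", delete or truncate the output file before each run, or open it once in "w" mode as the other answers suggest.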

import hashlib
import csv
d = dict()
result = ()
for i in range(0 , 9999) :
    n = hashlib.sha256(str(i).encode())
    d[n.hexdigest()] = str(i)
with open('/Users/MJ-Mac/Desktop/karname.txt') as f:
    file = csv.reader(f)
    for row in file :
        a = row[0]
        b = d[row[1]]
        result = (a , b)
        with open('/Users/MJ-Mac/Desktop/result3.txt', 'a') as f2:
            file2 = csv.writer(f2)
            file2.writerow(result)

answered Nov 26, 2021 at 17:35

generic


It is easier and more readable to open both the input and output files at once, and initialise the CSV reader and writer at the start:

import csv

with open('/Users/MJ-Mac/Desktop/karname.txt') as in_file, open('/Users/MJ-Mac/Desktop/result3.txt', 'w', newline='') as out_file:
    output = csv.writer(out_file)
    for row in csv.reader(in_file):
        output.writerow([row[0], d[row[1]]])  # d is the hash-to-number dict built in the question
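The snippet above relies on d from the question. Below is a self-contained sketch of the same approach; the temporary file locations, the two-column input layout (a name followed by the SHA-256 hex digest of a number), and the smaller range of 0 to 99 are all assumptions for illustration.

```python
import csv
import hashlib
import os
import tempfile

# Reverse-lookup table: SHA-256 hex digest -> original number (as a string).
d = {hashlib.sha256(str(i).encode()).hexdigest(): str(i) for i in range(100)}

with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, 'karname.txt')
    out_path = os.path.join(tmp, 'result3.txt')

    # Hypothetical input: each row holds a name and the hash of a number.
    with open(in_path, 'w', newline='') as f:
        csv.writer(f).writerows([
            ['alice', hashlib.sha256(b'42').hexdigest()],
            ['bob', hashlib.sha256(b'7').hexdigest()],
        ])

    # Open input and output together: one reader, one writer, one pass.
    with open(in_path, newline='') as in_file, \
            open(out_path, 'w', newline='') as out_file:
        output = csv.writer(out_file)
        for row in csv.reader(in_file):
            output.writerow([row[0], d[row[1]]])

    with open(out_path, newline='') as f:
        result = list(csv.reader(f))

print(result)
```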
 

answered Nov 26, 2021 at 17:46

Stuart



How do I create a multiline cell in CSV?

If you want to write multiple lines in a single CSV cell, wrap the value in double quotes so the line breaks are kept inside the field. A C# sample for this begins: string filePath = @"d:\test.csv"; String delim = ","; StringBuilder outString = new StringBuilder(); outString.Append("\""); // double quote outString. …

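Can CSV have multiple lines?

Yes. A field enclosed in double quotes may contain line breaks, so one record can span several lines. To process such a CSV file in Spark, where values in a row are scattered across multiple lines, use option("multiLine", true).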

Can we insert multiple rows in a CSV file using the Python csv module?

Yes. In Python, you can use the CSV writer's writerows() method to write multiple rows into a CSV file in one go.
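A minimal sketch of writerows(), writing to an in-memory buffer (io.StringIO stands in for an open file) so the exact output is visible. The rows are sample data taken from data.csv, with a plain \n inside the multi-line value.

```python
import csv
import io

rows = [
    ['ID', 'Text1', 'Text2'],
    ['1', 'Record 1', 'Hello World!'],
    ['3', 'Record 3', 'Hello \nKontext!'],
]

buffer = io.StringIO(newline='')
csv.writer(buffer).writerows(rows)  # one call writes every row
output = buffer.getvalue()
print(output)
```

Each row ends with the default \r\n terminator, and the value containing a line break is quoted automatically.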

How do you write values in a CSV file in Python?

To write a CSV file in Python:

First, open the CSV file for writing ('w' mode) by using the open() function, passing newline='' as the csv documentation recommends.

Second, create a CSV writer object by calling the writer() function of the csv module.

Third, write data to the CSV file by calling the writerow() or writerows() method of the CSV writer object.
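The three steps above can be sketched as follows. The file name output.csv and the records are hypothetical, and a temporary directory keeps the example from writing into your working folder.

```python
import csv
import os
import tempfile

header = ['ID', 'Text1', 'Text2']
records = [
    ['1', 'Record 1', 'Hello World!'],
    ['2', 'Record 2', 'Hello Hadoop!'],
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output.csv')  # hypothetical file name

    # Step 1: open the file for writing with newline=''.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        # Step 2: create a writer object.
        writer = csv.writer(f)
        # Step 3: write a single row, then several rows at once.
        writer.writerow(header)
        writer.writerows(records)

    # Read the file back to check what was written.
    with open(path, newline='', encoding='utf-8') as f:
        written = list(csv.reader(f))

print(written)
```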