From a file, I have taken a line and split it into 5 columns using split(). Now I have to write those columns as tab-separated values to an output file.
Let's say that I have l[1], l[2], l[3], l[4], l[5], a total of 5 entries. How can I achieve this using Python? I am also not able to write the l[1], l[2], l[3], l[4], l[5] values to an output file.
I tried both of these snippets, and neither works (I am using Python 2.6):
code 1:
with open('output', 'w'):
    print l[1], l[2], l[3], l[4], l[5] > output
code 2:
with open('output', 'w') as outf:
    outf.write(l[1], l[2], l[3], l[4], l[5])
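One way the tab-separated write could look, as a minimal sketch. It assumes `l` holds the pieces of one split line and that the output file is named `output`, as in the attempts above; `write_tsv_line` is a hypothetical helper name and the sample line is made up for illustration. The key point is that `write()` takes a single string, so the fields are joined with `'\t'` first.

```python
def write_tsv_line(fields, outf):
    """Write one row of fields to outf, separated by tabs."""
    # write() accepts exactly one string, so join the fields first;
    # str() guards against non-string values such as numbers.
    outf.write('\t'.join(str(f) for f in fields) + '\n')


# Made-up input line; in the real script this would be read from the input file.
line = "id chr1 100 200 + geneA"
l = line.split()

with open('output', 'w') as outf:
    # l[1:6] selects l[1] through l[5], the five columns from the question.
    write_tsv_line(l[1:6], outf)
```

The sketch avoids the `print` statement on purpose, so the same lines should run on Python 2.6 and on Python 3.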
$\begingroup$
I am working on a project using a FASTA file. I am writing my script in nano on the command line and running it with Python, also from the command line.
I would like my script to produce a tab-delimited file with three columns: the first column should contain the sequence name, the second the sequence length, and the third the sequence itself.
I have written the following script so far in nano:
from Bio import SeqIO
import sys
for hello_fasta in SeqIO.parse(sys.argv[1], "fasta"):
    list = hello_fasta.split("\t")
    print hello_fasta.description
    print (len(hello_fasta.seq))
For example, I would like my script to produce output like the following, with the columns in this order: gene name, gene length, gene sequence.
H0192X 26 FORUWOHRPPTRWFAWWEAKJNFWEJ
asked Nov 14, 2020 at 21:35
$\endgroup$
$\begingroup$
You can use a list and insert() to add each element at a specific position, then unpack the list with *. Or you can use join().
from Bio import SeqIO
import sys

for hello_fasta in SeqIO.parse(sys.argv[1], "fasta"):
    sequences = []
    sequences.insert(0, hello_fasta.description)
    sequences.insert(1, len(hello_fasta.seq))
    sequences.insert(2, hello_fasta.seq)
    # Keep only one of the two options below; with both, each record prints twice.
    # option 1 (Python 3 print function)
    print(*sequences, sep='\t')
    # option 2
    print('\t'.join(map(str, sequences)))
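To compare the two options without needing a FASTA file or Biopython, here is a small sketch with a hand-made record. The `format_row` helper and the sample values (taken from the example row in the question) are assumptions for illustration, not part of the answer's code. It targets Python 3, where `print` is a function.

```python
import io


def format_row(fields):
    """Return the fields as one tab-separated string (option 2)."""
    return '\t'.join(map(str, fields))


# Hand-made stand-in for the description, length, and sequence of one record.
sequence = "FORUWOHRPPTRWFAWWEAKJNFWEJ"
fields = ["H0192X", len(sequence), sequence]

# Option 1: unpack the list and let print() insert the tabs.
buffer = io.StringIO()
print(*fields, sep='\t', file=buffer)
row_from_print = buffer.getvalue().rstrip('\n')

# Option 2: convert every field to str and join explicitly.
row_from_join = format_row(fields)

print(row_from_join)
```

Both options should yield the same row. Option 2 needs the explicit `map(str, ...)` because `join()` only accepts strings, while `print()` converts each argument itself.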
answered Nov 16, 2020 at 14:31
zorbax
$\endgroup$
$\begingroup$
Here's a solution using pandas if you want to save the TSV:
from Bio import SeqIO
import pandas as pd
from io import StringIO
example = """
>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>seq1
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLMELKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>seq2
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>seq3
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
>seq4
EEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVVSYEMRLFGVQKDNFALEHSLL
>seq5
SWEEFAKAAEVLYLEDPMKCRMCTKYRHVDHKLVVKLTDNHTVLKYVTDMAQDVKKIEKLTTLLMR
>seq6
FTNWEEFAKAAERLHSANPEKCRFVTKYNHTKGELVLKLTDDVVCLQYSTNQLQDVKKLEKLSSTLLRSI
>seq7
SWEEFVERSVQLFRGDPNATRYVMKYRHCEGKLVLKVTDDRECLKFKTDQAQDAKKMEKLNNIFF
>seq8
SWDEFVDRSVQLFRADPESTRYVMKYRHCDGKLVLKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>seq9
KNWEDFEIAAENMYMANPQNCRYTMKYVHSKGHILLKMSDNVKCVQYRAENMPDLKK
>seq10
FDSWDEFVSKSVELFRNHPDTTRYVVKYRHCEGKLVLKVTDNHECLKFKTDQAQDAKKMEK
"""
# This example just happens to be a string, just load your
# fasta file using the method you're already using
example_records = SeqIO.parse(StringIO(example), 'fasta')
# Dictionary to hold the data you eventually want in the tsv
data = {"Gene name": list(),
        "Gene length": list(),
        "Gene seq": list()}
# Append the necessary values into the data dictionary
for record in example_records:
    data['Gene name'].append(record.description)
    data['Gene length'].append(len(record.seq))
    data['Gene seq'].append(str(record.seq))
# Convert your data into a pandas DataFrame and save as a tsv
gene_df = pd.DataFrame(data)
gene_df.to_csv("gene_info.tsv", sep='\t', index=False)
This results in a tsv that looks like this:
$ head gene_info.tsv
Gene name Gene length Gene seq
seq0 62 FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
seq1 106 KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLMELKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
seq2 67 EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
seq3 58 MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
seq4 62 EEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVVSYEMRLFGVQKDNFALEHSLL
seq5 66 SWEEFAKAAEVLYLEDPMKCRMCTKYRHVDHKLVVKLTDNHTVLKYVTDMAQDVKKIEKLTTLLMR
seq6 70 FTNWEEFAKAAERLHSANPEKCRFVTKYNHTKGELVLKLTDDVVCLQYSTNQLQDVKKLEKLSSTLLRSI
seq7 65 SWEEFVERSVQLFRGDPNATRYVMKYRHCEGKLVLKVTDDRECLKFKTDQAQDAKKMEKLNNIFF
seq8 68 SWDEFVDRSVQLFRADPESTRYVMKYRHCDGKLVLKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
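If pandas is not available, a similar table could be written with the standard library's `csv` module. This is a sketch of an alternative technique, not the pandas approach above. The `records` list holds made-up names and sequences standing in for what `SeqIO.parse` would yield, and `gene_info_csv.tsv` is a hypothetical file name chosen so it does not overwrite the file from the pandas example.

```python
import csv

# Made-up (name, sequence) pairs standing in for parsed FASTA records.
records = [
    ("geneA", "MKV"),
    ("geneB", "MKVLA"),
]

# newline="" lets the csv module control line endings itself (Python 3).
with open("gene_info_csv.tsv", "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Header row, matching the column names used in the pandas example.
    writer.writerow(["Gene name", "Gene length", "Gene seq"])
    for name, seq in records:
        writer.writerow([name, len(seq), seq])
```

Writing row by row like this avoids holding every sequence in memory at once, which may matter for large FASTA files.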
Hopefully this helps!
answered Feb 7, 2021 at 21:21
$\endgroup$