由於語法渲染問題而影響閱讀體驗, 請移步博客閱讀~
本文GitPage地址

Biopython

1. Quick Start

  1. from Bio.Seq import reverse_complement, transcribe, back_transcribe, translate
  2. ## echo a seq
  3. my_string = "GCTGTTATGGGTCGTTGGAAGGGTGGTCGTGCTGCTGGTTAG"
  4. ##Revers complement
  5. reverse_complement(my_string)
  6. ## { 'CTAACCAGCAGCACGACCACCCTTCCAACGACCCATAACAGC' }
  7. ##DNA to RNA
  8. transcribe(my_string)
  9. ##--> { 'GCUGUUAUGGGUCGUUGGAAGGGUGGUCGUGCUGCUGGUUAG' }
  10. ##RNA to DNA
  11. back_transcribe(my_string) --> { 'GCTGTTATGGGTCGTTGGAAGGGTGGTCGTGCTGCTGGTTAG' }
  12. ##DNA to protine
  13. translate(my_string) --> { 'AVMGRWKGGRAAG*' }

2. Sequences

2.1 reading FASTA file

  1. from Bio import SeqIO
  2. Seq1='GSE44995_Reference_assembled_isotig_seq.fna'
  3. for seq_record in SeqIO.parse(Seq1, "fasta"):
  4. print(seq_record.id)
  5. print(repr(seq_record.seq))
  6. print(len(seq_record.seq))
  7. ## print by fasta formate
  8. Seq10=''
  9. for seq_record in SeqIO.parse(Seq1, "fasta"):
  10. if len(seq_record.seq) < 100:
  11. Seq10 += ">"+str(seq_record.id)+"\n"
  12. Seq10 += str(seq_record.seq)+"\n"
  13. print(seq_record.id)
  14. print(repr(seq_record.seq))
  15. print(len(seq_record.seq))

2.2 Seq run as string

  1. from Bio.Seq import Seq
  2. from Bio.Alphabet import IUPAC
  3. my_seq = Seq("GATCG", IUPAC.unambiguous_dna)
  4. for index, letter in enumerate(my_seq):
  5. print("%i %s" % (index, letter))
  6. '''
  7. print result
  8. 0 G
  9. 1 A
  10. 2 T
  11. 3 C
  12. 4 G
  13. '''
  14. print(my_seq[0]) #/ print(my_seq[-1])
  15. Seq("AAAA").count("AA")
  16. len(my_seq)
  17. ## convert to str
  18. my_seq = str(my_seq)

Result

46.875

  1. <a name="lvo99"></a>
  2. ### 2.4 Slicing a sequence
  3. <br />
  4. ```python
  5. my_seq[4:12]
  6. #####
  7. the first, second and third codon positions of this DNA sequence:
  8. #####
  9. my_seq[0::3] # Seq('GCTGTAGTAAG', IUPACUnambiguousDNA())
  10. my_seq[1::3] # Seq('AGGCATGCATC', IUPACUnambiguousDNA())
  11. my_seq[2::3] # Seq('TAGCTAAGAC', IUPACUnambiguousDNA())

2.5 revers string

2.6 Changing Font case

3 Bio-information

3.1 Revers Complement

Seq(“ACGTCGTAGCTAC”).complement() # standard example => output: Seq(‘TGCAGCATCGATG’)
Seq(“ACGTCGTAGCTAC”).reverse_complement() # output => Seq(‘GTAGCTACGACGT’)

  1. <a name="o4Bl4"></a>
  2. ### 3.2 Translation
  3. ```python
  4. from Bio.Seq import Seq
  5. from Bio.Alphabet import IUPAC
  6. Seq("UUU", IUPAC.unambiguous_rna).translate() # for RNA
  7. Seq("TTT", IUPAC.unambiguous_dna).translate() # for DNA

3.3 Translation Tables

  1. from Bio.Data import CodonTable
  2. standard_table = CodonTable.unambiguous_dna_by_name['Standard']
  3. mito_table = CodonTable.unambiguous_dna_by_name['Vertebrate Mitochondrial']
  4. ##same as:
  5. standard_table = CodonTable.unambiguous_dna_by_id[1]
  6. mito_table = CodonTable.unambiguous_dna_by_id[2]
  7. mito_table.stop_codons #--> { ['TAA', 'TAG', 'AGA', 'AGG'] }
  8. mito_table.start_codons # --> { ['ATT', 'ATC', 'ATA', 'ATG', 'GTG'] }
  9. mito_table.forward_table["ACG"] #--> { 'T' }
  10. print(standard_table)
  1. | T | C | A | G |
  2. --+---------+---------+---------+---------+--
  3. T | TTT F | TCT S | TAT Y | TGT C | T
  4. T | TTC F | TCC S | TAC Y | TGC C | C
  5. T | TTA L | TCA S | TAA Stop| TGA Stop| A
  6. T | TTG L(s)| TCG S | TAG Stop| TGG W | G
  7. --+---------+---------+---------+---------+--
  8. C | CTT L | CCT P | CAT H | CGT R | T
  9. C | CTC L | CCC P | CAC H | CGC R | C
  10. C | CTA L | CCA P | CAA Q | CGA R | A
  11. C | CTG L(s)| CCG P | CAG Q | CGG R | G
  12. --+---------+---------+---------+---------+--
  13. A | ATT I | ACT T | AAT N | AGT S | T
  14. A | ATC I | ACC T | AAC N | AGC S | C
  15. A | ATA I | ACA T | AAA K | AGA R | A
  16. A | ATG M(s)| ACG T | AAG K | AGG R | G
  17. --+---------+---------+---------+---------+--
  18. G | GTT V | GCT A | GAT D | GGT G | T
  19. G | GTC V | GCC A | GAC D | GGC G | C
  20. G | GTA V | GCA A | GAA E | GGA G | A
  21. G | GTG V | GCG A | GAG E | GGG G | G
  22. --+---------+---------+---------+---------+--

4. Alignment

1. Echo a test file

  1. ## run in bash
  2. echo ">TRINITY_DN106095_c2_g1_i2
  3. MSRIMKVFLFLAVMVCISEAQLHAQCLCPRVRSRISSMTDIREVQIYEATIFCDRMEIVVTNDSGLRYCLNPKLKAVQKLLTAMKPKTSTTARPTVHSSSTGSTNTARM
  4. >TRINITY_DN92154_c0_g1_i1
  5. DIHVRRRTLTRSKTLGRSTNVNKMKLCILLMLGTLLVLVYGMPPISRDYNTHCRCLQVESRIIPPNSLKSIKLVPEGPHCPDMEVIAGLSNGEKVCLNPRSSWVKKLVNFVLEKQQGGALPKNQGQ" > test.fa

2. Align

  1. ## run in python
  2. from Bio import pairwise2
  3. from Bio.Seq import Seq
  4. from Bio.pairwise2 import format_alignment
  5. from Bio.SubsMat import MatrixInfo
  6. seq1 = Seq("MSRIMKVFLFLAVMVCISEAQLHAQCLCPRVRSRISSMTDIREVQIYEATIFCDRMEIVVTNDSGLRYCLNPKLKAVQKLLTAMKPKTSTTARPTVHSSSTGSTNTARM")
  7. seq2 = Seq("DIHVRRRTLTRSKTLGRSTNVNKMKLCILLMLGTLLVLVYGMPPISRDYNTHCRCLQVESRIIPPNSLKSIKLVPEGPHCPDMEVIAGLSNGEKVCLNPRSSWVKKLVNFVLEKQQGGALPKNQGQ")
  8. alignments = pairwise2.align.globalxx(seq1, seq2)
  9. test_alignments = pairwise2.align.localds(seq1, seq2, MatrixInfo.blosum62, -10, -1)
  10. for alignment in alignments:
  11. print(format_alignment(*alignment))
  12. for alignment in test_alignments:
  13. print(format_alignment(*alignment))
  1. SingleLetterAlphabet() alignment with 6 rows and 65 columns
  2. MQNTPAERLPAIIEKAKSKHDINVWLLDRQGRDLLEQRVPAKVA...EGP B7RZ31_9GAMM/59-123
  3. AKQRGIAGLEEWLHRLDHSEAIPIFLIDEAGKDLLEREVPADIT...KKP A0A0C3NPG9_9PROT/58-119
  4. ARRHGQEYFQQWLERQPKKVKEQVFAVDQFGRELLGRPLPEDMA...KKP A0A143HL37_9GAMM/57-121
  5. TRRHGPESFRFWLERQPVEARDRIYAIDRSGAEILDRPIPRGMA...NKP A0A0X3UC67_9GAMM/57-121
  6. AINRNTQQLTQDLRAMPNWSLRFVYIVDRNNQDLLKRPLPPGIM...NRK B3PFT7_CELJU/62-126
  7. AVNATEREFTERIRTLPHWARRNVFVLDSQGFEIFDRELPSPVA...NRT K4KEM7_SIMAS/61-125

PDB file

Split the chain from PDF file

Cite: David Cain

  1. import os
  2. from Bio import PDB
  3. class ChainSplitter:
  4. def __init__(self, out_dir=None):
  5. """ Create parsing and writing objects, specify output directory. """
  6. self.parser = PDB.PDBParser()
  7. self.writer = PDB.PDBIO()
  8. if out_dir is None:
  9. out_dir = os.path.join(os.getcwd(), "chain_PDBs")
  10. self.out_dir = out_dir
  11. def make_pdb(self, pdb_path, chain_letters, overwrite=False, struct=None):
  12. """ Create a new PDB file containing only the specified chains.
  13. Returns the path to the created file.
  14. :param pdb_path: full path to the crystal structure
  15. :param chain_letters: iterable of chain characters (case insensitive)
  16. :param overwrite: write over the output file if it exists
  17. """
  18. chain_letters = [chain.upper() for chain in chain_letters]
  19. # Input/output files
  20. (pdb_dir, pdb_fn) = os.path.split(pdb_path)
  21. pdb_id = pdb_fn[3:7]
  22. out_name = "pdb%s_%s.ent" % (pdb_id, "".join(chain_letters))
  23. out_path = os.path.join(self.out_dir, out_name)
  24. print ("OUT PATH:",out_path)
  25. plural = "s" if (len(chain_letters) > 1) else "" # for printing
  26. # Skip PDB generation if the file already exists
  27. if (not overwrite) and (os.path.isfile(out_path)):
  28. print("Chain%s %s of '%s' already extracted to '%s'." %
  29. (plural, ", ".join(chain_letters), pdb_id, out_name))
  30. return out_path
  31. print("Extracting chain%s %s from %s..." % (plural,
  32. ", ".join(chain_letters), pdb_fn))
  33. # Get structure, write new file with only given chains
  34. if struct is None:
  35. struct = self.parser.get_structure(pdb_id, pdb_path)
  36. self.writer.set_structure(struct)
  37. self.writer.save(out_path, select=SelectChains(chain_letters))
  38. return out_path
  39. class SelectChains(PDB.Select):
  40. """ Only accept the specified chains when saving. """
  41. def __init__(self, chain_letters):
  42. self.chain_letters = chain_letters
  43. def accept_chain(self, chain):
  44. return (chain.get_id() in self.chain_letters)
  45. if __name__ == "__main__":
  46. """ Parses PDB id's desired chains, and creates new PDB structures. """
  47. import sys
  48. if not len(sys.argv) == 2:
  49. print( "Usage: $ python %s 'pdb.txt'" % __file__)
  50. sys.exit()
  51. pdb_textfn = sys.argv[1]
  52. pdbList = PDB.PDBList()
  53. splitter = ChainSplitter("") # Change me.
  54. with open(pdb_textfn) as pdb_textfile:
  55. for line in pdb_textfile:
  56. pdb_id = line[:4].lower()
  57. chain = line[4]
  58. pdb_fn = pdbList.retrieve_pdb_file(pdb_id)
  59. splitter.make_pdb(pdb_fn, chain)

Another example:

This one works fine for me

Biopython 的入門使用 - 图1

  1. from Bio.PDB import Select, PDBIO
  2. from Bio.PDB.PDBParser import PDBParser
  3. import sys
  4. pdb_file = sys.argv[1]
  5. class ChainSelect(Select):
  6. def __init__(self, chain):
  7. self.chain = chain
  8. def accept_chain(self, chain):
  9. if chain.get_id() == self.chain:
  10. return 1
  11. else:
  12. return 0
  13. chains = ['A','B','C']
  14. p = PDBParser(PERMISSIVE=1)
  15. structure = p.get_structure(pdb_file, pdb_file)
  16. for chain in chains:
  17. pdb_chain_file = 'pdb_file_chain_{}.pdb'.format(chain)
  18. io_w_no_h = PDBIO()
  19. io_w_no_h.set_structure(structure)
  20. io_w_no_h.save('{}'.format(pdb_chain_file), ChainSelect(chain))

Extract fasta from PDB file

  1. import sys
  2. from Bio import SeqIO
  3. PDBFile = sys.argv[1]
  4. with open(PDBFile, 'r') as pdb_file:
  5. for record in SeqIO.parse(pdb_file, 'pdb-atom'):
  6. print('>' + record.id)
  7. print(record.seq)

Enjoy~

本文由Python腳本GitHub/語雀自動更新

由於語法渲染問題而影響閱讀體驗, 請移步博客閱讀~
本文GitPage地址

GitHub: Karobben
Blog:Karobben
BiliBili:史上最不正經的生物狗