Bioinformatics Python Practical Exercises 2 | Video 19

Open-source bioinformatics Python tutorial

Concise Python text and video tutorials for students

The source code is at: https://github.com/Tong-Chen/Bioinfo_course_python

Table of contents

  1. Background introduction

    1. The beginning of programming

    2. Why learn Python

    3. How to install Python

    4. How to run Python commands and scripts

    5. Which editor to use to write Python scripts

  2. Python program example

  3. Python basic syntax

    1. Numeric variable operations

    2. String variable operations

    3. List operations

    4. Set operations

    5. Using range

    6. Dictionary operations

    7. Hierarchical indentation

    8. Variables, data structures, flow control

  4. Input and output

    1. Interactive input and output

    2. File reading and writing

  5. Practical exercises (1)

    1. Background knowledge

    2. Bioinformatics-related assignments (1)

  6. Functions

    1. Function operations

    2. Bioinformatics-related assignments (2)

  7. Modules

  8. Command-line arguments

    1. Command-line arguments

    2. Bioinformatics-related assignments (3)

  9. More Python content

    1. Single-statement blocks

    2. List comprehensions: a simplified for loop that produces a new list

    3. lambda, map, filter, reduce (bonus)

    4. exec, eval (execute strings as Python statements; bonus)

    5. Regular expressions

    6. Python plotting

  10. References

Some practice questions

  1. Given a file in FASTA format (test1.fa and test2.fa), write a program cat.py to read the file and output it to the screen (2 points)

  • Knowledge points used:

    • open(file)

    • for ... in loop

    • print()

    • strip()
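A minimal sketch of cat.py using only the knowledge points listed above. The function name cat and the command-line usage are illustrative choices, not the course's reference solution. Because it never interprets the contents, the same sketch would also print test1.fq for the FASTQ exercise.

```python
import sys


def cat(path):
    """Print every line of a text file (FASTA, FASTQ, ...) to the screen."""
    with open(path) as fh:          # open(file)
        for line in fh:             # for ... in loop over the lines
            # strip() removes the trailing newline (and surrounding spaces);
            # print() adds exactly one newline back
            print(line.strip())


if __name__ == "__main__":
    # Hypothetical usage: python cat.py test1.fa test2.fa
    for path in sys.argv[1:]:
        cat(path)
```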

Given a file in FASTQ format (test1.fq), write a program cat.py to read the file and output it to the screen (2 points)

  • Knowledge points used: same as above

Write a program splitName.py that reads test2.fa, takes the part of each original sequence name before the first space as the new sequence name, and outputs the result to the screen (2 points)

  • Knowledge points used:

    • split

    • indexing the list returned by split

  • The output format is:

    >NM_001011874
    gcggcggcgggcgagcgggcgctggagtaggagctg.......
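One possible sketch of splitName.py: header lines (starting with ">") are split on whitespace and only the first field is kept, while sequence lines pass through unchanged. The generator name short_names is a hypothetical helper introduced here.

```python
import sys


def short_names(path):
    """Yield the lines of a FASTA file, cutting each header at its first space."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                # split() breaks on whitespace and [0] keeps the first field,
                # e.g. ">NM_001011874 some description" -> ">NM_001011874"
                line = line.split()[0]
            yield line


if __name__ == "__main__":
    # Hypothetical usage: python splitName.py test2.fa
    for path in sys.argv[1:]:
        for line in short_names(path):
            print(line)
```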

Write a program formatFasta.py that reads test2.fa, joins each FASTA sequence into a single line, and outputs it (2 points)

  • Knowledge points used:

    • join

    • strip

  • The output format is:

    >NM_001011874
    gcggcggcgggc......TCCGCTG......GCGTTCACC......CGGGGTCCGGAG
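A minimal sketch of formatFasta.py, assuming a standard FASTA file where every record starts with a ">" header. Sequence lines are stripped, collected in a list, and merged with "".join() when the next header (or the end of the file) is reached; read_fasta is a helper name chosen for this example.

```python
import sys


def read_fasta(path):
    """Return a list of (header, sequence) pairs with each sequence on one line."""
    records = []
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()          # drop the newline so lines can be joined
            if line.startswith(">"):
                if header is not None:   # finish the previous record
                    records.append((header, "".join(chunks)))
                header, chunks = line, []
            elif line:                   # skip blank lines
                chunks.append(line)
    if header is not None:               # the last record has no following '>'
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Hypothetical usage: python formatFasta.py test2.fa
    for path in sys.argv[1:]:
        for header, seq in read_fasta(path):
            print(header)
            print(seq)
```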

Write a program formatFasta-2.py that reads test2.fa and rewraps each FASTA sequence to 80 letters per line (2 points)

  • Knowledge points used:

    • string slicing

    • range

  • The output format is:

    >NM_001011874
    gcggcggcgc...(60 letters)...TCCGCTGACG  # 80 letters per line
    acgtgctacg...(60 letters)...GCGTTCACCC
    ACGTACGATG  # the last line may have fewer than 80 letters
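A sketch of formatFasta-2.py: each sequence is first joined into one string, then cut with range(0, len(seq), width) and the slice seq[i:i + width]. The final slice is simply shorter when the length is not a multiple of 80. The helper names wrap_sequence and iter_fasta are illustrative.

```python
import sys


def wrap_sequence(seq, width=80):
    """Cut seq into pieces of at most `width` letters with slicing and range()."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def iter_fasta(path):
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line, []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Hypothetical usage: python formatFasta-2.py test2.fa
    for path in sys.argv[1:]:
        for header, seq in iter_fasta(path):
            print(header)
            for row in wrap_sequence(seq):
                print(row)
```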

Write a program sortFasta.py that reads test2.fa, takes the part of each original sequence name before the first space as the new sequence name, sorts the records by that name, and outputs them (2 points)

  • Knowledge points used:

    • sort

    • dict

    • aDict[key] = []

    • aDict[key].append(value)
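A minimal sketch of sortFasta.py built on the dict idioms listed above: each short name gets an empty list, its sequence lines are appended, and the names are sorted before output. It assumes names are unique; if two records shared the same short name, the later one would overwrite the earlier one in this sketch.

```python
import sys


def sort_fasta(path):
    """Return (short_name, sequence) pairs sorted by the short name."""
    aDict = {}
    name = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]   # text before the first space
                aDict[name] = []             # aDict[key] = []
            else:
                aDict[name].append(line)     # aDict[key].append(value)
    names = list(aDict)
    names.sort()                             # alphabetical order of the names
    return [(name, "".join(aDict[name])) for name in names]


if __name__ == "__main__":
    # Hypothetical usage: python sortFasta.py test2.fa
    for path in sys.argv[1:]:
        for name, seq in sort_fasta(path):
            print(">" + name)
            print(seq)
```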

Extract sequences by name (2 points)

  • Write a program grepFasta.py to extract the sequences in test2.fa whose names are listed in fasta.name and output them to the screen.

  • Write a program grepFastq.py to extract the sequences in test1.fq whose names are listed in fastq.name and output them to a file.

  • Knowledge points used:

    • print(..., file=fh) (written print >>fh, ... in Python 2) or fh.write()

    • Modulo operation, e.g. 4 % 2 == 0
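A sketch covering both tasks, under two assumptions not stated in the exercise: fasta.name and fastq.name hold one name per line, and a name matches the header text before the first space. For FASTQ, the modulo test i % 4 == 0 picks out the header line of each 4-line record; this is safer than checking startswith("@"), because a quality line can also begin with "@". The function names are hypothetical.

```python
def read_names(path):
    """Read one name per line into a set (assumed format of the .name files)."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def grep_fasta(fasta_path, names):
    """Yield the lines of FASTA records whose first header word is in names."""
    keep = False
    with open(fasta_path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                keep = line[1:].split()[0] in names
            if keep:
                yield line


def grep_fastq(fastq_path, names, out_path):
    """Write the 4-line FASTQ records whose read name is in names to out_path."""
    keep = False
    with open(fastq_path) as fh, open(out_path, "w") as out:
        for i, line in enumerate(fh):
            line = line.rstrip("\n")
            if i % 4 == 0:  # lines 0, 4, 8, ... are the '@name' header lines
                keep = line[1:].split()[0] in names
            if keep:
                print(line, file=out)  # Python 3 form of `print >>out, line`
```

Usage might look like `for line in grep_fasta("test2.fa", read_names("fasta.name")): print(line)` for the first task, and `grep_fastq("test1.fq", read_names("fastq.name"), "out.fq")` for the second, where out.fq is an arbitrary output name.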

Write a program screenResult.py to filter the genes in test.expr whose foldChange is greater than 2 and whose padj is less than 0.05; it should be able to output either the entire line or only the gene name (4 points)

  • Knowledge points used:

    • the logical AND operator: and

    • everything read from a file is a string, so numbers must be converted with int (integers) or float (floating-point numbers) before comparison
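A sketch of screenResult.py. The layout of test.expr is not shown here, so this assumes a tab-separated file whose header row names the columns foldChange and padj, with the gene name in the first column. Values are converted with float() before comparing, and rows where the conversion fails (for example padj written as "NA") are skipped.

```python
import sys


def screen(path, fc_cutoff=2.0, padj_cutoff=0.05, whole_line=True):
    """Yield lines (or gene names) with foldChange > fc_cutoff and padj < padj_cutoff.

    Assumed (hypothetical) input: tab-separated, header row containing
    'foldChange' and 'padj', gene name in the first column.
    """
    with open(path) as fh:
        header = fh.readline().rstrip("\n").split("\t")
        fc_col = header.index("foldChange")
        padj_col = header.index("padj")
        for line in fh:
            line = line.rstrip("\n")
            fields = line.split("\t")
            try:
                # Everything read from a file is a string: convert first
                fold_change = float(fields[fc_col])
                padj = float(fields[padj_col])
            except (ValueError, IndexError):
                continue  # skip rows with missing values such as 'NA'
            if fold_change > fc_cutoff and padj < padj_cutoff:
                yield line if whole_line else fields[0]


if __name__ == "__main__":
    # Hypothetical usage: python screenResult.py test.expr [name]
    if len(sys.argv) > 1:
        only_name = len(sys.argv) > 2 and sys.argv[2] == "name"
        for item in screen(sys.argv[1], whole_line=not only_name):
            print(item)
```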

Write a program transferMultipleColumToMatrix.py to convert the expression data of genes across multiple tissues in multipleColExpr.txt into matrix form, and draw a heat map (6 points)

  • Knowledge points used:

    • aDict['key'] = {}

    • aDict['key']['key2'] = value

    • if key not in aDict

    • aDict = {'ENSG00000000003': {'A-431': 21.3, 'A-549': 32.5, ...}, 'ENSG00000000005': {}}

  • Input format (only the first 3 columns are needed):

    Gene    Sample  Value   Unit    Abundance
    ENSG00000000003 A-431   21.3    FPKM    Medium
    ENSG00000000003 A-549   32.5    FPKM    Medium
    ENSG00000000003 AN3-CA  38.2    FPKM    Medium
    ENSG00000000003 BEWO    31.4    FPKM    Medium
    ENSG00000000003 CACO-2  63.9    FPKM    High
    ENSG00000000005 A-431   0.0     FPKM    Not detected
    ENSG00000000005 A-549   0.0     FPKM    Not detected
    ENSG00000000005 AN3-CA  0.0     FPKM    Not detected
    ENSG00000000005 BEWO    0.0     FPKM    Not detected
    ENSG00000000005 CACO-2  0.0     FPKM    Not detected

  • Output format:

    Name    A-431    A-549    AN3-CA    BEWO    CACO-2
    ENSG00000000460    25.2    14.2    10.6    24.4    14.2
    ENSG00000000938    0.0    0.0    0.0    0.0    0.0
    ENSG00000001084    19.1    155.1    24.4    12.6    23.5
    ENSG00000000457    2.8    3.4    3.8    5.8    2.9
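A sketch of the matrix-building step with the nested-dict idioms listed above. It assumes the input is tab-separated (sample names may contain spaces), keeps values as the original strings, orders samples by first appearance, and writes "NA" for any gene/sample pair missing from the input, which is an arbitrary choice made here. Drawing the heat map is left out of this sketch.

```python
import sys


def columns_to_matrix(path):
    """Read Gene/Sample/Value rows into aDict[gene][sample] = value."""
    aDict = {}
    samples = []                                 # sample names, first-seen order
    with open(path) as fh:
        fh.readline()                            # skip the header line
        for line in fh:
            if not line.strip():
                continue
            gene, sample, value = line.rstrip("\r\n").split("\t")[:3]
            if gene not in aDict:
                aDict[gene] = {}                 # aDict['key'] = {}
            aDict[gene][sample] = value          # aDict['key']['key2'] = value
            if sample not in samples:
                samples.append(sample)
    return aDict, samples


def write_matrix(aDict, samples, out=None):
    """Print a gene-by-sample table; 'NA' marks pairs absent from the input."""
    if out is None:
        out = sys.stdout
    print("\t".join(["Name"] + samples), file=out)
    for gene, values in aDict.items():
        row = [values.get(sample, "NA") for sample in samples]
        print("\t".join([gene] + row), file=out)


if __name__ == "__main__":
    # Hypothetical usage: python transferMultipleColumToMatrix.py multipleColExpr.txt
    for path in sys.argv[1:]:
        write_matrix(*columns_to_matrix(path))
```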

Write a program reverseComplementary.py to calculate the reverse complement of the sequence ACGTACGTACGTCACGTCAGCTAGAC (2 points)

  • Knowledge points used:

    • list(seq)

    • reverse
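A minimal sketch that follows the listed knowledge points: list(seq) turns the string into a list of letters, list.reverse() reverses it in place, and a dict maps each base to its complement. The mapping covers A/C/G/T in both cases plus N; any other character would raise a KeyError in this sketch.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper or lower case)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A",
                  "a": "t", "c": "g", "g": "c", "t": "a",
                  "N": "N", "n": "n"}
    bases = list(seq)   # list(seq): "ACG" -> ['A', 'C', 'G']
    bases.reverse()     # reverse the list in place
    return "".join(complement[b] for b in bases)


if __name__ == "__main__":
    print(reverse_complement("ACGTACGTACGTCACGTCAGCTAGAC"))
    # → GTCTAGCTGACGTGACGTACGTACGT
```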

Write a program collapsemiRNAreads.py to convert smRNA-Seq sequencing data (5 points)

  • Input file format (mir.collapse: a tab-separated, two-column file; the first column is the sequence and the second column is the number of times the sequence was detected)

    ID_REF                      VALUE
    ACTGCCCTAAGTGCTCCTTCTGGC    2
    ATAAGGTGCATCTAGTGCAGATA     25
    TGAGGTAGTAGTTTGTGCTGTTT     100
    TCCTACGAGTTGCATGGATTC       4

  • Output file format (mir.collapse.fa). Each sequence name consists of three parts joined by underscores: the first 3 letters identify the sample, the number in the middle is the sequence's serial number (which makes each name unique), and the third part is "x" followed by the number of times the read was detected.

    >ESB_1_x2
    ACTGCCCTAAGTGCTCCTTCTGGC
    >ESB_2_x25
    ATAAGGTGCATCTAGTGCAGATA
    >ESB_3_x100
    TGAGGTAGTAGTTTGTGCTGTTT
    >ESB_4_x4
    TCCTACGAGTTGCATGGATTC
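A sketch of collapsemiRNAreads.py that reproduces the naming scheme above. The sample prefix is a parameter defaulting to "ESB" only because that is the prefix in the example; the function name and argument order are assumptions. The header line is skipped, and lines are split on whitespace so the leading spaces in the example do not matter.

```python
import sys


def collapse_to_fasta(in_path, out_path, prefix="ESB"):
    """Turn 'sequence<TAB>count' rows into FASTA records named PREFIX_index_xCOUNT."""
    with open(in_path) as fh, open(out_path, "w") as out:
        fh.readline()                 # skip the header line "ID_REF  VALUE"
        index = 0
        for line in fh:
            fields = line.split()
            if len(fields) < 2:
                continue              # ignore blank or malformed lines
            seq, count = fields[0], fields[1]
            index += 1                # serial number keeps each name unique
            out.write(f">{prefix}_{index}_x{count}\n{seq}\n")


if __name__ == "__main__":
    # Hypothetical usage: python collapsemiRNAreads.py mir.collapse mir.collapse.fa ESB
    if len(sys.argv) >= 3:
        collapse_to_fasta(*sys.argv[1:4])
```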

The simplified short-sequence matching program (map.py) aligns the sequences in short.fa to ref.fa and reports, for each short sequence, the sequences and positions in ref.fa that it matches (10 points)

  • Knowledge points used:

    • find

  • Output format: BED format. The first column is the matched chromosome; the second and third columns are the start and end positions of the match on that chromosome. Positions are 0-based (0 is the first position) and the end position is not included. For example, the first match below covers [199, 208) (closed at the start, open at the end), i.e. positions 199 to 207 of chr1, counting from 0. The fourth column is the short sequence itself.

    chr1    199    208    TGGCGTTCA
    chr1    207    216    ACCCCGCTG
    chr2    63    70    AAATTGC
    chr3    0    7    AATAAAT

  • Additional requirement: the basic version only needs to match the given template strand; optionally, also consider matches to the complementary strand of the template. In that case the fifth column can hold the name of the short sequence and the sixth column the strand: '+' for a match to the template strand and '-' for a match to the complementary strand. Note that for a match on the complementary strand, the start position is still counted from the 5' end of the template strand.
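A sketch of map.py built on str.find(). Searching again from start + 1 after each hit also reports overlapping matches. A match on the complementary strand is found by searching the template for the reverse complement of the short sequence, so its start is automatically counted from the 5' end of the template. Two choices here are assumptions: sequences are compared case-insensitively, and FASTA names are cut at the first space.

```python
import sys


def read_fasta(path):
    """Return {name: sequence}; the name is the header text before the first space."""
    seqs, name = {}, None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
                seqs[name] = []
            elif line:
                seqs[name].append(line)
    return {name: "".join(parts) for name, parts in seqs.items()}


def reverse_complement(seq):
    """Reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_all(ref, query):
    """Return every 0-based start of query in ref, overlapping hits included."""
    hits = []
    start = ref.find(query)
    while start != -1:
        hits.append(start)
        start = ref.find(query, start + 1)   # resume one base later
    return hits


def map_short(refs, shorts, both_strands=True):
    """Yield BED-style rows: chrom, start, end (exclusive), seq, name, strand."""
    upper_refs = {chrom: ref.upper() for chrom, ref in refs.items()}
    for name, short in shorts.items():
        query = short.upper()
        patterns = [("+", query)]
        if both_strands:
            # A hit of the reverse complement on the template strand means the
            # short sequence matches the complementary strand
            patterns.append(("-", reverse_complement(query)))
        for chrom, ref in upper_refs.items():
            for strand, pattern in patterns:
                for start in find_all(ref, pattern):
                    yield chrom, start, start + len(short), short, name, strand


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python map.py ref.fa short.fa
    refs, shorts = read_fasta(sys.argv[1]), read_fasta(sys.argv[2])
    for row in map_short(refs, shorts):
        print("\t".join(map(str, row)))
```

Passing both_strands=False gives the basic four-column behaviour, restricted to the template strand; the extra name and strand columns can then be dropped when printing.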

Daily Book Recommendation: Fluent Python

Luciano Ramalho, the author of Fluent Python, is a principal consultant at Thoughtworks, a member of the Python Software Foundation, and a co-founder of Python Brasil, a well-known Python community in Brazil. He has 25 years of experience in Python programming. Fluent Python, a classic in the field of programming, has reached nearly 80,000 readers. This edition is based on Python 3.10 and offers detailed content, nearly 500 carefully designed code examples, and many diagrams and tables, which makes it very friendly to learners.


Source: blog.csdn.net/qazplm12_3/article/details/132463644