Bioinformatics Analysis: Python Practical Exercise 2 | Video 20

Open-source bioinformatics Python tutorial

Concise Python text and video tutorials for students

The source code is at: https://github.com/Tong-Chen/Bioinfo_course_python

Table of contents

  1. Background introduction

    1. Beginning of programming

    2. Why learn Python

    3. How to install Python

    4. How to run Python commands and scripts

    5. What editor to use to write Python scripts

  2. Python program example

  3. Python basic syntax

    1. Numeric Variable Operations

    2. String variable manipulation

    3. List manipulation

    4. Set operations

    5. Using range

    6. Dictionary operations

    7. Indentation levels

    8. Variables, data structures, flow control

  4. Input and output

    1. Interactive input and output

    2. File reading and writing

  5. Practical exercises (1)

    1. Background knowledge

    2. Bioinformatics-related exercises (1)

  6. Function operations

    1. Function operations

    2. Bioinformatics-related exercises (2)

  7. Modules

  8. Command-line arguments

    1. Command-line arguments

    2. Bioinformatics-related exercises (3)

  9. More Python content

    1. Single-statement blocks

    2. List comprehensions, a simplified for loop that produces a new list

    3. lambda, map, filter, reduce (classic features)

    4. exec, eval (execute Python statements given as strings; classic features)

    5. Regular expressions

    6. Plotting with Python

  10. References

Practice questions

Given files in FASTA format (test1.fa and test2.fa), write a program cat.py to read a file and print it to the screen (2 points)

  • Knowledge points used: open(file), for .. in loop, print(), strip() function
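A minimal sketch of how cat.py could look, assuming Python 3. To stay self-contained it reads from an in-memory io.StringIO object that stands in for open("test1.fa"); the FASTA text and the function name cat are made up for illustration.

```python
import io


def cat(handle):
    """Return the lines of an open file with surrounding whitespace removed."""
    lines = []
    for line in handle:
        # strip() removes the newline so print() does not add a second one
        lines.append(line.strip())
    return lines


# Made-up FASTA content; in the exercise use: with open("test1.fa") as handle:
fake_file = io.StringIO(">seq1 example\nACGTACGT\n>seq2 example\nTTGACC\n")
for line in cat(fake_file):
    print(line)
```

The same loop works unchanged for a FASTQ file, since it only treats the input as lines of text.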

Given a file in FASTQ format (test1.fq), write a program cat.py to read the file and print it to the screen (2 points)

  • Knowledge points used: same as above

Write a program splitName.py that reads test2.fa, takes the part of each original sequence name before the first space as the new sequence name, and prints the result to the screen (2 points)

  • Knowledge points used: split, string indexing

  • The output format is:

    >NM_001011874
    gcggcggcgggcgagcgggcgctggagtaggagctg.......
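One possible way to shorten the names is sketched below, assuming Python 3. The input is a list of lines rather than a real file, the text after the first space in the header is invented, and the function names are hypothetical.

```python
def short_name(header):
    """Keep only the part of a FASTA header before the first space."""
    return header.split(" ")[0]


def split_names(lines):
    """Shorten header lines (starting with '>') and keep sequence lines as is."""
    result = []
    for line in lines:
        line = line.strip()
        # line[0] is the first character; the 'line and' guard skips empty lines
        if line and line[0] == ">":
            line = short_name(line)
        result.append(line)
    return result


# Hypothetical record; only the name NM_001011874 comes from the exercise
example = [">NM_001011874 gene=demo\n", "gcggcggcgggc\n"]
print("\n".join(split_names(example)))
```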

Write a program formatFasta.py that reads test2.fa, joins each FASTA sequence into a single line, and prints it (2 points)

  • Knowledge points used: join, strip

  • The output format is:

    >NM_001011874
    gcggcggcgggc......TCCGCTG......GCGTTCACC......CGGGGTCCGGAG
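The joining step could be sketched as follows, assuming Python 3 and a list of lines as input. The function name join_fasta and the demo records are illustrative, not part of the course material.

```python
def join_fasta(lines):
    """Return FASTA lines with each record's sequence joined into one line."""
    output = []
    name = None   # header of the record currently being read
    parts = []    # sequence lines collected for that record
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line[0] == ">":
            # A new header closes the previous record, if there was one
            if name is not None:
                output.extend([name, "".join(parts)])
            name = line
            parts = []
        else:
            parts.append(line)
    # Do not forget the last record in the file
    if name is not None:
        output.extend([name, "".join(parts)])
    return output


demo = [">seq1\n", "ACGT\n", "TTGA\n", ">seq2\n", "GGCC\n"]
print("\n".join(join_fasta(demo)))
```

Collecting the pieces in a list and calling join once avoids building the sequence through repeated string concatenation.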

Write a program formatFasta-2.py that reads test2.fa and wraps each FASTA sequence at 80 letters per line (2 points)

  • Knowledge points used: string slicing, range

  • The output format is:

    >NM_001011874
    gcggcggcgc.(60 letters).TCCGCTGACG # (80 letters per line)
    acgtgctacg.(60 letters).GCGTTCACCC
    ACGTACGATG (the last line may have fewer than 80 letters)
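The wrapping itself needs only slicing and range. A minimal sketch, assuming Python 3 and a sequence that has already been joined into one string; wrap_sequence is a hypothetical helper name and the demo sequence is made up.

```python
def wrap_sequence(seq, width=80):
    """Split a sequence into pieces of at most `width` letters."""
    pieces = []
    # range(0, len(seq), width) yields 0, 80, 160, ... as slice start positions
    for start in range(0, len(seq), width):
        # Slicing past the end is safe: the last piece is simply shorter
        pieces.append(seq[start:start + width])
    return pieces


demo = "ACGT" * 45  # 180 letters, made up for illustration
for piece in wrap_sequence(demo):
    print(piece)
```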

Write a program sortFasta.py that reads test2.fa, takes the part of each original sequence name before the first space as the new sequence name, sorts the sequences by name, and prints them (2 points)

  • Knowledge points used: sort, dict, aDict[key] = [], aDict[key].append(value)
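A sketch of the dictionary-of-lists pattern named above, assuming Python 3 and a list of lines as input. The function names and the demo records are hypothetical.

```python
def read_fasta(lines):
    """Map each shortened sequence name to the list of its sequence lines."""
    seq_dict = {}
    key = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line[0] == ">":
            key = line.split(" ")[0]
            seq_dict[key] = []            # aDict[key] = []
        elif key is not None:
            seq_dict[key].append(line)    # aDict[key].append(value)
    return seq_dict


def sorted_fasta(lines):
    """Return the records ordered alphabetically by sequence name."""
    seq_dict = read_fasta(lines)
    names = list(seq_dict)
    names.sort()
    output = []
    for name in names:
        output.append(name)
        output.extend(seq_dict[name])
    return output


demo = [">seqB desc\n", "TTGA\n", ">seqA desc\n", "ACGT\n", "GGCC\n"]
print("\n".join(sorted_fasta(demo)))
```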

Extract sequences by name (2 points)

  • Write a program grepFasta.py to extract the sequences in test2.fa that correspond to the names in fasta.name and print them to the screen.

  • Write a program grepFastq.py to extract the sequences in test1.fq that correspond to the names in fastq.name and write them to a file.

  • Knowledge points used: print(..., file=fh) (print >>fh in Python 2) or fh.write(); modulo operation, for example 4 % 2 == 0
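The FASTQ case could be sketched as below, assuming Python 3, that every record occupies exactly four lines, and that fastq.name lists one name per line without the leading '@' (the actual layout of that file is not shown here). An io.StringIO object stands in for the output file, and the reads are made up.

```python
import io


def grep_fastq(lines, wanted):
    """Return the four-line FASTQ records whose names are in `wanted`."""
    kept = []
    keep = False
    for i, line in enumerate(lines):
        line = line.strip()
        if i % 4 == 0:
            # Every fourth line is a header: drop '@' and any description
            name = line[1:].split(" ")[0]
            keep = name in wanted
        if keep:
            kept.append(line)
    return kept


records = ["@read1\n", "ACGT\n", "+\n", "IIII\n",
           "@read2\n", "TTGA\n", "+\n", "@III\n"]
out = io.StringIO()  # in the exercise: out = open("result.fq", "w")
for line in grep_fastq(records, {"read2"}):
    print(line, file=out)
print(out.getvalue(), end="")
```

The modulo test is used instead of checking for a leading '@' because a quality line may itself start with '@', as in the second record of the demo.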

Write a program screenResult.py to select the genes in test.expr whose foldChange is greater than 2 and whose padj is less than 0.05; it should be able to output either the entire line or only the gene name. (4 points)

  • Knowledge points used: the logical AND operator and; everything read from a file is a string, so convert values to integers with int or to floating-point numbers with float
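A sketch of the filter, assuming Python 3. The layout of test.expr is not shown here, so the code assumes a tab-separated file with one header line and the columns gene name, foldChange, padj in that order; the column positions, function name, and demo values are all hypothetical.

```python
def screen(lines, fc_cutoff=2.0, padj_cutoff=0.05, name_only=False):
    """Keep rows with foldChange > fc_cutoff and padj < padj_cutoff."""
    kept = []
    header_seen = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if not header_seen:
            header_seen = True   # skip the assumed header line
            continue
        fields = line.split("\t")
        # Values read from a file are strings and must be converted first
        fold_change = float(fields[1])
        padj = float(fields[2])
        if fold_change > fc_cutoff and padj < padj_cutoff:
            if name_only:
                kept.append(fields[0])
            else:
                kept.append(line)
    return kept


demo = ["gene\tfoldChange\tpadj\n",
        "geneA\t3.5\t0.01\n",
        "geneB\t1.2\t0.001\n",
        "geneC\t4.0\t0.2\n"]
print(screen(demo))
print(screen(demo, name_only=True))
```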

Write a program transferMultipleColumToMatrix.py to convert the expression data of genes across multiple tissues in the file multipleColExpr.txt into matrix form, and draw a heat map. (6 points)

  • Knowledge points used:

    aDict['key'] = {}
    aDict['key']['key2'] = value
    if key not in aDict
    aDict = {'ENSG00000000003': {'A-431': 21.3, 'A-549': 32.5, ...}, 'ENSG00000000005': {}}

  • Input format (only the first 3 columns are required)

    Gene    Sample  Value   Unit    Abundance
    ENSG00000000003 A-431   21.3    FPKM    Medium
    ENSG00000000003 A-549   32.5    FPKM    Medium
    ENSG00000000003 AN3-CA  38.2    FPKM    Medium
    ENSG00000000003 BEWO    31.4    FPKM    Medium
    ENSG00000000003 CACO-2  63.9    FPKM    High
    ENSG00000000005 A-431   0.0     FPKM    Not detected
    ENSG00000000005 A-549   0.0     FPKM    Not detected
    ENSG00000000005 AN3-CA  0.0     FPKM    Not detected
    ENSG00000000005 BEWO    0.0     FPKM    Not detected
    ENSG00000000005 CACO-2  0.0     FPKM    Not detected

  • Output format

    Name    A-431    A-549    AN3-CA    BEWO    CACO-2
    ENSG00000000460    25.2    14.2    10.6    24.4    14.2
    ENSG00000000938    0.0    0.0    0.0    0.0    0.0
    ENSG00000001084    19.1    155.1    24.4    12.6    23.5
    ENSG00000000457    2.8    3.4    3.8    5.8    2.9
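The reshaping step could be sketched with a dictionary of dictionaries, assuming Python 3.7 or later (so that dictionaries keep insertion order) and whitespace-separated input. The function name to_matrix is hypothetical, the demo rows are taken from the input example above, and drawing the heat map is left out of this sketch.

```python
def to_matrix(lines):
    """Pivot (gene, sample, value) rows into tab-separated matrix rows."""
    expr = {}      # expr[gene][sample] = value
    samples = []   # sample names in order of first appearance
    for line in lines[1:]:   # lines[0] is the header
        fields = line.split()
        if len(fields) < 3:
            continue
        gene, sample, value = fields[0], fields[1], float(fields[2])
        if gene not in expr:
            expr[gene] = {}
        expr[gene][sample] = value
        if sample not in samples:
            samples.append(sample)
    rows = ["\t".join(["Name"] + samples)]
    for gene in expr:
        # "NA" marks a gene that has no value for some sample
        values = [str(expr[gene].get(sample, "NA")) for sample in samples]
        rows.append("\t".join([gene] + values))
    return rows


demo = ["Gene\tSample\tValue\tUnit\tAbundance",
        "ENSG00000000003\tA-431\t21.3\tFPKM\tMedium",
        "ENSG00000000003\tA-549\t32.5\tFPKM\tMedium",
        "ENSG00000000005\tA-431\t0.0\tFPKM\tNot detected",
        "ENSG00000000005\tA-549\t0.0\tFPKM\tNot detected"]
print("\n".join(to_matrix(demo)))
```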

Write a program reverseComplementary.py to calculate the reverse complement of the sequence ACGTACGTACGTCACGTCAGCTAGAC. (2 points)

  • Knowledge points used: reverse, list(seq)
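A minimal sketch using list(seq) and reverse, assuming Python 3 and a sequence made up only of the uppercase letters A, C, G and T. The function name is hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    bases = list(seq)   # a string cannot be reversed in place, a list can
    bases.reverse()
    complemented = []
    for base in bases:
        complemented.append(pairs[base])
    return "".join(complemented)


print(reverse_complement("ACGTACGTACGTCACGTCAGCTAGAC"))
# GTCTAGCTGACGTGACGTACGTACGT
```

Applying the function twice returns the original sequence, which is a quick way to check an implementation.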

Write a program collapsemiRNAreads.py to convert smRNA-Seq sequencing data. (5 points)

  • Input file format (mir.collapse, a tab-separated two-column file; the first column is the sequence and the second column is the number of times that sequence was detected)

    ID_REF        VALUE
    ACTGCCCTAAGTGCTCCTTCTGGC        2
    ATAAGGTGCATCTAGTGCAGATA        25
    TGAGGTAGTAGTTTGTGCTGTTT        100
    TCCTACGAGTTGCATGGATTC        4

  • Output file format (mir.collapse.fa): the first three letters of each name identify the sample, the number in the middle is a serial number that uniquely identifies the sequence, and the third part is x followed by the number of times the read was detected. The three parts are joined with underscores to form the FASTA sequence name.

    >ESB_1_x2
    ACTGCCCTAAGTGCTCCTTCTGGC
    >ESB_2_x25
    ATAAGGTGCATCTAGTGCAGATA
    >ESB_3_x100
    TGAGGTAGTAGTTTGTGCTGTTT
    >ESB_4_x4
    TCCTACGAGTTGCATGGATTC
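The conversion could be sketched as follows, assuming Python 3, a single header line, and the sample prefix ESB seen in the example output. The function name collapse_to_fasta is hypothetical; the demo rows are the ones shown above.

```python
def collapse_to_fasta(lines, sample="ESB"):
    """Turn (sequence, count) rows into FASTA lines named sample_N_xCount."""
    output = []
    serial = 0
    for line in lines[1:]:   # lines[0] is the ID_REF / VALUE header
        fields = line.split()
        if len(fields) != 2:
            continue
        seq, times = fields
        serial += 1
        # The three name parts are joined with underscores
        output.append(">" + "_".join([sample, str(serial), "x" + times]))
        output.append(seq)
    return output


demo = ["ID_REF\tVALUE",
        "ACTGCCCTAAGTGCTCCTTCTGGC\t2",
        "ATAAGGTGCATCTAGTGCAGATA\t25",
        "TGAGGTAGTAGTTTGTGCTGTTT\t100",
        "TCCTACGAGTTGCATGGATTC\t4"]
print("\n".join(collapse_to_fasta(demo)))
```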

Simplified short-sequence mapping program (map.py): align the sequences in short.fa to ref.fa and output which short sequences match which reference sequences in ref.fa, and at which positions. (10 points)

  • Knowledge points used: find

  • Output format: BED format. The first column is the matched chromosome; the second and third columns are the start and end positions of the match on that chromosome (positions are counted from 0, so 0 is the first position, and the end position is not included). The first example below occupies the interval [199, 208), closed at the start and open at the end, that is, positions 199-207 of chr1 counting from 0. The fourth column is the short sequence itself.

    chr1    199    208    TGGCGTTCA
    chr1    207    216    ACCCCGCTG
    chr2    63    70    AAATTGC
    chr3    0    7    AATAAAT

  • Additional requirements: the program may match only the given template strand, or it may also consider matches to the strand complementary to the template. In the latter case, the fifth column can hold the name of the short sequence and the sixth column the strand: '+' for a match to the template strand and '-' for a match to the complementary strand. Note that for a match to the complementary strand, the start position is still counted from the 5' end of the template strand.
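An exact-match search on both strands could be sketched with str.find, assuming Python 3 and that ref.fa and short.fa have already been read into dictionaries mapping names to sequences. The function names and the tiny demo sequences are made up; this is not a real aligner.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def find_all(ref_seq, query):
    """Return every 0-based start of `query` in `ref_seq`, overlaps included."""
    positions = []
    start = ref_seq.find(query)
    while start != -1:   # find() returns -1 when there is no further match
        positions.append(start)
        start = ref_seq.find(query, start + 1)
    return positions


def map_reads(ref, reads):
    """Return BED-like rows: (chrom, start, end, read, name, strand)."""
    rows = []
    for name, read in reads.items():
        # A '-' match is found by searching the read's reverse complement on
        # the template, so coordinates stay relative to the template 5' end
        queries = [(read, "+"), (reverse_complement(read), "-")]
        for chrom, ref_seq in ref.items():
            for query, strand in queries:
                for start in find_all(ref_seq, query):
                    rows.append((chrom, start, start + len(read), read, name, strand))
    return rows


ref = {"chr1": "TTGACCGTAAGC"}            # made-up reference
reads = {"r1": "GACC", "r2": "TTAC"}      # made-up short sequences
for row in map_reads(ref, reads):
    print("\t".join(str(field) for field in row))
```

A read that equals its own reverse complement is reported on both strands by this sketch; whether that is wanted depends on how the additional requirement is interpreted.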

Daily Book Recommendation - Fluent Python

Luciano Ramalho, the author of "Fluent Python", is a principal consultant at Thoughtworks, a member of the Python Software Foundation, and a co-founder of Python Brasil, a well-known Python learning community in Brazil. He has 25 years of Python programming experience. "Fluent Python" is a classic in the programming field that has reached nearly 80,000 readers. It is based on Python 3.10, with detailed content, nearly 500 carefully designed code examples, and a large number of diagrams and tables that make it friendly to learn from.

Origin blog.csdn.net/qazplm12_3/article/details/132486428