Open source student letter Python tutorial

Concise Python text and video tutorials for students

The source code is at: https://github.com/Tong-Chen/Bioinfo_course_python

background introduction
1. Beginning of programming
2. Why learn Python
3. How to install Python
4. How to run Python commands and scripts
5. What editor to use to write Python scripts
Python program example
Python basic syntax
1. Numeric Variable Operations
2. String variable manipulation
3. list manipulation
4. set operation
5. Range use
6. dictionary operations
7. Hierarchical indentation
8. Variables, data structures, flow control
input Output
1. Interactive input and output
2. file read and write
Practical exercises (1)
1. background knowledge
2. Homework related work (1)
function operation
1. function operation
2. Biological letter-related assignments (2)
module
command line parameters
1. command line parameters
2. Biological letter-related homework (3)
More Python content
1. single block
2. List synthesis, a simplified for loop that produces a new list
3. lambda, map, filer, reduce (repertoire)
4. exec, eval (executes string python statements, repertoire)
5. regular expression
6. Python drawing
Reference

some practice questions

Given a file in FASTA format (test1.fa and test2.fa), write a program cat.pyto read the file and output it to the screen (2 points)

open(file)
for .. in loop
print()
strip() function
Knowledge points used

Given a file in FASTQ format (test1.fq), write a program cat.pyto read the file and output it to the screen (2 points)

ditto
Knowledge points used

Write a program splitName.py, read in test2.fa, and take the name before the first space of the original sequence name as the processed sequence name, and output it to the screen (2 points)

split
the index of the string
Knowledge points used

The output format is:

>NM_001011874
gcggcggcgggcgagcgggcgctggagtaggagctg.......

Write a program formatFasta.py, read in test2.fa, connect each FASTA sequence into a line and output it (2 points)

join
strip
Knowledge points used

The output format is:

>NM_001011874
gcggcggcgggc......TCCGCTG......GCGTTCACC......CGGGGTCCGGAG

Write a program formatFasta-2.py, read test2.fa, and divide each FASTA sequence into a sequence of 80 letters per row (2 points)

string slice operation
range
Knowledge points used

The output format is

>NM_001011874
gcggcggcgc.(60个字母).TCCGCTGACG #(每行80个字母)
acgtgctacg.(60个字母).GCGTTCACCC
ACGTACGATG(最后一行可不足80个字母)

Write a program sortFasta.py, read in test2.fa, and take the name before the first space of the original sequence name as the processed sequence name, sort and output (2 points)

sort
dict
aDict[key] = []
aDict[key].append(value)
Knowledge points used

Extract the sequence given the name (2 points)

Knowledge points used
print >>fh, or fh.write()
Modulo operation, 4 % 2 == 0
Write a program grepFasta.pyto extract the sequence of test2.fa corresponding to the name in fasta.name and output it to the screen.
Write a program grepFastq.pyto extract the sequence of test1.fq corresponding to the name in fastq.name and output it to a file.

Write a program screenResult.pyto filter the genes whose foldChange is greater than 2 and padj is less than 0.05 in test.expr, and can output the entire line or only the gene name. (4 points)

logical AND operator and
The contents read in the file are all strings, which need to be converted to integers with int and converted to floating point numbers with float
Knowledge points used

Write a program transferMultipleColumToMatrix.pyto convert the expression data of genes in multiple tissues in the file (multipleColExpr.txt) into a matrix form, and draw a heat map. (6 points)

aDict['key'] = {}
aDict[‘key’][‘key2’] = value
if key not in aDict
aDict = {'ENSG00000000003': {“A-431”: 21.3, “A-549”, 32.5,…},”ENSG00000000003”:{},}
Knowledge points used

Input format (only the first 3 columns are required)

Gene    Sample  Value   Unit    Abundance
ENSG00000000003 A-431   21.3    FPKM    Medium
ENSG00000000003 A-549   32.5    FPKM    Medium
ENSG00000000003 AN3-CA  38.2    FPKM    Medium
ENSG00000000003 BEWO    31.4    FPKM    Medium
ENSG00000000003 CACO-2  63.9    FPKM    High
ENSG00000000005 A-431   0.0     FPKM    Not detected
ENSG00000000005 A-549   0.0     FPKM    Not detected
ENSG00000000005 AN3-CA  0.0     FPKM    Not detected
ENSG00000000005 BEWO    0.0     FPKM    Not detected
ENSG00000000005 CACO-2  0.0     FPKM    Not detected

output format

Name    A-431    A-549    AN3-CA    BEWO    CACO-2
ENSG00000000460    25.2    14.2    10.6    24.4    14.2
ENSG00000000938    0.0    0.0    0.0    0.0    0.0
ENSG00000001084    19.1    155.1    24.4    12.6    23.5
ENSG00000000457    2.8    3.4    3.8    5.8    2.9

Write a program reverseComplementary.pyto calculate ACGTACGTACGTCACGTCAGCTAGACthe reverse complement of a sequence. (2 minutes)

reverse
list(seq)
Knowledge points used

Write a program collapsemiRNAreads.pyto convert smRNA-Seq sequencing data. (5 points)

Input file format (mir.collapse, tab-separated two-column file, the first column is the sequence, and the second column is the number of times the sequence was measured)
```
ID_REF        VALUE
  ACTGCCCTAAGTGCTCCTTCTGGC        2
  ATAAGGTGCATCTAGTGCAGATA        25
  TGAGGTAGTAGTTTGTGCTGTTT        100
  TCCTACGAGTTGCATGGATTC        4
```
Output file format (mir.collapse.fa, the first 3 letters of the name are the specific identification of the sample, the number in the middle indicates the sequence number, which is the only identification of the sequence name, and the third part is x plus each reads detected The number of times. The three parts are connected with an underscore as the name of the fasta sequence.)
```
>ESB_1_x2
  ACTGCCCTAAGTGCTCCTTCTGGC
  >ESB_2_x25
  ATAAGGTGCATCTAGTGCAGATA
  >ESB_3_x100
  TGAGGTAGTAGTTTGTGCTGTTT
  >ESB_4_x4
  TCCTACGAGTTGCATGGATTC
```

The simplified short sequence matching program (map.py) compares the sequences in short.fa to ref.fa, and outputs which sequences the short sequences match with which positions in the ref.fa file. (10 points)

find
Knowledge points used
Output format (the output format is bed format, the first column is the matched chromosome, the second and third columns are the start and end positions of the matched chromosome sequence (the position mark starts with 0, representing the first position; The termination position is not included. The position of the sequence shown in the first example is (199,208] (front closed and rear opened, actually the sequence of Chr1 chromosome 199-206, starting from 0). The fourth column is the short sequence itself the sequence of.).
Additional requirements: It can only match to a given template strand, or consider matching to the complementary strand of the template strand. At this time, the fifth column can be the name of the short sequence, and the sixth column is the information of the strand, which matches the template strand as '+' and matches the complementary strand as '-'. Note that when the complementary strand is matched, the starting position is also counted from the 5' end of the template strand.
```
chr1    199    208    TGGCGTTCA
chr1    207    216    ACCCCGCTG
chr2    63    70    AAATTGC
chr3    0    7    AATAAAT
```

Daily Book Recommendations - Fluent Python

Luciano Ramalho, the author of "Smooth Python", is the chief consultant of Thoughtworks, a member of the Python Software Foundation, and the co-founder of Python Brasil, a well-known Python language learning community in Brazil. With 25 years of experience in Python programming, his "Smooth Python" is a classic in the field of programming, affecting nearly 80,000 readers, based on Python 3.10, with detailed content and nearly 500 well-designed code examples! There are also a large number of diagrams and tables, which are really friendly to learning! .

See the evaluation of ChatGPT for details:

Past products (click on the picture to go directly to the text corresponding tutorial)

machine learning

Reply in the background with "The first wave of benefits in the Life Letter Collection" or click to read the original text to get a collection of tutorials

Biomessage Analysis Python Practical Exercise 2 | Video 20

Table of contents

Daily Book Recommendations - Fluent Python

Past products (click on the picture to go directly to the text corresponding tutorial)

Guess you like