Using liftOver to convert positions between different genome versions

Hello everyone, I am Deng Fei. Some time ago, a friend in my community asked a question: "I want to merge SNP data from different genome versions without calling SNPs again. Specifically, I have sheep data on V2 and V4, and I want to convert the V2 data to V4 and then merge it with the V4 data."


I suggested doing it with the liftOver software and promised to write a blog post about it.

Other friends wanted to move from the 1.2 reference genome to 3.1 and asked me how to handle it. I again recommend liftOver. The online tool can also do this, but running it locally is faster.

1. How positions correspond between genome versions

Every time a reference genome is updated, position information changes: some regions gain insertions, some are shifted, and some stay the same.

However, the relationship between two versions of a reference genome is recorded. Using that relationship, positions in the old version can be updated to their positions in the new version.

Where this applies: VCF data whose SNPs were called against different reference genomes can be converted to the same genome version in this way and then merged. Some chips are designed against different genome versions; their data can be converted and merged in the same way.
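To make the correspondence concrete, here is a minimal sketch of how a chain file encodes it and how one position is looked up. The chain below is a toy with hypothetical numbers, it only handles a single forward-strand chain, and the function name lift_pos is invented for illustration; the real liftOver program deals with many chains, reverse strands and more.

```shell
#!/usr/bin/env bash
# Toy chain with hypothetical numbers (forward strand only).
# Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
#         (t = old assembly, q = new assembly)
# Body:   size dt dq  = aligned block length, gap in old, gap in new; last line is size only.
cat > toy.chain <<'EOF'
chain 1000 chr1 1000 + 100 400 chr1 1200 + 150 470 1
100 10 30
190
EOF

# lift_pos POS CHAIN: print the new 0-based position of POS, or "unmapped".
lift_pos() {
  awk -v p="$1" '
    $1 == "chain" { t = $6; q = $11; next }   # block starts in old (t) and new (q)
    NF >= 1 {
      if (p >= t && p < t + $1) { print q + (p - t); found = 1; exit }
      t += $1 + $2; q += $1 + $3              # move past the block and the gaps after it
    }
    END { if (!found) print "unmapped" }
  ' "$2"
}

lift_pos 120 toy.chain   # inside the first aligned block
lift_pos 205 toy.chain   # inside a gap of the old assembly
lift_pos 300 toy.chain   # inside the second aligned block
```

In this toy, position 120 is shifted by the offset of the first block, position 205 falls in a region that has no counterpart in the new version, and position 300 picks up the extra 20 bases inserted between the two blocks.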

2. LiftOver software download

URL: http://hgdownload.cse.ucsc.edu/admin/exe/

Builds are available for macOS and Linux. This article uses Linux as the example.
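A minimal sketch of fetching the Linux binary from that site, assuming a 64-bit Linux machine (the linux.x86_64 subdirectory) and that wget is installed. The function name fetch_liftover is made up for this example; the download only happens when you call it.

```shell
#!/usr/bin/env bash
# Subdirectory of admin/exe/ for 64-bit Linux (assumption: this matches your machine).
PLATFORM="linux.x86_64"
LIFTOVER_URL="http://hgdownload.cse.ucsc.edu/admin/exe/${PLATFORM}/liftOver"

# fetch_liftover: download the binary into the current directory and make it executable.
fetch_liftover() {
  wget -O liftOver "$LIFTOVER_URL" && chmod +x liftOver
}

echo "$LIFTOVER_URL"
# Call fetch_liftover, then run ./liftOver with no arguments to print its usage message.
```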

3. Find the genome version of a species

URL: https://hgdownload.soe.ucsc.edu/downloads.html

The page lists the genome versions available for common species.

For example, pig has:

  • V11
  • V10
  • V9


Chicken has:

  • V6
  • V5
  • V4

Cattle has:

  • V9
  • V8
  • V7

Human has:

  • hg38
  • hg19

Mouse has:

  • mm39
  • mm10

4. Download different versions of liftOver data files

Taking chicken as an example, open: https://hgdownload.soe.ucsc.edu/goldenPath/galGal6/liftOver/

This directory holds the V6-to-V5 and V6-to-V4 chain files. To convert V6 to V5, download galGal6ToGalGal5.over.chain.gz.


Of course, you can also convert V5 to V6 or V4 to V6. Just download the corresponding chain file from the liftOver directory of the version you are converting from.


Note: do not decompress the downloaded .gz file. liftOver reads the chain file in its compressed form.
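The chain files follow a regular naming pattern, so the download URL can be built from the two assembly names. The sketch below assumes the pattern seen in the chicken example (the file sits in the source assembly's liftOver directory and is named with the target assembly capitalized); chain_url is a hypothetical helper, not part of any UCSC tool.

```shell
#!/usr/bin/env bash
# chain_url FROM TO: print the chain-file URL for converting FROM coordinates to TO.
# Assumed pattern: goldenPath/<from>/liftOver/<from>To<To>.over.chain.gz
chain_url() {
  local from="$1" to="$2" to_cap
  # Capitalize the first letter of the target assembly name (galGal5 -> GalGal5).
  to_cap="$(printf '%s' "${to:0:1}" | tr '[:lower:]' '[:upper:]')${to:1}"
  printf 'https://hgdownload.soe.ucsc.edu/goldenPath/%s/liftOver/%sTo%s.over.chain.gz\n' \
    "$from" "$from" "$to_cap"
}

chain_url galGal6 galGal5
# To download and keep the file compressed: wget "$(chain_url galGal6 galGal5)"
```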

5. Organize location information

Take PLINK data as an example. To move a map file from one genome version to another (V6 to V5 in the command in section 6), first convert the map data into BED format.

liftOver only accepts BED-format input. Only the first three columns are required: chromosome, start position and end position, with no header line. All three can be built from the map file.

Note: end must not equal start. For a single site, set end = start + 1.

Conversion code:

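# Collapse runs of whitespace in the map file to single spaces
sed 's/\s\+/ /g' new_v3.map >t1.map
# BED is 0-based and half-open: a SNP at 1-based position p is start=p-1, end=p (the end column holds the position)
awk '{print "chr"$1,$4-1,$4}' t1.map >tt.bed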
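A variant worth considering is to carry the SNP ID along as a fourth BED column, since liftOver keeps extra BED columns and the ID makes it easy to match converted sites back to the original data. The sketch below uses a hypothetical three-SNP map file and an invented helper name, map_to_bed4. It assumes BED's 0-based, half-open convention, so a SNP at 1-based position p is written as start = p - 1, end = p.

```shell
#!/usr/bin/env bash
# Hypothetical PLINK .map: chromosome, SNP ID, genetic distance, bp position (1-based).
printf '1\tsnpA\t0\t1000\n1\tsnpB\t0\t2500\n2\tsnpC\t0\t300\n' > demo.map

# map_to_bed4 MAP: print tab-separated chr, start, end, SNP ID.
# The SNP's own position ends up in the end column (start = position - 1).
map_to_bed4() {
  awk 'BEGIN { OFS = "\t" } { print "chr" $1, $4 - 1, $4, $2 }' "$1"
}

map_to_bed4 demo.map > demo.bed
cat demo.bed
```

With this layout, the new position of each SNP is read from the third column of the liftOver output.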

6. Run the liftOver command-line conversion

The syntax for liftOver is:

liftOver <input file> <chain file> <output file> <unmapped file>

Sample code:

Convert a V6 BED file to V5:

liftOver tt.bed galGal6ToGalGal5.over.chain.gz re_map.bed re_un_map.bed
  • The first parameter, tt.bed, is the BED file generated from the map.
  • The second parameter is the compressed chain file downloaded from the liftOver directory, which holds the correspondence between the two versions. URL: https://hgdownload.soe.ucsc.edu/goldenPath/galGal6/liftOver/
  • The third parameter is the output file for the sites that were converted.
  • The fourth parameter is the output file for the sites that could not be converted.

The run writes the successfully converted sites to the third file and the unconverted sites to the fourth.

For later use, it helps to run the conversion once, remove the sites that failed from the original data, and then run it again, so that the input and output sites correspond one to one.
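A minimal sketch of that clean-up step, assuming the BED input carried the SNP ID in a fourth column and used the 0-based convention (new position in the end column). The two demo files are hypothetical stand-ins for the converted and unconverted outputs; in the unconverted file, liftOver puts a reason line starting with "#" before each failed record. The helper names are invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical liftOver outputs for a four-column BED input (chr, start, end, SNP ID).
printf 'chr1\t1099\t1100\tsnpA\nchr2\t349\t350\tsnpC\n' > demo_lifted.bed
printf '#Deleted in new\nchr1\t2499\t2500\tsnpB\n' > demo_unmapped.bed

# unmapped_ids FILE: SNP IDs that failed to convert (skips the "#" reason lines).
unmapped_ids() {
  awk '!/^#/ { print $4 }' "$1"
}

# lifted_positions FILE: SNP ID, new chromosome without "chr", new 1-based position.
lifted_positions() {
  awk 'BEGIN { OFS = "\t" } { sub(/^chr/, "", $1); print $4, $1, $3 }' "$1"
}

unmapped_ids demo_unmapped.bed > drop_snps.txt
lifted_positions demo_lifted.bed > new_positions.txt
cat drop_snps.txt new_positions.txt
```

The list in drop_snps.txt can be used to remove failed sites from the original data, and new_positions.txt to update the remaining ones; PLINK's --exclude, --update-chr and --update-map options are one way to apply them, though the column arguments should be checked against the PLINK documentation.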

If you have any questions about using it, you can ask them through my official account.

Origin blog.csdn.net/yijiaobani/article/details/130957932