Last active
January 19, 2021 04:45
-
-
Save crazyhottommy/8db4c3fdbd512a4c7e8f7e2ec50ad25c to your computer and use it in GitHub Desktop.
tile bed file to same width bins
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Note, bed file is 0-based, and R is 1 based. when import bed file to | |
R, the GRanges will add 1 in the start. | |
cat test.bed | |
chr1 1 14 | |
chr1 14 20 | |
chr1 30 38 | |
```{r} | |
library(GenomicRanges) | |
library(rtracklayer) | |
library(dplyr) | |
library(readr) | |
gr<- import("~/playground/test.bed") | |
bin_size<- 5 | |
gr_width<- width(gr) | |
bin_num<- ceiling(gr_width/bin_size) | |
## after extending, the peaks are overlapping | |
gr_center<- resize(gr, fix = "center", width = bin_num * bin_size) | |
## this works fine | |
unlist(tile(gr_center, width = bin_size)) | |
## merge before tile | |
gr_center_merge<- reduce(gr_center) | |
out_bed<- unlist(tile(gr_center_merge, width = bin_size)) | |
## trim the out of bound gr | |
library(TxDb.Hsapiens.UCSC.hg38.knownGene) | |
txdb<- TxDb.Hsapiens.UCSC.hg38.knownGene | |
seqinfo(out_bed)<- seqinfo(txdb) | |
out_bed<- trim(out_bed) | |
## will handle the bed format: -1 at the start, but the output will be 6-column bed | |
export(out_bed, "~/playground/out.bed", format = "BED") | |
# or | |
out_bed %>% as.data.frame() %>% | |
mutate(start = start -1) %>% | |
dplyr::select(seqnames, start, end) %>% | |
write_tsv("~/playground/out2.bed", col_names = FALSE) | |
``` |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment