Skip to content

Instantly share code, notes, and snippets.

@crazyhottommy
Last active January 19, 2021 04:45
Show Gist options
  • Save crazyhottommy/8db4c3fdbd512a4c7e8f7e2ec50ad25c to your computer and use it in GitHub Desktop.
Save crazyhottommy/8db4c3fdbd512a4c7e8f7e2ec50ad25c to your computer and use it in GitHub Desktop.
tile bed file to same width bins
Note, bed file is 0-based, and R is 1 based. when import bed file to
R, the GRanges will add 1 in the start.
cat test.bed
chr1 1 14
chr1 14 20
chr1 30 38
```{r}
library(GenomicRanges)
library(rtracklayer)
library(dplyr)
library(readr)
gr<- import("~/playground/test.bed")
bin_size<- 5
gr_width<- width(gr)
bin_num<- ceiling(gr_width/bin_size)
## after extending, the peaks are overlapping
gr_center<- resize(gr, fix = "center", width = bin_num * bin_size)
## this works fine
unlist(tile(gr_center, width = bin_size))
## merge before tile
gr_center_merge<- reduce(gr_center)
out_bed<- unlist(tile(gr_center_merge, width = bin_size))
## trim the out of bound gr
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb<- TxDb.Hsapiens.UCSC.hg38.knownGene
seqinfo(out_bed)<- seqinfo(txdb)
out_bed<- trim(out_bed)
## will handle the bed format: -1 at the start, but the output will be 6-column bed
export(out_bed, "~/playground/out.bed", format = "BED")
# or
out_bed %>% as.data.frame() %>%
mutate(start = start -1) %>%
dplyr::select(seqnames, start, end) %>%
write_tsv("~/playground/out2.bed", col_names = FALSE)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment