Showing posts with label transcriptdb. Show all posts

November 5, 2014

Create a REFSEQ transcript database

Transcript databases (TxDb) in Bioconductor are very useful annotation packages. They help researchers annotate genomic regions of interest with genic elements such as exons, introns, UTRs, CDS and genes. For the human genome, Bioconductor offers prebuilt TxDb packages only for the UCSC knownGene track. Here, I share the code needed for generating a human RefSeq TxDb using the Bioconductor package "GenomicFeatures".

The following code/function can be used to generate a TxDb of choice for any organism of interest. It is a very simple function. However, its name and default parameters hide its full potential for creating a variety of databases. In other words, the same function can generate a TxDb from any of the gene annotation tables at UCSC that it supports.
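The original code is not reproduced on this page, so what follows is only a minimal sketch of the idea, assuming a recent GenomicFeatures release where the UCSC importer is called `makeTxDbFromUCSC` (older releases named it `makeTranscriptDbFromUCSC`, and the newest Bioconductor releases provide it through the `txdbmaker` package). The wrapper name `make_ucsc_txdb`, the genome build "hg19" and the output file name are illustrative assumptions, not part of the original post.

```r
# Sketch only: a thin wrapper around the GenomicFeatures UCSC importer.
# make_ucsc_txdb is a hypothetical helper name; calling it needs network access.
make_ucsc_txdb <- function(genome = "hg19", tablename = "refGene", out_file = NULL) {
  if (!requireNamespace("GenomicFeatures", quietly = TRUE)) {
    stop("Please install the Bioconductor package 'GenomicFeatures' first.")
  }
  # Download the chosen UCSC table and build the transcript database from it.
  txdb <- GenomicFeatures::makeTxDbFromUCSC(genome = genome, tablename = tablename)
  # Optionally save the database as an SQLite file for later sessions.
  if (!is.null(out_file)) {
    AnnotationDbi::saveDb(txdb, file = out_file)
  }
  txdb
}

# Example calls (not run here because they download data from UCSC):
# GenomicFeatures::supportedUCSCtables(genome = "hg19")  # list usable tables
# refseq_txdb <- make_ucsc_txdb("hg19", "refGene", "hg19_refGene.sqlite")
# refseq_txdb <- AnnotationDbi::loadDb("hg19_refGene.sqlite")
```

Changing only the `genome` and `tablename` arguments is what lets one function serve different organisms and annotation tracks; `supportedUCSCtables()` shows which table names are accepted.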

February 25, 2014

Create a GENCODE transcript database in R

The following gist will help researchers create a GENCODE transcript database using Bioconductor packages. I am assuming that the packages "GenomicRanges" and "GenomicFeatures" are already installed on the user's computer. The script does the following:
  • loads the needed Bioconductor packages
  • gives information about creating the intermediate files needed for generating the database
  • brief explanation about each step in the procedure
  • creates the transcript database, saving and loading it when needed
  • extracts each feature (gene, CDS, transcript, exon, intron, intergenic regions) as a 'GRanges' object, sorted when needed
  • saves all the extracted features into a combined object that can be loaded in the future
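The gist itself is not shown on this page, so the steps above can only be sketched here under stated assumptions: the GENCODE annotation has already been downloaded as a GTF file, and `makeTxDbFromGFF` is available from GenomicFeatures (the newest Bioconductor releases provide it through `txdbmaker`). The function name `build_gencode_features`, the file names, and the way intergenic regions are derived (gaps between strand-collapsed gene ranges) are illustrative choices, not necessarily what the original script does.

```r
# Sketch only: build a GENCODE TxDb from a GTF file and extract features.
# build_gencode_features is a hypothetical helper name.
build_gencode_features <- function(gtf_file, rdata_file = NULL) {
  # Load the needed Bioconductor packages (assumed to be installed).
  suppressPackageStartupMessages({
    library(GenomicRanges)
    library(GenomicFeatures)
  })
  # Create the transcript database from the GENCODE GTF.
  txdb <- makeTxDbFromGFF(gtf_file, format = "gtf",
                          dataSource = "GENCODE", organism = "Homo sapiens")
  gene_gr <- genes(txdb)
  # Intergenic regions: gaps between gene ranges, ignoring strand.
  # Regions past the last gene are only included if seqlengths are known.
  gene_span <- reduce(gene_gr, ignore.strand = TRUE)
  intergenic <- gaps(gene_span)
  intergenic <- intergenic[as.character(strand(intergenic)) == "*"]
  # Collect every feature type as a sorted GRanges object.
  features <- list(
    gene       = sort(gene_gr),
    transcript = sort(transcripts(txdb)),
    exon       = sort(exons(txdb)),
    cds        = sort(cds(txdb)),
    intron     = sort(unlist(intronsByTranscript(txdb), use.names = FALSE)),
    intergenic = intergenic
  )
  # Optionally save the combined object so it can be reloaded with load().
  if (!is.null(rdata_file)) {
    save(features, file = rdata_file)
  }
  features
}

# Example call (file names are placeholders):
# features <- build_gencode_features("gencode.annotation.gtf", "gencode_features.RData")
```

Keeping all feature sets in one named list means a later session needs only a single `load()` call instead of rebuilding the database.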