CHITKA: An Idea
Extending methylKit: Christmas Offer for Methylome Researchers (published 2014-12-16, updated 2014-12-17)
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="color: #222222; font-family: arial, sans-serif; font-size: small;">
Next week is the holiday break and most of us, me included, will be away. Before that, I thought that this Christmas I would OFFER fellow researchers a GOOD SERVICE.</div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: small;">
<br /></div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: small;">
Many biologists are learning to deal with their high-throughput sequencing data. I started just like you and spent hours and days learning R/Bioconductor packages. Now I would like to SAVE your time by offering the data-analysis skills I have developed at the most affordable price for researchers (in effect my service is FREE; I charge a fee only to PREVENT SPAM requests, and some of it will go into developing further resources to help users like you).</div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: small;">
<br /></div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: small;">
Here are the details of my Christmas offer: I am now offering methylation analysis for two samples (1 control and 1 test). As part of this offer you would get the following done:</div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: small;">
<ul>
<li>QC report of the FASTQ reads</li>
<li>Mapping (hg19/mm10) using Bismark</li>
<li>Bisulfite conversion efficiency analysis (if you have used a lambda spike-in)</li>
<li>methylKit workflow to suit your needs</li>
<li>Additional workflow in Bioconductor to extract the list of genic or non-genic elements with differentially methylated CpGs</li>
</ul>
<div>
Not only this, I also offer a 30 min to 1 hr consultation on planning methylome experiments (RRBS/WGBS). Yes, you read that right: I do data analysis and I build NGS libraries myself, so I can help you plan experiments. I have built and sequenced over 100 RRBS libraries and 10 WGBS libraries from 10-30 ng of DNA (I have NOT heard of any commercial service provider offering methylomes from such small amounts of DNA). So, with my tips you can build NGS libraries from very precious samples. I know the ins and outs of NGS library prep and have a good understanding of methylation analysis. I can look at the QC and tell you what went wrong in the library prep (if anything did).</div>
</div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: small;">
<br /></div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: small;">
<b>Price details:</b></div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: small;">
<ol>
<li><b>RRBS/WGBS consultation (Sample processing, library prep, data analysis) for 30min to 1 hr :</b> $15 for 30 min (total 6 slots for consultation)</li>
<li><b>RRBS data analysis as mentioned above :</b> $ 50 (only 10 slots)</li>
<ul>
<li>I will also explain the pipeline to you, and the results will be delivered as a clearly explained document/ppt, over Skype/Hangout if necessary.</li>
</ul>
<li><b>WGBS data analysis :</b> $50 (only 1 slot)</li>
<ul>
<li>WGBS analysis will NOT cover the full data analysis listed above because of the huge cost in time and computational resources. I will do the QC report, mapping and bisulfite efficiency analysis only.</li>
</ul>
</ol>
<div>
<b>Risk free:</b></div>
<ol>
<li>After experiencing this service if you feel that my expertise is not useful for you, I WILL RETURN 80% of the price you PAID with NO QUESTIONS ASKED. All you have to do is just ask for your money. I am sure people reading this are true to their conscience. I do not want anybody to comment on this blog that they have lost money. So, my reputation is at stake if you are not satisfied.</li>
<li>If you happen to use the results, you need not give me any credit. It's YOUR data. </li>
</ol>
<div>
You can contact me using my personal email id at kalyankpy[at]gmail[dot]com for further details.</div>
<div>
<br /></div>
Disclaimer: Please note that this is not marketing or business. I want to share my expertise and skills with those who need them (I have been active in the methylkit_discussion forum on Google Groups on the topics I am familiar with). However, I don't want to invite SPAMMERS or JUNK. The price for this service is meant to attract genuine researchers only. Some of the money raised will be used to develop additional online resources for methylation analysis.</div>
</div>
Create a REFSEQ transcript database (published 2014-11-05)
<div dir="ltr" style="text-align: left;" trbidi="on">
Transcript databases (TxDb) in Bioconductor are very good annotation packages. These packages help researchers annotate genomic regions of interest with genic elements such as exons, introns, UTRs, CDS, genes, etc. For the human genome, Bioconductor offers TxDb packages only for the UCSC knownGene track. Here, I share the code needed for generating a human TxDb using the Bioconductor package "GenomicFeatures".<br />
<br />
The following code/function could be used for generating a TxDb of choice for any organism of interest. This is a very simple function. However, its name and default parameters hide its full potential for creating a variety of databases. In other words, this function could be used to generate a TxDb from any of the transcript tables at UCSC that the function supports.<br />
<br />
<script src="https://gist.github.com/kalyankpy/9befbf3319ed46d6ca3d.js"></script>
</div>
Extending methylKit: Extract promoters with differentially methylated CpGs (published 2014-10-23, updated 2014-10-24)
<div dir="ltr" style="text-align: left;" trbidi="on">
In my previous <a href="http://chitka-kalyan.blogspot.fi/2014/10/methylkit-for-bisulfite-sequencing-data.html">post</a>, I wrote about the features of methylKit. Here, I will discuss how to extend bisulfite sequencing data analysis beyond methylKit.<br />
<br />
Annotation is an important feature of genomic analyses. For bisulfite sequencing analyses such as RRBS or WGBS, methylKit does a pretty good job of identifying differentially methylated individual CpGs or specific regions/tiles. It also performs basic annotation and plots pie charts showing which genic annotations the differentially methylated CpGs overlap.<br />
<br />
<div style="text-align: left;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlt2kqipxFe9E_jAYSrIYo5xtjiaCTJlPkHFmdo7a7xri_4AWQFOPvaFwa4eBfvGPGYs1B-eWnbnfGH3a0m4nzwJ59a_3tQCYMAOIhA36o9SbD7tE_1MyXtRZyWvCnanFs1aUWsNREzqHr/s1600/Screenshot+at+2014-10-21+18.46.54.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlt2kqipxFe9E_jAYSrIYo5xtjiaCTJlPkHFmdo7a7xri_4AWQFOPvaFwa4eBfvGPGYs1B-eWnbnfGH3a0m4nzwJ59a_3tQCYMAOIhA36o9SbD7tE_1MyXtRZyWvCnanFs1aUWsNREzqHr/s1600/Screenshot+at+2014-10-21+18.46.54.png" height="285" width="320" /></a></div>
The adjacent picture is an example of the kind of annotation performed by methylKit. Using the native functions, methylKit users can annotate the differentially methylated CpGs to the genic regions. The picture indicates that, among all the differentially methylated CpGs, 46% overlap with promoters and 27% overlap with intergenic regions. Another way to look at the annotations is to identify the list of promoters, exons, introns and intergenic regions that overlap differentially methylated CpGs. There are no methylKit functions to perform the annotation from the point of view of the genic regions. However, methylKit facilitates this by enabling the coercion of methylKit objects into "GRanges" (GenomicRanges). The following script will help methylKit users extract the list of promoters/exons/introns that overlap with differentially methylated CpGs.</div>
<script src="https://gist.github.com/kalyankpy/b703014b0bad2bb293c6.js"></script>
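For readers who want to see the overlap logic without R, here is a minimal Python sketch of the same idea: given promoter intervals and the positions of differentially methylated CpGs, list the promoters that contain at least one such CpG. This is not the code in the gist above; the function name, gene names and coordinates are all made up for illustration, and in Bioconductor this step would normally be done with the overlap functions of GenomicRanges.

```python
from bisect import bisect_left
from collections import defaultdict


def promoters_with_dm_cpgs(promoters, cpgs):
    """Return names of promoters containing at least one differentially methylated CpG.

    promoters: list of (chrom, start, end, name), 1-based inclusive coordinates
    cpgs:      list of (chrom, position)
    """
    # Sort CpG positions per chromosome so each promoter needs only one binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in cpgs:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    hits = []
    for chrom, start, end, name in promoters:
        positions = by_chrom.get(chrom, [])
        i = bisect_left(positions, start)  # index of first CpG at or after the promoter start
        if i < len(positions) and positions[i] <= end:
            hits.append(name)
    return hits


# Hypothetical toy data (not real annotation)
promoters = [
    ("chr1", 1000, 3000, "geneA"),
    ("chr1", 5000, 7000, "geneB"),
    ("chr2", 100, 2100, "geneC"),
]
dm_cpgs = [("chr1", 2500), ("chr1", 4000), ("chr2", 2100)]

print(promoters_with_dm_cpgs(promoters, dm_cpgs))  # ['geneA', 'geneC']
```

The same function works for exons or introns if their intervals are passed in place of the promoters.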
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
</div>
methylKit for bisulfite sequencing data analysis (published 2014-10-21)
<div dir="ltr" style="text-align: left;" trbidi="on">
I have been relying upon <a href="http://code.google.com/p/methylkit/" target="_blank">methylKit</a>, an R package, for my RRBS data analysis. It is one of the most highly <a href="http://genomebiology.com/2012/13/10/R87/" target="_blank">cited</a> R packages for analysing bisulfite sequencing data. It is straightforward to install and its vignette details all the major steps in the bisulfite analysis with clarity. <a href="http://linkedin.com/in/altunaakalin" target="_blank">Altuna Akalin</a>, the author of methylKit, has been actively helping users (via <a href="http://groups.google.com/forum/#!forum/methylkit_discussion" target="_blank">Google Groups</a>) with the issues they face in using methylKit. Overall, methylKit can be used with little knowledge of R. Interestingly, working with methylKit also helps laboratory researchers learn R.<br />
<br />
As with other bisulfite sequencing data analysis packages, methylKit takes charge once the bisulfite reads are aligned to the genome. Here are the tasks one can perform using methylKit:<br />
<br />
<ul style="text-align: left;">
<li>Extract methylation information from aligned data from mappers like Bismark</li>
<li>Alternatively, one can read the methylation information of mapped cytosines easily from other mappers like BSMAP or any other bisulfite mapper in a specified format </li>
<li>Filter the covered CpGs by removing those that have excess coverage due to over-amplification/PCR duplication</li>
<li>Calculate methylation status of each CpG covered (or specified regions or tiles across genome) and export them into bed or bedcoverage formats for visualization in a genome browser. methylKit also enables merging of strand coverage.</li>
<li>Enables the consideration of replicates among control and test samples</li>
<li>Calculate differential methylation either at single CpG levels or regions/tiles covered across the control and test samples</li>
<li>Facilitates PCA and cluster analysis to identify the overall relationship among the samples in terms of methylation</li>
<li>Enables annotation of differential methylation across CpG islands/shores and multiple genic regions of interest such as promoters, exons, introns, etc.</li>
</ul>
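To make two of the tasks above concrete (per-CpG methylation status and removal of CpGs with excess coverage), here is a small Python sketch. It is only loosely modelled on what methylKit's coverage filter does with its `lo.count` and `hi.perc` arguments; the nearest-rank percentile, the function names and the toy counts are my own assumptions for illustration, not methylKit code.

```python
import math


def percent_methylation(num_c, num_t):
    """Percent methylation at one CpG from counts of C (methylated) and T (unmethylated) reads."""
    return 100.0 * num_c / (num_c + num_t)


def filter_by_coverage(cpgs, lo_count=10, hi_perc=99.9):
    """Keep CpGs whose coverage is >= lo_count and <= the hi_perc percentile of coverage.

    cpgs: list of (chrom, pos, num_c, num_t)
    """
    coverages = sorted(c + t for _, _, c, t in cpgs)
    if not coverages:
        return []
    # Nearest-rank percentile (an assumption; other percentile definitions exist).
    rank = max(1, math.ceil(hi_perc * len(coverages) / 100))
    hi_cutoff = coverages[rank - 1]
    return [row for row in cpgs if lo_count <= row[2] + row[3] <= hi_cutoff]


# Hypothetical toy counts: the last CpG has suspiciously high coverage.
cpgs = [
    ("chr1", 100, 9, 3),
    ("chr1", 150, 2, 6),
    ("chr1", 200, 5, 15),
    ("chr1", 260, 15, 0),
    ("chr1", 300, 400, 100),
]

kept = filter_by_coverage(cpgs, lo_count=10, hi_perc=80)
for chrom, pos, c, t in kept:
    print(chrom, pos, c + t, round(percent_methylation(c, t), 1))
```

With real data the percentile cut-off would be much closer to 100; the value of 80 is used here only because the toy table has five rows.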
<div>
Any genomic analysis becomes highly customized after a certain number of basic steps. One has to build the customization by combining options from several packages and bridging the gaps by fine-tuning the input and output file formats. methylKit does a fairly good job here by facilitating the coercion of methylKit objects into GenomicRanges objects such as GRanges. This feature enables seamless integration with multiple packages from Bioconductor. In future posts, I will detail some examples and R scripts that help extend methylKit analysis.</div>
</div>
Are Post-bisulfite DNA library preparation kits suitable for RRBS? (published 2014-10-08, updated 2014-10-21)
<div dir="ltr" style="text-align: left;" trbidi="on">
Recently, a colleague asked me if post-bisulfite DNA library preparation kits are suitable for reduced representation bisulfite sequencing (RRBS). In this blog post I share my views on this concept and its applicability.<br />
<br />
The concept of DNA library preparation after bisulfite conversion of DNA was originally introduced by <a href="http://www.zymoresearch.com/epigenetics/dna-methylation/genome-wide-5-mc-analysis/pico-methyl-seq-library-prep-kit" target="_blank">Zymo Research</a> for whole genome Methyl-seq. This is really exciting because the bisulfite reaction is so harsh that about 90% of the library gets fragmented in the traditional protocols and is not amplifiable. The amplification we see is actually from the remaining 10% of the library. Other advantages are listed below:<br />
<ul style="text-align: left;">
<li>Generally, sonication of DNA is performed in a buffer volume of at least 130 µl (the volume of a Covaris micro-tube). After sonication the DNA needs to be purified/concentrated. So, sonication always brings additional purification steps that lead to loss of DNA (purification by columns leads to a minimum loss of 10% of the DNA). Additionally, one has to check the concentration of the DNA and the fragment size before proceeding.</li>
<li>Bisulfite conversion of the whole genome is a harsh reaction that leads to random nicks in the DNA, so the DNA is broken down into fragments. Subjecting the whole genome to bisulfite conversion thus achieves two steps at once: fragmentation and bisulfite conversion. This is advantageous as it avoids sonication and the loss of DNA during the purification steps.</li>
<li>Another advantage is that there is no fragmentation-induced loss of DNA after ligation. In normal Methyl-seq the bisulfite reaction is performed after ligation, which generally fragments the ligated library so that it can no longer be amplified.</li>
</ul>
Before commenting on the applicability of post-bisulfite library preparation to RRBS, it is wise to understand the library preparation steps that follow bisulfite conversion of the whole genome. The following picture explains the methodology (adapted from the Zymo Research <a href="http://www.zymoresearch.com/epigenetics/dna-methylation/genome-wide-5-mc-analysis/pico-methyl-seq-library-prep-kit" target="_blank">webpage</a>)<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe height="480" src="https://drive.google.com/file/d/0BxUByyNRRGP9QWhDcWwxQkhjQjA/preview" width="640"></iframe></div>
<br />
An important step in the above workflow is converting the bisulfite-converted ssDNA into dsDNA. This is achieved in a process akin to cDNA preparation using random oligos. Since the fragmentation induced by the bisulfite reaction is random, random oligos seem the right choice. This protocol should work fine for preparing libraries for whole genome bisulfite sequencing (Methyl-seq).<br />
<br />
To assess the suitability of this workflow for RRBS, let us revisit the basic concepts. RRBS is intended to enrich CpG-rich regions by digesting the DNA with the MspI restriction enzyme. This results in DNA fragments ranging from as short as 40 bp to multiple kilobases in length, all with identical termini. However, we choose fragments in the size range of 40-480 bp, which seem to be a better representation of the CpG-rich regions and promoters. In the traditional RRBS protocol, we perform bisulfite conversion after adapter ligation. So, we amplify the fragments we 'choose' (with some loss during the bisulfite conversion due to random fragmentation of DNA).<br />
<br />
Let us see what happens if we subject the MspI digested DNA to bisulfite reaction prior to library preparation. <br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe height="480" src="https://drive.google.com/file/d/0BxUByyNRRGP9eVJvZTRHYU9kc2c/preview" width="640"></iframe></div>
<ul style="text-align: left;">
<li>DNA is further fragmented into smaller fragments.</li>
<li>This fragmentation will skew the composition of the MspI digested fragments, so the desired size range of 40-480 bp no longer represents only CpG-rich regions. This size range can now contain any region of the genome.</li>
<li>Termini will not be MspI recognition motifs but random nucleotides due to chemical induced fragmentation.</li>
<li>Because of the random fragmentation, even sequencing data from replicates is likely to represent CpGs from different genomic loci reducing the overlap among replicates. </li>
<li>The QC of the RRBS reads is assessed by the 5' termini (CGG/TGG). Now, because of random fragmentation, terminal nucleotides are altered!</li>
<li>Even when random fragmentation doesn't happen, another issue arises during ssDNA to dsDNA conversion. Random oligos work well for randomly fragmented termini. In MspI digested DNA, however, most termini are identical, which means random oligos may not convert the DNA with the same efficiency!</li>
</ul>
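The terminal-nucleotide QC mentioned in the list above can be sketched in a few lines of Python. MspI cuts C^CGG, so reads from a conventional RRBS library start with CGG, or with TGG when the first cytosine was unmethylated and got converted. The function name and the toy reads below are hypothetical; a real check would stream reads from a FASTQ file.

```python
def mspi_start_fraction(reads):
    """Fraction of reads whose 5' end carries the MspI signature (CGG or TGG)."""
    reads = [r.upper() for r in reads if r]
    if not reads:
        return 0.0
    hits = sum(r.startswith(("CGG", "TGG")) for r in reads)
    return hits / len(reads)


# Hypothetical toy reads: three carry the MspI signature, one does not.
reads = ["CGGTTATTTG", "TGGATTTTGA", "ATTGGTTTAG", "CGGCGTTTTT"]
print(mspi_start_fraction(reads))  # 0.75
```

If random bisulfite-induced fragmentation happened before library preparation, one would expect this fraction to drop well below what a conventional RRBS library shows.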
Since there are multiple issues involved, I conclude (though I have not done any experiment yet) that post-bisulfite library preparation kits are not suitable for RRBS. In my view, any company that claims this kit is suitable should do RRBS experiments with replicates before documenting and selling it. I am eager to know if this has worked for RRBS. I would be willing to retract this post then!<br />
</div>
Survey on Indian Research Scholars (published 2014-08-28, updated 2014-10-08)
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: left;">
I recently ran a survey to gather the issues faced by research scholars across India. I posted the link to this survey on the <a href="https://www.facebook.com/groups/researchsholarsofindia/" target="_blank">Facebook page</a> titled "Research Scholars of India". The survey ran for 10 days (16-26 August 2014).</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
The survey received a good response: 85 research scholars from a number of universities and institutions across India took part. Participants were supported by all the major research funding sources in India. In summary, this report is indicative of the grim situation faced by research scholars in India. </div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
Following are the highlights of the survey:</div>
<br />
<ul style="text-align: left;">
<li>The survey had 73% male participation against 23% female. The rest chose not to mention their sex.</li>
<li>90% of the respondents have clarity about their registration for PhD</li>
<li>Nearly 50% of the scholars say that they do not get monthly stipends and are facing administrative issues either at the host institutions or at funding agencies.</li>
<li>More than 60% of scholars indicate that the stipend is barely sufficient to meet their individual needs. </li>
<li>About 80% of the researchers are in distress and not satisfied.</li>
<li>80% of researchers say that they could not meet their financial needs if they were married, and most of them have decided not to get married. A few researchers also mentioned in personal communication that their marriage prospects are next to nothing! </li>
<li>A whopping 98% of researchers are facing a tough life due to either the work environment or insufficient stipends.</li>
</ul>
Those who want to have a look at the survey can do so at this <a href="http://kwiksurveys.com/app/rendersurvey.asp?sid=npcxwdfrfu1w9vn406168" target="_blank">webpage</a>. Suggestions for improving this report are welcome!<br />
<br />
Here is the survey report.</div>
<p style=" margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block;"> <a title="View Survey Results on Scribd" href="https://www.scribd.com/doc/242208413/Survey-Results" style="text-decoration: underline;" >Survey Results</a></p>
<iframe class="scribd_iframe_embed" src="https://www.scribd.com/embeds/242208413/content?start_page=1&view_mode=scroll&show_recommendations=true" data-auto-height="false" data-aspect-ratio="undefined" scrolling="no" id="doc_32974" width="100%" height="600" frameborder="0"></iframe>
</div>
Restriction digestion of eukaryotic genomes in R (published 2014-04-28, updated 2014-10-08)
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
There are multiple desktop tools (BioEdit, EMBOSS, various basic bioinformatic tools) and browser-based tools (<a href="http://tools.neb.com/NEBcutter2/index.php" target="_blank">NEBcutter</a>, <a href="http://biotools.umassmed.edu/tacg4/" target="_blank">biotools</a>, <a href="http://insilico.ehu.es/digest/" target="_blank">In silico restriction digestion</a> and <a href="http://molbiol-tools.ca/Restriction_endonuclease.htm" target="_blank">various other tools</a>) available for performing restriction digestion of smaller prokaryotic genomes or of smaller eukaryotic chromosomes individually. However, I could not find a free desktop or browser-based tool that performs restriction digestion on the whole genome of eukaryotes with larger genomes. Tools like EMBOSS handle this sort of task, but in a primitive way whose output is not directly usable for downstream analysis.<br />
<br />
I have been working on methylation analysis using the RRBS method. This method is based on the restriction digestion pattern of the enzyme MspI on the whole genome. I wanted to perform an <i>in silico</i> MspI digestion of the mouse genome (mm10) to see the pattern of digestion. After scanning the web, I finally settled on the Bioconductor package "Biostrings", which helped me achieve this task. Here, I give the code to perform it. Since the package I used is an R package, it also helped me perform a variety of downstream analyses pretty fast.<br />
<br />
This method is based on the ability of the "Biostrings" package to recognize the MspI restriction site (CCGG) in the mouse genome (the BSgenome.Mmusculus.UCSC.mm10 Bioconductor package loaded into R). The following tasks are performed by the script below:<br />
<ul style="text-align: left;">
<li>Load the needed bioconductor and R packages</li>
<li>Identify the MspI restriction sites (genomic co-ordinates) per chromosome in the genome.</li>
<li>Extract the start and end co-ordinates of the DNA fragments resulting from the genomic digestion (using <i>gaps</i>)</li>
<li>Create a dataframe of the genomic co-ordinates of the digested fragments from each chromosome for easier downstream analysis</li>
<li>Plot the frequency of the length of digested fragments using ggplot2</li>
<li>A reproducible document is also available at <a href="http://rpubs.com/kalyankpy/genomic_digestion" target="_blank">R pubs</a> </li>
</ul>
<script src="https://gist.github.com/kalyankpy/a65d5f8824f00abe04da.js"></script></div>
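As a toy illustration of what the R script above does at genome scale, here is a short Python sketch that digests a single sequence with MspI and tabulates fragment lengths. It is a simplified stand-in, not a port of the gist: it cuts within the site (C^CGG) instead of taking the gaps between sites, and the function names and the example sequence are made up.

```python
import re


def digest(sequence, site="CCGG", cut_offset=1):
    """Return (start, end) pairs (0-based, end-exclusive) of fragments after digestion.

    MspI recognises CCGG and cuts after the first C, hence cut_offset=1.
    """
    seq = sequence.upper()
    # A lookahead pattern reports every site position, including overlapping ones.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length lies within [min_len, max_len]."""
    return [(s, e) for s, e in fragments if min_len <= e - s <= max_len]


# Made-up 16 bp sequence with two MspI sites
toy = "AACCGGTTTTCCGGAA"
fragments = digest(toy)
print([toy[s:e] for s, e in fragments])   # ['AAC', 'CGGTTTTC', 'CGGAA']
print([e - s for s, e in fragments])      # [3, 8, 5]
print(size_select(fragments, 5, 8))       # [(3, 11), (11, 16)]
```

For RRBS one would call `size_select(fragments, 40, 480)` on the fragments of each chromosome; the tiny range above is only there to suit the toy sequence.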
<div class="separator" style="clear: both; text-align: center;">
<iframe src="https://docs.google.com/file/d/0BxUByyNRRGP9bmVDTXloeFRmbXM/preview" width="640" height="480"></iframe>
</div>
<br />
<b>Update on 19th August 2014:</b><br />
I learned that some people tried to run this code and got errors on their side. Issues may crop up when the versions of the libraries change. Users may note that the above code works with the following session information in R.<br />
<span style="font-family: &quot;Courier New&quot;,Courier,monospace; font-size: x-small;">&gt; sessionInfo()<br />R version 3.1.1 (2014-07-10)<br />Platform: x86_64-pc-linux-gnu (64-bit)<br /><br />locale:<br /> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 <br /> [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 <br /> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C <br />[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C <br /><br />attached base packages:<br />[1] parallel stats graphics grDevices utils datasets methods base <br /><br />other attached packages:<br /> [1] scales_0.2.4 plyr_1.8.1 <br /> [3] ggplot2_1.0.0 BSgenome.Mmusculus.UCSC.mm10_1.3.1000<br /> [5] BSgenome_1.32.0 GenomicRanges_1.16.3 <br /> [7] GenomeInfoDb_1.0.2 Biostrings_2.32.1 <br /> [9] XVector_0.4.0 IRanges_1.22.9 <br />[11] BiocGenerics_0.10.0 <br /><br />loaded via a namespace (and not attached):<br /> [1] bitops_1.0-6 colorspace_1.2-4 digest_0.6.4 grid_3.1.1 gtable_0.1.2 <br /> [6] labeling_0.2 MASS_7.3-33 munsell_0.4.2 proto_0.3-10 Rcpp_0.11.2 <br />[11] reshape2_1.4 Rsamtools_1.16.1 stats4_3.1.1 stringr_0.6.2 tools_3.1.1 <br />[16] zlibbioc_1.10.0</span></div>
ENCODE Transcription factor tracks (published 2014-04-08, updated 2014-06-26)
<div dir="ltr" style="text-align: left;" trbidi="on">
I was trying to overlap the differentially methylated CpGs in my data with transcription factor binding sites. I was puzzled to see that the ENCODE consortium, which has spent a huge amount of money on performing experiments and publishing papers, did not pay much attention to providing an easy-to-find quick reference on the tracks and their naming structure. After a lot of browsing, I got a clue about the details of the transcription factor ChIP experiments from this <a href="http://www.biostars.org/p/7347/">webpage</a>.<br />
<br />
In this post I summarize how to download all the tracks and how to interpret their names.<br />
<br />
Transcription factor related ChIP-seq tracks are available for individual download from this <a href="http://genome-test.cse.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeAwgTfbsUniform">link</a>. However, for those wanting to analyse all the tracks, it is painful to download each track and merge them while keeping track of where each row came from. Fortunately, I found that the <a href="http://sartorlab.ccmb.med.umich.edu/node/17">Sartor lab</a> has created a bed file merging all the ENCODE TF tracks. This file also preserves the identity of each row by labelling its source track. It can easily be converted into a 'GRanges' object, either with custom functions in base R or using the <i>import</i> function from the <i>rtracklayer</i> package from Bioconductor.<br />
<br />
Another issue with these tracks is their naming structure. While the consortium ensured that every track name includes all the necessary information, the structure is not documented anywhere that is easy to find. After a few hours of browsing and collecting information, I worked out the structure of the TF track name as follows:<br />
<br />
<ul>
<li>List of tracks and the related metadata for the tracks is available for download from this <a href="http://genome-test.cse.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeAwgTfbsUniform">webpage</a> (click the files.txt link)</li>
<li>The downloaded file is not uniformly tabulated. So, I had to fiddle with it to make it look uniform</li>
<li>Further I extracted only the details that matter to understand the file/track name. You may download this file from <a href="https://drive.google.com/file/d/0BxUByyNRRGP9QzVaS1YzMFNwTGs/edit?usp=sharing">here</a>.</li>
</ul>
<br />
Here I will explain the name of one track as an example:<br />
Track name = <b>wgEncodeAwgTfbsSydhK562CjunIfna6hUniPk</b><br />
Every track name includes the following elements:<br />
<ol style="text-align: left;">
<li>Text common to all: <b>wgEncodeAwgTfbs</b></li>
<li>Laboratories involved in generating the track: <b>Sydh</b> (Labs from Stanford, Yale, USC, Harvard)<br />
</li>
<ul>
<li>Other possible values appearing in the track names are (<b>Haib, UChicago, Uta, UW</b>)</li>
</ul>
<li>Cell line used=<b>K562</b></li>
<ul>
<li>There are over 92 types of cell lines used. So, this part is highly variable. Details of the cell types used are available from this <a href="http://genome-test.cse.ucsc.edu/ENCODE/cellTypes.html">page</a>.</li>
</ul>
<li>Antibody used = <b>c-Jun</b></li>
<ul>
<li>There are over 190 antibodies used (=TFs probed). This is also a variable part of the track name. In some cases, they have provided the catalog number of the antibody purchased.</li>
</ul>
<li>Treatment=<b>ifna6h </b>(Means cells are treated with IFNA for 6 hrs)<br />
</li>
<ul>
<li>This part gives details of the cell treatment. When cells are not subjected to any treatment, this part is absent. Overall, there are 29 possible values at this position. When the treatment is for 36h, it is taken as standard and not mentioned in the name of the track.</li>
</ul>
<li>Algorithm used=UniPk (Uniform peak calling). Common for every track!</li>
</ol>
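The naming scheme described above can be turned into a small parser. The Python sketch below is only as good as its assumptions: it uses the lab codes listed in this post, and it assumes that cell line, antibody and treatment are each written as a single capitalised token (as in the example track), which may not hold for every track name in the metadata file. The second track name in the example is hypothetical.

```python
import re

PREFIX = "wgEncodeAwgTfbs"   # text common to all tracks
SUFFIX = "UniPk"             # uniform peak calling, common to all tracks
LABS = ("Sydh", "Haib", "UChicago", "Uta", "UW")  # lab codes as listed in this post


def parse_track_name(name):
    """Split an ENCODE uniform TFBS track name into its elements."""
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        raise ValueError(f"not a uniform TFBS track name: {name}")
    core = name[len(PREFIX):-len(SUFFIX)]

    lab = next((code for code in LABS if core.startswith(code)), None)
    if lab is None:
        raise ValueError(f"unknown lab code in: {name}")

    # Assumption: each remaining element is one capitalised token, e.g. K562, Cjun, Ifna6h.
    tokens = re.findall(r"[A-Z][a-z0-9]*", core[len(lab):])
    if len(tokens) not in (2, 3):
        raise ValueError(f"cannot split cell/antibody/treatment in: {name}")

    return {
        "lab": lab,
        "cell": tokens[0],
        "antibody": tokens[1],
        "treatment": tokens[2] if len(tokens) == 3 else None,  # absent when untreated
    }


print(parse_track_name("wgEncodeAwgTfbsSydhK562CjunIfna6hUniPk"))
# Hypothetical untreated track name, for illustration only
print(parse_track_name("wgEncodeAwgTfbsHaibGm12878Pol2UniPk"))
```

Names that do not fit these assumptions raise a `ValueError` instead of being split silently, so they can be checked by hand against the metadata table.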
I hope this information is useful for others doing a similar analysis.<br />
<br />
Update (26th June 2014)<br />
<br />
Here are a few more links that give additional information about the ENCODE transcription factor ChIP experiments.<br />
<ol style="text-align: left;">
<li> List of antibodies registered at Data coordination center (DCC), ENCODE is available at <span itemscope="" itemtype="http://schema.org/Answer"><span itemprop="text"> <a href="http://genome.ucsc.edu/ENCODE/antibodies.html">http://genome.ucsc.edu/ENCODE/antibodies.html</a>. This link provides the list of antibodies and their targets probed in ENCODE.</span></span></li>
<li><span itemscope="" itemtype="http://schema.org/Answer"><span itemprop="text">For those interested in the comprehensive list of transcription factors across various genomes this link may be useful. However these are predicted transcription factors and not human curated. <a href="http://www.bioguo.org/AnimalTFDB/index.php">http://www.bioguo.org/AnimalTFDB/index.php</a></span></span></li>
<li><span itemscope="" itemtype="http://schema.org/Answer"><span itemprop="text">For those who want to know about the types of experiments performed in ENCODE, this page may be helpful (<a href="http://www.encodeproject.org/ENCODE/dataMatrix/encodeDataSummaryHuman.html">http://www.encodeproject.org/ENCODE/dataMatrix/encodeDataSummaryHuman.html</a>). It lists the number of experiments performed, including the various ChIP-seq experiments.</span></span></li>
<li><span itemscope="" itemtype="http://schema.org/Answer"><span itemprop="text">Finally FAQs are available at <a href="http://www.encodeproject.org/ENCODE/FAQ/">http://www.encodeproject.org/ENCODE/FAQ/</a> </span></span></li>
<li><span itemscope="" itemtype="http://schema.org/Answer"><span itemprop="text">Human curated list of transcription factors is available at <a href="http://www.tfcat.ca/">http://www.tfcat.ca/</a> . Reference article for the same is at <a href="http://genomebiology.com/2009/10/3/R29">http://genomebiology.com/2009/10/3/R29</a></span></span> </li>
</ol>
</div>
CpG Island shelves (published 2014-03-26)
<div dir="ltr" style="text-align: left;" trbidi="on">
Nowadays, regions with relatively lower CpG density are gaining importance in DNA methylation studies. This is based on the observation that a majority of CpG-rich regions (CpG islands) are non-dynamic and less variable in methylation status across a variety of tissues and cell populations (Irizarry 2009, Ziller 2013). Methylation has been reported to be more dynamic along the CpG shores (up to 2 kb flanking CpG islands) and CpG shelves (up to 2 kb flanking outwards from a CpG shore). While it is easy to retrieve the genomic co-ordinates of CpG islands from the UCSC browser, public resources for retrieving the genomic co-ordinates of shores and shelves were missing or scarce. Here, I am displaying the code (in R, using the Bioconductor package <i>GenomicRanges</i>) I use for generating objects for CpG islands, CpG island shores and CpG island shelves. The resulting objects are GRanges objects, which can be used in multiple downstream applications in Bioconductor packages.
<br />
<br />
<script src="https://gist.github.com/kalyankpy/d6b2896d9fb311f2faae.js"></script>
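For readers who do not use R, the same idea can be roughed out on the command line. The following is only a minimal sketch and not a translation of the gist above. It assumes the islands are in a three-column BED file (here a made-up cpg_islands.bed with two toy intervals) and simply takes the 2 kb on either side of each island as shores and the next 2 kb as shelves. It does not trim flanks that run into a neighbouring island or past the end of a chromosome, which a proper GenomicRanges workflow should handle.

```shell
# Toy CpG island coordinates (chrom, start, end); made-up values for illustration.
workdir=$(mktemp -d)
printf 'chr1\t10000\t11000\nchr1\t1500\t2500\n' > "$workdir/cpg_islands.bed"

# Print the flanking windows on both sides of each island.
# Arguments: near distance, far distance (both from the island edge), label, BED file.
# Upstream coordinates are clamped at 0; empty upstream windows are skipped.
flank() {
    awk -v near="$1" -v far="$2" -v label="$3" 'BEGIN { OFS = "\t" }
    {
        up_start = $2 - far;  if (up_start < 0) up_start = 0
        up_end   = $2 - near; if (up_end < 0)   up_end = 0
        if (up_end > up_start) print $1, up_start, up_end, label
        print $1, $3 + near, $3 + far, label
    }' "$4"
}

flank 0    2000 shore "$workdir/cpg_islands.bed" > "$workdir/cpg_shores.bed"
flank 2000 4000 shelf "$workdir/cpg_islands.bed" > "$workdir/cpg_shelves.bed"

cat "$workdir/cpg_shores.bed" "$workdir/cpg_shelves.bed"
```

The second toy island sits close to the start of the chromosome, so its upstream shore is shortened and its upstream shelf is dropped altogether.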
<br /></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3698146072413007344.post-17103911071516289262014-02-25T22:35:00.001+05:302014-04-28T14:50:47.392+05:30Create a GENCODE transcript database in R<div dir="ltr" style="text-align: left;" trbidi="on">
The following gist will help researchers create a GENCODE transcript database using Bioconductor packages. I am assuming that the packages "GenomicRanges" and "GenomicFeatures" are already installed on the user's computer. The script does the following:<br />
<ul style="text-align: left;">
<li>loads the required Bioconductor packages</li>
<li>explains how to create the intermediate files needed for generating the database</li>
<li>gives a brief explanation of each step in the procedure</li>
<li>creates the transcript database, and saves and loads it when needed</li>
<li>extracts the information for each feature (gene, CDS, transcript, exon, intron, intergenic regions) as a 'GRanges' object, sorted when needed</li>
<li>saves all the extracted features into a combined object that can be loaded in the future</li>
</ul>
<script src="https://gist.github.com/kalyankpy/9fd4d5b13ff4ec3053c4.js"></script>
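If you just want a quick look at what an annotation file contains before building the database in R, a few lines of shell will do. This sketch is separate from the gist above. The three GTF records are made up for illustration (a real GENCODE download has the same nine-column layout), and the function names are my own. It relies only on the standard GTF convention that column 3 holds the feature type.

```shell
workdir=$(mktemp -d)
gtf="$workdir/toy.gtf"

# Three made-up records in GTF layout (9 tab-separated columns).
{
    printf 'chr1\tHAVANA\tgene\t1000\t5000\t.\t+\t.\tgene_id "G1";\n'
    printf 'chr1\tHAVANA\ttranscript\t1000\t5000\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    printf 'chr1\tHAVANA\texon\t1000\t1200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
} > "$gtf"

# Count the records per feature type (column 3), skipping header comments.
feature_counts() {
    grep -v '^#' "$1" | cut -f3 | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Convert one feature type to BED. GTF is 1-based and inclusive while BED is
# 0-based and half-open, so only the start coordinate shifts by one.
gtf_to_bed() {
    awk -F '\t' -v type="$2" 'BEGIN { OFS = "\t" }
        $0 !~ /^#/ && $3 == type { print $1, $4 - 1, $5, $3, ".", $7 }' "$1"
}

feature_counts "$gtf"
gtf_to_bed "$gtf" transcript
```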
</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3698146072413007344.post-61958403096560396802013-10-15T18:53:00.002+05:302014-10-21T13:56:58.392+05:30Reduced Representation Bisulfite Sequencing Data Analysis<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
I am writing this post to help researchers who are trying to analyse their own RRBS data. This is a non-technical explanation; simply following the steps should work on any Ubuntu/Linux system.<br />
Requirements on the computer:<br />
<ol style="text-align: left;">
<li>Bowtie (manual and source are available <a href="http://bowtie-bio.sourceforge.net/index.shtml">here</a>)</li>
<li>Human genome in FASTA format (download hg19.2bit from <a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/">here</a> and convert it to FASTA with the UCSC twoBitToFa utility)</li>
<li>Cutadapt (manual and installation instructions are detailed <a href="http://code.google.com/p/cutadapt/">here</a>)</li>
<li>Trim Galore (the Perl script is available <a href="http://www.bioinformatics.babraham.ac.uk/projects/download.html#trim_galore">here</a>)</li>
<li>Bismark (an excellent user guide is available <a href="http://www.bioinformatics.babraham.ac.uk/projects/bismark/Bismark_User_Guide.pdf">here</a>)</li>
<li>methylKit (a useful R package for downstream analysis, available on this <a href="http://code.google.com/p/methylkit/">google code page</a>)</li>
<li>Install all of the above scripts/programs in directories that are on your PATH, or add their locations to PATH with export.</li>
</ol>
</div>
<script src="https://gist.github.com/kalyankpy/768cfd9b0196252e5834.js"></script>
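To give a feel for how the tools listed above fit together, here is a rough outline of the order in which they are typically run. It is a sketch and not a copy of the gist. The file names (sample.fastq, the genome folder, the alignment file in step 5) are placeholders, only a few widely used options are shown, and by default the script only prints the commands; set DRY_RUN=0 to actually run them. The exact output file names, and whether Bismark calls Bowtie or Bowtie 2, depend on the versions you have installed, so check them before running the later steps.

```shell
DRY_RUN=${DRY_RUN:-1}               # 1 = only print the commands
GENOME_DIR=${GENOME_DIR:-genome}    # placeholder folder that will hold hg19.fa
READS=${READS:-sample.fastq}        # placeholder single-end RRBS reads

# Print the command, and run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

rrbs_pipeline() {
    # 1. Convert the UCSC 2bit file to FASTA (twoBitToFa is a UCSC utility).
    run twoBitToFa hg19.2bit "$GENOME_DIR/hg19.fa"
    # 2. Build the bisulfite-converted genome index (needed once per genome).
    run bismark_genome_preparation "$GENOME_DIR"
    # 3. Adapter and quality trimming; --rrbs applies the RRBS-specific trimming.
    run trim_galore --rrbs "$READS"
    # 4. Align the trimmed reads; the trimmed file name is an assumption.
    run bismark "$GENOME_DIR" "${READS%.fastq}_trimmed.fq"
    # 5. Extract methylation calls (-s = single-end); replace the placeholder
    #    with the alignment file that Bismark reports in step 4.
    run bismark_methylation_extractor -s --comprehensive ALIGNMENT_FILE_FROM_STEP_4
}

rrbs_pipeline
```

Printing the commands first is a cheap way to catch a wrong path before starting a long alignment run.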
</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3698146072413007344.post-46623902441797435112013-05-04T23:28:00.002+05:302014-03-22T18:37:13.550+05:30Survival tips for biologists to navigate on linux <div dir="ltr" style="text-align: left;" trbidi="on">
As a biologist, I have noticed that there are few resources that teach programming or Linux commands from a biologist's point of view. Here I try to summarize the basic Linux commands needed to navigate the folder structure, get help and carry out other basic tasks. The information given below is just enough for survival. For detailed learning you may refer to other sources available on the internet.<br />
<br />
Basic Linux tasks and the related commands for survival (commands are written in brackets):<br />
1. Getting help on any command (man) <br />
<script src="https://gist.github.com/kalyankpy/ceaec1ae975e96b93b93.js"></script>
2. Creating directories (mkdir)<br />
<script src="https://gist.github.com/kalyankpy/d388921c7bae5547865f.js"></script>
3. Changing directories (cd)<br />
<script src="https://gist.github.com/kalyankpy/7a18b7ec1dfa85ca2f40.js"></script>
4. Removing file/s or directory (rm)<br />
<script src="https://gist.github.com/kalyankpy/ec7f57b257f7282bd8ea.js"></script>
5. Copying file/s or directory (cp)<br />
<script src="https://gist.github.com/kalyankpy/1cff22d37bb7d41a4be8.js"></script>
6. Moving file/s or directory to another location (mv)<br />
<script src="https://gist.github.com/kalyankpy/ee0165ab12edde687ba0.js"></script>
7. Renaming a file or a list of files (rename)<br />
<script src="https://gist.github.com/kalyankpy/6a3a1800846b64da29f2.js"></script>
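If you want to try the commands from this list without risking your real files, you can practise in a throwaway folder. The sketch below is only an illustration. All the file and folder names are made up, everything happens inside a temporary directory created with mktemp, and the renaming step uses a plain mv loop because the rename command takes different arguments on different distributions.

```shell
# man cp                                  # opens the manual page for cp; press q to quit

sandbox=$(mktemp -d)                      # throwaway practice folder
cd "$sandbox"

mkdir -p project/raw project/results      # create nested directories
cd project                                # move into the project folder
touch raw/sample1.txt raw/sample2.txt     # make two empty practice files

cp raw/sample1.txt results/               # copy a file
cp -r raw raw_backup                      # copy a whole directory (-r = recursive)
mv results/sample1.txt results/control.txt   # mv moves and also renames
rm raw_backup/sample2.txt                 # remove a single file
rm -r raw_backup                          # remove a directory and its contents

# Rename a list of files: change the .txt extension to .tsv
for f in raw/*.txt; do
    mv "$f" "${f%.txt}.tsv"
done

ls raw results
```

Remember that rm does not move anything to a trash folder, so it is worth double-checking the path before pressing Enter.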
</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3698146072413007344.post-79104773292621808482013-03-20T16:46:00.000+05:302013-10-15T18:38:23.453+05:30Scope of Reduced Representation Bisulfite Sequencing Data<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-size: large;">The RRBS method explores methylation status across the genome, but only at specific locations defined by the MspI recognition sites. These sites are mostly located within CpG islands. So how can we visualize the scope of RRBS data, that is, the regions of the human genome for which we can expect methylation information?</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The following graphic shows the location of the CpG islands on the human genome. These are the most likely locations for RRBS data sampling.</span><br />
<span style="font-size: large;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: large;"><a href="http://3.bp.blogspot.com/-Plsahj0U39g/UUmYMNr5cZI/AAAAAAAAGpM/NINa5KHO8IA/s1600/cpg_location_chromwise_hg19.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-Plsahj0U39g/UUmYMNr5cZI/AAAAAAAAGpM/NINa5KHO8IA/s1600/cpg_location_chromwise_hg19.jpeg" height="453" width="640" /></a></span></div>
<span style="font-size: large;">This is a karyogram view of the CpG islands on the human chromosomes. Chromosomes are plotted relative to their size. Each CpG island is denoted by a single dot at its relative position on the chromosome. CpG islands close to each other are perceived by the human eye as a connected line (hundreds of closely placed dots). The picture also shows that certain parts of some chromosomes lack CpG islands (chr1, chr13, chr14, chr15). This could be due to incomplete mapping of the human genome on these chromosomes.</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">I made this graphic using ggbio and ggplot2.</span></div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3698146072413007344.post-56009263022451726872013-03-20T15:57:00.000+05:302013-03-21T03:12:35.479+05:30 Methylation Status of CpG Islands Across Human Genome<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: left;">
<span style="font-size: large;">Researchers are aware that the majority of CpG islands are unmethylated. How can this fact be represented in a graphic? </span></div>
<div style="text-align: left;">
<span style="font-size: large;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: large;"><a href="http://1.bp.blogspot.com/-BfcZHIk_PbE/UUmJ5Jphs-I/AAAAAAAAGo8/5uNL1_cEoUw/s1600/Rplot.tiff" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="" border="0" src="http://1.bp.blogspot.com/-BfcZHIk_PbE/UUmJ5Jphs-I/AAAAAAAAGo8/5uNL1_cEoUw/s1600/Rplot.tiff" height="453" title="Methylation Status of CpG Islands" width="640" /></a></span></div>
<div style="text-align: left;">
<span style="font-size: large;">The above picture, created with ggplot2, shows how CpG methylation across the CpG islands is distributed for each chromosome. Each CpG island is shown as a single dot (.). The methylation level of each CpG island is indicated by the color gradient. The picture shows that most of the CpG islands are unmethylated (overlapping dots appear as a line on the left side of the picture). Sparsely methylated CpG islands can be identified as blue dots on the right side. Click the picture for a larger view.</span></div>
<div style="text-align: left;">
</div>
</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3698146072413007344.post-20642472199049525762010-04-09T00:49:00.001+05:302013-10-15T18:38:32.771+05:30Starting again!Hey! I have changed my laptop. I now have a brand new Acer 4740. It has the i3 330M processor with 1033 MHz FSB (no other model on the market was this fast as of 16th Mar 2010). I got it without an OS installed so that I could experiment with the system.<br />
<br />
I tried installing Ubuntu 9.10 today and faced a list of peculiar problems, which I managed to solve. I am writing these notes to help others troubleshoot Ubuntu on the Acer 4740.<br />
<br />
Problems: <br />
<ol><li>Graphics are not compatible with the i3 processor (the monitor goes blank)
<ul><li>Troubleshoot by updating the kernel to the most recent version</li>
<li>Change the driver in the X11 configuration file from 'vesa' to 'intel'</li>
</ul></li>
<li>Empathy doesn't support video chat
<ul><li>Install the latest version of telepathy. If that does not solve the problem, try installing lucview</li>
</ul></li>
<li>You won't find a standalone application for taking photos with the built-in webcam
<ul><li>Say <i>cheese</i>. I mean, install this application.</li>
</ul></li>
<li>Mic doesn't work
<ul><li>Install a newer version of the ALSA driver (www.alsa-project.org)</li>
<li>While installing the ALSA driver, the installation might stop suddenly with the error 'patch command not found'</li>
<li>Just install patch and run the installation again. It's done.</li>
</ul></li>
</ol>
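The driver change from the first problem can also be made from a terminal. This is a hedged sketch and not necessarily the exact steps for every setup. It works on a made-up sample file in a temporary folder so nothing on your system is touched, and it assumes the Device section contains a line of the form Driver "vesa". On a real system the file is usually /etc/X11/xorg.conf; back it up first and edit it with sudo.

```shell
workdir=$(mktemp -d)
conf="$workdir/xorg.conf"      # stand-in for /etc/X11/xorg.conf

# Made-up minimal Device section that uses the generic vesa driver.
cat > "$conf" <<'EOF'
Section "Device"
    Identifier "Configured Video Device"
    Driver "vesa"
EndSection
EOF

cp "$conf" "$conf.bak"         # always keep a backup before editing

# Swap the driver name; the indentation and the rest of the file stay untouched.
sed 's/^\([[:space:]]*Driver[[:space:]]*\)"vesa"/\1"intel"/' "$conf.bak" > "$conf"

grep 'Driver' "$conf"
```

If the screen stays blank after the change, the backup copy makes it easy to go back to the vesa driver.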
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3698146072413007344.post-78222035752947265992010-01-26T11:49:00.001+05:302010-01-26T12:17:20.976+05:30Trying to get away from piracy!I have finished my PhD now and am not doing much work in the lab. I thank my boss for not putting any pressure on me regarding lab work. So what am I doing nowadays? This is the best time for me to try and satisfy my long-standing wishes: to create a website for myself and to learn Linux and computing skills. I have got the Tata Photon+ connection, which is faster than the internet we are provided in the lab. So I am now completely dependent on it for everything I am doing nowadays!<br />
<br />
I shall detail my experience with Ubuntu 9.10 (Karmic Koala). I have decided that I should not use any pirated software once I leave India. To make that happen I need to learn the alternatives: the Linux OS and OpenOffice! I decided to try Ubuntu, which I got from a colleague. I started installing it on my Travelmate (1.5 GHz Pentium M 715 processor, 1.25 GB RAM). I was impressed with the speed of the installation: just 45 min. In this short span (compare it with a Windows installation, which takes about 1 to 1.5 hours on a computer with a similar configuration) it installed all the software needed for day-to-day use. The list includes OpenOffice (a potential rival to MS Office), a PDF reader, audio and video players, a lot of games and Firefox.<br />
<br />
I was charmed by the speed and ease of an installation that includes almost everything. There is a lot to tell about Linux, and Ubuntu in particular. I will continue in my next post.Unknownnoreply@blogger.comtag:blogger.com,1999:blog-3698146072413007344.post-56246345220461173342010-01-13T18:46:00.002+05:302010-01-15T14:44:22.401+05:30<div style="text-align: justify;">Hi! I am Kalyan. I have finished my PhD and am searching for a post-doctoral position. I wish to be part of an interdisciplinary group working at the interface of molecular biology, cell biology and bioinformatics/computational biology. I prefer a university/institutional environment that will give me an opportunity to interact with young people and expose me to innovative and creative minds. I wish to be part of a research team that demands independent design and execution of experiments, problem-solving ability, and written and oral presentation skills. I do have a flair for teaching! If you are looking for my website please <a href="http://www.kalyankpy.in/">click here</a> or type <b><a href="http://www.kalyankpy.in/">www.kalyankpy.in</a> </b>in the address bar.<br />
</div>Unknownnoreply@blogger.com0