`r knitr::opts_chunk$set(cache=TRUE)` Comparison of RNA- and LNA-Hybrid Oligonucleotides in Template-Switching Reactions ================================================================================== This supplementary file documents the commands run to compare the expression levels of CAGE clusters in libraries made with different template-switching oligonucleotides. It is also intended as a self-executable tutorial exemplifying how to make differential gene expression analysis with nanoCAGE in general. Table of contents ----------------- * [Data download and preparation](#data-prep) * [Artifact cleaning and alignment of the reads](#artifact-cleaning-and-alignment) * [Tag clustering](#clustering) * [Annotation](#annotation) * [Preparation for statistical analysis](#r-prep) * [Differential representation analysis](#differential-expression) * [LNA vs. RNA](#LNA-RNA) * [RNA vs. DNA](#RNA-DNA) * [DNA vs. LNA](#DNA-LNA) * [Output of the results as tables](#make-tables) * [Notes on the software](#notes-on-software) Data download and preparation --------------------------------------------------- The following commands are run in a command-line terminal on Unix systems. The nanoCAGE library is named `NCms10010`. To better re-use commands, this name is put in an environment variable. ```{r export_library, engine="bash"} export LIBRARY=NCms10010 ``` ### Information and download The `NCms10010` library was made with total RNA from rat muscle. It is comparing template-switching oligonucleotides that end in RNA (r), DNA (d) or LNA (l) bases. The comparisons were multiplexed in triplicates. The data is a single-end MiSeq run (ID: `121012_M00528_0022_AMS2003997-00050`) of 4,682,200 reads. The output files (`s_G1_L001_R1_001.fastq.gz` and `s_G1_L001_R1_002.fastq.gz` were decompressed, concatenated, and re-compressed with `xz`. The resulting file is available for download from the supplementary material, and from RIKEN. Deposition to DDBJ is in progress. ```{r download_material, engine="bash", dependson='export_library'} wget http://genome.gsc.riken.jp/plessy-20130430/$LIBRARY.fastq.xz echo '47b797f36cd20d6548e80e09dc05daa1 NCms10010.fastq.xz' | md5sum -c ``` ### De-multiplexing The barcodes and sample IDs are associated in a whitespace-delimited file called `NCms10010.id` containing the following. The sample IDs encode the type of template-switching oligonucleotide used, which each letter representing the chemical nature of the third, second and first nucleotide, from the 3′ end. ```{r create_ID_file, engine="bash"} cat <<__ID__ > $LIBRARY.id rrr_1 CACTGA rrr_2 GCTCTC rrr_3 TCGCGT ddd_1 ATCGTG ddd_2 CACGAT ddd_3 GTATAC ddl_1 ACAGAT ddl_2 CTGACG ddl_3 GAGTGA dll_1 AGTAGC dll_2 GCTGCA dll_3 TCGAGC lll_1 ATCATA lll_2 CGATGA lll_3 TATAGC __ID__ ``` Samples were demultiplexed with [FASTX-toolkit](http://hannonlab.cshl.edu/fastx_toolkit/). The number of extracted reads was collected in a file named `NCms10010.extracted.log`. ```{r demultiplex, engine="bash", dependson=c('download_material','create_ID_file')} xzcat $LIBRARY.fastq.xz | fastx_barcode_splitter.pl --bcfile $LIBRARY.id --prefix $LIBRARY. --suffix .fq --bol --exact | sed 1d | cut -f1,2 | perl -ne 'print "extracted\t$_"' | grep -v -e unmatched -e total | tee $LIBRARY.extracted.log ``` ### Trimming reads to 31 nt The reads start with 6 bases of barcode, 8 bases of random fingerprint, 4 bases of spacer, and 3 base of linker, that are all removed in the following command. The reads were also trimmed in 3′ to ensure that the results were comparable with a reference HiSeq run (not covered in this document). ```{r trim_reads, engine="bash", dependson='demultiplex'} for FASTQ in *.fq do fastx_trimmer -f 22 -l 52 -Q33 < $FASTQ | sponge $FASTQ done ``` Artifact cleaning and alignment of the reads ------------------------------------------------------------------------------------------ ### Removal of artifacts with TagDust Download [TagDust](http://bioinformatics.oxfordjournals.org/content/25/21/2839) and install it in the user's path. ```{r tagdust, engine="bash", dependson='trim_reads'} cat > tagdust.fa <<__TagDust__ >TS (before barcode) TAGTCGAACTGAAGGTCTCCAGCA >RT (without random bases) TAGTCGAACTGAAGGTCTCCGAACCGCTCTTCCGATCT >empty (TS linker + RT reverse-complemented) TATAGGGAGATCGGAAGAGCGGTTCGGAGACCTTCAGTTCGACTA __TagDust__ for ID in $( awk '{print $1}' $LIBRARY.id ) do echo -ne "tagdust\t$ID\t" tagdust tagdust.fa N*.$ID.fq -o $LIBRARY.$ID.dusted.fastq 2>&1 | grep -e rejected | cut -f1 done | tee $LIBRARY.tagdust.log ``` ### Alignment on the rat genome version 4 The following assumes the genome downloaded and indexed for BWA in the current directory, using `rn4_male` as a base name. ```{r align, engine="bash", dependson='tagdust'} GENOME=rn4_male for FQ in *dusted.fastq do bwa aln -t8 $GENOME -f $(basename $FQ .dusted.fastq).sai $FQ bwa samse $GENOME $(basename $FQ .dusted.fastq).sai $FQ | samtools view -uS - | samtools sort - $(basename $FQ .dusted.fastq) done ``` ### Filter reads aligning to rDNA with rRNAdust `rRNAdust` is available in the supplementary material at . Download the reference rRNA sequences at and , and save them in a file called `rat_rDNA.fa`. ```{r rrnadust, engine="bash", dependson='align'} for ID in $( awk '{print $1}' $LIBRARY.id ) do echo -ne "rdna\t$ID\t" (rRNAdust -t8 rat_rDNA.fa $LIBRARY.$ID.bam | samtools view -bS - 2> /dev/null | sponge $LIBRARY.$ID.bam) 2>&1 | sed 's/Excluded: //' done | tee $LIBRARY.rdna.log ``` ### Alignment statistics ```{r align_stats, engine="bash", dependson='rrnadust'} for ID in $( awk '{print $1}' $LIBRARY.id ) do echo -ne "mapped\t$ID\t" (samtools flagstat $LIBRARY.$ID.bam | grep mapped | grep %) | cut -f1 -d' ' done | tee $LIBRARY.mapped.log ``` ### Filter out possible strand-invasion tags Strand invasion was described in [Tang et al, Nucl. Acids Res. (2013) 41 (3):e44](http://nar.oxfordjournals.org/content/41/3/e44), and is more frequent in LNA- or DNA-based template-switching oligonucleotides. The following commands need an updated version of the `find_strand_invasion.pl` that was in Tang et al's supplementary material, where a new `-f` option is added, with the same semantics as in `samtools`. It is available at . ```{r strand_invasion, engine="bash", dependson='rrnadust'} ERRORS=2 for BAM in $LIBRARY.???_?.bam do find_strand_invasion.pl -f $BAM -g rn4_male.fa -e $ERRORS -s TATA done for ID in $( awk '{print $1}' $LIBRARY.id ) do echo -ne "strand-invasion-$ERRORS\t$ID\t" (samtools flagstat ${LIBRARY}.${ID}_nw_${ERRORS}_??????_removed_sorted.bam | grep mapped | grep %) | cut -f1 -d' ' done | tee $LIBRARY.strand-invasion-$ERRORS.log ``` Tag clustering --------------------------------------- Level 1 clusters are single-nucleotide resolution data representing the 5′ ends of the CAGE tags. Level 2 clusters are groups of level 1 clusters that are separated by 20 or less nucleotides. The `level1.py` and `level2.py` scripts implement tag clustering like in the [FANTOM3](http://fantom.gsc.riken.jp/3/) and [FANTOM4](http://fantom.gsc.riken.jp/4/) projects, and are available at . They output their results in Order Switchable Column ([OSC](http://sourceforge.net/projects/osctf/)) format, where each line is a cluster, and each library gives one column counting the tags in the clusters, and another column where the counts are normalised in parts per million. See also [Carninci et al., Nature Genetics 38 626-635 (2006)](http://www.nature.com/ng/journal/v38/n6/full/ng1789.html) and [Suzuki et al., Nature Genetics 41 553-562 (2009)](http://www.nature.com/ng/journal/v41/n5/full/ng.375.html) for original examples of CAGE tag clustering. ```{r tag_clustering, engine="bash", dependson='strand_invasion'} level1.py -o $LIBRARY.l1.osc.gz -F 516 \ NCms10010.???_?.bam \ NCms10010.???_?_nw_?_??????_filtered_sorted.bam level2.py -o $LIBRARY.l2.osc -t 0 $LIBRARY.l1.osc.gz gzip $LIBRARY.l2.osc ``` The resulting file NCms10010.l1.osc.gz can be loaded in the [Zenbu](http://fantom.gsc.riken.jp/zenbu/) system to browse the alignments on the rat genome. Annotation ----------------------------------- ### Preparation of the reference files. Data from ENSEMBL 69 were retrieved via [BioMart](http://oct2012.archive.ensembl.org/biomart/martview/), with the following XML query. Note that `external_gene_id` is called _Associated Gene Name_ in the web interface. ``` ``` Convert coordinates to gene names ```{r convert_biomart, engine="bash"} cat mart_export.txt | sed -e 1d -e 's|\t\t|\tno_symbol\t|' | awk '{OFS="\t"} {print $3, $6, $7, $5, 0, $4}' | grep -v ^[JA] | uniq | sed -e 's/-1$/-/' \ -e 's/1$/+/' \ -e 's/^/chr/' | sort -k1,1 -k2,2n \ > rn4_male.symbols.bed ### Gene symbols ```{r gene_symbols, engine="bash", dependson='convert_biomart'} zcat $LIBRARY.l1.osc.gz | grep -v \# | sed 1d | awk '{OFS="\t"}{print $2, $3, $4, "l2", "1000", $5}' > $LIBRARY.l1.bed zcat $LIBRARY.l2.osc.gz | grep -v \# | sed 1d | awk '{OFS="\t"}{print $2, $3, $4, "l2", "1000", $5}' > $LIBRARY.l2.bed zcat $LIBRARY.l2.osc.gz | grep -v \# | sed 1d | awk '{OFS="\t"}{print $2, $3, $4, "l2", "1000", $5}' > $LIBRARY.l2.bed bedtools intersect -a $LIBRARY.l2.bed -b rn4_male.symbols.bed -s -loj | awk '{OFS="\t"}{print $1":"$2"-"$3$6,$10}' | bedtools groupby -g 1 -c 2 -o distinct > $LIBRARY.l2.genes ``` ### Repeated elements Download the repeatmasker track from the [UCSC genome browser](http://genome.ucsc.edu) and save it in a file called `rn4_male.repeatmasker.bed`. ```{r intersect_rmsk, engine="bash"} bedtools intersect -a $LIBRARY.l2.bed -b rn4_male.repeatmasker.bed -s -loj | awk '{OFS="\t"}{print $1":"$2"-"$3$6,$10}' | bedtools groupby -g 1 -c 2 -o distinct > $LIBRARY.l2.rmsk ``` Preparation for statistical analysis --------------------------------------------------------- The following commands are run in the [`R` package for statistical computing](http://www.r-project.org/). ### Load the data. The oscR library is available at . The following commands load the level 1 and 2 clusters into data frames where the column names correspond to the sample IDs defined above. ```{r load_data, dependson='tag_clustering'} LIBRARY <- system("echo $LIBRARY", intern=TRUE) library(oscR) l1 <- read.osc(paste(LIBRARY,'l1','osc', 'gz',sep='.'), drop.coord=T, drop.norm=T) l2 <- read.osc(paste(LIBRARY,'l2','osc', 'gz',sep='.'), drop.coord=T, drop.norm=T) colnames(l1) <- sub('raw.NCms1.....','',colnames(l1)) colnames(l2) <- sub('raw.NCms1.....','',colnames(l2)) colnames(l1) <- sub('_......_filtered_sorted', '', colnames(l1)) colnames(l2) <- sub('_......_filtered_sorted', '', colnames(l2)) ``` ### Organise the data. The following commands defined convenient shortcuts to manipulates groups of libraries. The presence of `nw_2` in the names indicate that strand-invasion artifacts have been removed. ```{r define_aliases} ddd <- c('ddd_1', 'ddd_2', 'ddd_3') ddl <- c('ddl_1', 'ddl_2', 'ddl_3') dll <- c('dll_1', 'dll_2', 'dll_3') lll <- c('lll_1', 'lll_2', 'lll_3') rrr <- c('rrr_1', 'rrr_2', 'rrr_3') all <- c(rrr, lll, dll, ddl, ddd) ddd_nw_2 <- c('ddd_1_nw_2', 'ddd_2_nw_2', 'ddd_3_nw_2') ddl_nw_2 <- c('ddl_1_nw_2', 'ddl_2_nw_2', 'ddl_3_nw_2') dll_nw_2 <- c('dll_1_nw_2', 'dll_2_nw_2', 'dll_3_nw_2') lll_nw_2 <- c('lll_1_nw_2', 'lll_2_nw_2', 'lll_3_nw_2') rrr_nw_2 <- c('rrr_1_nw_2', 'rrr_2_nw_2', 'rrr_3_nw_2') all_nw_2 <- c(rrr_nw_2, lll_nw_2, dll_nw_2, ddl_nw_2, ddd_nw_2) ``` ```{r calculate_means_and_SDs, dependson='load_data'} TPM <- function(clusters){ clusters.tpm <- data.frame(prop.table(as.matrix(clusters),2) * 1000000) colnames(clusters.tpm) <- colnames(clusters) return(clusters.tpm) } L2 <- TPM(l2) L2.means <- data.frame( ddd = apply(L2[,ddd],1,mean), ddl = apply(L2[,ddl],1,mean), dll = apply(L2[,dll],1,mean), lll = apply(L2[,lll],1,mean), rrr = apply(L2[,rrr],1,mean) ) L2.means_nw_2 <- data.frame( ddd = apply(L2[,ddd_nw_2],1,mean), ddl = apply(L2[,ddl_nw_2],1,mean), dll = apply(L2[,dll_nw_2],1,mean), lll = apply(L2[,lll_nw_2],1,mean), rrr = apply(L2[,rrr_nw_2],1,mean) ) L2.sd <- data.frame( ddd = apply(L2[,ddd],1,sd), ddl = apply(L2[,ddl],1,sd), dll = apply(L2[,dll],1,sd), lll = apply(L2[,lll],1,sd), rrr = apply(L2[,rrr],1,sd) ) ``` ### Annotation of the results ```{r load_gene_symbols, dependson=c('gene_symbols','intersect_rmsk')} genesymbols <- read.table(paste(LIBRARY,'l2','genes',sep='.'), col.names=c("cluster","symbol")) rownames(genesymbols) <- genesymbols$cluster genesymbols$rmsk <- read.table(paste(LIBRARY,'l2','rmsk',sep='.'), col.names=c("cluster","rmsk"))[,'rmsk'] ``` Differential representation analysis ------------------------------------------------------------------------- Statistical comparisons using [edgeR](http://www.bioconductor.org/packages/release/bioc/html/edgeR.html). ```{r load_edgeR, message=FALSE, dependson='load_data'} library(edgeR) ``` The following plots represent: * the multidimensional scaling of the samples, where spatial separation between the two sets of triplicates indicates that the factor that is compared (type of template-switching oligonucleotide) introduces more differences that the simple technical fluctuations, * the expression levels of the CAGE clusters as a M-A plot, where dots in red are clusters significantly enriched in one type of libraries. Vertical distance from the horizontal midline represent the amplitude of the differences, and distance on the horizontal axis represents the average strength of expression. The following comparisons show the difference (or lack of it) between non-filtered and filtered data, and then explore the filtered data in more details. ### LNA vs. RNA ```{r LNA_vs_RNA, dev='png'} x <- DGEList(counts=l2[,c(lll,rrr)], group=c(rep('lll',3),rep('rrr',3)), remove.zeros=TRUE) x <- calcNormFactors(x) x <- estimateCommonDisp(x) x <- estimateTagwiseDisp(x) x.com <- exactTest(x) lr <- x lr.com <- x.com plotMDS(lr) plotSmear(lr.com, de.tags=rownames(lr.com)[decideTestsDGE(lr.com) != 0], cex=0.8, main='LNA / RNA', ylab="LNA (bottom) / RNA (top)") x <- DGEList(counts=l2[,c(lll_nw_2,rrr_nw_2)], group=c(rep('lll_nw_2',3),rep('rrr_nw_2',3)), remove.zeros=TRUE) x <- calcNormFactors(x) x <- estimateCommonDisp(x) x <- estimateTagwiseDisp(x) x.com <- exactTest(x) lr_nw_2 <- x lr_nw_2.com <- x.com plotMDS(lr_nw_2) plotSmear(lr_nw_2.com, de.tags=rownames(lr_nw_2.com)[decideTestsDGE(lr_nw_2.com) != 0], cex=0.8, main='LNA / RNA (filtered)', ylab="LNA (bottom) / RNA (top)") ``` ```{r compare_LNA_RNA_numbers, results="hide"} lr_nw_2.up <- sum(decideTestsDGE(lr_nw_2.com) > 0) lr_nw_2.down <- sum(decideTestsDGE(lr_nw_2.com) < 0) ``` `r lr_nw_2.up` clusters were enriched and `r lr_nw_2.down` were depleted in RNA libraries compared to LNA. The top 100 stronger fold changes in each direction are shown below. ```{r compare_LNA_RNA, dependson=c('load_gene_symbols' ,'LNA_vs_RNA,'), echo=-1} options(width=150) # Summary of the top 100 clusters enriched in RNA libraries. summary(merge(subset(topTags(lr_nw_2.com, Inf)$table, logFC > 0)[1:100,], genesymbols[,-1], by=0, sort=FALSE)) # Top 15 clusters enriched in RNA libraries. merge(subset(topTags(lr_nw_2.com, Inf)$table, logFC > 0)[1:15,], genesymbols[,-1], by=0, sort=FALSE) # Summary of the top 100 clusters enriched in LNA libraries. summary(merge(subset(topTags(lr_nw_2.com, Inf)$table, logFC < 0)[1:100,], genesymbols[,-1], by=0, sort=FALSE)) ``` Positive fold change indicate enrichment in RNA libraries. The 7SLRNA hits are concentrated at the top of the list. LNA libraries are enriched for hits on CAGAGA repeats, even beyond the significance level (FDR) of the statistical comparison. ```{r srprna, dependson='load_gene_symbols', dev='svg'} library(reshape) library(ggplot2) srprna <- rownames(subset(genesymbols, rmsk=="7SLRNA")) srprna.expression <- melt(L2[srprna, c(rrr_nw_2, ddd_nw_2, lll_nw_2)], measure.vars=c(rrr_nw_2, ddd_nw_2, lll_nw_2)) for (group in c('ddd','lll','rrr')) srprna.expression[grep(group,srprna.expression$variable),'group'] <- group srprna.expression$group <- reorder(factor(srprna.expression$group), srprna.expression$value, function(x) sum(x) * -1) qplot(data=srprna.expression, value, reorder(variable, value, sum), xlab="Parts per million", ylab="Library", main="Expression levels of 7SL RNA genes", col=group) ``` The measured expression of 7SL RNA is strongest in RNA libraries, strong in DNA libraries, and weak in LNA libraries. ### RNA vs. DNA ```{r RNA_vs_DNA, dev='png'} x <- DGEList(counts=l2[,c(rrr,ddd)], group=factor(c(rep("rrr", 3), rep("ddd", 3)), levels=c("rrr", "ddd")), remove.zeros=TRUE) x <- calcNormFactors(x) x <- estimateCommonDisp(x) x <- estimateTagwiseDisp(x) x.com <- exactTest(x) rd <- x rd.com <- x.com plotMDS(x) plotSmear(rd.com, de.tags=rownames(rd.com)[decideTestsDGE(rd.com) != 0], cex=0.8, main='RNA / DNA', ylab="RNA (bottom) / LNA (top)") x <- DGEList(counts=l2[,c(rrr_nw_2,ddd_nw_2)], group=factor(c(rep("rrr_nw_2", 3), rep("ddd_nw_2", 3)), levels=c("rrr_nw_2", "ddd_nw_2")), remove.zeros=TRUE) x <- calcNormFactors(x) x <- estimateCommonDisp(x) x <- estimateTagwiseDisp(x) x.com <- exactTest(x) plotMDS(x) rd_nw_2 <- x rd_nw_2.com <- x.com plotSmear(rd_nw_2.com, de.tags=rownames(rd_nw_2.com)[decideTestsDGE(rd_nw_2.com) != 0], cex=0.8, main='RNA / DNA (filtered)', ylab="RNA (bottom) / DNA (top)") ``` ```{r compare_RNA_DNA_numbers, echo=FALSE} rd_nw_2.up <- sum(decideTestsDGE(rd_nw_2.com) > 0) rd_nw_2.down <- sum(decideTestsDGE(rd_nw_2.com) < 0) ``` `r rd_nw_2.down` clusters were enriched and `r rd_nw_2.up` were depleted in RNA libraries compared to DNA. The majority of the clusters in the top 100 enriched in the RNA libraries did not overlap with repeat elements, and were overlapping with loci having gene symbols. In contrast, the majority of the top 100 clusters enriched in the DNA libraries did not overlap with known genes. A mild enrichment for GGGTG simple repeats is noted. ```{r compare_RNA_DNA, dependson=c('load_gene_symbols' ,'RNA_vs_DNA,')} # Summary of the top 100 clusters enriched in DNA libraries. summary(merge(subset(topTags(rd_nw_2.com, Inf)$table, logFC > 0)[1:100,], genesymbols[,-1], by=0, sort=FALSE)) # Summary of the top 100 clusters enriched in RNA libraries. summary(merge(subset(topTags(rd_nw_2.com, Inf)$table, logFC < 0)[1:100,], genesymbols[,-1], by=0, sort=FALSE)) ``` ### DNA vs. LNA ```{r DNA_vs_LNA, dev='png'} x <- DGEList(counts=l2[,c(ddd,lll)], group=c(rep('ddd',3),rep('lll',3)), remove.zeros=TRUE) x <- calcNormFactors(x) x <- estimateCommonDisp(x) x <- estimateTagwiseDisp(x) x.com <- exactTest(x) dl <- x dl.com <- x.com plotMDS(x) plotSmear(dl.com, de.tags=rownames(dl.com)[decideTestsDGE(dl.com) != 0], cex=0.8, main='DNA / LNA', ylab="DNA (bottom) / LNA (top)") x <- DGEList(counts=l2[,c(ddd_nw_2,lll_nw_2)], group=c(rep('ddd_nw_2',3),rep('lll_nw_2',3)), remove.zeros=TRUE) x <- calcNormFactors(x) x <- estimateCommonDisp(x) x <- estimateTagwiseDisp(x) x.com <- exactTest(x) plotMDS(x) dl_nw_2 <- x dl_nw_2.com <- x.com plotSmear(dl_nw_2.com, de.tags=rownames(dl_nw_2.com)[decideTestsDGE(dl_nw_2.com) != 0], cex=0.8, main='DNA / LNA (filtered)', ylab="DNA (bottom) / LNA (top)") ``` ```{r compare_DNA_LNA_numbers, echo=FALSE} dl_nw_2.up <- sum(decideTestsDGE(dl_nw_2.com) > 0) dl_nw_2.down <- sum(decideTestsDGE(dl_nw_2.com) < 0) ``` `r dl_nw_2.up` clusters were enriched and `r dl_nw_2.down` were depleted in LNA libraries compared to DNA. After filtering out strand-invasion artifacts, only few significant differences remain between the DNA and LNA libraries. 7SLRNA was also depleted in LNA libraries, and CAGAGA repeats were enriched. ```{r compare_DNA_LNA, dependson=c('load_gene_symbols' ,'DNA_vs_LNA,')} summary(decideTestsDGE(dl_nw_2.com)) summary(merge(subset(topTags(dl_nw_2.com, Inf)$table, logFC > 0)[1:100,], genesymbols[,-1], by=0, sort=FALSE)) summary(merge(subset(topTags(dl_nw_2.com, Inf)$table, logFC < 0)[1:100,], genesymbols[,-1], by=0, sort=FALSE)) ``` ```{r save_workspace, cache=FALSE} save.image('analysis.RData') ``` Output of the results as tables -------------------------------------------------------- ```{r make-tables, dependson=c('compare_LNA_RNA', 'compare_RNA_DNA', 'compare_DNA_LNA')} # One table per list of significantly over-represented clusters. write.csv(file='R-L.csv', merge(subset(topTags(lr_nw_2.com, Inf)$table, logFC > 0 & FDR < 0.1), genesymbols[, -1], by = "row.names", all.x='T', sort = FALSE), row.names=FALSE) write.csv(file='L-R.csv', merge(subset(topTags(lr_nw_2.com, Inf)$table, logFC < 0 & FDR < 0.1), genesymbols[, -1], by = "row.names", all.x='T', sort = FALSE), row.names=FALSE) write.csv(file='D-R.csv', merge(subset(topTags(rd_nw_2.com, Inf)$table, logFC > 0 & FDR < 0.1), genesymbols[, -1], by = "row.names", all.x='T', sort = FALSE), row.names=FALSE) write.csv(file='R-D.csv', merge(subset(topTags(rd_nw_2.com, Inf)$table, logFC < 0 & FDR < 0.1), genesymbols[, -1], by = "row.names", all.x='T', sort = FALSE), row.names=FALSE) write.csv(file='L-D.csv', merge(subset(topTags(dl_nw_2.com, Inf)$table, logFC > 0 & FDR < 0.1), genesymbols[, -1], by = "row.names", all.x='T', sort = FALSE), row.names=FALSE) write.csv(file='D-L.csv', merge(subset(topTags(dl_nw_2.com, Inf)$table, logFC < 0 & FDR < 0.1), genesymbols[, -1], by = "row.names", all.x='T', sort = FALSE), row.names=FALSE) # One summary table combining all the results. l2[rownames(topTags(lr_nw_2.com, Inf)), "LR.logFC"] <- topTags(lr_nw_2.com, Inf)$table$logFC l2[rownames(topTags(lr_nw_2.com, Inf)), "LR.FDR"] <- topTags(lr_nw_2.com, Inf)$table$FDR l2[rownames(topTags(rd_nw_2.com, Inf)), "RD.logFC"] <- topTags(rd_nw_2.com, Inf)$table$logFC l2[rownames(topTags(rd_nw_2.com, Inf)), "RD.FDR"] <- topTags(rd_nw_2.com, Inf)$table$FDR l2[rownames(topTags(dl_nw_2.com, Inf)), "DL.logFC"] <- topTags(dl_nw_2.com, Inf)$table$logFC l2[rownames(topTags(dl_nw_2.com, Inf)), "DL.FDR"] <- topTags(dl_nw_2.com, Inf)$table$FDR l2$symbol <- genesymbols$symbol l2$rmsk <- genesymbols$rmsk write.csv(file=paste(LIBRARY, "DGE", "csv", sep = "."), l2) ``` Notes on the software ----------------------------------------------------- This analysis was done on a iMac with a i7 hyperthreaded quad-core CPU (2.93 GHz) and 12 GiB of memory, running [Debian](http://www.debian.org) system, with the following packages installed. - bedtools 2.17.0-1 - bwa 0.6.2-2 - fastx-toolkit 0.0.13.2-1 - moreutils 0.47 (for the `sponge` command) - r-base (see below) - r-bioc-edgeR (see below) - samtools 0.1.19-1 This tutorial was made with the [knitr](http://yihui.name/knitr/) library for `R`, that produces HTML pages from templates containing executable code. ```{r session_info, cache=FALSE} sessionInfo() ```