About DMS MaP experiments


Dimethyl sulfate (DMS) has emerged as one of the pre-eminent chemical probes for RNA structure determination. DMS can be added to cells, tissues, or in vitro solution, and it rapidly and specifically modifies solvent-accessible adenines and cytosines at their Watson–Crick base-pairing positions. Under standard experimental conditions, an accessible nucleotide has a ~2% chance of reacting with DMS, which results in multiple DMS modifications per single RNA molecule. The DMS modifications therefore report on the folding of each individual RNA molecule.
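To make the "multiple modifications per molecule" point concrete, here is a small back-of-the-envelope sketch. It assumes that every accessible A or C reacts independently with probability 0.02 (the approximate figure quoted above) and uses a hypothetical RNA with 150 accessible sites. Real reactivities vary from site to site, so the numbers are illustrative only.

```python
from math import comb


def binomial_pmf(n_sites: int, p_react: float, k: int) -> float:
    """Probability that exactly k of n_sites independent sites are modified."""
    return comb(n_sites, k) * p_react**k * (1.0 - p_react) ** (n_sites - k)


def expected_modifications(n_sites: int, p_react: float) -> float:
    """Mean number of modifications per molecule under the binomial model."""
    return n_sites * p_react


def prob_at_least(n_sites: int, p_react: float, k_min: int) -> float:
    """Probability that a molecule carries k_min or more modifications."""
    return 1.0 - sum(binomial_pmf(n_sites, p_react, k) for k in range(k_min))


if __name__ == "__main__":
    n_sites = 150   # hypothetical number of accessible A/C positions
    p_react = 0.02  # ~2% per-site reaction probability (from the text)
    print("expected modifications per molecule:",
          expected_modifications(n_sites, p_react))
    print("P(two or more modifications):",
          round(prob_at_least(n_sites, p_react, 2), 3))
```

Under these assumptions the model gives an average of about three modifications per molecule, and roughly 80% of molecules carry two or more.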

DMS mutational profiling with sequencing (DMS-MaPseq) encodes DMS modifications as mismatches that are incorporated during reverse transcription by a thermostable group II intron reverse transcriptase (TGIRT). Due to the high fidelity of TGIRT, the background incorporation of mismatches is typically lower than sequencing error. Thus, the observed rate of a mismatch at a given nucleotide is directly proportional to its DMS reactivity. A major advantage of DMS-MaPseq is that any RNA of interest can be targeted for library generation and analysis using sequence-specific primers.
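The link between mismatches and reactivity can be sketched in a few lines. The function below is a simplified illustration, not the DREEM implementation: it assumes reads are already aligned without insertions or deletions, and the input format (a 0-based start position plus the read sequence) is a hypothetical convenience.

```python
from typing import List, Optional, Tuple


def mismatch_rates(reference: str,
                   aligned_reads: List[Tuple[int, str]]) -> List[Optional[float]]:
    """Per-position mismatch rate = mismatches / coverage.

    reference      reference sequence (same alphabet as the reads)
    aligned_reads  (start, sequence) pairs; start is the 0-based reference
                   position of the first read base (hypothetical format)
    Positions with no coverage are reported as None.
    """
    reference = reference.upper()
    mismatches = [0] * len(reference)
    coverage = [0] * len(reference)
    for start, read in aligned_reads:
        for offset, base in enumerate(read.upper()):
            pos = start + offset
            if pos >= len(reference):
                break  # read runs past the end of the reference
            if base == "N":
                continue  # ambiguous base calls carry no information
            coverage[pos] += 1
            if base != reference[pos]:
                mismatches[pos] += 1
    return [m / c if c else None for m, c in zip(mismatches, coverage)]


if __name__ == "__main__":
    reads = [(0, "GACT"), (0, "GTCT"), (1, "ACT")]
    print(mismatch_rates("GACT", reads))
```

Positions with a high mismatch rate are candidates for accessible, DMS-reactive nucleotides; positions near the background rate are likely protected.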

About the MaP analysis


The MaP analysis web tool provides a simple platform for analyzing the DMS reactivity of an RNA. The user inputs are a raw sequencing file (.fastq) generated from a DMS-MaPseq experiment and the sequence of the RNA of interest (.fasta). The DREEM algorithm performs sequence alignment using Bowtie2 and outputs the mismatch rate per nucleotide.

Here we give a brief explanation of what is being done under the hood. The code is freely available and can be downloaded.

Each run generates three directories: input, output, and log.

FastQC
First, FastQC is run. FastQC generates a detailed breakdown of the quality of the supplied fastq files.

fastqc --extract fq1 fq2 --outdir=output/Mapping_Files/

Here we run fastqc at the command line, where fq1 and fq2 are the two supplied fastq files. If only one is supplied, we leave out the fq2 argument. We recommend supplying both the forward and reverse reads if available, as they give additional confidence in nucleotide identity.
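The optional second file can be handled when the command is assembled. The helper below is a minimal sketch of that logic; the function name and its default output directory are assumptions chosen to mirror the command shown above, not code taken from the pipeline.

```python
from typing import List, Optional


def build_fastqc_command(fq1: str,
                         fq2: Optional[str] = None,
                         outdir: str = "output/Mapping_Files/") -> List[str]:
    """Return the fastqc argument list; fq2 is left out for single-end data."""
    cmd = ["fastqc", "--extract", fq1]
    if fq2 is not None:
        cmd.append(fq2)  # reverse reads, only when they were supplied
    cmd.append(f"--outdir={outdir}")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_fastqc_command("fq1", "fq2")))
    print(" ".join(build_fastqc_command("fq1")))
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where fastqc is installed.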

FastQC generates an HTML report for each supplied fastq file (see the image below).

Trim Galore!
Trim Galore is a wrapper script that trims sequencing adapters; here the adapter is the first 13 bp of the Illumina standard adapters ('AGATCGGAAGAGC'). We use it to remove adapter sequence so that reads are not misaligned because of leftover adapter bases.

trim_galore --fastqc --paired fq1 fq2 -o output/Mapping_Files/

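Here we run Trim Galore at the command line, where fq1 and fq2 are the two supplied fastq files. --paired tells Trim Galore that the two files are mates, and --fastqc reruns FastQC on the trimmed reads.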
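To show what adapter trimming does to a single read, here is a deliberately simplified sketch. It is not how Trim Galore works internally: the real tool tolerates sequencing errors and also trims by quality, whereas this version only looks for an exact copy of the adapter, or an exact adapter prefix hanging off the 3' end of the read. The min_overlap parameter is a hypothetical name introduced for this illustration.

```python
ILLUMINA_ADAPTER = "AGATCGGAAGAGC"  # first 13 bp of the Illumina standard adapters


def trim_adapter(read: str,
                 adapter: str = ILLUMINA_ADAPTER,
                 min_overlap: int = 3) -> str:
    """Remove the adapter and everything 3' of it (exact matches only)."""
    # Case 1: the complete adapter occurs somewhere inside the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends partway through the adapter, so only an
    # adapter prefix is present at the 3' end. Try the longest prefix first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read  # no adapter found


if __name__ == "__main__":
    print(trim_adapter("ACGTACGTAGATCGGAAGAGCTTTT"))
    print(trim_adapter("ACGTACGTAGATCG"))
```

Requiring a minimum overlap avoids clipping genuine read bases that happen to match the first one or two adapter bases by chance.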

Bowtie2
Bowtie2 is a sequence aligner. It is used to align each Illumina sequencing read to the target reference sequences supplied in the fasta file.

There are two discrete steps required to run Bowtie2. First, bowtie2-build creates an index of the reference sequences in the supplied fasta file. This index is required for aligning. The command below puts the index files in the input/ directory:

bowtie2-build test.fasta input/test

Next, we run the Bowtie2 alignment command. Bowtie2 has many options; the ones below are those that give us the best results.

bowtie2 --local --no-unal --no-discordant --no-mixed -X 1000 -L 12 -x input/test -1 fq1 -2 fq2 -S output/Mapping_Files/aligned.sam

--local runs Bowtie2 in local mode, which allows a read to match only part of a reference sequence.
--no-unal keeps reads that do not align to any of the reference sequences out of the final sequence alignment file (SAM).
--no-discordant removes read pairs that do not align concordantly. The Bowtie2 manual gives a full definition of concordant pairs.
--no-mixed prevents Bowtie2 from aligning the two mates individually when no concordant alignment of the pair is possible.
-X 1000 sets the maximum fragment length for a valid paired-end alignment to 1000 nt, which allows for a gap between the two mates.
-L 12 sets the seed length to 12 nt.
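The two Bowtie2 steps above can be assembled programmatically. This is a minimal sketch that only reproduces the commands shown in this section for paired-end input; the function names and keyword arguments are hypothetical, and nothing here reflects how the server itself launches Bowtie2.

```python
from typing import List


def build_index_command(fasta: str, index_prefix: str) -> List[str]:
    """Step 1: index the reference sequences with bowtie2-build."""
    return ["bowtie2-build", fasta, index_prefix]


def build_align_command(index_prefix: str, fq1: str, fq2: str, sam_out: str,
                        max_fragment: int = 1000,
                        seed_length: int = 12) -> List[str]:
    """Step 2: paired-end local alignment with the options listed above."""
    return [
        "bowtie2",
        "--local",            # a read may match part of a reference
        "--no-unal",          # leave unaligned reads out of the SAM file
        "--no-discordant",    # drop pairs that do not align concordantly
        "--no-mixed",         # do not fall back to aligning mates separately
        "-X", str(max_fragment),  # maximum fragment length for a valid pair
        "-L", str(seed_length),   # seed length
        "-x", index_prefix,
        "-1", fq1,
        "-2", fq2,
        "-S", sam_out,
    ]


if __name__ == "__main__":
    print(" ".join(build_index_command("test.fasta", "input/test")))
    print(" ".join(build_align_command(
        "input/test", "fq1", "fq2", "output/Mapping_Files/aligned.sam")))
```

Keeping each flag on its own line makes it easy to see, and to change, exactly which alignment settings a run used.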


How to predict a secondary structure with MaP results?


Each job generates a structural constraint file for each sequence in the supplied fasta. These files end with "_struc_constraint.txt". You can use the RNAstructure software package (https://rna.urmc.rochester.edu/RNAstructure.html) to predict a secondary structure from a sequence and this file. If you have the software installed, run:

Fold -m 3 test.fasta -dms test_struc_constraint.txt out.ct
ct2dot out.ct out.db

Here, test.fasta contains a single sequence of interest, and test_struc_constraint.txt is the structural constraint file produced by the job.
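Once a predicted structure is available in dot-bracket notation, the base pairs can be recovered by matching parentheses. The parser below is a minimal sketch for downstream analysis: it takes the dot-bracket string itself rather than reading out.db, and it only understands '(' and ')', treating every other character as unpaired, so pseudoknot brackets are ignored. The function name is hypothetical.

```python
from typing import List, Tuple


def pairs_from_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return sorted 0-based (i, j) base pairs encoded by '(' and ')'."""
    open_positions: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            # The most recent unmatched '(' pairs with this ')'.
            pairs.append((open_positions.pop(), i))
        # Any other character ('.', etc.) is treated as unpaired.
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(pairs_from_dot_bracket("((..))"))
```

The resulting pair list can be compared with the per-nucleotide mismatch rates, since paired A and C positions are expected to show low DMS reactivity.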

You can also use the RNAstructure web server:
Upload your fasta file containing a single sequence to the field "Select Sequence File:"
Upload the generated constraint file to the field "Select SHAPE Constraints File:". This works even though your data come from DMS rather than SHAPE.

Who to contact


For questions about DREEM, MaP analysis, and DMS-MaPseq, please contact Silvi Rouskin (silvi@hms.harvard.edu).
For questions about the server, please contact Joseph Yesselman (jyesselm@unl.edu).