Sam’s Notebook: Bioinformatics – Trimmomatic/FASTQC on C.gigas Larvae OA NGS Data

Below is a from Sam’s Lab Notebook and provides some insight into some of our genomic focused ocean acidification studies.

Previously trimmed the first 39 bases of sequence from reads from the BS-Seq data in an attempt to improve our ability to map the reads back to the C.gigas genome. However, Mac (and Steven) noticed that the last ~10 bases of all the reads showed a steady increase in the %G, suggesting some sort of bias (maybe adaptor??):

Although I didn’t mention this previously, the figure above also shows an odd “waves” pattern that repeats in all bases except for G. Not sure what to think of that…

Quick summary of actions taken (specifics are available in Jupyter notebook below):

  • Trim first 39 bases from all reads in all raw sequencing files.
  • Trim last 10 bases from all reads in raw sequencing files
  • Concatenate the two sets of reads (400ppm and 1000ppm treatments) into single FASTQ files for Steven to work with.

Raw sequencing files:

Notebook Viewer: 20150521_Cgigas_larvae_OA_Trimmomatic_FASTQC

Jupyter (IPython) notebook: 20150521_Cgigas_larvae_OA_Trimmomatic_FASTQC.ipynb

 

jQuery(function($){
var target_height = $(window).height();
$(“iframe.iframe-class”).height(target_height);
});


http://ift.tt/1IPAeqw

 

Output files

Trimmed, concatenated FASTQ files
20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz

 

FASTQC files
20150521_trimmed_2212_lane2_400ppm_GCCAAT_fastqc.html
20150521_trimmed_2212_lane2_400ppm_GCCAAT_fastqc.zip

 

Example of FASTQC analysis pre-trim:

 

 

FASTQC post-trim:

 

Trimming has removed the intended bad stuff (inconsistent sequence in the first 39 bases and rise in %G in the last 10 bases). Sequences are ready for further analysis for Steven.

However, we still see the “waves” pattern with the T, A and C. Additionally, we still don’t know what caused the weird inconsistencies, nor what sequence is contained therein that might be leading to that. Will contact the sequencing facility to see if they have any insight.

from Sam’s Notebook http://ift.tt/1IPAeGQ

Growth and survival in a changing environment: Combined effects of moderate hypoxia and low pH on juvenile bivalve Macoma balthica

Visit this link to see original document. See Emma’s “Acidification” bookmarks for other recent papers of interest.

Sam’s Notebook: Bioinformatics – Trimmomatic/FASTQC on C.gigas Larvae OA NGS Data

Below is a from Sam’s Lab Notebook and provides some insight into some of our genomic focused ocean acidification studies.

In another troubleshooting attempt for this problematic BS-seq Illumina data, I’m going to use Trimmomatic to remove the first 39 bases of each read. This is due to the fact that even after the previous quality trimming with Trimmomatic, the first 39 bases still showed inconsistent quality:

 

Ran Trimmomatic on just a single data set to try things out: 2212_lane2_CTTGTA_L002_R1_001.fastq.gz

Notebook Viewer: 20150506_Cgigas_larvae_OA_trimmomatic_FASTQC

Jupyter (IPython) notebook: 20150506_Cgigas_larvae_OA_trimmomatic_FASTQC.ipynb

jQuery(function($){
var target_height = $(window).height();
$(“iframe.iframe-class”).height(target_height);
});


http://ift.tt/1H0KeKd

Results:

Trimmed FASTQ: 20150506_trimmed_2212_lane2_CTTGTA_L002_R1_001.fastq.gz

FASTQC Report: 20150506_trimmed_2212_lane2_CTTGTA_L002_R1_001_fastqc.html

You can see how flat the newly trimmed data is (which is what one would expect).

Steven will take this trimmed dataset and try additional mapping with it to see if removal of the first 39 bases will improve the mapping.

 

from Sam’s Notebook http://ift.tt/1F7759s