
Introduction

The purpose of this guide is to give a consolidated and authoritative overview of the data from OSD 2015. After reading this guide, you should:

Overview of data sets

Primers

In 2015 we started using a new primer pair, designated alma-alma (as distinct from the original pair, designated osd-osd). The 16S osd-osd primer pair is described in Caporaso et al., 2012 (see Supplementary Material). The 16S alma-alma primer pair is described in Parada et al., 2016. The 18S osd-osd primer pair is based on Stoeck et al., 2010.

PLEASE NOTE: the sequence of the 18S reverse primer was modified by adding an extra TGA triplet at the 3' end. An overview table is available below.

| ribosomal subunit | designation | direction | label | sequence | reference |
|---|---|---|---|---|---|
| 16S | alma | forward | 515F-Y | 5'-GTGYCAGCMGCCGCGGTAA-3' | Parada et al., 2016 |
| 16S | alma | reverse | 926R | 5'-CCGYCAATTYMTTTRAGTTT-3' | Parada et al., 2016 |
| 16S | osd | forward | 515F | 5'-GTGCCAGCMGCCGCGGTAA-3' | Caporaso et al., 2012 |
| 16S | osd | reverse | 806R | 5'-GGACTACHVGGGTWTCTAAT-3' | Caporaso et al., 2012 |
| 18S | osd | forward | TAReuk454FWD1 | 5'-CCAGCASCYGCGGTAATTCC-3' | Stoeck et al., 2010 |
| 18S | osd | reverse | TAReukREV3_modified | 5'-ACTTTCGTTCTTGATYRATGA-3' | modified after Stoeck et al., 2010 |
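
The primer sequences above contain IUPAC ambiguity codes (for example Y, M, R, H, V, W), so one primer stands for several concrete sequences. The following is a minimal sketch, not part of the OSD workflow, of how such a primer could be expanded into a regular expression and how its degeneracy could be counted. The function names `primer_to_regex` and `degeneracy` are hypothetical and used only for illustration.

```python
import re
from math import prod

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def primer_to_regex(primer: str) -> re.Pattern:
    """Compile a primer with IUPAC codes into a regex (hypothetical helper)."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        # Unambiguous bases stay literal; ambiguous ones become a character class.
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))


def degeneracy(primer: str) -> int:
    """Number of concrete sequences the primer represents (hypothetical helper)."""
    return prod(len(IUPAC[base]) for base in primer.upper())


# Forward primers from the overview table.
FWD_ALMA = "GTGYCAGCMGCCGCGGTAA"  # 515F-Y
FWD_OSD = "GTGCCAGCMGCCGCGGTAA"   # 515F

if __name__ == "__main__":
    print(degeneracy(FWD_ALMA))  # 4: Y and M each allow two bases
    print(degeneracy(FWD_OSD))   # 2: only M is ambiguous
```

The two 16S forward primers differ only at the fourth position (Y in 515F-Y, C in 515F), so every sequence matched by 515F is also matched by 515F-Y, but not the other way round.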

16S datasets

18S datasets

Technical samples

Several blanks, a staggered community sample, and an even community sample were sequenced with the 16S alma-alma primer pair as controls.

| sample label | community |
|---|---|
| TEC0_2015-06_1_16S_alma-alma | evencom |
| TEC0_2015-06_2_16S_alma-alma | stagcom |
| TEC0_2014-06_1_16S_alma-alma | blank |
| TEC0_2014-06_2_16S_alma-alma | DNA Kit 1 |
| TEC0_2014-06_3_16S_alma-alma | DNA Kit 2 |
| TEC0_2014-06_4_16S_alma-alma | DNA Kit 3 |

NOTE: The technical samples labelled ‘TEC0_…’ were included in most (but not all) SILVAngs analyses. Please consult the section SILVAngs Analysis for more detail.

Sample labeling

All samples described here are labelled using a new labeling scheme that is independent of the sample metadata and therefore differs from the labeling used in 2014. The reason is that in 2014 many metadata changes were made after the sequence data had already been pre-processed and even analysed, which made correcting the labels extremely difficult. Once the metadata for 2015 is finally curated, we will provide a mapping between the new and the old labeling schemes. The new labeling scheme is as follows:

${campaign_name}${site_id|kit_number}_${campaign_date}_${artificial_number}_${dataset_name}_${primer_pair_name}

Where:

NOTE: Minor deviations from the scheme are possible in some file names and SILVAngs analysis results. This means that additional information might be included in the sample labels, for example a '16S/18S' notation or a 'qc.filt' suffix (denoting quality filtering and additional length filtering; see Sequence Data Pre Processing).
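
As an illustration of the labeling scheme, the sketch below splits a label such as `TEC0_2015-06_1_16S_alma-alma` into its fields. It is not an official OSD tool. It assumes that the fields are separated by underscores, that the campaign name consists of letters, and that the site id or kit number is the run of digits that follows it. These assumptions hold for the technical sample labels listed in this guide but are not guaranteed elsewhere. The names `SampleLabel` and `parse_label` are hypothetical, and labels that deviate from the scheme (see the note above) are rejected rather than guessed at.

```python
import re
from typing import NamedTuple


class SampleLabel(NamedTuple):
    """Fields of a sample label, named after the scheme placeholders."""
    campaign_name: str
    site_id_or_kit_number: str
    campaign_date: str
    artificial_number: str
    dataset_name: str
    primer_pair_name: str


# Assumption: campaign name is letters, site id / kit number is digits.
_PREFIX = re.compile(r"([A-Za-z]+)(\d+)")


def parse_label(label: str) -> SampleLabel:
    """Split a label that follows the scheme exactly (hypothetical helper)."""
    fields = label.split("_")
    if len(fields) != 5:
        raise ValueError(f"expected 5 underscore-separated fields: {label!r}")
    prefix, date, number, dataset, primer_pair = fields
    match = _PREFIX.fullmatch(prefix)
    if match is None:
        raise ValueError(f"cannot split campaign name and site id: {prefix!r}")
    return SampleLabel(match.group(1), match.group(2),
                       date, number, dataset, primer_pair)


if __name__ == "__main__":
    print(parse_label("TEC0_2015-06_1_16S_alma-alma"))
```

Raising an error on unexpected labels is a deliberate choice in this sketch: because the scheme allows minor deviations, silently mis-assigning fields would be harder to notice than a failed parse.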

Sequence data

Access

All sequence data sets (both raw and workable) are available here:

All sequence data will be submitted for long-term archival to the European Nucleotide Archive (ENA) (see ENA umbrella project PRJEB5129 and/or OSD 2014 Data Guide), once the manual curation of the contextual data is finalized (see section Contextual Data for more detail).

Pre-processing

The pre-processing was done using the same workflow as in 2014, with minor modifications; see Sequence Data Pre Processing.

Contextual data

The contextual data for OSD 2015 and MyOSD 2015 is currently being manually curated.

SILVAngs analysis

The SILVAngs analysis was carried out with SILVA version 123.1 as a reference. The results can be found here: