
Introduction

The purpose of this guide is to give a consolidated and authoritative overview of the data from MyOSD 2016. After reading this guide, you should:

- have an overview of which data from MyOSD 2016 is available and how to access it
- understand how the data was generated
- be able to correctly use and interpret the data

Overview of Datasets

Primers

The 16S alma-alma primer pair is described in Parada et al., 2016.

ribosomal subunit	designation	direction	label	sequence	reference
16S	alma	forward	515F-Y	5’-GTGYCAGCMGCCGCGGTAA-3’	Parada et al., 2016
16S	alma	reverse	926R	5’-CCGYCAATTYMTTTRAGTTT-3’	Parada et al., 2016
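Both primers contain IUPAC ambiguity codes (Y = C/T, M = A/C, R = A/G), so each one stands for a small pool of concrete oligonucleotides. The following is a minimal illustrative Python sketch, not OSD code, that expands a degenerate primer into its variants; the names `IUPAC`, `PRIMERS` and `expand_degenerate` are hypothetical helpers introduced here.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences from the table above, written 5'->3' without the 5'-/-3' markers.
PRIMERS = {
    "515F-Y": "GTGYCAGCMGCCGCGGTAA",
    "926R": "CCGYCAATTYMTTTRAGTTT",
}


def expand_degenerate(seq: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    # One list of allowed bases per position; the Cartesian product
    # enumerates all combinations.
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for label, seq in PRIMERS.items():
        variants = expand_degenerate(seq)
        print(f"{label}: {len(variants)} variants")
```

Running it reports 4 variants for 515F-Y (one Y, one M) and 16 for 926R (two Y, one M, one R); the number of variants is simply the product of the number of bases each position allows.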

16S datasets

- MYOSD 2016 (alma-alma)
- OSD 2016 (alma-alma)
- One sample from OSD 2015 ()

Technical samples

Blanks

Several blank samples were sequenced with the 16S alma-alma primer pair as controls. DNA was extracted using DNA extraction kits, and for each kit one extraction was performed on sterile water to check that the kit was not contaminated. These blank samples are labelled as follows:

For MYOSD 2016, any sample whose label starts with MYOSD0 (i.e. site_id=0, see section Sample labeling) is a blank, except for the technical samples listed below under Others.
For OSD 2016, any sample whose label starts with OSD0 (i.e. site_id=0, see section Sample labeling) is a blank, except for the technical samples listed below under Others.

Others

sample label	comment / description
MYOSD0_2016-06_1	Control sampled in compliance with the MyOSD sampling protocol
MYOSD0_2016-06_2	First contamination step: syringe and filter placed on the ground
MYOSD0_2016-06_3	Intense contamination step: filter and syringe placed on the ground, and Sterivex and syringe openings touched without gloves
MYOSD0_2016-06_10	Blu Tak experiment
MYOSD0_2016-06_11	Blu Tak experiment
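The blank rule above, combined with the exceptions in this table, can be written as a small classifier. This is a minimal Python sketch under stated assumptions: labels follow the Sample labeling scheme (with or without the dataset and primer suffix), site_id is numeric, and only 2016 labels are passed in. `LABEL_PREFIX`, `OTHER_TECHNICAL` and `classify` are hypothetical names.

```python
import re

# Label prefix following the Sample labeling section:
#   ${campaign_name}${site_id}_${campaign_date}_${artificial_number}[_...]
# Assumption: site_id is a run of digits; anything after the number is ignored.
LABEL_PREFIX = re.compile(r"(MYOSD|OSD)(\d+)_(\d{4}-\d{2})_(\d+)(?:_|$)")

# Technical samples listed in the "Others" table; these are NOT blanks.
OTHER_TECHNICAL = {
    "MYOSD0_2016-06_1",
    "MYOSD0_2016-06_2",
    "MYOSD0_2016-06_3",
    "MYOSD0_2016-06_10",
    "MYOSD0_2016-06_11",
}


def classify(label: str) -> str:
    """Return 'blank', 'other_technical' or 'sample' for a 2016 sample label."""
    m = LABEL_PREFIX.match(label)
    if m is None:
        raise ValueError(f"unrecognised sample label: {label!r}")
    campaign, site_id, date, number = m.groups()
    if site_id != "0":
        return "sample"
    # Rebuild the core label from parsed fields so that artificial_number 10
    # is not mistaken for 1 (a plain startswith() test would confuse them).
    core = f"{campaign}{site_id}_{date}_{number}"
    return "other_technical" if core in OTHER_TECHNICAL else "blank"
```

For example, `classify("MYOSD0_2016-06_10_16S_alma-alma")` gives `"other_technical"`, `classify("MYOSD0_2016-06_4_16S_alma-alma")` gives `"blank"`, and a hypothetical `MYOSD42_2016-06_1_16S_alma-alma` gives `"sample"`. Parsing the artificial_number matters here because `MYOSD0_2016-06_10` starts with `MYOSD0_2016-06_1`.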

Sample labeling

All samples described here are labelled using the same labeling scheme as the 2015 datasets. This scheme is independent of the sample metadata and therefore differs from the labeling used for the 2014 datasets.

The current labeling scheme is as follows:

${campaign_name}${site_id|kit_number}_${campaign_date}_${artificial_number}_${dataset_name}_${primer_pair_name}

Where:

campaign_name is one of:
- OSD
- MYOSD
site_id is the stable OSD site id or the MYOSD kit number
campaign_date is the date of the campaign in the format ‘YYYY-MM’ (e.g. 2015-06)
artificial_number counts samples from the same campaign site on the same campaign_date, starting at 1
dataset_name is one of:
- 16S for 16S sequence data
primer_pair_name is the designation of the primer pair (see section Primers); one of:
- alma-alma
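The scheme can be captured in a few lines of code for building and validating labels. This is a minimal Python sketch assuming that site_id is numeric and that only the values listed above are allowed; `SampleLabel` and `LABEL_RE` are illustrative names. It accepts only the canonical form, so it rejects shortened labels (such as those in the Others table) and file names carrying extra suffixes.

```python
import re
from dataclasses import dataclass

# Allowed values as listed in the Sample labeling section.
CAMPAIGNS = ("OSD", "MYOSD")
DATASETS = ("16S",)
PRIMER_PAIRS = ("alma-alma",)


def _alternatives(values):
    """Regex alternation matching exactly one of the given literals."""
    return "|".join(re.escape(v) for v in values)


# Assumptions: site_id is numeric, the month is 01-12, and
# artificial_number starts at 1 (no leading zeros).
LABEL_RE = re.compile(
    rf"(?P<campaign>{_alternatives(CAMPAIGNS)})"
    r"(?P<site>\d+)_"
    r"(?P<date>\d{4}-(?:0[1-9]|1[0-2]))_"
    r"(?P<number>[1-9]\d*)_"
    rf"(?P<dataset>{_alternatives(DATASETS)})_"
    rf"(?P<primer>{_alternatives(PRIMER_PAIRS)})"
)


@dataclass(frozen=True)
class SampleLabel:
    campaign_name: str
    site_id: str            # stable OSD site id or MYOSD kit number
    campaign_date: str      # 'YYYY-MM'
    artificial_number: int  # 1, 2, ... per site and campaign_date
    dataset_name: str
    primer_pair_name: str

    def __str__(self) -> str:
        # Concatenate the fields in the documented order.
        return (
            f"{self.campaign_name}{self.site_id}_{self.campaign_date}_"
            f"{self.artificial_number}_{self.dataset_name}_{self.primer_pair_name}"
        )

    @classmethod
    def parse(cls, label: str) -> "SampleLabel":
        # fullmatch: the whole label must follow the scheme, no extra suffix.
        m = LABEL_RE.fullmatch(label)
        if m is None:
            raise ValueError(f"label does not follow the scheme: {label!r}")
        return cls(
            campaign_name=m["campaign"],
            site_id=m["site"],
            campaign_date=m["date"],
            artificial_number=int(m["number"]),
            dataset_name=m["dataset"],
            primer_pair_name=m["primer"],
        )
```

Usage: `str(SampleLabel("MYOSD", "0", "2016-06", 1, "16S", "alma-alma"))` yields `MYOSD0_2016-06_1_16S_alma-alma`, and `SampleLabel.parse` turns that string back into an equal object. Keeping validation strict makes deviating names fail loudly instead of being silently misparsed.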

NOTE: Minor deviations from the scheme are possible in some file names and SILVAngs analysis results. This means that sample labels may include additional information, for example a ‘16S/18S’ notation or a ‘qc.filt’ suffix (denoting quality filtering and additional length filtering, see Sequence Data Pre Processing).

Sequence data access

All sequence datasets (both raw and workable) will be made available once quality control is complete. All sequence data will be submitted for long-term archival to the European Nucleotide Archive (ENA) (see ENA umbrella project PRJEB5129 and/or OSD 2014 Data Guide) once the manual curation of the contextual data is finalized (see section Contextual Data for more detail).

Pre-processing

The pre-processing was done using the same workflow as in 2014, with minor modifications (see Sequence Data Pre Processing).

Contextual Data

The contextual data for OSD 2016 and MyOSD 2016 is currently being manually curated.

SILVAngs Analysis

The SILVAngs analysis is currently running; stay tuned for results.
