CQ-Weiterbildung-Bioinformatik

Unlocking your NGS data skills: NGS data analysis and workflow management systems

Answering biological and medical questions increasingly depends on efficient, high-quality analysis of genomic (DNA) data. The latest next-generation sequencing (NGS) technologies make this analysis possible, but they also produce very large amounts of data, so handling it requires more bioinformatics expertise.

In this part-time course, you will strengthen your skills by tackling the most important aspects of NGS data analysis. In addition to handling large data sets and data formats, you will learn to create your own workflows using workflow management systems. By connecting all steps from quality control to variant calling, you will build complete workflows, first in Galaxy, and then transfer and apply this knowledge to Snakemake, a script-based, open-source workflow management system. With Snakemake, you can create independent and reproducible bioinformatics pipelines that execute the complete workflow automatically.

1. Analysis of NGS data

  • Introduction to NGS data (Illumina, single- and paired-end data, Phred quality scores)
  • Becoming familiar with NGS data formats (FASTQ, SAM/BAM, VCF)
  • Processing of sequencing data (quality control, trimming, mapping)
  • Building variant-calling workflows in Galaxy and Snakemake
  • Visualization of NGS data with the Integrative Genomics Viewer (IGV)
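As a small illustration of the FASTQ format and Phred quality scores listed above, the following Python sketch reads four-line FASTQ records and decodes the quality string. It assumes the Phred+33 encoding used by current Illumina instruments and well-formed, uncompressed records; the function names and the example read are made up for illustration and are not part of the course materials.

```python
def decode_quality(qual_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    Assumes Phred+33 encoding: each character's ASCII code minus 33.
    """
    return [ord(char) - offset for char in qual_string]


def phred_to_error_prob(q):
    """Return the base-calling error probability for Phred score q.

    By definition Q = -10 * log10(P), so P = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) tuples.

    Expects well-formed records of exactly four lines each:
    '@' header, sequence, '+' separator, quality string.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        sequence = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored here
        quality = next(it).rstrip("\n")
        yield header[1:], sequence, quality


# Hypothetical single-read example (not real sequencing data).
EXAMPLE_RECORD = "@read1\nACGT\n+\nII5!\n"

for read_id, sequence, quality in parse_fastq(EXAMPLE_RECORD.splitlines()):
    scores = decode_quality(quality)
    print(read_id, sequence, scores)
```

A Phred score of 20 corresponds to an error probability of 1 in 100, and a score of 40 to 1 in 10,000, which is why trimming tools commonly discard low-scoring read ends.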

2. Introduction to command line tools and software management

  • Application of Conda to create and manage software environments for NGS data analysis
  • Installation and application of command line tools (e.g. Bowtie2, BWA-MEM, samtools, Trimmomatic, freebayes)

3. Introduction to workflow management systems

  • Basic principles of NGS workflows with Galaxy (browser-based)
  • Building of NGS workflows in Snakemake (script-based, command line, basic bash)
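The core idea behind script-based workflow managers can be sketched in plain Python: each step declares its input and output files, and the execution order follows from which step produces the files another step needs. This is only a conceptual sketch, not Snakemake's actual implementation or syntax; the rule names and file names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow steps: name -> (input files, output files).
RULES = {
    "call": (["sample.bam", "ref.fa"], ["sample.vcf"]),
    "trim": (["sample.fastq"], ["sample.trimmed.fastq"]),
    "map": (["sample.trimmed.fastq", "ref.fa"], ["sample.bam"]),
}


def execution_order(rules):
    """Order the steps so that every step runs after the steps
    that produce its input files."""
    # Map each output file to the step that creates it.
    producer = {
        out: name for name, (_, outputs) in rules.items() for out in outputs
    }
    # A step depends on the producers of its inputs; files that no
    # step produces (raw reads, reference) are treated as given.
    graph = {
        name: {producer[f] for f in inputs if f in producer}
        for name, (inputs, _) in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(RULES))
```

Because the order is derived from file dependencies rather than from the order in which the steps are written, the same description can be rerun reproducibly, which is the property the workflow management systems in this course build on.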

Professional requirements:

  • Basic knowledge of molecular biology, especially genetics / genomics
  • Basic understanding of molecular/sequencing technologies (PCR, library preparation, Illumina sequencing)
  • Interest in the use of command line tools
  • No prior programming skills are required

Technical prerequisites:

  • Computer with at least 30 GB of available hard disk space
  • Linux OS (Ubuntu) or macOS
  • Up-to-date browser (Firefox or Chrome)
  • Good Internet connection, at least one large monitor (preferably two), as well as a microphone and speakers or a headset for optimal sound quality

Participation takes place online in the virtual classroom, which provides a live transmission of the lecturer's screen and lets you communicate with the group and the lecturers via microphone and speakers or the chat function.

In addition to online participation via the virtual classroom, there are self-study phases in which selected materials and exercises are made available on an e-learning platform (Moodle). The lecturer is available during fixed office hours to answer individual questions about the virtual classroom.

After finishing the course you will have learned to:

  • understand important data formats (FASTQ, SAM/BAM, VCF)
  • manipulate large NGS data files
  • navigate the file system via the command line with basic bash commands
  • use the software manager Conda for installing software and organizing software environments
  • use the command line to apply analytical software
  • work with open-source software for all analytical steps
  • create a complete DNA analysis workflow in Galaxy (from quality control to variant calling)
  • transfer this knowledge to build reproducible and scalable workflows in Snakemake

