Chapter 11 - our final Snakefile - review and discussion

Here's the final Snakefile for comparing four genomes.

This snakemake workflow has the following features:

  • it has a single list of accessions at the top of the Snakefile, so that more genomes can be added by changing only one place in the file. See Using expand with a single pattern and one list of values for more discussion of this.

  • the workflow uses a default rule all, a "pseudo-rule" that contains only input files. Because it is the first rule in the Snakefile, snakemake runs it when executed without any targets on the command line. See Running rules and choosing targets from the command line for some discussion of targets and Snakefile organization.

  • the workflow uses one wildcard rule, sketch_genome, to convert multiple genome files ending in .fna.gz into sourmash signature files. See Using wildcards to generalize your rules for discussion of wildcards.

  • there is also a rule compare_genomes that uses expand to construct the complete list of genome signatures needed to run sourmash compare. Again, see Using expand with a single pattern and one list of values for more discussion of this.

  • the last rule, plot_comparison, takes the output of compare_genomes and turns it into a PNG image by running sourmash plot in its shell command.

ACCESSIONS = ["GCF_000017325.1",
              "GCF_000020225.1",
              "GCF_000021665.1",
              "GCF_008423265.1"]

rule all:
    input:
        "compare.mat.matrix.png"

rule sketch_genome:
    input:
        "genomes/{accession}.fna.gz",
    output:
        "{accession}.fna.gz.sig",
    shell: """
        sourmash sketch dna -p k=31 {input} --name-from-first
    """

rule compare_genomes:
    input:
        expand("{acc}.fna.gz.sig", acc=ACCESSIONS),
    output:
        "compare.mat"
    shell: """
        sourmash compare {input} -o {output}
    """

rule plot_comparison:
    message: "plot the sourmash comparison matrix"
    input:
        "compare.mat"
    output:
        "compare.mat.matrix.png"
    shell: """
        sourmash plot {input}
    """

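To make the connection between compare_genomes and sketch_genome more concrete, here is a minimal plain-Python sketch of roughly what happens when snakemake resolves the signature files. It does not import or call snakemake: the list comprehension only imitates what expand produces for a single pattern and one list of values, and the helper functions match_sketch_output and sketch_input_for are hypothetical names invented for this illustration. The regular expression assumes snakemake's default behavior of letting a wildcard match any non-empty string.

```python
import re

ACCESSIONS = ["GCF_000017325.1",
              "GCF_000020225.1",
              "GCF_000021665.1",
              "GCF_008423265.1"]

# Roughly what expand("{acc}.fna.gz.sig", acc=ACCESSIONS) evaluates to:
# the pattern filled in once per accession, in list order.
signatures = ["{acc}.fna.gz.sig".format(acc=acc) for acc in ACCESSIONS]


def match_sketch_output(filename):
    """Hypothetical helper: imitate matching a requested file against the
    sketch_genome output pattern "{accession}.fna.gz.sig".

    Returns the value of the accession wildcard, or None if the
    filename does not fit the pattern."""
    m = re.fullmatch(r"(?P<accession>.+)\.fna\.gz\.sig", filename)
    return m.group("accession") if m else None


def sketch_input_for(filename):
    """Hypothetical helper: fill the matched wildcard into the
    sketch_genome input pattern "genomes/{accession}.fna.gz"."""
    accession = match_sketch_output(filename)
    if accession is None:
        return None
    return "genomes/{accession}.fna.gz".format(accession=accession)


# Each signature that compare_genomes needs maps back to one genome file.
for sig in signatures:
    print(sig, "<-", sketch_input_for(sig))
```

A file such as compare.mat does not fit the signature pattern, so in this sketch match_sketch_output returns None for it. In the real workflow the corresponding outcome is that sketch_genome is not considered as a way to produce that file.
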
In the following sections we will cover the core features of snakemake used in this Snakefile more thoroughly, and then introduce some more complex bioinformatics workflows as well as a number of useful patterns and reusable recipes.