Chapter 9 - using expand to make filenames
You might note that the files listed in the compare_genomes rule all share the same suffix, and they're all built using the same rule. Can we use that in some way?
Yes! We can use a function called expand(...) and give it a template filename to build, and a list of values to insert into that filename.
Below, we build a list of accessions named ACCESSIONS, and then use expand to build the list of input files of the format {acc}.fna.gz.sig from that list, creating one filename for each value in ACCESSIONS.
ACCESSIONS = ["GCF_000017325.1",
              "GCF_000020225.1",
              "GCF_000021665.1",
              "GCF_008423265.1"]

rule sketch_genome:
    input:
        "genomes/{accession}.fna.gz",
    output:
        "{accession}.fna.gz.sig",
    shell: """
        sourmash sketch dna -p k=31 {input} --name-from-first
    """

rule compare_genomes:
    input:
        expand("{acc}.fna.gz.sig", acc=ACCESSIONS),
    output:
        "compare.mat"
    shell: """
        sourmash compare {input} -o {output}
    """

rule plot_comparison:
    message: "compare all input genomes using sourmash"
    input:
        "compare.mat"
    output:
        "compare.mat.matrix.png"
    shell: """
        sourmash plot {input}
    """
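For a template with a single placeholder, as in the Snakefile above, the result of expand is an ordinary Python list of strings. The plain-Python sketch below builds the same list with a list comprehension, without importing snakemake. The names template and sig_files are hypothetical, chosen only for this illustration.

```python
# Same accession list as in the Snakefile above.
ACCESSIONS = ["GCF_000017325.1",
              "GCF_000020225.1",
              "GCF_000021665.1",
              "GCF_008423265.1"]

# Plain-Python stand-in for expand("{acc}.fna.gz.sig", acc=ACCESSIONS):
# fill the template once per accession, keeping the original order.
template = "{acc}.fna.gz.sig"      # hypothetical name, for illustration
sig_files = [template.format(acc=acc) for acc in ACCESSIONS]

for name in sig_files:
    print(name)
```

Writing those four filenames out by hand as a list in the input: block of compare_genomes should behave the same way; expand just saves the typing and keeps the list in sync with ACCESSIONS.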
While wildcards and expand use the same syntax, they do quite different things.
expand generates a list of filenames, based on a template and a list of values to insert into the template. It is typically used to make a list of files that you want snakemake to create for you.
Wildcards in rules provide the rules by which one or more files will actually be created. They are recipes that say, "when you want to make a file with a name that looks like THIS, you can do so from files that look like THAT, and here's what to run to make that happen."
expand tells snakemake WHAT you want to make; wildcard rules tell snakemake HOW to make those things.
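To make the "HOW" side concrete, here is a simplified plain-Python sketch of what matching a wildcard rule involves: given a requested filename, check whether it fits the rule's output pattern, pull out the wildcard value, and fill that value into the input pattern. This is an illustration only, not snakemake's actual implementation; the helper match_output and the variable names are hypothetical.

```python
import re

def match_output(pattern, target):
    """Return the wildcard values if target fits pattern, else None.

    Hypothetical helper for illustration; not part of snakemake.
    """
    # re.split with a capturing group alternates literal text (even
    # positions) and placeholder names (odd positions).
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal text must match exactly
        else:
            regex += f"(?P<{part}>.+)"      # placeholder matches any text
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None

# Patterns copied from the sketch_genome rule.
output_pattern = "{accession}.fna.gz.sig"
input_pattern = "genomes/{accession}.fna.gz"

# One of the filenames that expand asks for in compare_genomes.
wildcards = match_output(output_pattern, "GCF_000017325.1.fna.gz.sig")
needed_input = input_pattern.format(**wildcards)

print(wildcards)
print(needed_input)
```

In this sketch, a filename that does not fit the pattern (for example compare.mat) returns None, which corresponds to the rule simply not applying to that file.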