Genome Alignment pipeline help

Genomic sequence alignment is performed in two stages.

Input data

The target_set.fa and query_set.fa files must be in multi-FASTA format.
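
As a quick sanity check before running the pipeline, the records in an input file can be listed. The following is a minimal sketch, not part of the pipeline; the function name fasta_headers is hypothetical, and it assumes the usual multi-FASTA convention that every record starts with a ">" header line.

```python
def fasta_headers(text):
    """Return the record names (first word after '>') found in multi-FASTA text."""
    headers = []
    for line in text.splitlines():
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            headers.append(parts[0] if parts else "")
    return headers


# Two records, the first one with a free-text description after its name.
example = ">seq1 description\nACGT\nACGT\n>seq2\nGGCC\n"
print(fasta_headers(example))
```

The position of a record in this list (counting from 0) is the sequence number used in the stage 1 output file names.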

1. Produce raw alignment

At this stage, a binary file in *.da format is created for every sequence in target_set.fa. Each file contains the set of all possible alignment variants for that sequence.

The file name is determined by the names of the input files and the number of the sequence in the target set, and has the following form:
"target_set.fa:query_set.fa:XXXXX:00.da", where
target_set.fa - name of the file with the target set
query_set.fa - name of the file with the query set
XXXXX - five-digit number of the sequence in the target set, counted from zero: the first sequence is 00000, the second is 00001, and so on.
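
The naming scheme can be reproduced in a script, for example to predict which files stage 1 will create. This is a minimal sketch based only on the pattern shown above; the function name da_file_name is hypothetical, and the trailing "00" is copied literally from the pattern because its meaning is not described here.

```python
def da_file_name(target_file, query_file, seq_index):
    """Build the expected *.da file name for one target sequence.

    seq_index is zero-based: 0 for the first sequence in the target set.
    """
    if seq_index < 0 or seq_index > 99999:
        raise ValueError("sequence number must fit in five digits")
    # Zero-pad the sequence number to five digits; keep the "00" suffix as-is.
    return f"{target_file}:{query_file}:{seq_index:05d}:00.da"


print(da_file_name("target_set.fa", "query_set.fa", 0))
print(da_file_name("target_set.fa", "query_set.fa", 1))
```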

2. Get optimal coverage from raw alignment

Two coverage search modes are available: searching for the optimal path for each query sequence separately, or searching for the optimal path for the whole query set. It is also possible to output all local alignments found.

The program takes as input a *.da file obtained at the previous stage, so this step must be run separately for each target sequence.
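
Because this stage runs once per *.da file, it can help to collect the stage 1 outputs and recover the target sequence number from each name. The sketch below only handles file names; it does not invoke the pipeline. The function names parse_da_name and select_da_files are hypothetical, and the input file names are assumed to contain no ":" characters.

```python
import re

# Pattern for "<target>:<query>:XXXXX:00.da" as described for stage 1.
DA_NAME = re.compile(r"^(?P<target>[^:]+):(?P<query>[^:]+):(?P<index>\d{5}):00\.da$")


def parse_da_name(name):
    """Return (target_file, query_file, seq_index), or None if the name does not match."""
    match = DA_NAME.match(name)
    if match is None:
        return None
    return match.group("target"), match.group("query"), int(match.group("index"))


def select_da_files(names):
    """Keep only matching *.da names, ordered by target sequence number."""
    parsed = [(parse_da_name(n), n) for n in names]
    return [n for p, n in sorted((p[2], n) for p, n in parsed if p is not None)]


names = [
    "target_set.fa:query_set.fa:00001:00.da",
    "notes.txt",
    "target_set.fa:query_set.fa:00000:00.da",
]
for da_file in select_da_files(names):
    print(da_file, parse_da_name(da_file))
```

Each name returned by select_da_files corresponds to one separate run of this stage.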