General help for CLUSTAL W
Clustal W is a general purpose multiple alignment program for DNA or proteins.
SEQUENCE INPUT: all sequences must be in 1 file, one after another.
6 formats are automatically recognised: NBRF/PIR, EMBL/SWISSPROT,
Pearson (Fasta), Clustal (*.aln), GCG/MSF (Pileup) and GDE flat file.
All non-alphabetic characters (spaces, digits, punctuation marks) are ignored
except "-" which is used to indicate a GAP ("." in GCG/MSF).
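
The character rule above can be sketched in a few lines of Python. This is only an illustration, not the program's own code: the function name clean_residues and the msf switch are hypothetical, and it assumes that in MSF input only "." counts as a gap, which is then rewritten as "-".

```python
def clean_residues(raw: str, msf: bool = False) -> str:
    """Keep letters and the gap symbol; drop every other character."""
    # Assumption: MSF input marks gaps with "." only, other formats with "-".
    gap_symbol = "." if msf else "-"
    kept = []
    for ch in raw:
        if ch.isascii() and ch.isalpha():
            kept.append(ch)       # residue letter, case left untouched
        elif ch == gap_symbol:
            kept.append("-")      # store every gap as "-"
        # spaces, digits and punctuation fall through and are ignored
    return "".join(kept)


if __name__ == "__main__":
    print(clean_residues("  1 ACG T-12a;"))
    print(clean_residues("AC..GT 10", msf=True))
```

Rewriting "." as "-" keeps a single gap symbol in memory whichever format the sequences came from.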
To do a MULTIPLE ALIGNMENT on a set of sequences, use item 1 from the main menu to
INPUT them; go to menu item 2 to do the multiple alignment.
PROFILE ALIGNMENTS (menu item 3) are used to align 2 alignments. Use this to
add a new sequence to an old alignment, or to use secondary structure to guide
the alignment process. GAPS in the old alignments are indicated using the "-"
character. PROFILES can be input in ANY of the allowed formats; just
use "-" (or "." for MSF) for each gap position.
PHYLOGENETIC TREES (menu item 4) can be calculated from old alignments (read in
with "-" characters to indicate gaps) OR after a multiple alignment while the
alignment is still in memory.
The program tries to automatically recognise the different file formats used
and to guess whether the sequences are amino acid or nucleotide. This is not
always foolproof.
FASTA and NBRF/PIR formats are recognised by having a ">" as the first
character in the file.
EMBL/Swiss Prot formats are recognised by the letters
ID at the start of the file (the token for the entry name field).
CLUSTAL format is recognised by the word CLUSTAL at the beginning of the file.
GCG/MSF format is recognised by the word PileUp at the start of the file. If
your MSF files do not begin with this word, edit it in at the start
of the first line.
Note from the htmlizer (sorry): This is not the best way to input
sequences from GCG.
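
The recognition rules listed above amount to a check on the first characters of the file. The Python sketch below restates them; it is not the program's actual routine. The name guess_format is hypothetical, ">" files are reported as one group because the text does not say how FASTA and NBRF/PIR are told apart, and GDE files fall through to "unknown" because no rule is given for them here.

```python
def guess_format(text: str) -> str:
    """Guess the file format from the start of the file contents."""
    if text.startswith(">"):
        return "FASTA or NBRF/PIR"   # both begin with ">"
    if text.startswith("ID"):
        return "EMBL/SWISSPROT"      # token for the entry name field
    if text.startswith("CLUSTAL"):
        return "CLUSTAL"
    if text.startswith("PileUp"):
        return "GCG/MSF"
    return "unknown"                 # includes GDE, which is not covered here


if __name__ == "__main__":
    print(guess_format(">seq1\nACGTACGT\n"))
    print(guess_format("PileUp\n\n MSF: 120  Type: P\n"))
```

An MSF file that does not begin with PileUp comes back as "unknown" in this sketch, which is why the word has to be edited in at the start of the first line.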
If 85% or more of the characters in the sequence are from A,C,G,T,U or N, the
sequence will be assumed to be nucleotide. This works in 97.3% of cases
but watch out!
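
A minimal Python sketch of the 85% rule follows. The function name looks_like_nucleotide is hypothetical, and the sketch makes three assumptions the text does not state: letters are compared without regard to case, only alphabetic characters are counted (gaps and digits are left out of the total), and an empty sequence is treated as not nucleotide.

```python
NUCLEOTIDE_CODES = set("ACGTUN")


def looks_like_nucleotide(seq: str, threshold: float = 0.85) -> bool:
    """Return True if at least `threshold` of the letters are A, C, G, T, U or N."""
    letters = [ch.upper() for ch in seq if ch.isalpha()]
    if not letters:
        return False  # assumption: nothing to count, so do not call it nucleotide
    hits = sum(1 for ch in letters if ch in NUCLEOTIDE_CODES)
    return hits / len(letters) >= threshold


if __name__ == "__main__":
    print(looks_like_nucleotide("ACGTTGCANNACGT"))
    print(looks_like_nucleotide("MKVLAAGIVGLLLA"))
```

The "watch out" applies to short or compositionally unusual proteins: a peptide made up mostly of A, C, G, T and N residues would be classed as nucleotide by this test.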