DCSE

What is DCSE

DCSE (Dedicated Comparative Sequence Editor) is a multiple alignment editor for protein, DNA and RNA alignments. The structure of the molecules can be incorporated into the alignment. DCSE is written in C and allocates memory dynamically for most of its data, so alignments of almost any size can be edited. Its features include color display of characters and structure, automatic alignment relative to sequences already aligned with others, sequence grouping, sequence and pattern searching, a marker system, checking of incorporated RNA structure, online hypertext help, macros, and more.

More information on DCSE can be found on the DCSE Home page at http://rrna.uia.ac.be/~peter/dcse/index.html

The DCSE alignment file

DCSE v.3 stores alignment and structure information in plain ASCII files with a specific format. These files usually have the extension ".ali". The alignment in such a file can be edited with DCSE. To create an alignment file or to append sequences to an existing one, use the complementary program Convers.

The DCSE alignment format

An alignment file starts with four info lines, followed by the sequence lines. The first info line contains two numbers: the first and the last position shown in the alignment. For DCSE, the difference between the two must equal the number of positions in the alignment. RnaViz does not use these numbers, so any two numbers will do for RnaViz. The positions can be preceded by a 'P' for a protein alignment, a 'D' for a DNA alignment or an 'R' for an RNA alignment; when none of these is present, the alignment is assumed to be an RNA alignment. The other info lines can contain any text. The second info line is usually empty, and the third and fourth lines usually indicate the positions.

A sequence line consists of the entire sequence (including gaps), a space, a five-character number, another space, and the species name (at most 40 characters). The number itself is not essential, but there must be exactly seven characters between the sequence and its name. All sequences should have the same length. In each sequence, symbols for nucleotides or gaps alternate with positions that are either blank or contain special symbols, such as the symbols delimiting secondary structure elements.
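The layout above can be read and written with a few string operations. What follows is a minimal Python sketch, not part of DCSE or RnaViz. The function names `parse_first_info_line`, `format_sequence_line` and `split_sequence_line` are hypothetical. The sketch also makes two assumptions of its own. First, a type letter is taken to sit directly before a position, optionally followed by blanks. Second, the caller must supply the width of the sequence part, because the sequence itself may contain blanks and so cannot be separated from the name by searching for spaces.

```python
import re

# Hypothetical helpers sketching the .ali line layout described above.
TYPE_NAMES = {"P": "protein", "D": "DNA", "R": "RNA"}


def parse_first_info_line(line):
    """Return (alignment_type, first_pos, last_pos) from the first info line.

    Assumption: each position may be preceded by an optional P/D/R letter.
    Without any letter the alignment is taken to be RNA.
    """
    matches = re.findall(r"([PDR]?)\s*(\d+)", line)
    if len(matches) < 2:
        raise ValueError("expected two positions on the first info line")
    letters = [letter for letter, _ in matches if letter]
    kind = TYPE_NAMES[letters[0]] if letters else "RNA"
    return kind, int(matches[0][1]), int(matches[1][1])


def format_sequence_line(seq, number, name):
    """Build: sequence, space, 5-character number, space, name (max 40)."""
    field = f"{number:>5}"  # right-align the number in five characters
    if len(field) != 5:
        raise ValueError("number does not fit in five characters")
    if len(name) > 40:
        raise ValueError("species name longer than 40 characters")
    return f"{seq} {field} {name}"


def split_sequence_line(line, seq_width):
    """Split a sequence line into (sequence, number, name).

    seq_width is the character width of the sequence part and must be
    known in advance (assumption), since the sequence may contain blanks.
    """
    seq = line[:seq_width]
    gap = line[seq_width:seq_width + 7]  # the 7 characters before the name
    if len(gap) != 7 or gap[0] != " " or gap[6] != " ":
        raise ValueError("expected ' NNNNN ' between sequence and name")
    return seq, gap[1:6].strip(), line[seq_width + 7:]
```

For example, `format_sequence_line("[G C A]", 1, "Escherichia coli")` yields `"[G C A]     1 Escherichia coli"`, and `split_sequence_line` with a width of 7 recovers the three fields.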

Symbols describing secondary structure

The following symbols indicate secondary structure elements:
 [ and ] : beginning and end of one strand of a helix.
 ^       : stands for ][, a new helix starting immediately after the previous one.
 { and } : beginning and end of an internal loop or bulge loop interrupting a helix strand.
 ( and ) : enclose a base that is part of a non-standard pair (any pair other than G.C, A.U or G.U).

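To illustrate how these symbols might be interpreted, here is a small Python sketch. It lists the helix strands in a sequence as (opening index, closing index) pairs, treating '^' as a close immediately followed by an open. It also tests whether two bases form one of the standard pairs G.C, A.U or G.U. The function names are hypothetical. The scan looks at every character and does not depend on which positions hold structure symbols.

```python
# Hypothetical helpers for the secondary structure symbols listed above.
STANDARD_PAIRS = {frozenset("GC"), frozenset("AU"), frozenset("GU")}


def is_standard_pair(a, b):
    """True for G.C, A.U and G.U pairs, in either order."""
    return frozenset((a.upper(), b.upper())) in STANDARD_PAIRS


def helix_strands(seq):
    """Return (open_index, close_index) for each helix strand in seq.

    '[' opens a strand and ']' closes it. '^' closes the current strand
    and opens a new one at the same index. The loop markers '{', '}' and
    the non-standard pair markers '(', ')' do not end a strand.
    """
    strands = []
    start = None
    for i, ch in enumerate(seq):
        if ch == "[":
            if start is not None:
                raise ValueError(f"strand opened at {i} while another is open")
            start = i
        elif ch in "]^":
            if start is None:
                raise ValueError(f"strand closed at {i} without being opened")
            strands.append((start, i))
            start = i if ch == "^" else None  # '^' means "][" here
    if start is not None:
        raise ValueError("unterminated strand")
    return strands
```

For instance, `helix_strands("[G C]A[U^G C]")` gives `[(0, 4), (6, 8), (8, 12)]`. The '^' at index 8 both ends one strand and starts the next.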
Helix numbering

To allow the identification of secondary structure elements, "helix numbering lines" are intercalated between the sequences. The name of such a line must begin with "Helix numbering". These lines contain the helix names but otherwise have an empty sequence (only gap characters). The 5' and 3' strands of a helix are labeled <name> and <name>', respectively.
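The following Python sketch recognizes a helix numbering line by its name and extracts the helix labels with their columns. A label ending in an apostrophe is read as the 3' strand. The function names are hypothetical. The gap characters are also an assumption: the `gap_chars` parameter defaults to '-' and '.', and should be adjusted to whatever the alignment actually uses.

```python
import re


def is_helix_numbering_line(name):
    """A helix numbering line is identified by its name prefix."""
    return name.startswith("Helix numbering")


def helix_labels(text, gap_chars="-."):
    """Return (column, helix_name, strand) for each label on the line.

    Labels are runs of characters that are neither blanks nor gap
    characters (gap_chars is an assumption). <name>' marks the 3' strand,
    <name> the 5' strand.
    """
    pattern = "[^\\s" + re.escape(gap_chars) + "]+"
    labels = []
    for m in re.finditer(pattern, text):
        token = m.group()
        if token.endswith("'"):
            labels.append((m.start(), token[:-1], "3'"))
        else:
            labels.append((m.start(), token, "5'"))
    return labels
```

Applied to `"--1---2----2'--1'--"`, this yields the 5' strands of helices 1 and 2 at columns 2 and 6, and their 3' strands at columns 15 and 11.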

RnaViz is pickier than DCSE about the position of the helix names: it uses the first helix name on the helix numbering line that follows the position where the helix strand is opened. (DCSE checked the strands both before and after the helix name; this will also change in future versions of DCSE.)
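The RnaViz rule can be sketched as a lookup on the helix numbering line. Given the index where a strand is opened, the function takes the first label that starts after that index. This is a hypothetical Python helper, not RnaViz code. It reads "after" literally as "strictly after", and it assumes that labels are separated by blanks or by the gap characters in `gap_chars`.

```python
import re


def strand_name(numbering_text, open_pos, gap_chars="-."):
    """Return (helix_name, strand) for the strand opened at open_pos.

    Uses the first label starting strictly after open_pos on the helix
    numbering line; returns None if no label follows. gap_chars is an
    assumption about the filler characters of the numbering line.
    """
    pattern = "[^\\s" + re.escape(gap_chars) + "]+"
    for m in re.finditer(pattern, numbering_text):
        if m.start() > open_pos:
            token = m.group()
            if token.endswith("'"):
                return token[:-1], "3'"  # <name>' labels the 3' strand
            return token, "5'"
    return None
```

With the numbering line `"--1---2----2'--1'--"`, a strand opened at index 1 is assigned helix 1 (5' strand). A strand opened at index 2 skips the label at that index and is assigned helix 2 instead. This is the kind of placement where RnaViz and DCSE may disagree.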