SP10074 GENETIC2 - Genetics
Description
A colony of alien bacteria has recently been discovered close to a crater in New Mexico. Dr. Poucher is in charge of the scientific team at the ICPC BioLab committed to the study of the alien DNA structure. We briefly sketch their discoveries here.
Alien DNA molecules have the structure of a circular sequence. Each sequence is composed of nucleotides. There are 26 different types of nucleotides, and each of them can occur in two faces. It is very important to remark that in any given alien DNA molecule, every nucleotide either does not appear at all or appears exactly twice (hence, the length of a DNA molecule is an even integer between 2 and 52). In case a nucleotide occurs twice, each occurrence can be of either type independently. Alien bacteria have two types of extremities, which in the technical biological jargon are referred to as arms and legs. A major discovery of Dr. Poucher’s team is a method to determine the exact number of arms and legs of a bacterium by examining its DNA structure.
Here we represent each nucleotide as a letter of the alphabet. We refer to the different nucleotides as a, A, …z, Z, where the lowercase and uppercase forms of a letter represent the two possible faces a nucleotide may appear with; we shall also use _a_/_A_, _b_/_B_, …_z_/_Z_ to refer to a nucleotide in either face.
To determine the number of extremities, Dr. Poucher starts by initializing two counters of arms and legs to zero, and then proceeds to perform a number of surgeries, transforming a DNA sequence into another one. After each transformation, you may need to increase some of the counters, depending on the type of surgery applied. When the empty sequence of nucleotides (which will be denoted by ∅) has been reached, the number of extremities of the original molecule has been found. The possible surgeries are:
1. Eliminate consecutive instances of a given nucleotide appearing with opposite faces. The number of arms and legs is preserved. For example: aBbCaC → aCaC by eliminating Bb. Another example: DeHhEd→ eHhE by eliminating dD. Remember that DNA structure is circular, so in our representation as a string the last and first letters are connected.
2. Eliminate consecutive nucleotides appearing with the same face. Add one to the number of arms. For example: BBcgCg → cgCg by eliminating BB. Another example: xabyyaBX → xabaBX by eliminating yy.
3. Eliminate a sequence of four nucleotides formed by two different nucleotides that appear alternately where different occurrences of the same nucleotide have opposite faces. Add one to the number of legs. For example: dcDCefFe → efFe, by eliminating dcDC. Another example: cmNMnC → cC by eliminating mNMn.
4. Cut and paste, the most sophisticated procedure. First, a nucleotide is selected, for instance _a_/_A_, and the DNA sequence is chopped into two linear chains such that the nucleotide appears once in each of them. Second, if both occurrences of _a_/_A_ are of the same face, one of the chains is “inverted” by reversing the sequence and changing the face of every nucleotide in the chain. Given a chain _S_, S denotes its “inverted” chain.
Then, the chains are combined by concatenating the subsequence occurring before a with the subsequence occurring after A, and the subsequence occurring after a with the sub-sequence occurring before A.
Finally, two new _a_/_A_ nucleotides are added to close the chain into a circular shape. The face of the new nucleotides are the same if the original pair of nucleotides selected had the same face, and is different otherwise.
Formally, suppose you select the nucleotide _a_/_A_, and further assume for the moment that it appears both times with the face a (A). The cut and paste surgery turns sequences of the form _S_ $ _{1} $ a_S_ $ _{2} $ _S_ $ _{3} $ a_S_ $ _{4} $ (respectively _S_ $ _{1} $ A_S_ $ _{2} $ _S_ $ _{3} $ A_S_ $ _{4} $ ) into _S_ $ _{2} $ a_S_ $ _{1} $ S\_3aS\_4 (respectively _S_ $ _{2} $ A_S_ $ _{1} $ S\_3AS\_4). On the other hand, if nucleotide _a_/_A_ appears with its two different faces, the surgery turns sequences of the form _S_ $ _{1} $ a_S_ $ _{2} $ _S_ $ _{3} $ A_S_ $ _{4} $ into _S_ $ _{2} $ a_S_ $ _{1} $ _S_ $ _{4} $ A_S_ $ _{3} $ . _S_ $ _{1} $ , _S_ $ _{2} $ , _S_ $ _{3} $ and _S_ $ _{4} $ are arbitrary sub-chains (possibly empty). In both cases the original circular chain was chopped into _S_ $ _{1} $ (_a_/_A_)_S_ $ _{2} $ and _S_ $ _{3} $ (_a_/_A_)_S_ $ _{4} $ .
For example (see the figure below): starting with the sequence BacDcAbD, we can get chains BacDc and AbD. Then, merging at nucleotide _a_/_A_ we get the sequence cDca’BbDA’ where a’ and A’ represent the new _a_/_A_ nucleotides. Here, _S_ $ _{1} $ = B, _S_ $ _{2} $ = cDc, _S_ $ _{3} $ = ∅ and _S_ $ _{4} $ = bD.
Another example: take the same DNA sequence BacDcAbD, and cut to get the chains DBac and DcAb; paste nucleotide _c_/_C_ (in this case you need to reverse one chain, for example BaCd) to get the sequence cDBadcBa. Here, _S_ $ _{1} $ = DBa, _S_ $ _{2} $ = ∅, _S_ $ _{3} $ = D and _S_ $ _{4} $ = Ab.
> - - - - - -
Input Format
N/A
Output Format
N/A