Deoxyribonucleic acid (DNA) is a molecule that contains the genetic instructions required for the development and functioning of all known living organisms. The double-helix structure of DNA was co-discovered by Prof. Francis Crick, a long-time faculty member at UCSD.
The DNA molecule consists of a long sequence built from four nucleotide bases: adenine (A), cytosine (C), guanine (G), and thymine (T). Since this molecule contains all the genetic information of a living organism, geneticists are interested in understanding the roles of the various DNA sequence patterns that are continuously being discovered worldwide. One of the most common methods for identifying the role of a DNA sequence is to compare it with other DNA sequences whose functionality is already known: the more similar two such sequences are, the more likely they are to function similarly.
Your task is to write a C program, called dna.c, that reads three DNA sequences from a file called dna_input.dat and prints the results of a comparison between each pair of sequences to the file dna_output.dat. The input file dna_input.dat consists of three lines. Each line is a single sequence of characters from the set {A, C, G, T}, which appear without spaces in some order and are terminated by the end-of-line character '\n'. You can assume that the three lines contain the same number of characters, and that this number is at most 241 (including the character '\n'). Here is a sample input file:
ACGTTTTAAGGGCTGAGCTAGTCAGTTCATCGCGCGCGTATATCCTCGATCGATCATTCTCTCTAGACGTTTTAAGGGCTGAGCTAGTCAGTTC
ACGTTTTAAGGGCTTAGAGCTTATGCTAATCGCGCGCGTATATCCTCGATCGATCATTCTCTCTAGACGTTTTAAGGGCTAAGGCGCGTAATTA
TCGTTTGAAGGGCTTAGTTAGTTAGTTCATCGGCGGCGTATATCCTCGATCGATCATTCTCTCTAGACGTTTTAAGGGCTGAGCCGGTCAGTTA
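Reading and checking this input can be sketched in C as follows. This is a minimal sketch under stated assumptions, not a required design: the names strip_newline, is_valid_sequence, read_sequences, NUM_SEQS, and MAX_LINE are hypothetical, and the file is assumed to be opened by the caller. Each buffer holds MAX_LINE + 1 = 242 bytes, that is, up to 241 characters including '\n' plus the terminating '\0'. Stripping a '\r' as well is a defensive assumption for files with Windows line endings, which the specification does not mention.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NUM_SEQS 3     /* hypothetical name: number of input lines */
#define MAX_LINE 241   /* longest input line, including '\n' (from the spec) */

/* Removes a trailing "\n" (or "\r\n", an assumption) and returns the
   remaining length of s. */
size_t strip_newline(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
    return strlen(s);
}

/* Returns 1 if every character of s is one of A, C, G, T; 0 otherwise. */
int is_valid_sequence(const char *s)
{
    for (; *s != '\0'; s++) {
        if (strchr("ACGT", *s) == NULL)
            return 0;
    }
    return 1;
}

/* Reads NUM_SEQS lines from `in` into seqs, with line terminators removed.
   Returns the common sequence length, or -1 if a line is missing,
   contains a character outside {A, C, G, T}, or differs in length
   from the previous lines. */
int read_sequences(FILE *in, char seqs[NUM_SEQS][MAX_LINE + 1])
{
    int len = -1;
    assert(in != NULL);
    for (int k = 0; k < NUM_SEQS; k++) {
        /* fgets stores at most MAX_LINE characters plus '\0'. */
        if (fgets(seqs[k], MAX_LINE + 1, in) == NULL)
            return -1;
        int n = (int)strip_newline(seqs[k]);
        if (!is_valid_sequence(seqs[k]) || (len >= 0 && n != len))
            return -1;
        len = n;
    }
    return len;
}
```

In main, one would open dna_input.dat with fopen(..., "r"), check the result for NULL, call read_sequences(in, seqs) with a char seqs[NUM_SEQS][MAX_LINE + 1] array, and then fclose(in). For the sample input above, all three lines hold 94 letters, so the call should return 94.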
Each of the three lines of the sample input consists of 95 characters: the 94 letters from {A, C, G, T} and the character '\n' (not shown). The output file dna_output.dat must be structured as follows. For each pair of sequences #i and #j, with i, j ∈ {1, 2, 3} and i > j, you should print:
Each line in the output file dna_output.dat should contain at most 61 characters, including the end-of-line character '\n'. If the DNA sequences are longer than 60 letters, each of the three rows mentioned above should be split across several lines: every line except the last contains exactly 60 letters, and the last line contains the remaining letters. Here is a sample file dna_output.dat, which results from processing the file dna_input.dat above:
[Figure: sample dna_output.dat]
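The pair enumeration and the 60-letter line limit can be sketched in C as follows. This is a minimal sketch, not the required solution: make_pairs, wrap_sequence, NUM_SEQS, and LINE_WIDTH are hypothetical names, and listing the pairs as (2,1), (3,1), (3,2) is an assumption, since the specification only requires i > j. The contents of the three rows printed for each pair are not assumed here; wrap_sequence only handles splitting one row into lines of at most 60 letters, each ending in '\n', so that no output line exceeds 61 characters.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SEQS   3    /* hypothetical name: number of sequences */
#define LINE_WIDTH 60   /* max letters per output line; 61 chars with '\n' */

/* Fills pairs[k] = {i, j} (1-based sequence numbers) for every i > j,
   ordered by i, then by j: (2,1), (3,1), (3,2) when n == 3.
   Returns the number of pairs, n*(n-1)/2. */
int make_pairs(int n, int pairs[][2])
{
    int count = 0;
    for (int i = 2; i <= n; i++) {
        for (int j = 1; j < i; j++) {
            pairs[count][0] = i;
            pairs[count][1] = j;
            count++;
        }
    }
    return count;
}

/* Copies the letters of s into dst, ending a line with '\n' after every
   `width` letters and after the final, possibly shorter, chunk.
   dst must hold at least len + len / width + 2 bytes.
   Returns the number of characters written, excluding the final '\0'. */
size_t wrap_sequence(const char *s, size_t width, char *dst)
{
    size_t n = 0, col = 0;
    assert(width > 0);
    for (; *s != '\0'; s++) {
        dst[n++] = *s;
        if (++col == width) {   /* line is full: close it */
            dst[n++] = '\n';
            col = 0;
        }
    }
    if (col > 0)                /* close the last, shorter line */
        dst[n++] = '\n';
    dst[n] = '\0';
    return n;
}
```

For a 94-letter sample row, wrap_sequence(seq, LINE_WIDTH, buf) yields a line of 60 letters followed by a line of 34 letters; the buffer can then be written to dna_output.dat with fputs(buf, out). A row whose length is an exact multiple of 60 gets no extra empty line.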
Notes: