You have received two DNA/RNA sequences from the lab in a text file, in a format like the following:
43
attacgatagagctcgatttagcggggctgcggctggcgatat
and
43
atcttcttatcgtagtatgatcgggctagatgatcgatgcagt
You are to analyze each sequence in the following manner:
You must also show where the two DNA/RNA sequences bond. A DNA strand is a series of nucleotides (Adenine, Thymine, Cytosine, Guanine). RNA is the same except that Thymine is replaced by Uracil. The following bonds are possible: Adenine-Thymine, Cytosine-Guanine, Guanine-Cytosine, and Adenine-Uracil. Note that the bonds are symmetrical. Additionally, a space (' ') means that the lab has detected a gap in the sequence.
Thus the example above will have the following bonding pattern (the | shows bonding between the top sequence and the bottom sequence):
a t t a c g a t a g a g c t c g a t t t a g c g g g g c t g c g g c t g g c g a t a t
      |   | |             | |         | | | |     |                             |
a t c t t c t t a t c g t a g t a t g a t c g g g c t a g a t g a t c g a t g c a g t
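The bonding line above can be derived position by position. The following is a minimal sketch, not the required solution: `can_bond` and `build_bond_line` are hypothetical helper names that are not part of the given skeleton, and gaps (' ') are simply treated as non-bonding here, which is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the two symbols can bond
 * (A-T, A-U, or C-G, in either order), 0 otherwise.
 * Assumption: a gap (' ') never bonds. */
int can_bond(char a, char b) {
    a = (char)toupper((unsigned char)a);
    b = (char)toupper((unsigned char)b);
    if (a > b) {            /* order the pair so that a <= b */
        char tmp = a;
        a = b;
        b = tmp;
    }
    return (a == 'A' && (b == 'T' || b == 'U')) || (a == 'C' && b == 'G');
}

/* Hypothetical helper: writes '|' for every bonding position and ' '
 * otherwise, separated by single spaces so the marks line up with
 * sequences printed as "a t t a ...".
 * `line` must hold at least 2 * len + 1 bytes. */
void build_bond_line(const char *s1, const char *s2, size_t len, char *line) {
    size_t k = 0;
    for (size_t i = 0; i < len; i++) {
        line[k++] = can_bond(s1[i], s2[i]) ? '|' : ' ';
        if (i + 1 < len) {
            line[k++] = ' ';
        }
    }
    line[k] = '\0';
}
```

Calling `build_bond_line` on the two example sequences with a length of 43 should produce a line with eleven bond marks, matching the pattern shown above.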
Your analysis should be written to a file named output.
Your program should accept two filenames, each naming a file that contains a series of characters for one of the DNA/RNA sequences to be paired. You can either ask the user for the filenames, as in your previous assignment, or optionally take them as program parameters. Assume filenames are no longer than 256 characters.
Example of asking for filenames (the user's input follows each question):
Which file contains your first sequence? seq
Which file contains your second sequence? seq2
Finished processing your sequence data.
Output will be found in the file 'output'.
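One way to ask for a filename without overrunning the buffer is sketched below. `ask_filename` is a hypothetical helper, not part of the given skeleton, and it assumes filenames contain no whitespace.

```c
#include <stdio.h>

#define MAX_FILENAME_LEN 256

/* Hypothetical helper: prints `prompt` to `out`, then reads one
 * whitespace-free token from `in` into `name`.
 * `name` must hold at least MAX_FILENAME_LEN + 1 bytes.
 * Returns 1 on success, 0 if nothing could be read. */
int ask_filename(FILE *in, FILE *out, const char *prompt, char *name) {
    fprintf(out, "%s ", prompt);
    fflush(out);
    /* The width 256 matches MAX_FILENAME_LEN and caps the read,
     * leaving room for the terminating '\0'. */
    return fscanf(in, "%256s", name) == 1;
}
```

In a program this might be called as `ask_filename(stdin, stdout, "Which file contains your first sequence?", name)` with `char name[MAX_FILENAME_LEN + 1]`.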
[Length of the sequence]
[String of characters representing sequence (see assumptions)]
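A file in this two-line format could be read roughly as follows. This is a sketch under assumptions: `read_sequence` is a hypothetical name, the sequence is assumed to sit on a single line directly after the length, and gaps are assumed to be literal space characters, so the characters are read one at a time rather than with `%s`.

```c
#include <stdio.h>

#define MAX_SEQ_LEN 100

/* Hypothetical helper: reads "[length]" on one line and the sequence
 * on the next line from `fp`.
 * `seq` must hold at least MAX_SEQ_LEN + 1 bytes.
 * Returns the length, or -1 if the input is malformed or too long. */
int read_sequence(FILE *fp, char seq[]) {
    int len;
    int c;

    if (fscanf(fp, "%d", &len) != 1 || len < 0 || len > MAX_SEQ_LEN) {
        return -1;
    }
    /* Skip the remainder of the length line. */
    while ((c = fgetc(fp)) != EOF && c != '\n') {
    }
    /* Read exactly `len` characters so that gaps (' ') are preserved. */
    for (int i = 0; i < len; i++) {
        c = fgetc(fp);
        if (c == EOF || c == '\n') {
            return -1;   /* sequence is shorter than the stated length */
        }
        seq[i] = (char)c;
    }
    seq[len] = '\0';
    return len;
}
```

Returning -1 lets the caller report a bad file instead of analyzing a partially filled buffer.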
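The skeleton code below is provided as a starting point. The given declarations and definitions are complete; the parts marked "Fill this in" must be completed by you.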
// Header needed by the given code (FILE, printf, fprintf).
// Remember to include any other headers your own code requires.
#include <stdio.h>
/* Start of given declarations */
#define NUM_COUNTERS 5 // Same number of items in CounterType
#define MAX_SEQ_LEN 100
#define PRINT_LEN 35
#define MAX_FILENAME_LEN 256
#define NUM_BOND_TYPES 3 // Same number of items in BondType
typedef enum {
    DNA,
    RNA,
    INVALID,
    UNDETERMINED
} SequenceType;
// Note that you may use these to access arrays too! array[ADENINE] will be the
// same as array[0]. Follow this ordering presented here. CYTOSINE equals 1 and
// so forth.
typedef enum {
    ADENINE = 0,
    CYTOSINE,
    GUANINE,
    THYMINE,
    URACIL,
    ERROR
} NucleotideType;
// Translation function from symbol to counter index
NucleotideType GetNucleotideType(char symbol);
// Translates the counter index to its character name
char GetCounterName(int type);
// Standardizes any sequence character.
char GetStandardChar(char input);
// Debug function
void PrintCounters(const int counter[]);
// Prints the name of the sequence type to file output
void PrintSequenceType(const SequenceType type, FILE *output);
/* End of given declarations */
// Identify the type of sequence from the counter information
SequenceType GetSequenceType(const int counter[]);
// Counts nucleotides in sequence and update values in counters
void CountNucleotides(const char seq[], int counters[]);
// Logic to determine whether a nucleotide pairing can form a bond.
// 1 means true, 0 means false, -1 means maybe
int IsMatch(const char sym1, const char sym2);
// Print the ratios of each nucleotides given the counters and length.
void PrintRatios(const int counter[], int total_len, FILE *output);
// Prints sequences in the following format
// A T C A G A T A C
// | |   |       |
// T A C T T C T T T
void PrintMatches(const int length, const char seq1[], const char seq2[], FILE *output);
// Hint: you need a function to process the file data
// Main
int main(void) {
    // Fill this or change as necessary
    return 0;
}
/* Given code definitions */
NucleotideType GetNucleotideType(char symbol) {
    // Sift through the data
    switch(symbol) {
        case 'a':
        case 'A': {
            return ADENINE;
            break;
        }
        case 'c':
        case 'C': {
            return CYTOSINE;
            break;
        }
        case 'g':
        case 'G': {
            return GUANINE;
            break;
        }
        case 't':
        case 'T': {
            return THYMINE;
            break;
        }
        case 'u':
        case 'U': {
            return URACIL;
            break;
        }
        default: {
            // Unexpected character. Error to invalidate the sequence.
            return ERROR;
        }
    }
}
char GetCounterName(int type) {
    switch(type) {
        case ADENINE: return 'A';
        case CYTOSINE: return 'C';
        case GUANINE: return 'G';
        case THYMINE: return 'T';
        case URACIL: return 'U';
        default: return 0;
    }
}
char GetStandardChar(char input) {
    return GetCounterName(GetNucleotideType(input));
}
void PrintCounters(const int counter[]) {
    for (int i = 0; i < NUM_COUNTERS; i++) {
        printf("%s%c%s%d\n", "Number of '", GetCounterName(i), "': ", counter[i]);
    }
}
void PrintSequenceType(const SequenceType type, FILE *output) {
    switch(type) {
        case UNDETERMINED: {
            fprintf(output, "%s", "Undetermined");
            break;
        }
        case DNA: {
            fprintf(output, "%s", "DNA");
            break;
        }
        case RNA: {
            fprintf(output, "%s", "RNA");
            break;
        }
        default: {
            fprintf(output, "%s", "Invalid");
        }
    }
}
/* End of given definitions */
SequenceType GetSequenceType(const int counter[]) {
// Fill this in
}
void CountNucleotides(const char seq[], int counters[]) {
// Fill this in
}
int IsMatch(const char sym1, const char sym2) {
// Fill this in
}
void PrintRatios(const int counter[], int total_len, FILE *output) {
// Fill this in
}
void PrintMatches(const int length, const char seq1[], const char seq2[], FILE *output) {
// Fill this in
}