Data preparation¶
The input data should be a .csv file containing three columns with no header. The three columns are sequences, phenotype, and variance.
- sequence
- The sequence column should contain strings of sequences of uniform length.
- phenotype
- This column should contain the phenotype(fitness) of individual sequences.
- variance
- This column should contain estimation of the variance of measurement noise for each sequence.
example¶
See the example data files smn1data.csv for further formatting guidelines.
Here are the top ten lines of smn1data.csv:
| UACGUUGGG | 0.407 | 0.076 |
| GGCGCUCAC | 1.62 | 0.407 |
| AUGGUAUCG | 9.752 | 1.276 |
| ACAGCUGGC | 0.151 | 0.028 |
| ACAGUUGAU | 0.193 | 0.036 |
| GACGUCCUC | 20.191 | 3.182 |
| AGCGUCUUC | 0.124 | 0.023 |
| AAGGUUGUG | 13.497 | 1.912 |
| CUAGCGUUA | 0.208 | 0.046 |
| AAGGCAGGA | 0.357 | 0.105 |