Sonifying COVID-19 genetic mutations over time

These uncertain times have brought into sharp focus the importance of scientific research and communication. Our understanding of the world is filtered through our conceptual and scientific knowledge, and the value of multi-modal representations of data is growing. In response, we have set up the Covid-19 Listening Project, dedicated to the sonic and musical representation of Covid-19 data.

In consultation with geneticist Gemma Bruno (Telethon Institute of Genetics and Medicine, Italy), programming and music technology resources are employed in the communication of relevant genetic patterns in the disease. Such data as the structure of the Covid-19 genome, the points of mutation within it, protein coding, the comparative genetics in the phylogenetic tree of Covid-19 samples and their geographical distribution are translated into pitch, rhythms and harmonies to create rich and compelling communicative works.

A 42-minute choral piece and a genomic-spatial realisation have already been produced using these techniques.


Background

The genome is formed by a long sequence of four types of nucleotides: adenine (A), guanine (G), cytosine (C), and uracil (U). This sequence is used to synthesise specific proteins such as the “Spike protein”, which gives COVID-19 its crown-like appearance. A mutation happens when a nucleotide is changed, inserted or deleted in the sequence.

The current sonification methodology maps the timing of each note to the position of the mutation within the genome, and its pitch to the type of nucleotide mutation (e.g. adenine to guanine). This means that the position of the mutation results in different rhythmic placement, and the type of nucleotide mutation results in different melodies, giving each genome its own musical signature. Furthermore, repeating musical patterns mean that a mutation has persisted over time.

Note that even though the number of mutations may seem large, COVID-19 is actually considered a relatively stable virus.

Data gathering

The project uses the database of the National Center for Biotechnology Information (NCBI), which is updated every day with new COVID-19 sequences from research centres across the world. The website is available here: NCBI Covid-19 page

The data is downloaded and parsed using the covid-genome-matlab-parser.

How the mutations are obtained from the NCBI dataset

The first step is to obtain the mutations from the NCBI dataset.
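As a rough illustration of what this step involves, the Python sketch below lists the differences between a reference sequence and a sample. It is not the project's own code (the project uses a MATLAB parser). It assumes the two sequences have already been aligned to the same length, with "-" marking a gap, and the names `Mutation` and `find_mutations` are hypothetical. Ambiguous bases, which real data may contain, are not handled.

```python
from typing import List, NamedTuple


class Mutation(NamedTuple):
    position: int  # 1-based position in the reference genome
    old: str       # base in the reference, or "-" for an insertion
    new: str       # base in the sample, or "-" for a deletion


def find_mutations(reference: str, sample: str) -> List[Mutation]:
    """List the differences between two pre-aligned sequences of equal length."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    position = 0  # number of reference bases seen so far
    for old, new in zip(reference.upper(), sample.upper()):
        if old != "-":
            position += 1  # gaps in the reference do not advance the position
        if old != new:
            mutations.append(Mutation(position, old, new))
    return mutations


# Toy example: one substitution, one insertion and one deletion.
reference = "AUGG-CUA"
sample = "AUAGUC-A"
for mutation in find_mutations(reference, sample):
    print(mutation)
```

An insertion is reported at the position of the last reference base before it, which is one possible convention among several.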

How the mutations are translated to music

The second step is to translate the mutations into music. Below are two examples of how to do this.

YouTube video

Below are the details of the procedure to generate the sound from the mutations:

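| Old base | New base | MIDI note | Note |
|---|---|---|---|
| C | - | 47 | B2 |
| U | - | 48 | C3 |
| A | - | 49 | C♯3/D♭3 |
| G | - | 50 | D3 |
| G | A | 51 | D♯3/E♭3 |
| G | U | 52 | E3 |
| G | C | 53 | F3 |
| A | G | 54 | F♯3/G♭3 |
| A | U | 55 | G3 |
| A | C | 56 | G♯3/A♭3 |
| U | G | 57 | A3 |
| U | A | 58 | A♯3/B♭3 |
| U | C | 59 | B3 |
| C | G | 60 | C4 |
| C | A | 61 | C♯4/D♭4 |
| C | U | 62 | D4 |
| - | G | 63 | D♯4/E♭4 |
| - | A | 64 | E4 |
| - | U | 65 | F4 |
| - | C | 66 | F♯4/G♭4 |

A dash marks an absent base: in the "New base" column it denotes a deletion, and in the "Old base" column an insertion.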
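The table above can be read as a lookup from mutation type to MIDI note. The Python sketch below combines it with a timing rule to turn a list of mutations into note events. The pitch values are copied from the table. The timing rule is an assumption made for illustration: onset is taken to be linearly proportional to genome position, and the function name and the parameters `genome_length` and `pass_seconds` are hypothetical.

```python
from typing import Iterable, List, Tuple

# (old base, new base) -> MIDI note, copied from the table above.
# "-" as the new base is a deletion; "-" as the old base is an insertion.
MUTATION_TO_MIDI = {
    ("C", "-"): 47, ("U", "-"): 48, ("A", "-"): 49, ("G", "-"): 50,
    ("G", "A"): 51, ("G", "U"): 52, ("G", "C"): 53,
    ("A", "G"): 54, ("A", "U"): 55, ("A", "C"): 56,
    ("U", "G"): 57, ("U", "A"): 58, ("U", "C"): 59,
    ("C", "G"): 60, ("C", "A"): 61, ("C", "U"): 62,
    ("-", "G"): 63, ("-", "A"): 64, ("-", "U"): 65, ("-", "C"): 66,
}


def mutations_to_notes(
    mutations: Iterable[Tuple[int, str, str]],
    genome_length: int,
    pass_seconds: float,
) -> List[Tuple[float, int]]:
    """Return (onset in seconds, MIDI note) pairs, one per mutation.

    Assumption: one pass through the genome lasts `pass_seconds`, and a
    mutation at 1-based `position` sounds at the matching fraction of it.
    """
    notes = []
    for position, old, new in mutations:
        onset = (position - 1) * pass_seconds / genome_length
        notes.append((onset, MUTATION_TO_MIDI[(old, new)]))
    return sorted(notes)


# Toy example: a 10-base "genome" played over 5 seconds.
toy_mutations = [(3, "G", "A"), (4, "-", "U"), (6, "U", "-")]
for onset, note in mutations_to_notes(toy_mutations, genome_length=10, pass_seconds=5.0):
    print(f"{onset:.2f} s -> MIDI {note}")
```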


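| Protein name | Group | Instrument | Angle |
|---|---|---|---|
| NSP1 | 1 | Cello | -45° |
| NSP2 | 1 | Cello | -45° |
| NSP3 | 2 | Cello | -30° |
| NSP4 | 3 | Cello | -15° |
| NSP5 | 3 | Cello | -15° |
| NSP6 | 3 | Cello | -15° |
| NSP7 | 3 | Cello | -15° |
| NSP8 | 3 | Cello | -15° |
| NSP9 | 4 | Cello | |
| NSP10 | 4 | Cello | |
| NSP12 | 4 | Cello | |
| NSP13 | 5 | Cello | +15° |
| NSP14 | 5 | Cello | +15° |
| NSP15 | 5 | Cello | +15° |
| NSP16 | 5 | Cello | +15° |
| S | 6 | Double bass | +30° |
| ORF3a | 7 | Violin | +45° |
| E | 7 | Violin | +45° |
| M | 7 | Violin | +45° |
| ORF6 | 7 | Violin | +45° |
| ORF7a | 7 | Violin | +45° |
| ORF8 | 7 | Violin | +45° |
| N | 7 | Violin | +45° |
| ORF10 | 7 | Violin | +45° |
| Non-coding DNA | 8 | Violin | +45° |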
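The table has two levels: each protein belongs to a group, and each group has one instrument and one angle. A minimal Python sketch of that structure follows. The names `Placement`, `GROUPS`, `PROTEIN_GROUP` and `placement_for` are hypothetical, and the angle of group 4, which the table leaves blank, is kept as `None` rather than guessed.

```python
from typing import NamedTuple, Optional


class Placement(NamedTuple):
    instrument: str
    angle_deg: Optional[float]  # None where the table gives no angle


# Group number -> instrument and angle, as listed in the table above.
GROUPS = {
    1: Placement("Cello", -45.0),
    2: Placement("Cello", -30.0),
    3: Placement("Cello", -15.0),
    4: Placement("Cello", None),
    5: Placement("Cello", 15.0),
    6: Placement("Double bass", 30.0),
    7: Placement("Violin", 45.0),
    8: Placement("Violin", 45.0),
}

# Protein name -> group number, as listed in the table above.
PROTEIN_GROUP = {
    "NSP1": 1, "NSP2": 1, "NSP3": 2,
    "NSP4": 3, "NSP5": 3, "NSP6": 3, "NSP7": 3, "NSP8": 3,
    "NSP9": 4, "NSP10": 4, "NSP12": 4,
    "NSP13": 5, "NSP14": 5, "NSP15": 5, "NSP16": 5,
    "S": 6,
    "ORF3a": 7, "E": 7, "M": 7, "ORF6": 7,
    "ORF7a": 7, "ORF8": 7, "N": 7, "ORF10": 7,
    "Non-coding DNA": 8,
}


def placement_for(protein: str) -> Placement:
    """Look up the instrument and angle used for mutations in a protein."""
    return GROUPS[PROTEIN_GROUP[protein]]


print(placement_for("S"))
print(placement_for("NSP3"))
```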


Chorus of Changes (SoundCloud)

Here, over 500 genome sequences are translated into two octaves of a B minor scale. The translations are selected by mapping the most common mutation types (‘note deltas’ above) onto the most common diatonic scale degrees in a sample of Western art music (see Huron 2008) [6]. This results in familiar melodic motifs for the most commonly retained mutations and pandiatonic blurring for the more novel mutations. At the selected tempo, this results in a surprisingly engaging piece of music lasting over 42 minutes, in which the language of mutation is translated into the language of motivic transformation: a deeper sonification that goes beyond arbitrary chromatic or ‘safe’ scale choices. The piece is performable by choir and organ, but is rendered here with MIDI instrumentation in Ableton Live with UAD and Native Instruments plugins.
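The frequency matching described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used for the piece: the scale is taken to be B natural minor starting at B2, the ranking of pitches by how common they are must be supplied by the caller (the example uses a placeholder ascending order, not the ranking from Huron), and pitches are reused cyclically if there are more mutation types than pitches. The names `B_MINOR_TWO_OCTAVES` and `rank_match` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

MutationType = Tuple[str, str]  # (old base, new base)

# Assumed: two octaves of B natural minor from B2 (MIDI 47): B C# D E F# G A.
B_MINOR_TWO_OCTAVES = [47, 49, 50, 52, 54, 55, 57, 59, 61, 62, 64, 66, 67, 69]


def rank_match(
    mutation_types: Iterable[MutationType],
    pitches_by_preference: List[int],
) -> Dict[MutationType, int]:
    """Pair the n-th most frequent mutation type with the n-th preferred pitch."""
    ranked = [kind for kind, _ in Counter(mutation_types).most_common()]
    return {
        # Assumption: wrap around if there are more types than pitches.
        kind: pitches_by_preference[i % len(pitches_by_preference)]
        for i, kind in enumerate(ranked)
    }


# Toy data: C-to-U is the most frequent type, so it gets the first pitch.
observed = [("C", "U")] * 5 + [("A", "G")] * 3 + [("G", "U")]
placeholder_order = B_MINOR_TWO_OCTAVES  # placeholder, not a real ranking
print(rank_match(observed, placeholder_order))
```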

Contributing

Please follow the existing coding conventions.

We are particularly interested in collaborations with molecular biologists. Contact enzodesena AT gmail DOT com if interested.

Authors

We are both with the Department of Music and Media, University of Surrey, Guildford, UK (University of Surrey DMM website).

Acknowledgements

We would like to thank Gemma Bruno (Telethon Institute of Genetics and Medicine, Italy), Niki Loverdu (KU Leuven, Belgium) and Nicoletta Bruno (Imperial College London, UK) for the useful discussions on the topic. Thanks also to Carl Zimmer for pointing us to the Nature article [8]. Finally, we would like to thank our friends and colleagues for the useful feedback on the presentation.

The project also uses internally:

References

[1] Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis (Cambridge University Press).

[2] H. Hacıhabiboğlu, E. De Sena, Z. Cvetković, J. D. Johnston, and J. O. Smith III, “Perceptual Spatial Audio Recording, Simulation, and Rendering,” IEEE Signal Processing Magazine, vol. 34, no. 3, pp. 36-54, May 2017.

[3] E. De Sena, Z. Cvetković, H. Hacıhabiboğlu, M. Moonen, and T. van Waterschoot, “Localization Uncertainty in Time-Amplitude Stereophonic Reproduction,” IEEE/ACM Trans. Audio, Speech and Language Process. (in press).

[4] E. De Sena, H. Hacıhabiboğlu, and Z. Cvetković, “Analysis and Design of Multichannel Systems for Perceptual Sound Field Reconstruction,” IEEE Trans. on Audio, Speech and Language Process., vol. 21, no. 8, pp. 1653-1665, Aug. 2013.

[5] E. De Sena, H. Hacıhabiboğlu, Z. Cvetković, and J. O. Smith III, “Efficient Synthesis of Room Acoustics via Scattering Delay Networks,” IEEE/ACM Trans. Audio, Speech and Language Process., vol. 23, no. 9, pp. 1478-1492, Sept. 2015.

[6] Huron, D. (2008). Sweet Anticipation: Music and the Psychology of Expectation (MIT Press).

[7] https://www.nytimes.com/interactive/2020/04/03/science/coronavirus-genome-bad-news-wrapped-in-protein.html (Accessed on: 8/3/2020)

[8] Gordon, D.E., Jang, G.M., Bouhaddou, M. et al. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature (2020)

License

This project is licensed under the GNU License. The code will be made available soon. In the meantime, contact enzodesena AT gmail DOT com if interested.

You use the code, data and findings here at your own risk. See the LICENSE.md file for details.