Implementation of Melody Generation Model Through Weight Adaptation of Music Information Based on Music Transformer

(Korean title: Implementation of a Melody Generation Model Through Weight Adaptation of Music Information Based on Music Transformer)

  • Received : 2023.06.28
  • Accepted : 2023.09.09
  • Published : 2023.10.31

Abstract

In this paper, we propose a new model for the conditional generation of music that takes into account key and rhythm, two fundamental elements of music. MIDI sheet music is first converted to WAV format and then transformed into a Mel spectrogram using the Short-Time Fourier Transform (STFT). From this representation, key and rhythm are classified by two Convolutional Neural Networks (CNNs), and the resulting information is fed into a Music Transformer. The key and rhythm information is combined with the embedding vectors of the MIDI events by multiplying them with differential weights. Several experiments were conducted, including a procedure for determining the optimal weights. This research represents a new attempt to integrate these essential elements into music generation; we describe the detailed structure and operating principles of the model and verify its effectiveness and potential through experiments. In this study, rhythm classification reached an accuracy of 94.7%, key classification reached an accuracy of 92.1%, and the negative log-likelihood of the model using the weighted embedding vectors was 3.01.
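The conditioning step described above, where key and rhythm information is folded into each MIDI-event embedding via tunable weights, might be sketched as follows. This is a minimal illustration only: the function name, the additive combination rule, and the weight values are assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of the weighted-embedding conditioning step:
# condition vectors derived from the key and rhythm CNN classifiers are
# blended into a MIDI-event embedding, scaled by tunable weights.
# The additive form e + w_k*k + w_r*r is an assumed combination rule.

def condition_embedding(event_emb, key_emb, rhythm_emb,
                        w_key=0.3, w_rhythm=0.3):
    """Blend key/rhythm condition vectors into a MIDI-event embedding."""
    assert len(event_emb) == len(key_emb) == len(rhythm_emb)
    return [e + w_key * k + w_rhythm * r
            for e, k, r in zip(event_emb, key_emb, rhythm_emb)]

# Toy 4-dimensional example: the conditioned vector is then what would
# be passed on to the Music Transformer's input sequence.
event  = [1.0, 0.0, 0.0, 1.0]   # MIDI-event embedding
key    = [0.0, 1.0, 0.0, 0.0]   # key condition vector
rhythm = [0.0, 0.0, 1.0, 0.0]   # rhythm condition vector
print(condition_embedding(event, key, rhythm))  # → [1.0, 0.3, 0.3, 1.0]
```

The weights `w_key` and `w_rhythm` play the role of the "differential weights" whose optimal values the paper reports searching for experimentally.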
