Implementation of Melody Generation Model Through Weight Adaptation of Music Information Based on Music Transformer

(Korean title: Implementation of a Melody Generation Model Through Weight Adaptation of Music Information Based on Music Transformer)

  • Received : 2023.06.28
  • Accepted : 2023.09.09
  • Published : 2023.10.31

Abstract

In this paper, we propose a new model for the conditional generation of music that takes into account key and rhythm, two fundamental elements of music. MIDI sheet music is first converted to WAV format and then transformed into a Mel spectrogram using the Short-Time Fourier Transform (STFT). From this representation, key and rhythm are classified by two Convolutional Neural Networks (CNNs), and the resulting information is fed into a Music Transformer. The key and rhythm information is combined with the embedding vectors of the MIDI events by multiplying them with differential weights. Several experiments were conducted, including a procedure for determining the optimal weights. This research represents a new attempt to integrate these essential elements into music generation; we describe the detailed structure and operating principles of the model and verify its effectiveness and potential through experiments. In this study, rhythm classification reached an accuracy of 94.7%, key classification reached an accuracy of 92.1%, and the negative log-likelihood of the model using the weighted embedding vectors was 3.01.
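The conditioning step described above, where key and rhythm information is folded into each MIDI-event embedding via tunable weights, might be sketched as follows. This is a minimal illustration only: the function name, the additive combination rule, and the weight values are assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of the weighted-embedding conditioning step:
# condition vectors derived from the key and rhythm CNN classifiers are
# blended into a MIDI-event embedding, scaled by tunable weights.
# The additive form e + w_k*k + w_r*r is an assumed combination rule.

def condition_embedding(event_emb, key_emb, rhythm_emb,
                        w_key=0.3, w_rhythm=0.3):
    """Blend key/rhythm condition vectors into a MIDI-event embedding."""
    assert len(event_emb) == len(key_emb) == len(rhythm_emb)
    return [e + w_key * k + w_rhythm * r
            for e, k, r in zip(event_emb, key_emb, rhythm_emb)]

# Toy 4-dimensional example: the conditioned vector is then what would
# be passed on to the Music Transformer's input sequence.
event  = [1.0, 0.0, 0.0, 1.0]   # MIDI-event embedding
key    = [0.0, 1.0, 0.0, 0.0]   # key condition vector
rhythm = [0.0, 0.0, 1.0, 0.0]   # rhythm condition vector
print(condition_embedding(event, key, rhythm))  # → [1.0, 0.3, 0.3, 1.0]
```

The weights `w_key` and `w_rhythm` play the role of the "differential weights" whose optimal values the paper reports searching for experimentally.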
