In my past performances, I encoded and trained each track of my MIDI files (melody & harmony) separately, then played the generated tracks at the same time.
But I don't think that was the proper way to generate two-track music, because the melody and harmony should correlate.
So for this assignment, I tried using the Performance_RNN workflow to encode the dataset and train the model.
I used the same dataset I trained my 2nd performance with: 20 MIDI files of Yoko Shimomura's game soundtracks. All of these files have a melody (right-hand) and a harmony (left-hand) track.
The issue with these files is that their MIDI encoding has no 'note_off' events; the end of a note is instead indicated by a 'note_on' with 'velocity=0'. I wasn't sure the sequencing algorithm would be able to parse these MIDI files. But from Magenta's GitHub, it seems like Performance_RNN has a config that takes a spectrum of velocity values into account:
"The performance_with_dynamics model includes velocity changes quantized into 32 bins"
I didn't know what the warnings meant, so I ignored them.
I first tried training the model locally for 1,000 steps. The final loss was 4.4152, and the perplexity 82.698395.
Then I moved on to Paperspace: two runs of 20,000 steps and one of 10,000. I didn't realize that shutting the console down makes the machine get rid of the tmp folder, so I wasted a good 2 × 3 hours....
The final loss was 2.985... and the perplexity 19.8043....
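As a sanity check on these numbers: perplexity is just exp(loss) for a cross-entropy loss in nats, and both reported pairs line up (the second loss is truncated above, so that match is only approximate):

```python
import math

# perplexity = e^loss; compare computed perplexity against the logged value
for loss, reported in [(4.4152, 82.698395), (2.985, 19.8043)]:
    print(f"loss={loss}  exp(loss)={math.exp(loss):.4f}  reported={reported}")
```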
The generations are kind of coherent for about 2 seconds, but beyond that they melt into a cacophony of unending notes & chords.
Conclusion: although I can hear that the harmony and melody now sound more harmonious, the generations sound nothing like Yoko Shimomura.
My gut says it's because the dataset format isn't clean: with no explicit note-offs, the generations end up full of unending notes.
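One cleanup step worth trying before re-encoding: rewrite 'note_on' events with velocity 0 as explicit 'note_off' events. Below is a minimal sketch over a simplified (type, pitch, velocity) tuple representation — real files would need a MIDI library such as mido or pretty_midi, but the transformation itself is this simple:

```python
def normalize_events(events):
    """Rewrite note_on events with velocity 0 as explicit note_offs.

    Each event is a (type, pitch, velocity) tuple -- a simplified
    stand-in for real MIDI messages.
    """
    fixed = []
    for etype, pitch, velocity in events:
        if etype == "note_on" and velocity == 0:
            fixed.append(("note_off", pitch, 0))
        else:
            fixed.append((etype, pitch, velocity))
    return fixed

events = [("note_on", 60, 90), ("note_on", 60, 0)]
print(normalize_events(events))
# [('note_on', 60, 90), ('note_off', 60, 0)]
```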
To do: try cleaner MIDI files? Try longer training?