In the first performance, I Frankensteined three of my favorite songs with a Markov chain and generated a MIDI file as the output. For this performance, I essentially planned to do the same, but with two added objectives: (1) having a more meaningful data corpus, and (2) designing a performance with the generated music that makes sense.
This project ended up being more of a proof of concept for a "duet" pipeline.
(1) Meaningful data corpus
Playing games was a huge part of my childhood, and a significant aspect of the experience was definitely the soundtrack. Even now, every time I listen to a battle theme from the games I played, I can still feel the adrenaline rush. The songs can still take me back to a time when I experienced magic and lived in the games' fantasy worlds.
So I collected 20 MIDI files of game soundtracks composed by Yoko Shimomura from Musescore. She has composed for a few of my favorite JRPGs (but also for Super Mario and Street Fighter) and is arguably the most prominent female composer in the gaming industry.
Even though 20 files is not a big data set, I decided to go ahead because I knew the MIDI formatting would be consistent and hence easier to encode.
This is the encoding code (taken from the first performance):
if message.type == 'note_on':
    for item in message_components:
        if 'note=' in item:
            note = item.split('note=')[1]
        if 'velocity=' in item:
            vel = int(item.split('velocity=')[1])
            if vel == 0:
                # ! velocity 0 means end of note
                # ! only save time information (duration) when it is end of note
                isSaveTime = True
            else:
                # ! if velocity is more than 0, it means start of note
                # ! don't save time if it's start of note
                isSaveTime = False
                print('note: ' + note)
        if 'time=' in item:
            dur = int(item.split('time=')[1])
            # ! if velocity and time are 0, the current note plays at the same time as the previous note
            # ! the duration should follow the previous note
            if dur > 50:
                print('duration: ' + str(dur))
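The same parsing idea can be packaged as small, testable helpers. This is a sketch of my own, not the project's actual code: it works on mido-style message strings, and the `note:`/`duration:` token format and function names are assumptions.

```python
# Sketch: parse a mido-style message string into an event dict, then apply
# the encoding rules above (velocity 0 ends a note, velocity > 0 starts one).
# In practice, mido's Message objects expose these values directly as
# attributes (message.note, message.velocity, message.time).

def parse_note_event(message_str):
    """Parse 'note_on channel=0 note=60 velocity=64 time=120' into a dict."""
    parts = message_str.split()
    if not parts or parts[0] != 'note_on':
        return None  # ignore anything that is not a note_on message
    event = {}
    for item in parts[1:]:
        key, _, value = item.partition('=')
        event[key] = int(value)
    return event

def encode_event(event, min_duration=50):
    """Turn one event into encoding tokens, mirroring the rules above."""
    tokens = []
    if event['velocity'] > 0:
        tokens.append('note:' + str(event['note']))      # start of note
    elif event['time'] > min_duration:
        tokens.append('duration:' + str(event['time']))  # end of note
    return tokens
```

For example, `parse_note_event('note_on channel=0 note=60 velocity=64 time=0')` yields a dict that `encode_event` turns into `['note:60']`.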
(2) Designing a performance with generated music
Coming into this, I knew that the output of training a model on the MIDIs would mostly be gibberish. Shimomura's songs are hardly homogeneous: some are fast-tempo and aggressive sounding, while others sound like relaxing lullabies.
Transposing everything into the same key and normalizing the tempo would also be too much work.
So how do I design a performance that is still meaningful and makes sense when the generated music lacks similarity or listenability? I designed a performance pipeline that emulates how I experience Shimomura's music:
I would “play” a game, or try to evoke certain sounds that remind me of the games I played as a child.
The program would respond to me playing and generate a “soundtrack”.
I would play more, and the program would respond further. It’s like a "duet/dialog" between the player and the soundtrack composer.
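The duet loop above can be sketched end to end. In this sketch of mine, a simple first-order Markov chain (the approach from the first performance) stands in for the ml5 charRNN, and all the names are hypothetical:

```python
import random

# Sketch of the "duet" loop: the player's input tokens seed a generator,
# which answers with a short phrase. A first-order Markov chain stands in
# for the actual charRNN model; token names like 'C4' are placeholders.

def build_markov(corpus_tokens):
    """Map each token to the list of tokens that follow it in the corpus."""
    chain = {}
    for a, b in zip(corpus_tokens, corpus_tokens[1:]):
        chain.setdefault(a, []).append(b)
    return chain

def respond(chain, seed_tokens, length=8, rng=random):
    """Generate a reply phrase, seeded by the player's last token."""
    current = seed_tokens[-1]
    reply = []
    for _ in range(length):
        followers = chain.get(current)
        if not followers:
            break  # dead end: this token never had a continuation in the corpus
        current = rng.choice(followers)
        reply.append(current)
    return reply
```

Each round of the performance would then be: collect the player's tokens, call `respond`, play the reply, and wait for the next input.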
Prep Work: TensorFlow
Encode MIDI files into text files, train model.
Live Processing: JS
Listen for MIDI inputs, play a note for each input with Tone.js.
Store inputs in the same MIDI encoding format.
Once inputting is complete, use it as the seed for generation with ml5.js.
Translate the generated text into sounds with Tone.js.
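The last step can be sketched in Python (the real pipeline does this in JS with Tone.js): decode the generated text back into (pitch, duration) pairs before playback. The token format mirrors the note/duration values printed by the encoding code; the pairing rules are my assumption.

```python
# Sketch: decode generated text like 'note:60 duration:240' back into
# (pitch, duration) event pairs ready for a synth. If the model emits a
# note with no duration, fall back to a default.

def decode_tokens(text, default_duration=120):
    """Pair each note token with the duration token that follows it."""
    events = []
    pending_note = None
    for token in text.split():
        key, _, value = token.partition(':')
        if key == 'note':
            if pending_note is not None:
                # previous note had no explicit duration: use the default
                events.append((pending_note, default_duration))
            pending_note = int(value)
        elif key == 'duration' and pending_note is not None:
            events.append((pending_note, int(value)))
            pending_note = None
    if pending_note is not None:
        events.append((pending_note, default_duration))
    return events
```

Malformed output from the model (a stray duration, a dangling note) degrades gracefully instead of crashing mid-performance.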
(source code to be posted soon)
Future Improvements
Bigger data set, and accounting for different keys and tempos.
Better encoding(!!) – instead of splitting the two piano melodies (MIDI tracks) into two different encodings, encode them together into one file.
Faster / more responsive generation. Perhaps use a timer function instead of space/enter as generation triggers.
Use game controllers instead of MIDI keyboard.
More real-instrument-sounding output.