From Motion to Musical Gesture: Machine Learning in the Compositional Workflow
Jean Bresson (1), Paul Best (1), Diemo Schwarz (1), Alireza Farhang (2)
1- Ircam Lab, CNRS, Sorbonne Université, Ministère de la Culture – Paris, France
2- Royal Conservatoire Antwerp, Belgium
Applications of machine learning and AI for music composition often aim at simulating some kind of creativity in computing systems. In this paper, we present preliminary work investigating the use of machine learning for computer-aided composition, as a tool assisting composers (creative users) in varied tasks and facets of their activity. We propose a work direction centred on the concept of musical “gesture”.
Contemporary music creation has long taken advantage of technology and computing systems to increase sonic and compositional possibilities, enhancing at the same time the expressivity and language of musicians, and the experience of music listeners. Artificial intelligence (AI) inspired the very beginnings of computer music, with the implicit objective of one day producing machines capable of composing and displaying their own creativity (Hiller and Isaacson 1959). While its core concepts remained central in the work of a few composers, for instance in David Cope’s Experiments in Musical Intelligence (Cope 1996), or in Shlomo Dubnov’s Memex compositions (Dubnov 2008), the notion of computational creativity only recently returned to the forefront of music technology research (Dubnov and Surges 2014), with publicised projects such as Flow Machines (Ghedini, Pachet, and Roy 2015) as well as a whole new field of research applying deep-learning techniques to varied aspects of music processing (Briot, Hadjeres, and Pachet 2018).
In the meantime, machine learning techniques were intensively applied for data mining and classification in the fields of Music Information Retrieval (Pearce, Müllensiefen, and Wiggins 2008; Illescas, Rizo, and M. 2008), computational musicology (Camilleri 1993; Meredith 2015), human-machine co-improvisation, where a computer agent learns from musical sequences in order to produce new sequences imitating a style or a “mixture” of styles (Assayag, Dubnov, and Delerue 1999), research of instrumental combinations for the synthesis of orchestral timbres (Esling, Carpentier, and Agon 2010; Crestel and Esling 2017), or gesture following and anticipation for real-time musical interaction (Françoise, Schnell, and Bevilacqua 2013). Daniele Ghisi’s project La Fabrique des Monstres also recently demonstrated the generative potential of machine learning using raw sound signals (Ghisi 2018).
Despite this variety of applications, however, to our knowledge the use of machine learning and AI by composers has never become a “standard” item in the composition-assistance toolbox. In the field of computer-aided composition, a somewhat opposite focus on preserving users’ own creative input has led researchers and composers to develop more “constructivist” approaches and to explore other emerging aspects of information technology, such as end-user programming (Burnett and Scaffidi 2014) and visual programming languages (Assayag 1995). In environments like OpenMusic (Bresson, Agon, and Assayag 2011) or Max (Puckette 1991), two visual programming languages dedicated to music creation, composers can freely develop, formalize, and implement ideas in the form of programs associated with varied musical representations and interaction features.
Computer-aided composition therefore raises an interesting challenge for machine learning and AI considered from the previous perspective: how to enhance a creative process while preserving users’ input and subjectivity?
Our objective here is to question the possible integration of machine learning and AI-inspired techniques and their appropriation by composers in formalized compositional approaches such as the ones developed in computer-aided composition environments. Prospective applications might include varied tasks and stages of compositional processes: analysis and transcription, complex problem solving and operational research, composition by recomposition or concatenation of patterns, etc. In this range of compositional tasks, machine learning provides an opportunity to better understand, control, or generate abstract musical structures emanating from composers’ formalized thinking, empirical musicianship, and creativity.
Learning and Musical Gestures
The concept of “gesture” is frequently found in compositional discourse and studies, yet with a variety of different, often relatively abstract, meanings (Godøy and Leman 2010; Hervé and Voisin 2006; Farhang 2016).
In the field of movement & computing (MOCO), machine learning technology now makes it possible to handle gestural data input, mapping, and processing using powerful and operational tools (Wanderley 2002; Bevilacqua et al. 2011; Caramiaux et al. 2014). One such technology is the model developed by Jules Françoise in the XMM library. Based on hybrid techniques combining Gaussian Mixture Models and Hidden Markov Models, XMM dynamically analyses time-series and streams of gesture-description signals in order to classify movements: at each time step of a real-time data input, it can estimate the most probable gesture being performed, as well as the position of this estimate within a global model of the gesture (Françoise, Schnell, and Bevilacqua 2013).
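As a rough illustration of this kind of frame-by-frame estimation, the following Python sketch trains one diagonal Gaussian per gesture class and, for each incoming frame, reports the class with the highest accumulated log-likelihood. It is a deliberately simplified stand-in and not the XMM implementation: it uses neither mixture components nor hidden Markov states, so it cannot estimate the position within a gesture. The function names (fit_gaussian, train, follow) and the toy data are assumptions made for the example.

```python
import math
from collections import defaultdict


def fit_gaussian(frames):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to a list of frames."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        # A small floor keeps the variance strictly positive.
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances


def log_likelihood(frame, model):
    """Log-density of one frame under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, means, variances)
    )


def train(examples):
    """examples: list of (frames, label) pairs; returns {label: model}."""
    pooled = defaultdict(list)
    for frames, label in examples:
        pooled[label].extend(frames)
    return {label: fit_gaussian(frames) for label, frames in pooled.items()}


def follow(stream, models):
    """For each incoming frame, yield the label with the highest accumulated log-likelihood."""
    scores = {label: 0.0 for label in models}
    for frame in stream:
        for label, model in models.items():
            scores[label] += log_likelihood(frame, model)
        yield max(scores, key=scores.get)


# Toy data (hypothetical): two-dimensional frames for two gesture classes.
examples = [
    ([(0.1, 0.0), (0.2, 0.1), (0.15, 0.05)], "slow"),
    ([(0.9, 1.0), (0.8, 0.9), (1.0, 1.1)], "fast"),
]
models = train(examples)
estimates = list(follow([(0.85, 0.95), (0.9, 1.0)], models))
```

Here `estimates` holds one class estimate per incoming frame, which mirrors the streaming use described above while leaving out the temporal modelling that a Hidden Markov Model would add.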
Recent work on the explicit integration of gesture data and concepts in a computer-aided composition framework includes, for instance, Jérémie Garcia’s pOM project with composer Philippe Leroux, which converts hand-drawn pen gestures into symbolic compositional structures (Garcia, Leroux, and Bresson 2014), and Marlon Schumacher’s OM-Geste library for OpenMusic (Schumacher and Wanderley 2017). In OM-Geste, multidimensional gesture-description signals are encoded and mapped to musical objects at different scales and time-resolutions, in order to be processed in a compositional workflow for the production of symbolic musical structures (scores) or the control of sound synthesis. These two examples are clear illustrations of motion/gesture descriptions being linked to the more abstract conception of a musical gesture. One of our hypotheses here is that using the gesture learning and recognition techniques mentioned above in such situations might allow for a relevant, complementary approach to, and understanding of, the status and identity of musical gestures in compositional processes.
A Compositional Workflow
As a first step of exploration, we decided to put together an operational framework for gesture learning and recognition, based on the XMM library, within the OpenMusic computer-aided composition environment. We used this framework to implement composer Alireza Farhang’s musical research residency project at Ircam: Traces of expressivity. In this project, the composer’s intention was to produce a “dataflow score” informed by abstract musical “gestures”, dedicated to controlling the performance in the various media involved in a multidisciplinary work. The challenge, and open question at this point, was to assess the relevance of learning and recognition tools developed for motion data in processing more abstract musical objects and (so-called) musical gestures.
The composer annotated his score with subjective labels corresponding to different classes of musical gestures. The subjective aspect is important here: this classification does not necessarily rely on quantifiable information, and can be informed by any arbitrary consideration from the composer. Corresponding segments were extracted from an audio recording of the piece, labeled with class-name tags, and used to train a model to estimate the class of any further incoming audio segment or comparable vector of audio-descriptor signals. Of course the approach is somewhat naive, and the quality of the results will highly depend on the nature of the training vs. incoming audio signals, and most importantly, on the choice of sound descriptors and parameters used for building and running the model.
All these tasks cannot be automatised and must be fine-tuned according to the composer’s specific use, material at hand, and subjective goals. A main challenge at the core of this work is therefore the appropriation and integration of the “machine learning workflow” within the composer’s work and practice. Pre-processing, categorisation, labeling of a training data-set, calibration and fine-tuning of the system, all become part of the composer’s work and therefore require adequate tools, taking into account the specificity of their profile, expertise, and artistic approach.
The prototype under development considers arbitrary pairs of [data, label] for building, training and running the model. The data is a vector of n time-series corresponding, in the standard case, to a set of temporal descriptors (e.g. x, y, z, acceleration, orientation, etc. for a motion description, or any other set of signals, for instance derived from standard audio descriptors). XMM models have the advantage of not requiring large training-sets: a few examples can be enough to recognise with fairly good accuracy simple shapes performed, drawn or input in the system (see Figure 1).
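To make the [data, label] representation concrete, here is a minimal Python sketch in which each data item is a list of n parallel time-series and recognition is done from a single example per class. Note that the recognition technique is swapped in for illustration: the sketch uses nearest-neighbour matching under dynamic time warping, not the Gaussian Mixture and Hidden Markov models of XMM. The helper names (to_frames, dtw_distance, recognise) and the toy shapes are hypothetical.

```python
import math


def to_frames(series):
    """Turn n parallel time-series into a list of n-dimensional frames."""
    return list(zip(*series))


def frame_distance(a, b):
    """Euclidean distance between two frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of frames."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]


def recognise(data, training_set):
    """Return the label of the training pair whose data is closest to `data`."""
    frames = to_frames(data)
    best = min(training_set, key=lambda pair: dtw_distance(frames, to_frames(pair[0])))
    return best[1]


# Each pair is [data, label]; data holds n = 2 time-series (here x and y).
training_set = [
    ([[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]], "diagonal"),
    ([[0.0, 0.5, 1.0], [0.0, 0.0, 0.0]], "horizontal"),
]

# A new shape, sampled at a different rate than the training examples.
query = [[0.0, 0.3, 0.6, 1.0], [0.1, 0.3, 0.7, 0.9]]
label = recognise(query, training_set)
```

Time warping lets the query have a different length from the training examples, which is one simple way to compare shapes drawn or performed at different speeds.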
In our actual experimental case, however, numerous audio descriptors have to be tested, weighted and combined in order to find and build efficient models, which requires intensive, often time-consuming experimentation, observation and analysis of intermediate results. The graphical programming environment should help composers to implement such experimental procedures (Bresson and Agon 2010), embedding some of them in iterative processes and storing/displaying results easily.
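One possible form of such an iterative procedure is sketched below in Python: every combination of the available descriptors is tried in turn, scored with leave-one-out accuracy, and the results are ranked. This is only an assumption about how the experimentation could be organised, not the procedure used in the project. The classifier is a plain nearest-neighbour rule on per-descriptor means rather than an XMM model, descriptor weighting is left out, and the descriptor names, labels and values are invented for the example.

```python
from itertools import combinations


def summarise(segment, names):
    """Reduce a segment (dict: descriptor name -> time-series) to one mean per chosen descriptor."""
    return [sum(segment[n]) / len(segment[n]) for n in names]


def nearest_label(vector, labelled_vectors):
    """Label of the closest vector (squared Euclidean distance)."""
    def sq_dist(other):
        return sum((a - b) ** 2 for a, b in zip(vector, other))
    return min(labelled_vectors, key=lambda pair: sq_dist(pair[0]))[1]


def leave_one_out_accuracy(dataset, names):
    """Classify each segment using all the others; return the fraction classified correctly."""
    hits = 0
    for i, (segment, label) in enumerate(dataset):
        rest = [(summarise(s, names), l) for j, (s, l) in enumerate(dataset) if j != i]
        hits += nearest_label(summarise(segment, names), rest) == label
    return hits / len(dataset)


def rank_descriptor_sets(dataset, all_names):
    """Score every non-empty descriptor subset; best accuracy first, smaller sets first on ties."""
    results = []
    for size in range(1, len(all_names) + 1):
        for names in combinations(all_names, size):
            results.append((leave_one_out_accuracy(dataset, names), names))
    return sorted(results, key=lambda r: (-r[0], len(r[1])))


# Hypothetical labelled segments with two invented descriptors.
dataset = [
    ({"loudness": [0.1, 0.2], "centroid": [0.9, 0.1]}, "calm"),
    ({"loudness": [0.2, 0.2], "centroid": [0.1, 0.3]}, "calm"),
    ({"loudness": [0.9, 0.8], "centroid": [0.2, 0.2]}, "agitated"),
    ({"loudness": [0.8, 0.8], "centroid": [0.6, 0.4]}, "agitated"),
]
ranked = rank_descriptor_sets(dataset, ["loudness", "centroid"])
for accuracy, names in ranked:
    print(f"{accuracy:.2f}  {', '.join(names)}")
```

The exhaustive search grows exponentially with the number of descriptors, so in practice it would need to be restricted to small subsets or replaced by a guided search.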
Figure 2 shows a composer’s workspace in OpenMusic, including an XMM model being trained and tested over a series of sound extracts for gesture classification.
We presented preliminary work towards the integration of machine learning tools and concepts in a computer-aided composition framework, and the prototype of a gesture recognition and analysis model implemented with the XMM library in the OpenMusic environment.
A great number of tools and technologies are currently available or in development, which could be used to help, accelerate, or improve a variety of compositional tasks. Techniques such as neural networks, data clustering, or Bayesian networks could well be adapted and applied to symbolic (sequential, hierarchical...) musical structures, for instance to classify and process chords, melodies, patterns, diagrams, or spatial information.
In these different cases, adequate tools and environments will also need to be developed in order to foster efficient and creative workflows and user experience. While applications of machine learning for composition so far have mostly been carried out using ad-hoc systems, a common generic platform would most likely contribute to a better and more productive appropriation of these techniques by musicians.
3- The prototype and figures presented in this paper run in the “O7” implementation of the visual language (Bresson et al. 2017).
Assayag, G.; Dubnov, S.; and Delerue, O. 1999. Guessing the Composer’s Mind: Applying Universal Prediction to Musical Style. In Proceedings of the International Computer Music Conference.
Assayag, G. 1995. Visual Programming in Music. In Proceedings of the International Computer Music Conference.
Bevilacqua, F.; Schnell, N.; Rasamimanana, N.; Zamborlin, B.; and Guédy, F. 2011. Online Gesture Analysis and Control of Audio Processing. In Solis, J., and Ng, K., eds., Musical Robots and Interactive Multimodal Systems. Springer.
Bresson, J.; Agon, C.; and Assayag, G. 2011. OpenMusic. Visual Programming Environment for Music Composition, Analysis and Research. In ACM MultiMedia 2011 (Open-Source Software Competition).
Bresson, J., and Agon, C. 2010. Processing Sound and Music Description Data Using OpenMusic. In Proceedings of the International Computer Music Conference.
Bresson, J.; Bouche, D.; Carpentier, T.; Schwarz, D.; and Garcia, J. 2017. Next-generation Computer-aided Composition Environment: A New Implementation of OpenMusic. In Proceedings of the International Computer Music Conference.
Briot, J.-P.; Hadjeres, G.; and Pachet, F. 2018. Deep Learning Techniques for Music Generation. Computational Synthesis and Creative Systems. Springer.
Burnett, M., and Scaffidi, C. 2014. End-User Development. In Soegaard, M., and Dam, R. F., eds., The Encyclopedia of Human-Computer Interaction. Interaction Design Foundation.
Camilleri, L. 1993. Computational Musicology. A Survey on Methodologies and Applications. Revue Informatique et Statistique dans les Sciences Humaines 29.
Caramiaux, B.; Montecchio, N.; Tanaka, A.; and Bevilacqua, F. 2014. Adaptive Gesture Recognition with Variation Estimation for Interactive Systems. ACM Transactions on Interactive Intelligent Systems 4(4).
Cope, D. 1996. Experiments in Musical Intelligence. A-R Editions.
Crestel, L., and Esling, P. 2017. Live Orchestral Piano, a system for real-time orchestral music generation. In Proceedings of the Sound and Music Computing Conference.
Dubnov, S., and Surges, G. 2014. Delegating Creativity: Use of Musical Algorithms in Machine Listening and Composition. In Lee, N., ed., Digital Da Vinci: Computers in Music. Springer.
Dubnov, S. 2008. Memex and Composer Duets: Computer-Aided Composition Using Style Modeling and Mixing. In Bresson, J.; Agon, C.; and Assayag, G., eds., The OM Composer’s Book. 2. Editions Delatour / Ircam.