Expertise in Jazz Guitar Improvisation: A Cognitive Approach (Review)
Submitted on 26 Feb 2021; Accepted on 24 Nov 2021
1. Introduction
Attempts to generate music with the computer are almost as old as the computer itself, starting with Lejaren Hiller's ILLIAC Suite in 1959 ( Hiller and Isaacson, 1959 ). The underlying motivations are manifold, ranging from artistic experiments to proofs-of-concept, educational applications such as practising aids,1 and commercially viable products for royalty-free music.2 The model evaluated in this paper is based on another motivation, the development of a psychological model of jazz improvisation.
There is a vast literature on jazz improvisation research, but the model by Pressing ( 1984 , 1988 ), developed already in the 1980s, is still state-of-the-art even though it remains largely untested. Proving the adequacy of models for jazz improvisation is problematic, as the required experimental procedures are difficult. The main reason is that jazz improvisers seem to have limited conscious access to their cognitive and motor processes during improvisation. Furthermore, experimental studies on jazz improvisation must rely on production paradigms which are difficult to evaluate and suffer from a sampling problem, because it is seldom possible to have improvisers play a large number of solos in a controlled lab setting. The external validity of these experiments is often further limited, as they very frequently have to use computer-generated (e.g., with Band-in-a-Box) or prerecorded backing tracks (e.g., Aebersold play-alongs), which offer no possibilities for interaction. Confronting improvisers with their generated solos and prompting self-reflective comments has proved to be one of the most fruitful approaches so far ( Norgaard, 2011 ), but this still has limitations, as jazz improvisers frequently just do not know in retrospect why they played certain notes or phrases. Another promising and recent approach is corpus-based jazz studies ( Owens, 1974 ; Pfleiderer et al., 2017 ), which aim at finding phenomenological patterns in the jazz improvisations of (mostly eminent) jazz players.
Psychological models of jazz improvisation seldom make sufficiently hard predictions to allow them to be tested in experiments and with corpora. Here, generative models can help, following a general analysis-by-synthesis approach. We decided that our generative model of jazz improvisation should explicitly incorporate high-level knowledge of jazz improvisation while at the same time being probabilistic, in order to model inaccessible factors and genuinely probabilistic aspects of an improvisation. One obvious model choice is hierarchical Markov models, which are employed in the current study.
The paper is organized as follows. After some general considerations about evaluation methods for generative models of jazz improvisation in Section 2 and an overview of related work in Section 3, we present in Section 4 our hierarchical Markov model, which is based on mid-level analysis, the Weimar Bebop Alphabet, a simple rhythm model, and chord-scale theory. In Section 5, we report on our exploratory experiment to evaluate factors influencing Turing-like solo assessment by a group of experts and non-experts. Finally, we discuss our findings in Section 6 and suggest ideas for the evaluation of generative models of jazz improvisation.
2. Evaluation of Generative Models
Analysis-by-synthesis approaches are only fruitful if the models can be evaluated with respect to their adequacy. As the models serve as a "laboratory" to test diverse hypotheses about improvisational processes, there is a certain need for testing often and reliably, in order to apply a continuous improvement process while ruling for or against certain modeling options.
There are two main approaches to evaluation, which are complementary and not exclusive. First, generated solos can be evaluated with a Turing-like test. Second, they can be judged against a corpus of "accepted" jazz solos, e.g., in terms of melodic features, dramaturgy, or pattern content. In this study, we decided to pursue only the first approach, due to length constraints, and also because we think that the objective approach is in principle less severe and less powerful, as objective features are not likely to capture the full content of a solo with all the intricate correlations between musical dimensions. Simple first- and second-order statistics for pitch and rhythm, such as the MGEval variant used in Madaghiele et al. ( 2021 ), will not suffice for our needs, as these are partly fulfilled already by construction.
In order to evaluate a generative model using Turing-like tests with a panel of judges or raters, one has to solve several problems, which generally relate to performance aspects. First to note is that a Turing test needs to be designed as a signal-detection experiment, in which computer- and human-generated solos are to be assessed as either human- or computer-generated. This necessitates a standardization of the solos in order to minimize confounds. A judgment of computer-generated solos based only on absolute criteria is possible, but it would still need baseline measurements on human-generated solos for a complete picture.
As our (and most other) algorithms generate score-like representations, one could either let expert raters judge the score directly, or the scores could be transferred into sounding music for assessment. The first approach is seldom used, as it has the disadvantage that it demands considerable skills from the raters and that imagining music from scores will always be inferior to actually listening to music. Thus, in general, a listening experiment seems preferable.
However, the transfer process introduces additional degrees of freedom in design and, most importantly, confounds judgments of the actual musical content with performative aspects, which results in assessment bias.
To produce sounding music from generated scores, one can use either machine- or human-generated renditions of solos, either with or without a musical context. As jazz solos without accompaniment rarely make musical sense, using an accompaniment seems mandatory. Letting human players perform the generated solo is an intriguing approach, but a very time- and resource-consuming method that does not scale well. The most common and most practical solution is to use machine-generated renditions over machine-generated (or prerecorded) backing tracks.
A machine-generated solo rendition can be either deadpan MIDI (a score "as is") or post-processed by humans (or further algorithms). Tweaking can be applied to performance parameters (e.g., microtiming, dynamics) and to the musical surface. The easiest solution, using deadpan MIDI data, has the disadvantage that a non-expressive performance might be strongly associated with a computerized, non-human performance, which might likely result in rating bias as well.
Another issue is the proper choice of generated solos, as creativity is mostly conditioned on selection processes. This applies to both human- and computer-generated solos. A fair evaluation procedure can only be based on some form of random selection, but there are nevertheless free parameters, e.g., the choice of underlying tunes. On the human-generated side, the question is whether solos from masters or professionals or solos from non-professionals or students should be used. The choices here are likely to influence the evaluation process and need careful consideration.
Finally, there is also the question of whether to use expert or amateur listeners for a human panel. Expert listeners are more likely to identify computer-generated jazz solos simply by having more exposure to jazz and its implicit rules. As such, an expert panel might provide a more severe test for the algorithmic model. On the other hand, jazz experts might also be (negatively) biased towards computer-generated jazz solos, as these go against central points of the ethics and aesthetics of jazz. Non-experts, while probably not being as sensitive to details as experts, might be less biased in this regard. In light of all these considerations, we thus felt the need, before conducting a large-scale evaluation of our algorithm, to address aspects of the evaluation procedure itself first, which will be the second focus of the paper.
3. Related Work
3.1 Generation of Jazz Solos
There have been quite a few attempts to artificially create jazz solos, mostly of the monophonic type. The employed methods range from Markov models ( Pachet, 2003 , 2012 ) to rule-based models ( Johnson-Laird, 1991 , 2002 ; Quick and Thomas, 2019 ), (probabilistic) grammars ( Keller et al., 2013 ; Keller and Morrison, 2007 ), genetic algorithms ( Biles, 1994 ; Papadopoulos and Wiggins, 1998 ), agent-based approaches ( Ramalho et al., 1999 ), and artificial neural networks ( Toiviainen, 1995 ; Hung et al., 2019 ; Wu and Yang, 2020 ). Some of these models do not generate solos in the narrow sense, but, for instance, walking bass lines or lead sheets. Most systems work offline, whereas some are interactive and real-time. A standardized and rigorous evaluation of these models is, however, often lacking. The algorithms were most often simply evaluated informally or qualitatively, either by the authors themselves or a small panel of experts. Recently, evaluation using objective features was also employed ( Yang and Lerch, 2020 ). One notable exception is the recent work by Wu and Yang ( 2020 ), who used a rather extensive subjective listening test, which however was not the main focus of the study. Their algorithm is similar to the one proposed here, as it is also based on the Weimar Jazz Database and also incorporates mid-level unit annotations. However, as their model is a Transformer variant, the implementation is quite different.
The Impro-Visor program by Keller and co-workers3 is open-source software and freely available. It seems that they recently moved from their original probabilistic grammars to RNN techniques such as LSTM- and GAN-based networks for the generation of solos ( Trieu and Keller, 2018 ; Keller et al., 2013 ).
The JIG system by Grachten ( 2001 ) has some similarities to the model proposed here, as it also uses some form of abstract pitch motif, derived from Narmour's implication-realization model ( Narmour, 1990 ). It also has two modes of note generation, called 'melodying' and 'motif', which are, however, more short-ranged than the mid-level units used in our model.
The most recent additions are BebopNet ( Haviv Hakimi et al., 2020 ) and MINGUS ( Madaghiele et al., 2021 ), which are both deep-learning Transformer models. BebopNet is based on a large collection of saxophone solos, whereas MINGUS is based on the Weimar Jazz Database and the Nottingham Database. The results of BebopNet can be listened to online and are partly convincing, particularly in longer lines, which might be due to the fact that some real bebop patterns are reproduced by the model. The authors of MINGUS found their system to perform at a similar level to BebopNet.
3.2 Evaluation of Artificial Jazz Solos
To the best of our knowledge, no systematic work on how to set up musical Turing tests for artificial jazz solos has been undertaken so far. In a recent paper by Yang and Lerch ( 2020 ), a strong point was made about the quite poor state of formal evaluation methods for generative models of music. They acknowledge the power of Turing-like tests, which are not without problems though, but mainly advocate methods of objective evaluation, which boil down to comparing feature distributions between original (training) and generated sets of music, similar to the idea of a "critic" proposed by Wiggins and Pearce ( 2001 ). Objective evaluation of generated music has the advantage that it can be unequivocally defined and thus reproducibly measured, but in our view, it can only be a preliminary or auxiliary step. The problem is that it has to rely on an arbitrary (though often obvious) set of features, while the space of possible features is basically infinite. At least, expert care and extended domain knowledge are necessary to devise such a system. Because of the non-trivial but crucial interaction of musical features (pitch, harmony, rhythm, meter, articulation, dynamics, and micro-timing), it can hardly be expected that fitting single feature dimensions alone can guarantee the correct conditional distribution. On the other hand, this way, true Big-C creativity ( Kaufman and Beghetto, 2009 ) might be precluded. Yet, as the Pro-c creativity problem is not solved yet, this might not be a relevant problem for the near future.
4. The Model
4.1 Overview
An overview of the model can be found in Figure 1 . The general aim is to generate a note sequence over a given chord sequence for a prespecified number of choruses, i.e., cycles of chord sequences. This is achieved by using a hierarchical model with mid-level units (MLU) at the top level (Section 4.3). After selecting an MLU, a sequence of Weimar Bebop Alphabet (WBA) atoms (Section 4.4) is generated with a first-order Markov model conditioned on the selected mid-level unit. The pitches of these WBA atoms are then "realized" using the given chord context with the help of chord-scale theory (Section 4.6), whereas the rhythm is generated based on a first-order Markov model of duration classes, likewise conditioned on the containing mid-level unit. The number of tones in an MLU is predetermined by drawing from the length distributions conditioned on the type of mid-level unit. Even though in the original mid-level annotation system a musical phrase can contain more than one MLU, we make here the simplifying assumption that an MLU always constitutes a single phrase. After a phrase is generated in this manner, a short gap or pause between phrases is inserted by randomly drawing from the gap duration distribution, whereupon the whole procedure is repeated until the specified number of choruses is generated. All involved probability distributions are estimated by the respective empirical distributions from the Weimar Jazz Database.
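As a concrete illustration, the nested generation scheme can be sketched in a few lines of Python. The MLU set, the transition tables, and the atom-count range below are invented placeholders, and only atom categories are sampled (direction and length are omitted); the actual model conditions on the full MLU typology and estimates all distributions from the Weimar Jazz Database:

```python
import random

# Hypothetical transition tables; the real model estimates these from the
# Weimar Jazz Database, conditioned on the mid-level unit (MLU) type.
MLU_TYPES = ["line", "lick"]
WBA_TRANSITIONS = {
    "line": {"start": {"scale": 0.6, "arpeggio": 0.4},
             "scale": {"scale": 0.5, "arpeggio": 0.3, "trill": 0.2},
             "arpeggio": {"scale": 0.6, "arpeggio": 0.2, "trill": 0.2},
             "trill": {"scale": 0.7, "arpeggio": 0.3}},
    "lick": {"start": {"repetition": 0.3, "scale": 0.4, "approach": 0.3},
             "repetition": {"scale": 0.5, "approach": 0.5},
             "scale": {"repetition": 0.3, "scale": 0.3, "approach": 0.4},
             "approach": {"scale": 0.6, "repetition": 0.4}},
}

def sample_markov(transitions, length):
    """Sample a first-order Markov chain of WBA atom categories."""
    state, out = "start", []
    for _ in range(length):
        row = transitions[state]
        state = random.choices(list(row), weights=row.values())[0]
        out.append(state)
    return out

def generate_phrase():
    """One phrase = one MLU realized as a sequence of WBA atoms."""
    mlu = random.choice(MLU_TYPES)
    n_atoms = random.randint(2, 6)  # stand-in for the empirical length distribution
    return mlu, sample_markov(WBA_TRANSITIONS[mlu], n_atoms)
```

Calling `generate_phrase()` repeatedly, with gap durations drawn between phrases, would correspond to the outer loop of the procedure described above.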
Figure 1 Overview of the generative model.
4.2 The Weimar Jazz Database
The Weimar Jazz Database is a high-quality database of annotated transcriptions of monophonic jazz solos performed by eminent jazz performers from the US-American jazz canon. It covers nearly the entire history of jazz (1925–2009) and the most important tonal jazz styles, without claiming total representativeness. See Table 1 for a quick overview.4
Table 1
Short overview of the Weimar Jazz Database.
The WJD contains an extensive set of annotations such as metrical annotations, articulation, loudness, chords, forms, and metadata, as well as manually annotated phrases and mid-level units.
4.3 Mid-level Units
Mid-level analysis is a content-based qualitative annotation system based on the idea that performers use short-range action plans, which cover a duration of about 2–3 seconds. The system was originally developed for jazz piano improvisations ( Lothwesen and Frieler, 2012 ) and then modified and extended for monophonic solos ( Frieler et al., 2016 ). Interviews with pianists showed that professional jazz players indeed use similar action plans ( Schütz, 2015 ). Based on an iterative qualitative analysis of solos, a system with nine main types (and 38 subtypes) of playing ideas or design units, called mid-level units, was devised and codified. Subsequently, the entire WJD was annotated manually, with good inter-rater agreement on section boundaries and adequate agreement on unit labels. It could further be shown that the different MLU types indeed differ statistically on diverse aspects and that styles and performers differ in their application of mid-level units (e.g., Frieler, 2018 , 2020 ). The most common MLUs are line and lick MLUs, covering about 75% of all MLUs as well as 75% of solo durations ( Frieler et al., 2016 ). Lick MLUs are shorter and rhythmically more diverse, whereas line MLUs are rhythmically more uniform and generally longer. For the present model, only lick and line MLUs are used. See Section S1.1 in the supplementary information for a more detailed description of the two main types.
4.4 Weimar Bebop Alphabet
In an effort to find a more compact description of melodies, the Weimar Bebop Alphabet (WBA) was developed ( Frieler, 2019 ). The guiding principle was to identify short melodic units that make sense in their own right, either by musical conventions or by instrument rehearsal practices such as running scales and arpeggios. The system was devised based on expert knowledge and phenomenological intuition. These units are called WBA atoms and are thought to serve as basic building blocks for melodic construction. As such, they can in principle be applied to any melody, not only to jazz solos. The WBA represents a classification system for interval sequences with six main and nine subcategories. See Table 2 for an overview. The most basic categories are repetitions, scales (diatonic and chromatic), arpeggios, and trills (short oscillations of two tones). More specific is the class of 'approaches', where a target tone is 'encircled' by two other tones, one higher and one lower. Finally, there is a miscellaneous category, dubbed 'X' atoms, with the subcategory of links, which are X atoms of length 1, i.e., just one interval. The current form of the WBA should be viewed as preliminary, as, for example, the miscellaneous category is the largest. See Frieler ( 2019 ) for more details.
Table 2
Overview of WBA atoms.
Using a priority list, an interval sequence can be segmented uniquely into a sequence of non-overlapping atoms of constant direction (see Frieler ( 2019 ) for details). Hence, a WBA atom is unambiguously described by a short symbol for its category, a direction (ascending, descending, or horizontal), and a length (number of intervals). However, a single atom can have different realizations in terms of pitches, except for tone repetitions, which are the only unequivocal category. For example, an ascending diatonic scale atom of length 3 can have the realizations [+2, +2, +2], [+2, +2, +1], [+2, +1, +2], [+1, +2, +2], [+1, +2, +1] (but not [+1, +1, +1], as this would be a chromatic scale atom). This under-determination is the main workhorse for the generative model.
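The under-determination of atoms can be made concrete with a small enumeration. The sketch below assumes that a diatonic step spans one or two semitones and that two consecutive semitone steps would make the atom (partly) chromatic; this rule is inferred from the example above, not taken from the formal WBA definition:

```python
from itertools import product

def diatonic_realizations(length, direction=+1):
    """Enumerate interval realizations of a diatonic scale atom of the
    given length (number of intervals). Diatonic steps are 1 or 2
    semitones; runs of adjacent semitone steps are excluded, since no
    diatonic scale contains consecutive half steps (assumption)."""
    out = []
    for steps in product((1, 2), repeat=length):
        if any(a == b == 1 for a, b in zip(steps, steps[1:])):
            continue  # would be (partly) chromatic
        out.append([direction * s for s in steps])
    return out
```

For `length=3` this yields exactly the five ascending realizations listed above, while excluding the all-semitone chromatic case.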
In Frieler ( 2019 ), it was shown that the WBA atoms empirically follow at most a first-order Markov model. This is an interesting result, which undermined one of the original goals of the WBA (to find a significantly more compact description of melodies), but it simplifies the generative model.
The average length of a WBA atom is, at 2.3 intervals, rather small. This means that every 2 to 3 intervals (3 to 4 tones) a change in character and/or direction takes place, which is indicative of the high variability and complexity of jazz improvisations and is an important part of jazz melodic construction.
4.5 Rhythm Model
The rhythm model is currently a very simple one, basically just a first-order Markov model of inter-onset interval (IOI) classes. For this, inter-onset intervals are classified into five distinct categories (very short, short, medium, long, and very long), which roughly correspond to metrical durations of sixteenth, eighth, quarter, half, and whole notes. For the current model, IOI class transition probabilities for lick and line MLUs are sampled and used to generate inter-onset intervals. The current implementation of the model uses only 4/4 time with a sixteenth-note resolution, so the IOI classes are here, in fact, identical to the corresponding metrical durations. In order to get better results, two further tweaks were applied. For line MLUs, the durations were fixed to be either sixteenth or eighth notes, with equal probability. Markov sampling from the true IOI distribution for line MLUs produced too many rhythmically inhomogeneous and syncopated rhythms, which shows that rhythm is not adequately modeled by this simple approach. One reason for this is the interaction of rhythm and metrical constraints, as well as the fact that micro-timing somewhat distorts the distribution, since the metrical annotation in the WJD was done algorithmically. Additionally, for lick MLUs, the generated onsets needed some smoothing. Here a simple resampling was used until the number of on-beat events was twice as high as the number of off-beat events.
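A minimal sketch of the IOI-class Markov model and the on-beat resampling tweak for lick MLUs might look as follows. The transition table, the tick values, and the definition of "on-beat" as quarter-note positions (ticks divisible by 4 at sixteenth resolution) are illustrative assumptions, not the estimated WJD values:

```python
import random

# Tick values (in sixteenths) for three of the five IOI classes, plus a
# hypothetical transition table for lick MLUs.
DURATION_IN_16THS = {"very_short": 1, "short": 2, "medium": 4,
                     "long": 8, "very_long": 16}
LICK_IOI_TRANSITIONS = {
    "short":  {"short": 0.5, "medium": 0.3, "long": 0.2},
    "medium": {"short": 0.4, "medium": 0.4, "long": 0.2},
    "long":   {"short": 0.6, "medium": 0.3, "long": 0.1},
}

def sample_iois(transitions, n_tones, start="short"):
    """First-order Markov chain over IOI classes, mapped to tick durations."""
    state, iois = start, []
    for _ in range(n_tones):
        row = transitions[state]
        state = random.choices(list(row), weights=row.values())[0]
        iois.append(DURATION_IN_16THS[state])
    return iois

def onsets_from_iois(iois, start_tick=0):
    onsets, t = [], start_tick
    for ioi in iois:
        onsets.append(t)
        t += ioi
    return onsets

def resample_until_onbeat(transitions, n_tones, max_tries=1000):
    """Resample lick rhythms until on-beat events (here: ticks on quarter
    beats) outnumber off-beat events at least 2:1, as a smoothing step."""
    for _ in range(max_tries):
        iois = sample_iois(transitions, n_tones)
        on = sum(t % 4 == 0 for t in onsets_from_iois(iois))
        off = n_tones - on
        if on >= 2 * off:
            return iois
    return iois  # fall back to the last draw
```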
4.6 Chord-Scale Theory
Chord-scale theory was used to fit the WBA atoms to the current chord context. Chord-scale theory was initiated by Russell ( 1953 ) and became one of the most popular harmonic theories in jazz education ( Cooke and Horn, 2002 ; Aebersold, 1967 ; Coker, 1987 ). Tonal jazz improvisation is based on chords, and players need to select pitches that sufficiently match the underlying chords. To achieve this goal, chord-scale theory provides a simple mapping from chords to scales which can be used by a player. For instance, a Cmaj7 chord implies (among others) either an Ionian (major) or a Lydian scale, which differ only in the fourth scale degree, which is raised in Lydian (♯11). Playing only the pitches from either scale over a Cmaj7 chord will more or less guarantee a sufficient fit. Another advantage of chord-scale theory is that it is a unified framework for tonal and modal jazz alike. There is some discussion about the adequacy, benefits, and disadvantages of chord-scale theory for jazz research and practice ( Ake, 2002 ), but this is outside the scope of this paper. It suffices to say that chord-scale theory is an approximation for modeling tonal choices in certain jazz styles, which seems to be useful for our present purposes.
In chord-scale theory, the mapping of chords to scales is not unique; typically, several suitable scales are available to the player, depending on style, taste, and tonal context. For example, a min7 chord can be mapped to a Dorian or an Aeolian scale, depending on whether it is interpreted as a ii7 or a i7 chord in the current context, and whether the melody is considered modal or tonal in character.
For the generative model, we used a simple and fixed mapping of chords to scales with fixed probabilities of being chosen. No attempts were made to find the most appropriate scale for a chord, which would require harmonic analysis and often external style information. This is left for future extensions of the model.
Upon realizing a WBA atom over a chord with a certain starting pitch, first, a suitable scale is randomly selected by simple weighted sampling from the set of allowed scales in Table 3 . Then the pitches are generated based on the current WBA value, which has the form of an interval sequence. For diatonic and arpeggio atoms, occasionally, a link atom is inserted if the current starting pitch is not part of the chord or the respective scale. This is the only way link atoms are used in the generative model. The rationale behind this is that a jazz performer might also use links to get from an unsuitable pitch to a suitable one before executing an arpeggio or a diatonic scale, as these should (normally) match the current chord. For a more detailed description of how the atoms are realized, see Section S2.2 in the supplementary information.
Table 3
Chord-scales used in the current model. Scale contents are given as pitch-class vectors with 0 representing the root of the chord.
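The weighted scale selection and atom realization described above can be sketched as follows. The chord-to-scale map and its weights are illustrative stand-ins for Table 3, and the "link atom" is simplified here to stepping chromatically to the nearest scale tone:

```python
import random

# Illustrative chord-type → scale mapping with selection weights (pitch-class
# sets relative to the chord root, 0 = root); Table 3 defines the real mapping.
CHORD_SCALES = {
    "maj7": [((0, 2, 4, 5, 7, 9, 11), 0.7),   # Ionian
             ((0, 2, 4, 6, 7, 9, 11), 0.3)],  # Lydian
    "min7": [((0, 2, 3, 5, 7, 9, 10), 0.6),   # Dorian
             ((0, 2, 3, 5, 7, 8, 10), 0.4)],  # Aeolian
    "7":    [((0, 2, 4, 5, 7, 9, 10), 1.0)],  # Mixolydian
}

def choose_scale(chord_type):
    """Weighted sampling of a scale for the given chord type."""
    scales, weights = zip(*CHORD_SCALES[chord_type])
    return random.choices(scales, weights=weights)[0]

def realize_diatonic_atom(start_pitch, root, chord_type, n_intervals, direction=+1):
    """Walk n_intervals diatonic steps from start_pitch within a sampled
    scale. If the start pitch is not a scale tone, first step to the nearest
    one (a simplified stand-in for the link-atom insertion)."""
    scale = choose_scale(chord_type)
    in_scale = lambda p: (p - root) % 12 in scale
    pitch = start_pitch
    while not in_scale(pitch):
        pitch += direction  # 'link': move onto the scale
    pitches = [pitch]
    for _ in range(n_intervals):
        pitch += direction
        while not in_scale(pitch):
            pitch += direction
        pitches.append(pitch)
    return pitches
```

For example, realizing an ascending diatonic atom of length 3 over Cmaj7 from middle C (MIDI 60) yields a stepwise Ionian or Lydian fragment starting on the root.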
4.7 Implementation
The main algorithm is depicted in Algorithm 1 . It consists of two nested loops: the outer one generates phrases, and the inner one generates pitch and rhythm sequences. Pitch sequences are generated based on a first-order Markov model of WBA atoms, conditioned on the MLU, and rhythm sequences are added to the pitch sequences, likewise conditioned on the MLU (see Section 4.5).
Input to the algorithm is a lead sheet, i.e., a chord sequence with metrical information, taken either from the iRealPro corpus or extracted from the WJD chord annotations. The model is currently constrained to 4/4 time and a sixteenth-note tatum resolution, but these restrictions could easily be lifted. Further input is a pitch range, in order to avoid running out of instrument ranges. This is ensured by filtering out pitches that are out of range, and by adjusting WBA directions if the current pitch is 30% below the upper or 30% above the lower pitch range limit. This is a crude but effective simulation of actual playing practice, which results in the common "regression to the mean" in melodic motion ( Von Hippel and Huron, 2000 ).
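The range-keeping logic might be implemented roughly as below; the concrete MIDI range limits are hypothetical, while the 30% thresholds follow the description above:

```python
def adjust_direction(current_pitch, direction, low=55, high=96):
    """Flip a WBA atom's direction near the range limits: in the top 30%
    of the range force descending, in the bottom 30% force ascending
    (thresholds per the paper; the range itself is a placeholder)."""
    span = high - low
    if current_pitch >= high - 0.3 * span:
        return -1
    if current_pitch <= low + 0.3 * span:
        return +1
    return direction

def filter_range(pitches, low=55, high=96):
    """Drop generated pitches that fall outside the instrument range."""
    return [p for p in pitches if low <= p <= high]
```

Together these two rules push melodic lines back toward the middle of the range, producing the "regression to the mean" effect mentioned above.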
The main loop runs until the number of specified choruses is reached, using the onset ticks (in sixteenth units) as the main control condition. The generated tone events are finally converted to a proprietary CSV representation, which is then converted to MIDI or Lilypond scores using the MeloSpySuite/GUI software from the Jazzomat project ( Pfleiderer et al., 2017 ).
One example of a generated solo, which was also used in the evaluation (Algorithm-1-Original, see below), can be found in Figure 2 .
Figure 2 Example of a generated solo over an F-blues chord sequence, used in the evaluation (Algorithm-1-Original).
5. Evaluation
As discussed in Section 2, we decided to address some general problems of the fair evaluation of generated jazz solos using Turing-like tests instead of starting with a large-scale evaluation of the model right away. The results of our exploratory experiment will inform the design of these in the future. Furthermore, we did not explore the possibility of objective evaluations in this paper. We leave this also for the future.
5.1 Preparation of Stimuli
We produced a set of stimuli along the following dimensions:
- Good vs. bad algorithmic solos. We generated a set of 50 solos containing one chorus of a simple jazz blues in F (over the chord sequence ∥F7 | B♭7 | F7 | Cmin7 F7 | B♭7 | B♭7 | F7 | F7 | Gmin7 | C7 | F7 | F7∥). After listening to the results, we selected one of the most convincing solos and one of the least convincing ones.
- Tweaked vs. raw algorithmic solos. For the most convincing artificial solo, we prepared two versions. One was just the solo as generated by the algorithm; for the other one, we tweaked a few notes which seemed suboptimal to our expert ears, and manually added microtiming variations and dynamics for a more realistic performance.
- Human vs. algorithmic solos. We selected three kinds of human solos to be compared to the algorithmic ones. The first set of human solos was taken at random from the WJD, using the F-blues subset. This contained one solo each by Charlie Parker ("Billie's Bounce"), Miles Davis ("Vierd Blues"), and Sonny Rollins ("Vierd Blues"). Next, we took four solos from an old (unpublished) study, in which jazz students had improvised solos to an F-blues play-along. The students had different levels of expertise (beginner, intermediate, advanced, and graduated). Thirdly, the authors recorded one solo each over the backing track used for all stimuli. The first author (AUT1) played a single-line solo on a digital piano and the second author (AUT2) played a solo on electric guitar.
- Original vs. MIDI-fied solos. We used the original recordings of the authors and also produced two MIDI-fied versions, by either using the recorded MIDI from the piano solo or an automatically converted version of the guitar solo, using the audio-to-MIDI converter of the DAW plug-in Melodyne (editor version).
All MIDI versions were rendered with a tenor sax sample over the same backing track with piano, bass, and drums (see Section S5 in the supplementary information). Only the two original solos by the authors did not use the tenor sax sound and thus played the role of a baseline condition. For all candidate solos, only the first chorus was used and rendered at a tempo of 120 BPM to create our stimuli, which lasted approximately 25 seconds each (see Table 4 ).
Table 4
Stimuli used for the evaluation. The column Performance Type gives the specifics of the rendition. Deadpan MIDI: fully quantized MIDI without dynamics; MIDI with microtiming: MIDI with semi-automatically added microtiming (swing); Audio: recorded audio; Converted audio-to-MIDI: recorded audio converted to MIDI with Melodyne, keeping microtiming and dynamics; Recorded MIDI: human-played MIDI with microtiming and dynamics.
5.2 Method
We prepared an online survey using the SoSci-Survey platform with the 14 stimuli as the core items. Each solo had to be assessed on a questionnaire containing 10 Likert-like items (cf. Table S2 for a complete list) with answer options ranging from 1 = "completely disagree" to 7 = "completely agree" for all items except items 8 and 9, which had their own range but with the same polarity. The rationale was to present items that reflect typical qualitative judgments of jazz solos without using deep jazz-specific terminology. The sequence of solos was randomized for each participant.
We collected basic demographic data (age, gender) and asked three self-assessment questions pertaining to jazz and music expertise ("I am a jazz expert", "I am a jazz fan", "I am a music expert") using the same 7-point Likert scale. We also asked the participants for textual feedback on the experiment itself.
Ethics approval was not required by our host institution for this study. Participants gave their informed consent before starting the experiment.
5.3 Participants
By advertising on social media and approaching friends and colleagues, we obtained a convenience sample of 41 participants (7 female; mean age 27.0, SD 9.9), of which 29 were identified as jazz experts based on the sum of responses to the three expertise items being greater than or equal to 12. The overall median value on the item "I am a jazz fan" was 7 ("completely agree"). In conclusion, we can say that the sample contains a large share of jazz experts (on different levels), while all self-identified as jazz fans.
5.4 Results
5.4.1 Solo characteristics
As the scores on all items (except item 10) showed various strong correlations, we reduced the variables by using a factor analysis with three factors and oblimin rotation, which explained 85% of the variance (see Section S2.1 in the supplementary data). The factors were named MUSICALITY (convincing, liking, expressive, swing, inventive, expertise), COMPLEXITY (virtuosic, complex), and RHYTHM_EXACT (rhythm exact). The number of factors was determined using standard methods (KMO, scree plot). The factor RHYTHM_EXACT will not be considered further, as it is not very informative for our aims here.
In Figure 3(A) , the MUSICALITY ratings can be found. Algorithmic solos were ranked very low, except for the enhanced solo. Two student solos were rated even lower than all algorithmic solos. As expected, the most natural-sounding solo, one of the author solos (AUT2-Original), was rated highest. Rankings by experts and non-experts are more or less similar. Note that the ranges of values are quite large for most of the solos. For the COMPLEXITY factor, as seen in Figure 3(B) , two of the student solos and the first algorithmic solo in two versions were among the top five. Two other student solos and the Davis and Rollins solos were ranked lowest. The MIDI-fied versions of the authors' solos were consistently rated lower here than the original versions, despite identical musical content. Again, the range of values is quite large.
Figure 3 Boxplot of MUSICALITY (A) and COMPLEXITY (B) values for all solos, separately by rater expertise. Left panels: jazz experts; right panels: non-experts; blue: algorithmic solos; brown: author (MIDI) solos; yellow: student solos; green: author (original) solos; violet: WJD solos.
5.4.2 Recognition of origin
We counted an algorithmically generated solo as correctly (and rather confidently) recognized if the answer on Item 10 ("Do you think the notes of this solo were generated by a computer algorithm?", variable artificial) had a value of 5 or larger; otherwise, we counted it as unrecognized. Conversely, a human-generated solo was counted as correctly recognized if it received a value of 3 or less on Item 10, and as unrecognized otherwise. The recognition accuracy of a solo is then defined as the proportion of responses counted as correctly recognized. The results can be found in Table 6, separately for experts and non-experts and for human and algorithmically generated solos. A more detailed display for all 14 stimuli can be found in Figure 5 and in Table S4 in the supplement. Only four solos have a recognition accuracy whose 95% confidence intervals do not cross the random baseline of 50% (AUT2-Original, WJD-Sonny Rollins, Student-Intermediate, Algorithm-2-Original). For the non-jazz experts, this is only true for one solo (AUT2-Original). Some solos are consistently misclassified (AUT1-MIDI and Student-Graduated). Algorithm-2-Original, the unconvincing solo, is successfully identified as computer-generated with a mean accuracy of 69%, though this comes mostly from the jazz experts; the non-experts are basically guessing here. Algorithm-1-Original performs better with an accuracy of 59%. The improved version Algorithm-1-Improved is able to fool the raters with an overall mean accuracy of 44%, but experts are better, with an accuracy of 53%, slightly over the random baseline, whereas non-experts are completely at a loss with an accuracy of only 18%. Interestingly, one of the student solos (Student-Graduated) is the one most clearly misclassified. The MIDI-fied versions of the author solos are regarded as computer-generated by experts and non-experts alike.
This might be an effect of the different articulation and approaches implied by rendering piano and guitar solos with a tenor saxophone sound, e.g., due to differing attack times. They are also rated much worse on the other factors compared to the original versions. This result tells a cautionary tale. For the WJD solos, Sonny Rollins's solo is most clearly identified as human-generated, whereas the other WJD solos are rated much more ambiguously. For the Charlie Parker solo, this might come from the fact that the backing track had a slightly different chord sequence than the original solo and the tempo was perceivably slowed down from the original. The original version of Miles Davis's solo also has slightly different chords, but the most important factor might be that the very spacious solo of Davis works because the piano player fills in the spaces, which is, of course, not the case for the MIDI-fied version used here.
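The recognition-accuracy measure defined above can be sketched as follows. This is an illustrative Python snippet with hypothetical ratings, not the study data; the thresholds of 5 and 3 on the 7-point Item 10 scale come from the text, and the normal-approximation confidence interval mirrors the error bars of Figure 5:

```python
import math

def recognition_accuracy(responses, is_algorithmic):
    """Score Item 10 responses (1-7) for one solo: algorithmic solos count
    as recognized for ratings >= 5, human solos for ratings <= 3."""
    correct = [(r >= 5) if is_algorithmic else (r <= 3) for r in responses]
    return sum(correct) / len(correct)

def proportion_ci(p, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

# Hypothetical Item 10 ratings for one algorithmic solo from 20 raters
ratings = [6, 5, 7, 4, 5, 2, 6, 5, 3, 7, 5, 6, 4, 5, 2, 6, 7, 5, 1, 3]
acc = recognition_accuracy(ratings, is_algorithmic=True)
lo, hi = proportion_ci(acc, len(ratings))
print(acc, lo < 0.5 < hi)  # → 0.65 True
```

For this toy solo, the accuracy is above chance, but the 95% interval still crosses the 50% baseline, as was the case for most of the 14 stimuli.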
Figure 4 Boxplot of liking values for sources of solos, separately by rater expertise. Left boxes: jazz experts; right: non-experts.
Figure 5 Recognition accuracy of all 14 stimuli by expertise level. Left: all; middle: jazz experts; right: non-jazz experts. Error bars are 95% confidence intervals of proportions.
The pooled accuracies for expert/non-expert raters and human/algorithmic solos can be found in Table 6. Experts had an accuracy of 64% for correctly identifying algorithmic solos and 54% for human solos, which is only slightly above chance. For non-experts, the aggregated values are even below chance level, 41% for algorithmic and 44% for human solos, which means that non-experts tend to assume algorithms at work even when this was not the case. This might be a result of the explicit framing of the survey as a "Turing Test" for jazz, which might have raised the baseline expectation for computer-generated solos, while, in fact, only a minority (3 out of 14) were actually computer-generated.
5.4.3 Relationship of identification and characteristics
A correlation analysis of the two factors MUSICALITY and COMPLEXITY with artificial can be seen in Table 5. MUSICALITY and COMPLEXITY are strongly positively correlated, whereas MUSICALITY and artificial are strongly negatively correlated. As liking (item 8) is the strongest contributor to MUSICALITY, we checked whether recognition accuracy might be related to liking. Interestingly, liking and recognition accuracy are strongly positively correlated with r = .86 for human-generated solos, but strongly negatively correlated for computer-generated solos with r = –.66. This suggests that participants judge solos as human-generated on the basis of their liking of the solo, probably due to a bias against artificially generated music (Moffat and Kelly, 2006).
Table 5
Pearson's correlation coefficients for MUSICALITY (MUS), COMPLEXITY (COMP), and artificial (Item 10). All correlations p ≤ .001.
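A minimal sketch of the liking-accuracy correlation described above (Python; the per-solo values here are hypothetical, chosen only to reproduce the sign pattern reported in the text, not the study data):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical per-solo means: liking vs. recognition accuracy
liking_human = [5.7, 4.6, 4.4, 3.5, 3.3]
acc_human    = [0.85, 0.60, 0.55, 0.40, 0.35]  # more liked -> judged "human"
liking_algo  = [4.2, 3.8, 3.1]
acc_algo     = [0.44, 0.59, 0.69]  # more liked -> fewer correct detections
print(pearson_r(liking_human, acc_human) > 0)  # → True
print(pearson_r(liking_algo, acc_algo) < 0)    # → True
```

The opposite signs of the two correlations are exactly the pattern that suggests raters use liking as a proxy for human origin.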
Table 6
Recognition accuracy for computer- and human-generated solos by experts and non-experts.
5.4.4 Relationship of recognition and liking
We further checked if and to what extent the solos were aesthetically pleasing to the participants by looking at the ratings on item 8 ("How did you like the solo extract? (not at all-very much)"). First, we conducted a mixed linear regression (package lmerTest for R) to see whether jazz experts and non-experts differ in their overall liking scores. Indeed, the non-experts had slightly higher liking scores (β = 0.385, df = 531, p = .018), with experts having a mean of 3.72 and non-experts one of 4.1, so non-experts seem to be more forgiving. There were also stark differences between subjects in the ratings. The participants' mean liking values ranged from 1.29 to 5.43 with a mean of 3.8, a median of 4.0, and a standard deviation of 0.95. The differences in liking with respect to the source of the solo (author original, author MIDI, WJD, student, algorithm) can be found in Figure 4. As expected, the original solos were liked best (mean liking = 5.14), followed by the master solos from the WJD (mean liking = 4.37), the MIDI-fied author solos (mean liking = 3.47), the algorithmic solos (mean liking = 3.37), and the students' solos (mean liking = 3.29). A linear mixed model with source and jazz expertise as fixed effects and participant as random effect showed that there is no significant difference between experts and non-experts if individual preferences are taken into account, and that only original (β = 1.77, SE = 0.23, t(503) = 7.7, p < .0001) and WJD solos (β = 1.00, SE = 0.206, t(503) = 4.85, p < .0001) were liked significantly more than algorithmic, MIDI-fied author, and student solos. Actually, two of the student solos were liked less than all three algorithmic solos, whereas the other two were liked considerably better. The mean liking values for all sources did not reach the neutral level of 4, except for WJD and author originals.
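The authors fitted this model with lmerTest in R; a rough Python analogue of a random-intercept model, using statsmodels' MixedLM on simulated data (all numbers, group sizes, and variable names here are hypothetical), looks like this:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated ratings: 40 raters x 14 solos, with a rater-specific baseline
# (random intercept) plus a fixed expertise effect on liking.
rng = np.random.default_rng(1)
raters = np.repeat(np.arange(40), 14)
expert = (raters < 25).astype(int)        # first 25 raters are "experts"
intercepts = rng.normal(0, 0.8, size=40)  # between-subject differences
liking = (4.0 - 0.4 * expert + intercepts[raters]
          + rng.normal(0, 1.0, raters.size))

df = pd.DataFrame({"liking": liking, "expert": expert, "rater": raters})
# Mixed linear model: fixed effect of expertise, random intercept per rater
fit = smf.mixedlm("liking ~ expert", df, groups=df["rater"]).fit()
print(fit.params["expert"])  # should recover roughly -0.4
```

Grouping by rater absorbs the stark between-subject baseline differences, which is why the fixed expertise effect can shrink or vanish once individual preferences are modelled.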
The liking ratings of the author solo AUT2 dropped from 5.69 for the original recording to 3.51 for the MIDI-fied tenor sax version, a huge loss of d = 2.19 on a 7-point scale. For AUT1, the drop was from 4.59 to 3.44 (d = 1.15). The difference, particularly for AUT2, is striking and clearly demonstrates the influence of a natural-sounding performance on the aesthetic appreciation (of jazz solos).
For the variable inventive, the ranking of solos is very similar to that of liking, with the exception that Algorithm-1-Improved is ranked fourth. In terms of source or origin, the algorithmic solos were tied with the MIDI-fied author solos (mean inventive = 3.77 for both), whereas the rest of the ranking was the same as for liking (originals: 4.42, WJD: 3.85, students: 3.4).
6. Discussion
6.1 Evaluation of our Model
We presented a novel algorithm to generate monophonic jazz solos over a given chord sequence. We evaluated the algorithm with a Turing-like listening test with 41 jazz-affine participants. We could show that a hand-selected, edited, and expressive rendition of one of the generated solos (Algorithm-1-Improved) could fool the panel, as it was slightly more often considered to be human-generated than computer-generated. Moreover, the recognition accuracies verged on chance level, so we can state that even the jazz experts were not entirely sure about its origin. On the other hand, the unedited, deadpan version of the same solo was successfully recognized as computer-generated, at least by the experts, though still with considerable uncertainty. The second, "bad" solo was even more clearly recognized as computer-generated, mainly by the jazz experts in the panel, whereas the non-experts were not completely certain here either. These results were anticipated, but they have to be viewed in perspective, as even solos by eminent jazz masters such as Charlie Parker and Miles Davis were frequently judged as computer-generated when presented as deadpan MIDI over a computer-generated backing track. Only the original audio solo by the second author was unequivocally considered to be human-generated, which was expected, as it was specifically included as a baseline. However, the second original solo (AUT1-Original) was not recognized as human-generated as often, and the MIDI-fied version of the second original solo (AUT2-MIDI) was also considered mainly to be computer-generated. This clearly demonstrates that an expressive, "natural" performance is crucial for human judgements in Turing-like music tests.
Furthermore, a comparison of algorithmically generated solos with those of jazz greats might not be the most relevant test, as the computer-generated solos were recognized correctly more often than three of the four student solos. This seems to make sense, as devising an algorithm that is able to invent masterly solos seems a bit too much to ask for; thus, a comparison with less experienced performers might be fairer.
The performance of our new algorithm is promising, as it is only in its early stages, just a proof-of-concept, and uses a relatively simple model. It is nevertheless quite powerful, because it is based on empirical and analytical results on jazz improvisation and powered by a rather large database of solo transcriptions.
There are many possible avenues for improvement. The most obvious weakness of the model is the rhythm model, particularly for lick MLUs. The simplified rhythm model for line MLUs works rather well. For further improvements to the rhythm model, one would need an in-depth analysis of rhythm and meter and their interaction in jazz solos, similar to the WBA study, which is unfortunately still missing. Also, the model does not allow for incorporating pre-learned patterns, which basically all jazz improvisers use (Norgaard, 2014). We plan to improve the algorithm alongside further empirical research on jazz improvisation in an iterative process.
6.2 Evaluation of Evaluation
We also explored the problem space of Turing-like evaluation of computer-generated jazz solos. Along our basic distinctions, we found the following results.
- Good vs. bad algorithmic solos. The worse (as judged by the authors) algorithmic solo was evaluated less favorably by our respondents in all aspects, and was also much more often correctly identified as computer-generated.
- Tweaked vs. raw algorithmic solos. Even a little tweaking of the musical surface can improve the assessment of a solo. One possible reason is that even minor or spurious signals of "non-authenticity" can be picked up by humans to make their judgement.
- Human vs. algorithmic solos. Solos by jazz masters were generally judged better than solos by jazz students and algorithmic solos, but some of the algorithmic solos performed clearly better than student solos. On the other hand, deadpan synthesised solos by masters were rated worse than "natural"-sounding solos by the authors.
- Original vs. MIDI-fied solos. Performance seems crucial, as the MIDI-fied versions of the original author solos, rendered with different instrumental sounds, were rated much worse and also less often recognized as human-generated solos.
Besides this, we found generally low inter-rater agreement and large variances in judgements, and saw clear differences between jazz experts and non-jazz experts. Finally, we found evidence of bias against computer-generated music, in the sense that participants seemed to expect computer-generated solos to be worse and were more likely to assume non-human origin if they did not like the solos. We also noted that, by framing the experiment as a "Turing Test", people tended to expect more computer-generated solos than there actually were. Hence, their rating behaviour could have been influenced by expectations for these kinds of studies.
7. Conclusion and Outlook
In light of these results, we see three possible approaches for the large-scale evaluation of generative models for jazz solos.
- Using human- and computer-generated solos performed by a single human player over a fixed accompaniment, to keep all background factors constant while providing reasonably expressive performances with microtiming, articulation, dynamics, and timbre features (e.g., vibrato, slurs, bends). This procedure requires considerable effort and is probably not suitable for testing many solos at once. Interestingly, in the free-text feedback field of the survey, where we asked for further comments, many participants suggested exactly this kind of procedure (see Section S5 in the supplementary information).
- Devising a system that is capable of generating sufficiently natural jazz performances. This would be very efficient for quick and frequent evaluations in conjunction with a service for recruiting online participants. Such an algorithm does not exist yet and might thus require considerable development effort, though it might be within reach of the current state of technology. However, in contrast to the field of classical music, not much research has yet gone into developing such a system, but see Friberg et al. (2021) and Arcos et al. (1998). Once such a system is available, large-scale evaluation will become easy and cheap.
- Using deadpan versions of human- and computer-generated solos. Because of the low baseline accuracy in this setup, a large number of raters and solos would be required to get reliable estimates. The advantage of this approach is that it is relatively inexpensive to realize; even though the recruitment of a sufficiently large number of experts might be an issue, using an even larger number of non-experts could remedy this problem.
Finally, there is the complementary option of evaluating solos based on objective features, as proposed by Yang and Lerch (2020), similar to the "critic" in the evaluation framework proposed by Wiggins and Pearce (2001). This is rather straightforward to implement if suitable corpora are available, and we want to explore it in the future. Another option, often tacitly or explicitly part of any algorithm development, is formal or informal analysis by experts. For instance, some tweaks and design decisions that ended up in the current version of the WBA-MLU algorithm were informed by the expertise of the authors.
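As a sketch of what such feature-based evaluation might look like (our illustration, not Yang and Lerch's actual feature set), one could compare the melodic-interval distribution of a generated solo against a corpus solo via histogram intersection; the pitch sequences below are hypothetical:

```python
import numpy as np

def interval_distribution(pitches, max_interval=12):
    """Normalized histogram of melodic intervals (semitones),
    clipped to plus/minus one octave."""
    iv = np.clip(np.diff(pitches), -max_interval, max_interval)
    hist, _ = np.histogram(iv, bins=np.arange(-max_interval, max_interval + 2))
    return hist / hist.sum()

def distribution_overlap(p, q):
    """Histogram intersection: 1.0 means identical feature distributions."""
    return float(np.minimum(p, q).sum())

# Hypothetical MIDI pitch sequences: a corpus solo vs. a generated solo
corpus_solo = [60, 62, 63, 65, 67, 65, 63, 62, 60, 58, 60, 63]
generated   = [60, 61, 63, 64, 66, 64, 63, 61, 60, 59, 60, 64]
p = interval_distribution(corpus_solo)
q = interval_distribution(generated)
print(0.0 <= distribution_overlap(p, q) <= 1.0)  # → True
```

A high overlap signals style conformity on this one feature; a full critic would aggregate many such features across a corpus.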
But, ultimately, we think that there will be no way around Turing-like tests, as objective feature distributions are not likely to provide a sufficient description of music, and clearly not of truly innovative music, whereas expert analytic evaluations do not scale. Such approaches could be useful to select the most promising candidates from a set of generated solos (if style conformity is the goal, which is often the case in analysis-by-synthesis contexts).
One last remark should be made regarding the observation that the evaluation of human vs. computer-generated solos is driven by aesthetics and a bias against computer-generated music. This implies that for commercial (and other) applications of algorithmically generated music, it has to be very good, i.e., unrecognizable as such; otherwise, knowledge about its origin can result in audience aversion against the product.
Here, more research is needed, as in this small pilot study we could only find some evidence in this direction, which clearly warrants more targeted and systematic examination of this phenomenon (cf. Moffat and Kelly (2006); Chamberlain et al. (2018), who found such biases against computer-generated music and artworks).
This study had a mostly exploratory nature, both with regard to the proposed novel algorithm and to an evaluation procedure for monophonic jazz solos, and presents promising and insightful first results on both aspects. Our future plans are twofold. First, we want to elaborate the WBA-MLU model further, as many simplifying assumptions were made in order to create a functioning system. Secondly, we plan to improve the evaluation framework. We think it is worth checking whether the strategy of having humans perform artificially generated solos is feasible. Additionally, we think that the development of a system capable of rendering more natural-sounding performances, if probably only for a single type of instrument, is within reach of the current state of technology.
Notes
Reproducibility
There is an accompanying OSF site for this paper: https://osf.io/kjsdr. It contains the R project jazz-turing with the survey data, the audio stimuli, and the analysis code for the evaluation. The folder samples provides 40 generated solos as MIDI files and scores. There is also a link to parkR, an R package for solo generation based on our model, which can be installed from https://github.com/klausfrieler/parkR.
Acknowledgements
We would like to thank all participants in the evaluation experiment, Simon Dixon for proof-reading the manuscript, and three anonymous reviewers for their helpful comments.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
KF developed the generative algorithm, developed, conducted, and analyzed the evaluation survey, and wrote the paper. WGZ developed and conducted the evaluation survey and wrote the paper.
References
-
Aebersold, J. (1967). A New Approach to Jazz Improvisation.
-
Ake, D. A. (2002). Jazz Cultures. University of California Press, Berkeley. DOI: https://doi.org/10.1525/9780520926967
-
Arcos, J. L., Lopéz de Mántaras, R., and Serra, X. (1998). SaxEx: A case-based reasoning system for generating expressive musical performances. Journal of New Music Research, 27:194–210. DOI: https://doi.org/10.1080/09298219808570746
-
Biles, J. A. (1994). GenJam: Evolution of a jazz improviser. In Proceedings of the International Computer Music Conference (ICMC), pages 131–137.
-
Chamberlain, R., Mullin, C., Scheerlinck, B., and Wagemans, J. (2018). Putting the art in artificial: Aesthetic responses to computer-generated art. Psychology of Aesthetics, Creativity, and the Arts, 12(2):177–192. DOI: https://doi.org/10.1037/aca0000136
-
Coker, J. (1987). Improvising Jazz. Simon & Schuster, New York, 1st Fireside edition.
-
Cooke, M., and Horn, D. (2002). The Cambridge Companion to Jazz. Cambridge University Press, Cambridge, UK. OCLC: 758544526. DOI: https://doi.org/10.1017/CCOL9780521663205
-
Friberg, A., Gulz, T., and Wettebrandt, C. (2021). Computer tools for modeling swing timing interactions in a jazz ensemble. In 16th International Conference on Music Perception and Cognition and 11th Triennial Conference of the European Society for the Cognitive Sciences of Music (ICMPC16-ESCOM11), Sheffield, UK.
-
Frieler, K. (2018). A feature history of jazz solo improvisation. In Knauer, W., editor, Jazz @ 100: An Alternative to a Story of Heroes, volume 15 of Darmstadt Studies in Jazz Research. Wolke Verlag, Hofheim am Taunus.
-
Frieler, K. (2019). Constructing jazz lines: Taxonomy, vocabulary, grammar. In Pfleiderer, M. and Zaddach, W.-G., editors, Jazzforschung heute: Themen, Methoden, Perspektiven, Berlin. Edition Emwas.
-
Frieler, K. (2020). Miles vs. Trane: Computational and statistical comparison of the improvisatory styles of Miles Davis and John Coltrane. Jazz Perspectives, 12(1):123–145. DOI: https://doi.org/10.1080/17494060.2020.1734053
-
Frieler, K., Pfleiderer, M., Abeßer, J., and Zaddach, W.-G. (2016). Midlevel analysis of monophonic jazz solos: A new approach to the study of improvisation. Musicae Scientiae, 20(2):143–162. DOI: https://doi.org/10.1177/1029864916636440
-
Grachten, M. (2001). JIG: Jazz Improvisation Generator. In Proceedings of the MOSART Workshop on Current Research Directions in Computer Music, Barcelona, Spain.
-
Haviv Hakimi, S., Bhonker, N., and El-Yaniv, R. (2020). BebopNet: Deep neural models for personalized jazz improvisations. In Proceedings of the 21st International Society for Music Information Retrieval Conference, Montréal, Canada.
-
Hiller, L. A., and Isaacson, L. M. (1959). Experimental Music: Composition With an Electronic Computer. McGraw-Hill, New York.
-
Hung, H.-T., Wang, C.-Y., Yang, Y.-H., and Wang, H.-M. (2019). Improving automatic jazz melody generation by transfer learning techniques. In Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pages 1–8, Lanzhou, China. DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023224
-
Johnson-Laird, P. N. (1991). Jazz improvisation: A theory at the computational level. In Howell, P., West, R., and Cross, I., editors, Representing Musical Structure, London. Academic Press.
-
Johnson-Laird, P. N. (2002). How jazz musicians improvise. Music Perception, 19(3):415–442. DOI: https://doi.org/10.1525/mp.2002.19.3.415
-
Kaufman, J. C., and Beghetto, R. A. (2009). Beyond big and little: The Four C Model of Creativity. Review of General Psychology, 13(1):1–12. DOI: https://doi.org/10.1037/a0013688
-
Keller, R., Schofield, A., Toman-Yih, A., Merritt, Z., and Elliott, J. (2013). Automating the explanation of jazz chord progressions using idiomatic analysis. Computer Music Journal, 37(4):54–69. DOI: https://doi.org/10.1162/COMJ_a_00201
-
Keller, R. M., and Morrison, D. (2007). A grammatical approach to automatic improvisation. In Proceedings of the 4th Sound and Music Computing Conference, pages 330–337, Lefkada, Greece.
-
Lothwesen, K., and Frieler, K. (2012). Gestaltungsmuster und Ideenfluss in Jazzpiano-Improvisationen: Eine Pilotstudie zum Einfluss von Tempo, Tonalität und Expertise. In Lehmann, A., Jeßulat, A., and Wünsch, C., editors, Kreativität: Struktur und Emotion. Königshausen & Neumann, Würzburg.
-
Madaghiele, V., Lisena, P., and Troncy, R. (2021). MINGUS: Melodic improvisation neural generator using Seq2Seq. In 22nd International Society for Music Information Retrieval Conference.
-
Moffat, D., and Kelly, M. (2006). An investigation into people's bias against computational creativity in music composition. In Third Joint Workshop on Computational Creativity, ECAI 2006, Trento, Italy. Università di Trento.
-
Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. University of Chicago Press, Chicago.
-
Norgaard, M. (2011). Descriptions of improvisational thinking by artist-level jazz musicians. Journal of Research in Music Education, 59(2):109–127. DOI: https://doi.org/10.1177/0022429411405669
-
Norgaard, M. (2014). How jazz musicians improvise: The central role of auditory and motor patterns. Music Perception: An Interdisciplinary Journal, 31(3):271–287. DOI: https://doi.org/10.1525/mp.2014.31.3.271
-
Owens, T. (1974). Charlie Parker: Techniques of Improvisation. PhD thesis, University of California, Los Angeles.
-
Pachet, F. (2003). The Continuator: Musical interaction with style. Journal of New Music Research, 32(3):333–341. DOI: https://doi.org/10.1076/jnmr.32.3.333.16861
-
Pachet, F. (2012). Musical virtuosity and creativity. In McCormack, J. and d'Inverno, M., editors, Computers and Creativity, pages 115–146. Springer, Berlin, Heidelberg. DOI: https://doi.org/10.1007/978-3-642-31727-9_5
-
Papadopoulos, G., and Wiggins, G. (1998). A genetic algorithm for the generation of jazz melodies. In Human and Artificial Information Processing: Finnish Conference on Artificial Intelligence (STeP'98), Jyväskylä, Finland.
-
Pfleiderer, M., Frieler, K., Abeßer, J., Zaddach, W.-G., and Burkhart, B., editors (2017). Inside the Jazzomat: New Perspectives for Jazz Research. Schott Music GmbH & Co. KG, Mainz, 1. Auflage. OCLC: 1015349144.
-
Pressing, J. (1984). Cognitive processes in improvisation. In Cognitive Processes in the Perception of Art. Elsevier, North-Holland. DOI: https://doi.org/10.1016/S0166-4115(08)62358-4
-
Pressing, J. (1988). Improvisation: Method and models. In Sloboda, J. A., editor, Generative Processes in Music: The Psychology of Performance, Improvisation, and Composition, pages 129–178, Oxford. Clarendon. DOI: https://doi.org/10.1093/acprof:oso/9780198508465.003.0007
-
Quick, D., and Thomas, K. (2019). A functional model of jazz improvisation. In Proceedings of the 7th ACM SIGPLAN International Workshop on Functional Art, Music, Modeling, and Design (FARM 2019), pages 11–21, Berlin, Germany. ACM Press. DOI: https://doi.org/10.1145/3331543.3342577
-
Ramalho, G. L., Rolland, P.-Y., and Ganascia, J.-G. (1999). An artificially intelligent jazz performer. Journal of New Music Research, 28(2):105–129. DOI: https://doi.org/10.1076/jnmr.28.2.105.3120
-
Russell, G. (1953). George Russell's Lydian Chromatic Concept of Tonal Organization. Concept Pub. Co, Brookline, Mass. OCLC: ocm50075662.
-
Schütz, K. (2015). Improvisation im Jazz: eine empirische Untersuchung bei Jazzpianisten auf der Basis der Ideenflussanalyse. Number 34 in Schriftenreihe Studien zur Musikwissenschaft. Kovač, Hamburg. OCLC: 915812622.
-
Toiviainen, P. (1995). Modelling the target-note technique of bebop-style jazz improvisation: An artificial neural network approach. Music Perception, 12:398–413. DOI: https://doi.org/10.2307/40285674
-
Trieu, N., and Keller, R. (2018). JazzGAN: Improvising with generative adversarial networks. In Proceedings of the 6th International Workshop on Musical Metacreation (MUME 2018), Salamanca, Spain.
-
Von Hippel, P., and Huron, D. (2000). Why do skips precede reversals? The effect of tessitura on melodic structure. Music Perception: An Interdisciplinary Journal, 18(1):59–85. DOI: https://doi.org/10.2307/40285901
-
Wiggins, G., and Pearce, M. (2001). Towards a framework for the evaluation of machine compositions. In Proceedings of the AISB'01 Symposium on Artificial Intelligence and Creativity in the Arts and Sciences, pages 22–32, York.
-
Wu, S.-L., and Yang, Y.-H. (2020). The Jazz Transformer on the front line: Exploring the shortcomings of AI-composed music through quantitative measures. In 21st International Society for Music Information Retrieval Conference, pages 142–149, Montréal, Canada.
-
Yang, L.-C., and Lerch, A. (2020). On the evaluation of generative models in music. Neural Computing and Applications, 32(9):4773–4784. DOI: https://doi.org/10.1007/s00521-018-3849-7
Source: https://transactions.ismir.net/articles/10.5334/tismir.87/