Vocal communication evolved over 400M years ago



It has been hypothesized that the generative power of language is derived from a cognitive capacity called “Merge”1,2,3. No matter how the meaning is created, Merge allows senders to combine two linguistic items (e.g., two words or two phrases) into larger sequences and receivers to recognize them as a single unit1,2,3. The most basic form of Merge is often specifically referred to as “core-Merge”, in which two words are combined to form a new unit (e.g., come + talk = {come talk}, a + dog = {a dog})2,3. Although once considered a uniquely human capacity, recent field studies suggest intriguing parallels with core-Merge in non-human animals (hereafter animals): several species of birds and mammals combine two call types, each with its own meaning, into larger sequences which evoke specific behavioural responses in receivers that are different from their responses to each component call type4,5,6. However, there is an alternative explanation for these animal examples that does not depend on core-Merge: receivers may perceive a two-call sequence as two individual calls that are arbitrarily produced in close time proximity (i.e., temporally linked), not as a single unit5,7,8,9. Due to the lack of studies examining whether animals perceive a two-call sequence as a single unit or not, it remains unknown whether core-Merge is unique to humans or whether it has also evolved in non-human species.

In this study, we develop a novel paradigm to test whether animals use core-Merge to interpret two-call sequences (i.e., if they recognize two combined call types coming from a single individual as forming a single unit) or whether they respond to any temporally linked calls (i.e., arbitrarily produced or not, coming from one individual or more) in the same manner. To do this, we propose the following single-sender/multiple-sender paradigm. If an animal has evolved core-Merge, then it should be able to distinguish a two-call sequence produced by a single individual (i.e., combined calls) from two temporally linked calls produced by multiple individuals (i.e., non-combined calls). In other words, animal receivers should recognize whether two temporally linked calls are produced from the same spatial location (see Fig. 1 for a human example). On the other hand, if core-Merge does not operate, then the temporal linkage of two calls should be sufficient to evoke specific responses; receivers should show similar responses to two calls being produced by one or two sources, as long as they are temporally linked. Using one- and two-speaker playbacks, we can test whether receivers’ responses to two temporally linked calls depend on whether they are produced by a single sound source; if so, this provides evidence for core-Merge.

Fig. 1: Experimental paradigm for testing core-Merge.

In language, core-Merge allows receivers to recognize two temporally linked words as a single unit (e.g., “come” + “talk” = {come talk}). However, if the same words are separately given by two persons, receivers may perceive them as two individual messages (e.g., “come” from one person and “talk” from the other). If an animal species has evolved core-Merge, then receivers’ responses to two temporally linked calls should depend on whether the two calls are produced by a single individual.


We use this paradigm to explore core-Merge in a wild bird species, the Japanese tit (Parus minor) (Fig. 2a). These birds produce alert calls when warning conspecifics about danger, such as the presence of predators, while they produce acoustically distinct recruitment calls when attracting conspecifics to non-dangerous situations, such as food locations or during nest visitations10,11. They often combine these call types into ordered sequences (alert-recruitment call sequences) when gathering other individuals to approach and harass (i.e., mob) a predator10 (Fig. 2b). Previous experiments showed that tits display different behaviours when hearing alert calls (moving their head horizontally as if scanning for danger) and recruitment calls (approaching the sound source)11. In response to alert-recruitment call sequences, tits progressively approach the sound source while continuously scanning the horizon, suggesting that they detect compound information (i.e., “alertly” + “approach”)11. However, if the call order is artificially reversed, tits reduce their response, indicating that they perceive whether the component calls are temporally linked into specific sequences11.

Fig. 2: Study system.

Japanese tits (a) combine alert and recruitment calls into alert-recruitment call sequences (b) when gathering other individuals to mob a predator.


Based on these findings, we hypothesized that Japanese tits have evolved core-Merge and recognize an alert-recruitment call sequence as a single unit. If this is the case, then tits are expected to exhibit appropriate responses to alert-recruitment call sequences given by a single individual; however, they should not perceive the same information when alert calls and recruitment calls are separately produced by two individuals. To test this prediction, we exposed free-living flocks of Japanese tits to (i) alert-recruitment call sequences broadcast from a single speaker and (ii) alert calls and recruitment calls broadcast from two speakers in turn, following the alert-recruitment order (Fig. 3a, b). To ensure that the differences between the two treatments depend only on the number of speakers, we created all the sound files using the same procedure: we copied alert calls and recruitment calls separately from Japanese tits’ natural call sequences, then pasted them onto background noise files, making the intervals between these call types constant (0.1 s) across treatments. We created 90-s-long playback stimuli containing the same number of alert and recruitment calls (30 calls for each call type) at a natural calling rate (one call per 3 s for each speaker).

Fig. 3: Experimental set-up and sound files.

If an animal uses core-Merge to perceive call sequences, then it should be able to assess whether the component calls are produced from the same spatial location, as well as whether they are temporally linked into naturally ordered sequences. Japanese tits are exposed to a shrike specimen in combination with four types of playback stimuli:
(a) alert calls and recruitment calls are broadcast from one speaker as temporally linked, alert-recruitment sequences,
(b) the same two calls are broadcast from two speakers, while they are temporally linked,
(c) recruitment calls and alert calls are broadcast from one speaker, but they are not naturally ordered,
(d) the two calls are not linked in either space or time.
In two-speaker treatments, the speakers and a shrike specimen were placed in a straight line.


Upon finding a flock, we placed one or two Bluetooth speakers (SoundLink Micro, Bose) on tree branches. In treatments with two speakers, we separated them by 10 m, which is a natural distance between two individuals within a flock. We also placed a taxidermic specimen of a bull-headed shrike (Lanius bucephalus) on a tree branch 5 m from the speaker(s) in a natural perching posture (Fig. 3). Bull-headed shrikes are a major predator of small passerines, and tits often approach and harass them with wing flicking displays (i.e., mobbing)12,13. Exposure of a predator specimen in combination with playback stimuli allowed us to measure tits’ mobbing behaviours during one- and two-speaker playbacks through a common standard. During 90 s of playback, we recorded (i) the percentage of individuals in Japanese tit flocks that approached within 2 m of the shrike specimen and (ii) the percentage of flock members that exhibited wing flicking displays.

Here, we show that Japanese tits mob a shrike specimen when hearing alert-recruitment call sequences played from a single speaker, but not when hearing the same two calls played from different speakers with the same timing. This demonstrates that tits recognize an alert-recruitment call sequence produced by a single individual as a single unit, and not merely as two temporally linked calls, providing evidence for core-Merge in a non-human species.


Do tits recognize a call sequence as a single unit?

Japanese tits responded differently to the shrike specimen during one- and two-speaker playbacks (Fig. 4). During the one-speaker playback of alert-recruitment call sequences, tits typically approached within 2 m of the shrike and exhibited wing flicking displays (Supplementary Movie 1). However, when alert and recruitment calls were separately broadcast from two speakers, tits rarely mobbed the shrike: they infrequently approached it and rarely exhibited wing flicking displays (generalized linear mixed model: approach: Z = 5.50, P < 0.0001; wing flicking: Z = 5.68, P < 0.0001). Therefore, tits’ responses do not merely depend on the alert and recruitment calls being temporally linked, but rather on their perception of the sequence being broadcast from a single source. This supports the hypothesis that tits perceive an alert-recruitment call sequence as a single unit produced by a single individual.

Fig. 4: Predator mobbing by Japanese tits during call playbacks.

Percentage of individuals in Japanese tit flocks that approached within 2 m of the shrike specimen (generalized linear mixed model: χ² = 80.16, df = 3, P < 0.0001). Percentage of individuals in Japanese tit flocks that exhibited wing flicking displays (χ² = 95.75, df = 3, P < 0.0001). The box-and-whisker plots display the median, 1st and 3rd quartiles; the whiskers are extended to the most extreme value within the 1.5-fold interquartile range. Statistical significance was calculated using two-sided log-likelihood ratio tests. Sample size: n = 16 flocks for each treatment, resulting in n = 64 flocks across all four treatments. 1A-R: one-speaker playback of alert-recruitment sequences; 2A-R: two-speaker playback of alert calls and recruitment calls arranged in this order; 1R-A: one-speaker playback of recruitment-alert sequences; 2R-A: two-speaker playback of recruitment calls and alert calls arranged in this order. See Supplementary Table 1, Supplementary Table 2, and Supplementary Fig. 1 for details of statistical analyses. Source data are provided as a Source Data file.


Does tits’ mobbing depend on temporal linkage of two calls?

Although a previous study showed that temporal linkage of two calls (call ordering) influences tits’ behavioural responses11, there remains the possibility that, in the presence of a shrike specimen, merely hearing two call types from a single source causes tits to exhibit mobbing behaviour. If this is the case, then tits are expected to mob the shrike when hearing any sequence of alert and recruitment calls, as long as they are produced by a single source. To account for this possibility, we exposed flocks to artificially reversed, recruitment-alert call sequences broadcast from one speaker (Fig. 3c). The calling rate and the duration between the two call types were identical to those of the one-speaker playback of alert-recruitment sequences, but the call types were not presented in the naturally ordered sequences. Tits exhibited weaker responses to a shrike during one-speaker playback of recruitment-alert sequences than during one-speaker playback of alert-recruitment sequences (approach: Z = 5.10, P < 0.0001; wing flicking: Z = 4.69, P < 0.0001; Fig. 4), indicating that they are sensitive to temporal linkage of two calls even in the presence of a predator.

We further exposed flocks to recruitment calls and alert calls separately broadcast from two speakers in this order (Fig. 3d), so that the component calls are not linked in either time or space. Tits rarely mobbed the shrike specimen during two-speaker playback of recruitment-alert call ordering (Fig. 4), which was significantly different from responses during one-speaker playback of alert-recruitment call sequences (approach: Z = 4.90, P < 0.0001; wing flicking: Z = 5.54, P < 0.0001). Further pairwise comparisons reveal that tits exhibit predator mobbing when and only when they perceive naturally ordered, alert-recruitment call sequences produced by a single source (see Supplementary Table 1). These results show that tits’ mobbing responses depend on both whether the two calls are temporally linked and whether they are produced from the same spatial location.

Do any other factors influence tits’ behaviour?

We carefully designed the experiments to control for the possibility that factors other than temporal and spatial linkages of the two call types may influence tits’ mobbing responses. First, we controlled for the possibility that subtle variation within each call type may provide information about callers’ identity, which might influence receivers’ responses. We prepared 16 unique sets of alert and recruitment calls using either calls from the same bird (n = 8 source individuals, n = 8 call sets) or from two different birds (n = 16 source individuals, n = 8 call sets). Then, we created 64 playbacks from the 16 call sets, in which each call set was used to construct four playback treatments for a block (n = 16 blocks; e.g., block no. 2: alert call from bird no. 2 and recruitment call from bird no. 17 were played together from the same speaker, from different speakers, and in reversed order from the same speaker and from different speakers; Supplementary Table 3). As expected, there was no significant influence of the number of source individuals (one or two) on tits’ mobbing responses (approach: χ² = 0.78, df = 1, P = 0.3777; wing flicking: χ² = 0.69, df = 1, P = 0.4046).
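The χ² statistics, degrees of freedom and P values reported here come from log-likelihood ratio tests. As a rough illustration of how such a statistic maps to a P value, the sketch below evaluates the upper tail of a chi-square distribution in closed form for df = 1 and df = 3 only. The function name `chi2_upper_tail` is hypothetical, and the inputs are the rounded statistics quoted in the text, so the output only approximates the published values.

```python
import math


def chi2_upper_tail(stat: float, df: int) -> float:
    """Upper-tail probability P(X >= stat) for a chi-square variable.

    Only df = 1 and df = 3 are supported, because those are the degrees of
    freedom quoted in the text and both have simple closed forms.
    """
    if stat < 0:
        raise ValueError("the test statistic must be non-negative")
    root = math.sqrt(stat / 2.0)
    if df == 1:
        return math.erfc(root)
    if df == 3:
        # Regularized upper incomplete gamma function Q(3/2, stat/2).
        return math.erfc(root) + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 3")


# Rounded statistics quoted in the text (inputs are approximate).
p_source_approach = chi2_upper_tail(0.78, df=1)   # reported P = 0.3777
p_source_flicking = chi2_upper_tail(0.69, df=1)   # reported P = 0.4046
p_treatment_approach = chi2_upper_tail(80.16, df=3)  # reported P < 0.0001
```

Because the statistics are rounded to two decimals in the text, small discrepancies in the third decimal place of the recomputed P values are expected.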

Second, we also controlled for the possibility that social context may influence the willingness of tits to join in mobbing. Since flock size has been suggested to increase the intensity of mobbing14,15, we recorded the number of Japanese tits observed within 15 m of the shrike specimen during 90 s of playback and included this as a covariate in the statistical models. Supporting the prediction, tits more readily approached within 2 m of the shrike when flock size was larger (generalized linear mixed model: χ² = 16.06, df = 1, P < 0.0001). However, they did not alter wing flicking displays according to flock size (χ² = 0.00, df = 1, P = 0.9692). Approaching a predator is likely to be riskier than exhibiting wing flicking displays, but the associated risks should be reduced when there are more surrounding individuals (i.e., safety in numbers)16. This might explain why flock size affected approaching behaviour, but not wing flicking.


Our results show that Japanese tits discriminate between two temporally and spatially linked calls played from one speaker (which mimic calls by one individual) and two temporally linked calls played from two speakers (which mimic calls from two individuals). They join in mobbing a shrike when perceiving alert-recruitment call sequences broadcast from a single sound source (i.e., combined calls). In contrast, if the component calls are broadcast separately from different sources in the same ordering (i.e., non-combined calls), tits reduce their mobbing response. During playbacks of recruitment-alert orderings from one and two sources, tits rarely mobbed the shrike, indicating that they recognize whether the two calls are temporally linked into ordered sequences even in the presence of a predator. These results are supported by the statistical models that control for the possibility that other factors, such as the way of creating playback stimuli and flock size, may have an influence on tits’ behaviour. These findings show that tits are able to recognize an alert-recruitment call sequence as a single unit when coming from one individual, but not from two, which supports our conclusion that tits have evolved core-Merge.

Previous experiments showed that Japanese tits do not simply associate an alert-recruitment call sequence with an independent meaning, such as “mobbing”, but rather extract the meanings of both component calls (i.e., “alertly” + “approach” = {alertly approach})11,17. Therefore, call combinations of Japanese tits might represent an analogy to human phrases where core-Merge operates on two words to produce juxtaposed, compositional phrases (e.g., come + talk = {come talk}). Combinations of two call types have also been documented for other animals; however, how they relate to the creation of meaning seems to be diverse across species4,5,6. For example, putty-nosed monkeys (Cercopithecus nictitans) combine two alarm call types, each of which seemingly denotes a different predatory threat, such as leopards or eagles, to stimulate long-distance group movements18,19. Since this combination creates a message that is not derived from either alarm call type, it might represent an analogy to idiomatic expressions, i.e., “leopard” + “eagle” = “move on”20 (but see ref. 21). Campbell’s monkeys (Cercopithecus campbelli) add a short vocal element at the end of high-threat alarm calls when perceiving lower threats, which has been thought to be an analogy to suffixation (i.e., “predator” + “-like” = “predator-like”)22,23. Regardless of how meaning is created, the production and perception of animal call combinations may largely depend on core-Merge. We hope that our experimental paradigm provides a robust method to investigate core-Merge across a variety of species and encourages future comparative studies, which will help to understand under which conditions this linguistic capacity will evolve.

This study not only provides evidence for core-Merge in animal communication systems, but also has important implications for the studies of language evolution. There are two conflicting theories for the origins of language’s productivity. One theory holds that a single cognitive capacity called “Merge” enables us to produce and comprehend any kind of word combination, including complex expressions with hierarchical structure (e.g., a + dog + barks = {a dog} + barks = {{a dog} barks})24,25. The second theory holds that such complex expressions require, in addition to Merge, another cognitive capacity called “recursion” that allows us to form hierarchically structured mental representations2,3,4,5,26,27,28. In this theory, it is expected that without recursion, Merge just serves to combine two words, which is often labelled as core-Merge2. In other words, the combination of core-Merge and recursion enables fusing more than two words into hierarchical expressions, which is referred to as recursive-Merge2. Our findings support this second theory, since tits combine two call types into a single unit, but show no evidence that they produce sequences with more than two meaningful calls; further research is necessary to determine if tits can create hierarchically structured sequences. We stand at a starting point to explore the similarities and differences of the combinatorial communication systems between animals and humans3,4,5,6,7,8,9,29. Determining how widely Merge is involved in animal signals and what specific mechanisms provide the basis for the emergence of hierarchical structure remains a key challenge in animal communication research, which will deepen our understanding of the evolutionary pathway of language.


Study site and animals

We studied n = 64 flocks of Japanese tits in mixed deciduous-coniferous forests in Nagano and Gumma (36°17–31′N, 138°26–39′E), Japan. Although most of the birds had not been individually colour-ringed, all the experimental trials were conducted at least 400 m apart; previous observations on colour-ringed individuals showed that this distance was enough to ensure the collection of data from different individuals30. At this site, one of the major predators of small birds is the bull-headed shrike, which is often mobbed by small birds including Japanese tits.

Playback stimulus

To test whether Japanese tits recognize an alert-recruitment call sequence as a single unit, we prepared four treatments: (i) one-speaker playback of alert-recruitment call sequences, (ii) two-speaker playback of alert-recruitment call sequences with alert and recruitment calls played from different speakers, (iii) one-speaker playback of recruitment-alert call sequences, (iv) two-speaker playback of recruitment-alert call sequences with recruitment and alert calls played from different speakers (Fig. 3). We created sound files for these treatments using the software program Audacity 2.1.3 (http://www.audacityteam.org). For one-speaker treatments, we composed mono sound files where call sequences were repeated onto a single channel, whereas for two-speaker treatments, we composed stereo sound files where either alert or recruitment calls were repeated onto the right or left channels, respectively. All the files contained an equal number of alert calls (30 calls) and recruitment calls (30 calls) at the same rate (one call every 3 s), resulting in 90 s of stimuli (Fig. 3), which corresponds to the range of the natural calling rate of alert-recruitment sequences during mobbing by Japanese tits10. For all stimuli, within-call-sequence intervals between alert and recruitment calls were constant (0.1 s), which is within the range of intervals of these calls in natural call sequences11,17. In contrast, between-call-sequence intervals varied from 1.50 to 1.81 s (median = 1.68 s) due to the difference in call length, but were constant across playback stimuli within the same “block”, where the four treatments were created using the same call exemplars (see below). While alert calls are composed of three distinct note types, recruitment calls are strings of the same note type that vary in repetition number. Since the repetition number can vary depending on predator type10, we conducted predator exposure experiments with Japanese tit flocks (n = 12) and recorded call sequences towards a life-like bull-headed shrike specimen. In response to a shrike specimen, tits produced alert-recruitment call sequences with a recruitment note repetition number ranging from 5 to 15. Since the interquartile range of repetition number was 6.75 to 10, we used recruitment calls with 7–10 notes as playback stimuli in this study. In consideration of the possible influence of the sound editing procedure, we created all the stimuli in the same manner: we copied alert and recruitment call parts separately from recording files, and pasted them onto background noise files to produce all four types of stimuli. Playback amplitudes were constant across treatments, 70 dB at 1.0 m measured using a sound level meter (SM-325, AS ONE Corporation). Therefore, the differences between treatments depend only on whether these calls are produced as sequences from the same source and how the calls are ordered.
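The timing constraints described above (30 sequences, one every 3 s, a fixed 0.1-s gap between the two call types) can be sketched as a simple onset schedule. This is only an illustration, not the authors' procedure, which used Audacity. The names `CallEvent`, `build_schedule` and `between_sequence_gap` are hypothetical, the call durations are invented placeholders, and the assignment of alert calls to the right channel and recruitment calls to the left is one reading of the stereo description.

```python
from dataclasses import dataclass

PERIOD_S = 3.0      # one call sequence every 3 s (from the text)
N_SEQUENCES = 30    # 30 alert and 30 recruitment calls per 90-s file (from the text)
WITHIN_GAP_S = 0.1  # constant interval between the two call types (from the text)


@dataclass(frozen=True)
class CallEvent:
    call_type: str  # "alert" or "recruitment"
    channel: str    # "mono", "right" or "left"
    onset_s: float  # onset time from the start of the file, in seconds


def build_schedule(order, durations, two_speakers):
    """Return call onsets for one 90-s stimulus.

    order        -- ("alert", "recruitment") or the reversed tuple
    durations    -- call lengths in seconds, keyed by call type
    two_speakers -- True puts each call type on its own stereo channel
    """
    first, second = order
    if two_speakers:
        channels = {"alert": "right", "recruitment": "left"}  # assumed mapping
    else:
        channels = {"alert": "mono", "recruitment": "mono"}
    events = []
    for k in range(N_SEQUENCES):
        start = k * PERIOD_S
        events.append(CallEvent(first, channels[first], start))
        second_onset = start + durations[first] + WITHIN_GAP_S
        events.append(CallEvent(second, channels[second], second_onset))
    return events


def between_sequence_gap(durations):
    """Silent interval between the end of one sequence and the start of the next."""
    return PERIOD_S - (durations["alert"] + WITHIN_GAP_S + durations["recruitment"])


# Hypothetical call lengths, chosen only so the gap falls in the reported range.
EXAMPLE_DURATIONS = {"alert": 0.50, "recruitment": 0.72}
schedule = build_schedule(("alert", "recruitment"), EXAMPLE_DURATIONS, two_speakers=True)
gap = between_sequence_gap(EXAMPLE_DURATIONS)
```

With these placeholder durations the between-sequence gap is about 1.68 s, which sits inside the 1.50 to 1.81 s range quoted in the text; longer or shorter calls shift the gap while the 3-s period stays fixed.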

We carefully designed the experiments to control for the possibility that individual-based acoustic features in alert and recruitment calls might influence tits’ responses. First, we prepared 16 unique sets of alert and recruitment calls using either calls from the same bird (n = 8 source individuals, n = 8 unique call sets) or from two different birds (n = 16 source individuals, n = 8 unique call sets). Then, we created the four types of treatments (i.e., alert-recruitment call sequences from the same speaker, from different speakers, and in reversed order from the same speaker and from different speakers) from each of the alert-recruitment call sets, resulting in 16 blocks of playback stimuli (Supplementary Table 3). This allows us to test the possible influence of individual-based acoustic variation on receivers’ responses.
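The block structure described above (16 call sets crossed with four treatments) can be enumerated in a few lines. This is a sketch only: `build_design` is a hypothetical helper, the treatment codes follow the abbreviations used in the Fig. 4 caption, and assigning the first eight block numbers to single-bird call sets is an arbitrary choice for illustration, not the numbering in Supplementary Table 3.

```python
from itertools import product

# Treatment codes as abbreviated in the Fig. 4 caption.
TREATMENTS = ("1A-R", "2A-R", "1R-A", "2R-A")


def build_design(n_same_bird_sets=8, n_two_bird_sets=8):
    """Cross every call set (block) with the four playback treatments."""
    n_blocks = n_same_bird_sets + n_two_bird_sets
    playbacks = []
    for block, treatment in product(range(1, n_blocks + 1), TREATMENTS):
        # Arbitrary illustrative numbering: the first blocks use one source bird.
        n_source_birds = 1 if block <= n_same_bird_sets else 2
        playbacks.append(
            {"block": block, "treatment": treatment, "n_source_birds": n_source_birds}
        )
    return playbacks


design = build_design()
n_playbacks = len(design)  # 16 blocks x 4 treatments
```

Each of the 16 blocks contributes exactly one playback per treatment, which is what lets the number of source individuals be tested separately from the treatment effect.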

We were also careful to avoid the possible influence of population-level signatures of acoustic features: we only used Japanese tits’ call sequences that had been previously recorded from the same study population. We saved the sound files in .wav format (16-bit accuracy, 48-kHz sampling rate) onto a playback device (iPhone 8, Apple Inc.). We used the default Music app (Apple Inc.) to play back the sound files.


We (TNS and YKM) conducted experimental trials from 26 October to 4 December 2020, between 0800 and 1600 h (Japan Standard Time). We did not conduct trials under wet and windy weather conditions, since these may influence behavioural patterns of forest birds31. First, we searched for and located a flock of Japanese tits. Upon finding a flock, we fixed a taxidermic specimen of a bull-headed shrike in a perching posture on a branch at 1.8 ± 0.2 m (mean ± s.d., n = 64) above the ground. Then, we placed either one or two Bluetooth speakers (SoundLink Micro, Bose) on tree branches at 1.6 ± 0.2 m (mean ± s.d., n = 96) above the ground, and oriented them upwards to control for the possible influence of directionality. We set the distance between the shrike specimen and the speaker(s) at 5 m. For trials with two speakers, we set the distance between speakers at 10 m, mimicking the situation in which two birds are calling (Fig. 3). The shrike specimen was first covered with a black cloth and was exposed by removing the cloth just before each trial.

We began playbacks when at least two Japanese tits were present within 15 m of the shrike specimen. During 90 s of playback, we recorded (i) whether birds approached within 2 m of the shrike specimen and (ii) whether birds exhibited wing flicking displays12,13. We counted the number of individuals within 15 m of the shrike during 90 s of playback and considered it as flock size. During trials, we sat on the ground at ca. 10 m from the shrike specimen to decrease the influence of the observers’ presence on bird behaviour. To account for inter-observer reliability32, we calculated the intra-class correlation coefficient (ICC; using a function in the R package irr) between us. The lowest ICC was 0.998, indicating a high degree of inter-observer reliability for the two behavioural measurements. We also video-recorded the responses of tits using a digital video camera (FDR-AX60, SONY). After completion of each trial, we checked the video recording and made an on-the-spot confirmation of the exact location at which each bird made the closest approach to the shrike specimen during the 90 s of playback. Then, using a tape measure, we recorded the minimum approach distance of birds to the shrike specimen. Thus, our final data set consisted of the most reliable observations confirmed by two experimenters and video evidence.

The order of trials was randomized within each block (n = 16 blocks), each of which is composed of a unique alert-recruitment call set but includes four treatments differing in the number of speakers and call order. Therefore, responses to all four treatments were observed under largely similar conditions. In a few trials, the first bird to approach the shrike specimen was from a heterospecific species, such as a varied tit (n = 1) or a long-tailed tit (n = 1). To account for the possibility that these birds evoke mobbing behaviour in Japanese tits, we only used the data from instances where the first individual to approach the shrike was a Japanese tit. Otherwise, we repeated the same treatment at a different site.
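Randomizing the trial order within each block can be sketched as an independent shuffle of the four treatments per block. The text does not say how the randomization was carried out, so the seeded generator and the helper name `randomize_trial_order` below are assumptions made only for illustration.

```python
import random

# Treatment codes as abbreviated in the Fig. 4 caption.
TREATMENTS = ("1A-R", "2A-R", "1R-A", "2R-A")


def randomize_trial_order(n_blocks=16, seed=0):
    """Return a shuffled treatment order for every block.

    A seeded generator is used so the illustrative schedule is reproducible;
    the seed value itself is arbitrary.
    """
    rng = random.Random(seed)
    return {
        block: rng.sample(TREATMENTS, k=len(TREATMENTS))
        for block in range(1, n_blocks + 1)
    }


trial_order = randomize_trial_order()
```

Every block still contains all four treatments exactly once; only their order differs, so each treatment is compared against the others under the same call set.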

We used 64 unique playbacks created from 16 unique sets of alert-recruitment calls for 64 trials in order to avoid pseudoreplication33. We prepared two specimens of male bull-headed shrikes and used each of them for an equal number of trials. We did not use specimens of female shrikes, since females migrate from the study site in late summer and only males were observed during the study period.

Statistical analysis

We analyzed the effect of playback treatments on the mobbing behaviours of Japanese tits using generalized linear mixed models in R34,35. We used the proportions of Japanese tits in flocks that (i) approached within 2 m of the shrike specimen and (ii) exhibited wing flicking displays. For the analysis of predator approach, we prepared two vectors (i.e., the number of Japanese tits that approached the shrike specimen and the number of Japanese tits that did not approach the shrike specimen). Then, we created a single response variable by binding these two vectors together. Similarly, for the analysis of wing flicking displays, we created a single response variable by binding two vectors (i.e., the number of tits that exhibited wing flicking and the number of tits that did not exhibit wing flicking). We fitted playback treatments as a fixed term, and flock size (maximum number of Japanese tits observed during 90 s of playback) and the way of creating playback stimuli (whether the two call types were recorded from a single individual or two individuals) as covariates. We also included the identity of alert-recruitment call sets that were used for creating playback stimuli (i.e., call sets from either one or two source individuals) and the identity of shrike specimens as random terms. We used a binomial error distribution and logit-link function (glmer in the R package lme4) for these models. Statistical significance was calculated by log-likelihood ratio tests using a function in the R package stats. We further conducted post-hoc pairwise comparisons between treatments by using estimated marginal means (emmeans in the R package emmeans). When making pairwise comparisons, we adjusted p-values by applying a false discovery rate control for multiple testing36. All tests were two-sided and the significance level was set at α = 0.05. Exact p-values are reported when p ≥ 0.0001.
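To illustrate what a false discovery rate adjustment does to a set of pairwise p-values, the sketch below implements the Benjamini-Hochberg step-up adjustment. The text does not name the specific procedure, so treating it as Benjamini-Hochberg is an assumption, and the function name `bh_adjust` and the example p-values are invented for illustration rather than taken from the study.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downwards keeps the result monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Invented raw p-values, purely to show the arithmetic.
example_raw = [0.01, 0.04, 0.03, 0.005]
example_adjusted = bh_adjust(example_raw)
```

Four treatments give six pairwise comparisons, so in practice the adjustment would run over six p-values per behavioural measure; adjusted values at or below α = 0.05 would then be read as significant.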

Ethics statement

All protocols were approved by the ethics committee of Kyoto University, the Ministry of the Environment, and the Forestry Agency of Japan, and adhered to the Guidelines for the Use of Animals of the Association for the Study of Animal Behaviour/Animal Behavior Society37.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All data used in this study are available in Figshare (https://doi.org/10.6084/m9.figshare.18007046)38. Source data are provided with this paper.

Code availability

R codes used for statistical analysis are available in Figshare (https://doi.org/10.6084/m9.figshare.18007046)38.


  1. Chomsky, N. In
    Ken Hale: A Life in Linguistic communication, (ed. M. Kenstowicz) pp.1–52 (MIT Press, Cambridge, 2001).

  2. Fujita, K. In
    Recursion: Complexity in Cognition, (eds. Roeper, T. & Speas, K.) pp. 243–264 (Springer, New York, 2014).

  3. Rizzi, L. Monkey morpho-syntax and merge-based systems. Theor. Linguist. 42, 139–145 (2016).

  4. Zuberbühler, K. Syntax and compositionality in animal communication. Philos. Trans. R. Soc. B 375, 20190062 (2019).

  5. Suzuki, T. N., Wheatcroft, D. & Griesser, M. The syntax-semantics interface in animal vocal communication. Philos. Trans. R. Soc. B 375, 20180405 (2019).

  6. Engesser, S. & Townsend, S. W. Combinatoriality in the vocal systems of nonhuman animals. WIREs Cogn. Sci. e1493 (2019).

  7. Schlenker, P. et al. Formal monkey linguistics: the debate. Theor. Linguist. 42, 173–201 (2016).

  8. Schlenker, P., Chemla, E. & Zuberbühler, K. What do monkey calls mean? Trends Cogn. Sci. 20, 894–904 (2016).

  9. Kuhn, J., Keenan, S., Arnold, K. & Lemasson, A. On the –oo suffix of Campbell’s monkeys. Ling. Inq. 49, 169–181 (2018).

  10. Suzuki, T. N. Communication about predator type by a bird using discrete, graded and combinatorial variation in alarm calls. Anim. Behav. 87, 59–65 (2014).

  11. Suzuki, T. N., Wheatcroft, D. & Griesser, M. Experimental evidence for compositional syntax in bird calls. Nat. Commun. 7, 10986 (2016).

  12. Clemmons, J. R. & Lambrechts, M. M. The waving display and other nest site anti-predator behavior of the black-capped chickadee. Wilson Bull. 104, 749–756 (1992).

  13. Carlson, N. V., Pargeter, H. M. & Templeton, C. N. Sparrowhawk movement, calling, and presence of dead conspecifics differentially impact blue tit (Cyanistes caeruleus) vocal and behavioral mobbing responses. Behav. Ecol. Sociobiol. 71, 133 (2017).

  14. Krams, I., Bērziņš, A. & Krama, T. Group effect in nest defence behaviour of breeding pied flycatchers, Ficedula hypoleuca. Anim. Behav. 77, 513–517 (2009).

  15. Dutour, M., Kalb, N., Salis, A. & Randler, C. Number of callers may affect the response to conspecific mobbing calls in great tits (Parus major). Behav. Ecol. Sociobiol. 75, 29 (2021).

  16. Caro, T. Antipredator Defenses in Birds and Mammals (University of Chicago Press, Chicago, 2005).

  17. Suzuki, T. N., Wheatcroft, D. & Griesser, M. Wild birds use an ordering rule to decode novel call sequences. Curr. Biol. 27, 1–6 (2017).

  18. Arnold, K. & Zuberbühler, K. Language evolution: Semantic combinations in primate calls. 441, 303 (2006).

  19. Arnold, K. & Zuberbühler, K. Meaningful call combinations in a non-human primate. Curr. Biol. 18, R202–R203 (2008).

  20. Arnold, K. & Zuberbühler, K. Call combinations in monkeys: Compositional or idiomatic expressions? Brain Lang. 120, 303–309 (2012).

  21. Schlenker, P., Chemla, E., Arnold, K. & Zuberbühler, K. Pyow-hack revisited: Two analyses of putty-nosed monkey alarm calls. 171, 1–23 (2016).

  22. Ouattara, K., Lemasson, A. & Zuberbühler, K. Campbell’s monkeys concatenate vocalizations into context-specific call sequences. Proc. Natl Acad. Sci. USA 106, 22026–22031 (2009).

  23. Coye, C., Ouattara, K., Zuberbühler, K. & Lemasson, A. Suffixation influences receivers’ behaviour in non-human primates. Proc. R. Soc. B 282, 20150265 (2015).

  24. Bolhuis, J. J., Tattersall, I., Chomsky, N. & Berwick, R. C. How could language have evolved? PLoS Biol. 12, e1001934 (2014).

  25. Berwick, R. & Chomsky, N. Why Only Us? Language and Evolution (MIT Press, London, 2016).

  26. Hauser, Thou. D., Chomsky, N. & Fitch, W. T. The kinesthesia of language: what is it, who has information technology, and how did information technology evolve?
    298, 1569–1579 (2002).

    Article  ADS  CAS  Google Scholar

  27. Suzuki, T. N., Wheatcroft, D. & Griesser, M. Call combinations in birds and the evolution of compositional syntax. PLoS Biol. 16, e2006532 (2018).

  28. Bolender, J., Erdeniz, B. & Kerimoğlu, C. Human uniqueness, cognition by description, and procedural memory. 2, 129–151 (2008).

  29. Miyagawa, S. & Clarke, E. Systems underlying human and Old World monkey communication: one, two, or infinite. Front. Psychol. 10, 1911 (2019).

  30. Suzuki, T. N. Long-distance calling by the willow tit, Poecile montanus, facilitates formation of mixed-species foraging flocks. 118, 10–16 (2012).

  31. Grubb, T. C. Weather-dependent foraging behaviour of some birds wintering in a deciduous woodland. 77, 175–182 (1975).

  32. Kaufman, A. B. & Rosenthal, R. Can you believe my eyes? The importance of interobserver reliability statistics in observations of animal behaviour. Anim. Behav. 78, 1487–1491 (2009).

  33. Kroodsma, D. E., Byers, B. E., Goodale, E., Johnson, S. & Liu, W. C. Pseudoreplication in playback experiments, revisited a decade later. Anim. Behav. 61, 1029–1033 (2001).

  34. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, 2022), Version 4.2.1.

  35. Crawley, M. J. The R Book, 2nd edn. (Wiley, 2012).

  36. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).

  37. Association for the Study of Animal Behaviour/Animal Behavior Society. Guidelines for the treatment of animals in behavioural research and teaching. Anim. Behav. i–ix (2017).

  38. Suzuki, T. N. & Matsumoto, Y. K. Data and codes: experimental evidence for core-Merge in the vocal communication system of a wild passerine. https://doi.org/10.6084/m9.figshare.18007046 (2022).

We are very grateful to Dr. David Wheatcroft and Dr. Nora V. Carlson for their invaluable comments on the manuscript. This work was supported by JSPS KAKENHI (Grant Numbers JP20H05001 and JP20H03325 to T.N.S. and JP19J01718 to Y.K.M.), the Hakubi Project Funding of Kyoto University (T.N.S.), and JST FOREST Program (Grant Number JPMJFR2149 to T.N.S.).

Author information

T.N.S. conceived the study and drafted the manuscript. T.N.S. and Y.K.M. performed the experiment, analysed the data, and finalized the manuscript.

Corresponding author

Correspondence to Toshitaka N. Suzuki.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Katelyn Ray and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Suzuki, T.N., Matsumoto, Y.K. Experimental evidence for core-Merge in the vocal communication system of a wild passerine. Nat Commun 13, 5605 (2022). https://doi.org/10.1038/s41467-022-33360-3

Source: https://www.nature.com/articles/s41467-022-33360-3