This paper integrates dialogic and embodied theories of learning to investigate how multimodal dialogue unfolds as three 4-year-olds translate an AABCC block-tower pattern (blue, blue, white, red, red) into a pattern dance within a colour matrix. Drawing on data from an embodied design study, the study examines four interrelated aspects of the children's multimodal dialogue (flexibility, rhythm, gaps, and revoicing) through synchronic and diachronic analyses. The findings show how flexibility functioned as an adaptive resource through which the children attuned to one another's actions, negotiated emerging meanings, and sustained collective understanding by dynamically coordinating speech, gesture, stepping, and sound. Rhythm supported anticipation and collective flow, while dialogic gaps (moments of hesitation or disruption) created openings for negotiation and meaning-making. Revoicing bridged these gaps by aligning rhythm, speech, and movement into shared understanding. The study demonstrates how mathematical meaning emerges through multimodal voices, thereby foregrounding multimodality as a central dimension of early mathematical learning.