When parents assess a kids' app, they notice the visuals first and the background music second. The small, responsive sounds go largely unnoticed: the plop when something lands, the splash when water runs, the soft chop of a knife on a board. They feel like decoration. They're doing more than that.
The sounds a child triggers through their own actions are doing something different from the sounds that play in the background. They're teaching cause and effect. And the research on how this works is surprisingly specific.
The feedback loop you don't notice
When a toddler taps a screen and hears a sound, their brain registers something important: I did that. This is called contingent feedback, and it's one of the strongest mechanisms for early learning.
Kirkorian, Choi, and Pempek (2016) tested this directly. Toddlers aged 24 to 36 months learned new words significantly better from touchscreen content that responded to their actions compared to identical content that played passively (Kirkorian, 2016). The contingent group's scores were comparable to live, in-person teaching. The timing matters too. Goldstein, King, and West (2003) found that responses within roughly one to two seconds of the child's action are processed as "I caused that" (Goldstein, 2003). Responses that come later aren't registered the same way.
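For anyone building this, the finding translates into a latency budget: the sound has to land within a second or two of the tap, which in practice means never fetching or decoding audio at tap time. Here's a minimal sketch using the browser's Web Audio API; the asset path and element id are made up, but the pattern, decoding up front and playing a fresh one-shot source on pointerdown, is the standard one:

```ts
// A minimal sketch of low-latency tap feedback with the Web Audio API.
// The asset path and element id are hypothetical; the point is decoding
// the sound ahead of time so playback on tap is effectively instant.

const ctx = new AudioContext();
let splashBuffer: AudioBuffer | null = null;

async function preload(): Promise<void> {
  const response = await fetch("/sounds/splash.mp3"); // hypothetical asset
  splashBuffer = await ctx.decodeAudioData(await response.arrayBuffer());
}

function playSplash(): void {
  if (!splashBuffer) return;
  void ctx.resume(); // browsers keep audio suspended until a user gesture
  // Buffer sources are one-shot and cheap: create a fresh node per tap.
  const source = ctx.createBufferSource();
  source.buffer = splashBuffer;
  source.connect(ctx.destination);
  source.start(); // fires within milliseconds, well inside the 1-2 s window
}

void preload();
document.getElementById("tap-area")?.addEventListener("pointerdown", playSplash);
```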
Begus, Gliga, and Southgate (2014) showed a related effect: information delivered after an infant's own initiated action was retained better than identical information delivered unprompted (Begus, 2014). The child's agency is part of the encoding. They remember what they made happen.
The sound needs to match the action
Not all audio feedback is equal. Russo-Johnson and colleagues (2017) studied 2- to 5-year-olds using a word-learning app and found that relevant interactive feedback, where tapping produced audio semantically connected to the content, improved learning (Russo-Johnson, 2017). Irrelevant feedback, where the sound had no connection to what the child was doing, actually distracted from the task.
This aligns with Hirsh-Pasek and colleagues' (2015) influential framework for educational app design, which identifies four pillars: active, engaged, meaningful, and socially interactive (Hirsh-Pasek, 2015). Audio that connects to the child's action supports the "active" pillar. Generic reward sounds that fire regardless of context undermine the "engaged" pillar by pulling attention away from what the child is actually doing.
The practical implication: a splashing sound when a child washes fruit under a tap is more useful than a generic chime. It connects the action to a real-world concept. A chop sound when they cut something on a board reinforces what a knife does. These aren't just pleasant sounds. They're tiny bridges to the real world.
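If you're wiring this up in code, the relevant-versus-irrelevant distinction reduces to a lookup: each action gets audio that depicts its real-world consequence, never one generic chime for everything. A sketch in TypeScript, with invented action names and sound ids:

```ts
// Illustrative only: the action names and sound ids are invented.
type Action = "washFruit" | "chopVegetable" | "placeBowl";

// Each action maps to audio that depicts its real-world consequence,
// not a single generic reward chime fired regardless of context.
const feedbackSound: Record<Action, string> = {
  washFruit: "splash",    // water running over the fruit
  chopVegetable: "chop",  // knife on board
  placeBowl: "soft-thud", // object set down
};

function onAction(action: Action): void {
  playSound(feedbackSound[action]);
}

function playSound(id: string): void {
  console.log(`playing ${id}`); // in a real app: start a preloaded buffer, as in the earlier sketch
}
```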
Three senses are better than one
There's a separate line of evidence about why sound specifically matters alongside touch and vision. Shams and Seitz (2008) reviewed the research on multisensory learning and found that combining auditory, visual, and tactile information produces more robust encoding than any single modality, even when you only test one sense afterwards (Shams, 2008). The brain stores multisensory experiences differently.
Jordan and Baker (2011) tested this with 3- and 4-year-olds on a number matching task. Kids who received audio-visual information together performed significantly better than those who got visual information alone (Jordan, 2011). The effect is consistent with what Bahrick and Lickliter (2000) call "intersensory redundancy": young children preferentially attend to and learn from information that arrives through multiple senses at the same time.
A touchscreen game already provides two senses: vision and touch. Adding a well-matched sound effect on each interaction adds the third. That's the difference between a single-modality memory trace and a richer, multisensory one.
It doesn't need to sound real
One question we get is whether synthesised sounds are as effective as recorded ones. The short answer: it doesn't seem to matter. Gaver (1993) distinguished between two modes of listening: "musical listening" (attending to the sound's acoustic properties) and "everyday listening" (attending to what caused the sound) (Gaver, 1993). Young children primarily engage in everyday listening. They associate a sound with the action that produced it, not with how faithfully it reproduces a real-world recording.
Plass and Kaplan (2016) found that what matters is the emotional quality of the sound: warm, pleasant tones create a favourable state that supports cognitive processing (Plass, 2016). A synthesised plop that feels soft and satisfying works as well as a recorded one. Possibly better, because synthesised sounds can be tuned precisely for warmth and volume without the background noise and compression artefacts that come with field recordings.
This connects to what Trainor and Heinmiller (1998) found about young children's preferences: they gravitate towards consonant, harmonically simple sounds and away from dissonant or spectrally complex ones (Trainor, 1998). Clear, warm tones over harsh or noisy ones. Simple over complicated. The same principle that applies to visual pacing applies to audio.
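For the curious, here's roughly what a synthesised plop looks like in Web Audio terms: a pure sine (no harsh overtones) with a quick downward pitch sweep and a soft decay. Every number here is an assumption to be tuned by ear, not a recipe from the research:

```ts
// A sketch of a synthesised "plop": a pure sine with a fast downward
// pitch sweep and a short decay envelope. All values are tuning guesses.
function playPlop(ctx: AudioContext): void {
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();
  const now = ctx.currentTime;

  osc.type = "sine"; // harmonically simple: no harsh overtones
  osc.frequency.setValueAtTime(420, now); // starting pitch in Hz
  osc.frequency.exponentialRampToValueAtTime(140, now + 0.12); // the "plop" drop

  gain.gain.setValueAtTime(0.4, now); // gentle, not startling
  gain.gain.exponentialRampToValueAtTime(0.001, now + 0.18); // soft fade-out

  osc.connect(gain).connect(ctx.destination);
  osc.start(now);
  osc.stop(now + 0.2);
}
```

Called from inside a tap handler, that's the whole effect: about 200 milliseconds of sound, tuned for warmth rather than realism.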
What to listen for
Next time your toddler is playing a game, watch their face when they tap something and hear a response. That moment of recognition is visible. It builds agency. It teaches them the screen is something they control.
The sounds don't need to be loud or complex. They need to be immediate, warm, and connected to what the child just did. A gentle splash for water. A soft thud for placing something down. A quiet sizzle for cooking. Each one says: you did this, and it mattered.
Our companion post on background music covers the ambient side of audio design. But if you had to choose between a beautiful soundtrack and responsive sound effects, the research would nudge you towards the effects. The sounds your kid triggers are the sounds that teach.
Sources
- Kirkorian, H.L., Choi, K., & Pempek, T.A. (2016). Toddlers' word learning from contingent and non-contingent video on touch screens. Child Development, 87(2), 405-413. https://doi.org/10.1111/cdev.12508
- Goldstein, M.H., King, A.P., & West, M.J. (2003). Social interaction shapes babbling: testing parallels between birdsong and speech. Proceedings of the National Academy of Sciences, 100(13), 8030-8035. https://doi.org/10.1073/pnas.1332441100
- Begus, K., Gliga, T., & Southgate, V. (2014). Infants learn what they want to learn: responding to infant pointing leads to superior learning. PLoS ONE, 9(10), e108817. https://doi.org/10.1371/journal.pone.0108817
- Russo-Johnson, C., Troseth, G., Duncan, C., & Mesghina, A. (2017). All tapped out: touchscreen interactivity and young children's word learning. Frontiers in Psychology, 8, 578. https://doi.org/10.3389/fpsyg.2017.00578
- Hirsh-Pasek, K., Zosh, J.M., Golinkoff, R.M., Gray, J.H., Robb, M.B., & Kaufman, J. (2015). Putting education in 'educational' apps: lessons from the science of learning. Psychological Science in the Public Interest, 16(1), 3-34. https://doi.org/10.1177/1529100615569721
- Shams, L., & Seitz, A.R. (2008). Benefits of multisensory learning. Trends in Cognitive Sciences, 12(11), 411-417. https://doi.org/10.1016/j.tics.2008.07.006
- Jordan, K.E., & Baker, J. (2011). Multisensory information boosts numerical matching abilities in young children. Developmental Science, 14(2), 205-213. https://doi.org/10.1111/j.1467-7687.2010.00966.x
- Bahrick, L.E., & Lickliter, R. (2000). Intersensory redundancy guides attentional selectivity and perceptual learning in infancy. Developmental Psychology, 36(2), 190-201. https://doi.org/10.1037/0012-1649.36.2.190
- Gaver, W.W. (1993). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology, 5(1), 1-29. https://doi.org/10.1207/s15326969eco0501_1
- Plass, J.L., & Kaplan, U. (2016). Emotional design in digital media for learning. In S.Y. Tettegah & M. Gartmeier (Eds.), Emotions, Technology, Design, and Learning, 131-161. https://doi.org/10.1016/B978-0-12-801856-9.00007-4
- Trainor, L.J., & Heinmiller, B.M. (1998). The development of evaluative responses to music: infants prefer to listen to consonance over dissonance. Infant Behavior and Development, 21(1), 77-88. https://doi.org/10.1016/S0163-6383(98)90055-8