Speeding and Braking Talk: Part 2 'Human Buffering'

Human Buffering.

An accelerated and compressed version of this was delivered at the Speeding and Braking: Navigating Acceleration conference organised by SARU at Goldsmiths College, University of London, on 14/05/2016.

Part 1 here


Let's return to Berardi and consider a particular dynamic that our accelerated speech is symptomatic of. Increased syllabic intensity is a symptom of the pressures and demands of semiocapitalism. We can consider this connection between semiocapitalism and voice in terms of two key Berardian ideas: the infosphere and the psychosphere – or, to use two other Berardian terms, cyberspace and cybertime. The infosphere and cyberspace are ever expanding, getting faster, denser, more complex and detailed. The exponential growth of data capacities, corresponding to Moore’s Law, seems limitless. Cyberspace and the infosphere never cease exploding. But our engagement, that is, cybertime (how long we can distractedly look at the internet) and the psychosphere (our collective psychological capacities), is not boundless like the infosphere and cyberspace. We have our limit.

Our speed limit is manifested in multiple ways. In terms of cognition and the absorption of text, we now look at our endless emails rather than reading them. “Did you read the email?” is a common question, precisely because no one actually reads email anymore. We no longer engage deeply with music – we download discographies that go unlistened or flit randomly through YouTube videos or streaming services. In terms of voice, whilst we may attempt to speak faster and faster, our brain cannot keep up. We say um, err, use vapid filler words and phrases. Everyone has some vocal manifestation of a speed limit: a stutter, a pause, a gestural tic, a familiar embellishment of phrase such as “y’know”.

More specifically, when we cannot keep up, we might croak. Our voice is reduced to vocal fry. As our syntax overrides our respiration, as our finite brains struggle with a cognitive-motor-syllabic pile-up, we croak, drawling the gravelly phonic register of our human buffering. Although vocal fry is defined as using the lowest register of the voice (a croaky, creaky sort of sound), I feel that in practice it is an affect that comes in to hide our cognitive buffering and the braking of our syllabic delivery. This is particularly noticeable in the YouTube example.

“Women exhibiting a low-pitched, creaky voice known as "vocal fry" are considered less competent, educated, trustworthy, attractive and hirable, according to research from Duke University's Fuqua School of Business.

“The researchers conducted an experiment using 800 online listeners split evenly between men and women. The listeners were randomly assigned to listen to either seven male voices or seven female voices that alternated between vocal fry and normal tones of voice. The listeners were then asked to judge the examples for competence, education, trustworthiness and attractiveness. The experiment found a strong aversion to voices exhibiting vocal fry, particularly among women.” [3]

There is a heavy gender bias here, and a hypocritical one too. The male voices with exactly the same vocal affect were not received as negatively as the female voices employing the same method of speech braking (the vocal fry). As the video points out, this has strong implications for career prospects. Many hiring decisions are based on initial impressions. Women who possess a vocal affect identical to that of their male counterparts are regarded in a grossly negative manner. This is inequality and normative bias revealing itself in the way voices and our braking methods are heard and interpreted. In essence, the methods of braking syllabic delivery, the ways we affect our voice to slow down our delivery under the pressures of semiocapitalism, are arbitrary in terms of prejudice. Prejudice can be applied retroactively to any affect or braking method with no logic other than its own: there is no right way to speak or delay one’s speech, only supposedly right kinds of speaker, who can employ affects to positive effect that, when used by another, would be regarded negatively. Further to this, I’d suggest that the discriminatory gender bias surrounding the American vocal fry is another subtle form of women being persecuted for any form of control over their own bodies: don’t brake, keep up honey.

In Britain there is a very prevalent vocal fry associated with a certain class and a certain geographic area: the charming Home Counties croak of the masculine vocal fry – “Yah”. Again, there is normative prejudice inherent in this. The affect that imbues one speaker with baritone privilege may sully another with common coarseness of voice. Prejudice follows no logic other than its own; vocal traits are arbitrary. We can also see similar double standards with other cognitive-syllabic braking methods. We can regard the class distinction between the charming Oxbridge debating society stutter (heard on Radio 4, HIGNFY, Question Time) and other stutters in the same way. One seems granted authority and gravitas, whereas the other is regarded as a speech impediment, an inability to speak. In the same sense we can consider the use of filler words to buy time. Politicians, who relentlessly parrot empty phraseology as testament to Burroughs’ claim, are seemingly allowed filler phrases like ‘robust’ and ‘now look here now’, whereas other phrases that serve exactly the same cognitive-motor-syllabic deceleration purpose may be taken as less authoritative, informed or capable. Filler terms such as ‘um’, ‘you know’, ‘well I think’, ‘to be honest’, etc. are seldom granted the same privilege as patrician stutters or elite-class parroting.

Our speed limit manifests in many forms when our neuro-vocal abilities fall short of the accelerating demands of media-saturated semiocapitalism. We all have our speed limit. Recognising how deeply the contemporary environment affects our bodies and minds is important. But so too is recognising the politically ingrained hypocrisies that surround the different ways we brake, buffer, hesitate and pause.


Berardi, F, 2009. Precarious Rhapsody: Semiocapitalism and the pathologies of the post-alpha generation. Minor Compositions. London.
Jukes, I, 2010. Understanding the Digital Generation: Teaching and Learning in the New Digital Landscape (The 21st Century Fluency Series). Corwin
Karpf, A. 2007. The Human Voice: The Story of a Remarkable Talent. Bloomsbury. London.
Pinker, S. 1994. The Language Instinct. Penguin Books. London.
Zizek, S. 2008. ‘Language, violence and non-violence.’ International Journal of Zizek Studies 2 (3), 307-316.

Speeding and Braking Talk: Part 1 'Life is short: Talk Fast'

Life is short: Talk Fast. 

An accelerated and compressed version of this was delivered at the Speeding and Braking: Navigating Acceleration conference organised by SARU at Goldsmiths College, University of London, on 14/05/2016.


Voice is a partial register not just of who we are but how we are. It reflects, in part, our experience, our trauma, the environment we live in and how we engage – or are required to engage. This is not to suggest that voice is at all analogous to us or where we have been or who we are – we cannot reduce facets of voice to history, class, race or gender – but voice hints at the pressures we face in an oblique manner. If we are unwell our voice may be hoarse. We may stutter when nervous or if we’ve had too much caffeine. We might slur our speech when intoxicated and slow speech is a symptom of depression. Voice alludes partially to both our corporeal state and our neurological state.

When we speak we are asking our minds and bodies to do something virtuosic. Not only do we master dexterous and complex acrobatics and contortions of our tongue, glottis, lips and teeth, but we also time our breath to the sentences we utter. As Pinker notes: “Syntax overrides carbon dioxide: we suppress the delicately tuned feedback loop that controls our breathing rate to regulate oxygen intake, and instead we time our exhalations to the length of the phrase or sentence we intend to utter.” (Pinker, 1994, p.164). Strange that even our respiratory rhythm, that automatic inhale and exhale that sounds softly in our slumber, doesn’t take precedence over language. The only other activity that trumps respiratory needs is swimming underwater – but during this activity we are, sometimes painfully, aware of our respiratory need. It is odd how little we notice speech overtaking the most basic need of our body: breath. Without our thinking about it, our urge to communicate, to gesture, usurps our need for air. And isn’t there something romantic about the body dying for its voice? Recall the straining apoplectic rock star or the self-sacrificing siren of the stage offering generous self-annihilation for voice, for our sadistic ears. Voice is a profound crux of how our bodies are subjected to what Zizek calls “the torture house of language”.

Given that voice is traumatic already, given that we already let it usurp our need for air and requisition parts of our body made for respiration and nourishment for the sake of speech, why do we persist in speaking faster and faster? In Precarious Rhapsody Berardi references a study by Richard Robin:

"Evidence suggests that globalisation has produced faster speech emission rates in areas of the world where the Western mode of transmission of signs has come to replace traditional and authoritarian ones. For instance, in the ex-Soviet Union the speed of transmission measured in syllables per second has almost doubled since the fall of the communist regime: from three to almost six syllables per second. ; similar findings reached the same conclusions in the Middle East and China’ (Robin, 1991: 403)." [1]

The import of this study is clear: there is a relationship between syllabic speed and capitalism. The pressures of competition and production are reflected in the speed of speech. More specifically, the accelerating demands of semiocapitalism, which exploit our communicative and cognitive capabilities, are reflected in our voices… in syllabic speed. The business of talking faster…

I remember being particularly thrilled watching American TV on Channel 4 in the late 90s and early 00s. I’d come home from school or college and, gawping at the quick-fire, gag-a-second, energetic and snappy sitcoms, I’d feel a distinct sense of speed. Everyone seemed to talk so fast. The prim, vaguely New York dialects of the characters seemed like syllabic machine guns. The immediacy of the retorts, the dialogue’s unrelenting pace, sounded out in stark contrast to the familiar monosyllabic plodding of Coronation Street, EastEnders, Porridge or Only Fools and Horses.

Of course, much of this has to do with the way American shows are created. They tend to have more words to fit into a show from the outset – more jokes to fit in, so to speak. British shows tend to be written by fewer people (often one or two writers) whereas it is much more typical for American shows to be written by a small army of hyper-caffeinated hotshot scriptwriters. A good example of this is Gilmore Girls – a show known for its fast-paced dialogue. Ken Honeywell comments that this is, in part, the result of how long the script for each episode is – much longer than the average show script: “while scripts for most TV shows of that length are 40 – 50 pages long, Gilmore girls scripts were known to hit 80 pages.”[2]

Frasier, too, has fast speech as a direct result of increasing communicative content being crammed into a finite timeslot. But for Frasier it was not due to the script being long. Frasier has had accelerated syllabic intensities as a direct result of market forces. Karpf references an article by Aaron Barnhart, “Speeded-up Frasier Gives KSMO extra Ad-time”, which describes how a Kansas TV station once sped up Frasier episodes to fit more adverts into the slot: “American television now routinely speeds up sitcoms and compresses speech in order to fit in more ads. If ‘Frasier’ seems to talk faster on one affiliate station than on another that’s probably because he is talking faster – the station has accelerated the episode.” (Karpf, 2007, p.44)

Gilmore Girls, Frasier, Friends, Will and Grace, The Big Bang Theory, Sex and the City and similar shows are manifestations, in a dramatic microcosm, of the Richard Robin study cited by Berardi. Syllabic speed is intensified, accelerated, as a result of the requirement to fit more and more content within a finite timeframe. Increasing densities of content, more and more communicative semiotics, are forcibly compressed within a finite timeframe as a result of late capitalism. The pressures of providing ever snappier and more exciting sitcom content, or the simple technological acceleration of the video playback in order to fit additional advertisements, result in fast talking.

Of course, in life we do not need to worry about when the credits come up. But we are pressed for time. We need to communicate in ever-faster ways, with more people, about more complex topics – but the day is still only so long. We have, perhaps, felt this pressure whilst giving a presentation or during an interview – we talk faster in the hope of getting more in. But there is another dynamic behind accelerated syllabic delivery. It is, to half-quote Burroughs, connected to how “language is a virus”. When we are speaking to one another our speed and rhythms fall into synchronicity. If the other is speaking quickly we quicken our pace too.

We do this for many reasons, but mostly it is a subconscious, pre-perceptual action. We mirror and mimic the speech rate of the other more than we realise. A large part of speech is neuromotor. Speech is, in a sense, neurologically rooted in gesture, and there is firm evidence for the neural mirroring that occurs as a consequence of hearing or seeing gestures. For example, when we see a smile our brains mirror the neurological protocols for when we smile. This is why you should always smile on dating profiles – not because grinning, alone, into a webcam is a fun or happy task but because the hopeful viewer of our smile will experience the neurological protocols concomitant to their own smile: happiness. We should smile at the camera so that the other, through no conscious contemplation but on a subconscious level, feels the feelings connected to their own smile. We should smile in the hope of emotionally affecting the other. Disgust is a similarly contagious gesture and one of neuroscience’s strongest pieces of evidence for neural mirroring. When we see another vomit we often feel nauseous too. Likewise, and especially because speaking and cognition are intrinsically gesture-based in neurological terms, when we speak to someone we tend to fall into a similar tempo without realising it. If someone lists numbers as we are trying to count, either out loud or without speaking, we lose count. The degree to which the other’s voice and the language it carries gets under our skin and into our heads cannot be overstated. Syllabic acceleration is contagious on the basis of subconscious neural mirroring across temporalities in order to maximize communicative efficiency. We tend to match our speech rates to the other’s.

Similar tempos of speech are more easily cognized. As we listen we look ahead and map out a future rate of delivery. Falling into tempo with the other is as much a part of syllabic gesture mirroring as it is about the temporal and rhythmic attention of our listening manifesting itself through our own vocal delivery in response. In a sense, if the other speaks at the same rate as us then cognizing efficiency is maximized. Easy listening. “We think people who speak in a similar tempo to our own are more competent and more attractive than those who go slower” (Karpf, 2007, p.43). As we speed up, others will do so too – not just as a pre-perceptual affect of neurological mirroring but because we tend to talk more to people we engage deeply with. So, in turn, we expose ourselves to others who have similarly accelerated speech and, of course, mirror, mimic and respond with similar rapidity.

Given that we mirror and mimic the speech we are exposed to, it is important to consider what types of speech we expose ourselves to. A lot of our exposure is technologically afforded. MP3s with high-tempo and frenetic sample-based musics blot out the organic babble of the crowd. Social spaces are flooded with the humanly impossible vocal licks of contemporary electronic popular music. We spend hours watching television programs that contain accelerated syllabic content for laughs or advertisement space. Hyper-spliced, energetically cut and chopped YouTube videos that omit any hesitance, pause or delay from the speaker are a growing online video aesthetic. Today our exposure is as much to post-human, technologically accelerated syllabic intensities as it is to those within the human remit of syllabic delivery frequencies. The majority of our exposure is not to the humdrum organicism of conversation but to the accelerated form of media. For the past few generations a synthetic, technological relationship to voice has been naturalized – the baby monitor, at once transmitting and amplifying the infant’s cry to the mother, is the ubiquitous exemplar of how post-human our cries and hearkening have become. Berardi, referencing Rose Golden, notes this shift: “For the first time in human history, there is a generation that has learnt more words and heard more stories from the televisual machine than from its mother” (Berardi, 2009, p.9 – referencing Rose Golden from 1975).

Of course, the profound influence of the info-blitz is not limited to sound and voice. It extends to visual information processing. A recently uncovered example of this is the difference in automatic eye movements between digital readers (i.e. the post-internet generation) and non-digital readers. In short, digital reading follows an F-shaped eye movement, scanning the first paragraph and taking heed of a subheading but ultimately shifting down quickly and neglecting text on the lower right-hand side of the screen. Google refers to this as “the golden triangle” of attention. “The bottom line is that digital bombardment has changed reading patterns. Increasingly digital readers tend to unconsciously ignore the right side and bottom half of the page and tend to only read content in those areas if they are highly motivated to do so.” (Jukes, 2010, p.28). This is not surprising. Consider, following Berardi’s citation of Golden, the following, in contrast to an individual whose formative years were pre-info-blitz:

“by the time they’re 21, the digital generation will have played more than 10,000 hours of video games, sent and received 250,000 emails and text/instant messages, spent 10,000 hours talking on phones, and watched more than 20,000 hours of television and 500,000 commercials (and most assuredly these estimates are on the extreme low side). Almost none of these experiences our parent or we had while we were growing up” (Jukes, 2010, p.29)

20,000 hours, at least, of exposure to technologically accelerated, artificially energetic and content-dense speech manifests in our increased syllabic deliveries. 500,000 instances of attention being honed on the ultra-quick delivery of terms-and-conditions closing monologues, again, condition us for maximized verbal efficiency and ever-increasing language processing speeds.

But it is not just that we accelerate one another and learn our words from artificially overdriven syllabic media contents. We consume psychotropic stimulants too. The ubiquity of coffee in pre-crash American sitcoms is notable. Friends, Frasier, Gilmore Girls and many other shows tend to revolve around the semi-sacred importance of consuming coffee. Coffee has been romanticized in propaganda by the media networks of late capitalism. The first coffee of the day is important, but so too is meeting for coffee after ‘work’. Similarly, many characters, after a date, late at night, offer more sleep-denying elixir to one another. Drink up! Be quick and productive in your precarious endeavors! Not only is the organism accelerated by the requirements and interactions of hyper-drive communicative semiocapitalism, but it is also psychotropically accelerated, catalyzed on a chemical level – we caffeinate our minds and bodies to peak-jibbering efficiency.

The contemporary coffee shop (notably pitched as a social space but misused by many as a solitary and dislocated workstation) is a catalyzing syllabic intensifier – a feedback furnace that accelerates our speech. The contemporary coffee shop is a capitalist hub; it is an engine of biopolitical acceleration in the service of production, a hive of exacerbating speech and neurological mirroring. Increased syllabic intensities, cutting off our breath and overriding respiration, are symptomatic of the shows we watch, the coffee we drink and the accelerated others we mirror and mimic.

Part 2 here


Berardi, F, 2009. Precarious Rhapsody: Semiocapitalism and the pathologies of the post-alpha generation. Minor Compositions. London.
Jukes, I, 2010. Understanding the Digital Generation: Teaching and Learning in the New Digital Landscape (The 21st Century Fluency Series). Corwin
Karpf, A. 2007. The Human Voice: The Story of a Remarkable Talent. Bloomsbury. London.
Pinker, S. 1994. The Language Instinct. Penguin Books. London.
Zizek, S. 2008. ‘Language, violence and non-violence.’ International Journal of Zizek Studies 2 (3), 307-316.

[1] Robin 1991: 403 cf Berardi 2009: 112
[2] http://www.punchnels.com/2011/08/22/life-is-short-talk-fast-ten-reasons-why-i-love-gilmore-girls/ (last accessed 15/05/2016)
[3] http://www.fuqua.duke.edu/news_events/news-releases/1034232/#.VzMu0WZOGC4 (last accessed 15/05/2016)