Sounding Out Film

Steven Connor

This is an expanded version of a paper given at the conference on Film, Literature and Modernity, mounted by the Institute of English Studies in London, 13-15 January 2000. I think of it as an unincluded enclosure within my Cultural History of Ventriloquism (Oxford: Oxford University Press, 2000).


Film theory has done its best to bring to life a phenomenological film-body through its way of ‘being with one’s own eyes’ (Vivian Sobchack), in which everything is commuted into the narrow channel of visual perception. What part is left for sound to play in the raising of this film-body, other than docile servant of the here-say and the see-hear? I say that certain kinds of sound belong, not to the order of ostension which dominates the image – in which everything is à voir, to be had by and made clear to the eye – but to an order of mutative commixture, characterised by the encounter and transformation of imaginary substances. It is sound, not in the civilian garb of music or voice, but as noise, which broaches the possibility which interests me of a wild phenomenology of film. There is no sense looking for a sonorous body to complete or even to compete with film’s body à voir. Having learned to see things with its own eyes, film wants also to hear its own voice, but remains a deaf or dummy medium. The body of sound is a body all right, but not film’s own.


Film’s Body

Film offers some of the richest and most intense forms of bodily experience of any art form. So powerful and so transforming are these experiences, indeed, that there is a tendency to think of film, not just as an apparatus for simulating or stimulating bodily responses, but as itself constituting a kind of body. Just as Condillac famously imagined a senseless statue progressively being draped with sensation and self-consciousness, so film theorists and historians have sometimes imagined the growth of film’s body through its acquisition of different senses and faculties.


The most developed form of this argument about film’s body is to be found in Vivian Sobchack’s The Address of the Eye, undoubtedly the most sophisticated and particularised account so far available of the phenomenology of film experience. What Sobchack calls ‘film’s body’ has grown historically, through the progressive actualisation of possibilities and overcoming of limitations. Sobchack plies the onto-phylo-genetic analogy to establish her claim that film’s body grows in the same way and through the same stages as the infant’s body, moving in the direction of greater and greater complexity, coordination and self-control. Sound emerges as a late addition to the ever-expanding, but ever more integrated economy of affective and perceptual capacities which is film’s body.


From its initial introduction as external musical and/or verbal accompaniment to the film to its current and sometimes problematic all-surrounding presence in Dolby stereo, cinematic technology has progressed towards the incorporation of external and discrete aural mechanisms into physically synoptic and synaesthetic union with the film’s visual organ, the camera, and its expressive organ, the projector. First, music was provided from a source completely external to the film’s body, as dialogue and narrative occasionally were (spoken from alongside of or behind the screen by human beings who were often unaware of or insensitive to the film’s unfolding contact and intentions). Then technological innovation and synthesis solved the problems of synchronization and amplification, and sound-on-disc emerged so that the film could more directly hear the world and emotionally color it with sound. Sound-on-disc, however, proved clumsy and limited and was only tenuously incorporated as part of the film’s body; there were problems with maintaining synchrony with the film’s more advanced visual fluency expressed by both the camera and the projector. As well, individual expressions of the same film varied in the contingencies of projection, while the sound-on-disc did not, and were therefore often “out of synch” with the rest of the film’s body. Optical sound, or sound-on-film, literally achieved synaesthetic cooperation and bodily union with the film’s primary perceptive and expressive organs. That is, technological synaesthesia between sight and hearing was accomplished through another bodily commutation of perception and expression: sound waves commuted to light waves that are recorded onto the film emulsion alongside the image and can be commuted back to sound waves simultaneously with the film’s projection on the screen.

This is one of only two discussions of sound in the whole of Sobchack’s book, and does not even feature in her index as such. Perhaps the reason for this is that it is not really a discussion of sound at all. It is a discussion of the opticising of sound, the taking in of sound to the eye-body of film, which is not so much all-seeing or omniscopic, as omnioptic, seeing everything in terms of vision. To be sure, Sobchack’s suggestion that film is ‘a subject of vision’ is intended to restore the bodiless Cartesian eye to its conditions of embodiment, replacing the Cartesian proposition ‘I think therefore I am’ with the phenomenological ‘I see, therefore I am embodied’. And, what is more, the embodied condition of the eye means that this formulation could equally well read ‘I hear/smell/taste/feel, etc’. At the heart of Sobchack’s analysis is a view of film as sensory and cognitive prosthesis which, far from alienating or dehumanising the body whose capacities it extends, expresses and fulfils that body’s nature.

The experience of making and watching films makes vision dominant in Sobchack’s account, not only to the exclusion of hearing, touch, taste and smell, but also in vision’s powers to coopt and commute these other senses. The only other reference to sound in Sobchack’s book is a mere parenthetical afterthought. ‘As the camera engages the filmmaker’s intentions toward the world in a primarily visual modality, it reduces his or her tactile intentions to their realization primarily through sight. (Sound can also render the world substantial and realize touch in an alternative [pp] modality.)’ The implications of film’s power not only to see touch, or touch by means of seeing, but also to hear touch or touch by means of hearing are not elaborated. The context merely suggests the parallelism between the synaesthetic, commuting powers of sight and sound. My idea here is to find a way of wondering whether hearing can indeed simply be folded into Sobchack’s excursive, elastically self-supplying eye-body.

Others have deployed the onto-phylogenetic analogy differently. The sound designer Walter Murch, for example, points out that the body of cinema develops back to front. A human body begins to hear half-way through gestation, before any of the other senses have developed to any significant degree, or had anything very interesting to apprehend, and thereafter the foetus grows and develops in ‘a continuous and luxurious bath of sounds: the song of our mother’s voice, the swash of her breathing, the trumpeting of her intestines, the timpani of her heart’. Things are entirely otherwise with cinema, which spent the years of its gestation and infancy ‘wandering in a mirrored hall of voiceless images’. Only late in its development did it develop the capacity to hear and produce sounds.

The confidence in the growing sensorium of film sometimes expresses itself in a concern about the consequences of a disorganised body. Theorists of sound, as well as sound-technicians, have been much concerned with the nature of the ‘film body’, and the difficulties of coordinating sound and vision; more precisely, the difficulties of correlating the correlation between sound and vision of the ‘film body’ with the correlation of sound and vision of its listening viewer. John L. Cass, an RCA sound technician, was worrying in 1930 about the ways in which evolving practices of patching together inputs from multiple microphones, in order to maintain intelligibility at all times, were creating a grotesquely distorted auricular homunculus, as opposed to the realistic preservation of ‘point of audition’: ‘The resultant blend of sound may not be said to represent any given point of audition, but is the sound which would be heard by a man with five or six very long ears, said ears extending in various directions.’ A couple of decades later, Jean-Marie Straub turned Cass’s mistrust into an entire aesthetic, urging and exemplifying the use of direct, unedited, time-of-recording sound, in order to avoid the fundamentally false or labyrinthine body-of-hearing constructed by post-synchronised sound: a director should not, as in post-synchronised films, ‘transform a real space into a confused labyrinth and put the viewer into the confusion, from which he can no longer escape. The viewer becomes a dog who can’t find its young.’ Although Mary Ann Doane seems to disapprove of the fixing or centring of the body effected by film, she sees sound as fundamentally cooperating in this defence against auditory dispersal:

The aural illusion of position constructed by the approximation of sound perspective and by techniques which spatialize the voice and endow it with “presence” guarantees the singularity and stability of a point of audition, thus holding at bay the potential trauma of dispersal, dismemberment, difference.

Phenomenological analysis tells us – or at any rate me – too little when it is driven by the vitalist perfectibilism which Sobchack derives from Merleau-Ponty. One of its unfortunate side-effects in Sobchack’s writing, for example, is an idealisation of the film-body to the extent that other forms and encodings of coordinated movement and sound, such as those furnished by electronic and virtual technologies (in which, perhaps significantly, the master-code into which and from which everything is translated is no longer optical but digital) become entirely unanalysable in phenomenological terms. What Sobchack calls ‘electronic space’ merely disembodies, as opposed to the ‘world of imaginative and potential bodily inhabitation’ provided by cinema. I am interested in the possibilities of a cultural phenomenology which retains the interest in the embodied and in-the-world experience of film while discarding the phenomenological prejudice in favour of putting man and the world back together, or in sync.

The à voir

For Rick Altman, writing in the volume of essays that he masterminded five years or so back, Sound Theory and Sound Practice, sound has no essence, no definitive shape or volume, no archaic resonances, no obvious destiny in becoming part of film’s body. Sound is intrinsically mobile, taken up anew in different ways, according to different kinds of hearing situation, different points of view – or ‘points of audition’. The reason for taking sound seriously in the analysis of cinema is just the fact that there is so little to be said about film sound as such. Sound, according to Altman, is the reminder, through its own contingent, excrescent condition, of cinema’s impurity.

Nevertheless, although Altman’s idiom is very unlike that of Sobchack, there are striking continuities between them too. The most important of these shared concerns is with the question of sonorous space and its coordination with the visible space of the screen. The question of space is dominant also in those, like Kaja Silverman, who take their models from Didier Anzieu’s conception of the primal ‘acoustic envelope’.

This emphasis on the spatiality of sound fits neatly alongside the larger obsession of film theorists with the ways in which the viewer is to be summoned or ‘sutured’ into the film space. Everything falls between the alternatives of diffused or centred subjectivity. It sometimes seems as though cinema issues the single command: pull yourself together. The demand for the screen, and for screened sound, is of a piece with the demand for a certain kind of reproportioned body; the necessity of preventing scattering, dismemberment, hypertrophy, unnaturalness, monstrosity.

The idea of the stabilising function of sound seems like the reversal of the economy of sound and vision set out in an earlier essay of Altman’s of 1980 entitled ‘Cinema as Ventriloquism’, in which sound is always said to pose a question for sight to answer. In Hollywood talkies of the 1930s onwards, as explicated by Altman, it is vision that perplexes, and sound that reassures. Vision keeps saying ‘Where?’, while sound says ‘Here!’

Of course, sound could also suggest other spaces than the space of the screen, and indeed, in the form of the voice-over, other times from that represented on the screen. Sound could signify the ‘off’. Silent cinema could refer to the ‘off’ only through gesture – pointing to the wings or the flies – look, over there!, or by literally cutting between scenes. Sound allowed cinema to refer to other, more indeterminate spaces, even, with Chion’s acousmêtre, to indeterminate spaces within the visible space of the screen. It took cinema a long time to begin to be able to explore the possibilities of these other spaces – to be able fully to incorporate radio. But, of course, all this is still a matter of space; which is still to say, the operative dream of reducing sound to spatial coordinates, to some representable space relative to the space of the screen.

One can imagine how such a conception of sound might actually be assisted by the application of Bertram Lewin’s idea of the ‘dream screen’. Lewin suggested that the screen – conceived in fantasy as the invisible support underlying, but also sustaining all the violent action played out across it – was what held the dream together. We can see the concern with synchronisation and cospatialisation as an enlargement of this function of the screen; assisting in the task of simultaneously multiplying the stimuli to awakening – giving shock and surprise and disgust and overload – while keeping the viewer-dreamer asleep, or knitted into the dream. The envelope of sound becomes equivalent to the containing skin of the screen.

This emphasis on vision, perspective and position may perhaps be seen as no more than an accurate reflection of the evolution of cinema itself. For the fact is, as Altman has himself shown, audiences do not even need auditory realism. It may be true that, where cinema funnels the audience’s attention inwards, sound lets in the street, and lets cinema out on to the street – for instance in the operations of ballyhoo, the broadcasting of the soundtrack through loudspeakers to passers-by. But, in fact, it doesn’t matter where the sound comes from, where the loudspeakers are placed, where you are sitting in the auditorium, all of those elements which Altman and others insist are the signs of the radical heterogeneity of sound. Altman writes as though the material fact of sonorous diffusion were phenomenologically identical with the experience of cinema sound, when the experience seems to have been the opposite: a massive attempt to keep cinema silent, to keep its audience asleep and sunk in its numb, tumultuous dream. The dream screen knows that sound is a toxic shock; it protects itself against sound by swallowing it up into legibility, filtering out everything but visible meaning. So imperative was the need to swallow sound up and make it safe for synchronicity, that cinema for a while almost lost the capacity to see anything but pairs of lips. ‘The people’, Alberto Cavalcanti rather snootily declared in 1929, six years before the making of his own reflection on the ventriloquist powers of cinema in Dead of Night, ‘wanted to see the people speaking in sync.’ Cinema became, for a while, a primitive, two-sided organism, an eye locked on to a mouth. It is not vision, but visible speech which comes to represent the norm. In this sense cinema recapitulates the movement of the gramophone, which quickly narrowed down the spectrum of recorded sound to the individual singing voice.

During the 1930s, everything seemed to come down to the question: how can the screen be made to talk? Everything must be returned to the screen: everything must be given a voice, and everything heard must be given as a voice. Nothing heard can remain without a visible or spatialisable source. Intelligibility is all. For Rudolf Arnheim, this means an anthropomorphisation of cinema’s speech.

In the universal silence of the image, the fragments of a broken vase could “talk” exactly the way a character talked to his neighbor, and a person approaching on a road and visible on the horizon as a mere dot “talked” as someone acting in close-up. This homogeneity, which is completely foreign to the theater but familiar to painting, is destroyed by a talking film: it endows the character with speech, and since only he can have it, all other things are pushed into the background.

Now, only humans speak; but the cinema itself has become human, has become an organ of speech. It is not surprising that the culminating metaphor of cinema for Sobchack is as a kind of visual speech – the ‘address of the eye’.

Everything depends upon the screen, upon what is seen, or shown. Everything becomes a matter, not just of the visible, or the available for sight, but, so to speak, of the à voir, of the to-be-seen. We might imagine that these conditions would be much more in evidence in the silent film, in which only what could be seen could register at all. Thus, voice is translated into gesture, the body taking over from the speaking lips. But, in fact, this necessity to refer everything to the screen seems to have become more rather than less remorseless in sound cinema, in which nothing will be audible that cannot be screened – filtered through the requirements of the visible, which, for the greater part of the cinema’s history, has meant visible speech.

The demand for the eye is the infant (=speechless) demand of cinema. Children demand the eye of their parents or carers; there is something deeply unsatisfactory for them in narrative or description. Come here; come and look; stop what you are doing and look at me. The desire to be looked at, to capture the gaze of another is much more imperious than the desire to be a bodiless, invisible, Cartesian eye, taking possession of a world yielded up for it in Heideggerian Gestell. The allegedly objectifying power of the male gaze in cinema, hungrily and cannibalistically consuming everything on the screen, is repeatedly overcome by the objectifying power of the screen which demands ‘look at this’, capturing and fixing the eye, and, in the process capturing and fixing everything in the eye, making everything a matter of the à voir, of the to-be-seen. The fascinum, which deflects and captures the evil eye, is really an interception of cinema in the net of sight. Sound is quickly recruited to this economy.

Deaf cinema

The central insight of Sobchack’s work concerns the ways in which cinema gives us mediated perception – the embodied perceptions of another – directly, in a way that we take into our own being as we view the film. Watching a film is both a direct and mediated experience of direct experience as mediation. We both perceive a world within the immediate experience of an “other” and without it, as immediate experience mediated by an “other.” Watching a film, we can see the seeing as well as the seen, hear the hearing as well as the heard, and feel the movement as well as the moved. We thus take in to ourselves, appropriating it as our own, another’s way of perceiving: the film’s body. The kind of perception in which we engage in watching a film is signification: it is access to that primary perception-as-signification which is one of Merleau-Ponty’s leading themes. ‘This is how I see it.’ Film grows its body through this capacity for double looking; for looking, via our eyes, over its own shoulder. As we look at a film, we give it the eyes to see itself, and to say itself seeing.

Film, argues Sobchack, is therefore much more than technology, or an aggregate of technical processes. It is the creation of a new perceptual relationship to the world, even, the creation of a new form of life: cinema, she declares, is ‘life expressing life…experience expressing experience’. Sobchack takes from Merleau-Ponty the insistence that perception is never merely passive, never a mere registration of, or response to the world, but always rather a creative making of it. Perception is expressive for Merleau-Ponty, in the way in which the body sings the world. Film is actively mediated immediacy, something which we look with as well as at. Film is therefore not just an object in the field of a perceiving body, it is itself a doubled perceiving body. It is not just a body, in the sense of a mere mass or store, of affective and perceptive possibilities, not just a body in itself. It is also a body for itself, that makes possible a kind of perception-consciousness which has self-relation. As we watch a film, film watches its own body coming into shape. When I see something on the screen, I do not see an object: I redouble an act of looking: I take in to myself and reenact the camera’s look. What is more, I am the film’s prosthesis, the means by which it can show what it sees as a seeing: I am the otherness by which it can complete its circuit of self-seeing, can watch itself seeing, by showing itself as an act of seeing, for me to see. When I hear something on a soundtrack, I may listen with the microphone’s ears, redouble its way of listening; but I do not in the same way speak with its speech, in unison.

This is to say that I do not altogether accept Michel Chion’s characterisation of the process of auditory identification with what he calls ‘the I-voice’, when he claims that, in a film, ‘when the voice is heard in sound closeup without reverb, it is likely to be at once the voice the spectator internalizes as his or her own and the voice that takes total possession of the diegetic space. It is both completely internal and invading the entire universe’. Chion is here suggesting that there could be a kind of vocal point of view which is equivalent to the visual point of view offered to us by the camera. Others have assumed this process of identification with voice-over narrative. I think that the reason why the audio-viewer does not identify in this way lies in the question of synchronisation. There is always something belated about auditory identification, partly because the nature of sound is to occupy a passage rather than an instant of time, a duration rather than a moment. In order to hear a sound, one must have already heard it start to decay, or come to an end; one must already have started finishing hearing it. One hears very largely analeptically, in memory, even with the most shockingly immediate of sound effects, which appear to bore a hole in auditory attention which is then only slowly filled up with definition. Sound is always, as Rick Altman has put it, ‘sound hermeneutic’. A question is always broached by a sound, a question that cannot be laid to rest until the sound is embodied, prescribed an origin, returned to the source from which it can then be seen to have come. Sound must be sighted. Sound comes first; in order that it can be shown to have come second, to have emanated from a source. This is because sound is temporal, a matter of events rather than states. In the subordination of sound to sight, do we not see the priority of spatiality to temporality; the repeated demonstration that there are states which produce or result in events? We always seem to hear the sound of an event after the event itself. Sound is the actualisation of a potential, but has no potency, no puissance, or pressure of its own.

By contrast, visual perception is proleptic, since, as Merleau-Ponty and others have shown, it is closely bound up with our active throwing forward or reaching out of ourselves and our projects into the world. I do not merely see the tennis ball coming over the net: as I see it, I am already disposing my limbs in a kind of anticipatory visualisation to make contact with it, or, in my case almost, the seeing being always a kind of preparatory motion – vision always containing an element of visiting, or ‘going to see’ as Michel Serres puts it. We use our eyes not merely to fix ourselves in position, to put ourselves in the place of the camera, but to move ourselves into the place that we wish or expect it and ourselves imminently to be. The imminence of looking is what enables one to internalise the camera’s look as one’s own, appearing to drive and direct its gaze with our curiosity and appetite. The cinema satisfies our longing for identification because it seems obedient to our wishes. This is what makes the physical fact of the projectedness of cinema so potent a metaphor of its workings. If we do indeed internalise voices and even sounds, by contrast, it is as a result of our compliance to them rather than theirs to us. If we identify with the ‘I-voice’ of the unseen or acousmêtric narrator, it is because this voice has exceptionally gathered to it the powers to direct or produce or project what we are about to see in advance.

It may be objected that music and song are also special cases here. Does not this sometimes imply or enable a kind of identification or introjection, if only because music, unlike speech, is predictable? Those who come to know a film’s soundtrack like a melody are indeed able to lipsync it, as in those cinemas which show Casablanca to its adoring audiences: when the sound fails, the audience are able to supply the missing dialogue. I think these are important exceptions, and they point us to the link between obliquity and memory. We can only anticipate sound as a result of the reactivation of memory; memory is much less requisite for the operations of visual projection. This perhaps points to a difference between music and raw sound: music allows the film to hear itself, allows hearing itself to rise into self-consciousness. But music does not come from cinema’s mouth; from the mouth of the screen, from the speaking lips. Music, like sound, is emitted, rather than spoken. It comes from somewhere else in film’s body: comes from the underneath or backside of that body. There is nothing to correspond in the visual image to this emission.

If film’s eye is the means by which it acquires a reflexive, self-seeing body, and is the point in the film apparatus at which the body of the film and the body of the viewer come together – in particular in the proffering and acceptance of a ‘point of view’ – there is no equivalent prosthetic organ for film sound. Clearly in one sense, the microphone is the equivalent to the camera, in being film’s ear where the camera is its eye. But we do not naturally or easily orientate ourselves with respect to the microphone as we do with respect to the camera. There seem to be a number of reasons for this. The first is that the microphone is studiously kept invisible – indeed a whole art and technology has grown up in order to keep the devices for registering sound themselves unregistered. The camera cannot be looked through without announcing itself as being looked through, and so is always itself visible in what it makes available to see. It is not just that microphones happen to be invisible; cameras usually are, too. It is also that, as Chion puts it, ‘the mike must remain excluded not only from the visual and auditory field (microphone noises, etc.), but also from the spectator’s very mental representation.’ This last point implies that we do not find it natural to identify what has been called ‘point of audition’ in the same way as we naturally identify point of view, partly because of the magnetising pull of the visual, which will tend to pull everything into its own visual configurations. Point of audition mixing is hard to maintain strictly for this reason.

The unrepresentability of point of audition, the difficulty of forming the same kind of bond between the microphone and the ear of the viewer as is formed between the camera and the eye of the viewer also seems to derive from another important distinction between seeing and hearing. You can cut visual material but cannot, apart from in specific and exceptional circumstances, such as lap-dissolves, for instance, sustainedly or tolerably mix this material, as you can mix sound. In split-screen films, or in sustained passages of superimposition, the eye is forced to flick uncomfortably backwards and forwards between the two images thus combined, as in certain optical illusions. The reason for this is that we experience and think of seeing as an action necessarily performed from one particular perspective. Although we have two eyes, so accustomed are we to synthesising the information they collect that there is no difficulty in identifying ourselves with the monocular mode of cinema. Experientially, none of us can see with two eyes. Perhaps this very built-in limitation in looking, the very fact that you cannot see something from two different points of view simultaneously – least of all, perhaps, in a Cubist painting – is what makes looking apt for reflexivity, for the cognitive doubling of the eye of flesh by the eye of the mind, which can see the outward eye looking.

We do not experience hearing as material coming to a point, whether in us or anywhere else. We have two ears, as well as many other channels of hearing: the bones of the skull, the fingertips, the soles of the feet. Sound is mapped across the whole body, rather than converging cone-like on one organ of entry. When we hear a sound that we refer to a place high above our heads, or off to one side of us, then the effect is of a sound striking us on the appropriate portion of the body, rather than necessarily being conducted towards the ears. We feel sight come in at the eyes, which is one of the things that helps make visual response feel so actively forensic. The projector is an eye-like device for producing rather than collecting visual material. We feel hearing coming in almost at every portion of our bodies, which then become a variably vibrating membrane rather than a projector. This, along with the fact that it is so easy to mix together sounds gathered from different points of audition at different times, or to transform them in various ways, militates against the identification with and internalisation of the microphone as a single hearing point. Seeing is monoptic in the way it brings things to a point: hearing is panaural. The hearing body, if body it is, is thus in a very literal sense, a body without a single identifying organ. We will see a little later that not having an organ seems to deprive the hearing-body of film of reflexivity, of being able to intend itself as the means of intending the world, of a means of knowing its own way of knowing.

It was often said during the 1920s and 1930s that the addition of sound to the cinema meant that the cinema had learned to talk. This makes for some striking parallels between the new technology and the long history of concern about how to give speech to the deaf, a history that has recently been charted in Jonathan Rée’s I See A Voice. The majority of deaf and dumb people are not, of course, mute for any organic reason, but only because they have never heard themselves. Rée shows the recurrent split between those who have believed in the power of visible speech, in the form of signing and gesture, and those who have attempted to train the deaf to speak without the normal feedback mechanism provided by self-hearing. It has seemed to many of the latter that it could only be by being given access to a spoken language that the deaf could be given a fully living, fully human body. The recurrent concern is that, unless one can hear the words one speaks, the result of teaching the deaf to speak is just the creation of a talking machine, a parrot or ventriloquist’s dummy.

One may perhaps see the so-called silent cinema in these terms: not as a form of representation that cannot make itself heard, but as a deaf cinema, that cannot sufficiently speak up for itself, because it cannot hear itself through our ears as it sees itself through our eyes. Just as so-called ‘deaf and dumb’ people typically engage in many and varied kinds of vocalisations, so the silent cinema is actually extravagantly full of utterance and sonorous occasion – cries, concussions, explosions, general sound and fury. It was, of course, necessary to signify this sound in all kinds of different ways – by movement, posture, gesture, expression and intertitle. As Robert Bresson remarked, it took the coming of sound to make silence possible in the cinema. Even the most deadpan and inexpressive performers, like Buster Keaton or Harold Lloyd, derive their innocence or serenity from their apparent deafness to the clamour around them. They are cut off, belonging with us in the audience, because they can’t hear either.

Perhaps we should think of silent cinema in terms of the transmigration of faculties sometimes reported in nineteenth-century mesmerism and hysteria. In silent cinema, sound passed across into the visible signs of bodily posture, movement and appearance. It does indeed seem to have been the case, as Alan Williams has argued, that the coming of sound resulted in a much more docile, less melodramatic body. ‘The new, post-melodramatic males and females would have seemed strange indeed and difficult to comprehend before the talkies, their behaviors grounded in a denial of what made the “silent” film tick: bodily expressiveness. The liberation of speech brings with it the repression of the body.’ Contemporary commentators also noticed the stilling of the camera. Speech had been got back from the body, into which, under the regime of mutism and aphonia before the talkies, it had hysterically passed, into the mouth; had been got back from the body of the world, for the human body. Indeed, the mouth itself seems to have thinned, to have got ever more tight-lipped and immaterial.

Randy Thom has spoken of the need to complete the evolution of film’s body, to give sound, not just as an environmental accessory, but as a dynamic; to show sound as having effects, to show characters listening as intently as they look.

A dramatic film which really works is, in some senses, almost alive, a complex web of elements which are interconnected, almost like living tissues, and which despite their complexity work together to present a more or less coherent set of behaviors. It doesn’t make any sense to set up a process in which the role of one craft, sound, is simply to react, or follow, to be pre-empted from giving feedback to the system it is part of.

The ambition here might be to replicate the circuit of self-recognition that cinema achieves in its doubling of its own look, in a self-audition. But this may be a mistaken ambition. For, as sound has evolved, it has been the dimension of noise, rather than dialogue or music, that has most developed, a noise that is always an unencompassable part of the voice. Cinema wants its own voice, and wants to hear itself speaking; but it remains a talking head, that cannot hear itself. Cinema sees for itself; sees things with its own eyes. Words – and sounds – by contrast are always being put into its mouth. Perhaps in some fundamental sense, even the talking cinema remains deaf. We see what the cinema cannot hear of itself. Much of the dialogue and speaking sound-effects for films is in fact supplied for the image, rather than derived from it, in ADR, Foleying and other processes, in a process of lip-reading. Sound speaks, and the sound-editor reads its lips.

This is one of the most important reasons why it does not make complete sense to speak of a sonorous body of film to correspond to, or even to compete with its visual body. Film can never coincide with itself as sound. The body of sound is a body all right, but it is not film’s own body.

Cutting and Mixing

What is the right verb for the process of sound production in the cinema? The image is projected. The sound is – what? Emitted? Transmitted? Broadcast? The most important feature of cinema, as opposed to video or TV, is that its image is vectorial. One can see, not only what the image shows, but also its itinerary. One sees the image on its way. It moves through a determinate space, without which it cannot be seen, for it needs the opened out, projected space of the cinema (the word which originally signified the technical apparatus came to signify the place where it functions). It needs the screen to be at a requisite focal length, it needs the relative dispositions of screen, audience and projector. It is much more uncertain, by contrast, whence and whither sound travels. Light travels geometrically in straight lines, it goes from place to place. Sound saturates space, like a vapour.

Sight sees double. In cinematic seeing, there is always something seen against something else. Seeing is layered, and folded: one thing superimposed against, or cut out of something else: a figure against a ground, a framed action or event. There is universal agreement as to the minimal unit of the visual language of film: it is the shot, that which exists between one cut and another. Shots have visible edges. We see the end of the shot and the beginning of another as an edge, just as we keep in view the edge of the frame even as we concentrate our attention on what is going on in the frame. The universal language of cutting replicates the blinking and flickering, the continuous intermittence (and concomitant power of looking away) that constitutes sight, and is the basis of the kinetic illusion of cinema. Everywhere there are visible ridges, fringes, folds, edges, lines, guillotines, splits, discontinuities, overlayerings. I think it is this that Bela Balazs meant when he said that ‘sounds throw no shadows’, and therefore do not get in each other’s light, and need not be seen side by side.

The fact that cinematic vision is a matter of edges and intermittences is related to its reflexivity, its power to see itself for itself, to hold itself in view. In cinema there is an ostension: the presentation of something like a ‘sight-act’ which corresponds to the speech-act. When we acknowledge the illocutionary force of what somebody has said to us, we say ‘I see what you’re saying’ – meaning, I understand that, in saying what you have just said, you were meaning to perform this or that action, of warning, promising, congratulation, reproof, permission, pardoning, etc. Similarly, we acknowledge the act of seeing constituted at any moment in a film, and perhaps also the total way of seeing constituted by an entire film, or filmic oeuvre, by saying: yes, I see what you’re seeing; I see how you’ve seen that, and I see how you are presenting to my view what appeared to yours. As we watch a film, our attention blinks or flickers between seeing the object and seeing the camera’s look at the object. It is therefore no accident that the distinctiveness and power of silent film should have been understood above all in terms of collage and montage, or the laying of discrete elements adjacent to or on top of each other.

So film undoubtedly has a language, perhaps because it always already was one. Film was always capable of showing what it was by the same means as it employed to be what it was. There is a language – of forms, processes, events, actions and concepts – which is adequate to the visual experience of film: not identical with it, but adequate to it. It is much harder to come up with a language which is adequate in the same way to film sound. Speaking of sound usually means indulging in one or another displacement or obliquity. Most commonly, one will substitute the discussion of sound sources for discussion of the sound itself. Less commonly, one may resort to a technical language to allow one to isolate, analyse and manipulate the physical aspects of the sound. Whenever one attempts to speak phenomenologically of the experience of sound itself, one is drawn into its primary register of mutative substance – of weights, shapes, textures – and the primarily, though not exclusively tactile sensations which render them. This tactile language is, of course, catachretical (the sound does not really consist of substances but of vibrations), but it is not a displacement, like the other two ways of speaking about sound just mentioned, for it does not deflect us from its object. To speak of sound in terms of mutative substance is not a category error, or impropriety, which opens up a gap in the discourse between the object designated and the act of designation. In this way of speaking about sound, the problem is not that discourse is too exterior to that with which it has to deal, but that it is sucked in or consumed by sound: discourse reverts to or folds over on itself, to form its own sonorous substance, compelling it, not just to name the sound in question, but also mimetically to mingle with it. A discourse about sound, about its non-transcendable corporeality, is thus itself denied transcendence. 
If discourse comes into being by the transmutation of the eating function of the mouth to the speaking function (a movement from the full mouth to the empty mouth, as Nicolas Abraham and Maria Torok have suggested), then, in speaking about sound, speaking can become again what it once was, a kind of eating, it can be eaten up by eating.

The language of vision is adequate to it because it replicates (refolds, isomorphically transposes) vision’s power to see and grasp itself, to abstract itself, to reflect and reflect on itself. You can analyse the visual components of film in language because the gap between the language and the film is isomorphic with the gap within the visual experience of film itself. The language of sound is not adequate to sound because it is contaminated by it; it does not represent sound, but merges with it. This is not isomorphism, which preserves two equivalent structures in their equivalence, but metamorphosis; language and its object entering into each other in mutually mutative mimicry.

So the space of visuality in film is always at least double. It is a space in which heterogeneous objects can move around. There is always a space within which what visibly happens takes place. There is always a mise-en-scène, a setting, a set-up, a putting into the scene. But sound, though always recruited and commuted to this scenography, can also participate in another order, an order of archaic topography evoked by Didier Anzieu, in which there is no distinction between figure and ground, object and setting:

Before it becomes a setting that contains objects, space is not differentiated from the objects that occupy it. Even the expression ‘from the objects that occupy it’ has no meaning. This lack of differentiation between an object and the place occupied by it in space is the cause of one of the most archaic anxieties that the mind has to face – the anxiety of seeing an object that moves tear out the part of space in which it was located, take it with it, and encounter other objects into which it crashes, destroying their place.

I connect this with the aspiration which Lyotard has identified in contemporary music and sound art towards a kind of primary inscription, an inscription which bursts or ignites its frame:

I’d like to falsify the value of the prefix ‘e’ to hear in écriture something like a ‘scratching’ – the old meaning of the root scri – outside of, outside any support, any apparatus of resonance and reiteration, any concept and pre-inscribed form. But first of all outside any support. The matter I’m talking about, the nuance (colour, timbre) would have to be imagined – but this is already much too heavy – as though it were at one and the same time the event and what it happens to. There would not first be a surface (the whole tradition, heritage, memory) and then this stroke coming to mark it. This mark, if this is the case, will only be remark. And I know that this is how things always are, for the mind which ties times to each other and to itself, making itself the support of every inscription. No, it would rather be the flame, the enigma of flame itself. It indicates its support in destroying it. It belies its form. It escapes its resemblance with itself.

Sound, as Michel Chion has emphasised, has no frame. There is nothing to hold it in; so there is nothing to hold it in. Vision has the screen, and on top of the screen, the image, which can be full or empty of images, but is always there as the space within which images play. There is always an edge and an outside to the frame, even, perhaps especially when action seems to spill beyond it. There is also therefore no ‘framing’ of sound in the metaphorical sense either: no way to put the hearing of sound into a frame and overhear it. Hearing is always and only exposure of the ear. There is no ear of the mind as there is an eye of the mind, just as, and perhaps just because, there are no earlids to correspond to eyelids.

Let’s say there are two kinds of folding involved in the orders of cinematic sight and sound. Cinematic sight folds things together as one folds a sheet of paper, bringing edges and planes together so that they touch at specific points and faces. Cinematic sound folds things together in the way that one shapes or folds what is already a three-dimensional substance: as one folds an egg into a meringue mixture, for example.

A Philosophy of Mixed Bodies

It can be agreed that sound in cinema is bodily. But it is bodiliness of a certain kind. It is diffuse and intermittent bodiliness. It is intense, but evanescent. It has no place to reside or come to rest. The image on the screen installs you at the point of view, every seat being the monarch’s seat, the best seat in the house. The sound of the screen is not primarily ‘on’ the screen, but in the listener. The very concern about the spatiality of film sound, which is shared by early sound recordists, technicians and cinema designers and contemporary film theorists, may be a concern not so much to establish the right place for sound and hearing, but to make them signify in a spatial register at all. For sound works – or can work – by a special kind of proximity, a proximity that ducks underneath proxemics, perspective, position. Sound depends, for example, on the apprehension of the behind and the underneath.

Sound must be understood as primarily experienced, or at least at times experienceable, not in the modality of ostension, or exhibition, but in the modality of what might be called the mutative commixture of substances. Sound is substantial, plastic, voluminous. In sight, things move across, or come up against each other. The visual channel of film is, to borrow a Joycean word, a ‘collideorscape’. Everywhere in the order of cinematic sight (which is not identical with the order of cinema in total, or as such) there is this combination and correlation, distinction and differentiation. What sight does not permit is commixture. When sounds come together, by contrast, they change and are changed; they enter into each other. Edges dissolve. When one hears a sound, one never hears just one thing. One hears the sound of contact, of echo, reverberation of one thing against another; any sound is always at least two. One never even hears a particular commingled sound – a footstep, a punch, a shot, a shout – alone and uncommingled: one hears the sounding of the sound in a particular environment, the sound as it has been touched and retouched in the acoustic context through which it has spread, and which has imparted its own particular qualities to the sound, bleaching out certain frequencies, dampening or sharpening the attack or decay of the sound, desiccating or liquefying its timbre, emaciating it or doubling it in reverberation. There is no pure silence to be heard in the cinema, just as there is no blank screen to be seen which does not strike us as blankness projected on to the supporting screen. Sound recordists always take care never to leave a location without recording several samples of its characteristic silence-imprint – of its room-tone, which they can stir into the dialogue during ADR. In sound, things merge and interpenetrate. Sound is the mixing and reciprocal mutation of bodies and substances. Sound is the realm of metamorphosis.
This is why sound is so much closer to touch and even to smell than to sight (Walter Murch has spoken of the importance to him of ‘the smell of the sound’). Sound is not only sometimes quadrophonic, it is also quadruped. Sight is upright; in Freud’s terms, the magisterial quality of seeing in homo erectus is defined mostly by the fact that it signifies the triumphant departure of the nostrils from the genitals.

One of the ways in which cinema strives to keep its visual and its sound tracks together is in the work of post-production. However, there is a striking difference between optical and sonorous post-production. The processes of visual editing, though they involve a great deal more than mere sequencing of material – mixing, filtering, cross-cutting – nevertheless never escape from the demands of the à voir. This means that something of the image always survives the process of editing. Editing is an optimisation of sight, a sharpening of focus, never a fundamental transformation of the seen. The processes of sound post-production involve much more fundamental transformations, transcodings, manglings and inventions. Recorded sound has the quality of being much more manipulable than the image. An image can be repeated, or sustained, filtered, faded down, but it cannot be subjected to anything like the range of variations in intensity or condition that recorded sound can. A single sound can, if necessary, be stretched out to occupy the entire soundtrack of a film. It can be contracted, sweetened, dirtied up, damaged, accelerated, slowed. Of course most sound production especially for Hollywood movies has been geared towards the production of dialogue and ‘sound effects’ that fit and cooperate, even if they do not belong to the images that are seen; that fill out what is seen on the screen, rather than emptying it out. But there is always a tendency for the order of mutative commixture to pull against this vocation. It is only in animation, where the photographic bond between image and source is loosened, that image attains the same elasticity and transformability.

This manipulability of sound means that it can much more readily be coded into other things; into touch, smell, texture, weight. Sound is pulverised (it is surprising how much spontaneous and casual violence there is in the language of sound editors and postproduction people when they talk of what they do to sounds). In mixing and editing, vision remains vision: no matter what is done to it, it never turns into anything else but vision. Sound, by contrast, is subjected to, or delivered to fundamental transformations of its nature. It is perhaps not surprising that most of the effects I have been describing in the order of sound-substance or mutative commixture are to the fore in thrillers, or action films, involving violence. For it may be that a certain violence, a violence done to the notion of a continuous and organised and individuated body is inseparable from sound.

Film is limited to rendering the sensory fullness of the world in terms only of sound and vision, both of which must therefore be, to some degree, synaesthesic. But the degrees are markedly different. Sight has a close synaesthesic correspondence with touch – Merleau-Ponty has spoken of the association between seeing and grasping and the eye’s action of palpation. But the majority of the remaining sensory apprehensions of the world – of wetness, texture, weight, heat and odour – are channelled in film through sound rather than vision. Hearing, which is anyway more intrinsically mixed than the action of seeing, seems to be more inclined to enter into synaesthesic exchanges than seeing.

Hearing is also associated with intensities of different kinds; in the oscillations of volume, frequency, duration, attack, reverberation, etc. The realities rendered in the image are extensive rather than intensive; they are quantitative rather than qualitative. It is hard to say what the precise visual equivalents are for increased and decreased volume, timbre, attack, and so on. Film only has one visual mode into which to transcode all of this variation of intensity and amplitude: the rhythm of camera movement and cutting. The effect of increased visual alertness brought about by variation of sound intensities may in fact derive from and depend on the dynamism of this deficit, the fact that there are no precise visual equivalents. (It is interesting to note how subdued the work of the camera became for a while, once sound was there to do the work of marking variations in different kinds of intensity; previously, one might say, the variability of movement, as well as of projection rate, did this sonorous work in visual terms.)

Rick Altman points to all the ways in which sound seems to deport cinema from itself, contaminating it with all its different contexts of understanding and perception. In fact, the early opponents of sound worried that it would actually narrow the range of possibilities for cinema; restricting it by language, making it subservient to the demands of realism. Eisenstein worried that sound would make the fluidity of montage impossible, since sound made each scene ‘sticky’ and self-identical. Sound certainly tied cinema to what one might call ‘duraction’, or actions enacted through time and having their own specific relative durations. Sound forced a modernist cinema of ‘spatial form’ which was capable of moving across a spatialised landscape, employing repetition, looping, and variations of speed, to be more narrowly obedient to temporal passage, to the linear demands of consequence and plot – ‘if this…then that’. For a time, cinema did indeed become less fluid, more stylised, more centred.

But, in a curious way, although sound always adds time or temporality to the image, subjecting the visual to the idea of elapse, the plastic manipulability of sound also expresses the possibility of time congealed into spatial form. Speed, for example, is transposed in sound into effects of intensity or definition. A soundtrack in which lots of things are happening, or in which a percussive beat is accelerating, achieves its effects largely through the intensification of alertness achieved through heightened definition and contrast. What matters is not so much the variation of speed, as the intensification of the sense of the sound as a substance, undergoing violent agitation. One could say that some of the fluidity in terms of visual construction, the sense that the film might be shot and projected at varying speeds, to correspond to the exigencies of mood and feeling, passed across into the soundtrack, which, though the imperious demands of synchronisation, which come down in the end to the ventriloquial demand to glue the voice to the lips, do not allow it to pull away very far from the demands of synchronised here-say and see-hear, allow for the production of different kinds of intensity and definition.

However, the association of sound with synaesthesic transformation and with oscillating intensity is also one of the reasons that the sound of a film can never really achieve or maintain autonomy as sound. The tendency of sound is always to cling around or lock on to visual or visualisable events, in the process transforming those events, in that process of synchresis, the ‘spontaneous and irresistible weld produced between a particular auditory phenomenon and visual phenomenon when they occur at the same time’ identified by Michel Chion. The effect of this is to break up the continuity of the sound-track. I think that Chion is right when he says that there is no such thing as a sound track, nor any possibility of a real or sustained ‘counterpoint between image and sound’.

Despite all the ways in which the language of sound engineers and sound-aestheticians mimics visual processes of cutting, layering, combining, coordinating and correlating, and despite all of the attempts to mimic sound-mixtures in visual terms – through cross-fading, lap-dissolves, double-exposure, use of colour to ‘bleed’ scenes into one another – the order of correlation is fundamentally separate from the order of commixture. But the mimicry or relaying of effects between cinematic sight and sound raises an interesting, and far from merely theoretical question. If sight and sound inhabit and instate different orders, which are nevertheless intimately related to one another, what is the nature of the relation between them? Is the relation between sight, as the order of correlation, and sound as the order of mutative commixture, itself one of correlation or one of mutative commixture? Do the orders of sight and sound come together in terms of sight or of sound?

The visual order of cinema blinks; its aural order stutters, in the generalised sense offered by Deleuze. I offer, as an image of this stuttering, the scene from David Fincher’s Seven in which a young couple entertaining a police detective in their new apartment experience the shuddering vibrations of a subway train. The sound of the train is one of those unplaceable sounds, which is not easily accounted for in terms of Chion’s model of the located, the acousmatic and the acousmêtric. The train is not present on the scene; but its sound is. Its sound becomes visible as vibration, as the shudder imparted to the frame itself. In passing across into the order of the visible it ceases being of the order of sound: it is an image of the power of sound to take substance to its limit, a power which is thematically related to the concerns of the film Seven itself with the decomposition of matter, especially the matter of the human body, as it is to many films in which sound features like this: Apocalypse Now, Terminator 1 and 2, Star Wars, The Mask. Sometimes, as in a work like the television version of Beckett’s Not I, which gives us only the image of a frantically working female mouth in close-up, this violence is made to invade the order of the visible, so that it is not merely held on one side of the discourse/orality divide, but floods across into visuality. In Not I, what is spoken of is language and speaking as the agonised work of substance; the speaking of it summons it up, but does not frame or control it. One begins truly to see, and to see with, the blindness of sound.

Coming to Life

If we take seriously the idea of a cinema-philosophy, the work of thinking undertaken in cinema itself which Deleuze proposes as the double and warrant of his cinematised philosophy, then the thinking undertaken in sound might be a special form of this thinking: a thinking through things, rather than the allegorical use of the materiality of what is seen and heard to figure meaning. Putting it in the Artaudian terms that dominate in The Logic of Sense (and Deleuze’s writings with Félix Guattari), there is, in cinema, as in early life, a fluid body of noise that is raised up into the body of voice. ‘What is stolen from the schizophrenic is not the voice; what is stolen by the voice from on high is, rather, the entire sonorous, prevocal system that he was able to make into his “spiritual automaton.”’

It is hard to represent this concrete thinking as anything but primitive, archaic, or infantile, since we privilege clarity and articulacy and a cinema that knows what it is doing. Oddly, the use of sound in this way, as a primary, but evanescent materiality, may allow cinema to preserve some of the polymorphous signifying capacities of the silent cinema.

Sobchack’s subtle, illuminating analysis of the forms of living expressiveness that cinema uniquely makes available offers a huge benefit to the understanding of the form and its possibilities. Sobchack asks, not so much ‘what is cinema like?’, as ‘what does cinema want?’. Her answer is that cinema wants to come to life. It is the beginning of the fulfilment of every historical art form which has similarly craved the condition of life. It is this which links cinema to literature in modernism; cinema is taken into literature as the form of its most distinctive outrunning. Modern literature becomes itself in sprinting to keep up with the cinema that is leaving it so far behind, in its fluency, lyricism, sensory inclusiveness and immediacy. Modern literature dispenses with itself in the way in which cinema dispenses with it, outruns itself in the ways in which it is outrun by cinema.

Cinema has thematised this ambition in the many stories it tells of automata endowed with life: Frankenstein’s monsters, robots, cyborgs, computers, puppets, ventriloquist’s dummies, animated cartoons. The stories cinema tells of life animated are the story of its own power to bring to life. Sobchack’s analysis suggests that the very form of cinema is the embodiment of this will-to-life, film’s will to form its own body. In this, her analysis draws close to that of Deleuze, who devotes the final chapter of his Cinema 2 to the form and idea of cinema’s automatism.

Sound is often the warrant and enactment of this coming to life. The Iron Giant, who has come crashing to the ground, is revived by the young boy who throws a stone into its cavernous, clanking insides, in a sonorous parallel to the laying of the strip of paper beneath the Golem’s tongue. The cinema comes to life by coming to sound: and plays anxiously with its powers to simulate life by artificially synchronising sound in films like Dead of Night. Where noise becomes voice in film, here, just for a moment, voice has become inhabited by noise.

Just as cinema brought to life a different kind of literary writing, an aspiration in writing to be the more-than-itself that cinema represented, so perhaps what cinema continues to want to bring to life is an hors-corps, the body of a body-beyond-cinema. It is perhaps not surprising that cinema should attempt to make out this body-beyond itself through its obsession with figuring ‘life but not as we know it’ and with the making of other kinds of body: robots, monsters, artificial organisms. Sound is often closely implicated in this process. Walter Murch describes how, in order to produce the unearthly sound of a robot in the film THX 1138:

We had one Nagra tape recorder with the clean voices on it. We ham-radioed them into the universe, received them back again as if they were coming from another country, fiddling with the tuning so that we would get that wonderful “sideband” quality to the voices.

Sound, which belongs to and yet never sufficiently belongs to the body of cinema, just as the voice belongs to and yet never sufficiently belongs to the individual body, may be one of the avenues to this body-beyond-cinema. Soundstuff is not simply sonic gravy, not just a kind of all-purpose indeterminate plasm that can be applied to the film at will. It is form informed, a kind of thinking through things. Could it be that we may be discovering the archaic and counterfactual delights of the matter of sound in an auditised environment characterised by decompositions, becomings, compoundings as much as by articulations? It is easy to represent this as simply an infantile or archaic plunge back into a kind of formlessness. But it may turn out to mark the beginnings of a new reorganisation, or dispensation of the organs, which responds as much to the complex phase-spaces and topologies of modern physics and mathematical modellings as it looks back to the infantile dominion of merely blind or brutish touch, or Anzieu’s archaically sticky spaces. So many of the complex forms with which engineers, astrophysicists, mathematicians, systems managers, physicists and even architects work produce challenges to the simple kinds of articulation apprehensible by the eye. Walter Murch tries to imagine the ways in which sound might belong to the new media-organisms of the future:

You tune in channel 432, one wall of your room disappears, and what you see is a live transmission of a volcano in South America, or the earth as seen from the moon. What is the sound for that? If you really could tune in the earth from the moon, live, our whole relationship to the earth would change. When there’s that moon channel 432, and you see the earth right before you, instead of the fourth wall, what’s the sound for that?

The new sound-films perhaps look beyond their condition of film: look forward to or begin to sound out the more complex and turbulently convoluted worlds of matter and information, meaning and noise in the emerging sensory orders that may be coming to life. The first generation of cinema brought the human body to life in film. Whatever will succeed cinema will be trying to bring to life the body of the world. And what’s the sound for that?