The Rhetoric of Description: Embodiment, Power, and Playfulness
in Representations of the Visual

Margaret Price and Erin Kathleen Bahl

Dimensions of Description

Read by Margaret. Audio length: 0:54.

This page offers a detailed account of our six-part framework for audio description (AD). AD studies (which could also be called description studies) is analogous to "captioning studies/caption studies" (Snell, 2012; Zdenek, 2015) in that it deepens and clarifies understanding of a specific access practice. Our framework is intended to serve as a heuristic for thinking through what AD is and what implications it has for the making of meaning. As noted in our Introduction, this framework is not intended as an exhaustive list of aspects of AD. Rather, we hope others will apply the framework in various contexts, and add to it, as work in critical access studies continues to evolve.

Read by Margaret. Audio length: 7:00.

In a 2004 Computers and Composition article, Kathleen Blake Yancey argued that digital composition is "made whole by a new kind of coherence" (p. 89)—a coherence built, she went on to explain, through association, repetition, context, and other rhetorical moves such as "weaving" (p. 95). Now, almost two decades later, we generally take that "new" kind of coherence for granted. For instance, it's common to speak of "tagging someone in" to a conversation, "blocking" particular users, or "muting" keywords or topics, as we attempt to manage ever-mobile content. However, the more coherent a platform appears to the user, the less attuned that user may be to the fragmentation and gaps it allows (or that are visited upon it). Yancey noted that fragmentation and gaps are as much a part of the "new" coherence as are acts like weaving or associating, and she rightly reminded us that "what we make of [that]" (p. 89) is an ethical as well as a pedagogical choice.

When composing digitally, then, where and when should descriptions go? For a video, an additional audio track may be layered over the original audio, using gaps in the dialogue or sound effects to add verbal information about the unfolding visual action. However, that can create a certain amount of aural crowding, not to mention an increased cognitive load for the audience (Fryer, 2016). Occasionally, though not often, the video may be stopped to allow time for description, in a practice called "extended description" (3Play Media, 2019). In the case of a still image, different questions of real estate (Zdenek, 2011) arise: Where will the description be located vis-à-vis the still image? Will it be an audio file, a text file, or both? How will users of the image know where to find the accompanying description? Will there be multiple descriptions—for example, a brief alt tag, and then a longer description included elsewhere? How shall those different versions of AD be presented and negotiated? And finally, what to do with interfaces that move beyond conventional notions of "moving" or "still" images? Through the app Be My Eyes, for example, the user chats with a sighted volunteer who describes whatever the phone camera is pointed at, thus creating a complicated mix of stillness, motion, author, and audience.
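These arrangement questions take concrete form in markup for anyone composing directly for the web. The sketch below, written in plain HTML, illustrates two of the options just discussed: a timed description track attached to a video (how players handle such tracks varies), and a still image whose brief alt text is paired with a longer description located elsewhere on the page. It is a minimal sketch only; the file names, IDs, and descriptive wording are hypothetical, and any working project would adapt them to its own platform, audience, and purpose.

    <!-- A video with a timed description track; the WebVTT cues are slotted into gaps in the dialogue. -->
    <video controls>
      <source src="lecture.mp4" type="video/mp4">
      <track kind="descriptions" src="lecture-descriptions.vtt" srclang="en" label="Audio description">
    </video>

    <!-- A still image with brief alt text, linked to a longer description elsewhere on the page. -->
    <img src="garden.jpg"
         alt="A vegetable garden in early spring"
         aria-describedby="garden-long-description">
    <p id="garden-long-description">
      Longer description: raised beds of lettuce and kale line a gravel path;
      a person kneels at the far bed, planting seedlings.
    </p>

Even this small sketch raises the arrangement questions above: the description track's cues must fit the pauses in the soundtrack, and users must somehow learn that the longer description exists below the image at all.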

During the three years we've worked on this webtext, numerous social media platforms have begun to offer dedicated spaces for description. A detailed analysis of each platform is beyond the scope of this webtext, but would provide fascinating material for future studies or teaching exercises. Perhaps the most salient factor, from a compositional point of view, is that in every case, the platform appeared before its description feature did. Thus, users were already describing images before the official "describe your image here" feature was added. Users invented and applied workarounds, and once developers caught on, retrofits—some hasty, some carefully crafted—began to be rolled out (Edwards-Onoro, 2016).

Changes to Facebook between 2006 and 2010 offer an example of this process. The platform became available to any user with an email address in 2006, and immediately thereafter, some users—perhaps especially those who are blind or have blind Facebook friends, though we're only guessing—started describing their posted images. Often, in that early period, the poster would post an image, then add the description in the first comment field. As this user-centered practice evolved, it became common, especially within closed groups, for users to ask one another for assistance describing images. Sometimes, if the image was complex (like a multi-panel comic), the description of an image would be distributed via crowdsourcing through many comments. In 2010, Facebook began experimenting publicly with image recognition to auto-tag images; in 2011, the company announced formation of an "accessibility team"; and in 2016, images began to be tagged with automatically generated alt text (Metz, 2015) (discussed further in our section on "Tools").

Most social media platforms now include a description function of some kind. However, the cues offered about arrangement (that is, where and when) are not always clear. As of 2021, when you upload an image to your own feed (formerly "wall") on Facebook, you are first asked "What's on your mind, [username]?" via a prompt that appears above the photo. If you leave that field blank, post the photo, then navigate to its individual page (by clicking on it), and click "Edit," you get a different prompt to fill that same space: "Add a description." Thus, this field seems intended to serve a range of potential purposes, and is not explicitly identified as a space for alt text (which can be added on Facebook, but is not offered as a default option). By contrast, Twitter's tool for adding a description of an image is explicitly labeled "+ALT" (on the mobile interface) and appears near a link saying "What is alt text?" (on the desktop interface). And Instagram, as noted in Description and Digital Media Studies, requires the user to select "Advanced Settings" from the bottom of a menu, then select the option labeled "Write Alt Text"—which, once written, does not appear with the Instagram post unless the user has a screen reader or other specialized tool for reading alt text.

We offer these accounts of navigating various social media platforms not to claim that some of the platforms are doing AD better than others, but to note the ways that a composer's choices affect arrangement. So do the audience's choices: For instance, the discussions of navigating Facebook, Twitter, and Instagram in the previous paragraph all reflect visual scanning of the page, rather than use of a screen reader. In summary, arrangement is not only deeply rhetorical, but also governed by dynamics of power. This includes the kind of power that Aimi Hamraie (2017) has called "access-knowledge," that is, the ability to discern and act upon material forces as they manifest through spaces and interfaces.

For an example of "arrangement" in audio description, go to "Story Moment 2."

Read by Margaret. Audio length: 5:49.

As noted in our discussion of embodiment, AD always involves assumptions. For instance, a human figure might be described as "a man" based on phenotypic characteristics, rather than on self-identification. Such assumptions radiate through every aspect of AD. For human figures, assumptions may include gender, race, skin tone, age, body size, and disability, as well as emotional inferences such as "a bleak expression" (see Fryer, 2016, p. 169). For background or objects, assumptions might include phrases such as "a messy room" or "a storm is coming."

Scholars of critical access studies, AD users, and AD practitioners have theorized the issue of assumptions in detail (Braun, 2008; Hutchinson et al., 2020; Kleege, 2016; Walczak & Fryer, 2017). Most guidelines have advised describers to use their best judgment when making assumptions, and have emphasized the importance of the overall meaning of the film, performance, or image being described. The American Council of the Blind's (2010) Audio Description Guidelines and Best Practices, for example, stated:

Describe individuals by using the most significant physical characteristics. Identify ethnicity/race as it is known and vital to the comprehension of content. If it is, then all main characters' skin colors must be described—light-skinned, dark-skinned, olive-skinned. (Citing the race only of non-white individuals establishes "white" as a default and is unacceptable.) (p. 11)

This recommendation emphasized "significant" information. That key point, which Sean Zdenek (2015) has argued (in the context of caption studies) involves "the creative selection and interpretation of sound" (p. 3), serves as a touchstone for AD as well. Information must be not only selected, but interpreted, and thus becomes part of a composition's overall purpose and arc. While most AD guidelines don't delve deeply into semiotics, linguistics, or narrative theory, the professionals composing descriptions, and the users accessing those descriptions, are usually well aware of the implications for power and privilege that their work entails (though that awareness may not play out seamlessly in the descriptions themselves).

Efforts to automate description have failed thus far to account for the power-laden dynamics involved. Indeed, in some cases, developers have tried to downplay the fact that making assumptions through description could be problematic at all. In 2016, for example, Facebook rolled out a feature designed to create automatic alt text for all its images, using recognition technology.1 Because the tool used to generate these descriptions is not as nuanced as a human describer, descriptions sometimes border on the ridiculous. For example, a picture of a pizza (recently accessed by Margaret) is accompanied by alt text reading, "Image may contain: food." In some contexts, that description might be useful; however, in the context of the Domino's Pizza Facebook page, from which it is quoted, it is not especially useful. Furthermore, when using a screen reader, plowing through those abundant yet vague descriptions is time-consuming—an issue analogous to the "accessible entrance around back" problem. Facebook claimed that its recognition feature would "[provide] our visually impaired community the same benefits and enjoyment that everyone else gets from photos," but ignored the fact that there may be real costs to those relying on descriptions for information (Wu et al., 2016).
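To make concrete where such machine-generated text lives, consider the hypothetical markup below (a sketch, not Facebook's actual code): the automatically generated description sits in the image's alt attribute, invisible to sighted users scrolling the page but announced by a screen reader in place of the photo.

    <!-- Hypothetical markup: an auto-generated description attached to a posted photo. -->
    <!-- Sighted users see only the image; a screen reader reads the alt attribute aloud. -->
    <img src="pizza-photo.jpg" alt="Image may contain: food">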

Even more troubling, automatically generated descriptions replicate and exacerbate the racism, sexism, and other inequities that prevail offline. A well-known example occurred in 2015, when Google was called out for using an auto-tag that identified Black people as gorillas. This racist labeling is part of a phenomenon called "algorithmic bias" (Garcia, 2016; Glotfelter, 2019; Noble, 2018). Algorithms, as defined by Christa Teston (2018), are "co-constructed difference machines that rely on population labels in order to perform computational cuts" (p. 290). These computational cuts—or machine-generated decisions, as we might think of them—generally work best with light-skinned men's faces, and worst with dark-skinned women's faces. Furthermore, not only is no algorithm neutral, but some racist and sexist features are built deliberately into recognition software (that is, they display explicit rather than implicit bias). This is the case, for example, when a video doorbell is designed to announce all dark-skinned people as "suspicious."

For an example of "assumptions" in audio description, go to "Story Moment 5."

Footnote


1 As has been pointed out by many scholars as well as activists (including the American Civil Liberties Union), recognition software is developed on social media platforms for purposes of surveillance, not accessibility—though the companies running the platforms generally claim only altruistic intentions.

Read by Margaret. Audio length: 8:17.

When a description is delivered as an audio file, the words are read by an entity that we usually call a "voice."2 The voice may be a live person's, or may be computer-generated. Embodiment is bundled into that voice: Markers of gender, race, region, ethnicity, age, and a thousand other factors reverberate through voice, and are interpreted by those receiving the voice in a thousand different ways (Bates et al., 2019; Kleege, 2016). Therefore, description always adds to the embodiment of a text, whether the voice is "heard" as male, female, computer-generated, white, American, or otherwise inflected. Furthermore, such "hearing" might be literal, or when reading AD in textual form, metaphorical. In summary—a supposedly neutral description is not neutral at all, but a reification of certain accents, cadences, dialects, and other embodied cues.

Guidelines for AD discuss the implications of voice at length. For example, Louise Fryer's (2016) An Introduction to Audio Description: A Practical Guide included a full chapter on "delivery," which she discussed in terms of speech studies and linguistics. Fryer identified aspects of delivery including "stress, pitch, tempo, dynamic range and, especially, the way the words are segmented" (p. 87), then moved on to issues including accent, gender, and fluency (pp. 88–96). This "practical guide" did not dwell on issues of identity or power, but its abundant examples made clear that those issues are always present in AD:

The AD of the animated children's film The Incredibles (Bird, 2004) is voiced at first by a man. It suddenly switches to a woman's voice and the gender of the voice alternates throughout the film. It gradually becomes apparent that the male voice is used for the episodes in which the characters are shown in their superhero roles. The female voice is used for the episodes in which the characters are shown as their alter egos, i.e. their "normal" selves. (p. 89)

Aside from noting that it was not "politically correct" (p. 90), Fryer did not comment on the fact that the "male voice" has been used to denote super-ness, while the "female voice" has been used to denote ordinary-ness. However, even a passing comment on the political nature of that gendered choice has been unusual in discussions of AD. Meanwhile, because the practice of AD has been spreading so fast, new examples of describers' and artists' attention to embodiment in description have been constantly emerging.

A striking example can be found in Beyoncé's (2019) recent documentary Homecoming, distributed by Netflix. When AD is enabled for this film, its specificity and word choices—for example, "a woman of African descent" and "a Nefertiti-inspired crown"—cue the audience to center the perspective of Beyoncé in particular, and Black women in general. Other, subtler cues—for instance, the describer's timbre, pitch, emphasis, and other aural qualities—may also add to the de-centering of a more typical describer's voice, which tends to re-center whiteness by using linguistic and aural features from "standardized English" (Gilyard, 2000). In our account of the AD for Homecoming, we avoid engaging in what John Baugh (2003) has called "linguistic profiling" (p. 155), that is, a simplistic assumption that a voice "sounds Black," "sounds white," or in some other way signals race definitively. Rather, we focus on the relationship between AD's embodied qualities and the film's intended audience. In doing so, we draw upon Baugh's (2003) point that voice, in print or in audio form, might be subject to discriminatory profiling, but might also emphasize the importance of "linguistic enclaves" that "evoke solidarity among their speakers" (p. 163).

Homecoming is for and about Black audiences and pays particular attention to the history and culture of Historically Black Colleges and Universities (HBCUs). As Tamara Winfrey Harris (2019) wrote, the documentary's cameras "bypass confused white faces to seek out Black festivalgoers brimming with elated recognition and grateful acknowledgement. [Beyoncé] offered no roadmap for the lost and no instructions for the confused. Beyoncé performed for the folks who get it—Black folks." This emphasis on delivering a work about, by, and for Black people continues through the embodiment of the film's audio describer. This is a phenomenon Amanda Nell Edgar (2019) has called vocal intimacy, or "the voice's ability to create physiological and affective relationships between speaker and listener . . . centraliz[ing] relationships of familiarity" (p. 4). Following Baugh and Edgar, we note that vocal qualities can be an important source of solidarity for minoritized speakers, including those who speak Black English (see also Ball & Lardner, 2005; Jordan, 1988; Perryman-Clark, 2013; Richardson, 2003; Smitherman, 1977, 2006). Homecoming is a striking example of AD because it demonstrates the potential power of vocal intimacy—especially when that intimacy is not primarily attuned to white audiences.3

Embodiment attaches not only to the linguistic qualities of a describer's voice but also to their location vis-à-vis the text they are describing. Are they a coauthor, a technology, part of a text's style, all of the above? In what ways does that describer's identity, either self-identified or as "read" by the audience, bear upon the meaning of the text as delivered? Further, what about their identity as a professional? On July 2, 2019, the American Council of the Blind (ACB) announced Netflix's decision to begin identifying the professionals who describe and narrate their streaming content. These professionals, ACB reported, "are proud of their work and like to be recognized... And this is something many viewers of the videos have requested, too." As an analytical dimension, then, embodiment indicates not only the features of a describer's voice that imply embodiment, but also the fact that a describer exists (or once existed) materially, even if the describing voice is machine-generated.

For an example of "embodiment" in audio description, go to "Story Moment 1."

Footnotes


2 For the purposes of this webtext, we acknowledge but do not address in detail the vast literature on "voice" in written texts (see, for example, Peter Elbow's 1995 Landmark Essays in Voice and Writing).

3 Unfortunately, much more often, the issue for Black-centered works with AD is a lack of vocal intimacy. In his podcast episode on Black Panther, Thomas Reid (2018) noted that "For those of us watching with Audio Description, well the vibe wasn't the same. Trying to remain in the dream nation of Wakanda was impossible when we're being shaken awake by the narrator who by all accounts was a British White man."

Read by Margaret. Audio length: 4:18.

The frame of a story, in narrative terms, is the world delineated by that story. John Gardner (1991) referred to it as "the dream"—the "rich and vivid play in the mind" (p. 31) that allows the reader to invest in the story. Investing doesn't just mean being interested—it means, according to Gardner, identifying with the story to the extent that it "helps us to know what we believe, reinforces those qualities that are noblest in us, leads us to feel uneasy about our faults and limitations" (p. 31). And taking the reader outside that dream, Gardner argued, is "one of the chief mistakes a writer can make" (pp. 31–32). Gardner was referring mainly to realistic fiction, assuming the goal of not breaking the frame, but any number of narrative moves—an actor in a fictional play suddenly addressing the audience, for example—deliberately disrupt a narrative frame in order to enrich an artistic work or production.

Frame is likewise a familiar concept in folklore studies via the work of Erving Goffman (1974), who used "frame" to refer to basic elements of organization through which social groups recognize and define a situation, and "frame analysis" to "refer to the examination in these terms of the organization of experience" (pp. 10–11). Narrative terms such as "intradiegetic" and "extradiegetic" refer to the movement of characters, knowledge, or artifacts (such as music) inside or outside the narrative's frame (Genette, 1980, p. 228). In summary, frames mark often-porous boundaries between different levels of (a) text(s) and, when recognized and shared, implicitly determine the scope of content relevant to a particular set of experiences.

In their 2017 article "Centering Disability in Qualitative Interviewing," Stephanie L. Kerschbaum and Margaret Price discussed the significance of framing as it pertains to videotaped interviews. Their argument, ultimately, is that finding the "best" frame for a particular situation (in that case, a semi-structured interview) must be participatory in nature, taking into account the needs of those who will appear within the frame as well as those who will be viewing the video. Applying their argument to AD, we are confronted with the following questions: What counts as the frame of a description? What audiences are invoked or imagined through the construction of that frame, and how is the frame shaped not only through the words of the description itself, but by the constraints of tools and arrangement?

As with all other dimensions of AD, frames must be considered with reference to power and privilege. A description that wanders far outside its immediately visual frame might be charming for those scanning the text visually, but overly time-consuming for those scanning aurally. Conversely, a deliberately limited frame might privilege users who already have a store of contextual knowledge about a particular image or video, placing those with less contextual knowledge metaphorically in the dark. There is no correct way to frame and re-frame an audio description; again, composers must come to their practices with rhetorical questions of purpose, audience, tools, and circulation in mind. Julie Collins Bates, Francis McCarthy, and Sarah Warren-Riley (2019) have argued that "Access is never simply about tools or even information availability. It is complicated by lived, embodied experiences and is always a product of the power imbalances that are already in place in everyday society." We follow their call, and that of Annette Harris Powell (2007), in understanding the composition of access as a practice rather than an event.

For an example of "frame" in audio description, go to "Story Moment 3."

Read by Margaret. Audio length: 6:09.

We draw the term "thickness" from Clifford Geertz (1973), who built upon philosopher Gilbert Ryle's work to develop a theory of description for anthropological ethnography. Geertz emphasized that, when used by an ethnographer, "thick description" doesn't just mean detailed description; rather, it's an inevitably interpretive act, "like trying to read (in the sense of 'construct a reading of') a manuscript" (p. 10). Norman Denzin (1989) later expanded the concept to argue that thick description can be applied to a variety of qualitative methods. Denzin's definition has been particularly useful for our consideration of AD, because it noted—though implicitly—that thick description always involves relations of power:

[Thick description] presents detail, context, emotion, and the webs of social relationships that join persons to one another. Thick description evokes emotionality and self-feelings. It inserts history into experience. It establishes the significance of an experience, or the sequence of events, for the person or persons in question. In thick description, the voices, feelings, actions, and meanings of interacting individuals are heard. (p. 83)

From this quotation, we particularly note Denzin's reference to "webs of social relationships," "history," and "the significance of an experience." These references indicate that a thick description isn't just a longer description, but is also attuned to the ways that power differences may be manifested, exacerbated, or contested.

Technical guides, such as Description Key (Described and Captioned Media Program, 2021), often recommend that descriptions should be neutral: "Describe objectively, without interpretation, censorship, or comment" ("How to Describe" page). This guideline reflects a reasonable awareness that users may not want elaborate descriptions, especially when watching a fast-paced movie or attempting to scan a web page using a screen reader (see Tools). However, in their effort to make the purpose of description understandable to a lay audience, technical guides often imply that "objective" interpretation is possible. We argue that all descriptions, including ones meant to be objective, are still laden with assumptions and markers of embodiment; they simply encode those biases within a presumed "view from nowhere" (Nagel, 1986).

Some thickness decisions are relatively easy to make. For instance, an alt tag is always short because of genre conventions. Like a hashtag or a title, it gains its value from its brevity (though genre-busting exceptions occasionally pop up). By contrast, a description of a still image written on Facebook for a close circle of friends might be much thicker, containing not only visual details but also details that go significantly outside the frame of the image itself. As with all the other dimensions discussed in this webtext, consideration of rhetorical factors including audience, purpose, and context is crucial to thoughtful use of thickness in description.

As rhetoricians, we are accustomed to reeling off those considerations—Audience! Purpose! Context!—but we must remember that the decisions about how to balance those factors are not easy to make. They have material consequences. For example, Margaret recently designed a simple website that included a few images. She wrote alt text for each image, relying on the preferences of some of her blind friends, and also following AD industry standards as she knew them. Her motivation was to ensure that visitors to her site wouldn't miss anything. However, the site was then reviewed by another blind user, an accessibility coordinator for her university, and he asked her a question: "Are these images decorative?" That led to a discussion about what "decorative" meant in this context—a discussion that specifically referred to Zdenek's (2011) ideas about significant captions, which should "contribute to the purpose of scene." Ultimately, Margaret concluded that most of her images were, at least according to this user, decorative. "If they're decorative, don't include any description at all, not even an alt tag," her colleague advised. "It will waste time [for people using screen readers]."

Note the material consequences being weighed here. Margaret was concerned about users missing the experience of the images—one kind of cost—while her colleague was concerned about another kind of cost: time. In this instance, Margaret followed her colleague's advice. Her decision was carefully considered, yet carried inevitable material consequences.
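In web markup, the advice Margaret's colleague offered is usually carried out not by deleting the alt attribute altogether but by leaving it deliberately empty, which tells screen readers to skip the image; omitting the attribute entirely can instead lead some screen readers to announce the file name. The sketch below is a minimal illustration of the two choices; the file names and wording are hypothetical.

    <!-- A decorative image: the empty alt attribute signals screen readers to skip it entirely. -->
    <img src="divider-flourish.png" alt="">

    <!-- A meaningful image: brief alt text that contributes to the purpose of the page. -->
    <img src="workshop-photo.jpg" alt="Six students around a seminar table, marking up a printed draft">

The cost calculus Margaret and her colleague weighed is visible even at this scale: a single attribute determines whether a listener spends any time on the image at all.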

Although it is possible, at times, to offer readers a choice between different levels of thickness (the strategy we use in our illustrative examples), it's not possible to do that without ringing up other costs, such as time and labor. It's also not possible to offer those different experiences equitably unless all users have a clear sense of exactly how the text or platform works, which is rarely the case. We—Erin and Margaret—struggled with the many compromises involved in designing this webtext. We hope that the layers of access we've built will work for all readers, yet we know at the same time that accessible design is never a done deal; it is, as Jay Dolmage (2008) has memorably reminded us, "a form of hope, a manner of trying" (p. 24).

For an example of "thickness" in audio description, go to "Story Moment 6."




Read by Margaret. Audio length: 10:37.

For this project, we define "tools" as designed technologies used to fulfill a purpose—in the present discussion, creating audio description (AD). Tools are never purely functional or neutral, but are always embedded in complex social, political, cultural, and experiential contexts (Feenberg, 1999; Heidegger, 1977; Melonçon, 2013; Selfe & Selfe, 1994). Feenberg (1999) in particular has called attention to the ways in which people encounter technologies through their lived experiences:

Lifeworld meanings experienced by subordinate actors are eventually embodied in technological designs; at any given stage in its development, a device will express a range of these meanings gathered not from 'technical rationality' but from past practices of its users. Technology as a total phenomenon thus must include an experiential dimension since experience with devices influences the evolution of their design. (p. xii)

In other words, tools are always produced, used, and understood through experiences—making it especially important to consider adaptive tools' affordances and constraints through the lens of disabled people's lived experiences.

In their study of universal design, Aimi Hamraie (2017) noted that specific tools "demand specific types of bodies" and thus constrain our ways of knowing-making (p. 48). Knowing-making, a term coined by Hamraie, is the discursive–material process through which users change the material features of their environment—for example, by smashing a curb cut into a curb, or by hacking a wheelchair so it can do things not originally intended by the designer. When people engage in knowing-making, they not only re-design objects, but also change "the terms of legibility and illegibility in relation to liberal inclusion or economic citizenship" (Hamraie, 2017, p. 17). In other words, knowing-making recognizes that when something is made a particular way, it both shapes and is shaped by the knowledge (including the lived experience) brought to it. Thus, most doors in U.S. public space assume that the person who wishes to pass through has strong, agile hands and arms unencumbered by crutches, bundles, or a stroller—unless some other primary consideration is at work. Think about public spaces in the United States that almost always have automatic-opening doors (grocery stores, large retail stores, hospitals, airports) and those that almost always do not (schools, banks, restaurants). What histories of knowing-making cause some of those doors to open invitingly, as if they see you coming, and others to remain closed, guarded by handles several feet off the ground that must be grasped and squeezed, then pulled or pushed with force?

In all instances of AD, it matters what kinds of tools are made available by the designers of platforms or interfaces. It matters how users take up, work around, adapt, or remain unaware of those tools. It matters that affordances and limitations guide every choice made within a given system—and it's important to recognize that choice is not the only factor at work as meaning takes shape as part of an infrastructure.

Twitter was one of the first social media platforms to provide a feature intended specifically for user-composed image descriptions. The feature is accompanied by detailed information about compatibility with iOS, Android, VoiceOver, Talkback, JAWS, and NVDA (Edwards-Onoro, 2016). Originally, the tool was accompanied by the prompt, "Describe this image for the visually impaired," but Twitter has since revised its language, suggesting a more expansive audience (and getting rid of "visually impaired"): "You can add a description, sometimes called alt-text, to your photos so they're accessible to even more people, including people who are blind or have low vision." Instagram, on the other hand, sticks with the narrower audience and the archaic language: "Alt text describes your photos for people with visual impairments."

A wealth of assumptions is packed into that brief sentence. First, it assumes that alt text is written "for"—on behalf of—a certain group of users. Second, it assumes that the medicalized term "impairment" is preferable to the term used by most blind and low-vision people, namely, blind (Omvig, 2009). Third, it assumes that only blind and low-vision users will need alt text. But that assumption, as discussed in our section on frame, overlooks the benefits that description may offer to all audiences4: additional information, deeper context, or the ability to continue engaging with images despite low bandwidth or platform failure.

Platform failure struck Facebook and Instagram on July 3, 2019. Photos would not load, so users found themselves reading the alt text that had been auto-generated by each platform's image-recognition tool. Although popular media played up the "eerie" accuracy of the descriptions (Morrison, 2019), in fact most of these descriptions were quite limited. But July 3, 2019 was also a telling moment in terms of the rhetoric of description. Alt text, usually unnoticed by sighted readers, was suddenly front and center, and most people had no idea what it was or how it should be used. On that day, one of Margaret's friends posted, "Is anyone else having trouble loading images?" and she responded (in a comment) by posting one of her own photos, which did not appear as an image, but rather as a blank frame overlaid with the alt text she'd written to describe the image. Her friend responded rather frantically, "I still don't see anything, just some words." For a brief moment on that day, Georgina Kleege's (2005) point that there are as many ways to be blind as there are to be sighted became . . . visible? Margaret's friend was indeed seeing something, but the replacement of the visual photo with its alt-text description caused him to believe he couldn't see anything.

In our Conclusion, we suggest an exercise that encourages students to examine a particular description tool, unpacking how it works, what assumptions it seems to make, and how it might be taken up (or not) in everyday practice. Instagram, for instance, provides a fascinating opportunity. As Twitter originally did, Instagram has buried its alt-text tool deep within Settings; however, Instagram has intensified the rhetorical firewall by forcing users to click on "Advanced Settings" in order to find the tool in the first place. (We surmise that some users may avoid the "Advanced Settings" button as a matter of principle, fearing that the content housed there will either be impossible to understand, or will make unwanted changes that cannot be undone. We also note that the words "Advanced Settings" are for some reason smaller than the other items in the same menu.) Once the tool is located, users can write alt text, but Instagram hasn't offered any explanation of how alt text is different from the "Write a caption" field that is offered as a default feature for each upload. Nor has it explained what sort of tool (such as a screen reader) might be used to access the alt text. Thus, Instagram users who wish to describe their images may find themselves struggling to make sense of the features available to them—and, perhaps more importantly, struggling to understand why they would avail themselves of any of these features in the first place.

Tools built into platforms are one part of this dimension; another important part is the tools used to access various interfaces, including desktops, laptops, mobile and wearable technologies, browsers, operating systems, and screen readers. Screen readers alone have provided a rich area of study. Melissa Helquist (2015), a rhetoric and composition scholar specializing in access, argued that scanning a web page with a screen reader is both time-consuming and, if the page is not composed carefully, extremely confusing—a point that has also been forwarded by Meredith Ringel Morris et al. (2016) as well as Violeta Voykinska et al. (2016). Another example has received a fair amount of attention on Twitter: If hashtags are not written in camel case with each new word capitalized, a screen reader will read something unintelligible—and again, time-consuming—rather than parsing out each word. (Try to pronounce "GotToBeMe" as one undifferentiated word: gottobeme.) Finally, yet another area for potential research is the range of applications such as TapTapSee, which uses automated object recognition, and Be My Eyes, which connects a device user with a sighted volunteer who provides a live description of whatever image the device user sends via the app.

For an example of tools in audio description, go to "Story Moment 4."

Footnotes


4 Although outside the scope of this webtext, it's important to note that the "this benefits everyone" argument has been debated within disability communities and disability studies. Jay Dolmage (2015) has written about the danger of "interest convergence": it may advance the idea that conditions should change "for minorities only when the changes can be seen (and promoted) as positive for the majority group as well."