
2.2.2 Audio description

People who are blind or have low vision use audio description (AD), which is played together with the video. AD is always necessary when important information is conveyed only by the image and not in the spoken text. Text overlays are also read out, so that the entire video can be understood by listening alone. The result is a new audio track, and possibly a new video track, consisting of the original soundtrack of the video and the spoken descriptions of the images (see also 2.1.2 Digital infrastructure and technical equipment).

Comparison of two bar charts of Covid case distribution in Augsburg and Stuttgart using scatter measures. The bar chart for Augsburg is on the left and the one for Stuttgart is opposite, on the right. In both charts, the x-axis shows the number of cases per week, ranging from 0 to 300. On the y-axis, the characteristics of the case numbers are shown from 0 to 10. The characteristics of the case numbers are shown for both cities in the form of blue columns. The arithmetic mean of the number of cases is marked below the x-axis as a purple triangle and is 28 for Augsburg and 91 for Stuttgart. While the columns in Augsburg are relatively close to the mean (28) with values between 0 and 120, they are distributed across the entire width of the x-axis from 0 to 300 in Stuttgart. A red double-headed arrow is located horizontally above the blue columns in both diagrams. This shows the scatter of the values, in the measure of the standard deviation. In the Augsburg diagram, the red arrow extends from 0 to 120 due to the smaller scatter and indicates a standard deviation of 33. In the Stuttgart diagram, the red arrow extends over the entire width of the x-axis, so that the scatter of the values in Stuttgart, with a standard deviation of 100, is significantly greater than in Augsburg.
Diagram 6: Example of a complex graph on the chronological course of Covid case distribution in Augsburg and Stuttgart, which requires a lengthy audio description.

AD is therefore an alternative text, read aloud, for any visual information. The Gut fürs Image! guide explains in simple but comprehensive terms what you should watch out for when preparing alternative texts. German public radio broadcasters’ standards for audio description should also be consulted, although they need to be adapted in places for university teaching videos. The classic method of producing AD (inserting it into existing pauses in speech) is rarely applicable to these teaching formats, because the natural pauses in speech are not normally long enough to explain what are sometimes complex illustrations and graphics.
There are various approaches to AD. Which one you choose depends in part on the content of the video and on the player you use.

  • Original soundtrack replaces AD: if all the visual information is conveyed by the original soundtrack, you do not need AD. There is an example here on YouTube.
  • Synchronous production of video and AD (audio track): if the image cannot be conveyed in the spoken text, it can help to build in pauses during recording. Brief AD can then be added to the audio track, provided the primary audio contains enough sufficiently long pauses in speech. The Web Accessibility Initiative gives an example of such audio tracks.
  • Use of extended AD (video track): production of a separate video where the AD is added to the primary audio track. This should only be done when the video is incomprehensible without AD and the available pauses are not long enough. The Web Accessibility Initiative gives an example of an extended AD.
  • Audio introduction: an audio introduction is provided in addition to the videos, as an accessible text or audio file. This requires more time and cognitive effort from users than simply listening to the descriptive soundtrack.

The DaLele4All team has decided in favour of producing extended AD (video track). This method avoids having to plan lengthy pauses while still allowing sufficient time to describe the complex statistical graphs and tables. During preparation, the pauses for AD are marked in the speaker’s script, so that the AD can be added later. The ADs are voiced in the rooms of the University of Göttingen’s video team, using their technology. The video team also took on post-production of the AD and inserted it at the marked pauses in the spoken video. We also decided to show the relevant presentation slide in full screen while the AD is running, to avoid showing the speaker frozen in an awkward pose as it plays. Preparing and recording the seven ADs in our first teaching video took far more time than for the subsequent videos, each of which needed two to four ADs. It is not possible to give an average production time per AD: the time taken depends heavily on the complexity of the images that require description, as well as on experience.
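Because extended AD lengthens the video, every timestamp after an inserted description shifts. The following is a minimal sketch of that arithmetic, assuming the original video simply pauses at each marked point while the description plays. The class name `AdClip`, its fields and the example values are hypothetical and are not taken from the DaLele4All workflow.

```python
from dataclasses import dataclass


@dataclass
class AdClip:
    """One audio description (hypothetical structure for illustration)."""
    insert_at: float  # marked pause in the original video, in seconds
    duration: float   # length of the recorded description, in seconds


def extended_timeline(clips: list[AdClip]) -> list[tuple[float, float]]:
    """Return (start, end) of each AD in the extended video.

    Assumption: the original video pauses at insert_at while the AD plays,
    so each clip is shifted by the total duration of all earlier ADs.
    """
    shift = 0.0
    placed = []
    for clip in sorted(clips, key=lambda c: c.insert_at):
        start = clip.insert_at + shift
        placed.append((start, start + clip.duration))
        shift += clip.duration
    return placed


def to_extended_time(original_time: float, clips: list[AdClip]) -> float:
    """Map a timestamp of the original video to the extended video.

    A timestamp exactly at an insertion point maps to the moment
    after the corresponding AD has finished.
    """
    shift = sum(c.duration for c in clips if c.insert_at <= original_time)
    return original_time + shift


# Example values are invented: two ADs of 20 s and 45 s.
clips = [AdClip(insert_at=30.0, duration=20.0),
         AdClip(insert_at=95.0, duration=45.0)]
print(extended_timeline(clips))        # [(30.0, 50.0), (115.0, 160.0)]
print(to_extended_time(120.0, clips))  # 185.0
```

A mapping of this kind can help keep subtitles or a transcription aligned with the AD version of a video, since their timings refer to the original recording.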

Producing the (extended) audio description

The spoken text and planned images are coordinated in the storyboard during the preparation phase. Unnecessary images are removed and references to simple ones are integrated in the spoken text. It is a good idea to discuss the learning objective and the role of the remaining illustrations in relation to the content you wish to convey. You should also consider their complexity critically and simplify them where necessary. In the next stage, AD is drafted for all the relevant visual content that is not covered in the speaker’s script. The AD neutrally describes the content of the images in relation to the educational context.
AD should only put into words what can be seen in the graphics; it should not provide interpretations. Details that are visible but not relevant to the subject may be omitted. Users’ prior knowledge also has a major influence on the description. What has to be described precisely in the AD depends on the specialist information the illustration is meant to convey and on the content that subsequent teaching videos will build on. The complexity and/or length of the description also depends on the method of integration. The draft should be written by someone with specialist knowledge, as it is hard for non-experts to judge what is actually relevant in an illustration. For this reason, commissioning external service providers to produce AD is of limited use and creates additional work to ensure technical accuracy.
Finally, the timings for the AD are defined. These should be set to ensure that the images have been described before the relevant content is addressed in the basic spoken text. The drafts should be checked by the team before recording. A process of multiple checks helps to finalise the draft, add essential but previously omitted details and find standardised and appropriate formulations. The draft should also be coordinated with potential users, to ensure that the content is accessible in this form and everything necessary is conveyed comprehensibly.
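The timing rule above (the image is described before the spoken text addresses it) can be checked mechanically once the timings are written down. The sketch below assumes extended AD, where the video pauses during the description, so only the insertion point matters and not the length of the AD. The names `PlannedAd` and `late_descriptions` and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlannedAd:
    """Planned timing of one AD (hypothetical structure for illustration)."""
    image: str            # label of the slide or graphic
    insert_at: float      # planned AD position in the spoken video, in seconds
    first_mention: float  # when the spoken text first addresses the image, in seconds


def late_descriptions(plan: list[PlannedAd]) -> list[str]:
    """Return the images whose AD would start only after the spoken text addresses them."""
    return [ad.image for ad in plan if ad.insert_at > ad.first_mention]


# Invented example: the second AD is planned too late.
plan = [
    PlannedAd("bar charts", insert_at=62.0, first_mention=70.0),
    PlannedAd("results table", insert_at=140.0, first_mention=131.0),
]
print(late_descriptions(plan))  # ['results table']
```

A check like this does not replace the team review or the test with users; it only catches timings that contradict the rule.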
After this, the AD is voiced. To differentiate the AD from other parts of the video, it should not be read by the person who presents the video. The sound quality of AD voiced by another person should, however, be of an equally high standard. AD can be voiced by a non-expert, provided they have sufficient notes on specialist semantics and pronunciation (e.g. of formulas or concepts).
If you are recording the entire AD yourself, audio description tools can help. These enable you to view a video, choose/edit the timeframe and voice the description straight away.

  • With Frazier from VIDEO TO VOICE, AD can be created using various synthetic voices (text-to-speech), which produces a broadcast-ready video.
  • AD can also be produced using CADET from the National Center for Accessible Media.

    Recording AD involves the following tasks and checks:
    1. Production and agreement of the speaker’s script and the presentation slides with illustrations
    2. Choice of method (original sound replaces audio description, audio description in pauses in speech, extended audio description or audio introduction) and setting of timings for the AD in the video (minute/second)
    3. Production of draft audio description for chosen method
    4. Audio description checked and revised by team
    5. Test with users
    6. Recording the audio/video track
    7. Integration of audio description into the video in accordance with chosen method

    Practical tips

    An exchange between the teacher and the person who writes the AD is hugely important from the start. It can enable objectives, complexity and the need for illustrations to be discussed and considered in advance:

    • It is a good idea to consult routinely on whether the proposed images are necessary to convey the material, or whether they are merely decorative and have no real content. In line with TU Dresden we recommend: “Backgrounds and other graphic decorative elements and placeholders (...) that contain no relevant information do not need an alternative description. (...) You may consider removing (...) purely decorative images” (cf. Müller & Voegler, 2020:41, our translation).
    • If formulas, specialist terms or other specialist aspects occur in the presentation, it is a good idea to clarify them immediately in the spoken version. If this is omitted, potential misunderstandings and the pronunciation of specialist terms or formulas will have to be clarified before the AD is created, to avoid errors in the description or the need for later corrections.
    • Where images are of minimal relevance, we have opted to include a brief mention in the spoken text of the video, to keep the cost of the AD down and avoid barriers from the outset. For instance, in the Durchschnittsstatistik video [Average statistics], between 2:50 and 3:33 our teacher describes the map showing the territorial units of Germany, together with the visual markings, in his spoken text, making AD unnecessary here.
    • You should consider the complexity of the planned illustrations critically. What is shown and what elements are relevant to the subject? Illustrations can often be simplified to show just the relevant substance. Reduction makes the AD easier and shorter.
    • If you opt to use extended AD, the AD at the start of the teaching video should mention that this is the AD version of the video. For instance, we introduce our AD versions with this sentence: “Important visual images that are relevant to the content are given an audio description in the following.” Because of the use of AD in television broadcasts, people with visual impairments are used to AD starting immediately in pauses in speech. A voiced reference to the AD version can avoid confusion over whether it is the correct video version. If the spoken video version does not clearly give the session number, subject of the event and name of the teacher, this should be added as a note in the AD at the start.
    • If you opt to use extended AD, the points where AD is integrated should be chosen to allow the AD for an image to come before the basic spoken text on the content. This enables people with visual impairments to keep the AD in mind as they listen to references to the content in the image in the spoken text.
    • Extensive AD of complex tables is not advisable, because it is difficult to retain the content in your head. It is better here to give the subject and table headings in an AD and refer to the transcription for the specific values (see 2.2.5 Transcription). The transcription gives the table with all the values in an accessible form.