Challenges in Transcription

For someone who is unfamiliar with transcription, a simple definition would be writing out all the words that one hears in an audio or video recording. These recordings may be in MP3, WAV, AVI or any of the other popular formats.

Thus, to the uninitiated, transcription may seem like an extremely easy task. Just type out what is heard and we are done. Simple, correct? Unfortunately, the work of the transcriber is nowhere near that easy. There are many reasons for this, and we will explore some of them below.

In the best-case scenario, all the speakers speak slowly, clearly, fluently and in a relaxed manner. The transcriber can then type all that he hears into a Word file and use time stamps to indicate when he hears certain sentences. However, real life is much more complicated and less ideal.

Firstly, many recordings are not made in quiet environments. Imagine trying to hear someone speak in a crowd, or on a street where there is heavy traffic. The transcriber has a hard time figuring out what is being said in these circumstances, and may need to listen to the file many times. Sometimes, despite his best efforts, he simply cannot make out what is being said and can only mark those parts as inaudible.

Secondly, not everyone speaks with a clear and neutral accent. Many people are influenced by their native language and by the way people in their region speak. Thus, it may be difficult to make out what someone with a heavy accent is saying even when he speaks in English. This is because we are accustomed to how a language sounds in the accents we hear most often. If the speaker deviates too much from this expectation, the transcriber’s job becomes really difficult, because he has to spend time deciphering what is being said.

Finally, the speed and tone of the speaker also affect the quality of transcription. When people are angry, upset or excited, they usually speak much faster and at a higher pitch. Being emotionally affected, their thoughts and expressions may become jumbled, and so what is expressed may be very confusing to the listener or transcriber.

