Recording Transcription Solutions on a Shoestring Budget: The Good, The Bad and The Ugly
Transcribing interview recordings is always a drag. You want that magical unicorn in the middle of the management triangle that combines quality, speed, and affordability, but two out of three is realistically the best you can hope for.
If you have the budget for it, formal transcription services are great - you send them the audio file, and they send you the finished product, oftentimes in less than 24 hours. But the costs add up quickly if you’ve got more than a handful of interview recordings.
So, what's a diligent social scientist to do with the endless hours of audio that needs to be turned into text?
Our time at the 2021 AAA conference late last year drove this point home repeatedly. Several visitors to our booth broached the topic, asking:
“…so what are you guys using for audio transcription?”
There are no perfect solutions, but I was fascinated by the number of different hacks and workarounds that our booth visitors had devised to address their transcription needs.
Solutions generally fell into two categories: direct conversion of audio files into text files, and real-time speech-to-text workarounds (i.e., literally just playing the audio recording into a speech-to-text converter). A third option, “get an intern/grad student to do it,” isn’t really viable for many of us (but if you can, you’d better be paying them a fair wage).
Best practices:
Remember that whatever you’re going to do with your interview recordings, the quality of the final product is dependent on several factors. Aside from asking people to speak a little more slowly, the cadence of natural speech is something you can’t (and shouldn’t) try to affect. You can, however, have some influence over the quality of the recording. I’ve written elsewhere about the importance of relying on a dedicated voice recorder, but stay tuned for upcoming blog posts and videos with strategies and suggestions on how to improve audio quality through your equipment selection, recorder placement, and location choices.
When it comes to speed, digital conversion of audio files into text files is at the top of the list, but cost and quality can vary widely. Even in a best-case scenario, somebody will still have to clean up the transcripts and correct errors from poor audio, unconventional pronunciation, accents, and specialized vocabulary. Still, it costs less than paying a transcriptionist, so if you’re committed to doing it on the cheap you generally have three options:
The Good: Microsoft Word’s transcribe function
The Bad: Older versions of Dragon NaturallySpeaking transcription software
The Ugly: YouTube’s auto-caption feature
The Good: Microsoft Word transcription
Microsoft Word’s “Dictate” function is relatively new, and in my experience it’s decent, though far from perfect. What many people might not realize, though, is that Word can also convert an audio file into a text file. The process for file transcription is straightforward, and a walk-through can be found here:
The nice thing about using this conversion function is that the resulting transcription document displays in tandem with a media player that is cued up with the original audio; this is extremely helpful when it comes to cleaning up the final text. This function is included in the cost of Microsoft Word, as long as you’re already signed into a Microsoft 365 account. There are some critical limitations: file conversion is capped at 5 hours per month for uploaded recordings, and the individual audio files must be smaller than 200 MB. There doesn’t appear to be any way to purchase more time beyond the 5-hour cap, but you’ve probably got a few teammates or friends who aren’t using this feature on their own Microsoft 365 accounts, so with a little social capital you should be able to easily double or triple your monthly conversion capabilities. The other big downside is that Microsoft Word’s transcription functionality doesn’t support any languages other than English, which is unfortunate, but not exactly surprising.
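If you want to know how much social capital you’ll need to spend, the arithmetic is simple enough to sketch. This is only a rough planning aid based on the 5-hour and 200 MB limits quoted above; the function names and the example file names are made up for illustration.

```python
import math

MONTHLY_CAP_HOURS = 5   # per-account upload cap quoted above
MAX_FILE_MB = 200       # per-file size limit quoted above


def accounts_needed(total_audio_hours: float, months: int = 1) -> int:
    """Return how many Microsoft 365 accounts cover the audio within `months`."""
    if months < 1:
        raise ValueError("months must be at least 1")
    if total_audio_hours <= 0:
        return 0
    return math.ceil(total_audio_hours / (MONTHLY_CAP_HOURS * months))


def oversized_files(file_sizes_mb: dict[str, float]) -> list[str]:
    """Return the recordings that would need splitting or re-encoding first."""
    return sorted(
        name for name, size in file_sizes_mb.items() if size >= MAX_FILE_MB
    )


if __name__ == "__main__":
    # Twelve hours of interviews, all needed this month.
    print(accounts_needed(12))
    # The same twelve hours spread over three months.
    print(accounts_needed(12, months=3))
    # Hypothetical file names and sizes in megabytes.
    print(oversized_files({"interview01.wav": 250, "interview02.mp3": 40}))
```

In other words, twelve hours of recordings in a single month means borrowing two accounts on top of your own, while spreading the same work over three months keeps you within one.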
The Bad: Dragon NaturallySpeaking
As part of my personal struggle with dysgraphia, I've been using several different versions of Nuance’s Dragon NaturallySpeaking transcription software over the past 15 years. The current version of Dragon Professional does have the capability to convert audio files to text, but it’s $500, putting it well outside of what most of us pay for shoelaces. They do have a home version for a mere $200, but that doesn’t offer the file conversion capability that we’re discussing here.
That said, several of the older versions of Dragon NaturallySpeaking included functional audio file to text file conversion capability (I used it on versions 10 and 11 of the software). Many of these older versions can be found on eBay for less than ten dollars, sometimes still in their unopened packaging. With a little research beforehand on the specific version you’re buying (and a CD-ROM drive on your computer) you could theoretically convert an unlimited number of interview recordings into text files. Protip: there are LOTS of variations, but the “Home” versions are less likely to have the file conversion capability than the “Professional” versions.
So, what’s the downside? Well, some of the older versions were a bit buggy and the company doesn’t provide tech support or patch updates for them anymore. The real problem, however, is that the software is really intended to be used by a single speaker, and although it calibrates well to the voice of that primary user, it’s much less accurate for everyone else. If any of the people in the recording have any type of accent, you’ll probably spend a lot more time than you’d like cleaning up the finished product.
Using old versions of Dragon NaturallySpeaking software may seem like a creative solution to interview transcription on a tight budget, but what you save in money will be outweighed by costs in time and needless frustration.
The Ugly: YouTube Closed Captions
Probably one of the cleverest and cheapest solutions that we heard from other social scientists came from a researcher who uses YouTube's automatic closed caption functionality to generate quick & dirty transcriptions of audio files. First, you create a video using a stationary image with your interview recording as the audio playing over top of it. Thanks to YouTube’s content-accessibility policies, any video hosted on their website can be automatically captioned with little more than a click of a button. The process sounds more complicated than it really is; you can find a walk-through here.
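The first step, pairing a still image with the audio, doesn’t require a video editor. The researcher didn’t say which tool they used, so this is just one possible approach: a minimal sketch that assumes the free command-line tool ffmpeg is installed. The helper name and file names are hypothetical.

```python
def still_image_video_command(image: str, audio: str, output: str) -> list[str]:
    """Build an ffmpeg command that shows one image for the length of the audio."""
    return [
        "ffmpeg",
        "-loop", "1",           # repeat the single image as the video track
        "-i", image,
        "-i", audio,
        "-c:v", "libx264",
        "-tune", "stillimage",  # encoder tuning for static pictures
        "-c:a", "aac",
        "-pix_fmt", "yuv420p",  # widely compatible pixel format
        "-shortest",            # stop encoding when the audio ends
        output,
    ]


if __name__ == "__main__":
    cmd = still_image_video_command(
        "cover.png", "interview01.mp3", "interview01.mp4"
    )
    print(" ".join(cmd))
    # To actually run it (assumes ffmpeg is on your PATH and the files exist):
    # import subprocess; subprocess.run(cmd, check=True)
```

The script only prints the command by default, so you can look it over before running anything. One caveat: the yuv420p pixel format expects an image whose width and height are even numbers.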
Once auto-caption is turned on, you can then download the captioning as a text file using the process outlined here.
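Depending on the format you download, the caption file may still be full of cue numbers and timestamps. Here is a small sketch for stripping those out, assuming the captions were saved in the common SRT format; other formats would need a different pattern, and the function name is my own.

```python
import re

# Matches an SRT timing line such as "00:00:02,500 --> 00:00:05,000".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s+-->\s+\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Strip cue numbers and timestamps from SRT captions, keeping the words."""
    kept = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Drop the numeric cue index if present.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Drop the timing line if present.
        if lines and TIMESTAMP.match(lines[0]):
            lines = lines[1:]
        kept.extend(lines)
    return " ".join(kept)


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:02,500\nso tell me about\n\n"
        "2\n00:00:02,500 --> 00:00:05,000\nyour fieldwork\n"
    )
    print(srt_to_text(sample))
```

The result is one long run of text with no punctuation or speaker labels, so you’ll still be doing the cleanup pass by hand.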
Although the closed captioning isn’t super-accurate, it’s instantaneous, it’s free, and there’s no limit on how many files you can convert in this fashion. Not only that, but because of the global ubiquity of YouTube, this automatic captioning can be done in more than a dozen different languages. At this time, that includes Dutch, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, and Vietnamese. More languages are added on a regular basis.
Epilogue: Better Shoestrings are Cheaper than you Think. Try Otter.ai
Research happens in a variety of contexts, and a “shoestring budget” may mean any number of different things. The workarounds listed above all have their shortcomings, and more often than not the only way to compensate for their limitations is by spending more time cleaning up the final transcription text - and we all know that time is its own cost.
If you need transcription of more than four or five interview recordings, and you’re on a tight budget, skip the workarounds. Machine-learning audio transcription tools get better and cheaper every year.
Otter.ai is your best bet.
I’ve used their platform a few times; the quality is good and it’s easy to use. It’s English-only at this time, but the software is designed to handle numerous accents from English speakers around the world, and it also accommodates the use of specialized names and technical terminology. The price tag for the pro version is $12.99 for a month, or $99.99 for a full year.
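Which plan makes sense depends on how long your project runs. A quick sketch of the comparison, using only the two prices quoted above (the function name is mine, and prices may have changed since this was written):

```python
MONTHLY_PRICE = 12.99  # pro plan, one month, as quoted above
ANNUAL_PRICE = 99.99   # pro plan, one year, as quoted above


def cheaper_plan(months_needed: int) -> str:
    """Return 'monthly' or 'annual' for a project lasting up to 12 months."""
    if not 1 <= months_needed <= 12:
        raise ValueError("months_needed must be between 1 and 12")
    monthly_total = months_needed * MONTHLY_PRICE
    return "monthly" if monthly_total < ANNUAL_PRICE else "annual"


if __name__ == "__main__":
    for months in (3, 7, 8):
        print(months, cheaper_plan(months))
```

Paying month to month stays cheaper through seven months of use; from the eighth month on, the annual plan wins.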
The amount of hassle that this alleviates is worth way more than the cost of a few cups of coffee.
Uxeda is not affiliated with or compensated by Otter.ai or any of the other products mentioned here. We’re just social science researchers like you, who are trying to do quality work in challenging environments.