Transform Speech to Text with Python and AWS

Robert Taylor July 10, 2023

We all have moments when ideas flow more easily through speech than through typing. I find this is especially true when journaling, brainstorming, or working on a complex problem, so I often prefer to dictate my thoughts whenever possible. Dictation can be more efficient, since it lets me lay down ideas quickly while potentially doing something else at the same time, and it can also be very relaxing.

However, audio files containing speech data are not the easiest to integrate into a workflow. The dictations must first be transformed from spoken words into written text before they can become useful to future me. This has led me to develop a humble yet practical Python tool that leverages the AWS Transcribe service to automate the transcription process as much as possible. The tool works by first pushing my recorded speech files from my computer to an AWS S3 bucket in the cloud, then triggering one or more transcription jobs, and finally pulling down the resulting text transcriptions once the jobs are complete.
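The three steps described above can be sketched with boto3, the AWS SDK for Python. This is only an illustration, not the tool's actual source: the function names, the one-job-per-file naming scheme, the `transcripts/` key prefix, and the `en-US` language code are all assumptions. The clients are passed in as arguments so the logic can be exercised without real AWS credentials.

```python
import time
from pathlib import Path


def push_and_transcribe(s3, transcribe, bucket, audio_path):
    """Upload one audio file to S3 and start a transcription job for it."""
    audio_path = Path(audio_path)
    key = audio_path.name
    # Hypothetical naming scheme: one job per file, named after the file stem.
    # Job names may only contain letters, digits, '.', '_' and '-'.
    job_name = audio_path.stem

    s3.upload_file(str(audio_path), bucket, key)
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        LanguageCode="en-US",  # assumption: English dictation
        OutputBucketName=bucket,
        OutputKey=f"transcripts/{job_name}.json",  # assumed output location
    )
    return job_name


def wait_for_job(transcribe, job_name, poll_seconds=5.0, max_polls=60):
    """Poll until the job is COMPLETED or FAILED and return that status."""
    for _ in range(max_polls):
        response = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        status = response["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_name!r} did not finish in time")


def pull_transcript(s3, bucket, job_name, dest_dir):
    """Download the job's JSON output into dest_dir and return the local path."""
    dest = Path(dest_dir) / f"{job_name}.json"
    dest.parent.mkdir(parents=True, exist_ok=True)
    s3.download_file(bucket, f"transcripts/{job_name}.json", str(dest))
    return dest
```

With credentials configured, the two clients would come from `boto3.client("s3")` and `boto3.client("transcribe")`.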

It’s not a groundbreaking tool, but its simplicity and ease of use have made my day-to-day dictation transcription process much more efficient. An added benefit is that it keeps me in control of my own data. I’m making the tool open-source in the hope that others find it useful too. Find the source code, installation instructions, and detailed usage info here.

Basic Usage

Before the tool can fully operate, you must provide it with your own AWS S3 bucket. AWS Transcribe requires a bucket for temporarily storing the input and output files during processing. Check out the project’s README for more info about how to configure the S3 bucket.

Once you have the bucket ready, the Python tool needs to know where to find it. This is where a config file comes into play. At a minimum, the config file must specify the name of your bucket. Here’s an example (the file can be called anything, but let’s assume it’s named transcriber.conf):

[bucket]
bucket_name = my-transcription-bucket
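
A file in this format can be read with Python's standard `configparser` module. The following is a minimal sketch rather than the tool's own loader; the function name `load_bucket_name` is hypothetical.

```python
import configparser


def load_bucket_name(config_text):
    """Return the bucket name from INI-style configuration text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Raises KeyError if the [bucket] section or the bucket_name option is missing.
    return parser["bucket"]["bucket_name"]


example = """
[bucket]
bucket_name = my-transcription-bucket
"""
print(load_bucket_name(example))  # prints: my-transcription-bucket
```

For a file on disk, `parser.read(path)` takes the place of `read_string`.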

The Python tool operates in two primary modes: push and pull. In the push mode, the tool uploads your audio files from your local directory to the specified S3 bucket for transcription. Conversely, in pull mode, it fetches the transcribed text files from the S3 bucket back to your local directory after the transcription is completed. In both cases, you must provide the tool with the config file via the -c or --config option, followed by the path to your configuration file.

To push your speech files for transcription, do:

transcribe.py -c transcriber.conf push

Once the transcriptions are complete (this can take anywhere from a few seconds to a couple of minutes), pull the text files down with:

transcribe.py -c transcriber.conf pull
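
The command-line shape shown above (a `-c` or `--config` option followed by a `push` or `pull` mode) maps naturally onto Python's `argparse`. This is a sketch of how such an interface could be declared, not the tool's real parser; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare a CLI with a required config option and a push/pull mode."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument(
        "-c", "--config", required=True, help="path to the configuration file"
    )
    parser.add_argument(
        "mode", choices=["push", "pull"], help="upload audio or download transcripts"
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-c", "transcriber.conf", "push"])
print(args.config, args.mode)  # prints: transcriber.conf push
```

Restricting `mode` with `choices` means an unknown mode is rejected with a usage message before any file is touched.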

What’s convenient about the tool is its smart handling of files. If you attempt to push an audio file that already exists in the S3 bucket or pull a transcription that’s already in your local directory, the tool will skip these files, preventing unnecessary duplication.
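
The skip behavior amounts to comparing what exists on each side before transferring anything. Here is a minimal sketch of that comparison, assuming files are matched purely by name; the helper name is hypothetical, and the real tool may match files differently.

```python
def names_to_transfer(source_names, destination_names):
    """Return the names present at the source but missing at the destination."""
    already_there = set(destination_names)
    return sorted(name for name in source_names if name not in already_there)


local_audio = ["monday.wav", "tuesday.wav", "wednesday.wav"]
in_bucket = ["monday.wav"]
print(names_to_transfer(local_audio, in_bucket))
# prints: ['tuesday.wav', 'wednesday.wav']
```

The same function serves both directions: for a push, the source is the local audio folder and the destination is the bucket; for a pull, the roles are reversed.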

By default, the tool looks for your audio files in a folder named “audio” and saves the transcriptions in a “transcripts” folder. Both folders are assumed to be in your current working directory. If you prefer different locations, the configuration file allows you to override these default paths.
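
Defaults with optional overrides are straightforward to express with `configparser` fallbacks. In this sketch, the `[paths]` section and the `audio_dir` and `transcripts_dir` option names are invented for illustration; the project's README documents the actual setting names.

```python
import configparser
from pathlib import Path


def resolve_dirs(config_text, cwd=None):
    """Return (audio_dir, transcripts_dir), applying defaults when unset."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Hypothetical option names; the fallback is used when the section
    # or the option is absent from the configuration.
    audio = parser.get("paths", "audio_dir", fallback="audio")
    transcripts = parser.get("paths", "transcripts_dir", fallback="transcripts")
    # Relative values resolve against the working directory; absolute ones win.
    return base / audio, base / transcripts
```

A config that only names the bucket yields the default folders, while one that sets either hypothetical option replaces just that path.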

For more detailed information on how to use the tool and customize its settings, I recommend checking out the project’s README. This tool is designed to be flexible and adapt to your workflow, making the transcription process as seamless as possible. Happy transcribing!


