speech-to-text

Here’s how to generate transcripts with the Google Speech-to-Text API.

I’m providing instructions for running this on a micro Debian instance on Google Cloud Compute Engine.

Here’s how to download an interview Tyler Cowen did with Malcolm Gladwell.

1) Install youtube-dl (YDL, the YouTube download library) for the command line

The version of youtube-dl available through apt-get seems to be outdated (I’m using Debian).

Alternate YDL Option:

If you can’t run youtube-dl because Debian has an old version, try:

sudo apt install python3-pip

sudo pip3 install --upgrade youtube-dl

2) Get the FFmpeg library

sudo apt-get update

sudo apt-get install ffmpeg

3) Running the YDL Download Command

youtube-dl --extract-audio --audio-format wav --audio-quality 5 --postprocessor-args "-ac 1" -o "malcolm_gladwell.%(ext)s" "https://www.youtube.com/watch?v=ehlhrqSWPbo"
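The same download can be scripted from Python. The sketch below only assembles the argument list for youtube-dl, passing the output name through the -o template option instead of shell redirection. The function name build_download_command and its basename parameter are illustrative assumptions, not part of youtube-dl.

```python
import shlex


def build_download_command(url, basename):
    """Return the youtube-dl argument list for a mono WAV download.

    `basename` is a hypothetical parameter: the output file name
    without its extension.
    """
    return [
        "youtube-dl",
        "--extract-audio",
        "--audio-format", "wav",
        "--audio-quality", "5",
        "--postprocessor-args", "-ac 1",  # ask ffmpeg for a single channel
        "-o", basename + ".%(ext)s",      # youtube-dl fills in the extension
        url,
    ]


cmd = build_download_command(
    "https://www.youtube.com/watch?v=ehlhrqSWPbo", "malcolm_gladwell"
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) would run the download, provided youtube-dl and ffmpeg are installed.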

4) Copy .wav file to Storage Bucket

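Optionally, install gcsfuse if you want to mount the bucket as a local directory. To copy the file directly with gsutil (replace bucket_name with the name of your bucket):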

gsutil cp malcolm_gladwell.wav gs://bucket_name/

5) Send the .wav file in the bucket to the Speech-to-Text API

gcloud ml speech recognize-long-running \
'gs://citeit_speech_text/malcolm_gladwell.wav' \
--include-word-time-offsets \
--language-code='en-US' \
--async

Check Job Progress
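
With --async, the recognize command returns an operation ID rather than the transcript. One way to check on the job is gcloud ml speech operations describe OPERATION_ID, which prints the operation as JSON. The Python sketch below reads such a description and reports whether the job has finished. The sample string is made up for illustration, and the code assumes the description carries the "done" and "metadata.progressPercent" fields of the API's operation format.

```python
import json


def operation_status(operation_json):
    """Summarize a long-running operation description.

    Returns (done, progress_percent). Both fields are treated as
    optional, since a freshly started job may not report them yet.
    """
    op = json.loads(operation_json)
    done = bool(op.get("done", False))
    progress = op.get("metadata", {}).get("progressPercent", 0)
    return done, progress


# Hypothetical description of a job that is still running.
sample = '{"name": "1234567890", "metadata": {"progressPercent": 42}}'
print(operation_status(sample))
```

Once the job reports that it is done, the same description also contains the transcript under its "response" key.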

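Recognition settings when using the Python client library (google-cloud-speech) instead of gcloud:

from google.cloud.speech import types  # import path used by 1.x releases of google-cloud-speech

config = types.RecognitionConfig(
    sample_rate_hertz=44100,
    enable_word_time_offsets=True,
    audio_channel_count=1,  # the download step produces mono audio ("-ac 1")
    language_code='en-US'
)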

6) Download transcript.json

sudo pip3 install --upgrade webvtt-py

import webvtt

for caption in webvtt.read('downloads/5BXtgq0Nhsc.en.vtt'):
    print(caption.text)

7) Create timings.json
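
The source does not spell out the layout of timings.json, so the following is only a sketch under an assumed layout: a flat list of words with start and end times in seconds. It reads the word offsets that --include-word-time-offsets adds to the response ("results" > "alternatives" > "words", with times written like "1.300s"). The helper names parse_offset and extract_timings are hypothetical.

```python
import json


def parse_offset(value):
    """Convert an offset string such as "1.300s" to float seconds."""
    return float(value.rstrip("s"))


def extract_timings(response):
    """Flatten the word timings of the first alternative of each result."""
    timings = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for w in alternatives[0].get("words", []):
            timings.append({
                "word": w["word"],
                "start": parse_offset(w["startTime"]),
                "end": parse_offset(w["endTime"]),
            })
    return timings


# Made-up response fragment in the shape described above.
sample = {
    "results": [{
        "alternatives": [{
            "transcript": "hello world",
            "words": [
                {"startTime": "0s", "endTime": "0.400s", "word": "hello"},
                {"startTime": "0.400s", "endTime": "1.300s", "word": "world"},
            ],
        }]
    }]
}
print(json.dumps(extract_timings(sample), indent=2))
```

To produce the file itself, the list can be written with json.dump to timings.json. When the input comes from an operation description, pass the object stored under its "response" key.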

8) Create a Demo Page

Draft algorithm to match the edited transcript to the auto-generated transcript with timings (open the browser’s developer console to view the console log).
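
One possible way to do this matching is a word-level diff. The sketch below uses Python's difflib.SequenceMatcher, a general-purpose technique swapped in for illustration; it is not necessarily the algorithm used on the demo page. It assumes the timed words are dictionaries with "word", "start", and "end" keys, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase and strip surrounding punctuation so tokens compare cleanly."""
    return word.lower().strip(".,!?;:\"'()")


def align_transcripts(edited_words, timed_words):
    """Attach timings from the auto transcript to matching edited words.

    edited_words: list of str from the human-edited transcript.
    timed_words: list of dicts with "word", "start", "end" (assumed layout).
    Edited words without a match keep None for start and end.
    """
    a = [normalize(w) for w in edited_words]
    b = [normalize(t["word"]) for t in timed_words]
    aligned = [{"word": w, "start": None, "end": None} for w in edited_words]
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # Each matching block is a run of identical normalized words.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            source = timed_words[block.b + k]
            aligned[block.a + k]["start"] = source["start"]
            aligned[block.a + k]["end"] = source["end"]
    return aligned


edited = ["Hello,", "my", "dear", "world"]
timed = [
    {"word": "hello", "start": 0.0, "end": 0.4},
    {"word": "um", "start": 0.4, "end": 0.6},
    {"word": "my", "start": 0.6, "end": 0.8},
    {"word": "world", "start": 0.8, "end": 1.2},
]
for entry in align_transcripts(edited, timed):
    print(entry)
```

Words that exist only in the edited transcript (here "dear") are left without timings; a fuller version might interpolate them from their neighbors.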