Here’s how to generate transcripts with the Google Speech-to-Text API.
These instructions assume a micro Debian VM on Google Compute Engine.
As a worked example, we’ll download and transcribe an interview Tyler Cowen did with Malcolm Gladwell.
1) Install youtube-dl (YDL) for the command line
On Debian, the apt-get package seems to ship an outdated version of youtube-dl:
sudo apt-get install youtube-dl
Alternate YDL option:
If youtube-dl fails because Debian has an old version, install the latest release with pip instead:
sudo apt install python3-pip
sudo pip3 install --upgrade youtube-dl
2) Install FFmpeg (youtube-dl needs it to extract the audio track)
sudo apt-get update
sudo apt-get install ffmpeg
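3) Run the youtube-dl download command
The -o template names the output file. youtube-dl writes the file itself, so redirecting its output with > would only capture the progress log. The "-ac 1" post-processor argument tells FFmpeg to downmix the audio to mono:
youtube-dl --extract-audio --audio-format wav --audio-quality 5 --postprocessor-args "-ac 1" -o 'malcolm_gladwell.%(ext)s' 'https://www.youtube.com/watch?v=ehlhrqSWPbo'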
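Before uploading, it can help to confirm what FFmpeg actually produced, because the sample rate and channel count in the recognition config have to agree with the WAV header. A minimal sketch using Python's standard wave module; the helper name wav_properties is hypothetical:

```python
import wave


def wav_properties(path: str) -> dict:
    """Read the channel count, sample rate and duration from a WAV header."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "channels": channels,
        "sample_rate_hertz": rate,
        "seconds": frames / rate,
    }


if __name__ == "__main__":
    # File name taken from the download step; adjust it to match yours.
    print(wav_properties("malcolm_gladwell.wav"))
```

If "channels" comes back as 1, the "-ac 1" downmix worked, and "sample_rate_hertz" is the value to use (or omit) in the recognition config.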
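4) Copy the .wav file to a Cloud Storage bucket
Optionally, install gcsfuse to mount the bucket as a local directory (bucket names must be lowercase):
gcsfuse bucket_name ~/path/to/mount
Otherwise, copy the file directly with gsutil:
gsutil cp malcolm_gladwell.wav gs://bucket_name/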
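If you’d rather upload from Python, a sketch with the google-cloud-storage client library might look like this. upload_to_bucket is a hypothetical helper and bucket_name is a placeholder; the client is passed in so the function can be exercised without touching Cloud Storage:

```python
def upload_to_bucket(client, bucket_name: str, local_path: str, blob_name: str) -> str:
    """Upload a local file to Cloud Storage and return its gs:// URI."""
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return f"gs://{bucket_name}/{blob_name}"


if __name__ == "__main__":
    # Assumes `pip3 install google-cloud-storage` and working credentials.
    from google.cloud import storage

    uri = upload_to_bucket(
        storage.Client(), "bucket_name", "malcolm_gladwell.wav", "malcolm_gladwell.wav"
    )
    print(uri)  # the URI to pass to the Speech-to-Text request
```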
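5) Send the .wav file to the Speech-to-Text API
Set the service-account key that the Google Cloud client libraries use:
export GOOGLE_APPLICATION_CREDENTIALS=~/.google-credentials.json
Start an asynchronous long-running recognition job: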
gcloud ml speech recognize-long-running \
'gs://bucket_name/malcolm_gladwell.wav' \
--include-word-time-offsets \
--language-code='en-US' \
--async
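The same job can be started from Python with the google-cloud-speech client library. This is a sketch, not the only way to call it: start_long_running_recognition is a hypothetical helper, the gs:// URI is a placeholder, and the request is passed as a plain dict, which the client converts into a LongRunningRecognizeRequest:

```python
def start_long_running_recognition(client, gcs_uri: str, language_code: str = "en-US"):
    """Start an asynchronous recognition job for audio stored in Cloud Storage.

    Returns the long-running operation; call .result() on it to wait for the transcript.
    """
    request = {
        "config": {
            "language_code": language_code,
            "enable_word_time_offsets": True,  # same as --include-word-time-offsets
        },
        "audio": {"uri": gcs_uri},
    }
    return client.long_running_recognize(request=request)


if __name__ == "__main__":
    # Assumes `pip3 install google-cloud-speech` and GOOGLE_APPLICATION_CREDENTIALS.
    from google.cloud import speech

    operation = start_long_running_recognition(
        speech.SpeechClient(), "gs://bucket_name/malcolm_gladwell.wav"
    )
    response = operation.result(timeout=3600)  # blocks until the job finishes
    for result in response.results:
        print(result.alternatives[0].transcript)
```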
Check job progress, using the operation ID that gcloud printed when the job started:
gcloud ml speech operations describe 2046110510732500677
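Config options (the equivalent settings in the Python client library, google-cloud-speech):
from google.cloud import speech

config = speech.RecognitionConfig(
    sample_rate_hertz=44100,  # for WAV, must match the file header (or omit it)
    enable_word_time_offsets=True,
    audio_channel_count=1,  # mono, matching the "-ac 1" download step
    language_code='en-US',
)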
6) Save the finished transcript as JSON
gcloud ml speech operations describe 2046110510732500677 --format=json > malcolm-gladwell-transcript.json
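A sketch for pulling the plain text out of the saved file. It assumes the JSON mirrors the REST form of the operation, with the finished transcript under response.results[].alternatives[]; check your own file before relying on that. load_transcript is a hypothetical name:

```python
import json


def load_transcript(path: str) -> str:
    """Join the top alternative of each result in a saved operation JSON."""
    with open(path) as f:
        operation = json.load(f)
    if not operation.get("done"):
        raise RuntimeError("The recognition job has not finished yet.")
    parts = []
    for result in operation.get("response", {}).get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # The first alternative is the most likely one.
            parts.append(alternatives[0].get("transcript", "").strip())
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    print(load_transcript("malcolm-gladwell-transcript.json"))
```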
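6a) Download YouTube’s auto-generated transcript
Install webvtt-py, then print the text of each caption in the downloaded .vtt subtitle file:
sudo pip3 install --upgrade webvtt-py
import webvtt

# Print the text of every caption in the auto-generated subtitle file
for caption in webvtt.read('downloads/5BXtgq0Nhsc.en.vtt'):
    print(caption.text)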
7) Create timings.json
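The layout of timings.json isn’t specified here. As one hypothetical layout, this sketch flattens the word time offsets into a list of {"word", "start", "end"} entries in seconds. It assumes the offsets appear as startTime/endTime duration strings such as "1.300s"; proto3 JSON can omit a zero startTime, hence the "0s" default:

```python
import json


def duration_to_seconds(value: str) -> float:
    """Convert a protobuf Duration string such as "1.300s" to a float."""
    return float(value.rstrip("s"))


def build_timings(operation: dict) -> list:
    """Flatten word time offsets into [{"word", "start", "end"}, ...]."""
    timings = []
    for result in operation.get("response", {}).get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for info in alternatives[0].get("words", []):
            timings.append({
                "word": info["word"],
                "start": duration_to_seconds(info.get("startTime", "0s")),
                "end": duration_to_seconds(info.get("endTime", "0s")),
            })
    return timings


if __name__ == "__main__":
    with open("malcolm-gladwell-transcript.json") as f:
        timings = build_timings(json.load(f))
    with open("timings.json", "w") as f:
        json.dump(timings, f, indent=2)
```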
8) Create a Demo Page
- Draft an algorithm that matches the edited transcript to the auto-generated transcript with timings (open the browser’s developer console to see the log output)
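The demo page’s own matching algorithm isn’t shown here. As a stand-in, this Python sketch aligns the edited transcript to the timed words with difflib.SequenceMatcher, which finds the longest runs of identical (normalized) words; words the editor added or changed come back without times. The helper names and the timing layout are assumptions, not the author’s algorithm:

```python
import difflib
import re


def normalize(word: str) -> str:
    """Lower-case and strip punctuation so "Gladwell," matches "gladwell"."""
    return re.sub(r"[^\w']", "", word.lower())


def align(edited_text: str, timings: list) -> list:
    """Attach start/end times from timed words to words of an edited transcript.

    Words with no counterpart in the timed transcript keep start/end = None.
    """
    edited = edited_text.split()
    matcher = difflib.SequenceMatcher(
        a=[normalize(w) for w in edited],
        b=[normalize(t["word"]) for t in timings],
        autojunk=False,  # keep common words like "the" eligible for matching
    )
    aligned = [{"word": w, "start": None, "end": None} for w in edited]
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            source = timings[block.b + k]
            aligned[block.a + k]["start"] = source["start"]
            aligned[block.a + k]["end"] = source["end"]
    return aligned
```

SequenceMatcher accepts any sequences of hashable items, so comparing word lists rather than raw characters keeps every match on a word boundary.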