Here’s how to generate transcripts with the Google Speech-to-Text API.
I’m providing instructions for running this on a micro Debian instance on Google Cloud Compute Engine.
As the example, I’ll download and transcribe an interview Tyler Cowen did with Malcolm Gladwell.
1) Install youtube-dl (YDL, the YouTube download tool) for the command line
The apt-get package seems to be an outdated version of youtube-dl (I’m using Debian).
Alternate YDL option:
If you can’t run youtube-dl because Debian has an old version, try:
sudo apt install python3-pip
sudo pip3 install --upgrade youtube-dl
2) Get the FFmpeg library
sudo apt-get update
sudo apt-get install ffmpeg
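3) Run the YDL download command
youtube-dl writes the audio file itself, so name it with -o rather than redirecting stdout (a redirect only captures the log output):
youtube-dl --extract-audio --audio-format wav --audio-quality 5 --postprocessor-args "-ac 1" -o "malcolm_gladwell.%(ext)s" "https://www.youtube.com/watch?v=ehlhrqSWPbo"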
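If you want to drive the download from a script, the command can be assembled as an argument list. This is only a sketch: the helper name build_ydl_command is made up for illustration, and it uses youtube-dl's -o output template to name the file, since youtube-dl writes the audio file itself rather than sending it to stdout.

```python
import shlex


def build_ydl_command(url, basename):
    """Return the youtube-dl argument list for a mono WAV download.

    build_ydl_command is a hypothetical helper, not part of youtube-dl.
    """
    return [
        "youtube-dl",
        "--extract-audio",
        "--audio-format", "wav",
        "--audio-quality", "5",
        "--postprocessor-args", "-ac 1",  # passed to ffmpeg: downmix to one channel
        "-o", basename + ".%(ext)s",      # youtube-dl fills in the extension
        url,
    ]


cmd = build_ydl_command(
    "https://www.youtube.com/watch?v=ehlhrqSWPbo", "malcolm_gladwell"
)
# Print a copy-pasteable shell command with safe quoting.
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with the "?" in the URL.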
4) Copy the .wav file to a Storage bucket
Install gcsfuse if you want to mount the bucket as a directory; a plain copy with gsutil doesn’t need it:
gsutil cp malcolm_gladwell.wav gs://bucket_name/
5) Submit the .wav file to the Speech-to-Text API
gcloud ml speech recognize-long-running \
'gs://citeit_speech_text/malcolm_gladwell.wav' \
--include-word-time-offsets \
--language-code='en-US' \
--async
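Check job progress (--async prints an operation ID; OPERATION_ID below is a placeholder for it):
gcloud ml speech operations describe OPERATION_ID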
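A matching recognition config for the Python client library (google-cloud-speech):
from google.cloud.speech_v1 import types
config = types.RecognitionConfig(
    sample_rate_hertz=44100,          # must match the .wav file
    enable_word_time_offsets=True,    # per-word start/end times
    audio_channel_count=1,            # mono, because of "-ac 1" in step 3
    language_code='en-US',
)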
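The sample_rate_hertz and audio_channel_count values have to agree with the audio file, so it can help to read them from the WAV header before submitting. Here is a minimal sketch using Python's standard wave module. It assumes a plain PCM .wav file; the helper names and the demo_silence.wav file are made up so the example can run without the real download.

```python
import wave


def wav_properties(path):
    """Read channel count, sample rate and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "audio_channel_count": wav.getnchannels(),
            "sample_rate_hertz": wav.getframerate(),
            "duration_seconds": wav.getnframes() / wav.getframerate(),
        }


def write_silence(path, seconds=1, rate=44100, channels=1):
    """Write a short silent 16-bit WAV, only so the demo has a file to read."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds * channels)


write_silence("demo_silence.wav")
props = wav_properties("demo_silence.wav")
print(props)
```

Point wav_properties at malcolm_gladwell.wav to see which values to put in the config.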
6) Download transcript.json
sudo pip3 install --upgrade webvtt-py
import webvtt
# Print the text of each caption in the downloaded .vtt file
for caption in webvtt.read('downloads/5BXtgq0Nhsc.en.vtt'):
    print(caption.text)
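Besides caption.text, each webvtt-py caption also has start and end timestamps as strings such as "00:01:02.500". To compare them with other timings it is handy to have them as seconds. The sketch below uses only the standard library, and vtt_timestamp_to_seconds is a made-up helper name.

```python
def vtt_timestamp_to_seconds(stamp):
    """Convert a WebVTT timestamp ("HH:MM:SS.mmm" or "MM:SS.mmm") to seconds."""
    parts = stamp.strip().split(":")
    if len(parts) not in (2, 3):
        raise ValueError("unexpected timestamp: %r" % stamp)
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[0]) if len(parts) == 3 else 0  # hours are optional in WebVTT
    return hours * 3600 + minutes * 60 + seconds


print(vtt_timestamp_to_seconds("00:01:02.500"))
```

Inside the loop above this would be called as vtt_timestamp_to_seconds(caption.start).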
7) Create timings.json
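One possible way to build timings.json is to flatten the per-word offsets in transcript.json into a simple list. This is a sketch under assumptions: it expects the usual Speech-to-Text JSON shape (results, alternatives, words with startTime and endTime strings like "1.300s"), optionally wrapped in a "response" key as in the long-running operation output, and the layout chosen for timings.json is my own, not a fixed format. The sample data is made up.

```python
import json


def parse_offset(value):
    """Convert a duration string such as "1.300s" to float seconds."""
    return float(value.rstrip("s"))


def extract_timings(transcript):
    """Flatten word offsets into a list of {"word", "start", "end"} dicts."""
    # Operation output nests the results under "response"; accept both shapes.
    body = transcript.get("response", transcript)
    timings = []
    for result in body.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # Use only the first (top-ranked) alternative of each result.
        for word in alternatives[0].get("words", []):
            timings.append({
                "word": word["word"],
                "start": parse_offset(word["startTime"]),
                "end": parse_offset(word["endTime"]),
            })
    return timings


# Made-up sample in the assumed shape, so the sketch runs without API access.
sample = {
    "response": {
        "results": [{
            "alternatives": [{
                "transcript": "hello world",
                "words": [
                    {"word": "hello", "startTime": "0s", "endTime": "0.400s"},
                    {"word": "world", "startTime": "0.400s", "endTime": "1.300s"},
                ],
            }]
        }]
    }
}

timings = extract_timings(sample)
print(json.dumps(timings, indent=2))
# To save: with open("timings.json", "w") as f: json.dump(timings, f)
```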
8) Create a Demo Page
- Draft algorithm to match the edited transcript to the auto-generated transcript with timings (open the browser’s developer console to see the log)
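To give a feel for what such a matching step could look like, here is a rough sketch that aligns the edited words with the auto-generated words using difflib from the standard library and copies timings across wherever the words agree. This is not the draft algorithm from the demo page, just one simple approach. It assumes the timings are a list of {"word", "start", "end"} dicts, and the sample words and times are made up.

```python
import difflib


def normalize(word):
    """Lowercase and drop punctuation so "Quick," matches "quick"."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def align_timings(edited_words, auto_timings):
    """Copy start/end times onto edited words that match the auto transcript.

    Edited words with no match keep start and end set to None.
    """
    edited_keys = [normalize(w) for w in edited_words]
    auto_keys = [normalize(t["word"]) for t in auto_timings]
    matcher = difflib.SequenceMatcher(None, edited_keys, auto_keys, autojunk=False)
    aligned = [{"word": w, "start": None, "end": None} for w in edited_words]
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            source = auto_timings[block.b + offset]
            aligned[block.a + offset]["start"] = source["start"]
            aligned[block.a + offset]["end"] = source["end"]
    return aligned


# Made-up auto-generated timings, including a filler word the editor removed.
auto = [
    {"word": "um", "start": 0.0, "end": 0.2},
    {"word": "the", "start": 0.2, "end": 0.4},
    {"word": "quick", "start": 0.4, "end": 0.8},
    {"word": "brown", "start": 0.8, "end": 1.1},
    {"word": "fox", "start": 1.1, "end": 1.5},
]
edited = "The quick, brown foxes.".split()

aligned = align_timings(edited, auto)
for item in aligned:
    print(item)
```

Words the editor changed ("foxes" here) come back without timings, so a later pass could interpolate them from their neighbours.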