Python: Get YouTube Video Transcript From URL For Use In Generative AI And RAG Summarization

Sometimes you need to download the text transcript of a YouTube video using Python. The youtube_transcript_api library makes this possible. Access to the text content of YouTube videos is useful when working with Generative AI and Retrieval Augmented Generation (RAG): you can summarize or analyze a video, or pass its transcript as additional context in your AI prompts.

Step-by-Step Guide to Extract YouTube Transcript

Step 1: Install the Required Library

First, install the youtube_transcript_api library using pip:

pip install youtube_transcript_api

Step 2: Python Code to Get Transcript from URL

Here’s the Python script you can use:

from urllib.parse import urlparse, parse_qs

from youtube_transcript_api import YouTubeTranscriptApi

# Example YouTube URL
url = 'https://www.youtube.com/watch?v=eukiu-9o-08'
print(url)

# Extracting the Video ID from the "v" query parameter, so extra
# parameters such as &t=42s do not end up in the ID
video_id = parse_qs(urlparse(url).query)['v'][0]
print(video_id)

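# Fetching the transcript
# Note: get_transcript is the library's older static call. In recent 1.x
# releases, YouTubeTranscriptApi().fetch(video_id).to_raw_data() should
# return the same list of dictionaries; check your installed version.
transcript = YouTubeTranscriptApi.get_transcript(video_id)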

# Output the full raw transcript content (includes time and duration info)
print('FULL TRANSCRIPT:')
print(transcript)

# Converting the transcript into readable text, one segment per line
transcript_text = '\n'.join(segment['text'] for segment in transcript)

# Output just the text of the transcript
print('TRANSCRIPT TEXT:')
print(transcript_text)

Understanding the Video ID

YouTube assigns each video a unique identifier called the video ID, typically found in the URL after watch?v=. For instance, in the URL https://www.youtube.com/watch?v=2N-rwsa5lEw, the video ID is 2N-rwsa5lEw. The Python script extracts this ID so that it requests the transcript for the right video.
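
YouTube links come in more than one shape, so a more tolerant extractor can be useful. The following is a minimal sketch, not part of the library: extract_video_id is a hypothetical helper name, and the URL shapes it handles (watch?v=, youtu.be short links, and /embed/, /shorts/, /live/ paths) are assumptions about which links you are likely to meet.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Normalize www. and mobile subdomains
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]

    if host == "youtu.be":
        # Short links look like https://youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0] or None

    if host == "youtube.com":
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("embed", "shorts", "live"):
            return parts[1]

    return None


print(extract_video_id("https://www.youtube.com/watch?v=2N-rwsa5lEw"))
print(extract_video_id("https://youtu.be/eukiu-9o-08"))
```

Returning None for unrecognized links lets the caller decide whether to skip the URL or raise an error.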

What Does the Transcript Look Like?

The transcript retrieved via the youtube_transcript_api is a list of dictionaries, one per caption segment. Each dictionary contains:

text: The spoken words in the segment.
start: When the segment starts, in seconds from the beginning of the video.
duration: How long the segment lasts, in seconds.

To make the transcript easier to use, the script keeps only the text field of each segment and joins the segments into clean, readable text.

Here’s part of the raw transcript of a video:

[{'text': 'hi everyone welcome back to build 59', 'start': 0.16, 'duration': 5.56},
{'text': 'your source for cloud devops and AI', 'start': 2.639, 'duration': 4.921}, 
{'text': "Technologies if you're new here my name", 'start': 5.72, 'duration': 3.4}, 
{'text': "is Chris peachman I'm a longtime", 'start': 7.56, 'duration': 4.84},
{'text': 'Microsoft MVP Hashi Ambassador and', 'start': 9.12, 'duration': 5.88},
{'text': 'Microsoft certified trainer on this', 'start': 12.4, 'duration': 5.0},
{'text': 'channel we dive into best practices', 'start': 15.0, 'duration': 4.88},
{'text': 'tools and Frameworks to supercharge your', 'start': 17.4, 'duration': 5.56}, 
{'text': 'development workflows devops processes', 'start': 19.88, 'duration': 6.0}, 
{'text': "and things in the cloud don't forget to", 'start': 22.96, 'duration': 4.399},
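
The start values can also be kept rather than discarded, for example to produce a timestamped transcript. This is a small sketch under the assumption that the segments are dictionaries shaped like the ones above; format_timestamp and timestamped_lines are hypothetical helper names.

```python
def format_timestamp(seconds: float) -> str:
    """Format a second offset as MM:SS, or H:MM:SS past one hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def timestamped_lines(segments):
    """Prefix each segment's text with its start time."""
    return [f"[{format_timestamp(s['start'])}] {s['text']}" for s in segments]


# Two segments copied from the raw transcript above
sample = [
    {'text': 'hi everyone welcome back to build 59', 'start': 0.16, 'duration': 5.56},
    {'text': 'your source for cloud devops and AI', 'start': 2.639, 'duration': 4.921},
]

for line in timestamped_lines(sample):
    print(line)
```

Keeping the timestamps is helpful when a summary or answer should point back to the moment in the video it came from.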

How Can YouTube Transcripts Support Generative AI and RAG?

YouTube transcripts give Generative AI applications text they can work with directly, including applications built on Retrieval Augmented Generation (RAG). This section covers the main ways transcripts can feed Generative AI workflows and help produce more accurate, better-grounded output.

Generative AI Applications

Transcripts are a natural input for Generative AI models. By feeding them into Large Language Models (LLMs), you can:

Summarize lengthy videos.
Generate detailed analysis or insights.
Create question-answering interfaces based on video content.
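
Long transcripts may not fit in a single prompt, so one common approach is to split the text into chunks and build one prompt per chunk. The sketch below only prepares the prompts and does not call any model. The helper names (chunk_words, build_summary_prompt), the word limit, and the prompt wording are all assumptions for illustration.

```python
from typing import List


def chunk_words(text: str, max_words: int = 500) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def build_summary_prompt(chunk: str) -> str:
    """Wrap one transcript chunk in a summarization instruction."""
    return (
        "Summarize the following YouTube video transcript excerpt "
        "in three bullet points.\n\n"
        f"Transcript:\n{chunk}"
    )


# Stand-in for the transcript_text built by the script earlier
transcript_text = (
    "hi everyone welcome back to build 59 "
    "your source for cloud devops and AI"
)

prompts = [build_summary_prompt(c) for c in chunk_words(transcript_text, max_words=8)]
for prompt in prompts:
    print(prompt)
    print("---")
```

Each prompt can then be sent to the LLM of your choice, and the per-chunk summaries combined in a final pass.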

Retrieval Augmented Generation (RAG)

In the Retrieval Augmented Generation (RAG) pattern used in Generative AI solutions, external content (such as video transcripts) is embedded and stored in a vector database. When users query a Generative AI application, relevant transcript segments are retrieved and fed into the model, which improves the accuracy and relevance of AI-generated responses.
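
The retrieval step can be illustrated without any external services. The sketch below is a toy: word counts stand in for a real embedding model, and a Python list stands in for the vector database. The function names and the chunk size are assumptions, and the segments are copied from the raw transcript shown earlier.

```python
import math
import re
from collections import Counter


def window_segments(segments, size=2):
    """Group consecutive transcript segments into chunks of `size`."""
    chunks = []
    for i in range(0, len(segments), size):
        group = segments[i:i + size]
        chunks.append({
            "start": group[0]["start"],
            "text": " ".join(s["text"] for s in group),
        })
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


segments = [
    {'text': 'hi everyone welcome back to build 59', 'start': 0.16, 'duration': 5.56},
    {'text': 'your source for cloud devops and AI', 'start': 2.639, 'duration': 4.921},
    {'text': "Technologies if you're new here my name", 'start': 5.72, 'duration': 3.4},
    {'text': "is Chris peachman I'm a longtime", 'start': 7.56, 'duration': 4.84},
    {'text': 'Microsoft MVP Hashi Ambassador and', 'start': 9.12, 'duration': 5.88},
    {'text': 'Microsoft certified trainer on this', 'start': 12.4, 'duration': 5.0},
]

chunks = window_segments(segments, size=2)
best = retrieve("Microsoft certified trainer", chunks, k=1)[0]
print(best["start"], best["text"])
```

In a real application the retrieved text would be placed into the prompt alongside the user's question, and embed would be replaced by calls to an actual embedding model.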

Additional Uses of Transcript Text

Transcript text is useful beyond Generative AI and RAG applications. Here are a few other practical ways to put it to work:

SEO Content Creation: Convert popular video transcripts into blog posts or social media content.
Accessibility: Create captions or subtitles, enhancing accessibility for hearing-impaired audiences.
Educational Purposes: Easily convert educational video content into text-based study materials.
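
For the captions use case, the start and duration fields are enough to write a subtitle file. The following is a minimal sketch that formats segments as SubRip (SRT) text by hand; srt_time and to_srt are hypothetical helper names, and the segments are assumed to be dictionaries shaped like the raw transcript shown earlier.

```python
def srt_time(seconds: float) -> str:
    """Format a second offset as an SRT timestamp: HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render transcript segments as numbered SRT caption blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]
        blocks.append(
            f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{seg['text']}"
        )
    return "\n\n".join(blocks) + "\n"


sample = [
    {'text': 'hi everyone welcome back to build 59', 'start': 0.16, 'duration': 5.56},
    {'text': 'your source for cloud devops and AI', 'start': 2.639, 'duration': 4.921},
]

print(to_srt(sample))
```

Consecutive segments in the sample overlap in time, so a production caption file might clamp each end time to the start of the next segment.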

Conclusion

Extracting YouTube transcripts with Python is a simple technique that is useful to developers and content creators alike. Whether you’re building Generative AI-powered applications, optimizing content for SEO, or improving accessibility, a few lines of code give you the text of any YouTube video that has a transcript available.