OpenAI Video Creator (1/6): Generating Video Ideas Using Web Scraping

Hey Finxters! Welcome to another exciting new course on how to create informational videos with OpenAI. This blog is part of a series where we’ll build a simple pipeline based on the OpenAI API to produce engaging video content. Today, we’ll dive into Step 1 of this process: Gathering Video Ideas effectively using Web Scraping.

Here’s a sneak peek at the entire series:

  • Step 1: Gather Video Ideas effectively using Web Scraping
  • Step 2: Create Compelling Video Scripts using OpenAI’s Vision API
  • Step 3: Generate Images for Videos using the OpenAI API
  • Step 4: Create Engaging Voice-overs using OpenAI TTS
  • Step 5: Combine all the elements to produce the final cut

The Building Blocks of an Informational Video Project

Before we delve into the process, let’s set the context. Our goal is to create fun, informational videos about axolotls, the fascinating “walking fish” that have captured the hearts of many with their adorable smiley faces and remarkable regenerative abilities. To craft an engaging and factual video script about these creatures, we’ll explore various blogs and articles instead of directly generating content through the OpenAI Playground. This approach ensures that the video script is rich with accurate and diverse insights.

In the first phase, we collect screenshots from different blogs to get a variety of ideas. In the next part, we will use OpenAI’s Vision API to analyze these images. This advanced tool doesn’t just read text; it interprets visual context, discerns themes, and synthesizes information. By feeding our curated screenshots into the Vision API, we’ll prompt it to craft a cohesive, compelling video script that distills the core messages from multiple sources into a single, engaging narrative. This approach bridges the gap between static blog content and dynamic video storytelling, setting the stage for captivating visual content.

To illustrate the process, let’s walk through the steps of taking a full-page screenshot of an axolotl blog post, such as “23 Axolotl Facts for Kids” (https://www.deepseaworld.com/animal-behaviour/23-axolotl-facts-for-kids/). The code below targets a similar Treehugger article, but any URL will work.

Here’s a breakdown of the Python script that uses Selenium WebDriver and Pillow to automatically capture a full-page screenshot of a specified website.

pip install pillow selenium

By running the command pip install pillow selenium, we install these two Python libraries on our system, which then allows us to use their functionality in our Python scripts and programs.

Importing Required Libraries

First, we need to import the necessary libraries:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from PIL import Image
import time

These imports give us access to Selenium’s WebDriver and support functions, as well as the Pillow (Python Imaging Library) for image processing.

Setting Up the WebDriver

Next, we set up our WebDriver:

# Define the URL of the web page we want to screenshot
url = "https://www.treehugger.com/things-you-dont-know-about-axolotl-4863490"

# Path to the ChromeDriver executable (a raw string, so the backslashes
# are not treated as escape sequences)
webdriver_path = r'C:\chromedriver_win32\chromedriver.exe'

First, we specify the URL of the webpage we want to interact with. In this case, we’re targeting a page about axolotls on the Treehugger website. This could be any URL you’re interested in scraping data from.

Next, we define the path to our WebDriver. This path points to the location of the ChromeDriver executable on your system. ChromeDriver is the interface that allows Selenium to control Chrome. Make sure you’ve downloaded the appropriate version for your system and Chrome browser. You can install ChromeDriver from this link: https://googlechromelabs.github.io/chrome-for-testing

Configuring Chrome Options

Now, we set up some options for our Chrome browser:

options = webdriver.ChromeOptions()
options.add_argument('--headless=new') 
driver = webdriver.Chrome(options=options)

We create a ChromeOptions object, which allows us to customize how Chrome will run. The --headless=new argument tells Chrome to run in headless mode: the browser runs in the background without opening a visible window, which is ideal for automated scripts.

Navigating to the Webpage

driver.get(url)

driver.get(url) tells Selenium to navigate to the specified webpage and waits until the page has loaded before returning control to our script.

Taking a Full-Page Screenshot

This is where things get interesting. We use a try-finally block to ensure our WebDriver closes properly, even if an error occurs:

try:
    # Wait up to 20 seconds for the <body> element to be present
    element = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.TAG_NAME, 'body'))
    )
    # Helper that reads the page's full scrollWidth / scrollHeight
    S = lambda X: driver.execute_script("return document.body.parentNode.scroll" + X)
    # Resize the window so the whole page fits, then screenshot the body
    driver.set_window_size(S('Width'), S('Height'))
    driver.find_element(By.TAG_NAME, "body").screenshot("./images/axolotl_2.png")

finally:
    driver.quit()

Let’s break this down:

Wait for the Page to Load: First, we use the WebDriverWait class to wait for the presence of the body tag on the web page. This ensures that the page has finished loading before we proceed with taking the screenshot.

Get the Scroll Width and Height: We define a lambda function S that executes JavaScript code within the browser context to retrieve the scroll width and height of the web page. This is necessary because the browser window size may not be large enough to capture the entire page content.
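The lambda is just building a tiny JavaScript snippet by string concatenation. Here is the same idea spelled out as a plain function (the helper name scroll_script is our own), showing exactly which strings get sent to the browser:

```python
def scroll_script(prop: str) -> str:
    # Same string concatenation the lambda performs, shown without a live driver
    return "return document.body.parentNode.scroll" + prop

print(scroll_script("Width"))   # return document.body.parentNode.scrollWidth
print(scroll_script("Height"))  # return document.body.parentNode.scrollHeight
```

So S('Width') and S('Height') simply run those two one-line scripts in the page and return the resulting numbers to Python.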

Set the Browser Window Size: Next, we set the browser window size to the full scroll width and height of the web page using the set_window_size method of the WebDriver instance.

Take the Screenshot: With the browser window size adjusted, we can now take a screenshot of the entire web page by finding the body element and calling its screenshot method. The screenshot is saved to the specified file path ('./images/axolotl_2.png' in this case). Note that WebDriver screenshots are always PNG data, so the file should carry a .png extension.

Quit the WebDriver Session: Finally, we call the quit method on the WebDriver instance to close the browser and terminate the WebDriver session.

Displaying the Screenshot

After taking the screenshot, we use the Pillow library to open and display the image.

img = Image.open("./images/axolotl_2.png")
img.show()

Pillow is a fork of the Python Imaging Library (PIL) and is one of the most popular libraries for opening, manipulating, and saving various image file formats in Python. It’s designed to be more user-friendly and easier to install while maintaining compatibility with existing code.

This opens the saved screenshot file and displays it using the default image viewer on your system.
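Pillow can also do light post-processing on the captured file. Here is a small sketch using an in-memory stand-in image, so it runs even without the actual screenshot on disk:

```python
from PIL import Image

# Hypothetical stand-in for a real screenshot: a tall solid-colour image
img = Image.new("RGB", (1280, 4000), color="pink")
print(img.size)  # (1280, 4000) — width and height in pixels

# If you ever need a JPG (e.g. to save disk space), convert explicitly;
# simply renaming a .png file to .jpg does not change the format.
img.convert("RGB").save("axolotl_small.jpg", quality=85)
```

The same open/convert/save calls work on the real screenshot file once it exists.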

Tips and Tricks

File Format: WebDriver screenshots are PNG data regardless of the file extension you choose, so save them with a .png extension; giving the file a .jpg name does not convert the image and may trigger a warning.

Error Handling: Watch out for warnings about invalid escape sequences in Windows paths (use raw strings like r'C:\chromedriver_win32\chromedriver.exe') and about mismatched file extensions. Handling these early ensures a smooth workflow.

Why This Approach?

This script is particularly useful for capturing full-page screenshots, including content that might be below the fold (not visible without scrolling). It’s an automated solution that can be easily integrated into web testing pipelines or used for creating visual archives of web content.

In this way, we can take screenshots of several blogs to get a good blend of information for our video. In the next part, we’ll use OpenAI’s Vision API to create a video script from these screenshots.
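To batch this over several blogs, one option is to loop over a list of URLs and derive a screenshot filename from each one. A sketch (the helper screenshot_filename is our own convention, not part of the project code):

```python
from urllib.parse import urlparse

# The blogs we want to capture (both URLs appear earlier in this post)
urls = [
    "https://www.treehugger.com/things-you-dont-know-about-axolotl-4863490",
    "https://www.deepseaworld.com/animal-behaviour/23-axolotl-facts-for-kids/",
]

def screenshot_filename(url: str, folder: str = "./images") -> str:
    """Derive a readable PNG filename from the last segment of the URL path."""
    slug = urlparse(url).path.strip("/").split("/")[-1] or urlparse(url).netloc
    return f"{folder}/{slug}.png"

for url in urls:
    print(screenshot_filename(url))
```

Each URL would then get its own screenshot file, ready to be fed into the Vision API in the next part.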

GitHub: https://github.com/finxter/Info_Video_Project
