python - Download a PDF with selenium - Stack Overflow

I'm trying to download PDFs with selenium, but the call driver.download_file(file_name, target_directory) raises "WebDriverException: You must enable downloads in order to work with downloadable files."

I tried setting the option chrome_options.enable_downloads = True, but it didn't work. I also tried different browsers (I got the same error with Edge, and Firefox raised a different error). I also tried several older versions of Selenium, without any success.

In the end, all I want is to download PDFs and store them in a specific folder. If anyone has any advice on how I can achieve this, it would be very helpful!

Here is my complete code, please let me know if I can provide anything else :)

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def download_pdf_and_rename(url, filename):
    # Configure Chrome options to download PDFs to a temporary directory
    chrome_options = Options()
    
    chrome_options.enable_downloads = True

    driver = webdriver.Chrome(options=chrome_options)

    # Access the PDF URL
    driver.get(url)

    time.sleep(5)  # Adjust the sleep time as needed
    
    driver.download_file('my_pdf.pdf', MY_PATH)
    
    # Close the browser
    driver.quit()


download_pdf_and_rename("https://pubs.aeaweb./doi/pdfplus/10.1257/aer.20170866", "my_pdf.pdf")

Thanks!

asked Nov 18, 2024 at 12:08 by Lucie Bois

check the example in the official repository – cards Commented Nov 18, 2024 at 12:30

2 Answers

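As far as I can tell, enable_downloads and driver.download_file are meant for remote sessions running on a Selenium Grid, so they don't help with a local Chrome driver. For a local browser you need to set specific Chrome preferences to control the behavior of downloads, including the directory where files should be saved and how to handle PDF files.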

import glob
import os
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager


def download_pdf_and_rename(url, target_directory, filename):
    # Use an absolute path; Chrome may ignore a relative download directory
    target_directory = os.path.abspath(target_directory)
    os.makedirs(target_directory, exist_ok=True)

    chrome_options = Options()
    chrome_options.add_experimental_option("prefs", {
        "download.default_directory": target_directory,
        "download.prompt_for_download": False,
        # Download PDFs instead of opening them in Chrome's built-in viewer
        "plugins.always_open_pdf_externally": True,
    })

    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    try:
        driver.get(url)
        time.sleep(10)  # Crude wait for the download to finish; adjust as needed
    finally:
        driver.quit()

    # The downloaded file's name depends on the server, so pick the newest PDF
    # in the folder instead of assuming a fixed name
    pdf_files = glob.glob(os.path.join(target_directory, "*.pdf"))
    if pdf_files:
        downloaded_file_path = max(pdf_files, key=os.path.getmtime)
        renamed_file_path = os.path.join(target_directory, filename)
        os.replace(downloaded_file_path, renamed_file_path)
        print(f"File downloaded and renamed to: {renamed_file_path}")
    else:
        print("Downloaded file not found. Check the download settings or file name.")


download_pdf_and_rename(
    "https://pubs.aeaweb./doi/pdfplus/10.1257/aer.20170866",
    target_directory="./downloads",
    filename="my_pdf.pdf"
)
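
A fixed sleep is either too short or wastes time, so it may be worth polling the download folder instead. Below is a minimal sketch, assuming Chrome writes in-progress downloads with a .crdownload suffix and that the folder holds no files from earlier runs. The helper name wait_for_download and its parameters are my own, not part of Selenium.

```python
import glob
import os
import time


def wait_for_download(directory, timeout=30.0, poll_interval=0.5):
    """Return the newest finished file in `directory`, or raise TimeoutError.

    A download counts as finished once at least one regular file exists and
    no "*.crdownload" (Chrome's in-progress suffix) files remain.
    """
    deadline = time.monotonic() + timeout
    while True:
        in_progress = glob.glob(os.path.join(directory, "*.crdownload"))
        finished = [
            path
            for path in glob.glob(os.path.join(directory, "*"))
            if os.path.isfile(path) and not path.endswith(".crdownload")
        ]
        if finished and not in_progress:
            # Newest file by modification time is taken to be the download
            return max(finished, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No finished download in {directory!r} after {timeout} s")
        time.sleep(poll_interval)
```

You would call it right after driver.get(url) in place of time.sleep, for example downloaded = wait_for_download(target_directory), and rename the returned path before calling driver.quit().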

This is not a Selenium solution, but you can request the URL directly from Python and check the response's Content-Disposition header. When the server sends it, that header contains the name of the file being downloaded.

There is a chance that the request will get blocked, so you might need to experiment with the request headers (for example User-Agent) to get around that.
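
A rough sketch of that idea using only the standard library. The function names, the fallback file name, and the User-Agent value are placeholders I made up, and I have not checked whether this particular site accepts such a request.

```python
import os
import shutil
import urllib.request
from email.message import Message


def filename_from_content_disposition(content_disposition, fallback):
    """Extract the filename parameter from a Content-Disposition value."""
    if not content_disposition:
        return fallback
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    name = msg.get_filename()
    # Keep only the last path component so the server cannot choose the folder
    return os.path.basename(name) if name else fallback


def download_pdf(url, target_directory, fallback_name="download.pdf"):
    """Download `url` into `target_directory` and return the saved path."""
    os.makedirs(target_directory, exist_ok=True)
    # Some sites reject the default Python User-Agent; this value is a guess
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        name = filename_from_content_disposition(
            response.headers.get("Content-Disposition"), fallback_name
        )
        path = os.path.join(target_directory, name)
        with open(path, "wb") as output_file:
            shutil.copyfileobj(response, output_file)
    return path


# Usage (performs a real network request):
# saved_path = download_pdf(pdf_url, "./downloads", fallback_name="my_pdf.pdf")
```

If the server answers with an HTML page (a login or bot-check page, say) instead of the PDF, it is saved under the fallback name, so check the response's Content-Type as well before trusting the file.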
