python - Download a PDF with selenium - Stack Overflow

I'm trying to download PDFs with Selenium, but calling driver.download_file(file_name, target_directory) raises "WebDriverException: You must enable downloads in order to work with downloadable files."

I tried adding the option chrome_options.enable_downloads = True, but it didn't work. I also tried a different browser (I got the same problem with Edge, and Firefox returned another error). I also tried several older versions of Selenium, without any success.

In the end, all I want is to download PDFs and store them in a specific folder. If anyone has any advice on how I can achieve this, it would be very helpful!

Here is my complete code, please let me know if I can provide anything else :)

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def download_pdf_and_rename(url, filename):
    # Configure Chrome options to download PDFs to a temporary directory
    chrome_options = Options()
    
    chrome_options.enable_downloads = True

    driver = webdriver.Chrome(options=chrome_options)

    # Access the PDF URL
    driver.get(url)

    time.sleep(5)  # Adjust the sleep time as needed
    
    driver.download_file('my_pdf.pdf', MY_PATH)
    
    # Close the browser
    driver.quit()


download_pdf_and_rename("https://pubs.aeaweb./doi/pdfplus/10.1257/aer.20170866", "my_pdf.pdf")

Thanks!

Asked Nov 18, 2024 at 12:08 by Lucie Bois
  • Check the example in the official repository. – cards, commented Nov 18, 2024 at 12:30

2 Answers

The enable_downloads option and driver.download_file() are meant for remote sessions (for example, Selenium Grid), so they do not help with a locally launched Chrome. Instead, set Chrome preferences to control download behavior, including the directory where files are saved and how PDF files are handled.

import os
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager


def download_pdf_and_rename(url, target_directory, filename):
    # Use an absolute path: Chrome may ignore a relative download directory
    target_directory = os.path.abspath(target_directory)
    os.makedirs(target_directory, exist_ok=True)

    chrome_options = Options()
    chrome_options.add_experimental_option("prefs", {
        "download.default_directory": target_directory,  # where files are saved
        "download.prompt_for_download": False,  # skip the "Save as" dialog
        "plugins.always_open_pdf_externally": True,  # download PDFs instead of opening the viewer
    })

    driver = webdriver.Chrome(
        service=Service(ChromeDriverManager().install()),
        options=chrome_options,
    )
    try:
        driver.get(url)
        time.sleep(10)  # crude wait for the download to finish; adjust as needed

        # "document.pdf" is a placeholder: the actual name is chosen by the server
        downloaded_file_path = os.path.join(target_directory, "document.pdf")
        renamed_file_path = os.path.join(target_directory, filename)
        if os.path.exists(downloaded_file_path):
            os.rename(downloaded_file_path, renamed_file_path)
            print(f"File downloaded and renamed to: {renamed_file_path}")
        else:
            print("Downloaded file not found. Check the download settings or file name.")
    finally:
        # Close the browser even if something above fails
        driver.quit()


download_pdf_and_rename(
    "https://pubs.aeaweb./doi/pdfplus/10.1257/aer.20170866",
    target_directory="./downloads",
    filename="my_pdf.pdf"
)
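
The fixed sleep and the hard-coded "document.pdf" name above are the fragile parts of this approach. As a rough sketch, one alternative is to list the download directory before calling driver.get(url) and then poll it until a new, finished file shows up. The helper name wait_for_download and its parameters are made up for illustration. Skipping names that end in ".crdownload" or start with a dot is an assumption about how Chrome names in-progress downloads and may need adjusting for other browsers.

```python
import os
import time


def wait_for_download(directory, known_files, timeout=30.0, poll_interval=0.5):
    """Return the path of the first new, finished file in `directory`.

    `known_files` is the directory listing taken before the download started.
    Hypothetical helper: names ending in ".crdownload" (Chrome's in-progress
    suffix) and hidden temporary files are treated as unfinished.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_files = set(os.listdir(directory)) - set(known_files)
        finished = sorted(
            name for name in new_files
            if not name.endswith(".crdownload") and not name.startswith(".")
        )
        if finished:
            return os.path.join(directory, finished[0])
        time.sleep(poll_interval)
    raise TimeoutError(f"No finished download appeared in {directory!r}")
```

In the function above, this would mean taking known = os.listdir(target_directory) before driver.get(url), then renaming whatever path wait_for_download(target_directory, known) returns instead of looking for "document.pdf".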

This is not a Selenium solution, but you can request the URL directly from Python and check the response's Content-Disposition header. When the server sets it, that header contains the name of the file being downloaded.

There is a chance that the request will get blocked, so you might need to experiment with the request headers to get around that.
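
A minimal sketch of that idea, using only the standard library (urllib rather than a third-party HTTP client). The function names, the "download.pdf" fallback, and the "Mozilla/5.0" User-Agent value are placeholders chosen for illustration. Whether a given site accepts such a request is not something this sketch can guarantee.

```python
import os
import urllib.request
from email.message import Message


def filename_from_content_disposition(header_value, default="download.pdf"):
    """Extract the filename parameter from a Content-Disposition value.

    Falls back to `default` (a made-up placeholder) when the header is
    missing or carries no filename.
    """
    if not header_value:
        return default
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename() or default


def download_pdf(url, target_directory, user_agent="Mozilla/5.0"):
    """Fetch `url` and save it under the name the server suggests."""
    os.makedirs(target_directory, exist_ok=True)
    # Some sites reject the default Python User-Agent, so send a custom one
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        name = filename_from_content_disposition(
            response.headers.get("Content-Disposition")
        )
        # basename() drops any directory components the header might carry
        path = os.path.join(target_directory, os.path.basename(name))
        with open(path, "wb") as fh:
            fh.write(response.read())
    return path
```

Calling download_pdf(url, "./downloads") returns the path of the saved file, which can then be renamed with os.rename if a specific name is needed.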
