
javascript - How to get the raw data from pdf.js - Stack Overflow


I am building a page that uses PDF.js to load and render a PDF with the following code.

var url = '/path-to-pdf.js';
PDFJS.workerSrc = "./js/external/pdf.worker.js";

PDFJS.getDocument(url).then(function getPdfHelloWorld(pdf) {

    var pageNumber = 1;
    renderPage($(".center-info")[0], pdf, 1, function pageRenderingComplete() {
        if (pageNumber > pdf.numPages) {
            return; // All pages rendered
        }
        // Continue rendering of the next page
        renderPage($("display-div")[0], pdf, ++pageNumber, pageRenderingComplete);
    });

});

I would like to offer a client-side download, which means I need access to the raw PDF data directly. Is it possible to do that here?

Asked Aug 3, 2014 at 5:13 by ppn029012; edited Aug 4, 2014 at 14:20.

  • Look here for inspiration: github.com/mozilla/pdf.js/blob/… – Rob W, commented Aug 4, 2014 at 15:59

2 Answers

I just found the answer. The raw data is available through the getData() method.

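PDFJS.getDocument(url).then(function getPdfHelloWorld(pdf) {

    // getData() resolves with the raw bytes of the PDF as a typed array
    pdf.getData().then(function(data) {
        // Note: apply() can exceed the engine's argument limit on large files
        var pdfraw = String.fromCharCode.apply(null, data);

        // Work with the raw PDF here...
    });

});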

Cheers
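
For the client-side download the question asks about, here is a rough sketch of what could be done with the bytes from getData(). The helper names (bytesToBinaryString, toPdfBlob, downloadPdf) and the file name "document.pdf" are made up for illustration and are not part of PDF.js. It assumes the data is a Uint8Array and, for the download part, a browser with Blob and URL.createObjectURL.

```javascript
// Convert bytes to a binary string in slices, so that
// String.fromCharCode.apply never receives too many arguments at once.
function bytesToBinaryString(bytes, chunkSize = 0x8000) {
    let result = "";
    for (let offset = 0; offset < bytes.length; offset += chunkSize) {
        const chunk = bytes.subarray(offset, offset + chunkSize);
        result += String.fromCharCode.apply(null, chunk);
    }
    return result;
}

// Wrap the raw bytes in a Blob typed as a PDF.
function toPdfBlob(bytes) {
    return new Blob([bytes], { type: "application/pdf" });
}

// Browser-only: offer the bytes as a file download.
// "document.pdf" is a placeholder file name.
function downloadPdf(bytes, fileName = "document.pdf") {
    const objectUrl = URL.createObjectURL(toPdfBlob(bytes));
    const link = document.createElement("a");
    link.href = objectUrl;
    link.download = fileName;
    document.body.appendChild(link);
    link.click();
    link.remove();
    // Release the object URL once the click has been handled.
    setTimeout(function () {
        URL.revokeObjectURL(objectUrl);
    }, 0);
}
```

Inside the getData() callback this would be called as downloadPdf(data). For a plain download the Blob route is probably enough, and the string conversion is only needed if some other API expects a binary string.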

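// Collects the text items of every page (extracted text, not the raw PDF bytes).
async function extract(input) {
    // Newer pdf.js releases expose the promise as getDocument(input).promise
    const pdf = await pdfJS.getDocument(input);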

    const elements = [];

    for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
        const page = await pdf.getPage(pageNumber);
        const textContent = await page.getTextContent({
            normalizeWhitespace: true,
            disableCombineTextItems: false,
        });

        textContent.items.forEach(item => {
            elements.push(item);
        });
    }

    return elements;
}
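
If plain text is wanted rather than the item objects, the items could be joined afterwards. This is only a sketch: itemsToText is a hypothetical helper, it assumes each item carries its text in a str field (as pdf.js text content items do), and joining with a single space ignores the page layout.

```javascript
// Join the text of all items into one string, skipping empty or
// whitespace-only entries and items without a string "str" field.
function itemsToText(items) {
    return items
        .map(item => item.str)
        .filter(str => typeof str === "string" && str.trim() !== "")
        .join(" ");
}
```

With the function above, something like itemsToText(await extract(input)) would give the document text as one string.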