
javascript - How to use XMLHttpRequest to download an HTML page in the background and extract a text element from it? - Stack Overflow


I want to make a Greasemonkey script that, while you are on URL_1, parses the whole HTML page of URL_2 in the background in order to extract a text element from it.

To be specific, I want to download the whole page's HTML code (a Rotten Tomatoes page) in the background, store it in a variable, and then use getElementsByClassName("critic_consensus")[0] to extract the text I want from the element with that class name.


I've found this in MDN: HTML in XMLHttpRequest so, I ended up in this unfortunately non-working code:

var xhr = new XMLHttpRequest();
xhr.onload = function() {
  alert(this.responseXML.getElementsByClassName("critic_consensus")[0].innerHTML);
};
xhr.open("GET", "http://www.rottentomatoes.com/m/godfather/", true);
xhr.responseType = "document";
xhr.send();

It shows this error message when I run it in Firefox Scratchpad:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://www.rottentomatoes.com/m/godfather/. This can be fixed by moving the resource to the same domain or enabling CORS.


PS. The reason why I don't use the Rotten Tomatoes API is that they've removed the critics consensus from it.

Edited Feb 6, 2018 at 22:01 by Brock Adams; asked Nov 5, 2014 at 19:19 by darkred.
  • 2 What is not-working? What error do you get? – Bergi Commented Nov 5, 2014 at 19:20
  • 2 No error message inside Firefox's Scratchpad. After seeing Igor Barinov's reply, I checked the Firefox Web Console and that's where appears the error message he mentioned. I added the error message to my question. – darkred Commented Nov 5, 2014 at 19:52
  • I edited my answer with new idea, give it a try! – Igor Barinov Commented Nov 5, 2014 at 20:38

3 Answers

For cross-origin requests, where the fetched site has not helpfully set a permissive CORS policy, Greasemonkey provides the GM_xmlhttpRequest() function. (Most other userscript engines also provide this function.)

GM_xmlhttpRequest is expressly designed to allow cross-origin requests.

To extract your target information, create a DOMParser on the result. Do not use jQuery methods, as they cause extraneous images, scripts and objects to load, slowing things down or crashing the page.

Here's a complete script that illustrates the process:

// ==UserScript==
// @name        _Parse Ajax Response for specific nodes
// @include     http://stackoverflow.com/questions/*
// @require     http://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js
// @grant       GM_xmlhttpRequest
// ==/UserScript==

GM_xmlhttpRequest ( {
    method: "GET",
    url:    "http://www.rottentomatoes.com/m/godfather/",
    onload: function (response) {
        var parser  = new DOMParser ();
        /* IMPORTANT!
            1) For Chrome, see
            https://developer.mozilla.org/en-US/docs/Web/API/DOMParser#DOMParser_HTML_extension_for_other_browsers
            for a work-around.

            2) jQuery.parseHTML() and similar are bad because they cause images, etc., to be loaded.
        */
        var doc         = parser.parseFromString (response.responseText, "text/html");
        var criticTxt   = doc.getElementsByClassName ("critic_consensus")[0].textContent;

        $("body").prepend ('<h1>' + criticTxt + '</h1>');
    },
    onerror: function (e) {
        console.error ('**** error ', e);
    },
    onabort: function (e) {
        console.error ('**** abort ', e);
    },
    ontimeout: function (e) {
        console.error ('**** timeout ', e);
    }
} );

The problem is: XMLHttpRequest cannot load http://www.rottentomatoes.com/m/godfather/. No 'Access-Control-Allow-Origin' header is present on the requested resource.

Because you are not the owner of the resource, you cannot set this header yourself.

What you can do is set up a proxy on Heroku which will forward all requests to the Rotten Tomatoes web site. Here is a small Node.js proxy: https://gist.github.com/igorbarinov/a970cdaf5fc9451f8d34

var https = require('https'),
    http  = require('http'),
    util  = require('util'),
    path  = require('path'),
    fs    = require('fs'),
    colors = require('colors'),
    url = require('url'),
    httpProxy = require('http-proxy'),
    dotenv = require('dotenv');

dotenv.load();

var proxy = httpProxy.createProxyServer({});
var host = "www.rottentomatoes.com";
var port = Number(process.env.PORT || 5000);

process.env.NODE_TLS_REJECT_UNAUTHORIZED = "0";

var server = require('http').createServer(function(req, res) {
    // You can define here your custom logic to handle the request
    // and then proxy the request.
    var path = url.parse(req.url, true).path;

    req.headers.host = host;
    res.setHeader("Access-Control-Allow-Origin", "*");
    proxy.web(req, res, {
        target: "http://" + host + path
    });

}).listen(port);

proxy.on('proxyRes', function (res) {
    console.log('RAW Response from the target', JSON.stringify(res.headers, true, 2));
});


util.puts('Proxying to '+ host +'. Server'.blue + ' started '.green.bold + 'on port '.blue + port);

I modified the code from https://github.com/massive/firebase-proxy/ for this.

I published the proxy at http://peaceful-cove-8072.herokuapp.com/ and you can test it at http://peaceful-cove-8072.herokuapp.com/m/godfather

Here is a fiddle to test it: http://jsfiddle.net/uuw8nryy/

var xhr = new XMLHttpRequest();
xhr.onload = function() {
  alert(this.responseXML.getElementsByClassName("critic_consensus")[0].textContent);
};
xhr.open("GET", "http://peaceful-cove-8072.herokuapp.com/m/godfather", true);
xhr.responseType = "document";
xhr.send();

The JavaScript same origin policy prevents you from accessing content that belongs to a different domain.

The above reference also gives you four techniques for relaxing this rule (CORS being one of them).
