Google Books allows viewing the scans in colour, but when I click the option to download the PDF, I am provided only with a black-and-white version.

Is there a known way to obtain the original colour images, other than using Inspect Element on each page one by one?

  • bela@lemm.ee
    1 year ago

    I just spent a bit too much time making this (it was fun), so don’t even tell me if you’re not going to use it.

    You can open up a desired book’s page, start this first script in the console, and then scroll through the book:

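    // Collected page-image URLs; a Set ignores duplicates.
    let imgs = new Set();

    // Record the src of every <img> whose grandparent has the class "pageImageDisplay".
    function cheese() {
      for (const img of document.getElementsByTagName("img")) {
        const holder = img.parentElement?.parentElement;
        if (holder?.className === "pageImageDisplay" && img.src) imgs.add(img.src);
      }
    }

    // Poll every 5 ms so pages are caught as they load while you scroll.
    setInterval(cheese, 5);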
    

    And once you’re done you may run this script to download each image:

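    // Despite its name, this resolves to a temporary blob: URL for the fetched image, not a data: URL.
    function toDataURL(url) {
      return fetch(url)
        .then((response) => response.blob())
        .then((blob) => URL.createObjectURL(blob));
    }

    async function asd() {
      for (const img of imgs) {
        // Use the value of the "pg" query parameter as the file name, with a fallback if it is missing.
        let name = "page";
        for (const thing of img.split(/[?&]/)) {
          if (thing.startsWith("pg=")) {
            name = thing.split("=")[1];
            console.log(name);
            break;
          }
        }
        // Trigger the download by clicking a temporary link.
        const a = document.createElement("a");
        a.href = await toDataURL(img);
        a.download = name;
        document.body.appendChild(a);
        a.click();
        document.body.removeChild(a);
        // Release the blob after the download has had time to start.
        setTimeout(() => URL.revokeObjectURL(a.href), 1000);
      }
    }

    asd();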
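
    The files come out with no extension. Here is a rough sketch of how you could add one, assuming you keep the fetched blob around and pass its type along. fileNameFor and the EXTENSIONS table are made-up names for illustration, not something the scripts above define:

```javascript
// Hypothetical lookup table (an assumption); extend it if other image types show up.
const EXTENSIONS = new Map([
  ["image/png", ".png"],
  ["image/jpeg", ".jpg"],
  ["image/gif", ".gif"],
  ["image/webp", ".webp"],
]);

// Hypothetical helper: build a download name such as "PA12.png".
function fileNameFor(pageName, mimeType) {
  const base = pageName || "page";
  // Drop parameters such as "; charset=..." before the lookup.
  const type = (mimeType || "").split(";")[0].trim().toLowerCase();
  // Unknown types get no extension rather than a guessed one.
  return base + (EXTENSIONS.get(type) ?? "");
}
```

    Inside asd() you would need the blob itself rather than just its URL, and then set a.download = fileNameFor(name, blob.type).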
    

    Alternatively you may simply run something like this to get the links:

    for (const img of imgs) {
      console.log(img);
    }
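
    The Set hands the links back in the order they were seen, which follows your scrolling rather than the book. Here is a small sketch for sorting them by page, assuming the pg values are a letter prefix followed by a number (something like PA12). That pattern is my assumption, and so are the helper names:

```javascript
// Hypothetical helper: read the "pg" query parameter from an image URL.
function pageId(src) {
  // The base is only a placeholder so that relative URLs still parse.
  return new URL(src, "https://example.invalid/").searchParams.get("pg") ?? "";
}

// Hypothetical helper: split an id such as "PA12" into ["PA", 12].
function splitPageId(id) {
  const match = /^([A-Za-z]*)(\d+)$/.exec(id);
  return match ? [match[1], Number(match[2])] : [id, 0];
}

// Return a new array of links ordered by prefix, then by page number.
function sortedLinks(links) {
  return [...links].sort((a, b) => {
    const [prefixA, numA] = splitPageId(pageId(a));
    const [prefixB, numB] = splitPageId(pageId(b));
    if (prefixA !== prefixB) return prefixA < prefixB ? -1 : 1;
    return numA - numB;
  });
}
```

    Then console.log(sortedLinks(imgs).join("\n")) prints one link per line. Prefixes are compared alphabetically, which may not match the order of the sections in the actual book.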
    

    There’s stuff you can tweak, of course, if it doesn’t quite work for you. It worked fine in my tests.

    If you notice a page missing, you should be able to just scroll back to it and then download again to get everything. The first script just keeps collecting pages till you refresh the site. Which also means you should refresh once you are done downloading, as it eats CPU for breakfast.
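
    If refreshing is a hassle, a timer can also be stopped by hand. Here is a minimal sketch, assuming you start the collector through a wrapper like this instead of the bare setInterval call. startCollecting is a made-up name, and the 250 ms default is just a guess at something gentler than 5 ms:

```javascript
// Hypothetical wrapper: run collect() once now, then every intervalMs milliseconds.
// Returns a function that stops the polling when called.
function startCollecting(collect, intervalMs = 250) {
  collect();
  const timer = setInterval(collect, intervalMs);
  return () => clearInterval(timer);
}
```

    In the console that would be const stop = startCollecting(cheese); while scrolling, and stop(); once you have everything.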

    Oh and NEVER RUN ANY JAVASCRIPT CODE SOMEONE ON THE INTERNET TELLS YOU TO RUN

    • antonim@lemmy.dbzer0.com (OP)
      1 year ago

      Well, I may be technologically semi-literate and I may have felt a bit dizzy when I saw actual code in your comment, but I sure as hell will find a way to put it to use, no matter the cost.

      You’re terrific, man. No idea what else to say.

      • bela@lemm.ee
        1 year ago

        lmk if you run into an issue

        This kind of stuff is like an IRL puzzle game. I thought it would be a simple five-minute adventure, but of course Google has made sure it isn’t! I suppose for 3 stars I would have given it to you in PDF format, but I fear the man who could do that in JavaScript.