Tim Habersack

Where I put my things..

Downloading all html links of a certain extension from a web page

Aug 16th 2019

I had reason to download a bunch of PDFs from a website recently and didn't want to click 80+ links.

Here is how I approached it. On the page, I opened the browser's developer console and ran:

var links = []; Array.prototype.forEach.call(document.querySelectorAll('a[href$=".pdf"]'), function (e) { if (links.indexOf(e.href) === -1) { links.push(e.href); } }); console.log('"' + links.join('" "') + '"');

Note the .pdf in there; that was the file extension I was looking for. This gives me back a nice list of URLs, each wrapped in double quotes and separated by spaces. Then I just used wget to fetch them all:

wget "url1" "url2" ...
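If you want to reuse the trick for other file types, the same idea can be wrapped in a small function. This is only a rough sketch: `buildWgetCommand` is a made-up name, and unlike the one-liner above it compares the URL path case-insensitively, so links like `report.PDF?download=1` are also picked up. That behaviour is an assumption about what you might want, not something the original snippet does.

```javascript
// Hypothetical helper (not part of the original snippet): builds a wget
// command from anything that looks like a list of links (objects with .href).
function buildWgetCommand(anchors, extension) {
  var suffix = "." + extension.toLowerCase();
  var urls = new Set(); // a Set drops duplicate URLs for us

  for (var a of anchors) {
    // Look only at the path, so a query string does not hide the extension.
    // Assumes a.href is an absolute URL, which is what browsers return.
    var path = new URL(a.href).pathname.toLowerCase();
    if (path.endsWith(suffix)) {
      urls.add(a.href);
    }
  }

  // Quote each URL so the shell treats it as a single argument.
  var quoted = Array.from(urls).map(function (u) {
    return '"' + u + '"';
  });
  return "wget " + quoted.join(" ");
}
```

In the browser console you would call it as `console.log(buildWgetCommand(document.querySelectorAll("a[href]"), "pdf"));` and paste the result into a terminal.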

HT to this SO post that gave me the solution.