Working with XPath
Recently I've been rewriting my web scrapers after a senior dev advised me that parsing HTML with regex is a terrible way to do it, and that I should really be using XPath, the language made for querying XML. I've been busy with this and, using what I've learnt, have written some C subroutines to make parsing web pages easy, because it takes an awful lot of code just to download a web page to a string and run an XPath query on it. These subroutines are now working in my arbitrage betting software.
To get the XPath queries, the easiest way I've found is to right-click the part of the web page I'm interested in grabbing in Chrome or Firefox and choose "Inspect element". In the new window, just below the source code, it shows the node we selected, which we can then turn into a query.
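As a rough illustration of turning an inspected node into a query, here is a minimal sketch in Python using only the standard library. The page snippet, the `offers` id, the `price` class and the `xpath_texts` helper are all made up for the example. Note that `xml.etree.ElementTree` only understands a subset of XPath and needs well-formed markup, so a real page would usually need a more forgiving HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a downloaded page.
PAGE = """<html><body>
<div id="offers">
  <span class="price">12.99</span>
  <span class="price">7.49</span>
</div>
<div id="footer"><span class="price">0.00</span></div>
</body></html>"""


def xpath_texts(markup, query):
    """Return the text of every element matching the XPath-style query."""
    root = ET.fromstring(markup)
    return [element.text for element in root.findall(query)]


# The inspector shows a <span class="price"> inside <div id="offers">,
# which translates into this query (ElementTree's limited XPath dialect).
prices = xpath_texts(PAGE, ".//div[@id='offers']/span[@class='price']")
print(prices)
```

The predicate on the `div` is what keeps the footer price out of the results, which is the sort of thing that gets fiddly with regex.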
I've also been working with the Perl module HTML::TreeBuilder::XPath for parsing web pages. In contrast to libxml, it only takes 5 lines of code to return the results from a web page to an array. I've increased the storage on the server this website is hosted on and written a script using this module to automatically download videos from TempleOS.org and upload them here, because Terry regularly deletes the videos and the YouTube re-uploaders have all stopped.
Another small script, which I originally wrote in C but then ported to Perl, scrapes the website allkeyshop.com according to a config file and sends an email for any games that are selling below a set price threshold.
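The threshold check itself is simple. Below is a minimal sketch in Python, not the actual script: the game titles, prices, `THRESHOLDS` dictionary (standing in for the config file), email address and both helper functions are hypothetical. It only builds the email message and leaves the sending out.

```python
from email.message import EmailMessage

# Hypothetical thresholds, standing in for whatever the config file holds.
THRESHOLDS = {"Game A": 10.00, "Game B": 5.00}


def find_deals(scraped, thresholds):
    """Return (title, price) pairs priced below their configured threshold."""
    return [
        (title, price)
        for title, price in sorted(scraped.items())
        if title in thresholds and price < thresholds[title]
    ]


def build_alert(deals, to_addr):
    """Build (but do not send) an email listing the deals found."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(deals)} game(s) below threshold"
    msg["To"] = to_addr
    msg.set_content("\n".join(f"{title}: {price:.2f}" for title, price in deals))
    return msg


# Made-up prices, as if they had just been scraped with XPath queries.
scraped = {"Game A": 8.50, "Game B": 6.25, "Game C": 1.00}
deals = find_deals(scraped, THRESHOLDS)
alert = build_alert(deals, "me@example.com")
print(alert)
```

Games that aren't in the config are ignored even when they are cheap, so the email only ever mentions titles I've actually asked about.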
I imagine these examples will help anyone getting started with XPath. Let's just hope Microsoft doesn't make it illegal for us to scrape publicly available content.
The TempleOS script has now been turned off, since more complete archives have been compiled elsewhere, such as on archive.org.