While working on a small project recently, I was amazed by how much web scraping I could do with just one line of Bash. I used the text-based Lynx browser [1] and piped its output to grep. Figure 1 shows the one-line Bash example that scrapes the current snow depth from the Sunshine Village Snow Forecast web page.
In this article, I will introduce some techniques to easily scrape web pages, and then I will create a desktop notification script that provides the daily snow forecast.
The Lynx Text Browser
For my Bash web scraping, I started out by looking at command-line tools such as curl [2] combined with the html2text [3] utility. This technique definitely works, but I found that the Lynx browser offers a one-step solution with slightly cleaner text output.
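For comparison, the two-step approach could look roughly like the sketch below. The function name fetch_as_text and the URL in the comment are placeholders made up for illustration, and the sketch assumes that both curl and html2text are installed.

```shell
#!/usr/bin/env bash

# fetch_as_text URL: download a page with curl, then strip the markup
# with html2text. (Hypothetical helper name, not from the article.)
fetch_as_text() {
    # -s hides the progress meter; -L follows redirects.
    # html2text reads the HTML from standard input.
    curl -sL "$1" | html2text
}

# Example call (placeholder URL, not the page used in the article):
#   fetch_as_text "https://example.com/" | grep -i "example"
```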
To install Lynx on Raspbian/Debian/Ubuntu, use:
sudo apt install lynx
The Lynx -dump option renders a web page as plain text, with HTML tags, HTML encoding, and JavaScript removed. Figure 2 shows that a Lynx dump can greatly clean up the original web page and make searching considerably easier.
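A minimal sketch of the dump-and-search pattern follows. The helper names dump_page and grep_context, the URL, and the search phrase are all placeholders; the exact command in Figure 1 is not reproduced here. The -nolist option is added on the assumption that the numbered list of links Lynx appends to a dump is not wanted.

```shell
#!/usr/bin/env bash

# dump_page URL: render a page as plain text with Lynx.
# -dump writes the rendered page to standard output;
# -nolist leaves out the list of links at the end of the dump.
dump_page() {
    lynx -dump -nolist "$1"
}

# grep_context PATTERN: case-insensitive search of standard input that
# also prints the line following each match.
grep_context() {
    grep -i -A 1 -- "$1"
}

# Example call (placeholder URL and search phrase):
#   dump_page "https://example.com/snow-report" | grep_context "snow depth"
```

Keeping the search in its own function makes it easy to try patterns against a saved dump before pointing the script at a live page.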
Sometimes a simple Bash grep search might be all that you need. However, there are many cases where some text manipulation is required. The good news is that Bash has a nice selection of line and string manipulation tools.
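As one possible illustration, the sketch below combines grep and sed to pull a single number out of dumped text. The sample text and the helper name extract_depth are invented for this example; a real page will be laid out differently, so the patterns would need adjusting.

```shell
#!/usr/bin/env bash

# Hypothetical sample of dumped text (not taken from a real page).
sample='   Top Lift:   Snow Depth: 150 cm
   Bottom Lift: Snow Depth: 90 cm'

# extract_depth LABEL: pick the first line on standard input that
# contains LABEL (case-insensitive), then keep only the number that
# follows "Snow Depth:".
extract_depth() {
    grep -i -m 1 -- "$1" |
        sed -E 's/.*Snow Depth:[[:space:]]*([0-9]+).*/\1/'
}

depth=$(printf '%s\n' "$sample" | extract_depth "top lift")
echo "Top depth: ${depth} cm"
```

Here grep selects the line and sed trims it down to the value, which keeps each step simple enough to test on its own at the command line.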
This story is from the #262/September 2022 edition of Linux Magazine.