Simple web scraping with Bash: Ski Report
Linux Magazine|#262/September 2022
With one line of Bash code, Pete scrapes the web and builds a desktop notification app to get the daily snow report.
Pete Metcalfe

While recently doing a small project, I was amazed by how much web scraping I could do with just one line of Bash. I used the text-based Lynx browser [1] and then piped the output to a grep search. Figure 1 shows the one-line Bash example that scrapes the current snow depth from the Sunshine Village Snow Forecast web page.
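Because the figure is not reproduced here, the following is only a minimal sketch of that kind of pipeline, not the exact command from Figure 1. The URL is a placeholder, and the search phrase "snow depth" and the function names are my own assumptions about what the page text contains.

```shell
#!/usr/bin/env bash

# Keep only the lines that mention snow depth (case-insensitive).
# The phrase is an assumption; adjust it to match the page you scrape.
filter_snow_depth() {
    grep -i 'snow depth'
}

# Dump a web page as plain text with Lynx, then filter it.
scrape_snow_depth() {
    lynx -dump "$1" | filter_snow_depth
}

# Usage (placeholder URL, needs lynx and a network connection):
#   scrape_snow_depth "https://example.com/snow-forecast"
```

Splitting the grep step into its own function makes it easy to try the search pattern on saved text before pointing the script at a live page.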

In this article, I will introduce some techniques to easily scrape web pages, and then I will create a desktop notification script that provides the daily snow forecast.

The Lynx Text Browser

For my Bash web scraping, I started out by looking at command-line tools such as curl [2] combined with the html2text [3] utility. This technique definitely works, but I found that the Lynx browser offers a one-step solution with slightly cleaner text output.
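For comparison, the two-step curl approach could look something like the sketch below. The function name and the URL are hypothetical, and the output layout depends on which html2text implementation your distribution ships.

```shell
#!/usr/bin/env bash

# Fetch a page with curl and convert the HTML to text with html2text.
#   -s  silent mode (no progress meter)
#   -L  follow redirects
fetch_as_text() {
    curl -sL "$1" | html2text
}

# Usage (placeholder URL, needs curl, html2text, and a network connection):
#   fetch_as_text "https://example.com/snow-forecast" | grep -i 'snow'
```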

To install Lynx on Raspbian/Debian/Ubuntu, use:

sudo apt install lynx

The Lynx -dump option will output a web page to text with HTML tags, HTML encoding, and JavaScript removed. Figure 2 shows that a Lynx dump can greatly clean up the original web page and make searching considerably easier.
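As a rough sketch, a small wrapper around the dump could look like this. The function name, the -nolist and -width choices, and the sample file name are my own additions rather than anything taken from the figures.

```shell
#!/usr/bin/env bash

# Dump a page (URL or local HTML file) as plain text.
#   -dump      write the rendered page to standard output and exit
#   -nolist    leave out the numbered list of links at the end of the dump
#   -width=N   wrap the output at N columns
dump_page() {
    lynx -dump -nolist -width=120 "$1"
}

# Usage (needs lynx; report.html is a hypothetical saved page):
#   dump_page report.html
#   dump_page "https://example.com/snow-forecast"
```

Dropping the link list with -nolist keeps the output shorter, which helps when the only goal is to search the visible text.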

Sometimes a simple Bash grep search might be all that you need. However, there are many cases where some text manipulation is required. The good news is that Bash has a nice selection of line and string manipulation tools.
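To give a flavor of what that manipulation can look like, here is a small sketch that pulls a number out of dumped text in two ways: with a grep pipeline and with Bash parameter expansion. The sample line format ("Upper snow depth: 150 cm") and the function names are invented for illustration; a real page will need its own pattern.

```shell
#!/usr/bin/env bash

# Pipeline approach: take the first matching line, then its first number.
extract_depth() {
    grep -i -m 1 'snow depth' | grep -o -E '[0-9]+' | head -n 1
}

# Parameter-expansion approach, for a single line held in a variable.
depth_from_line() {
    local line=$1
    local value=${line##*: }       # drop everything up to the last ": "
    printf '%s\n' "${value%% *}"   # drop everything from the first space on
}

# Usage:
#   printf '   Upper snow depth: 150 cm\n' | extract_depth
#   depth_from_line "Upper snow depth: 150 cm"
```

The pipeline version is more forgiving about leading whitespace and surrounding text, while the parameter-expansion version avoids starting extra processes.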

This story is from the #262/September 2022 edition of Linux Magazine.

