Web Scraping vs. API: Scraping Weather Underground Data with the Weather API

In my last article, I wrote about web scraping weather data from Weather Underground using Beautiful Soup. I went down that path because the charts Weather Underground generates didn't meet my needs for historical data analysis.

As a result, I turned to my friend Jake Fitzsimmons for advice on using the Weather Underground API to retrieve historical weather data.

In this article, I’ll walk you through the code that Jake Fitzsimmons helped me optimise, which uses the Weather Underground API to retrieve weather data for a specific station ID and date range. We’ll also discuss how this approach compares to web scraping and the benefits of using an API.

Please visit my GitHub for the full code.

import requests
import json
import pandas as pd

# function to get wind direction based on degrees
def get_wind_direction(degrees):
    direction_names = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE", "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
    index = round(degrees / (360. / len(direction_names))) % len(direction_names)
    return direction_names[index]

# URL to access the weather data endpoint. Copy it from the JSON request you see when inspecting the page's network traffic.
url = 'https://################'

# station ID to get weather data for. You will need to add your own station
station_id = 'IFLETC15'

# start and end dates to get weather data for
start_date = '20230205'
end_date = '20230314'

# API key to access weather data. You will need to add your own api key
api_key = '############################'

# create a list of dates to get weather data for
dates = pd.date_range(start=start_date, end=end_date, freq='D')

# create an empty list to store all the weather data
all_data = []

# loop through all the dates and get weather data for each date
for date in dates:
    # set the parameters for the API request
    params = {
        'stationId': station_id,
        'format': 'json',
        'units': 'm',
        'date': date.strftime('%Y%m%d'),
        'numericPrecision': 'decimal',
        'apiKey': api_key
    }

    # send the API request and get the response
    response = requests.get(url, params=params)

    # check if the API request was successful
    if response.status_code == 200:
        # get the weather data from the response
        data = json.loads(response.text)["observations"]
        # loop through all the rows in the weather data and add wind direction data
        for row in data:
            metric = row.pop("metric")
            row.update(metric)
            row["wind_direction"] = get_wind_direction(row["winddirAvg"])

        # add the weather data to the list of all data
        all_data += data
    else:
        # print an error message if the API request failed
        print(f'Request failed with status code {response.status_code}')

# create a Pandas DataFrame from the weather data and save it to a CSV file
df = pd.DataFrame.from_dict(all_data)
df.to_csv('weather_data1.csv', index=False)

The code begins by importing the necessary libraries (requests, json, and pandas) for making API calls, parsing JSON, and handling the data, respectively. It then defines a function get_wind_direction() that takes a wind direction in degrees and returns the corresponding compass direction (N, NNE, NE, ENE, E, ESE, SE, SSE, S, SSW, SW, WSW, W, WNW, NW, NNW).
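
As a quick sanity check, here is how get_wind_direction() maps a few sample bearings, assuming the function above has been defined (the degree values are arbitrary examples, not real station readings):

print(get_wind_direction(0))    # N
print(get_wind_direction(95))   # E
print(get_wind_direction(350))  # N (350 degrees wraps back around to north)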

The next few lines set the URL for the Weather Underground API, along with the station ID, start date, end date, and API key. We use pandas to create a date range for the given start and end dates.
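
To see what that produces, here is a minimal sketch using a shorter, made-up range; the loop later calls strftime('%Y%m%d') to turn each timestamp back into the date string the API expects:

import pandas as pd

dates = pd.date_range(start='20230205', end='20230208', freq='D')
print([d.strftime('%Y%m%d') for d in dates])
# ['20230205', '20230206', '20230207', '20230208']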

The code then initialises an empty list called all_data, which will store the retrieved weather data. It loops through the dates in the date range and sends an API request for each one, passing the station ID, format, units, date, numeric precision, and API key as parameters. If the request returns a status code of 200, the response text is parsed as JSON and the observations list is extracted. For each observation, the nested metric dictionary is popped out and merged into the top level of the observation, and the wind direction is calculated with get_wind_direction() and added as a new field. The observation is then appended to the all_data list. If the request returns any other status code, an error message is printed.
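
To make the flattening step concrete, here is what it does to a single observation, assuming get_wind_direction() from above is defined. Only winddirAvg and metric appear in the script; the other field names here are illustrative assumptions about the response shape:

# a simplified observation, shaped like one entry of the "observations" list
row = {
    'stationID': 'IFLETC15',
    'obsTimeLocal': '2023-02-05 09:00:00',
    'winddirAvg': 95,
    'metric': {'tempAvg': 22.4, 'windspeedAvg': 11.0}
}

metric = row.pop('metric')   # remove the nested dict...
row.update(metric)           # ...and merge its keys into the top level
row['wind_direction'] = get_wind_direction(row['winddirAvg'])
print(row['tempAvg'], row['wind_direction'])  # 22.4 E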

After all of the data has been retrieved, the all_data list is converted into a pandas DataFrame and saved to a CSV file called "weather_data1.csv", with index=False so the row index isn't written as an extra column.
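
Once the script has run, a quick way to confirm the export looks right (this assumes only the file name used above):

import pandas as pd

df = pd.read_csv('weather_data1.csv')
print(df.shape)             # (number of observations, number of fields)
print(df.columns.tolist())  # see exactly which fields the API returned
print(df.head())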

While web scraping is a viable method for retrieving data from websites, using an API is often more efficient and reliable. APIs are designed to provide a programmatic interface to access data from a remote server, and the data is typically returned in a standardised format such as JSON or XML. This makes it easier to parse and manipulate the data, compared to scraping HTML data which can be messy and prone to changes.

Additionally, many websites throttle or block scrapers to prevent excessive traffic and protect their servers. An official API avoids that cat-and-mouse game: it is designed for programmatic access and comes with documented rate limits, so you know exactly how many requests you are allowed to make.
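
Even so, it pays to back off when a server signals you are sending requests too quickly. Here is a minimal retry sketch wrapped around the same requests.get() call as above; the retry count and delay are arbitrary choices, and 429 is the standard "too many requests" status code rather than anything Weather Underground specifically documents:

import time
import requests

def get_with_retry(url, params, retries=3, delay=5):
    # try the request a few times, waiting longer after each 429 response
    for attempt in range(retries):
        response = requests.get(url, params=params)
        if response.status_code == 429:
            time.sleep(delay * (attempt + 1))
            continue
        return response
    return response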

As a next step, I will start building my Tableau dashboard. You can follow its progress on my Tableau Public profile.

Once I am satisfied with my Tableau build, I will start pulling in data from other local weather stations to get a fuller picture of the weather in Fletcher, NSW, Australia.

In this article, we explored how to use the Weather Underground API to retrieve historical weather data for a specific station ID and date range. We compared the benefits of using an API over web scraping and discussed how the retrieved data is processed and saved using pandas. I hope this article has provided insight into the power of APIs for data retrieval and encouraged you to explore the use of APIs in your own projects.
