Getting Daily Mean Temperature from Historical Weather Data

I’ve been working on a few home automation projects using historical weather data from the Big Query NOAA Public Weather Dataset to schedule automations, estimate energy usage, or predict when I’ll get my next home heating oil delivery. To take things a step further, I need to know the average mean temperature on a given date.

To do this, we’ll take the last 20 years of NOAA weather data for a given weather station and calculate the mean mean ;) (average temperature by day using the mean temperature values per day). We could just as easily calculate the mean high and low temperature for each day as well.

Let’s get started!

If you have not already done so, be sure to get familiar with the Big Query NOAA GSOD public dataset using my previously published tutorial Finding the Closest Weather Stations. Be sure that you are using a weather station with (complete) “good data”.

You will need the following information:

  • stn — Station number for the location
  • wban — Historical “Weather Bureau Air Force Navy” number

Let’s start with a simple query to get the mean temperature for each day of the year for the last full year:

SELECT 
CAST(temp as Numeric) as meantemp,
CONCAT(year,mo,da) as thedate # Construct a date field
FROM
`bigquery-public-data.noaa_gsod.gsod2019`
WHERE
max != 9999.9 # code for missing data
AND stn = '725037'
AND wban = '94745'
ORDER BY thedate

We CAST the temp value from a FLOAT to NUMERIC to avoid issues when performing calculations (despite what IEEE says). As you can see, we output a simple list of dates (thedate) with the mean temperature per date (temp).

Now that we’ve gotten the mean temperature by date, let’s expand to a period that include several years of data so we can calculate a mean value per day from historical data. Using the BETWEEN operator, we can quickly and easily select a number of years at once.

SELECT 
CAST(temp as Numeric) as meantemp,
CONCAT(mo,da) as thedate # Get rid of the Year
FROM
`bigquery-public-data.noaa_gsod.gsod20*` # Use wildcard to select many tables at once
WHERE
max != 9999.9 # code for missing data
AND _TABLE_SUFFIX BETWEEN '00' AND '19' # Limit wildcard tables to 2000-2019
AND stn = '725037'
AND wban = '94745'

We should now have 20 mean temperature values per date:

To make this data useful, we we are going to need a way to GROUP data by date (minus the year) in MMDD format. In order to get the mean temperature by date, we are going to need to SUM the meantemp and divide by the COUNT (or number of occurrences) of thedate. By using the COUNT, we cover ourselves in case of missing data.

SELECT 
SUM(CAST(temp as Numeric))/COUNT(temp) as mean_mean_temp,
CONCAT(mo,da) as thedate # Get rid of the Year
FROM
`bigquery-public-data.noaa_gsod.gsod20*`
# Use wildcard to select many tables at once
WHERE
max != 9999.9 # code for missing data
AND _TABLE_SUFFIX BETWEEN '00' AND '19'
# Limit wildcard tables to 2000-2019
AND stn = '725037'
AND wban = '94745'
GROUP BY thedate
ORDER BY thedate

The result is 366 rows (Don’t forget about the leap years ;) of the mean mean temperature by date.

We can easily do the same for high (max) and low (min) temperature by day as well.

SELECT 
SUM(CAST(temp as Numeric))/COUNT(temp) as mean_mean_temp,
SUM(CAST(min as Numeric))/COUNT(min) as mean_min_temp,
SUM(CAST(max as Numeric))/COUNT(max) as mean_max_temp,
CONCAT(mo,da) as thedate # Get rid of the Year
FROM
`bigquery-public-data.noaa_gsod.gsod20*`
# Use wildcard to select many tables at once
WHERE
max != 9999.9 # code for missing data
AND _TABLE_SUFFIX BETWEEN '00' AND '19'
# Limit wildcard tables to 2000-2019
AND stn = '725037'
AND wban = '94745'
GROUP BY thedate
ORDER BY thedate

Since this query uses over 3GB to process, we might consider saving this as its own Big Query table for use is later queries. Let’s click the “Save Results” button and select “BigQuery table”.

--

--

--

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Data Science Weekly Issue 8

What can we learn from 42.5 million mortgage loan applications?

Define Comparative/Homology Modeling.

NFL Twitter Sentiment: Raiders at Colts, Week 17, 2021

Curation assignment template — MCO435

7 Best Business Analytics Courses Online

Docker for Data Scientists

Box and Violin Plots with Python’s Seaborn

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Jake Holmquist

Jake Holmquist

More from Medium

Big Data : Know everything here

Cloud Data Engineering and Analytics Automation in One Package

Import stock data from Alpha Vantage into SQL database

Data Engineering Project Retail Store — Part 4 — Analyzing the Data