Zenarcher

Weather Model Experiment Results


How It Was Done,
During the early part of December 2018, I decided to start an experiment to see how well the weather models would perform when forecasting the weather up to 6 days ahead. To get things started I chose a location near me: Stornoway, on the west coast of Scotland. There is a Met Office weather station there, and I wanted to use a Met Office station since it would record accurate weather observations during the experiment. There is also another weather station close by at the airport, which I would use as a backup in case the Met Office one failed to report any readings.

Please Note,
I plan on doing more of these once the new GFS model is out and the next time I will be using multiple locations across the whole of the UK instead of one location in order to get more accurate results. Please do not take the results from this experiment too seriously as it was just for me to see which weather forecast was the best for my local area and I've decided to share the results here with you all.

Weather Models Used,
GFS (Global Forecast System) run by the United States' National Weather Service.
ECM (European Centre for Medium-Range Weather Forecasts).
UKMO The UK's Met Office.
ICON (ICOsahedral Nonhydrostatic general circulation model) from the German Weather Service.
GEM, sometimes called the CMC, is the weather model from Environment and Climate Change Canada.
FMI Hirlam Model from the Finnish Meteorological Institute.
ARPEGE from Meteo France.
HIRLAM or sometimes called KNMI is from the Netherlands Weather Service.

Weather Apps and Websites Used,
BBC Weather. I decided to include it since they use data from MeteoGroup. I'm not sure what weather model they use; it could be the ECM or a combination of several weather models.
Weather Underground They have the largest weather station community in the world which helps them produce their own forecasts.
DarkSky An award-winning weather app.
The Weather Channel Which Google use for their weather forecasts.
AccuWeather.

How The Results Were Recorded,
Every day over the following weeks I took notes on what every forecast said, using the 12z runs for all of the weather models. When the forecast time and day arrived, I checked the observation made at the weather station and noted how far off each prediction was. All of the results were then averaged over a month's worth of data to find the mean error for each of the forecasts.
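As a rough illustration of the averaging step described above, here is a minimal Python sketch. It assumes the "mean error" is the mean of the absolute differences between forecast and observation (the post does not spell out the exact formula), and the function name and sample values are made up for illustration only.

```python
from statistics import mean


def mean_absolute_error(forecasts, observations):
    """Average size of the forecast error, ignoring its sign."""
    if len(forecasts) != len(observations):
        raise ValueError("forecasts and observations must be the same length")
    if not forecasts:
        raise ValueError("need at least one forecast/observation pair")
    return mean(abs(f - o) for f, o in zip(forecasts, observations))


# Hypothetical values for illustration only, not data from the experiment.
forecast_temps = [7.0, 5.5, 9.0, 4.0]
observed_temps = [6.5, 6.5, 9.0, 2.5]

mae = mean_absolute_error(forecast_temps, observed_temps)
print(round(mae, 2))
```

Using absolute differences means that a forecast 1 degree too warm and one 1 degree too cold both count as an error of 1, rather than cancelling each other out.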

Air Temperature Results,
24 Hours (1 day)

KJRiQ1w.png

Average Mean Error,
1st Met Office 0.52
2nd BBC Weather 0.65
3rd ICON 0.66
4th Weather Underground 0.73
5th The Weather Channel 0.74
6th DarkSky 0.75
7th ECM 0.76
8th AccuWeather 0.77
9th GEM 0.88
10th GFS 0.89
11th ARPEGE 0.93
12th FMI 1.61
13th HIRLAM 2.20

Notes: The GFS was a bit of a surprise, being down in 10th place. It had a slight bias towards forecasting temperatures that were too warm. The same goes for FMI and HIRLAM, which both also struggled when it came to predicting colder weather and overnight temperatures.
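A bias like this shows up in the signed error rather than the absolute error. The sketch below is only an illustration of that distinction: the function names and the overnight temperatures are hypothetical, not values from the experiment.

```python
from statistics import mean


def mean_bias(forecasts, observations):
    """Average signed error: positive means the forecasts ran too warm."""
    return mean(f - o for f, o in zip(forecasts, observations))


def mean_absolute_error(forecasts, observations):
    """Average error size with the sign ignored."""
    return mean(abs(f - o) for f, o in zip(forecasts, observations))


# Hypothetical overnight temperatures in degrees C, for illustration only.
forecast = [3.0, 2.0, 4.0, 1.0]
observed = [1.0, 0.0, 4.5, -1.0]

bias = mean_bias(forecast, observed)
mae = mean_absolute_error(forecast, observed)
print(f"bias {bias:+.2f}, mean absolute error {mae:.2f}")
```

A bias close to the mean absolute error suggests the model is consistently off in one direction, while a bias near zero with a large absolute error suggests it misses in both directions.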

48 Hours (2 days)

mQSbEkp.png

Average Mean Error,
1st ICON  0.72
2nd Met Office 0.75
3rd ECM 0.77
4th AccuWeather 0.81
5th DarkSky 0.82
6th Weather Underground 0.85
7th GFS 0.90
8th The Weather Channel 0.95
9th GEM 1.06
10th ARPEGE 1.09
11th BBC Weather 1.15
12th HIRLAM 1.29
13th FMI 1.72

72 Hours (3 days)

4HY8PJw.png

Average Mean Error,
1st ICON 0.79
2nd AccuWeather 0.88
3rd Met Office 0.97
4th BBC Weather 1.04
5th ECM 1.11
6th DarkSky 1.12
7th GFS 1.13
8th Weather Underground 1.19
9th GEM 1.31
10th ARPEGE 1.54

96 Hours (4 days)

XDc47V2.png

Average Mean Error,
1st ICON 1.06
2nd Met Office 1.09
3rd BBC Weather 1.22
4th GEM 1.23
5th ECM 1.25
6th Weather Underground 1.30
7th DarkSky 1.31
8th GFS 1.50

Notes: You may notice that some of the weather models have disappeared. This is because not all of them go this far out.

120 Hours (5 days)

WSWYqHM.png

Average Mean Error,
1st ICON 1.29
2nd GEM 1.43
3rd Met Office 1.44
4th Weather Underground 1.47
5th DarkSky 1.50
6th ECM 1.52
7th BBC Weather 1.62
8th GFS 2.00

Notes: This is as far as the ICON 7km data goes, and it still remains in first place. It has done really well, coming 3rd at day 1 and leading all the way from day 2 to day 5. It's the clear winner when it comes to forecasting temperatures at short range.

144 Hours (6 days)

bE4GAeg.png

Average Mean Error,
1st Weather Underground 1.50
2nd GEM 1.70
3rd DarkSky 1.71
4th BBC Weather 1.72
5th ECM 1.75
6th Met Office 1.76
7th GFS 1.79

Mean Error Between 24 and 144 Hours (1 to 6 days),

f6dqlHK.png

1st ICON 0.90 (up to 120 hours 5 days)
2nd Met Office 1.08
3rd Weather Underground 1.17
4th ECM 1.19
5th DarkSky 1.20
6th BBC Weather 1.23
7th GEM 1.26
8th GFS 1.36

Notes: The ICON seems to be in a different league when it comes to predicting air temperatures. I've looked through the data and it shows just how incredibly consistent it was.
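For anyone wanting to check the overall figures, the sketch below averages the per-day temperature errors listed above for two of the models. It assumes the overall value is a plain average over the available lead times, which is my reading rather than something stated in the post. Because the published per-day numbers are already rounded, recomputing other models this way can differ from the published overall figure by about 0.01.

```python
from statistics import mean

# Air temperature mean errors per lead time (hours), copied from the lists above.
temperature_errors = {
    "ICON": {24: 0.66, 48: 0.72, 72: 0.79, 96: 1.06, 120: 1.29},
    "ECM": {24: 0.76, 48: 0.77, 72: 1.11, 96: 1.25, 120: 1.52, 144: 1.75},
}


def overall_error(errors_by_lead_time):
    """Plain average over whichever lead times the model provides."""
    return mean(errors_by_lead_time.values())


overall = {
    name: round(overall_error(errors), 2)
    for name, errors in temperature_errors.items()
}
print(overall)
```

Note that ICON is averaged over five lead times and the others over six, so its overall figure leaves out the day with the largest errors and is not a strict like-for-like comparison.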

Day time vs Night Time Temperatures between 1 to 6 days (24 to 144 hours),

1MtY6Mq.png

I thought it would also be worth showing how each weather model performed at predicting temperatures during the day and at night.

Average Mean Error For Daytime Temperature,
1st ICON 0.83 (Up to 120 hours 5 days)
2nd ECM 0.89
3rd Met Office 1.00
4th Weather Underground 1.03
5th BBC Weather 1.10
6th DarkSky 1.18
7th GEM 1.26
8th GFS 1.40

emao17r.png

Average Mean Error For Nighttime Temperature,
1st ICON 0.98
2nd Met Office 1.18
3rd DarkSky 1.20
4th GEM 1.28
5th Weather Underground 1.31
6th GFS 1.34
7th BBC Weather 1.37
8th ECM 1.50

Notes: Almost all of the weather models performed more poorly at nighttime than in the daytime. It's interesting to see how the ECM and BBC Weather did well during the day but were the two worst when it came to predicting nighttime temperatures.

Wind Speed Results,
24 Hours (1 day)

p4MXQ7h.png

Average Mean Error,
1st GFS 2.04
2nd DarkSky 2.29
3rd Met Office 2.56
4th BBC Weather 2.64
5th Weather Underground 2.76
6th The Weather Channel 2.79
7th ICON 3.25
8th GEM 3.47
9th ECM 3.59
10th AccuWeather 3.95
11th FMI 4.09
12th HIRLAM 4.13
13th ARPEGE 4.75

Notes: The ECM underestimated the wind speeds for the whole experiment. In my experience, I've seen it do the same thing for some other locations and when I redo this experiment using multiple locations it will be interesting to see if the ECM still underperforms when it comes to wind speeds.

48 Hours (2 days)

WrO1IKg.png

Average Mean Error,
1st Met Office 2.15
2nd Weather Underground 2.50
3rd The Weather Channel 2.72
4th BBC Weather 2.83
5th GFS 2.97
6th DarkSky 3.02
7th ICON 3.29
8th ECM 3.43
9th GEM 4.04
10th HIRLAM 4.15
11th ARPEGE 4.38
12th FMI 4.43
13th AccuWeather 5.04

Notes: A few of the forecasts seemed to handle the wind speeds poorly, those being HIRLAM, ARPEGE, FMI and AccuWeather. HIRLAM, ARPEGE and FMI tended to overestimate the wind speeds, especially ARPEGE, while AccuWeather seemed to suffer from the same issue as the ECM and underestimated them.

72 Hours (3 days)

kTqzzYU.png

Average Mean Error,
1st BBC Weather 2.86
2nd Met Office 2.88
3rd Weather Underground 3.00
4th DarkSky 3.25
5th GFS 3.38
6th ECM 3.75
7th ICON 3.77
8th GEM 4.61
9th ARPEGE 4.63
10th AccuWeather 5.44

96 Hours (4 days)

xdTQwVa.png

Average Mean Error,
1st BBC Weather 2.92
2nd Met Office 3.36
3rd Weather Underground 4.47
4th ICON 4.72
5th ECM 4.76
6th DarkSky 4.86
7th GEM 5.43
8th GFS 5.95

Notes: The GFS has dropped back down to last place. As many will know, the GFS likes to overdo wind speeds. While it seems to manage them well within the 1 to 3-day range, beyond that it tends to overestimate them more than the other weather models.

120 Hours (5 days)

tNRHtfz.png

Average Mean Error,
1st Weather Underground 4.07
2nd BBC Weather 4.16
3rd Met Office 4.31
4th ECM 4.75
5th DarkSky 4.97
6th GFS 5.40
7th GEM 5.81
8th ICON 6.27

144 Hours (6 days)

2vpG5T5.png

Average Mean Error,
1st Weather Underground 4.16
2nd BBC Weather 4.50
3rd Met Office 5.26
4th DarkSky 5.42
5th ECM 5.64
6th GFS 5.90
7th GEM 7.18

Notes:  GEM seemed to really struggle around this point. While the ECM may have been behind the GFS it has done a better job with the wind speeds at day 5 and 6 compared to the GFS.

Mean Error Between 24 and 144 Hours (1 to 6 days)

1ENqqkv.png

1st BBC Weather 3.31
2nd Met Office 3.42
3rd Weather Underground 3.49
4th DarkSky 3.96
5th ICON 4.26 (up to 120 hours 5 days)
6th GFS 4.27
7th ECM 4.32
8th GEM 5.09

What about wind gusts?
I decided to test that out as well. However, it was just on one occasion during the early part of January, when a storm brought gusts of 70 to 90mph to the north of Scotland. 24 hours beforehand I made a note of the predicted gusts from some of the weather models, using up to four different locations, all of which had a Met Office station. The results are below,

ehR8Mr5.png


Air Pressure Results,

i4RYV8g.png

24 Hours (1 day) Average Mean Error,
1st ECM 0.34
2nd DarkSky 0.37
3rd BBC Weather 0.38
4th Met Office 0.50
5th ARPEGE 0.54
6th ICON 0.59
7th GEM 0.65
8th Weather Underground 0.66
9th GFS 0.70
10th HIRLAM 1.04
11th FMI 1.11

Notes: It's surprising to see the GFS so far down the list. As for the HIRLAM and FMI, they both proved to be very inconsistent even at just 24 hours.

48 Hours (2 days)

IE3i2AZ.png

Average Mean Error,
1st BBC Weather 0.70
2nd ICON 0.75
3rd ECM 0.77
4th Met Office 0.81
5th ARPEGE 0.84
6th DarkSky 0.90
7th Weather Underground 1.07
8th GFS 1.20
9th GEM 1.21
10th HIRLAM 1.29
11th FMI 2.52

72 Hours (3 days)

NkL4GYi.png

Average Mean Error,
1st ICON 1.40
2nd Met Office 1.41
3rd DarkSky 1.50
4th ECM 1.68
5th Weather Underground 1.78
6th BBC Weather 1.90
7th GFS 1.91
8th GEM 1.97
9th ARPEGE 2.11

Notes: ARPEGE, which came 5th for both day 1 and day 2, now falls to last place. It seems to become more unreliable from day 3 onwards.

96 Hours (4 days)

JlBUzr6.png

Average Mean Error,
1st ICON 2.22
2nd ECM 2.25
3rd Met Office 2.68
4th GEM 2.77
5th BBC Weather 2.90
6th Weather Underground 3.04
7th DarkSky 3.09
8th GFS 3.47

120 Hours (5 days)

39YVWHZ.png

Average Mean Error,
1st DarkSky 3.84
2nd ICON 3.95
3rd Met Office 4.18
4th GFS 4.25
5th Weather Underground 4.42
6th ECM 4.75
7th GEM 4.77
8th BBC Weather 5.06

Notes: This may surprise some people, but once we got to day 5 DarkSky didn't seem to struggle as much as the other weather models.

144 Hours (6 days)

QLnzfzg.png

Average Mean Error,
1st DarkSky 4.97
2nd Weather Underground 5.09
3rd Met Office 5.31
4th GEM 5.32
5th GFS 5.50
6th BBC Weather 6.04
7th ECM 6.15

Mean Error Between 24 and 144 Hours (1 to 6 days),

cuePFoC.png

1st ICON 1.78 (up to 120 hours 5 days)
2nd DarkSky 2.44
3rd Met Office 2.48
4th ECM 2.65
5th Weather Underground 2.67
6th GEM 2.78
7th BBC Weather 2.83
8th GFS 2.83

Notes: ICON and DarkSky were the top 2 at predicting air pressure, and the Met Office and the ECM were not too far behind. The GFS and BBC Weather were the worst.

Mean Error From Air Temperature, Wind Speed and Air Pressure 24 to 144 hours (1 to 6 days),

2IaQd3o.png

Now that we have looked at each individual part of the experiment, it's time to add everything together at each timeframe to find the best on average across temperature, wind and pressure. The chart above shows the results for 24 hours (1 day), and the list of results is below,

1-day forecast (24 hours),
1st DarkSky 1.13
2nd Met Office 1.19
3rd GFS 1.21
4th BBC Weather 1.22
5th Weather Underground 1.38
6th ICON 1.50
7th ECM 1.56
8th GEM 1.66
9th ARPEGE 2.07
10th FMI 2.27
11th HIRLAM 2.45
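As a sanity check on how the combined figures fit together, the sketch below takes the 24-hour temperature, wind and pressure errors listed earlier for three of the forecasts and averages them with equal weight. The equal-weight average is an assumption on my part; it reproduces the published values for these three, though other entries can differ by about 0.01 because the inputs are already rounded.

```python
from statistics import mean

# 24-hour mean errors copied from the temperature, wind and pressure lists above.
one_day_errors = {
    "Met Office": {"temperature": 0.52, "wind": 2.56, "pressure": 0.50},
    "GFS": {"temperature": 0.89, "wind": 2.04, "pressure": 0.70},
    "BBC Weather": {"temperature": 0.65, "wind": 2.64, "pressure": 0.38},
}

# Equal-weight average of the three variables (assumed, not stated in the post).
combined = {
    name: round(mean(errors.values()), 2)
    for name, errors in one_day_errors.items()
}

# Lowest combined error first.
ranking = sorted(combined, key=combined.get)
print(combined)
print(ranking)
```

One thing worth keeping in mind with this kind of score is that the three variables are on different scales, so whichever has the largest typical error (wind speed here) has the most influence on the combined ranking.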

2 days forecast (48 hours),

f4SsVTm.png
1st Met Office 1.23
2nd DarkSky 1.33
3rd Weather Underground 1.47
4th BBC Weather 1.56
5th ICON 1.58
6th ECM 1.65
7th GFS 1.69
8th GEM 1.90
9th ARPEGE 2.10
10th HIRLAM 2.24
11th FMI 2.89

3-day forecast (72 hours),

jXTrFTC.png
1st Met Office 1.64
2nd BBC Weather 1.93
3rd DarkSky 1.95
4th ICON 1.98
5th Weather Underground 1.99
6th GFS 2.14
7th ECM 2.18
8th GEM 2.63
9th ARPEGE 2.70

4-day forecast (96 hours),

MjflplT.png

1st BBC Weather 2.34
2nd Met Office 2.37
3rd ICON 2.66
4th ECM 2.75
5th Weather Underground 2.93
6th DarkSky 3.08
7th GEM 3.14
8th GFS 3.64

5-day forecast (120 hours),

NEoWSlu.png

1st Met Office 3.31
2nd Weather Underground 3.32
3rd DarkSky 3.43
4th BBC Weather 3.61
5th ECM 3.67
6th ICON 3.83
7th GFS 3.88
8th GEM 4.00

6-day forecast (144 hours),

wtrHzqb.png

1st Weather Underground 3.58
2nd DarkSky 4.03
3rd BBC Weather 4.08
4th Met Office 4.11
5th GFS 4.39
6th ECM 4.51
7th GEM 4.73

The final overall set of results: the average mean error between 24 and 144 hours (1 to 6 days) for air temperature, wind speed and air pressure,

zhSweC6.png

1st Met Office 2.30
2nd ICON 2.31 (up to 120 hours 5 days)
3rd Weather Underground 2.44
4th BBC Weather 2.45
5th DarkSky 2.49
6th ECM 2.72
7th GFS 2.82
8th GEM 3.01

Notes: The overall winner from this experiment is the UK Met Office. It was the most consistent weather model, with the smallest errors across temperature, wind speed and air pressure between 1 and 6 days over the period of 1 month. As to why the ECM, GFS and GEM are behind: it seems the likes of BBC Weather, DarkSky and Weather Underground use more than one weather model, and I believe this gives them a slight edge over some of the individual weather models. I explain a bit more in the summaries below.

My thoughts on each weather model and website/app,

GFS: It's one of the worst overall when it came to air temperatures. It always had a slight bias towards warmer temperatures, and on the few cold nights that occurred during December the GFS failed miserably at predicting them. Even at just 12 hours ahead, the GFS was up to 4°C off the actual recorded temperature on a few occasions. Its only strong point was predicting wind speeds better than any other weather model at 24 hours ahead. The GFS was also one of the worst overall at air pressure. This has been known for a number of years now: it likes to overdo most low-pressure systems, which was evident several times during this experiment. In the end it finishes in 7th place, just ahead of the GEM model. It may come as no real surprise to some people how badly the GFS performed overall, and I can only hope that the new upgrade brings better results.

ECM: It was a little bit of a disappointment for me, though it's still my own personal favourite. It was one of the best at forecasting temperatures and performed remarkably well with the daytime temperatures, but appeared to really struggle with nighttime temperatures. It came 2nd last with the mean wind speeds but actually did a decent job when it came to predicting the wind gusts. The ECM showed it was one of the best at air pressure between day 1 and 4 but fell back a lot for day 5 and 6. Overall it finishes 6th out of 8, mainly because the wind speeds let it down. It did, however, do better than the GFS and GEM. I know the ECM vs GFS debate comes up a lot. At the moment the ECM is the better one overall, but the gap between how well they perform is actually pretty small, which would maybe explain why sometimes one of them does a better job than the other. Then again, this is all based on just this one experiment, so it's important not to read too much into it.

UK Met Office: It did great with air temperatures both day and night. It came 2nd overall for the wind speeds and 3rd place for the air pressure and it was the overall winner.

ICON: It was the best, by some margin, at predicting air temperatures, both day and night. It wasn't just the air temperatures either: it was also the best with air pressure up to 120 hours. Coming 2nd overall may be a surprise to some people. I really didn't expect it to do so well, and it does seem to be a weather model worth checking.

GEM: One of the worst for temperatures, although it did an okay job at nighttime temperatures. For wind speeds, it tended to overestimate them and was actually one of the worst overall. It was neither the worst nor the best for air pressure, which by the looks of it was the only thing GEM was a bit decent at.

FMI: One of the worst performing weather models of the whole experiment. It failed to do well in anything, and I can't really say anything positive about it.

ARPEGE: It didn't do too well with the air temperatures or mean wind speeds but actually was one of the best with the air pressure especially up to 2 days ahead and it was the overall best at predicting the wind gusts.

HIRLAM: Very similar to the FMI weather model it was just very inconsistent with everything.

BBC Weather: I'm not sure what weather model they use. It's very likely to be the ECM, but the results from the experiment differ a bit: the ECM was better at temperatures and air pressure, while BBC Weather was the overall best with the mean wind speeds, where the ECM was one of the worst. That makes me think BBC Weather uses not only the ECM but perhaps some other weather models too. MeteoGroup say on their website, "With over 100 meteorologists, a department dedicated to research and development and investment in the five weather models recognised as being the most accurate in the world: ECMWF, UKMO, GFS, HIRLAM and WRF, we are able to provide the most precise forecasts in the market.", which might suggest they use more than just the ECM. In the end, I think BBC Weather may have surprised some people by coming 4th out of 8 overall.

Weather Underground: With the weather models now out of the way, it's time to start on the weather websites and apps, something I added to this experiment out of interest to see how good they would be compared to the weather models. I really didn't have much hope for them, but they certainly did much better than I expected, and Weather Underground coming 3rd overall is pretty amazing. A bit more about how their forecasts work: "Our ever-expanding network of 250,000+ personal weather stations is the largest of its kind and provides us with a unique ability to provide the most local forecasts based on actual weather data points. BestForecast™ uses the most innovative forecast models available and cross-verifies their output with all of the localized data points. Only our unrivaled amount of local neighborhood weather data can generate forecasts for your front door." They also use different weather models: "A variety of inputs, including, but not limited to, ECMWF, GFS, and NAM"

DarkSky: Just like Weather Underground, it performed much better than I expected, coming in 5th place. It did a great job at predicting the nighttime temperatures and was one of the best when it came to air pressure, especially at day 5 and 6, where it was the overall best. I couldn't understand how a website/app could do so well, so I did a bit of research and found this in an article from a few years ago. It seems DarkSky uses a few weather models, like the GFS, GEM and ICON: "We first correct for geographically related bias in each model individually (e.g., GFS temperature seems to be annoyingly low in some places, higher in others). And then we monitor the accuracy of each model, geographically, to calculate a standard error which we use to weight each model on-the-fly whenever a request comes in for a forecast."
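To give a feel for what that quote describes, here is a small Python sketch of the general idea: correct each model for its known bias, then blend the models with weights based on their standard error. The quote does not say how the weights are derived, so the inverse-variance weighting used here is purely my assumption, and the function names and numbers are hypothetical. This is not DarkSky's actual code.

```python
def inverse_error_weights(standard_errors):
    """Weight each model by 1 / error^2, normalised so the weights sum to 1."""
    inverse = {name: 1.0 / (err ** 2) for name, err in standard_errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def blended_forecast(raw_forecasts, biases, standard_errors):
    """Subtract each model's bias, then take the weighted average."""
    weights = inverse_error_weights(standard_errors)
    return sum(
        weights[name] * (raw_forecasts[name] - biases[name])
        for name in raw_forecasts
    )


# Hypothetical temperature forecasts, biases and standard errors in degrees C.
raw_forecasts = {"GFS": 7.0, "GEM": 5.0, "ICON": 4.0}
biases = {"GFS": 1.0, "GEM": 0.0, "ICON": 0.0}
standard_errors = {"GFS": 2.0, "GEM": 2.0, "ICON": 1.0}

weights = inverse_error_weights(standard_errors)
blend = blended_forecast(raw_forecasts, biases, standard_errors)
print(round(blend, 2))
```

In this made-up example the model with half the standard error ends up with four times the weight of each of the others, so the blend sits closer to its forecast.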

The Weather Channel: It did a decent enough job in its short-range 2-day forecasts. With temperatures and wind speeds it wasn't the best or the worst, and it actually did better than some of the weather models. However, The Weather Channel uses the API data from Weather Underground, so they basically say the same thing. It's just that The Weather Channel runs behind and updates later, so you're best just visiting Weather Underground itself to get the very latest forecast.

AccuWeather: Seems good enough for temperatures, doing a fairly decent job, but was bad for underpredicting the wind speeds.

How I think the experiment went,

I think it went fairly well. All of the weather models were presented with the difficult challenge of forecasting the weather for just one very localised location. It threw up a few surprises, with some of the weather websites/apps doing a better job than some of the weather models, although it looks like we now know why that is. As I said at the start, it's important not to take these results too seriously, as they are based on just one location over a period of a month and only looked at temperatures, wind speeds and air pressure. Once the new GFS model is out, a new experiment will take place which will include more locations across the UK. I will try to add more weather models and websites/apps to it, and to include more things, for example rainfall and cloud cover. I hope you enjoyed reading this rather long post and found it as enjoyable and fun as I did doing it. Thank you for reading.


A really interesting and informative tranche of research.  Thank you.  Look forward to tranche 2.


As a data analyst myself this is fascinating, thanks for all the hard work!

I'm actually starting a project at work using the Dark Sky API to get forecast data for multiple sites, good to see its verification stats are more than reasonable.

I look forward to seeing what else you come up with.

On 16/01/2019 at 17:53, snefnug said:

A really interesting and informative tranche of research.  Thank you.  Look forward to tranche 2.

 

23 hours ago, fujita5 said:

As a data analyst myself this is fascinating, thanks for all the hard work!

I'm actually starting a project at work using the Dark Sky API to get forecast data for multiple sites, good to see its verification stats are more than reasonable.

I look forward to seeing what else you come up with.

Thanks, I'm glad you both found it interesting. It's good to have a better idea now of how well each weather model does, and which ones to look at and which to ignore. I think people may need to pay more attention to the forecasts from Weather Underground and DarkSky because they both did really well.

Since DarkSky did well with air pressure, especially at 120 to 144 hours where it was better than any other weather model, you can view its pressure charts. They aren't exactly clear, as no isobars appear on them, so I've drawn on them to make them a bit clearer,

96 Hours,

96.thumb.png.ea5f9a9e530c45b77a9f3d562e2b20ed.png

120 Hours,

120.thumb.png.8384c249db9e3bd2f8b749d045427fba.png

144 Hours,

144.thumb.png.63710fdd7e8e9c87e499d8a201b8d465.png

Looks like it's a bit of a flat pattern it's going for.

Out of interest, I decided to see what the chances are for the upcoming cold spell. I included Dark Sky, Weather Underground, BBC Weather, ICON and both the ECM and GFS along with their ensemble means. The location is London and the chart shows the average daily temperature. There is a pretty big disagreement between them for next week. As the big red line shows, the overall average temperature does look to drop between now and the 23rd of January. After then temperatures rise slightly as we approach the end of January before they start to fall again,

mean.thumb.png.3df3f47bcb84cb263617db2ed210ec72.png

Edited by Zenarcher
