3. Hands On

3.2. Download Data

If you haven't already, download the workshop data from the main page.

The data we have to work with today is:

  • Lake Monsters - LakeMonsters.gpkg - locations of lake monsters; global distribution
  • Lakes - Lakes_GreatLakes-Area.gpkg - a clip of the Natural Earth Data lakes dataset for the Great Lakes and areas adjacent
  • States - US_CAN_Admin1.gpkg - a clip of the Natural Earth Data states and provinces data for the US and Canada (I'm going to refer to these as "states" for simplicity, but I want to acknowledge that this includes Canadian provinces as well.)

Geopackage (.gpkg) is a single file, open vector format. We're using it today because it's one file per dataset (unlike Shapefile), which makes data management so much easier. See the README.txt file that comes with the data download for more details and sources of the data.

Data Processing

We'll be working with an international dataset of locations of lake monsters, the most famous of which is arguably Nessie who supposedly lives in Loch Ness in Scotland. This dataset was assembled from Wikipedia's List of Lake Monsters. The lake names were geocoded (you can find the R script that I wrote to process the data in the r_scripts folder of this repo), corrected, then exported to a geopackage file. Why did I process this data for you? It took a few hours to do and requires skills we are not focusing on in this workshop.