Exploring Endangered Species Data with Python

Amanda Devine

Data Wrangler, Global Genome Initiative

27 July 2018

Girls Who Code Summer Immersion Program in Washington DC field trip to the Smithsonian National Museum of Natural History

Slides and Jupyter notebook available at https://github.com/amdevine/gwc-endangered-species

About Me

Bio

Winston Churchill High School (go Bulldogs!)

Dartmouth College (BA in Biology (Ecology) and Neuroscience)

Lab technician (dermatology, infectious disease, coral reefs)

Data wrangler for the Global Genome Initiative

Global Genome Initiative (GGI)

Smithsonian initiative

Collect all of life on Earth

Preserve in cryorepositories for genomic research

Sample data recorded in the Global Genome Biodiversity Network (GGBN) Data Portal

GGI Web Projects

GGI Data Tools website (Django; https://www.globalgeno.me)

GGI Gap Analysis app (Shiny; https://ggidata.shinyapps.io/gapanalysis)

genetic_collections (Python library; https://github.com/MikeTrizna/genetic_collections)

Data Wrangling

What is Data Wrangling?

Per Wikipedia: Transforming and mapping data from one raw data form into another format with the intent of making it [useful] for a variety of downstream purposes, such as analytics.
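As a rough illustration of that definition, the sketch below turns a small raw CSV export into tidy records that are ready for analysis. The column names, the `wrangle` function, and the "Species A/B/C" rows are hypothetical placeholders made up for this example; they are not taken from any real report.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export (placeholder rows): inconsistent whitespace and
# capitalization, and one record with a missing listing date.
RAW = """Common Name,Status,Listing Date
  species a ,endangered,03/11/1967
Species B,THREATENED,10/13/1970
species c,Endangered,
"""


def wrangle(raw_text):
    """Turn raw CSV text into tidy dictionaries ready for analysis."""
    tidy = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        date_text = row["Listing Date"].strip()
        if not date_text:
            continue  # skip records that have no listing date
        tidy.append({
            "name": row["Common Name"].strip().title(),
            "status": row["Status"].strip().lower(),
            "year": datetime.strptime(date_text, "%m/%d/%Y").year,
        })
    return tidy


records = wrangle(RAW)
for record in records:
    print(record)
```

The same steps (trim, standardize, parse, drop incomplete rows) are what tools like OpenRefine or R would do through their own interfaces.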

My favorite tools:

Jupyter Notebook

Document that contains executable Python code and Markdown-formatted text

Good for running self-contained analyses

Easily share with others

IRkernel: Run notebooks with R instead of Python

nbviewer: Converts notebooks to shareable HTML documents

RISE: Run a Jupyter notebook as a slide show

OpenRefine

Powerful tool for cleaning messy data

Complex filtering, sorting, and grouping

Mass editing records

Special language (GREL) to filter and edit data with formulas

R

Programming language developed for statistics

Powerful at data manipulation

More intuitive than Python when working with data?

RStudio: popular R development software

Shiny: R library for easily developing web apps to visualize data

Endangered Species Data

Endangered Species Act

Administered by the U.S. Fish & Wildlife Service

Established in 1973 “to conserve and protect endangered and threatened species and their habitats”

Species are listed under the ESA in two ways:

  1. FWS scientist assessment
  2. Petition from the general public
Virginia big-eared bat

What do we want to know?

What question are we trying to answer?

How have rates of listing species under the Endangered Species Act changed over time?

What summary or visualization do we want to produce at the end?

A bar graph showing the number of species listed by year
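A minimal sketch of that goal, using only the Python standard library: count how many species were listed in each year, then draw a plain-text bar chart. The `listing_years` values and both function names are hypothetical placeholders for illustration; the real numbers come from the ECOS reports, and the notebook may use different tools to build its graph.

```python
from collections import Counter

# Hypothetical listing years, one entry per species (placeholder values only).
listing_years = [1967, 1967, 1970, 1973, 1973, 1973, 1976]


def count_by_year(years):
    """Return (year, count) pairs sorted by year."""
    return sorted(Counter(years).items())


def text_bar_chart(counts):
    """Render one line per year, with one '#' per listed species."""
    return "\n".join(f"{year} {'#' * n} ({n})" for year, n in counts)


counts = count_by_year(listing_years)
print(text_bar_chart(counts))
```

Counting first and plotting second keeps the two steps separate, so the same `counts` could be handed to a plotting library instead of the text chart.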

Environmental Conservation Online System (ECOS)

Database that serves reports on threatened and endangered species

Pre-generated reports available online here: https://ecos.fws.gov/ecp/species-reports

Let’s look at these data in a Jupyter notebook: https://github.com/amdevine/gwc-endangered-species/blob/master/US%20Endangered%20Species%20Data.ipynb

Thanks!

Resources: Working with Data in Python

Resources: Coding Groups and Organizations

  • Women Who Code DC. Meetup group for female-identifying coders in the Washington, DC area. Covers many different tech-related topics, frequent meetups. https://www.meetup.com/Women-Who-Code-DC/

  • Hear Me Code. Organization that offers beginner coding lessons for women in the Washington, DC area. Also has an excellent Google group that shares many professional opportunities. https://hearmecode.com/

  • Data Carpentry. National organization that offers workshops on data wrangling. The website contains workshop materials if you can’t attend a workshop in person. https://datacarpentry.org/

Image Credits

Title Slide: Grey Crowned Cranes. Image from Pexels, CC0 License. https://www.pexels.com/photo/nature-bird-love-heart-45853/

About Me: Giant Panda. Photo by Cesar Aguilar from Pexels, Pexels License. https://www.pexels.com/photo/panda-1123765/

Bio: Personal photo.

Global Genome Initiative: Tissue samples in the NMNH Biorepository. Photo by Adrian Van Allen, 2015.

Data Wrangling: Whale shark at the Georgia Aquarium. Photo by Zac Wolf; CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=3511009

Data Wrangling: Wonder Woman: Wonder Woman with Lasso. Image from AllPosters. https://www.allposters.ca/-sp/Wonder-Woman-Wonder-Woman-with-Lasso-posters_i13190262_.htm

Endangered Species Data: Bufo periglenes (Golden toad). Photo by Charles H. Smith. Retrieved from Wikimedia Commons: https://commons.wikimedia.org/wiki/File:Bufo_periglenes1.jpg

Thanks: Rafflesia arnoldii. Image from lazypenguins.com, blog post “15 strangely beautiful flowers”. https://lazypenguins.com/15-strangely-beautiful-flowers/

Any Questions: Joe's Apartment Cockroach GIF. Image from GIPHY. https://giphy.com/gifs/scarface-when-mtv-was-worth-watching-joes-apartment-CbY83hpLkcrZe

Any Questions?