Exploratory data analysis with R

Author: ucfagls

Created: 2024-05-17 05:10am

Edited: 2024-06-10 01:57am

Keywords: blended learning, statistics, data science, flipped classroom, R, exploratory data analysis, data visualisation, data wrangling

Description:

This design describes a hands-on, R-based activity in which students put their data wrangling and plotting skills into practice by conducting an exploratory data analysis (EDA) of the Palmer penguins data set. To prepare for the class, students will watch short videos on exploratory data analysis and the relevant R commands, and read a chapter from the second edition of R for Data Science (Wickham et al. 2023). Students will complete a short multiple-choice quiz on this material before the in-class activity.

The in-class activity will begin with a review of the quiz, providing feedback and addressing any common misunderstandings evident from the responses. Students will then work individually while grouped into pairs or small groups, so that each student can work at their own pace but has peers available to answer questions or help with any problems. Students will work through a tutorial built using R Markdown and the learnr package. As well as running R code blocks and modifying them to achieve a stated aim, students will answer questions embedded in the tutorial that are based on the output of the R code.

The learnr tutorial will be run inside Posit Cloud, a cloud-based RStudio instance that provides a known computing environment.

The tutorial will guide students through an exploratory data analysis of the Palmer penguins data set using data wrangling, computation of summary statistics, and creation of plots with ggplot2. The main new concepts introduced in this activity are measures of central tendency and spread, and appropriate plots for showing data and their distributions. This activity builds upon earlier topics in which data wrangling and data visualisation are covered.
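As an illustration of the kind of summary the tutorial works towards, the sketch below computes measures of central tendency and spread for penguin body mass by species. It is not taken from the tutorial itself: it assumes the dplyr and palmerpenguins packages are installed, and the object name mass_summary and the choice of body_mass_g are illustrative.

```r
# Sketch only: assumes the dplyr and palmerpenguins packages are installed.
library(dplyr)
library(palmerpenguins)

# Central tendency (mean, median) and spread (sd, IQR) of body mass by species.
# na.rm = TRUE drops missing measurements before each statistic is computed.
mass_summary <- penguins |>
  group_by(species) |>
  summarise(
    n           = sum(!is.na(body_mass_g)),      # non-missing observations
    mean_mass   = mean(body_mass_g, na.rm = TRUE),
    median_mass = median(body_mass_g, na.rm = TRUE),
    sd_mass     = sd(body_mass_g, na.rm = TRUE),
    iqr_mass    = IQR(body_mass_g, na.rm = TRUE)
  )

mass_summary
```

Grouping before summarising returns one row per species, which makes it easy to compare the centre and spread of each group side by side.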

Upon completion of the tutorial, students will submit their answers to the tutorial questions as a hash, which they paste into a Microsoft Forms or Google Forms form so that the teachers can collect the responses.

Intended Learning Outcomes:

  • Understand how to use the summarise verb to calculate summary statistics from a data set
  • Modify R code templates to display, explore, and summarise a data set
  • Use basic ggplot2 commands to explore and summarise a data set graphically
  • Apply data wrangling skills and data visualisation techniques and theory to explore a data set
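
The graphical outcomes above might look something like the following minimal sketch, which draws a boxplot of flipper length by species. It is an assumption about the style of plot used, not the tutorial's own code; it assumes the ggplot2 and palmerpenguins packages are installed, and the names penguins_complete and flipper_plot are illustrative.

```r
# Sketch only: assumes the ggplot2 and palmerpenguins packages are installed.
library(ggplot2)
library(palmerpenguins)

# Drop rows with a missing flipper length so ggplot2 does not warn about them.
penguins_complete <- subset(penguins, !is.na(flipper_length_mm))

# Boxplots show the median, the interquartile range, and potential outliers
# for each species in a single panel.
flipper_plot <- ggplot(penguins_complete,
                       aes(x = species, y = flipper_length_mm)) +
  geom_boxplot() +
  labs(x = "Species", y = "Flipper length (mm)")

flipper_plot
```

Swapping geom_boxplot() for another geom, such as geom_violin(), is the kind of small template modification described in the second outcome.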

Resources | Tasks | Supports

Pre-class activities

Videos on Brightspace | Watch videos on exploratory data analysis | Online discussion forum for topic
Link to online version of chapter 10 | Read chapter 10 in R for Data Science | Online discussion forum for topic
Cheat sheets on data wrangling and plotting | Prepare for in-class activity (review cheat sheets of key concepts and R functions) | None
Quiz on Brightspace | Complete short quiz | Feedback

In-class activities

Slides addressing any misunderstandings | Respond to quiz (address common misunderstandings) | Teacher feedback
Posit Cloud project | Complete R tutorial | Automated feedback and guidance programmed into exercise; Teacher (x2); Peers
Link to MS Forms to collect hashes; Instructions on Brightspace | Submit answers from tutorial | Teacher; Instructions in tutorial and on Brightspace

Additional information

This design describes an in-class activity in the Animals and Data (Dyr og Data) course on an Animal Science bachelor's degree. The course is a first-year statistics and data science course, designed to equip students with the practical skills they need to complete their bachelor's programme successfully. The course follows the blended learning paradigm with elements of the flipped classroom: students prepare for classes by watching short videos on the topic and completing readings.

In-class activities are devoted to interactions between students and teachers, and to individual and group work. All in-class activities will be hands-on, practical activities using the statistical software R within the Posit Cloud environment. The first 8 weeks of the course will follow a design similar to the one above, where in-class activities are informed by a short quiz based on the pre-class preparatory activities, plus a guided R-based tutorial and other hands-on activities. In the second half of the course, students will spend the majority of their in-class time working in groups on their data science portfolio. Out-of-class activities will take the form of lecture videos and readings that support the in-class activities and the specific portfolio element being worked on that week.