Exploratory data analysis with R
Description:
This design describes a hands-on, R-based activity in which students put their data wrangling and plotting skills into practice by conducting an exploratory data analysis (EDA) of the Palmer penguins data set. To prepare for the class, students will watch short videos on exploratory data analysis and the relevant R commands, and read a chapter of R for Data Science, 2nd edition (Wickham et al. 2023). Students will complete a short multiple-choice quiz on this material before the in-class activity.
The in-class activity will begin with a review of the quiz, providing feedback and addressing any common misunderstandings evident in the responses. Students will then work individually while seated in pairs or small groups; this allows each student to work at their own pace while having peers available to answer questions or help with problems. Students will work through a tutorial built with R Markdown and the learnr package. As well as running R code blocks and modifying them to achieve a stated aim, students will answer questions embedded in the tutorial that are based on the output of the R code.
The learnr tutorial will run inside Posit Cloud, a cloud-based RStudio instance that gives every student the same known computing environment.
The tutorial will guide students through an exploratory data analysis of the Palmer penguins data set using data wrangling, computation of summary statistics, and plots created with ggplot2. The main new concepts introduced in this activity are measures of central tendency and spread, and plots appropriate for showing data and their distributions. The activity builds on earlier topics in which data wrangling and data visualisation were covered.
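The tutorial itself is not reproduced here. As an illustration of the kind of code students might run or modify, the following is a minimal sketch that computes measures of central tendency (mean, median) and spread (standard deviation, interquartile range) of body mass per species with dplyr's `summarise()`. It assumes the palmerpenguins and dplyr packages are installed; the object name `species_summary` is illustrative, not taken from the tutorial.

```r
# A minimal sketch, assuming the palmerpenguins and dplyr packages are installed.
library(dplyr)

# Refer to the data set explicitly so it is clear which "penguins" is meant.
penguins <- palmerpenguins::penguins

# Central tendency (mean, median) and spread (SD, IQR) of body mass per species.
species_summary <- penguins |>
  group_by(species) |>
  summarise(
    n = n(),                                  # rows per species, including missing values
    n_missing = sum(is.na(body_mass_g)),      # body mass values recorded as NA
    mean_mass = mean(body_mass_g, na.rm = TRUE),
    median_mass = median(body_mass_g, na.rm = TRUE),
    sd_mass = sd(body_mass_g, na.rm = TRUE),
    iqr_mass = IQR(body_mass_g, na.rm = TRUE)
  )

print(species_summary)
```

The `na.rm = TRUE` arguments matter here: two penguins have no recorded body mass, and without this argument `mean()`, `sd()` and the other functions would return `NA` for their species. Reporting the median and IQR alongside the mean and SD lets students compare how the two kinds of summary behave for each species.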
On completing the tutorial, students will submit their answers to the tutorial questions as a hash, which they paste into a Microsoft Forms or Google Forms form so that teachers can collect the responses.
Intended Learning Outcomes:
- Understand how to use the summarise verb to calculate summary statistics from a data set
- Modify R code templates to display, explore, and summarise a data set
- Use basic ggplot commands to explore and summarise a data set graphically
- Apply data wrangling skills and data visualisation techniques and theory to explore a data set
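As an illustration of the graphical outcomes listed above, the following minimal sketch shows two plots suited to displaying a distribution: a histogram of body mass coloured by species, and a box plot with the individual observations overlaid. It assumes the palmerpenguins and ggplot2 packages are installed; the object names (`mass_data`, `p_hist`, `p_box`) and the chosen bin count are illustrative, not taken from the tutorial.

```r
# A minimal sketch, assuming the palmerpenguins and ggplot2 packages are installed.
library(ggplot2)

penguins <- palmerpenguins::penguins

# Drop rows with missing body mass so ggplot2 does not warn about removed values.
mass_data <- penguins[!is.na(penguins$body_mass_g), ]

# Histogram: the shape of the body mass distribution, one fill colour per species.
p_hist <- ggplot(mass_data, aes(x = body_mass_g, fill = species)) +
  geom_histogram(bins = 30, alpha = 0.6, position = "identity") +
  labs(x = "Body mass (g)", y = "Count", fill = "Species")

# Box plot: median, quartiles and spread per species, with every observation
# overlaid as a horizontally jittered point (no vertical jitter, so values stay exact).
p_box <- ggplot(mass_data, aes(x = species, y = body_mass_g)) +
  geom_boxplot(outlier.shape = NA) +   # hide outlier points; the jitter layer already shows them
  geom_jitter(width = 0.15, height = 0, alpha = 0.4) +
  labs(x = "Species", y = "Body mass (g)")

print(p_hist)
print(p_box)
```

Overlaying the raw points on the box plot follows the idea of showing the data as well as their summary, so students can see how many observations lie behind each box.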
Resources | Tasks | Supports
---|---|---
Pre-class activities | |
Videos on Brightspace | Watch videos on exploratory data analysis | Online discussion forum for topic
Link to online version of chapter 10 | Read chapter 10 in R for Data Science | Online discussion forum for topic
Cheat sheets on data wrangling and plotting | Prepare for in-class activity | None
Quiz on Brightspace | Complete short quiz | Feedback
In-class activities | |
Slides addressing any misunderstandings | Respond to quiz (address common misunderstandings) | Teacher feedback
Posit Cloud project | Complete R tutorial | Automated feedback and guidance programmed into the exercises; Teacher (x2); Peers
Link to MS Forms to collect hashes; Instructions on Brightspace | Submit answers from tutorial | Teacher; Instructions in tutorial and on Brightspace
Additional information
This design describes an in-class activity in the Animals and Data (Dyr og Data) course, part of a bachelor's degree in Animal Science. The course is a first-year statistics and data science course designed to equip students with the practical skills they need to complete their bachelor's programme successfully. The course follows the blended learning paradigm with elements of the flipped classroom: students will prepare for classes by watching short videos on the topic and completing readings.
In-class activities are devoted to interaction between students and teachers, and to individual and group work. All in-class activities will be hands-on, practical activities using the statistical software R within the Posit Cloud environment. The first 8 weeks of the course will follow a design similar to the one above, in which in-class activities are informed by a short quiz based on the pre-class preparatory activities, plus a guided R-based tutorial and other hands-on activities. In the second half of the course, students will spend most of their in-class time working in groups on their data science portfolio. Out-of-class activities will take the form of lecture videos and readings that support the in-class activities and the specific portfolio element being worked on that week.