- Why ggplot2
- Components of the Grammar of Graphics
- Basic Structure
- Construct Plot Layer by Layer
- Scales, Position, Facets and Themes
- An Example
2/8/2018
Things you cannot do:
Data that you want to visualise and a set of aesthetic mappings describing how variables in the data are mapped to aesthetic attributes.
Layers made up of geometric elements and statistical transformations. Geometric objects (geoms for short) are what you see on the plot: points, lines, polygons, etc. Statistical transformations (stats for short) summarise the data, for example by binning and counting observations for a histogram or by fitting a linear model to a 2d relationship.
The scales map values in the data space to values in an aesthetic space, whether it be colour, or size, or shape.
A coordinate system, coord for short, describes how data coordinates are mapped to the plane of the graphic.
A facet describes how to break up the data into subsets and how to display those subsets as small multiples.
A theme controls the finer points of display, like the font size and background colour.
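The components listed above can be seen together in a single plot. The following is a minimal sketch, assuming a small made-up data frame `rooms` that only imitates the column names of the Airbnb data; each line is annotated with the grammar component it supplies.

```r
library(ggplot2)

# Hypothetical stand-in for the Airbnb data (all values are made up).
set.seed(1)
rooms <- data.frame(
  price = round(runif(60, 30, 300)),
  reviews_per_month = round(runif(60, 0.1, 6), 1),
  room_type = rep(c("Entire home/apt", "Private room", "Shared room"), each = 20)
)

p <- ggplot(rooms, aes(x = price, y = reviews_per_month)) +  # data + aesthetic mappings
  geom_point(aes(colour = room_type)) +                      # layer: geom (points)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # layer: stat (linear model fit)
  scale_x_continuous(name = "Price") +                       # scale
  coord_cartesian() +                                        # coordinate system
  facet_wrap(~room_type) +                                   # facets
  theme_bw()                                                 # theme
p
```

Removing any line after the first `geom_` call still leaves a valid plot, because ggplot2 supplies defaults for scales, coordinates, facets and theme.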
43234 observations and 16 variables
It describes each Airbnb room in NYC: price, reviews per month, neighbourhood name, borough name, latitude and longitude, among other features
Simple cleansing
ggplot(data, aes(x = ,y = )) + layers + additional elements
ggplot(Airbnb, aes(x = reviews_per_month, y = price))
Display the data or the statistical summaries of the data
Mainly added with geom_xxx() functions
An alternative is the stat_xxx() functions
A plot must have at least one geom or stat layer; there is no upper limit. Add a layer to a plot with the + operator
For the full list of geom and stat functions, see the ggplot2 cheat sheet
ggplot(Airbnb, aes(price, reviews_per_month)) + geom_point(size = 0.1) + facet_grid(~room_type)
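To illustrate that geom_xxx() and stat_xxx() are two ways of adding the same kind of layer, here is a small sketch on a made-up `rooms` data frame (hypothetical values, not the Airbnb data). `geom_bar()` uses `stat_count()` as its default stat, and `stat_count()` uses the bar geom by default, so both plots carry the same computed counts.

```r
library(ggplot2)

# Hypothetical data: one row per room.
rooms <- data.frame(
  room_type = c("Entire home/apt", "Private room", "Private room",
                "Shared room", "Entire home/apt", "Private room")
)

base <- ggplot(rooms, aes(x = room_type))  # no layer yet: renders as an empty panel

by_geom <- base + geom_bar()    # geom first, default stat = count
by_stat <- base + stat_count()  # stat first, default geom = bar

# layer_data() returns the data a layer actually draws, including computed counts.
layer_data(by_geom)$count
layer_data(by_stat)$count
```

Which form to use is mostly a matter of emphasis: start from the geom when the shape matters most, from the stat when the summary does.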
Describe how variables are mapped to visual properties
aes()
Aesthetics can be specified for the whole plot (in ggplot()) or per layer (in geom_xxx() or stat_xxx())
Aesthetic mappings can include position (i.e., on the x and y axes), color (“outside” color), fill (“inside” color), shape (of points), linetype, size, etc.
ggplot(Airbnb, aes(x = price)) + geom_histogram(bins = 40, aes(color = room_type), fill = "grey")
ggplot(Airbnb, aes(x = price)) + geom_histogram(bins = 40, aes(fill = room_type), color = "grey")
ggplot(Airbnb) + geom_violin(aes(neighbourhood_group, price), colour = 'blue')
ggplot(Airbnb) + geom_violin(aes(neighbourhood_group, price), colour = 'blue') + geom_boxplot(aes(neighbourhood_group, price), width = 0.16, outlier.size = 0, notch = TRUE)
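The violin-plus-boxplot example repeats the same `aes()` in both layers. A sketch of the plot-level alternative follows, using made-up data in a hypothetical `rooms` data frame: mappings declared in `ggplot()` are inherited by every layer, while a constant such as `colour = "blue"` is set outside `aes()` and applies to one layer only.

```r
library(ggplot2)

# Hypothetical prices for three boroughs (values are simulated).
set.seed(2)
rooms <- data.frame(
  neighbourhood_group = rep(c("Brooklyn", "Manhattan", "Queens"), each = 30),
  price = c(rnorm(30, 120, 30), rnorm(30, 180, 40), rnorm(30, 95, 25))
)

v <- ggplot(rooms, aes(x = neighbourhood_group, y = price)) +  # mapping shared by all layers
  geom_violin(colour = "blue") +                 # constant colour: set, not mapped
  geom_boxplot(width = 0.16, outlier.size = 0)   # inherits x and y from ggplot()
v
```

Writing `aes(colour = "blue")` instead would map the literal string as a data value and hand the colour choice to a scale, which is rarely what is intended.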
Control the mapping from data to aesthetics
Available for position, color and fill, size, shape and line type
Scale | Example |
---|---|
scale_color_ | scale_color_gradient |
scale_fill_ | scale_fill_discrete |
scale_size_ | scale_size_manual |
scale_shape_ | scale_shape_discrete |
scale_linetype_ | scale_linetype_manual |
scale_x_ | scale_x_continuous |
scale_y_ | scale_y_reverse |
w1 <- ggplot(Airbnb) + geom_bar(aes(x = neighbourhood_group, fill = room_type))
w1
w1 + scale_x_discrete(name = 'Borough') + scale_y_continuous(name = 'Number of rooms', breaks = seq(0,20000,4000)) + scale_fill_grey(name = 'Room type')
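As a further illustration of how a scale controls the data-to-aesthetic mapping, the sketch below uses `scale_fill_manual()` to assign an explicit fill colour to each room type. The `rooms` data frame and the two hex colours are assumptions made for this example only.

```r
library(ggplot2)

# Hypothetical data: one row per room.
rooms <- data.frame(
  neighbourhood_group = c("Brooklyn", "Brooklyn", "Manhattan",
                          "Manhattan", "Manhattan", "Queens"),
  room_type = c("Entire home/apt", "Private room", "Entire home/apt",
                "Private room", "Private room", "Private room")
)

w <- ggplot(rooms) +
  geom_bar(aes(x = neighbourhood_group, fill = room_type)) +
  scale_fill_manual(
    name = "Room type",  # legend title
    values = c("Entire home/apt" = "#1b9e77", "Private room" = "#d95f02")
  ) +
  scale_y_continuous(name = "Number of rooms", breaks = 0:3)
w
```

Naming the elements of `values` ties each colour to a level explicitly, so the assignment does not depend on the order of the levels.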
Control appearance of non-data elements
Titles, tick marks and labels
Legend appearance
Overall look
theme()
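A short sketch of what `theme()` adjusts, on a made-up `rooms` data frame (hypothetical values): a complete theme such as `theme_minimal()` sets the overall look, and `theme()` then overrides individual non-data elements.

```r
library(ggplot2)

# Hypothetical data for illustration only.
rooms <- data.frame(
  price = c(60, 85, 120, 150, 210),
  reviews_per_month = c(4.1, 3.2, 2.5, 1.8, 0.9),
  room_type = c("Shared room", "Private room", "Private room",
                "Entire home/apt", "Entire home/apt")
)

t1 <- ggplot(rooms, aes(price, reviews_per_month, colour = room_type)) +
  geom_point() +
  labs(title = "Price vs. reviews per month") +
  theme_minimal() +                                          # overall look
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),  # title: bold, centred
    axis.text = element_text(size = 8),                      # tick labels
    legend.position = "top"                                  # legend placement
  )
t1
```

The order matters: a complete theme added after `theme()` would reset the individual overrides.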
Facetting generates small multiples each showing a different subset of the data.
facet_grid() and facet_wrap()
facet_grid() is fundamentally 2d, being made up of two independent components.
facet_wrap() is 1d, but wrapped into 2d to save space
ggplot(Airbnb, aes(price)) + geom_histogram(bins = 40) + facet_wrap(~neighbourhood_group)
ggplot(Airbnb, aes(price)) + geom_histogram(bins = 40) + facet_grid(room_type ~ neighbourhood_group)
How to construct this plot?
# Mean of columns 10 and 14 (taken to be price and reviews_per_month, per the names assigned below)
# for each neighbourhood / borough / room type
Airbnb_sub1 <- aggregate(Airbnb[, c(10, 14)], list(Airbnb$neighbourhood, Airbnb$neighbourhood_group, Airbnb$room_type), mean)
# Number of rooms in each combination, dropping combinations that do not occur
neighbourhood_count <- data.frame(table(Airbnb$neighbourhood, Airbnb$neighbourhood_group, Airbnb$room_type))
neighbourhood_count <- neighbourhood_count[which(neighbourhood_count$Freq != 0), ]
# Join means and counts, then give the columns readable names
Airbnb_sub2 <- merge(Airbnb_sub1, neighbourhood_count, by.x = c('Group.1', 'Group.2', 'Group.3'), by.y = c('Var1', 'Var2', 'Var3'), all.x = TRUE)
names(Airbnb_sub2) <- c('neighbourhood', 'neighbourhood_group', 'room_type', 'average_price', 'average_rpm', 'count')
# 'Good' neighbourhoods: average reviews per month of at least 3 and average price of at most 150
good_neighbourhood <- subset(Airbnb_sub2, average_rpm >= 3 & average_price <= 150)
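The same summarise-then-count-then-merge pattern can be checked on a few rows by hand. The sketch below is an alternative formulation, not the code from the slides: it uses the formula interface of `aggregate()` so that columns are referred to by name rather than by position, and the `toy` data frame is entirely made up.

```r
# Hypothetical toy data: five rooms in two neighbourhoods.
toy <- data.frame(
  neighbourhood = c("A", "A", "B", "B", "B"),
  room_type = c("Private room", "Private room", "Private room",
                "Shared room", "Shared room"),
  price = c(100, 140, 80, 40, 60),
  reviews_per_month = c(3, 5, 2, 1, 2)
)

# Group means of both measures.
means <- aggregate(cbind(price, reviews_per_month) ~ neighbourhood + room_type,
                   data = toy, FUN = mean)

# Group sizes; only combinations present in the data are returned.
counts <- aggregate(price ~ neighbourhood + room_type, data = toy, FUN = length)
names(counts)[3] <- "count"

# One row per neighbourhood / room type, with means and count side by side.
summary_tbl <- merge(means, counts, by = c("neighbourhood", "room_type"))
summary_tbl
```

Because `aggregate()` with a formula only returns observed combinations, no step is needed to drop zero-count rows.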
p <- ggplot(Airbnb_sub2, aes(average_price, average_rpm)) + geom_point(aes(color = neighbourhood_group, size = count)) + facet_wrap(~room_type)
p
Change the point shape to an open circle
Add horizontal and vertical reference lines
Add the names of the 'good neighbourhoods'
Title, label axes, legend
Change theme
# Open circles (shape = 1) with a slightly heavier outline
p1 <- ggplot(Airbnb_sub2, aes(average_price, average_rpm)) + geom_point(aes(color = neighbourhood_group, size = count), shape = 1, stroke = 1) + facet_wrap(~room_type)
p1
# Reference lines at the 'good neighbourhood' thresholds
p2 <- p1 + geom_vline(xintercept = 150, color = 'grey') + geom_hline(yintercept = 3, color = 'grey')
p2
# Label the good neighbourhoods
p3 <- p2 + geom_text(aes(label = neighbourhood), data = good_neighbourhood, size = 2.5)
p3
# Same labels with ggrepel, which moves them apart so they do not overlap
library(ggrepel)
p3 <- p2 + geom_text_repel(aes(label = neighbourhood), data = good_neighbourhood, size = 2.5)
p3
p4 <- p3 + theme_minimal()
p4
# Axis names, legend titles, plot title and legend styling
p5 <- p4 + scale_x_continuous(name = 'Average Price', breaks = seq(0, 400, 100)) + scale_y_continuous(name = 'Average Reviews Per Month', breaks = 1:8) + scale_color_discrete(name = '') + scale_size_continuous(name = '') + labs(title = 'Neighbourhoods in NYC Airbnb') + theme(legend.position = 'top', legend.text = element_text(size = 8, color = 'gray10'), plot.title = element_text(size = 15, face = 'bold', hjust = 0.5))
p5
Wickham, H. (2016). ggplot2: elegant graphics for data analysis. Springer.