Project

beer market

Published

May 16, 2024

library(tidyverse)  # attaches dplyr, ggplot2, forcats, and the other core packages
library(skimr)      # provides skim() for the data summary


url <- "https://bcdanl.github.io/data/beer_markets_all.csv"
beer_markets <- read.csv(url)


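# Purchase counts per state and brand. After summarise() the result is still
# grouped by state, so slice_max() ranks brands within each state. The data
# has only 5 brands, so every brand is kept, sorted by count.
beer_markets2 <- beer_markets |>
  group_by(state, brand) |>
  summarise(n = n(), .groups = "drop_last") |>
  slice_max(n, n = 10)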

skim(beer_markets)

Data summary
Name	beer_markets
Number of rows	73115
Number of columns	25
_______________________
Column type frequency:
character	14
logical	6
numeric	5
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
X_purchase_desc	1	12	29	115
brand	1	9	13	5
container	1	3	30	7
market	1	5	20	92
state	1	4	20	49
buyertype	1	4	7	3
income	1	5	8	5
age	1	3	5	4
employment	1	4	4	3
degree	1	2	7	4
cow	1	4	25	4
race	1	5	8	5
tvcable	1	4	7	3
npeople	1	1	5	5

Variable type: logical

skim_variable	complete_rate	mean	count
promo	1	0.20	FAL: 58563, TRU: 14552
childrenUnder6	1	0.07	FAL: 68109, TRU: 5006
children6to17	1	0.20	FAL: 58155, TRU: 14960
microwave	1	0.99	TRU: 72676, FAL: 439
dishwasher	1	0.73	TRU: 53258, FAL: 19857
singlefamilyhome	1	0.81	TRU: 59058, FAL: 14057

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
hh	1	17407721.61	11582147.34	2000235.00	8223438.00	8413624.00	30171315.00	30440718.00	▂▇▁▁▇
quantity	1	1.32	1.15	1.00	1.00	1.00	1.00	48.00	▇▁▁▁▁
dollar_spent	1	13.78	8.72	0.51	8.97	12.99	16.38	159.13	▇▁▁▁▁
beer_floz	1	265.93	199.52	12.00	144.00	216.00	360.00	9216.00	▇▁▁▁▁
price_per_floz	1	0.06	0.01	0.00	0.05	0.06	0.06	0.23	▃▇▁▁▁

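# Number of purchases at each distinct spend amount within each income bracket
beer_markets2 <- beer_markets |>
  group_by(income, dollar_spent) |>
  summarise(n = n(), .groups = "drop_last")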

# Average spend per purchase by income bracket, with brackets ordered low to high
average_spent_by_income <- beer_markets |>
  group_by(income) |>
  summarise(average_dollar_spent = mean(dollar_spent, na.rm = TRUE)) |>
  mutate(income = factor(income,
                         levels = c("under20k",
                                    "20-60k",
                                    "60-100k",
                                    "100-200k",
                                    "200k+")))



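# One point per income bracket: mean dollars spent per purchase
ggplot(average_spent_by_income, aes(x = income, y = average_dollar_spent)) +
  geom_point() +
  labs(x = "Income bracket", y = "Average dollars spent per purchase")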

This graph shows how much each income bracket spends on average per beer purchase. Unsurprisingly, the wealthier groups tend to spend more. What did surprise me, though, is that the lower income groups spend only around $2 less per purchase. Since that amount is a larger share of a smaller income, it points to a difference in priorities between wealthy and poor households. Furthermore, the graph suggests that beer is something all income levels find important, considering that each spends a similar amount of money per purchase.
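
To put a number on the gap between brackets, the bracket averages can be compared directly. The following is a minimal sketch: the helper name `spend_gap` and the toy data frame are illustrative assumptions rather than part of the original analysis, and the toy values are made up, not taken from `beer_markets`.

```r
library(dplyr)

# Hypothetical helper: difference between the highest and lowest
# bracket-level average spend per purchase.
spend_gap <- function(data) {
  data |>
    group_by(income) |>
    summarise(average_dollar_spent = mean(dollar_spent, na.rm = TRUE)) |>
    summarise(gap = max(average_dollar_spent) - min(average_dollar_spent)) |>
    pull(gap)
}

# Toy data with made-up values, used only to show the call
toy <- data.frame(
  income = c("under20k", "under20k", "200k+", "200k+"),
  dollar_spent = c(10, 12, 14, 16)
)
spend_gap(toy)
```

Calling `spend_gap(beer_markets)` on the real data would return the difference between the highest and lowest bracket averages shown in the plot.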

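# Count purchases by state and income, then keep the 10 states with the most
# purchases and order them by total for plotting
beer_markets2 <- beer_markets |>
  count(state, income) |>
  group_by(state) |>
  mutate(total = sum(n)) |>
  ungroup() |>
  filter(dense_rank(-total) <= 10) |>
  mutate(state = fct_reorder(state, total))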

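# Stacked bars scaled to 1, so each segment is an income bracket's share of purchases
ggplot(beer_markets2) +
  geom_col(aes(y = state, fill = income, x = n), position = "fill") +
  labs(x = "Share of purchases", y = "State", fill = "Income bracket")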

This graph shows, for the ten states with the most purchases, the proportion of beer purchases made by each income group. In almost every state, the 20-60k group accounts for the largest share of purchases, while the 200k+ group accounts for the smallest share in every state shown. This is surprising because most people would expect the people with the most money to buy the most beer. One caveat is that the chart counts purchases, so a group's share also depends on how many households in the data fall into that bracket. My guess is that people who are addicted to alcohol tend to have less money, while people who can control their drinking tend to be more responsible with their finances as well.
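
The shares drawn by the filled bars can also be computed as a table, which makes the exact proportions easier to read. This is a minimal sketch: the helper name `income_share_by_state` and the toy data frame are illustrative assumptions, and the toy rows are made up rather than drawn from `beer_markets`.

```r
library(dplyr)

# Hypothetical helper: share of purchase records coming from each
# income bracket within each state.
income_share_by_state <- function(data) {
  data |>
    count(state, income) |>
    group_by(state) |>
    mutate(share = n / sum(n)) |>
    ungroup()
}

# Toy data with made-up rows, used only to show the call
toy <- data.frame(
  state = c("A", "A", "A", "B"),
  income = c("20-60k", "20-60k", "200k+", "20-60k")
)
income_share_by_state(toy)
```

Within each state the `share` column sums to 1, matching what `position = "fill"` draws. Running the helper on `beer_markets` would give the same proportions for every state, not only the ten plotted.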