aes(): aesthetic mappings (x, y, color, size, shape)
geom_*(): geometric objects (points, bars, lines)
facet_*(): faceting for small multiples
theme_*(): appearance
ggplot(data, aes(x = var1, y = var2)) +
  geom_point()
Histogram
ggplot(nhData, aes(x = BMXBMI)) +
  geom_histogram(bins = 40, fill = "steelblue", color = "white") +
  labs(x = "BMI (kg/m²)", y = "Count", title = "Distribution of BMI") +
  theme_minimal()
Histogram with Facets
ggplot(nhData, aes(x = BMXBMI, fill = gender)) +
  geom_histogram(bins = 30, alpha = 0.7) +
  facet_wrap(~gender) +
  labs(x = "BMI (kg/m²)", y = "Count") +
  theme_minimal()
Boxplot
ggplot(nhData, aes(x = gender, y = BMXBMI, color = gender)) +
  geom_boxplot() +
  labs(x = "Sex", y = "BMI (kg/m²)", title = "BMI by Sex") +
  theme_minimal()
Boxplot with Facets by Ethnicity
ggplot(nhData, aes(x = gender, y = BMXBMI, color = gender)) +
  geom_boxplot() +
  facet_wrap(~ethnicity) +
  labs(x = "Sex", y = "BMI (kg/m²)") +
  theme_minimal() +
  theme(legend.position = "none")
Scatterplot
ggplot(nhData, aes(x = RIDAGEYR, y = BMXBMI, color = gender)) +
  geom_point(alpha = 0.2, size = 0.8) +
  labs(x = "Age (years)", y = "BMI (kg/m²)", title = "BMI vs. Age") +
  theme_minimal()
Scatterplot with Facets
ggplot(nhData, aes(x = RIDAGEYR, y = BMXBMI, color = gender)) +
  geom_point(alpha = 0.2, size = 0.5) +
  facet_wrap(~gender) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(x = "Age (years)", y = "BMI (kg/m²)") +
  theme_minimal()
Linear Regression in R
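Model the relationship: \(y = \alpha + \sum_{i=1}^{M} \beta_i x_i + \varepsilon\), where the \(x_i\) are the \(M\) predictors and \(\varepsilon\) is the residual error.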
fit <- lm(LBXGLU ~ BMXBMI + RIDAGEYR + male + black + mexican +
            other_hispanic + other_eth,
          data = nhData)
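The formula uses 0/1 indicator columns (male, black, mexican, ...), while the plots use the categorical columns gender and ethnicity. The source does not show how the indicators were built, so the following is only a minimal sketch on a toy data frame; the level labels ("Male", "Black", "Other Hispanic", ...) are assumptions, not the actual coding in nhData.

```r
# Toy data frame standing in for nhData; the level labels are assumptions.
toy <- data.frame(
  gender    = c("Male", "Female", "Female", "Male", "Female"),
  ethnicity = c("White", "Black", "Mexican", "Other Hispanic", "Other"),
  stringsAsFactors = FALSE
)

# Build 0/1 indicators. A row with all ethnicity indicators equal to 0
# belongs to the reference group (white); male == 0 means female.
toy$male           <- as.integer(toy$gender == "Male")
toy$black          <- as.integer(toy$ethnicity == "Black")
toy$mexican        <- as.integer(toy$ethnicity == "Mexican")
toy$other_hispanic <- as.integer(toy$ethnicity == "Other Hispanic")
toy$other_eth      <- as.integer(toy$ethnicity == "Other")

print(toy)
```

Alternatively, factor columns can be passed to lm() directly; R then creates the indicators itself and uses the first factor level as the reference, which relevel() can change.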
Three levels of output:
Model level: \(R^2\), residual standard error
Term level: coefficient estimates, p-values
Observation level: predictions, residuals
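As a rough illustration of the three levels, the sketch below pulls each one out of a fitted model using base R only. It runs on simulated data (the variable names bmi, age, glucose and the object sim_fit are made up for this example), not on the NHANES data.

```r
# Simulated data so the example runs on its own; not the NHANES data.
set.seed(1)
n   <- 200
sim <- data.frame(bmi = rnorm(n, 28, 6), age = runif(n, 20, 80))
sim$glucose <- 65 + 0.6 * sim$bmi + 0.4 * sim$age + rnorm(n, sd = 30)

sim_fit <- lm(glucose ~ bmi + age, data = sim)
s <- summary(sim_fit)

# Model level: a handful of numbers describing the whole fit
model_level <- c(r_squared = s$r.squared, sigma = s$sigma)

# Term level: one row per coefficient
# (columns: Estimate, Std. Error, t value, Pr(>|t|))
term_level <- coef(s)

# Observation level: one row per observation used in the fit
obs_level <- data.frame(fitted = fitted(sim_fit), resid = residuals(sim_fit))

print(model_level)
print(term_level)
print(head(obs_level))
```

The broom package offers the same split as data frames via glance(), tidy(), and augment(), which may be more convenient for further processing.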
Model Summary
summary(fit)
Call:
lm(formula = LBXGLU ~ BMXBMI + RIDAGEYR + male + black + mexican +
    other_hispanic + other_eth, data = nhData)

Residuals:
   Min     1Q Median     3Q    Max
-77.67 -11.76  -4.21   3.49 592.88

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    64.46384    1.80388  35.736  < 2e-16 ***
BMXBMI          0.57673    0.06243   9.237  < 2e-16 ***
RIDAGEYR        0.39515    0.01863  21.212  < 2e-16 ***
male            5.62365    0.76229   7.377 1.82e-13 ***
black           1.98181    1.02949   1.925  0.05427 .
mexican         5.83894    0.94913   6.152 8.12e-10 ***
other_hispanic  5.56316    1.84842   3.010  0.00263 **
other_eth       5.29182    2.15850   2.452  0.01425 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 30.25 on 6324 degrees of freedom
  (14672 observations deleted due to missingness)
Multiple R-squared:  0.1068,  Adjusted R-squared:  0.1058
F-statistic:   108 on 7 and 6324 DF,  p-value: < 2.2e-16
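Several numbers in the summary follow from others by standard formulas. The sketch below recomputes the adjusted R-squared, the F-statistic, and one t value from the rounded figures printed in the output, so results should agree only up to rounding.

```r
# Values copied from the summary output
r2     <- 0.1068          # Multiple R-squared
df_res <- 6324            # residual degrees of freedom
p      <- 7               # number of predictors (excluding the intercept)
n      <- df_res + p + 1  # observations actually used in the fit

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# Overall F-statistic: explained vs. unexplained variation per degree of freedom
f_stat <- (r2 / p) / ((1 - r2) / df_res)

# The t value of a term is Estimate / Std. Error (BMXBMI row)
t_bmi <- 0.57673 / 0.06243

print(n)
print(round(adj_r2, 4))
print(round(f_stat))
print(round(t_bmi, 2))
```

Note that n counts only the complete cases: rows with a missing value in any model variable are dropped before fitting, which is what the "observations deleted due to missingness" line reports.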
Interpreting the Coefficients
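BMXBMI: expected change in glucose (LBXGLU, mg/dL) per 1 kg/m² increase in BMI, holding the other predictors fixed (estimate: 0.58)
RIDAGEYR: expected change in glucose per 1-year increase in age, holding the other predictors fixed (estimate: 0.40)
male: expected difference in glucose for males vs. females (reference), adjusted for the other predictors (estimate: 5.62)
black, mexican, other_hispanic, other_eth: expected difference in glucose vs. white (reference group), adjusted for the other predictors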
Reference categories: female for sex, white for race/ethnicity.
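To make the role of the reference categories concrete, the sketch below plugs the reported estimates into the model equation for two hypothetical people. The chosen BMI and age values are arbitrary illustrations, and the predictions are point estimates only.

```r
# Coefficient estimates copied from the summary output
b <- c(intercept = 64.46384, BMXBMI = 0.57673, RIDAGEYR = 0.39515,
       male = 5.62365, mexican = 5.83894)

# Hypothetical person 1: white female, BMI 25, age 40.
# All indicator variables are 0, so only intercept, BMI, and age contribute.
pred_ref <- b[["intercept"]] + b[["BMXBMI"]] * 25 + b[["RIDAGEYR"]] * 40

# Hypothetical person 2: same BMI and age, but male = 1 and mexican = 1.
pred_alt <- pred_ref + b[["male"]] + b[["mexican"]]

print(round(pred_ref, 2))
print(round(pred_alt, 2))
print(round(pred_alt - pred_ref, 2))  # sum of the two indicator coefficients
```

With the fitted model object available, predict(fit, newdata = ...) performs the same calculation without copying coefficients by hand.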