github twitter linkedin email
Data Exploration: U.S. Church Attendance 2001-2008

This post offers an example of data exploration in R. I will cover the following topics:

More specifically, I will be using data from the US Congregational Life Survey, to examine how American church attendance has fluctuated between 2001 - 2008. These trends are an important part of any contemporary understanding of religiosity, and especially relevant when considering America’s role as a “religious outlier” in the global landscape (Berger et al. 2008). By focusing on longitudinal trends rather than absolute levels, Voas and Chaves (2016) argue America’s decades of religious decline fit squarely within global patterns of secularization. However, measuring religiosity (and its “decline”) is a task with little consensus (Voas et al. 2002, Warner 2010), and church attendance is no exception (Marcum 1999, Chaves 2017). The US Congregational Life Survey asks head clergy in 2008 to estimate weekly attendance for the previous seven years. While not perfect, this measure gets around problems of social desirability found in individual surveys.

RQ: How has American church attendance fluctated between 2001 - 2008, and how does this differ across denominations?


# Loading packages
require(foreign)
require(tidyverse)
require(RColorBrewer)

# Custom plot theme
john_theme <- theme(text=element_text(size = 14, family="Times New Roman"),
        axis.text = element_text(size = 14),
        panel.background = element_rect(fill="white"),
        panel.grid.minor = element_line(color="grey90"),
        panel.grid.major = element_line(color="grey90"),
        plot.margin = unit(c(0.5,0.5,0.5,0.5), "cm"),
        plot.caption = element_text(size = 9))

Importing Data

From the ARDA website, I use the pre-labeled Stata (.DTA) file to save time on recoding.

data <- read.dta("http://www.thearda.com/download/download.aspx?file=U.S.%20Congregational%20Life%20Survey,%20Wave%202,%20Random%20Sample%20Congregational%20Profile%20Survey.DTA")

Preparing Data

I start by examining a tabulation of denominations, saved as DENOM1. After constraining to character, I save a list of the ten largest denominations as top10. Lastly, I filter the dataset to only include these top 10 denominations and save as data2.

denom <- arrange(count(data, DENOM1), desc(n))
denom$DENOM1 <- as.character(denom$DENOM1)
top10 <- denom$DENOM1[1:10]
data2 <- filter(data, DENOM1 %in% top10)

Now let’s look at attendance trends. Head clergy for each congregation were asked the following question: “For each of the last seven years, what is your best estimate of average weekly attendance at worship services for this congregation? If you have more than one worship service, record the average attendance for all services combined.” The answers for each year 2001 - 2008 are stored as separate variables, but we’ll use the gather function to transform the eight columns into two columns. year will now contain “ATTEND01”, “ATTEND02”, etc. and attend will contain the attendance estimates for each year.

data3 <- gather(data2, "year", "attend", ATTEND08:ATTEND01)

To recode our yearly attendance variable, I create a new variable date and replace “ATTEND” with “20”. This will turn “ATTEND08” into “2008”, for example. These regular expressions are invaluable data wrangling tools. Check out this great resource for more info on regular expressions in R. After deleting the year variable, I convert some data classes and print out a tabulation of our denominations. (Keep in mind the n here represents the number of congregation-years, not the number of congregations.)

data3$year <- as.character(data3$year)
data3$date <- gsub("ATTEND", 20, data3$year)
data3 <- data3 %>% 
  select(-year) %>% 
  mutate(date = as.numeric(date),
         DENOM1 = as.character(DENOM1))
count(data3, DENOM1)
## # A tibble: 10 x 2
##    DENOM1                                     n
##    <chr>                                  <int>
##  1 American Baptist Churches                 40
##  2 Episcopal Church                          88
##  3 Evangelical Lutheran Church in America   344
##  4 Lutheran Church, Missouri Synod           64
##  5 Presbyterian Church USA                  232
##  6 Roman Catholic Church                    400
##  7 Southern Baptist                          48
##  8 Unitarian Universalist Association        40
##  9 United Church of Christ                  136
## 10 United Methodist Church                  240

Some recoding to help abbreviate denomination names. I use the startsWith function to avoid typing the full names.

data3$DENOM1[startsWith(data3$DENOM1, "Evangelical")] <- "ELCA"
data3$DENOM1[startsWith(data3$DENOM1, "Lutheran")] <- "Missouri Synod"
data3$DENOM1[startsWith(data3$DENOM1, "Unitarian")] <- "Unit. Universalist"
data3$DENOM1[startsWith(data3$DENOM1, "United Meth")] <- "United Methodist"
data3$DENOM1[startsWith(data3$DENOM1, "United Church of Ch")] <- "UC Christ"
data3$DENOM1[startsWith(data3$DENOM1, "Pres")] <- "Presbyterian"
data3$DENOM1[startsWith(data3$DENOM1, "Epis")] <- "Episcopal"
data3$DENOM1[startsWith(data3$DENOM1, "Am")] <- "Am. Baptist"
data3$DENOM1[startsWith(data3$DENOM1, "Roman")] <- "Roman Catholic"

Plotting Data v.1

With our cleaned variables, we can now plot date X attend with separately colored lines for each level of DENOM1. Importantly, to tell R which points to connect with the line, make sure to specify that you want to group by individual congregations (group = congrega). Otherwise, R will draw a line connecting each individual ELCA congregation (for example) before moving to the next year and doing the same. For a subtle effect, I’ve made large congregation estimates more opaque than smaller ones using the alpha = attend command.

ggplot(data3, aes(x = date, y = attend, color = DENOM1)) +
  geom_point(aes(alpha = attend)) +
  geom_line(aes(group = congrega, alpha = attend)) +
  scale_alpha_continuous(guide = F) +
  labs(x = "year", y = "Avg Wkly Attend") + john_theme

This graph is not helpful. The Catholic congregations are obscuring other denomination trends. Instead, I plot each denomination in its own pane using facet_wrap(). I also remove the color legend using scale_color...(guide=FALSE), and add some annotations / theme elements. With many categories, these graphs are hard to size correctly. I’ve given a size that works for me and attached the plot here as a raw image file.

ggplot(data3, aes(x = date, y = attend, color = DENOM1)) +
  geom_point() +
  geom_line(aes(group = congrega)) +
  geom_smooth(color = "black", se = F) +
  scale_color_discrete(guide = F) +
  facet_wrap(~DENOM1, scales = "free", ncol = 5) +
  labs(x = "Year", y = "Avg Wkly Attend", title = "US Church Attendance", 
       caption = "________________________________________________\nSource: US Congregational Life Survey\nTop-10 largest denominations: 195/251 congregations\nJohn A. Bernau 2018 // www.johnabernau.com") + john_theme

# ggsave("~/Desktop/uscong.jpg", width = 15, height = 8.5, unit = "in")

(Click to enlarge)

While a hasty observation would conclude that American Baptists have the steepest negative slope, a closer examination would note this trend is only based on four congregations as opposed to the hundreds of ELCA or Catholic congregations. Also notice that the Y-axis is different for each plot. The largest Roman Catholic congregation reports a weekly attendance of 10,000 members(!) while the largest Unitarian Universalist maxes out at 160. A graphically subtle, but substantively important detail!

Standardizing Variables

To fix this, let’s standardize each congregation at its starting level of attendance in 2001. We’re not interested in absolute totals per say, but rather the change in those totals over time. Before starting this analysis, I use select to get rid of extra variables and rename our attendance measure.

new <- data3 %>% 
  select(DENOM1, congrega, congmovd, numbsrvc, capacity,  date, attend) %>% 
  rename(total_attend = attend)

After taking out rows with no attendance information, I group by congregation and identify the smallest date value for each and save as first_date. This date will serve as the baseline attendance for each congregation. The filter command extracts these baseline rows, the select command prunes our variables to only three of interest (congregation, the first date in the survey, and the attendance for this time point), and the rename command gives these measures interpretable names.

new2 <- new %>% 
  filter(!is.na(total_attend)) %>% 
  group_by(congrega) %>% 
  mutate(first_date = min(date)) %>% 
  filter(date == first_date) %>% 
  select(congrega, date, total_attend) %>% 
  rename(founded = date, base = total_attend)

head(new2)
## # A tibble: 6 x 3
## # Groups:   congrega [6]
##   congrega founded  base
##   <chr>      <dbl> <int>
## 1 R2031       2008   750
## 2 R2307       2008  1900
## 3 R2099       2007  1100
## 4 R1061       2006   350
## 5 R2107       2006   732
## 6 R2108       2006   212

I then use inner_join to combine this dataframe to the original. For every row with matching congregation values, it will tack on our two new columns founded and base. With this information, we can calculate a new variable delta as the difference between a given year’s attendance and the congregation’s baseline attendance.

new3 <- inner_join(new, new2) %>% 
  filter(!is.na(total_attend)) %>% 
  mutate(delta = total_attend - base)

As a side note, 90% of congregations in this survey report 2001 as their baseline, but 19 congregations started later: (count(new3, congrega, founded) %>% filter(founded != 2001)). By filtering the last survey-year and totaling the attendance changes, I also report the net-change in total attendance:

Across all 195 congregations considered here, attendance has fallen by 2,033 members between 2001 - 2008.

filter(new3, date == 2008) %>% summarize(sum(delta))
##   sum(delta)
## 1      -2033

Plotting Data v.2

This plot presents this information in a powerful way. Each line represents a congregation’s change in attendance from its starting baseline.

ggplot(new3, aes(date, delta)) +
  geom_line(aes(group = congrega), size = 1, alpha = 0.4) +
  annotate("text", x = 2003, y = 1500, size = 4, 
           label = "Net change -2,033") +
  labs(title = "US Church Attendance:", 
       subtitle = "Congregation deviations from baseline",
       x = NULL, y = NULL,
       caption = "________________________________________________\nSource: US Congregational Life Survey\nTop-10 largest denominations: 195/251 congregations\nJohn A. Bernau 2018 // www.johnabernau.com") + john_theme

While many congregations report sustained decline, a few report enormous growth: +2,000 attendees! We can make this more informative by coloring according to denomination.

ggplot(new3, aes(date, delta, color = DENOM1)) +
  geom_line(aes(group = congrega), size = 1, alpha = 0.4) +
  annotate("text", x = 2003, y = 1500, size = 4, 
           label = "Net change -2,033") +
  labs(title = "US Church Attendance:", 
       subtitle = "Congregation deviations from baseline",
       x = NULL, y = NULL,
       caption = "________________________________________________\nSource: US Congregational Life Survey\nTop-10 largest denominations: 195/251 congregations\nJohn A. Bernau 2018 // www.johnabernau.com") + john_theme +
    guides(colour = guide_legend(override.aes = list(alpha = 1, size=2))) +
    scale_color_discrete(name = "Denomination")

The Roman Catholics simply have a lot of variability. Four churches report more than 500 new attendees and about six churches report declines of at least 500. What about the rest? Using filter, we can calculate the non-Catholic net change in attendance:

These 151 non-Catholic congregations lost 2,656 attendees between 2001 - 2008.

filter(new3, DENOM1 != "Roman Catholic") %>% 
  filter(date == 2008) %>% 
  summarize(sum(delta))
##   sum(delta)
## 1      -2656

Using this new subset, we can plot non-Catholic congregation trends.

ggplot(filter(new3, DENOM1 != "Roman Catholic"), # removing Catholics
       aes(date, delta, color = DENOM1)) +
  geom_line(aes(group = congrega), size = 1, alpha = 0.4) +
  scale_color_brewer(palette = "Set1", name = "Denomination") +
  annotate("text", x = 2002.5, y = -350, size = 4, 
           label = "Net change -2,656") +
  labs(title = "US Church Attendance:", 
       subtitle = "Congregation deviations from 2001 baseline: NO CATHOLICS",
       x = NULL, y = NULL,
       caption = "________________________________________________\nSource: US Congregational Life Survey\nTop-10 largest denominations - catholics: 151/251 congregations\nJohn A. Bernau 2018 // www.johnabernau.com") + john_theme +
  guides(colour = guide_legend(override.aes = list(alpha = 1, size=2)))

While this plot does a good job of showing attendance trends for each congregation, it is hard to make any meaningful conclusions. However, we can use this data to create aggregate denomination trends. By first grouping according to denomination and then adding up the yearly changes, we get a much more informative summary (saved as ag) and plot.

ag <- new3 %>% 
  group_by(date, DENOM1) %>% 
  summarise(delta = sum(delta)) %>% 
  arrange(DENOM1)
head(ag)
## # A tibble: 6 x 3
## # Groups:   date [6]
##    date DENOM1      delta
##   <dbl> <chr>       <int>
## 1  2001 Am. Baptist     0
## 2  2002 Am. Baptist   -89
## 3  2003 Am. Baptist  -152
## 4  2004 Am. Baptist  -158
## 5  2005 Am. Baptist  -125
## 6  2006 Am. Baptist  -155
# Plotting aggregates
ggplot(ag, aes(date, delta)) +
  geom_line(aes(group = DENOM1, color = DENOM1), 
            size = 1, alpha = 0.7) +
  annotate("text", x = 2007, y = -3000, size = 4, 
           label = "Net change -2,033") +
  scale_color_discrete(name = "Denomination") +
  labs(title = "US Church Attendance:", 
       subtitle = "Denomination deviations from 2001 baseline",
       x = NULL, y = NULL,
       caption = "________________________________________________\nSource: US Congregational Life Survey\nTop-10 largest denominations: 195/251 congregations\nJohn A. Bernau 2018 // www.johnabernau.com") + john_theme

Once again, the Catholic variability overpowers all other denominations. At this point, we might want to examine the data more closely. Did the Roman Catholic Churches really gain 4,500 new attendees in only three years? What could have spurred this increase? Or, more likely, what kind of reporting anomaly can account for this? Keep in mind that these attendance measures come from clergy self-reports in 2008. These post-hoc estimates might not be as robust as we would like. In any case, we can control for this variability by isolating each denomination within its own pane.

ggplot(ag, aes(date, delta)) +
  geom_line(aes(group = DENOM1, color = DENOM1), size = 1, alpha = 0.7) +
  scale_color_discrete(guide = F) +
  scale_x_continuous(breaks = 2001:2008, 
                     labels = c("", "2001", "", "",
                                "", "", "2008", "")) +
  labs(title = "US Church Attendance:", 
       subtitle = "Denomination deviations from 2001 baseline",
       x = NULL, y = NULL,
       caption = "\nNet Change = -2,033 Attendees\n________________________________________________\nSource: US Congregational Life Survey\nTop-10 largest denominations: 195/251 congregations\nJohn A. Bernau 2018 // www.johnabernau.com") + john_theme +
  facet_wrap(~DENOM1, scales = "free_y", ncol = 5)

# # Save as an image with appropriate dimensions:
# ggsave("~/Desktop/dd.jpg", width = 10, height = 6, units = "in")

(Click to enlarge)

Conclusion

This post used the US Congregational Life Study to examine trends in church attendance between 2001 - 2008. While some congregations reported modest increases, the overwhelming trend among the ten largest American denominations is one of decline. For all 195 congregations used here, attendance fell by more than 2,000 members. These trends are based on head clergy estimates in 2008. While not an error-proof source of information, this survey remains one of the best profiles of American religous congregations to date. These findings corroborate previous work by David Voas, Mark Chaves, and others documenting the general decline of religous institutions in America. Future work should take advantage of other variables in this dataset. How might congregational attributes like average age, youth programs, or gender composition predict attendance trends? Specifically, I am wondering about the spatial dynamics of a worship space: holding absolute attendance constant, does a small but full space encourage more new members than a large empty space? Pairing these seven-year attendance estimates with information on number of services, weekly attendees, and worship space capacity, one could estimate a dynamic panel data model predicting the institutional mechanisms of this widespread decline in US church attendance.


  • Berger, Peter, Grace Davie, and Effie Fokas. 2008. Religious America, Secular Europe?: A Theme and Variations. Aldershot, England; Burlington, VT: Routledge.

  • Chaves, Mark. 2017. American Religion: Contemporary Trends. Second edition. Princeton, NJ: Princeton University Press.

  • Marcum, John P. 1999. “Measuring Church Attendance: A Further Look.” Review of Religious Research 41(1):122–30.

  • Voas, David, Alasdair Crockett, and Daniel V. A. Olson. 2002. “Religious Pluralism and Participation: Why Previous Research Is Wrong.” American Sociological Review 67(2):212–30.

  • Voas, David and Mark Chaves. 2016. “Is the United States a Counterexample to the Secularization Thesis?” American Journal of Sociology 121(5):1517–56.

  • Warner, Rob. 2010. Secularization and Its Discontents. London, GB: Continuum.


Code Home