How to Use Vital Statistics*
Data users are diverse: public health officials evaluate a program using
death data, demographers project school enrollments with birth data, and
business people decide to open a formal-wear shop based on marriage data. Many
of these users have a thorough knowledge of statistics, but others find the
entire subject matter confusing and intimidating. For either group, a
misunderstanding of what vital statistics can mean could lead to wrong
conclusions. Therefore, this section is included to provide an overview of how
to use vital statistics. It is addressed to the person looking at vital events
for the first time, but the experienced user may also find a review helpful.
STEP 1: FINDING THE CORRECT NUMBER
The first step is to determine how many of a particular vital event took
place during the year. This involves asking two questions.
1. Which event or events are appropriate?
Deciding which events to use is important since sometimes the choice of one
event over another can lead to vastly different conclusions.
This may be more complicated than it sounds because examining more than one
type of event may be required. For example, a researcher who is concerned with
teenage pregnancies will have to consider abortions and fetal
deaths, not simply the number of births.
2. Who should be counted?
If you are a hospital planner who is deciding to expand or contract delivery
services, you want to count the number of births which occurred in your
area, regardless of where the parents live. If you are projecting school
enrollment, you want to count only how many children will potentially be
residing in your area.
OCCURRENCE DATA: The event (the death, birth, marriage, etc.) actually took
place in the county or city mentioned. The person who is the focus of the event
may have lived elsewhere, for example in Davenport, Iowa.
RESIDENCE DATA: The person involved in the event lived in the geographic
region mentioned, but the event itself may have taken place anywhere in the
United States or Canada. In other words, a resident of the city of El Paso who
died in an accident while on vacation in Hawaii has been added to the city of El
Paso resident death figure.
When in doubt about which type of data to use, resident figures are usually
the best choice. Most birth and death data are published by residence, which
means that comparisons with other states, or to the United States as a whole,
will be easier. Exceptions to this rule are listed in the individual sections.
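The difference between the two kinds of counts can be sketched in a few lines of Python. The records below are hypothetical and exist only to illustrate the idea; the place names and the helper functions `count_by_occurrence` and `count_by_residence` are assumptions, not part of any published data file.

```python
from collections import Counter

# Hypothetical death records: (place where the death occurred, place where the person lived).
records = [
    ("El Paso", "El Paso"),
    ("Hawaii", "El Paso"),       # El Paso resident who died while on vacation
    ("El Paso", "Davenport"),    # visitor from Iowa who died in El Paso
    ("El Paso", "Las Cruces"),   # another non-resident who died in El Paso
    ("San Antonio", "San Antonio"),
]

def count_by_occurrence(records):
    """Count events by the place where they actually took place."""
    return Counter(occurred for occurred, _ in records)

def count_by_residence(records):
    """Count events by the place where the person lived."""
    return Counter(lived for _, lived in records)

occurrence = count_by_occurrence(records)
residence = count_by_residence(records)

print("Deaths occurring in El Paso:", occurrence["El Paso"])
print("Deaths of El Paso residents:", residence["El Paso"])
```

The same records give different totals for the same city, which is why the choice has to be made before any figures are looked up.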
Once the correct event has been determined, and the choice between occurrence
and residence data has been made, the statistician can find the correct figures
in the table(s) in this book. If the needed table is not listed, contact the
Center for Health Statistics (512-776-7509) for more information.
STEP 2: MAKING THE NUMBER MEANINGFUL WITH RATES AND RATIOS
In many instances simply knowing the number of events is not sufficient. A.
Bradford Hill expressed this important statistical concept: "It is well
recognized that white sheep eat more than black sheep because there are more of
them." For example, we know more people died in San Antonio than in Brownsville,
because San Antonio has a much larger population. But what is the
likelihood of dying in each municipality?
In order to answer this question, statisticians calculate rates. This means
that the number of events which occurred is compared to the population for which
that event could have occurred, and the figure is then standardized to some
number (such as 1,000 or 100,000) for convenience.
Here is an example:
CRUDE DEATH RATE = (DEATHS / POPULATION) X 1,000
where POPULATION is the number of people who could have died, and 1,000 is the
number picked by vital statisticians to eliminate decimal points.
The more specifically a statistician can define the "population at risk" (the
denominator or bottom part of the formula), the more meaningful the rate. For
example, the crude birth rate, which compares the number of births to the
population, is not nearly as informative as the general fertility rate, which
uses only the number of women of childbearing age (15-44) for comparative
purposes. The general fertility rate is not distorted by changes in the number
of men or pre-pubescent or post-menopausal women in the population.
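As a rough illustration of how the choice of denominator changes the result, here is a minimal Python sketch. All of the counts are hypothetical, and the helper `rate` is an assumed name used only for this example.

```python
def rate(events, population_at_risk, per=1_000):
    """Events divided by the population at risk, scaled to a convenient base."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return events * per / population_at_risk

# Hypothetical figures for one area in one year.
births = 1_500
total_population = 100_000
women_15_44 = 22_000

# Same numerator, two different denominators.
crude_birth_rate = rate(births, total_population)
general_fertility_rate = rate(births, women_15_44)

print(f"Crude birth rate: {crude_birth_rate:.1f} per 1,000 population")
print(f"General fertility rate: {general_fertility_rate:.1f} per 1,000 women aged 15-44")
```

Only the denominator differs between the two calls; the general fertility rate would stay the same if the number of men in the area changed, while the crude birth rate would not.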
Unfortunately, we do not always have the correct denominator for the
equation. In these situations a substitute is used. For example, how many people
are at risk of getting divorced? The number of married people is only available
for census years. As a substitute, the crude divorce rate is calculated using
the total population regardless of marital status. In other situations, the
event is simply compared to another related number. For instance, the abortion
ratio compares the number of abortions to the number of births. This is easier
and more accurate than trying to determine the true denominator, which is the
total number of pregnant women.
When calculating rates and ratios, great care must be taken to make certain
that the appropriate time periods, geographical boundaries and populations are
used.
STEP 3: COMPARING TWO OR MORE NUMBERS
Numbers are more meaningful when they are converted into rates and ratios.
But problems can arise when rates or ratios are compared for different
geographical areas, different time periods, or different categories such as men
and women.
Statisticians expect a certain amount of chance variation and have methods to
take this into account. The confidence interval uses the number of cases and
their distributions to determine what the rate "really" is. If two rates have
overlapping confidence intervals, then the difference between them may be due to
this chance variation. In other words, the difference is not statistically
significant. When comparing rates and ratios, differences should be tested
for statistical significance.
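One common way to approximate a 95% confidence interval for a rate is sketched below in Python. It assumes the number of events behaves like a Poisson count and uses a normal approximation, which is a simplification and may not be the method used for the tables in this book. The function names and the example counts are hypothetical.

```python
import math

def rate_with_ci(events, population, per=1_000, z=1.96):
    """Return (rate, lower, upper) using a normal approximation to a Poisson count.

    Assumption: the standard error of the rate is rate / sqrt(events).
    The approximation is poor when the number of events is very small.
    """
    if events <= 0 or population <= 0:
        raise ValueError("events and population must be positive")
    rate = events * per / population
    half_width = z * rate / math.sqrt(events)
    return rate, max(0.0, rate - half_width), rate + half_width

def intervals_overlap(a, b):
    """True when the two (rate, lower, upper) intervals share any values."""
    _, a_low, a_high = a
    _, b_low, b_high = b
    return a_low <= b_high and b_low <= a_high

# Hypothetical areas with the same two rates but different numbers of events.
small_a = rate_with_ci(40, 5_000)
small_b = rate_with_ci(60, 5_000)
large_a = rate_with_ci(400, 50_000)
large_b = rate_with_ci(600, 50_000)

print(intervals_overlap(small_a, small_b))  # few events: wide intervals
print(intervals_overlap(large_a, large_b))  # many events: narrow intervals
```

In this sketch both pairs compare a rate of 8 with a rate of 12. Only the number of events behind each rate differs, and that alone decides whether the intervals overlap.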
Chance variation is a common problem when the numbers being used to calculate
rates are extremely small. Large swings often occur in the rates which do not
reflect real changes. Consider Maverick County's infant mortality rates for a
five-year period, shown below:
Year       Births    Infant Deaths    Infant Mortality Rate
                                      (per 1,000 births)
1988          868          5                  5.8
1989          843          4                  4.7
1990          900          9                 10.0
1991        1,021          4                  3.9
1992        1,165          2                  1.7
1988-92     4,797         24                  5.0
The rates vary widely from year to year. Note that the 1991 infant mortality
rate is more than double the 1992 rate, even though only two more infant deaths
occurred in 1991 than in 1992.
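The Maverick County figures can be reproduced with a short Python sketch. The births and infant deaths are taken from the table; the variable and function names are only illustrative.

```python
# Maverick County births and infant deaths by year, from the table.
maverick = {
    1988: (868, 5),
    1989: (843, 4),
    1990: (900, 9),
    1991: (1021, 4),
    1992: (1165, 2),
}

def infant_mortality_rate(births, infant_deaths):
    """Infant deaths per 1,000 births."""
    return infant_deaths * 1_000 / births

# One rate per year, rounded to one decimal place as in the table.
annual = {
    year: round(infant_mortality_rate(births, deaths), 1)
    for year, (births, deaths) in maverick.items()
}

# Pool the five years: add up births and deaths first, then compute one rate.
total_births = sum(births for births, _ in maverick.values())
total_deaths = sum(deaths for _, deaths in maverick.values())
pooled = round(infant_mortality_rate(total_births, total_deaths), 1)

for year, value in annual.items():
    print(year, value)
print("1988-92", pooled)
```

The pooled rate is computed from the summed births and deaths rather than by averaging the five annual rates, so years with more births carry more weight.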
Many rates based on small numbers are published in this book because readers
demand them. However, anyone preparing to make important decisions based on
these rates should be wary. Consider this rule of thumb: a rate based on 20
cases has a 95% confidence interval about as wide as itself (the interval for a
rate of 50 is between 25 and 75). Even large differences between two rates based
on 20 cases or fewer are probably not statistically significant.
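The rule of thumb can be checked with a small Python sketch. It assumes a normal approximation to a Poisson count, under which the width of the interval relative to the rate depends only on the number of cases. This is an approximation chosen for illustration, not necessarily the method behind the published figures, and `relative_ci_width` is a hypothetical helper.

```python
import math

def relative_ci_width(cases, z=1.96):
    """Full width of the approximate 95% interval, as a fraction of the rate.

    Assumption: standard error of the rate = rate / sqrt(cases).
    """
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 2 * z / math.sqrt(cases)

# Compare the rule-of-thumb case count with larger ones.
for cases in (20, 100, 400):
    width = relative_ci_width(cases)
    print(f"{cases} cases: interval width is about {width:.0%} of the rate")
```

Under this approximation, 20 cases give an interval nearly as wide as the rate itself, and it takes four times as many cases to cut the width in half.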
If 20 are too few, how many cases are sufficient to say that a true
difference exists? Unfortunately, we have no easy rules for this. To be safe,
the vital statistician should always try to combine several years of data or
consolidate geographical areas. Confidence intervals should be calculated, and
differences should be tested for statistical significance.
Changes in Measurement
Another problem is that the numbers being compared have not always been based
on the same type of measurement. Definitions, population estimates,
certificates, and coding procedures change from time to time as the need arises.
This can create "artificial" differences which can disguise "real" differences.
The cause-of-death item provides an excellent example of changes in
measurement.
From 1980 to 1988, approximately 1,800 to 2,100 Texans died each year due to
diabetes. The range of annual crude death rates for these years is 11.7 to 13.1
per 100,000 residents. In 1990, 3,458 Texans died from this cause for a crude
death rate of 20.4 per 100,000 residents.
It appears that the incidence of diabetes increased. But actually, a revision
to the death certificate resulted in more deaths being coded as due to that
cause. In 1989, the cause-of-death section was expanded from three to
four lines, which provides more room for describing multiple conditions leading
to death.
Taking Age, Sex and Race into Account
Before comparing two places or two time periods, always compare the population
characteristics, such as age, sex and race, first. If discrepancies are noted in
any relevant variables, then the rates should be adjusted or standardized in
order to make the comparisons free of differences in the structure of the
populations. An example of age-adjustment by the direct method of
standardization is given in the Technical Appendix.
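The general idea of direct standardization can be sketched in Python as follows. This is not the example from the Technical Appendix: the age groups, the counts, the standard population and the function name are all hypothetical and were chosen so the arithmetic is easy to follow.

```python
def directly_standardized_rate(age_specific, standard_population, per=1_000):
    """Apply an area's age-specific rates to a standard population.

    age_specific: {age_group: (deaths, population)} for the area.
    standard_population: {age_group: number of people in the standard}.
    """
    if set(age_specific) != set(standard_population):
        raise ValueError("age groups must match the standard population")
    total_standard = sum(standard_population.values())
    # Deaths the standard population would have at the area's age-specific rates.
    expected_deaths = sum(
        deaths / population * standard_population[group]
        for group, (deaths, population) in age_specific.items()
    )
    return expected_deaths * per / total_standard

def crude_rate(age_specific, per=1_000):
    """Total deaths over total population, ignoring age structure."""
    deaths = sum(d for d, _ in age_specific.values())
    population = sum(p for _, p in age_specific.values())
    return deaths * per / population

# Hypothetical areas with identical age-specific rates but different age structures.
area_a = {"0-39": (20, 20_000), "40-64": (50, 10_000), "65+": (300, 5_000)}
area_b = {"0-39": (30, 30_000), "40-64": (40, 8_000), "65+": (120, 2_000)}
standard = {"0-39": 60_000, "40-64": 30_000, "65+": 10_000}

for name, area in (("A", area_a), ("B", area_b)):
    crude = crude_rate(area)
    adjusted = directly_standardized_rate(area, standard)
    print(f"Area {name}: crude {crude:.2f}, age-adjusted {adjusted:.2f} per 1,000")
```

In this made-up case the crude rates differ only because area A has a larger share of older residents. Once both areas are weighted by the same standard population, the adjusted rates agree.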
STEP 4: ANALYZING THE DATA
The first three steps have been fairly mechanical:
(1) Choose the correct events and the correct group to determine the number
of events which took place for the geographical areas and time periods.
(2) Calculate the rates.
(3) Compare these rates to determine if the differences are statistically
significant.
Now the vital statistician must begin to ask the difficult questions. If we
find that two rates are statistically significantly different, how can we find
out why they're different? If the differences which we expected did not prove to
be significant, is there another item which perhaps is making an actual
difference? Frequently the statistician has to refine the research question and
begin all over again.
*Technical Notes reprinted courtesy of the Oregon Center for Health
Statistics; illustrative examples were changed to reflect Texas data.