Easy Analytic Software, Inc. (EASI) – Methodology (2005)

 

The following is a general description of the methodology used by EASI to update the demographic and economic characteristics for the United States, States, Counties, ZIP Codes, Census Tracts, Block Groups, and ZIP Plus 4’s. 

 

The purpose of this explanation is not to divulge any proprietary methods but to illustrate the efforts made on your behalf to create accurate updates.  EASI statisticians and programmers have over 30 years of experience updating these types of data.  By industry standards, EASI estimates would be considered of the highest quality. 

 

1.     Input files

a)    With the current release EASI will benchmark at the Block Group and higher levels all of the details supplied with the 2000 Census (all related releases at the Block Group level).  All data are now derived from SF3.

 

b)    Note that racial data, including Black Population, Asian Population, White Population, and Other Population, come from different questions in this Census and are not compatible with the 1990 Census (multiple race categories are now possible and are part of the Other group).  EASI uses the 2000 Census Block Group data as our benchmark.

 

c)    EASI has collected from the Census Bureau all current local (counties etc.) and national updates and estimates for all the key demographic information.    All these official estimates have been analyzed and then incorporated into our estimates and projections.

 

d)    EASI has summarized from the United States Postal Service (USPS) mailable Households at a County, ZIP Code, Census Tract, and Block Group level.  These data have been used as the primary input to estimate local current change within a small area such as a Block Group.  Mailable households are not the same as Census Households but are used to indicate recent change in household formations.  These changes are combined with an EASI proprietary model for updating and forecasting at the Block Groups.

 

e)    The Mailable Household data match starts by identifying, for every ZIP Plus 4 (ZIP+4), which Block Group it belongs to.  EASI develops a split file and a plurality file of these matches using the latest Tiger file to determine the primary Block Group to which each ZIP+4 should be assigned.  One of the key goals is to identify all correct ZIP Codes and ZIP+4’s and then assign their mailable Households to the correct Block Group.  EASI has also reconfigured the 1990 Block Groups into the 2000 Block Group configuration.  An analysis of this decade change is also included in our model.

 

f)     EASI has also analyzed the 2000 Census Block files in order to create a population centroid for each Block Group.  The results of that analysis are used for all ring study analysis.

 

g)    Specific other sources include:

 

1.     Bureau of the Census – 2000 Census PL 94-171; 2000 Census SF1 and SF3.  (These Census data are the EASI benchmark or starting point for the demographic updates and the forecasts.)   Other related sources are: Annual Demographic Survey; Current Population Reports (P20; P25; P60); and numerous special Census reports.

 

2.     ZIP and County Business Patterns (US Department of Commerce - Economics and Statistics Administration - Bureau of the Census).

 

3.     US Department of Justice - Federal Bureau of Investigation. (2001).

 

4.     National Center for Education Statistics - Common Core of Data (CCD)

 

5.     National Oceanic and Atmospheric Administration - National Environmental Satellite, Data and Information Service - National Climatic Data Center.

 

6.     United States Department of the Interior - Geological Survey - Office of Earthquakes, Volcanoes, and Engineering.

 

7.      Bureau of Labor Statistics - Department of Labor.
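
As an illustration of item f), the sketch below shows one common way to compute a population centroid and use it in a ring study.  It is not EASI's implementation: the input layout, the function names, and the rule that a Block Group counts toward a ring when its centroid falls inside the radius are all assumptions made for the example.  Averaging latitude and longitude directly is a simplification that is reasonable only for small areas such as a Block Group.

```python
import math

def population_centroid(blocks):
    """Population-weighted mean of block coordinates.

    blocks: list of (latitude, longitude, population) tuples (hypothetical layout).
    """
    total = sum(pop for _, _, pop in blocks)
    if total == 0:
        # No population: fall back to the unweighted mean of the block points.
        n = len(blocks)
        return (sum(lat for lat, _, _ in blocks) / n,
                sum(lon for _, lon, _ in blocks) / n)
    lat = sum(lat * pop for lat, _, pop in blocks) / total
    lon = sum(lon * pop for _, lon, pop in blocks) / total
    return (lat, lon)

def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(a))  # 3958.8 = mean Earth radius in miles

def ring_population(center, radius_miles, block_groups):
    """Sum the population of Block Groups whose centroid lies inside the ring."""
    return sum(bg["population"] for bg in block_groups
               if haversine_miles(center, bg["centroid"]) <= radius_miles)
```

For example, population_centroid([(40.0, -75.0, 100), (40.2, -75.2, 300)]) lands three quarters of the way toward the more populous block.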

 

2.     Data Preparation

The steps in creating the ZIP Plus 4 (ZIP+4) and Block Group mailable Households include:

 

a)    Start with a USPS ZIP+4 file for December 2004, which includes all valid residential ZIP+4’s in the country, whether or not they receive residential mail.

 

b)    For each ZIP+4, we add the Census Block Group based upon the Tiger file distance formula.  Approximately 20 million records are processed by this direct match (about 75%).

 

c)    For each remaining ZIP+4, we match against our internal geocode file (latitude and longitude).   This file is based on running addresses through address matching/geocoding software.  Approximately 18% of the total are matched to their Block Group this way.

 

d)    For each remaining ZIP+4 that cannot be geocoded by b) or c), we use a calculated carrier route or Block Group centroid.  We weight the geographies to a larger area and calculate a latitude and longitude.  We then determine which Block Group is closest by distance.  This is done for approximately 5% of the total.

 

e)    If a ZIP+4 is still unassigned, we then use the nearest neighbor ZIP+4.  Approximately 2% of the total are assigned through this approach (recent, 6-month-old ZIP+4s are often in this category).

 

f)     Block Group assignments are from the most recent Census Tiger file.  Tiger errors, where identified (such as wrong FIPS Codes), have been corrected.

 

g)    Each ZIP Plus 4 is assigned data based upon the data of the Block Group that it has been assigned to.  (Note: There are no official Census Bureau data for ZIP+4.)

 

h)    The mailable household data are for residential ZIP+4’s (no business-exclusive ZIP+4’s are included).    
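
The fallback order in steps b) through e) can be sketched as a simple cascade.  This is only an illustration under stated assumptions: the dictionaries (tiger, geocodes, route_points, neighbor_of), the function names, and the use of a straight-line distance on latitude and longitude to pick the closest Block Group centroid are hypothetical stand-ins, not EASI's actual files or distance formula.

```python
import math

def nearest_block_group(point, bg_centroids):
    """Return the Block Group ID whose centroid is closest to point (lat, lon)."""
    return min(bg_centroids, key=lambda bg: math.dist(point, bg_centroids[bg]))

def assign_zip4(zip4, tiger, geocodes, route_points, neighbor_of, bg_centroids):
    """Try each source in order; return (block_group_id, method_used)."""
    if zip4 in tiger:                      # step b: direct Tiger match
        return tiger[zip4], "tiger"
    if zip4 in geocodes:                   # step c: internal geocode file
        return nearest_block_group(geocodes[zip4], bg_centroids), "geocode"
    if zip4 in route_points:               # step d: carrier route centroid
        return nearest_block_group(route_points[zip4], bg_centroids), "centroid"
    if zip4 in neighbor_of:                # step e: nearest neighbor ZIP+4
        # Resolve the neighbor with steps b-d only, so lookups cannot loop.
        bg, _ = assign_zip4(neighbor_of[zip4], tiger, geocodes,
                            route_points, {}, bg_centroids)
        if bg is not None:
            return bg, "neighbor"
    return None, "unassigned"
```

Returning the method alongside the Block Group makes it easy to tally what share of records each step handled, in the spirit of the percentages quoted above.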

 

3.     Analysis

EASI has developed a series of models which use the relationship between the current mailable households at the Block Group level to estimate household relationships compared to the county and to the ZIP Code. EASI analyzes the change in relationships between these mailable households over time and compares this to the county and to the ZIP Code households using a proprietary formula.  The analysis relates the current estimate of mailable households to the number of mailable households at the time of the 2000 Census (4/1/00).
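
The EASI formula itself is proprietary and is not reproduced here.  Purely to illustrate the general idea of relating current mailable households to the count at the time of the 2000 Census, a plain ratio update might look like the following; the function name and the fallback rule are assumptions for the example.

```python
def ratio_update(census_households_2000, mailable_2000, mailable_now):
    """Scale the 2000 Census household count by the change in mailable households."""
    if mailable_2000 <= 0:
        # No usable base count: carry the Census figure forward unchanged.
        return float(census_households_2000)
    return census_households_2000 * mailable_now / mailable_2000
```

A Block Group with 400 Census households whose mailable households grew from 420 to 462 (a 10% increase) would be updated to 440 households under this simplified rule.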

           

One key component of the analysis is a proximity site review of all ZIP+4’s based upon their Block Group assignment (208,790 Block Groups).    This analysis prepares our input data before use in EASI demographic models.

           

Newly released Census county estimate information (P-25 and P-26) is analyzed and compared to prior releases.  Annually, EASI also incorporates relevant national data as control totals.  This is done for a variety of demographic factors.  EASI derives these from analysis of national data, over time, from the Annual Demographic Survey, the Current Population Survey, the American Community Survey, and the Annual Housing Survey.  Additional data come from a variety of sources at the Census Bureau web site (www.census.gov).

           

ZIP Code results are independently compared to the USPS current ZIP Code file of residential deliveries.  Additional updating sources include USPS AMS files and Postal bulletins (the ZIP Alert); these record any annual changes that take place to ZIP Codes, including name changes and delivery or branch changes, as they become official.  Other sources include the U.S. Postal Service City-State File (monthly) and Delivery Statistics File.  These CD ROM’s incorporate the main inventory of ZIP Codes and the post office and other names associated with them.  Each year EASI conducts a complete review of these files to maintain a current ZIP Code roster.  EASI inventories the old ZIP Codes as well.

           

Updates to the current year and 5-year projections are first done at the United States level and, for key variables, at the county level as well.  Block Group (BG) level estimates are all controlled to the county control totals.  That is, the Block Group data will add to the separately generated county data.  Other geographies are summarized from the Block Group level: BGs are added up to create Census Tracts, and parts of BGs are added to get ZIP Codes and cities. 
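
Controlling Block Group estimates to a county total can be illustrated with simple proportional scaling.  This is a sketch under assumptions: the dictionary layout and function name are hypothetical, and proportional scaling is a standard technique used here for illustration rather than a description of the EASI model.

```python
from collections import defaultdict

def control_to_county(bg_estimates, bg_county, county_totals):
    """Scale Block Group estimates so they sum to each county's control total."""
    raw = defaultdict(float)
    for bg, value in bg_estimates.items():
        raw[bg_county[bg]] += value          # uncontrolled county sums
    controlled = {}
    for bg, value in bg_estimates.items():
        county = bg_county[bg]
        factor = county_totals[county] / raw[county] if raw[county] else 0.0
        controlled[bg] = value * factor
    return controlled
```

With Block Groups of 100 and 300 in a county whose control total is 440, the controlled values become 110 and 330, preserving each Block Group's share.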

                       

4.     Consistency – year to year changes

Each year EASI uses all available sources to maintain the highest quality of our estimates.  Sometimes the new information will make year-to-year changes less meaningful; e.g., a 2005 ZIP Code may be made up of different BGs because of postal changes in the last year.  Within an EASI calendar year, changes from the current estimate to our 5-year forecast are consistent, but changes from last year’s estimates are not necessarily so.  EASI geography estimates are all based on the same geography; that is, all ZIP Code estimates for April 1, 1990; April 1, 2000; 1/1/2005; and 1/1/2010 are based on the same geographic definition.

 

Other issues of consistency relate to release of Census data and new definitions of variables.  For example, the 2000 Census has a new Race question (e.g. White Alone, Black Alone, Asian Alone, etc.) which allows for multiple races and is not compatible with previous estimates.  

 

Another factor in consistency is that some data sources release information annually, while others may release data elements only once every two or even three years.  Post-Census estimates are also subject to revision for several reasons.  Occasionally a sample size can be expanded to allow for more detailed results.  Another change could be that the sample is framed against new data such as the 2000 Census.   EASI, with decades of experience, analyzes all information and, if it is usable, incorporates the results into our estimates.

 

ZIP Code Details - As mentioned above, ZIP Codes, even if they seem to be the same (same 5 digits), are especially difficult to keep consistent from year to year (they are always consistent within the EASI data and software).  Since each ZIP Code area may change from year to year, EASI spends considerable effort to develop new ZIP Code data for each and every year.  That is, EASI assigns a portion of each Block Group to a ZIP Code based on the latest information for each year (1990, 2000, current and five year forecast).  Note: Annually EASI creates a proprietary ZIP to Block Group (partial) analysis, and we also allocate all land area to create a ZIP Code.
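
Building a ZIP Code value from portions of Block Groups can be sketched as a weighted sum.  The share table below is a hypothetical stand-in for the proprietary ZIP to Block Group analysis, and the names are assumptions for the example; each Block Group's shares are assumed to add to 1.

```python
from collections import defaultdict

def zip_totals(bg_values, bg_zip_shares):
    """Add up Block Group portions to get ZIP Code totals.

    bg_zip_shares: list of (block_group, zip_code, share) rows, where share is
    the fraction of the Block Group assigned to that ZIP Code.
    """
    totals = defaultdict(float)
    for bg, zip_code, share in bg_zip_shares:
        totals[zip_code] += bg_values[bg] * share
    return dict(totals)
```

Because the shares can be rebuilt each year, the same Block Group values can be re-summarized to whatever ZIP Code definition is current.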

 

Income

There are many different definitions of income that are available for analysis.  With the release of the 2000 Census, EASI uses the actual 2000 Census Income estimates (for the year 1999) as our starting point.  These estimates are then modeled using the P60 Money Income in the United States (Current Population Reports – Consumer Income) as well as other data.  EASI income models are based on race and family characteristics to obtain a current estimate.  All use the 2000 Census definition of income as a benchmark.

 

EASI income estimates are controlled to analysis from the Money income data after analyzing the differences in that sample compared to the actual 2000 Census.  Note: In prior years, EASI Income estimates will not be controlled to Disposable Income.  If EASI estimates are compared to other sources such as Disposable Income or Total Personal income they may seem different. 

 

Consumer Expenditure Survey (CEX)

The results of the CEX are analyzed annually by EASI and then combined with EASI estimates at the Block Group level.  The Bureau of Labor Statistics and the Bureau of the Census conduct the CEX.  There are two parts to the survey.  The first part is a diary, which is completed by respondents for two consecutive 1-week periods.  The second part is an interview survey, which is conducted quarterly (every 3 months) for five quarters.  The interview survey covers about 95% of all expenditures and includes large expenditures such as property, automobiles, major appliances, rent, utility payments, insurance premiums, and many others.

 

EASI annually models the results for 540+ categories of expenditures against our updated demographic estimates.  EASI’s models use our own BG demographic estimates to update these potential sales.

 

An example:

EASI models the age of respondent, income of respondent, and tenure (own home versus rent).   Then for each demographic characteristic we have an average expenditure for the previous calendar year.  For example, a respondent earning $50,000 to $75,000 spent $210 (for example only), and we might then see that a respondent with income of $35,000 to $50,000 spent $150 (for example only).

 

We take all the values for the demographics and then develop a model for this CEX characteristic that combines the factors to get one BG level estimate.
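
A deliberately simplified sketch of this idea, using income alone and the illustrative dollar figures from the example above, is shown below.  The actual model combines age, income, and tenure in a way that is not described here; the function name, bracket labels, and household counts are assumptions made for the illustration.

```python
def bg_expenditure(avg_spend_by_bracket, households_by_bracket):
    """Estimate total BG spending as average spend per bracket times households."""
    return sum(avg_spend_by_bracket[bracket] * count
               for bracket, count in households_by_bracket.items())

# Illustrative averages only, taken from the example in the text.
avg_spend = {"35k-50k": 150, "50k-75k": 210}

# Hypothetical Block Group with 40 and 60 households in the two brackets.
total = bg_expenditure(avg_spend, {"35k-50k": 40, "50k-75k": 60})
# 40 * 150 + 60 * 210 = 18600
```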

 

 

 

Retail Sales and Store Groups

EASI’s Retail Sales Estimates include Food Service (Total Retail Sales includes the standard 12 major store groups plus Food Service).  All data are based on an extensive review of County and ZIP Code Retail Trade data for 1997.  EASI created a file of benchmark data from the released Census data, which is used for our annual update.

Each year, EASI creates a new consistent file of benchmark and updated data for 1997, 2005 (Year 2004), and 2010 (Year 2009).  For each update, EASI re-benchmarks estimates to a new set of Block Group estimates for all retail categories.  These estimates are based on our current analysis of the latest NAICS employment data for each retail store group and food service.  Note: EASI resolves any inconsistencies between sources as part of this annual process.

The 13 store groups that comprise Total Retail Sales are:

1.     Motor Vehicle and Parts Dealers

2.     Furniture and Home Furnishings Stores

3.     Electronics and Appliance Stores 

4.     Building Material and Garden Equipment and Supplies Dealers

5.     Food and Beverage Stores

6.     Health and Personal Care Stores

7.     Gasoline Stations

8.     Clothing and Clothing Accessories Stores

9.     Sporting Goods, Hobby, Book, and Music Stores

10.  General Merchandise Stores

11.  Miscellaneous Store Retailers

12.  Nonstore Retailers

13.  Food Services

Benchmark Methodology and Assumptions

These retail data are benchmarked at the county and at the city level (only cities with a population of 2,500 or more and 11 or more establishments in the 1997 Census were benchmarked).  These county and city data and estimates are based on actual store locations.  Then EASI develops a ZIP Code version of this file.  EASI models these actual store locations (counties and certain cities) down to a Block Group level using a business employment relationship developed from the latest ZIP Business Patterns.  This is done in order to allow the retail sales estimates to be used as part of standard database summaries.  Note: EASI does not know the actual locations of stores at the Block Group level.  Other geographies are estimated by adding up the Block Group estimates.

The updates are modeled against estimated changes based upon the ZIP Business Patterns.  Therefore, the sum of the BG retail sales estimates within a ZIP Code is consistent with the ZIP Code business employment data.  Any inconsistencies between sources are reviewed and made consistent with the most current data from ZIP Business Patterns.

EASI models the retail trade data to a Block Group based on a proximity model.  The model assigns exclusive Business or Retail ZIP Codes to the closest Block Group. For example, from ZIP Business Patterns EASI can identify point business locations and the retail configuration within each.
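
Modeling an area's retail sales down to Block Groups with an employment relationship can be illustrated by allocating in proportion to retail employment.  This is a sketch only: the employment figures, names, and the purely proportional rule are assumptions for the example and do not describe the EASI proximity model.

```python
def allocate_by_employment(area_sales, bg_employment):
    """Split an area's retail sales across Block Groups by employment share."""
    total_employment = sum(bg_employment.values())
    if total_employment == 0:
        # No retail employment anywhere in the area: nothing to allocate.
        return {bg: 0.0 for bg in bg_employment}
    return {bg: area_sales * employment / total_employment
            for bg, employment in bg_employment.items()}
```

Because every Block Group receives a share of the same area total, the Block Group estimates add back to the benchmarked figure.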

 

5.     Accuracy

With all estimates and with ours as well, the higher the level of data (national is the highest) the more accurate the estimate.  Our data follows standard demographic techniques, all developed with over 30 years of experience in this industry.  It is considered a highly accurate technique. 

 

EASI data has also been “field tested”.  That is, portions of our updated data are available at our web site and have been used by hundreds of thousands of users.  These users raise questions about our updates, which we investigate.  This input does help us to review and check results and makes our estimates better.

 

Here are some common questions:

 

Why are the Post Office mailable households different than EASI’s?

 

a)    One reason for the differences between the counts of ZIP households in the Census and the mailable households from the post office is that mailable households and Census households are defined differently.  There can be two mailable households in a residence but only one Census household.  The Census will call it a single household if there is a relationship, and the post office does not keep track of relationships.

 

How close is EASI updated data to other sources?

 

b)    EASI has made an extensive effort to obtain all relevant information and to incorporate it in a logical statistical manner.  Other companies that use similar sources and a similar statistical approach should give similar results.  One method of comparison is a circle or ring study.  An analysis of comparable ring studies has shown a current population difference of less than 2 percent.  In denser population areas the results of the ring analysis are within .005 percent.  With the release of the 2000 Census an analysis showed that EASI ZIP Code estimates were within .005 percent in over 98% of the cases.

 

6.     Validity Checking

EASI has made numerous checks for internal and external consistency in all our estimates. There are 3 types of checks that are rigorously reviewed. These include: Census internal consistency; controlling updates to definitions of estimates; and correcting for, or preventing, rounding errors, especially in small geographies.

The 2000 Census validity check is an analysis and comparison of the results of SF1 vs. SF3 estimates at the Block Group level. Due to sample sizes and Census procedures for disseminating the Census results, there frequently are Census results which are inconsistent. These results are analyzed and EASI has developed a series of algorithms to adjust these estimates to make them consistent. (Examples are mostly in small BGs where there might be a single household, by total or by race, found in SF3 but no population in SF3. Or a value for a single cell in a detailed age-by-race distribution won’t re-add to the SF1 distribution for the same results.) EASI strives to correct all of these problems with the Census data and remove these as issues that could affect EASI updates.

EASI updating validity checks involve controlling all Census 2000 distributions that require a controlling definition. EASI then makes the same checks on the EASI updates in order to prevent inconsistencies from coming into the updates.

 

The next issue is the controlling of each distribution to the correct sum. A basic example is that population by age and sex must add to population. Similarly, the sum of males age 0 to 5 for White, Black, Asian, and Other must add to total males age 0 to 5. Another example is that educational attainment is defined for the population 25+. Note: Each distribution has a requirement like this. Many must add to population 16+ or population 3+, or households, or population, etc.  Another key requirement is that Hispanic must be less than or equal to Total Population less White Non-Hispanic Population.  Also key is that White Non-Hispanic Population must be less than or equal to White Population.  These conditions for updates apply across all estimates including individual age groups (0 to 5, 6 to 11, etc.) and individual income groups ($0 to $15k, $15k to $25k, etc.).
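
The conditions above lend themselves to simple automated checks.  The sketch below tests three of them on a single Block Group record; the field names and record layout are hypothetical and chosen only for the example.

```python
def check_block_group(bg):
    """Return a list of violated consistency rules for one Block Group record."""
    problems = []
    # Population by age and sex must add to population.
    if sum(bg["pop_by_age_sex"].values()) != bg["population"]:
        problems.append("age/sex cells do not add to population")
    # Hispanic <= Total Population less White Non-Hispanic Population.
    if bg["hispanic"] > bg["population"] - bg["white_non_hispanic"]:
        problems.append("Hispanic exceeds Total Population less White Non-Hispanic")
    # White Non-Hispanic Population <= White Population.
    if bg["white_non_hispanic"] > bg["white"]:
        problems.append("White Non-Hispanic exceeds White")
    return problems
```

An empty list means the record passed all three checks; the same function could be run on city and ZIP Code records built from split Block Groups.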

The last part of the validity check is to find and fix rounding errors.  Rounding errors are introduced in all estimates since the sum of a distribution will frequently not add exactly to the required estimate.  To accommodate the rounding error, EASI has developed various ways of adjusting the error into the most likely cell.  (In EASI, rounding errors are calculated simultaneously as the distribution is being estimated, so when a group or cell sum is off by 1 (high or low), EASI immediately makes the adjustment in that actual group or cell.)
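
One well-known way to make rounded cells add exactly to a required total is the largest-remainder method, sketched below.  It is offered as an illustration of the problem, not as the EASI adjustment: EASI's choice of the "most likely cell" is not described here, and the function name and inputs are assumptions.

```python
import math

def round_to_total(cells, total):
    """Round cells to integers that add exactly to total (largest-remainder method)."""
    floors = {key: math.floor(value) for key, value in cells.items()}
    shortfall = total - sum(floors.values())
    if not 0 <= shortfall <= len(cells):
        raise ValueError("cells are too far from the required total")
    # Give one extra unit to the cells with the largest fractional parts.
    by_remainder = sorted(cells, key=lambda key: cells[key] - floors[key],
                          reverse=True)
    for key in by_remainder[:shortfall]:
        floors[key] += 1
    return floors
```

For instance, cells of 10.4, 20.3, and 30.3 with a required total of 61 round to 11, 20, and 30, so the one-unit discrepancy goes to the cell with the largest fractional part.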

These checks are performed at the BG, City, and ZIP Code levels. This is required since EASI splits BGs to create cities and ZIP Codes.  Since splitting of BGs can introduce these validity issues, the EASI methodology requires the BG checks described above to be repeated for both cities and ZIP Codes as well.

 

***

 

For a further discussion of these methodologies please call Robert Katz at 800 HOW EASI (469 3274) (email info@easidemographics.com).

Easy Analytic Software, Inc.
240 Benigno Boulevard, Bellmawr, NJ 08031
phone: 856.931.5780
fax: 856.931.4115