Data Linkages

Record linkage is the process of determining whether a record in one file matches one or more records in another file. Researchers may request a data linkage to link a study’s data set to the Texas Cancer Registry (TCR) database and obtain cancer-related information on the study’s patients. To conduct a linkage, both data sets must contain common data items (e.g., name, social security number, date of birth, sex, and residence).

In addition to linking on exact matches between records, the TCR can also match records that are not perfect matches on all variables, such as when two digits in a social security number are transposed.
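To illustrate the kind of near match described above, the following minimal sketch checks whether two identifiers differ only by one swap of two neighboring digits. The function name `is_adjacent_transposition` is hypothetical, and restricting the check to adjacent digits is an assumption. This page does not describe the TCR's actual matching method, so this is not a representation of it.

```python
def is_adjacent_transposition(a: str, b: str) -> bool:
    """Return True if b equals a with exactly one pair of adjacent characters swapped.

    Hypothetical helper for illustration only; not the TCR's matching logic.
    """
    if len(a) != len(b) or a == b:
        return False
    # Collect the positions where the two strings disagree.
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # A single adjacent swap produces exactly two neighboring mismatches.
    if len(diffs) != 2 or diffs[1] - diffs[0] != 1:
        return False
    i, j = diffs
    return a[i] == b[j] and a[j] == b[i]


# Made-up identifiers: the fifth and sixth digits are swapped.
print(is_adjacent_transposition("123456789", "123465789"))
```

A check like this would only flag a candidate pair; the remaining variables (name, date of birth, sex) would still need to agree before the pair could be treated as a match.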

Requests for data linkages follow the TCR’s Confidential Data Request and DSHS IRB Approval Process. Data provided to the TCR for linkage should be securely transferred (via Web Plus) as ASCII fixed-width files. The data must include the appropriate fields in the correct position and format, as shown in the table below.

At a minimum, required variables include: (a) last name, first name, social security number, date of birth, and sex; OR (b) last name, first name, date of birth, sex, street address, city, and zip code. If data items for both (a) and (b) are available, the TCR recommends including all information for a more robust linkage. If available, additional personally identifiable information, such as middle name/initial and maiden name, is also recommended. Although street address, city, maiden name, and middle name are often not used for matching, they are useful in examining possible matches.

Multiple matches with insufficient information to uniquely identify the record of the person in the researcher’s data will not be released. Data from other state cancer registries and Veterans Affairs (VA) facilities will also not be released.
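The minimum-variable rule above, option (a) or option (b), can be sketched as a pre-submission check. The field names, the function `meets_minimum`, and the choice to apply the rule record by record are all assumptions made for illustration; the rule itself is stated in terms of the variables a file contains. The sketch treats a social security number of 999999999 as missing, following the recode described in the table below.

```python
# Hypothetical field names; the two sets mirror options (a) and (b) above.
SET_A = ("last_name", "first_name", "ssn", "date_of_birth", "sex")
SET_B = ("last_name", "first_name", "date_of_birth", "sex",
         "street_address", "city", "zip_code")

MISSING_SSN = "999999999"  # recode for a missing SSN, per the layout table


def has_value(record: dict, field: str) -> bool:
    """Return True if the field is present and not blank (or a missing-SSN recode)."""
    value = record.get(field)
    if value is None or str(value).strip() == "":
        return False
    if field == "ssn" and str(value).strip() == MISSING_SSN:
        return False
    return True


def meets_minimum(record: dict) -> bool:
    """Return True if the record satisfies option (a) or option (b)."""
    return (all(has_value(record, f) for f in SET_A)
            or all(has_value(record, f) for f in SET_B))


# Made-up record with no SSN but a full address: option (b) is satisfied.
example = {
    "last_name": "DOE", "first_name": "JANE", "ssn": MISSING_SSN,
    "date_of_birth": "19750314", "sex": "2",
    "street_address": "100 MAIN ST", "city": "AUSTIN", "zip_code": "78701",
}
print(meets_minimum(example))
```

Whether a sex code of 9 (not stated/unknown) should count as present is not specified here; the sketch counts it as present.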

Data Position and Format
Variable | Column | Format / Description
study unique ID | 1-8 | Created by researcher. Use leading zeros if less than 8 digits.
first name | 9-48 | Left alignment. Missing should be blank.
last name | 49-88 | Left alignment. Missing should be blank.
social security number | 89-97 | No hyphens; recode missing as 999999999.
date of birth | 98-106 | YYYYMMDD; no slashes or hyphens. If only year or year and month are known, missing month and/or day should be blank.
sex | 107 | 1=male; 2=female; 3=other; 4=transsexual NOS; 5=transsexual, natal male; 6=transsexual, natal female; 9=not stated/unknown.
street address (number and street) | 108-167 | Left alignment. Missing should be blank.
city | 168-217 | Left alignment. Missing should be blank.
zip code (9 or 5 digit) | 218-226 | Left alignment. Blanks follow the 5-digit code if the 4-digit extension is not available.
middle name | 227-266 | Left alignment. Middle initial without a period should be included if name is not available. Missing should be blank.
maiden name | 267-306 | Left alignment. Missing should be blank.
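The layout above can be turned into a small formatter that builds one 306-character line per person. This is a sketch under stated assumptions, not a TCR-supplied tool: the dictionary keys and the function `format_record` are hypothetical, and because the date of birth field spans nine columns while YYYYMMDD has eight characters, the sketch assumes the date is left-aligned and padded with a trailing blank.

```python
# (hypothetical field key, first column, last column), taken from the table above.
LAYOUT = [
    ("study_id", 1, 8),
    ("first_name", 9, 48),
    ("last_name", 49, 88),
    ("ssn", 89, 97),
    ("date_of_birth", 98, 106),
    ("sex", 107, 107),
    ("street_address", 108, 167),
    ("city", 168, 217),
    ("zip_code", 218, 226),
    ("middle_name", 227, 266),
    ("maiden_name", 267, 306),
]


def format_record(record: dict) -> str:
    """Build one fixed-width line; missing fields become blanks."""
    parts = []
    for name, start, end in LAYOUT:
        width = end - start + 1
        raw = record.get(name)
        value = "" if raw is None else str(raw).strip()
        if name == "study_id":
            value = value.zfill(width)  # leading zeros up to 8 digits
        elif name == "ssn":
            # Strip hyphens; recode a missing SSN as 999999999.
            value = value.replace("-", "") or "999999999"
        if len(value) > width:
            raise ValueError(f"{name} exceeds {width} characters: {value!r}")
        parts.append(value.ljust(width))  # left-align, pad with blanks
    line = "".join(parts)
    line.encode("ascii")  # raises UnicodeEncodeError if not plain ASCII
    return line


# Made-up example record.
line = format_record({
    "study_id": 42, "first_name": "JANE", "last_name": "DOE",
    "date_of_birth": "19750314", "sex": 2, "zip_code": "78701",
})
print(len(line))
```

Raising an error on over-long or non-ASCII values, rather than silently truncating or substituting, is a design choice of this sketch so that problems surface before the file is transferred.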

For additional information about data linkages or if you would like to have a record linkage completed with your data, please contact the TCR Epidemiology Group at CancerData@dshs.texas.gov.

Last updated June 18, 2018