


Data Linkages

Record linkage is the process of determining whether a record in one file matches one or more records in another file. Researchers may request a data linkage between a study's data set and the Texas Cancer Registry (TCR) database to obtain cancer-related information on the study's patients. To conduct a linkage, both data sets must contain common data items, such as name, social security number, date of birth, sex, and residence.
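The one-to-one versus one-to-several idea can be made concrete with exact matching on a shared key. The following minimal Python sketch does this. The field names (`study_id`, `registry_id`, `last_name`, and so on) and the choice of key fields are hypothetical illustrations. They are not the TCR's actual linkage rules.

```python
from collections import defaultdict

# Hypothetical match key; the TCR's real deterministic rules are not described here.
KEY_FIELDS = ("last_name", "first_name", "dob", "sex")


def match_key(rec: dict) -> tuple:
    """Normalize case and surrounding whitespace so trivial differences don't block a match."""
    return tuple(rec[f].strip().upper() for f in KEY_FIELDS)


def deterministic_link(study: list, registry: list) -> dict:
    """Map each study ID to the IDs of every registry record with an identical key."""
    index = defaultdict(list)
    for reg in registry:
        index[match_key(reg)].append(reg["registry_id"])
    # .get() avoids inserting empty entries into the defaultdict for unmatched keys.
    return {s["study_id"]: list(index.get(match_key(s), [])) for s in study}


study = [
    {"study_id": "00000001", "last_name": "Garcia ", "first_name": "ana", "dob": "19500312", "sex": "2"},
    {"study_id": "00000002", "last_name": "Lee", "first_name": "Sam", "dob": "19721105", "sex": "1"},
]
registry = [
    {"registry_id": "R1", "last_name": "GARCIA", "first_name": "ANA", "dob": "19500312", "sex": "2"},
    {"registry_id": "R2", "last_name": "Garcia", "first_name": "Ana", "dob": "19500312", "sex": "2"},
]
links = deterministic_link(study, registry)
print(links)  # {'00000001': ['R1', 'R2'], '00000002': []}
```

Study record 00000001 matches two registry records, which is a one-to-several match. Study record 00000002 matches none.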

The TCR currently uses SAS® and Link Plus™ software to perform deterministic and probabilistic linkages. Deterministic linkage requires records to agree exactly on the linking variables. Probabilistic linkage can also match records that do not agree on every variable, such as when two digits of a social security number are transposed.
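This page does not describe Link Plus's internal algorithm or the TCR's settings. The following Python sketch is therefore only an illustration of one common probabilistic approach, Fellegi-Sunter-style scoring. In this approach, each agreeing field adds log2(m/u) to a pair's score and each disagreeing field adds log2((1-m)/(1-u)). The m- and u-probabilities, the field names, and the rule that treats a two-digit SSN transposition as agreement are all hypothetical.

```python
import math

# Hypothetical m/u probabilities for illustration only (not TCR or Link Plus settings).
# m = P(field agrees | same person); u = P(field agrees | different people).
M_U = {
    "ssn": (0.95, 0.0001),
    "last_name": (0.90, 0.01),
    "first_name": (0.90, 0.02),
    "dob": (0.95, 0.001),
    "sex": (0.98, 0.5),
}
MISSING = {"", "999999999"}  # blank, or the SSN missing code from the file layout


def is_transposition(a: str, b: str) -> bool:
    """True if b equals a with exactly two characters swapped."""
    if len(a) != len(b):
        return False
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return len(diffs) == 2 and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]]


def fields_agree(field: str, a: str, b: str) -> bool:
    if a == b:
        return True
    # Simplification: count a two-digit SSN transposition as full agreement.
    return field == "ssn" and is_transposition(a, b)


def match_weight(r1: dict, r2: dict) -> float:
    """Sum of log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        a, b = r1[field].strip().upper(), r2[field].strip().upper()
        if a in MISSING or b in MISSING:
            continue  # a missing value counts as neither agreement nor disagreement
        if fields_agree(field, a, b):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


a = {"ssn": "123456789", "last_name": "Smith", "first_name": "John", "dob": "19600101", "sex": "1"}
b = dict(a, ssn="123546789")  # two SSN digits transposed
c = dict(a, ssn="987654321")  # unrelated SSN
print(match_weight(a, b) == match_weight(a, a))  # True
print(match_weight(a, c) < match_weight(a, b))   # True
```

In a typical Fellegi-Sunter setup, pairs scoring above an upper threshold are accepted and pairs below a lower threshold are rejected. Pairs in between are reviewed manually. Suitable thresholds depend on the data, so none are assumed here.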

Requests for data linkages follow the TCR's Confidential Data Request and DSHS IRB Approval Process. Data provided to the TCR for linkage should be transferred securely via Web Plus as ASCII fixed-width files. Each field must appear in the position and format shown in the table below.

At a minimum, the file must include one of the following sets of variables:
(a) last name, first name, social security number, date of birth, and sex; or
(b) last name, first name, date of birth, sex, street address, city, and zip code.

If data items for both (a) and (b) are available, the TCR recommends including all of them for a more robust linkage. Additional personally identifiable information, such as middle name or initial and maiden name, should also be included when available. Street address, city, maiden name, and middle name are often not used for matching, but they are useful for reviewing possible matches.

Sometimes a study record matches multiple TCR records and there is not enough information to identify the correct person uniquely. Those matches will not be released. Data from other state cancer registries and Veterans Affairs (VA) facilities will not be released either.
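The minimum-variable rule can be checked before a file is submitted. The following Python sketch reports which of the required sets, (a) or (b), a record fully provides. The field names are hypothetical. Treating a blank value, or the SSN missing code 999999999, as absent is an assumption drawn from the file layout. The sketch does not validate formats.

```python
# Hypothetical field names; the sets mirror options (a) and (b) in the text.
REQUIRED_SETS = {
    "a": ("last_name", "first_name", "ssn", "dob", "sex"),
    "b": ("last_name", "first_name", "dob", "sex", "street", "city", "zip"),
}
SSN_MISSING = "999999999"  # missing-SSN code from the file layout


def has_value(rec: dict, field: str) -> bool:
    value = str(rec.get(field) or "").strip()
    return value != "" and not (field == "ssn" and value == SSN_MISSING)


def sets_satisfied(rec: dict) -> list:
    """Return the labels of the minimum variable sets this record fully provides."""
    return [label for label, fields in REQUIRED_SETS.items()
            if all(has_value(rec, f) for f in fields)]


rec = {"last_name": "Garcia", "first_name": "Ana", "ssn": "999999999",
       "dob": "19500312", "sex": "2", "street": "100 Main St",
       "city": "Austin", "zip": "78701"}
print(sets_satisfied(rec))  # ['b'] because the SSN is coded as missing
```

An empty result means the record meets neither minimum. A result of `['a', 'b']` means the record supports the more robust linkage the TCR recommends.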


Data Position and Format

Variable                            Columns   Format / Description
study unique ID                     1-8       Created by the researcher. Pad with leading zeros if shorter than 8 digits.
first name                          9-48      Left-aligned. Leave blank if missing.
last name                           49-88     Left-aligned. Leave blank if missing.
social security number              89-97     No hyphens. Recode missing as 999999999.
date of birth                       98-106    YYYYMMDD, with no slashes or hyphens. If only the year, or the year and month, are known, leave the missing month and/or day blank.
sex                                 107       1=male; 2=female; 3=other; 4=transsexual, NOS; 5=transsexual, natal male; 6=transsexual, natal female; 9=not stated/unknown.
street address (number and street)  108-167   Left-aligned. Leave blank if missing.
city                                168-217   Left-aligned. Leave blank if missing.
zip code (9 or 5 digits)            218-226   Left-aligned. If the 4-digit extension is not available, follow the 5-digit code with blanks.
middle name                         227-266   Left-aligned. If the full middle name is not available, include the middle initial without a period. Leave blank if missing.
maiden name                         267-306   Left-aligned. Leave blank if missing.
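Every field has a fixed column range, so the submission file can be generated programmatically. The following Python sketch formats one record according to the table. Several choices are assumptions, not TCR requirements:
- the field names;
- rejecting over-long or non-ASCII values instead of truncating them;
- removing a ZIP+4 hyphen;
- writing one newline-terminated record per line.

The table allots nine columns (98-106) to an eight-character YYYYMMDD date. This sketch left-aligns the date and leaves the ninth column blank, which should be confirmed with the TCR.

```python
# (field, first column, last column): 1-based, inclusive, from the layout table.
LAYOUT = [
    ("study_id", 1, 8),
    ("first_name", 9, 48),
    ("last_name", 49, 88),
    ("ssn", 89, 97),
    ("dob", 98, 106),
    ("sex", 107, 107),
    ("street", 108, 167),
    ("city", 168, 217),
    ("zip", 218, 226),
    ("middle_name", 227, 266),
    ("maiden_name", 267, 306),
]
RECORD_LENGTH = 306


def format_field(field: str, value, width: int) -> str:
    value = "" if value is None else str(value).strip()
    if field == "study_id":
        if not value:
            raise ValueError("study_id is required")
        value = value.zfill(width)  # leading zeros if shorter than 8 digits
    elif field == "ssn":
        value = value.replace("-", "") or "999999999"  # no hyphens; missing -> 999999999
    elif field == "dob":
        value = value.replace("-", "").replace("/", "")  # YYYYMMDD, no separators
    elif field == "zip":
        value = value.replace("-", "")  # assumption: ZIP+4 as 9 digits, no hyphen
    elif field == "middle_name":
        value = value.replace(".", "")  # middle initial without a period
    if not value.isascii():
        raise ValueError(f"{field} value {value!r} is not ASCII")
    if len(value) > width:
        raise ValueError(f"{field} value {value!r} exceeds {width} columns")
    return value.ljust(width)  # left-aligned; blanks fill unused columns


def format_record(rec: dict) -> str:
    line = "".join(format_field(f, rec.get(f), end - start + 1) for f, start, end in LAYOUT)
    assert len(line) == RECORD_LENGTH  # sanity check: contiguous fields, 306 columns
    return line


def write_file(path: str, records: list) -> None:
    # Assumption: one record per line, "\n" terminated; confirm with the TCR.
    with open(path, "w", encoding="ascii", newline="\n") as fh:
        for rec in records:
            fh.write(format_record(rec) + "\n")


line = format_record({
    "study_id": "42", "first_name": "Ana", "last_name": "Garcia",
    "ssn": "123-45-6789", "dob": "19500312", "sex": 2,
    "street": "100 Main St", "city": "Austin", "zip": "78701", "middle_name": "M.",
})
print(repr(line[0:8]), repr(line[88:97]), repr(line[97:106]), repr(line[106]))
# '00000042' '123456789' '19500312 ' '2'
```

Python slices are 0-based and end-exclusive, so columns s through e of the table correspond to `line[s-1:e]`.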


For additional information about data linkages or if you would like to have a record linkage completed with your data, please contact the TCR Epidemiology Group at CancerData@dshs.texas.gov.



Last updated October 23, 2017