Data Linkages

Data record linkage is the process of determining whether a record in one file matches one or more records in another file. Researchers may request a data linkage to link a study’s data set to the Texas Cancer Registry (TCR) database and obtain cancer-related information on the study’s patients. To conduct a linkage, both data sets must contain common data items (e.g., name, social security number, date of birth, sex, and residence).
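
As a minimal illustration of the idea, the Python sketch below links two lists of records by exact agreement on a few common identifiers. The field names (id, last_name, first_name, dob, sex) and the normalization (trimming and upper-casing names) are assumptions made for this example. This is not TCR’s linkage procedure, which also handles inexact matches.

```python
from collections import defaultdict


def match_key(rec: dict) -> tuple:
    """Build an exact-match key from identifiers shared by both files.

    Field names are hypothetical labels for this sketch.
    """
    return (
        rec["last_name"].strip().upper(),
        rec["first_name"].strip().upper(),
        rec["dob"],
        rec["sex"],
    )


def exact_link(study: list, registry: list) -> dict:
    """Map each study ID to the IDs of all registry records with the same key."""
    index = defaultdict(list)
    for reg in registry:
        index[match_key(reg)].append(reg["id"])
    # A study record may match zero, one, or several registry records.
    return {s["id"]: list(index.get(match_key(s), [])) for s in study}
```

Because each study ID maps to a list, the result shows directly whether a study record matched no registry record, exactly one, or several.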

In addition to linking exact matches between records, TCR can also match records that are not perfect matches on all variables, such as when two digits in a social security number are transposed.
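
The paragraph above does not say how TCR scores near-matches. One simple way to recognize the transposition case it mentions is sketched below in Python: compare two social security numbers position by position and flag the pair when exactly two adjacent digits are swapped. The function name and the returned labels are hypothetical. In a probabilistic linkage, an outcome such as "transposition" could be treated as partial agreement rather than a full match.

```python
MISSING_SSN = "999999999"  # missing-SSN code from the file layout table


def ssn_agreement(a: str, b: str) -> str:
    """Classify agreement between two 9-digit SSNs (hypothetical helper).

    Returns "missing", "exact", "transposition" (exactly one pair of
    adjacent digits swapped), or "disagree".
    """
    if MISSING_SSN in (a, b):
        return "missing"
    if a == b:
        return "exact"
    if len(a) != len(b):
        return "disagree"
    # Positions where the two numbers differ.
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    if (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1  # the two differences are adjacent
        and a[diffs[0]] == b[diffs[1]]  # and the digits are swapped
        and a[diffs[1]] == b[diffs[0]]
    ):
        return "transposition"
    return "disagree"
```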

Requests for data linkages follow TCR’s Confidential Data Request and DSHS IRB Approval Process. Data provided to TCR for linkage should be transferred securely, via TCR’s approved secure file transfer protocol, as fixed-width ASCII files. The data must include the appropriate fields in the correct positions and formats, as shown in the table below. At a minimum, the required variables are either (a) last name, first name, social security number, date of birth, and sex; or (b) last name, first name, date of birth, sex, street address, city, and zip code. If the data items for both (a) and (b) are available, TCR recommends including all of them for a more robust linkage. Additional personally identifiable information, such as middle name or initial and maiden name, is also recommended when available. Although street address, city, maiden name, and middle name are often not used for matching, they are useful when examining possible matches.

When a study record matches multiple TCR records and there is not enough information to determine which one belongs to the person in the researcher’s data, those matches will not be released. Data from other state cancer registries and from Veterans Affairs (VA) and Department of Defense facilities will also not be released.
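
The minimum-variable rule above amounts to a check that a requester could run on each record before submitting a file. The Python sketch below reports which option, (a) or (b), a record satisfies. The field names are hypothetical, and treating blank values and the 999999999 SSN code as missing is an assumption based on the file layout table.

```python
# Minimum variable sets described above; field names are hypothetical labels.
OPTION_A = ("last_name", "first_name", "ssn", "dob", "sex")
OPTION_B = ("last_name", "first_name", "dob", "sex", "street", "city", "zip")
MISSING_SSN = "999999999"  # missing-SSN code from the file layout table


def is_present(rec: dict, field: str) -> bool:
    """Treat blank values, and the SSN missing code, as absent (assumption)."""
    value = str(rec.get(field) or "").strip()
    if field == "ssn" and value == MISSING_SSN:
        return False
    return value != ""


def satisfied_options(rec: dict) -> list:
    """Return which minimum-variable options ("a", "b") a record satisfies."""
    return [
        name
        for name, fields in (("a", OPTION_A), ("b", OPTION_B))
        if all(is_present(rec, f) for f in fields)
    ]
```

A record for which the function returns an empty list meets neither minimum; one that returns both "a" and "b" supplies the fuller set of identifiers TCR recommends.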

Expectations for Data Linkage Requests
Conducting complex data linkages between study data sets and TCR data requires significant expertise, experience, and time. Cancer Epidemiology and Surveillance Branch (CESB) staff are knowledgeable about TCR data and well trained in probabilistic data linkage and the necessary software. CESB staff are also willing and able to provide support throughout the process, including preparing the study data set for linkage, interpreting the linkage results, understanding the strengths and limitations of TCR data and of the linked results, and describing TCR data and the linkage methods. Therefore, CESB asks that data requestors include a CESB staff member as a collaborator on any research project that requests a data linkage. This collaborator should be listed on the Research Team Log submitted to the IRB. Manuscripts, posters, abstracts, and other items intended for public dissemination must be provided to CESB for review at least 5 working days before submission.

Variable | Columns | Format / Description
study unique ID | 1-8 | Created by the researcher. Use leading zeros if the ID has fewer than 8 digits.
first name | 9-48 | Left-aligned. Leave blank if missing.
last name | 49-88 | Left-aligned. Leave blank if missing.
social security number | 89-97 | No hyphens. Code missing values as 999999999.
date of birth | 98-106 | YYYYMMDD, with no slashes or hyphens. If only the year, or only the year and month, is known, leave the missing month and/or day blank.
sex | 107 | 1 = male; 2 = female; 3 = other; 4 = transsexual NOS; 5 = transsexual, natal male; 6 = transsexual, natal female; 9 = not stated/unknown.
street address (number and street) | 108-167 | Left-aligned. Leave blank if missing.
city | 168-217 | Left-aligned. Leave blank if missing.
zip code (9 or 5 digits) | 218-226 | Left-aligned. If the 4-digit extension is not available, blanks follow the 5-digit code.
middle name | 227-266 | Left-aligned. If the full middle name is not available, include the middle initial without a period. Leave blank if missing.
maiden name | 267-306 | Left-aligned. Leave blank if missing.
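
A fixed-width layout is easy to get wrong by a column. The Python sketch below shows one way a requester might produce records matching the table: each field is left-aligned and blank-padded to its width, the study ID is zero-padded, missing SSNs become 999999999, and non-ASCII characters are rejected. The field names, the dictionary input, the default of 9 for a missing sex code, and the newline line terminator are assumptions, not TCR specifications. The table also allots nine columns (98-106) to the eight-character YYYYMMDD date; the sketch left-aligns the date and leaves column 106 blank, which should be confirmed with TCR.

```python
# Column layout from the table: (field, first column, last column), 1-based
# and inclusive. Field names are hypothetical labels for this sketch.
LAYOUT = [
    ("study_id", 1, 8),
    ("first_name", 9, 48),
    ("last_name", 49, 88),
    ("ssn", 89, 97),
    ("dob", 98, 106),  # assumption: 8-char date left-aligned, column 106 blank
    ("sex", 107, 107),
    ("street", 108, 167),
    ("city", 168, 217),
    ("zip", 218, 226),
    ("middle_name", 227, 266),
    ("maiden_name", 267, 306),
]
RECORD_LENGTH = LAYOUT[-1][2]  # 306 characters per record
SEX_CODES = {"1", "2", "3", "4", "5", "6", "9"}


def format_record(rec: dict) -> str:
    """Render one study record as a fixed-width ASCII line."""
    values = {k: str(v).strip() for k, v in rec.items() if v is not None}
    # Study ID: pad with leading zeros to 8 characters.
    values["study_id"] = values["study_id"].zfill(8)
    # SSN: drop hyphens; missing becomes 999999999.
    values["ssn"] = values.get("ssn", "").replace("-", "") or "999999999"
    # DOB: drop slashes/hyphens; expects year-first input (YYYY, YYYYMM, YYYYMMDD).
    dob = values.get("dob", "").replace("/", "").replace("-", "")
    if dob and (not dob.isdigit() or len(dob) not in (4, 6, 8)):
        raise ValueError(f"date of birth must be YYYY[MM[DD]]: {dob!r}")
    values["dob"] = dob  # left-aligned, so a missing month/day stays blank
    # Assumption: a missing sex value is written as 9 (not stated/unknown).
    values["sex"] = values.get("sex", "") or "9"
    if values["sex"] not in SEX_CODES:
        raise ValueError(f"unknown sex code: {values['sex']!r}")
    # ZIP: drop the ZIP+4 hyphen; a 5-digit code is followed by blanks.
    values["zip"] = values.get("zip", "").replace("-", "")
    # A middle initial is written without a period.
    values["middle_name"] = values.get("middle_name", "").rstrip(".")

    parts = []
    for field, first, last in LAYOUT:
        width = last - first + 1
        text = values.get(field, "")
        if len(text) > width:
            raise ValueError(f"{field} exceeds {width} columns: {text!r}")
        parts.append(text.ljust(width))  # left-align; missing stays blank
    line = "".join(parts)
    line.encode("ascii")  # raises UnicodeEncodeError on non-ASCII characters
    return line


def write_linkage_file(records, path: str) -> None:
    """Write one line per record (newline terminator is an assumption)."""
    with open(path, "w", encoding="ascii", newline="\n") as fh:
        for rec in records:
            fh.write(format_record(rec) + "\n")
```

Formatting each record before writing the file surfaces over-long or non-ASCII values early, so they can be corrected rather than silently truncated.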

For additional information about data linkages or if you would like to have a record linkage completed with your data, please contact the TCR Epidemiology Group at