Data record linkage is the process of determining whether a record in one file matches one or more records in another file. Researchers may request a data linkage to link a study’s data set to the Texas Cancer Registry (TCR) database and obtain cancer-related information on the study’s patients. To conduct a linkage, both data sets must contain common data items (e.g., name, social security number, date of birth, sex, and residence).
In addition to exact matches, the TCR can also match records that do not agree perfectly on every variable, for example when two digits in a social security number are transposed.
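The TCR does not describe its matching algorithm here, so the following is only a minimal sketch of how the transposed-digit case could be recognized. It assumes the transposition involves two *adjacent* digits and treats `999999999` as the missing-SSN code from the layout table. The function names and agreement categories are hypothetical. Production linkage software typically uses probabilistic scoring with partial-agreement weights rather than a hard rule like this.

```python
def is_adjacent_transposition(a: str, b: str) -> bool:
    """True if b equals a with exactly one pair of neighbouring characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    # Positions where the two strings disagree
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1          # the two differences are neighbours
        and a[diffs[0]] == b[diffs[1]]        # and their digits are swapped
        and a[diffs[1]] == b[diffs[0]]
    )


def compare_ssn(a: str, b: str) -> str:
    """Classify agreement between two 9-digit SSNs (hypothetical categories)."""
    if "999999999" in (a, b):                 # missing code; cannot compare
        return "missing"
    if a == b:
        return "exact"
    if is_adjacent_transposition(a, b):
        return "transposition"
    return "disagree"
```

For example, `compare_ssn("123456789", "123546789")` returns `"transposition"`, so a linkage process could still consider the pair a possible match and use other fields (name, date of birth, sex) to confirm it.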
Requests for data linkages follow the TCR’s Confidential Data Request and DSHS IRB Approval Process. Data provided to the TCR for linkage should be transferred securely (via Web Plus) as ASCII fixed-width files. The data must include the appropriate fields in the correct positions and formats, as shown in the table below. At a minimum, the file must include either (a) last name, first name, social security number, date of birth, and sex; or (b) last name, first name, date of birth, sex, street address, city, and zip code. If data items for both (a) and (b) are available, the TCR recommends including all of them for a more robust linkage. Additional personally identifiable information, such as middle name or initial and maiden name, is also recommended if available. Although street address, city, maiden name, and middle name are often not used for matching, they are useful when reviewing possible matches. Multiple matches that lack sufficient information to uniquely identify the person in the researcher’s data will not be released. Data from other state cancer registries and Veterans Affairs (VA) facilities will also not be released.
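Before submitting a file, a researcher might check each record against the two minimum variable sets. The sketch below is one way to do that, assuming records are held as dictionaries. The field names are hypothetical, and an SSN recoded to `999999999` (the layout table’s missing code) is treated as absent.

```python
from typing import List, Mapping

# Hypothetical field names for minimum sets (a) and (b) described above.
OPTION_A = ("last_name", "first_name", "ssn", "dob", "sex")
OPTION_B = ("last_name", "first_name", "dob", "sex", "street", "city", "zip")


def _present(record: Mapping[str, str], field: str) -> bool:
    """True if the field holds a usable (non-blank, non-missing-coded) value."""
    value = (record.get(field) or "").strip()
    if field == "ssn":
        # 999999999 is the layout's missing code, so it does not count.
        return value not in ("", "999999999")
    return value != ""


def satisfied_options(record: Mapping[str, str]) -> List[str]:
    """Return which minimum sets ("a", "b") the record fully satisfies."""
    options = {"a": OPTION_A, "b": OPTION_B}
    return [name for name, fields in options.items()
            if all(_present(record, f) for f in fields)]
```

A record for which `satisfied_options` returns an empty list does not meet either minimum. A record that satisfies both sets is the case for which the TCR recommends submitting all of the information.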
|Variable||Column||Format / Description|
|study unique ID||1-8||Created by researcher. Pad with leading zeros if fewer than 8 digits.|
|first name||9-48||Left alignment. Missing should be blank.|
|last name||49-88||Left alignment. Missing should be blank.|
|social security number||89-97||No hyphens; recode missing as 999999999.|
|date of birth||98-106||YYYYMMDD; no slashes or hyphens. If only year or year and month are known, missing month and/or day should be blank.|
|sex||107||1=male; 2=female; 3=other; 4=transsexual NOS; 5=transsexual, natal male; 6=transsexual, natal female; 9=not stated/unknown.|
|street address (number and street)||108-167||Left alignment. Missing should be blank.|
|city||168-217||Left alignment. Missing should be blank.|
|zip code (9 or 5 digit)||218-226||Left alignment. Blanks follow the 5-digit code if the 4-digit extension is not available.|
|middle name||227-266||Left alignment. If the full middle name is not available, include the middle initial without a period. Missing should be blank.|
|maiden name||267-306||Left alignment. Missing should be blank.|
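The layout above can be produced with ordinary string padding. The following is a minimal sketch, not a TCR-supplied tool. It copies the column positions from the table, and its field names and helper functions are hypothetical. It makes a few assumptions. Because the date of birth occupies columns 98-106 (9 columns) but `YYYYMMDD` is 8 characters, the date is left-aligned with one trailing blank. A missing sex is written as code 9. Hyphens are removed from ZIP+4 codes so that they fit in 9 columns.

```python
from typing import Mapping, Optional, Tuple

# (field name, first column, last column): 1-based, inclusive, from the table.
LAYOUT: Tuple[Tuple[str, int, int], ...] = (
    ("study_id", 1, 8), ("first_name", 9, 48), ("last_name", 49, 88),
    ("ssn", 89, 97), ("dob", 98, 106), ("sex", 107, 107),
    ("street", 108, 167), ("city", 168, 217), ("zip", 218, 226),
    ("middle_name", 227, 266), ("maiden_name", 267, 306),
)
RECORD_LENGTH = 306
SEX_CODES = {"1", "2", "3", "4", "5", "6", "9"}


def format_dob(year: Optional[int], month: Optional[int] = None,
               day: Optional[int] = None) -> str:
    """YYYYMMDD, with blanks for an unknown month and/or day."""
    if year is None:
        return " " * 8
    mm = f"{month:02d}" if month else "  "
    dd = f"{day:02d}" if day else "  "
    return f"{year:04d}{mm}{dd}"


def format_record(rec: Mapping[str, object]) -> str:
    """Build one 306-character fixed-width line from a dict of field values."""
    values = {name: "" if rec.get(name) is None else str(rec.get(name)).strip()
              for name, _, _ in LAYOUT}
    values["study_id"] = values["study_id"].zfill(8)        # leading zeros
    ssn = values["ssn"].replace("-", "") or "999999999"     # missing code
    if not (ssn.isdigit() and len(ssn) == 9):
        raise ValueError(f"invalid SSN: {ssn!r}")
    values["ssn"] = ssn
    values["zip"] = values["zip"].replace("-", "")          # ZIP+4 without hyphen
    values["sex"] = values["sex"] or "9"                    # 9 = not stated/unknown
    if values["sex"] not in SEX_CODES:
        raise ValueError(f"invalid sex code: {values['sex']!r}")
    middle = values["middle_name"]
    if len(middle) == 2 and middle.endswith("."):           # "Q." -> "Q"
        values["middle_name"] = middle[0]

    fields = []
    for name, first, last in LAYOUT:
        width = last - first + 1
        text = values[name]
        if len(text) > width:
            raise ValueError(f"{name!r} exceeds {width} columns: {text!r}")
        fields.append(text.ljust(width))    # left-align, pad with blanks
    line = "".join(fields)
    if not line.isascii():
        raise ValueError("record contains non-ASCII characters")
    assert len(line) == RECORD_LENGTH
    return line
```

Each formatted line could then be written to a text file opened with `encoding="ascii"`. The source does not specify a line terminator, so confirm that detail with the TCR before transferring the file.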
For additional information about data linkages or if you would like to have a record linkage completed with your data, please contact the TCR Epidemiology Group at CancerData@dshs.texas.gov.