Grouping of variables to facilitate statistical disclosure limitation methods in multivariate data sets

Jan 2018

By Oganian, Anna ; Iacob, Ionut ; Lesaja, Goran

http://dx.doi.org/10.1007/978-3-319-99771-1_13

Source: Priv Stat Databases. ?

English

Details:

Alternative Title:

Priv Stat Databases

Personal Author:

Oganian, Anna ; Iacob, Ionut ; Lesaja, Goran

Description:

Data sets that are subject to Statistical Disclosure Limitation (SDL) often have many variables of different types that need to be altered for disclosure limitation. To produce a good-quality public data set, the data protector needs to account for the relationships between the variables. Hence, SDL methods should ideally not be univariate, treating each variable independently of the others, but multivariate, handling many variables at the same time. However, if a data set has many variables, as most government survey data do, developing and implementing a multivariate approach to SDL becomes difficult. In this paper we propose a pre-masking data processing procedure that clusters the variables of high-dimensional data sets so that different groups of variables can be masked independently, thus reducing the complexity of SDL. We consider different hierarchical clustering methods, including our own version of a hierarchical clustering algorithm, which we call |, and outline how the data protector can define an appropriate number of clusters for these methods. We implemented these methods and applied them to two genuine multivariate data sets. The results of the experiments show that | has the potential to solve this problem efficiently. The success of the method, however, depends on the correlation structure of the data. For data sets in which most of the variables are correlated, clustering the variables and then applying SDL methods independently to different clusters may lead to attenuated correlations in the masked data, even for efficient clustering methods. The proposed approach is therefore a trade-off between the computational complexity of multivariate SDL methods and the loss of data utility caused by treating different clusters independently.

Keywords: Statistical disclosure limitation (SDL), hierarchical clustering, dimensionality reduction.
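The idea of grouping variables before masking can be illustrated with a small sketch. The code below is not the authors' algorithm: it is a plain average-linkage agglomerative clustering of variables, and the dissimilarity `1 - |Pearson correlation|`, the function names `pearson` and `cluster_variables`, and the toy data are all assumptions made for illustration. It only shows how strongly correlated variables might end up in the same group, so that each group could then be masked on its own.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_variables(columns, n_clusters):
    """Group variables by average-linkage agglomerative clustering.

    columns: dict mapping a variable name to its list of numeric values.
    The dissimilarity between two variables is 1 - |Pearson correlation|,
    an illustrative choice that is not taken from the paper.
    """
    names = list(columns)
    dist = {
        frozenset((a, b)): 1.0 - abs(pearson(columns[a], columns[b]))
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average pairwise dissimilarity between the members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Toy data, made up for illustration: two pairs of strongly related variables.
data = {
    "income": [10, 20, 30, 40, 50, 60],
    "tax": [1, 2, 3, 4.5, 5, 6],
    "age": [40, 20, 60, 30, 50, 25],
    "pension_years": [20, 1, 39, 11, 30, 4],
}
groups = cluster_variables(data, 2)
print(sorted(groups))
```

With two clusters requested, the toy variables split into an income-related group and an age-related group. As the abstract cautions, when most variables are correlated with each other, any such split discards some cross-group correlation once the groups are masked independently.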

Source:

Priv Stat Databases. ?

Pubmed ID:

32337514

Pubmed Central ID:

PMC7182379

Document Type:

Journal Article

Funding:

CC999999/ImCDC/Intramural CDC HHS/United States

Collection(s):

CDC Public Access

Download URL:

https://stacks.cdc.gov/view/cdc/87430/cdc_87430_DS1.pdf

File Type:

	nihms-1016601-f0001.gif	gif
	nihms-1016601-f0001.jpg	jpeg
	nihms-1016601.nxml	xml
