Measuring Semantic Similarity across EU GDPR Regulation and Cloud Privacy Policies Conference Paper

abstract

  • Data protection authorities formulate policies and rules that service providers must comply with to ensure security and privacy when they perform Big Data analytics using users' Personally Identifiable Information (PII). The knowledge contained in data regulations and organizational privacy policies is typically maintained as short unstructured text in HTML or PDF formats. Hence, it is an open challenge to determine the specific regulation rules that are being addressed by a provider's privacy policies. We have developed a semantically rich framework, using techniques from Semantic Web and Natural Language Processing, to extract and compare the context of short text in real time. This framework allows automated incremental text comparison and identification of context from short-text policy documents by determining the semantic similarity score and extracting semantically similar key terms. Additionally, we created a knowledge graph to store the semantic similarity comparison results while evaluating our framework across the EU GDPR and the privacy policies of 20 organizations that comply with this regulation, associated with various categories that apply to Big Data stored in the cloud. Our approach can be utilized by Big Data practitioners to update their referential documents regularly based on the authority documents.
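
    The idea of scoring two short texts and listing the key terms they share can be illustrated with a minimal sketch. This is not the framework described in the abstract: it assumes a plain bag-of-words cosine similarity, which captures only lexical overlap rather than the Semantic Web and NLP techniques the authors mention. The function names, the stopword list, and the two example sentences are hypothetical and were made up for illustration.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "we", "our", "your",
             "shall", "will", "without", "on", "in", "is", "be"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine_similarity(text_a, text_b):
    """Cosine similarity between term-frequency vectors of two short texts."""
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def shared_terms(text_a, text_b):
    """Terms that appear in both texts, sorted alphabetically."""
    return sorted(set(tokenize(text_a)) & set(tokenize(text_b)))


if __name__ == "__main__":
    # Hypothetical sentences, not quotations from the GDPR or any real policy.
    rule = "The controller shall erase personal data without undue delay."
    policy = "We will erase your personal data on request."
    print(round(cosine_similarity(rule, policy), 3))
    print(shared_terms(rule, policy))
```

    A lexical measure like this one misses synonyms and paraphrases, which is the gap a semantic approach is meant to close; the sketch only shows the shape of the comparison (a score plus shared terms).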

name of conference

  • 2020 IEEE International Conference on Big Data (Big Data)

published proceedings

  • 2020 IEEE International Conference on Big Data (Big Data)

author list (cited authors)

  • L. Elluri, .., K. Pande Joshi, .., & A. Kotal.

publication date

  • 2020