CERIAS Tech Report 2012-12
Privacy Preserving Access Control on Third-Party Data Management Systems
by Mohamed Nabeel
Center for Education and Research in Information Assurance and Security
Purdue University, West Lafayette, IN 47907-2086
Graduate School ETD Form 9
(Revised 12/07)
PURDUE UNIVERSITY
GRADUATE SCHOOL
Thesis/Dissertation Acceptance
This is to certify that the thesis/dissertation prepared
By Mohamed Yoosuf Mohamed Nabeel
Entitled
Privacy Preserving Access Control on Third-Party Data Management Systems
For the degree of
Doctor of Philosophy
Is approved by the final examining committee:
Elisa Bertino, Ph.D.
Chair
Ninghui Li, Ph.D.
Samuel S. Wagstaff, Ph.D.
Dongyan Xu, Ph.D.
To the best of my knowledge and as understood by the student in the Research Integrity and
Copyright Disclaimer (Graduate School Form 20), this thesis/dissertation adheres to the provisions of
Purdue University’s “Policy on Integrity in Research” and the use of copyrighted material.
Approved by Major Professor(s): Elisa Bertino, Ph.D.
Approved by: William J. Gorman, Ph.D., Head of the Graduate Program
Date: 07/18/2012
Graduate School Form 20
(Revised 9/10)
PURDUE UNIVERSITY
GRADUATE SCHOOL
Research Integrity and Copyright Disclaimer
Title of Thesis/Dissertation:
Privacy Preserving Access Control on Third-Party Data Management Systems
For the degree of
Doctor of Philosophy
I certify that in the preparation of this thesis, I have observed the provisions of Purdue University
Executive Memorandum No. C-22, September 6, 1991, Policy on Integrity in Research.*
Further, I certify that this work is free of plagiarism and all materials appearing in this
thesis/dissertation have been properly quoted and attributed.
I certify that all copyrighted material incorporated into this thesis/dissertation is in compliance with the
United States’ copyright law and that I have received written permission from the copyright owners for
my use of their work, which is beyond the scope of the law. I agree to indemnify and save harmless
Purdue University from any and all claims that may be asserted or that may arise from any copyright
violation.
Printed Name and Signature of Candidate: Mohamed Yoosuf Mohamed Nabeel
Date (month/day/year): 07/12/2012
*Located at http://www.purdue.edu/policies/pages/teach_res_outreach/c_22.html
PRIVACY PRESERVING ACCESS CONTROL FOR THIRD-PARTY DATA
MANAGEMENT SYSTEMS
A Dissertation
Submitted to the Faculty
of
Purdue University
by
Mohamed Yoosuf Mohamed Nabeel
In Partial Fulfillment of the
Requirements for the Degree
of
Doctor of Philosophy
August 2012
Purdue University
West Lafayette, Indiana
ACKNOWLEDGMENTS
First and foremost, I would like to express my deepest gratitude to my adviser,
Prof. Elisa Bertino, for her unwavering support, patience and guidance throughout
my PhD program. Without her constant support, advice and encouragement, this
dissertation could not have been completed.
I would like to thank Prof. Ninghui Li, Prof. Samuel S. Wagstaff, Jr., Prof. Sunil
Prabhakar and Prof. Dongyan Xu for taking time out of their busy schedules to serve
on my committee and for providing their invaluable input.
I am also grateful to the mentors and supervisors with whom I worked during my
summer internships and graduate assistantships: Ann Christine Catlin from the Rosen
Center for Advanced Computing, Dr. David G. Stork from Ricoh Innovations, and
Dr. Mourad Ozzani from the Cyber Center.
I am fortunate to be surrounded by an amazing group of fellow graduate students
and friends at Purdue. Special thanks to my colleague Ning Shang, with whom I closely
collaborated during my initial research work. I would like to thank Purdue
University for supporting my research through the Purdue Research Foundation (PRF)
scholarship and the Fulbright fellowship.
Finally and most importantly, words cannot express my gratitude to my parents,
Yoosuf and Zeenathunnisa, my wife Muffarriha, and my siblings Zahmy, Nasly, Shireen
and Jasly for their unconditional love and constant support. I am very grateful
to the Almighty God for giving me the strength to achieve my dreams.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
SYMBOLS
ABBREVIATIONS
ABSTRACT
1 INTRODUCTION
1.1 Privacy Preserving Access Control in Pull Based Systems
1.2 Privacy Preserving Access Control in Subscription-based Systems
1.3 Attribute Based Group Key Management
1.4 Contributions and Document Structure
2 BROADCAST GROUP KEY MANAGEMENT
2.1 Requirements for a Secure and Effective GKM
2.2 Broadcast GKM
2.3 Our Construction: ACV-BGKM
2.4 Security Analysis
2.5 Improving the Performance of ACV-BGKM
2.5.1 Bucketization
2.5.2 Subset Cover
2.6 ACV-BGKM-2
2.6.1 Security Analysis
2.7 Experimental Results
3 ATTRIBUTE BASED GROUP KEY MANAGEMENT
3.1 Scheme 1: Inline AB-GKM
3.1.1 Our Construction
3.1.2 Security
3.1.3 Performance
3.2 Scheme 2: Threshold AB-GKM
3.2.1 Our Construction
3.2.2 Security
3.2.3 Performance
3.3 Scheme 3: Access Tree AB-GKM
3.3.1 Access Tree
3.3.2 Our Construction
3.3.3 Security
3.3.4 Performance
3.4 Example Application
3.5 Experimental Results
4 PRIVACY PRESERVING PULL BASED SYSTEMS: SINGLE LAYER APPROACH
4.1 Overview of the SLE Approach
4.2 Preserving the Privacy of Identity Attributes
4.2.1 Discrete Logarithm Problem and Computational Diffie-Hellman Problem
4.2.2 Pedersen Commitment
4.2.3 OCBE Protocols
4.2.4 Configurable Privacy
4.3 Single Layer Encryption Approach
4.3.1 Identity Token Issuance
4.3.2 Identity Token Registration
4.3.3 Data Management
4.4 Improving Efficiency of Re-Encryption
4.5 An Example Application
4.6 Experimental Results
4.6.1 Privacy Preserving Secret Delivery
4.6.2 Data and Key Management
4.6.3 Encryption Management
5 PRIVACY PRESERVING PULL BASED SYSTEMS: TWO LAYER ENCRYPTION APPROACH
5.1 Overview
5.2 Policy Decomposition
5.2.1 Policy Cover
5.2.2 Policy Decomposition
5.3 Two Layer Encryption Approach
5.3.1 Identity Token Issuance
5.3.2 Policy Decomposition
5.3.3 Identity Token Registration
5.3.4 Data Encryption and Upload
5.3.5 Data Downloading and Decryption
5.3.6 Encryption Evolution Management
5.4 Analysis
5.4.1 SLE vs. TLE
5.4.2 Security and Privacy
5.5 Experimental Results
6 PRIVACY PRESERVING SUBSCRIPTION BASED SYSTEMS
6.1 Overview
6.1.1 Interactions
6.1.2 Trust Model
6.2 Background
6.2.1 Pedersen Commitment
6.2.2 Zero-Knowledge Proof of Knowledge (Schnorr’s Scheme)
6.2.3 Euler’s Totient Function φ(·) and Euler’s Theorem
6.2.4 Composite Square Root Problem
6.2.5 Paillier Homomorphic Cryptosystem
6.3 Proposed Scheme
6.3.1 Initialize
6.3.2 Register
6.3.3 Subscribe
6.3.4 Publish
6.3.5 Match
6.3.6 Cover
6.3.7 The Distribution of Load
6.4 Experimental Results
6.4.1 Protocol Experiments
6.4.2 System Experiments
7 SURVEY OF RELATED WORK
7.1 Group Key Management (GKM)
7.2 Functional Encryption
7.3 Selective Publishing of Documents
7.4 Secure Data Outsourcing
7.5 Secret Sharing Schemes
7.6 Proxy Re-Encryption Systems
7.7 Searchable Encryption
7.8 Secure Multiparty Computation (SMC)
7.9 Private Information Retrieval (PIR)
8 SUMMARY
LIST OF REFERENCES
VITA
LIST OF TABLES
3.1 Access tree functions
3.2 Insurance plans supported by doctors/nurses
3.3 User attribute matrix
3.4 List of employees satisfying each insurance plan
3.5 List of employees satisfying attributes
3.6 Average time for CP-ABE algorithms
4.1 A table of secrets maintained by the Pub
4.2 Average computation time for running one round of the EQ-OCBE protocol
6.1 Matching decision
6.2 Average computation time for general operations
LIST OF FIGURES
1.1 A typical pull based system
1.2 A typical publish-subscribe system
2.1 Average time to generate keys
2.2 Average time to derive keys
2.3 Average time to generate keys with different bucket sizes
2.4 Average time to derive keys with different bucket sizes
2.5 Average time to generate keys with the two optimizations
2.6 Average time to derive keys with the two optimizations
3.1 Average key generation time for different group sizes
3.2 Average encryption/decryption time for different group sizes
3.3 Average key generation time for varying attribute counts
4.1 Overall system architecture
4.2 Average computation time for running one round of GE-OCBE protocol
4.3 Time to generate an ACV for different user configurations
4.4 Key derivation time for different user configurations
4.5 Size of ACV for different user configurations
4.6 ACV generation and key derivation for different number of conditions per policy
4.7 Different incremental encryption modes
4.8 Average time to perform insert operation
5.1 Two layer encryption approach
5.2 The example graph
5.3 Size of ACCs for 100 attributes
5.4 Size of ACCs for 500 attributes
5.5 Size of ACCs for 1000 attributes
5.6 Size of ACCs for 1500 attributes
5.7 Policy decomposition time breakdown with the random cover algorithm
5.8 Policy decomposition time breakdown with the greedy cover algorithm
5.9 Average time to generate keys for the two approaches
5.10 Average time to derive keys for the two approaches
6.1 An example CBPS system
6.2 Sub registering with Pub
6.3 Sub authenticating itself to Broker
6.4 Time to blind subscriptions/notifications for different bit lengths of n
6.5 Time to blind subscriptions/notifications for different l
6.6 Time to perform match/cover for different bit lengths of n
6.7 Time to perform match/cover for different l
6.8 Equality filtering time
6.9 Equality filtering time for different domain sizes
6.10 Inequality filtering time for different domain sizes
SYMBOLS
KS      Keyspace
ACP     Policy
A       Attribute universe
SS      Secret space
S       The set of issued secrets
AS      The set of aggregated secrets
T       Access tree
ABBREVIATIONS
ABAC      Attribute Based Access Control
ABE       Attribute Based Encryption
AB-GKM    Attribute Based Group Key Management
ACC       Attribute Condition Cover
ACP       Access Control Policy
ACV       Access Control Vector
AVP       Attribute Value Pair
BGKM      Broadcast Group Key Management
CBPS      Content Based Publish Subscribe
CP-ABE    Ciphertext Policy Attribute Based Encryption
DaaS      Data as a Service
EHR       Electronic Health Record
GKM       Group Key Management
KEV       Key Extraction Vector
KP-ABE    Key Policy Attribute Based Encryption
OCBE      Oblivious Commitment Based Envelope
PaaS      Platform as a Service
PI        Public Information tuple
PIR       Private Information Retrieval
RBAC      Role Based Access Control
SaaS      Software as a Service
SLE       Single Layer Encryption
SMC       Secure Multiparty Computation
TLE       Two Layer Encryption
TTP       Trusted Third Party
UA        User-Attribute matrix
ZKPK      Zero Knowledge Proof of Knowledge
ABSTRACT
Mohamed Nabeel, Mohamed Yoosuf Ph.D., Purdue University, August 2012. Privacy Preserving Access Control for Third-Party Data Management Systems. Major
Professor: Elisa Bertino.
The tremendous growth in electronic media has made publication of information
in either open or closed environments easy and effective. However, most application
domains (e.g. electronic health records (EHRs)) require that fine-grained selective access to information be enforced in order to comply with legal requirements,
organizational policies, subscription conditions, and so forth. The problem becomes
challenging with the increasing adoption of cloud computing technologies where sensitive data reside outside of organizational boundaries. An important issue in utilizing
third party data management systems is how to selectively share data based on fine-grained attribute based access control policies and/or expressive subscription queries
while assuring the confidentiality of the data and the privacy of users from the third
party.
In this thesis, we address the above issue under two of the most popular dissemination models: the pull based service model and the subscription based publish-subscribe
model. Encryption is a commonly adopted approach to assure confidentiality of data
in such systems. However, the challenge is to support fine grained policies and/or
expressive content filtering using encryption while preserving the privacy of users.
We propose several novel techniques, including an efficient and expressive group key
management scheme, to overcome this challenge and construct privacy preserving
dissemination systems.
1 INTRODUCTION
In the cloud computing era, disseminating and sharing data through a third-party
service provider has never been more economical or easier. However,
such service providers cannot be trusted to assure the confidentiality of the data.
In fact, data privacy and security issues have been major concerns for many organizations utilizing such services. Data (e.g. electronic health records (EHRs)) often
encode sensitive information and should be protected in order to comply with various
organizational policies, legal regulations, subscription conditions, and so forth. Encryption is a commonly adopted approach to protect the confidentiality of the data.
Encryption alone, however, is not sufficient, as organizations often have to enforce fine-grained access control on the data. Such control is often based on the attributes of
users, referred to as identity attributes, such as the roles of users in the organization,
projects on which users are working and so forth, as well as the attributes of data,
referred to as content attributes. These systems, in general, are called attribute based
systems. Therefore, an important requirement is to support fine-grained access control, based on policies and subscription conditions specified using identity and content
attributes, over encrypted data.
With the involvement of third-party services, a crucial issue is that the identity
attributes in the access control policies (ACPs) often reveal privacy-sensitive
information about users and leak confidential information about the data. The
confidentiality of the data and the privacy of the users are thus not fully protected if
the identity attributes are not protected. Moreover, privacy, both individual and
organizational, is considered a key requirement in all solutions, including cloud
services, for digital identity management [1–4]. Further, as insider threats [5] are
one of the major sources of data theft and privacy breaches, identity attributes must
be strongly protected even from accesses within organizations. With initiatives such
as cloud computing, the scope of insider threats is no longer limited to the
organizational perimeter. Therefore, protecting the identity attributes of the users while
enforcing attribute-based access control, both within the organization and in the
third-party service, is crucial.
In this thesis, we investigate the problem of providing privacy preserving access
control on third-party systems under two of the most popular dissemination models:
pull based service model and subscription based publish-subscribe model. In a pull
based system, the data owner (Owner) uploads its data to a third-party server which
acts as a data repository. Users having valid credentials are allowed to download
data from the server. In a subscription based system, authorized users submit subscription queries, which specify their interests, to the third-party server, which acts
as a brokering network. The Owner publishes data to the third-party server which
in turn forwards the data to many matching users based on their subscriptions. For
both models, we propose approaches to assure confidentiality of the data and privacy
of users from the third party server. The challenge is to support fine grained policies and/or expressive data filtering using encryption while preserving the privacy
of users. Group key management (GKM) is a fundamental building block used to
address this challenge. We identify that the existing GKM schemes are not well designed to manage keys based on the attributes of users or to protect user privacy. As
part of this thesis, we first address this issue by constructing a novel scheme called
attribute based GKM (AB-GKM).
1.1 Privacy Preserving Access Control in Pull Based Systems
Figure 1.1 shows the architecture of a typical pull based system. Users initially
register with the Owner and obtain the keys for the data they are authorized to
access. The Owner selectively encrypts the data and uploads it to a third-party server
such as Amazon S3 or Rackspace Cloud Files. Users download encrypted data from
the third party and decrypt it using the keys obtained from the Owner at the time of
registration.
[Figure: the Owner, the Third Party Server, and the User, connected by the steps (1) Register, (2) Keys, (3) Selectively encrypt & upload, (4) Download & decrypt, and (5) Download to re-encrypt.]
Figure 1.1.: A typical pull based system
We identify the following requirements to assure privacy of users and confidentiality of data from the third-party while at the same time assuring that the third-party
enforces the ACPs specified by the data owner.
• The identity attributes of users must not be revealed to the third-party.
• The ACPs of the Owner must not be revealed to the third-party.
• The third-party must not learn the sensitive information in the data.
• Users must be granted access to portions of data only if their identity attributes
satisfy the corresponding ACPs.
As shown in Figure 1.1, the most common approach to support fine-grained selective attribute-based access control before uploading the data to the third-party server
is to encrypt each data item to which the same ACP (or set of ACPs) applies with the
same key. One approach to deliver the correct keys to the users based on the policies
they satisfy is to use a hybrid solution where the keys are encrypted using a public key
cryptosystem such as attribute based encryption (ABE) and/or proxy re-encryption
(PRE). However, such an approach has several weaknesses: it cannot efficiently handle
adding or revoking users or identity attributes, or policy changes; it requires
keeping multiple encrypted copies of the same key; and it incurs a high computational cost.
Therefore, a different approach is required.
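The idea of encrypting all data items governed by the same set of ACPs with one key can be sketched as follows. This is only a minimal illustration under stated assumptions: the item names, the ACP labels, and the helper `assign_group_keys` are hypothetical and do not come from the thesis, and the keys are random bytes standing in for symmetric keys.

```python
import secrets
from collections import defaultdict


def assign_group_keys(item_policies, key_bytes=16):
    """Group data items by the set of ACPs that applies to them and
    draw one random symmetric key per distinct set.

    item_policies: dict mapping item name -> iterable of ACP labels.
    Returns (groups, keys), both keyed by a frozenset of ACP labels.
    """
    groups = defaultdict(list)
    for item, acps in item_policies.items():
        groups[frozenset(acps)].append(item)
    keys = {config: secrets.token_bytes(key_bytes) for config in groups}
    return dict(groups), keys


# Hypothetical EHR sub-documents and the ACPs that protect them.
item_policies = {
    "contact_info": ["acp_receptionist", "acp_doctor"],
    "billing_info": ["acp_cashier"],
    "lab_report": ["acp_doctor"],
    "medication": ["acp_doctor"],
}
groups, keys = assign_group_keys(item_policies)
```

Here `lab_report` and `medication` share one key because the same ACP applies to both, so the number of keys depends on the number of distinct policy configurations and not on the number of items.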
It is worth noting that a simplistic group key management (GKM) scheme in
which the Owner directly delivers the symmetric keys to corresponding users has
some major drawbacks with respect to user privacy and key management. On one
hand, user private information encoded in the user identity attributes is not protected
in the simplistic approach. On the other hand, such a simplistic key management
scheme does not scale well as the number of users becomes large and when multiple
keys need to be distributed to multiple users. A key contribution of this thesis is
to develop a key management scheme which does not have the above shortcomings.
We observe that, without utilizing public key cryptography and by allowing users to
dynamically derive the symmetric keys at the time of decryption, one can address the
above weaknesses. Based on this idea, we first formalize a new GKM scheme called
broadcast GKM (BGKM), then give a secure construction of a BGKM scheme and
formally prove its security.
1.2 Privacy Preserving Access Control in Subscription-based Systems
Figure 1.2 shows the architecture of a content based publish subscribe (CBPS)
system. The Owner plays the role of content publishers (Pubs) and users play the
role of subscribers (Subs). The third-party brokering network manages subscriptions
from users and distributes the data published by the Owner, called notifications, to
users based on their subscriptions.
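For orientation, the matching step that a broker performs in an ordinary, unprotected CBPS system can be sketched as below. The predicate format, the operator set, and the sample subscriptions are assumptions made for illustration. The sketch operates on plaintext values, which is exactly the exposure that a privacy preserving scheme has to avoid.

```python
import operator

# Comparison operators a subscription predicate may use (assumed set).
OPS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(subscription, notification):
    """Return True if every predicate of the subscription holds for the
    notification. A predicate on a missing attribute fails."""
    for attr, op, value in subscription:
        if attr not in notification:
            return False
        if not OPS[op](notification[attr], value):
            return False
    return True


def route(notification, subscriptions):
    """Return the ids of subscribers whose subscription matches."""
    return [sub_id for sub_id, sub in subscriptions.items()
            if matches(sub, notification)]


subscriptions = {
    "Sub1": [("symbol", "=", "XYZ"), ("price", "<", 50)],
    "Sub2": [("price", ">=", 100)],
    "Sub3": [("volume", ">", 1000)],
}
notification = {"symbol": "XYZ", "price": 42}
recipients = route(notification, subscriptions)
```

In this plaintext setting the broker sees both the subscription predicates and the notification values.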
We identify the following requirements to assure privacy of users and confidentiality of data published by the Owner from the third-party brokering network while at
the same time assuring that only authorized users can access the data.
[Figure: data owners (Pub1, Pub2) and users (Sub1, Sub2, Sub3) connected through a third party broker network (Bro1 to Bro5); the legend distinguishes notification flows from subscription flows.]
Figure 1.2.: A typical publish-subscribe system
• Publication confidentiality: The content of notifications must be hidden from
the third party brokers.
• Subscription privacy: The content of the subscriptions must be hidden from the
third party brokers.
• The third party brokers must make forwarding decisions on hidden notifications
and subscriptions without learning the actual differences of notification and
subscription values. In other words, a randomized comparison scheme must be
provided.
Privacy and confidentiality issues in CBPS systems have long been identified [6],
but little progress has been made to address these issues in a holistic manner. Most
prior work on data confidentiality techniques in the context of CBPS systems is based
on the assumption that content brokers are trusted with respect to the privacy of the
subscriptions by users [7–9]. In the absence of such an assumption, the problem
becomes challenging as brokers need to make decisions without knowing the actual
notifications and subscriptions. In this thesis, we address this challenge by proposing a
novel scheme which is inspired by the Paillier homomorphic cryptosystem [10] and
uses the AB-GKM scheme and zero-knowledge proof of knowledge (ZKPK) protocols [11].
It should be noted that existing approaches that try to achieve similar goals as ours
have limitations which undermine flexibility and/or accuracy [12–14].
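To give some intuition for why an additively homomorphic cryptosystem is a natural starting point for comparing hidden values, the sketch below implements textbook Paillier with toy parameters and computes a ciphertext of a blinded difference k(a - b) without decrypting either input. It is not the modified scheme proposed in this thesis: the primes, the blinding range, and the function names are assumptions chosen for illustration, and multiplicative blinding by itself still leaks some information about the difference.

```python
import secrets
from math import gcd

# Toy parameters: real deployments use primes of 1024 bits or more.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1                                     # standard choice of generator
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                          # valid because g = n + 1


def encrypt(m):
    """Encrypt m in [0, N) as g^m * r^n mod n^2 with fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            break
    return (pow(G, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) / n."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def blinded_difference(c_a, c_b, k):
    """Ciphertext of k * (a - b) mod n, computed without decrypting."""
    diff = (c_a * pow(c_b, -1, N2)) % N2      # E(a) / E(b) = E(a - b)
    return pow(diff, k, N2)                   # E(x)^k = E(k * x)


def to_signed(m):
    """Interpret residues above n/2 as negative numbers."""
    return m - N if m > N // 2 else m


k = secrets.randbelow(100) + 1                # random positive blinding factor
result = to_signed(decrypt(blinded_difference(encrypt(30), encrypt(42), k)))
```

The sign of `result` shows that 30 is smaller than 42, while the magnitude is scaled by the random factor `k`. Who holds the decryption key and how the comparison outcome is revealed are protocol questions that this sketch does not address.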
1.3 Attribute Based Group Key Management
Group key management (GKM) plays a key role in building privacy preserving
data dissemination systems under both pull based models as well as publish-subscribe
models. Attribute based systems enable fine-grained access control among a group
of users each identified by a set of attributes. Privacy preserving data dissemination
systems need such flexible attribute based systems for managing and distributing
group keys. However, current GKM schemes are not well designed to manage group
keys based on the identity attributes of users.
In this thesis, we construct a new key management scheme called broadcast GKM
(BGKM) that allows users whose attributes satisfy a certain policy to derive group
keys. The idea is to give secrets to users based on the identity attributes they have
and later allow them to derive actual symmetric keys based on their secrets and
some public information. A key advantage of the BGKM scheme is that adding
or revoking users or updating ACPs can be performed efficiently and only requires
updating the public information. Our BGKM scheme satisfies the requirements of
minimal trust, key indistinguishability, key independence, forward secrecy, backward
secrecy and collusion resistance as described in [15] with minimal computational,
space and communication cost.
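The general pattern of issuing long-lived secrets once and rekeying by changing only the public information can be illustrated with a deliberately naive scheme. The sketch below is not the ACV-BGKM construction developed in Chapter 2: it simply masks the group key once per authorized user with a hash of that user's secret, so its public information grows linearly with the group and reveals the group size. All names in it are hypothetical.

```python
import hashlib
import secrets

KEY_LEN = 32  # bytes; matches the SHA-256 output size


def _mask(secret, nonce):
    return hashlib.sha256(b"mask" + nonce + secret).digest()


def _check(key):
    return hashlib.sha256(b"check" + key).digest()


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def publish(authorized_secrets):
    """Owner side: pick a fresh group key and build the public information
    from the secrets of the currently authorized users."""
    key = secrets.token_bytes(KEY_LEN)
    nonce = secrets.token_bytes(16)
    slots = [_xor(key, _mask(s, nonce)) for s in authorized_secrets]
    public_info = {"nonce": nonce, "slots": slots, "check": _check(key)}
    return key, public_info


def derive(secret, public_info):
    """User side: try to unmask each slot; return the key or None."""
    mask = _mask(secret, public_info["nonce"])
    for slot in public_info["slots"]:
        candidate = _xor(slot, mask)
        if _check(candidate) == public_info["check"]:
            return candidate
    return None


# Secrets issued once at registration (hypothetical users).
alice, bob, carol = (secrets.token_bytes(16) for _ in range(3))

key1, info1 = publish([alice, bob, carol])
# Revoke carol: only the public information changes; alice and bob keep
# their existing secrets.
key2, info2 = publish([alice, bob])
```

After the second call to `publish`, `derive(carol, info2)` returns `None`, while Alice and Bob obtain the new key from their unchanged secrets and the new public information.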
Using the BGKM scheme as a building block, we construct a more expressive
GKM scheme called attribute based GKM (AB-GKM) which allows one to express
any threshold or monotonic [footnote 1] conditions over a set of identity attributes as the group
membership condition. It should be noted that the AB-GKM scheme recalls the
notion of attribute-based encryption (ABE) [16–18]; however, as we discuss later in
Chapter 3, ABE has several shortcomings when applied to GKM. In the pull based
model, we use the AB-GKM scheme to manage the keys used to selectively encrypt
data based on fine-grained policies. In the publish-subscribe model, we use AB-GKM
to manage the keys to encrypt payload messages.
Footnote 1: Monotone formulas are Boolean formulas that contain only conjunction and disjunction connectives,
but no negation.
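A threshold or monotone membership condition can be represented as a tree of "at least k of n" gates over attribute conditions. The sketch below only evaluates such a policy in the clear against a user's attribute set; it says nothing about how AB-GKM enforces the condition cryptographically. The tuple encoding, the helper names, and the sample policy are assumptions made for illustration.

```python
def satisfies(node, attributes):
    """Evaluate a monotone policy tree against a set of attributes.

    A node is either a string (an attribute that must be present) or a
    tuple (k, children) meaning "at least k of the children hold".
    AND over n children is (n, children); OR is (1, children).
    """
    if isinstance(node, str):
        return node in attributes
    k, children = node
    satisfied = sum(1 for child in children if satisfies(child, attributes))
    return satisfied >= k


def AND(*children):
    return (len(children), list(children))


def OR(*children):
    return (1, list(children))


def THRESHOLD(k, *children):
    return (k, list(children))


# Hypothetical policy: (doctor AND cardiology) OR 2-of-{nurse, senior, icu}
policy = OR(AND("role=doctor", "dept=cardiology"),
            THRESHOLD(2, "role=nurse", "level=senior", "ward=icu"))
```

Because no gate uses negation, adding attributes to a user's set can never turn a satisfied policy into an unsatisfied one, which is the monotonicity property described in the footnote.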
1.4 Contributions and Document Structure
This thesis studies how we can build privacy preserving access control on third
party data management systems. Specifically, we propose privacy preserving access
control for two of the most popular dissemination models: pull based service model
and subscription based publish-subscribe model.
Chapter 2 proposes a new GKM scheme called broadcast GKM (BGKM) and
provides detailed security proofs to show that the scheme is secure. Using the BGKM
construct as a building block, in Chapter 3, we propose a more expressive scheme
called attribute based GKM (AB-GKM) which can handle any monotonic policies over
attribute conditions. We provide experimental results to show that our constructs
are efficient and practical.
Chapter 4 proposes a novel approach to privacy preserving pull based systems
called Single Layer Encryption (SLE). To the best of our knowledge, it is the first
approach to assure the confidentiality of the data from the third party server and
preserve the privacy of users while enforcing attribute based ACPs on data. In the
SLE approach, the Owner itself enforces all ACPs by selectively encrypting the data
before uploading it to the third party. While the SLE approach provides many benefits
over existing solutions, the Owner has to incur high communication and computation
costs to manage keys and encryptions whenever user credentials or organizational
authorization policies change. A better approach should delegate the enforcement
of fine-grained access control to the third party, so as to minimize the overhead at the
Owner, while at the same time assuring data confidentiality from the third-party
server. In Chapter 5, we propose an extension to the SLE approach called Two Layer
Encryption (TLE) to address this requirement. Under the TLE approach,
the Owner performs a coarse grained encryption and the third party performs a fine
grained encryption. Since as much access control enforcement as possible is delegated
to the third party, the TLE approach reduces the workload at the Owner. In both
approaches, the AB-GKM scheme is used to manage group keys and support attribute
based ACPs through selective encryption. We provide experimental results for both
approaches and compare their performance.
Chapter 6 proposes a novel privacy preserving subscription based system. Compared to pull based systems, additional mechanisms are required to preserve the
privacy in subscription based systems as the third party needs to make decisions
based on data in addition to the credentials of users. Our approach preserves the
privacy of the subscriptions made by users and confidentiality of the data published
by the Owner using a tweaked version of the Paillier homomorphic cryptosystem [10]
when third-party content brokers are utilized to make routing decisions based on the
content. The AB-GKM scheme is used to manage the keys used to encrypt the
payload of the data published. Our protocols are expressive enough to support any type of
subscription and are designed to work efficiently. We distribute the work such that the
load on the third-party content brokers, which are the bottleneck in a CBPS system,
is minimized. We extend SIENA [19], a popular CBPS system, using our protocols to
implement a privacy preserving CBPS system.
Chapter 7 surveys the work related to privacy preserving data dissemination systems
as well as the cryptographic techniques we propose as part of this thesis.
Chapter 8 provides a summary of this thesis and discusses extensions and future
work.
2 BROADCAST GROUP KEY MANAGEMENT
Group key management (GKM) plays a key role in building privacy preserving data
dissemination systems under both pull based models as well as publish-subscribe
models. Attribute based systems enable fine-grained access control among a group
of users each identified by a set of attributes. Privacy preserving data dissemination
systems need such flexible attribute based systems for managing and distributing
group keys. However, current group key management schemes are not well designed
to manage group keys based on the identity attributes of users.
A challenging well known problem in GKM is how to efficiently handle group
dynamics, i.e., a new user joining or an existing group member leaving. When the
group changes, a new group key must be shared with the existing members, so that
a new group member cannot access the data transmitted before she joined (forward
secrecy) and a user who left the group cannot access the data transmitted after she
left (backward secrecy). The process of issuing a new key is called rekeying or update.
Another challenging problem is to defend against collusion attacks by which a set of
colluding fraudulent users are able to obtain group keys which they are not allowed
to obtain individually.
In a traditional GKM scheme, when the group changes, the private information
given to all or some existing group members must be changed which requires establishing private communication channels. Establishing such channels is a major
shortcoming especially for highly dynamic groups. We observe that, without utilizing
public key cryptography and by allowing users to dynamically derive the symmetric keys at the time of decryption, one can address this weakness. Based on this
idea, in this chapter, we first propose a new GKM scheme called the broadcast GKM
(BGKM) scheme [20,21]. The scheme allows one to perform
rekeying operations by only updating some public information without affecting
the private information that existing group members possess.
In this chapter, we first list the requirements for an effective GKM, then give an
overview of BGKM schemes and finally present our construction along with security
proofs.
2.1 Requirements for a Secure and Effective GKM
Several requirements are identified and discussed by Challal and Seba [15] and
others for effective GKM. Generally speaking, an efficient and practical GKM should
address the following requirements.
• Minimal trust requires the GKM scheme to place trust on a small number of
entities.
• Key hiding requires that with given public information, it is hard for anyone
outside the group to gain the shared group key. Ideally, every element in the
keyspace should have the same probability of being the real key.
• Key independence requires that the leak of one key does not compromise
other keys.
• Backward secrecy means that a member who has left the group cannot access
any future group keys.
• Forward secrecy means that a newly joining group member cannot access any
old keys.
• Collusion resistance requires that a set of colluding fraudulent users should
not obtain keys which they are not allowed to obtain individually.
• Low bandwidth overhead requires that the rekeying should not incur a high
volume of messages.
• Computational costs should be acceptable at both the server and the group
member.
• Storage requirements for keys and other relevant information should be minimal.
• Ease of maintenance requires that a single change of membership in the group
does not need many changes to take place for the other group members.
• Other requirements include service availability, minimal packet delays, and
so on. These factors are sometimes more affected by real-world settings and
implementation, and less related to the high-level design of the GKM.
2.2 Broadcast GKM
In order to provide forward and backward secrecy, rekey operations should be
performed whenever the users in the group change. Typical GKM schemes require
O(n) [22, 23] or at least O(log n) [24, 25] private communication channels to perform the rekey operation. In comparison, BGKM schemes make rekey a one-off process [26–28]. In such schemes, rekeying is performed with a single broadcast without
using private communication channels. It should be noted that even though BGKM
schemes have some similarity with secret sharing (SS) schemes, they are constructed
for different purposes. “k out of n” SS schemes [29, 30] are constructed to split a
secret among n users and allow the secret to be recovered by combining at least k secret
shares. On the contrary, BGKM schemes allow each valid user to recover the secret by
using only their secret share. Also, colluding users, who individually cannot recover
the secret, are not able to recover the secret collectively. Unlike conventional GKM
schemes, BGKM schemes do not give users the private keys. Instead users are given
a secret which is combined with public information to obtain the actual private keys.
Such schemes have the advantage that they require private communication only once
for the initial secret sharing, and the subsequent rekeying operations are performed
using one broadcast message. Further, such schemes can provide forward and backward security by only changing the public information and without affecting secret
shares given to existing users. Based on our preliminary work [20], we propose a provably secure BGKM scheme, called ACV-BGKM (Access Control Vector BGKM), and
formalize the notion of BGKM. Further we prove the security of ACV-BGKM.
Definition 2.2.1 (BGKM) In general, a BGKM scheme consists of the following
five algorithms:
• Setup(ℓ): It initializes the BGKM scheme using a security parameter ℓ. It also
initializes the set of used secrets S, the secret space SS and the key space KS.
All the parameters are collectively denoted as Param.
• SecGen(): It selects a random bit string s ∉ S uniformly at random from the
secret space SS, adds s to S and outputs s.
• KeyGen(S): It chooses a group key K uniformly at random from the key space
KS and outputs the public information PI computed from the secrets in S and
the group key K.
• KeyDer(s, PI): It takes the user’s secret s and the public information PI to
output the group key. The derived group key is equal to K if and only if s ∈ S.
• Update(S): Whenever the set S changes, a new group key K′ is generated.
Depending on the construction, it either executes the KeyGen algorithm again
or incrementally updates the output of the last KeyGen algorithm.
Now we provide some basic notions and formally define security.
Negligible functions
We call a function f : N → R negligible if for every positive polynomial p(·) there
exists an N such that for all n > N , we have f (n) < 1/p(n) [31].
Random oracle model
The random oracle model is a paradigm introduced by Bellare and Rogaway [32] for
design and analysis of certain cryptographic protocols. Intuitively, a random oracle
is a mathematical function that can be queried by anyone, and maps every query to
a uniformly randomly chosen response from its output domain. In practice, random
oracles can be used to model cryptographic hash functions in many cryptographic
schemes.
A BGKM scheme should allow a valid group member to derive the shared group
key, and prohibit anyone outside the group from doing so. Formally speaking, a
BGKM scheme should satisfy the following security properties. It must be correct,
sound, key hiding, and forward/backward key protecting. Let Svr be the group controller.
Definition 2.2.2 (Correctness) Let Usr¹ be a current group member with a secret.
Let K and PubInfo be Svr’s output of the KeyGen algorithm. Let K′ be Usr’s output
of the KeyDer algorithm. A BGKM scheme is correct if Usr can derive the correct
group key K with overwhelming probability, i.e.,
Pr[K = K ′ ] ≥ 1 − f (k),
where f is a negligible function in k.
Definition 2.2.3 (Soundness) Let Usr be an individual without a valid secret. A
BGKM scheme is sound if the probability that Usr can obtain the correct group key
K by substituting the secret with a value val that is not one of the valid secrets and
then following the key derivation phase KeyDer is negligible.
¹ In what follows we use the term Usr; however, in practice the steps are carried out by the client
software transparently to the actual end user.
We define the following security game to define the key hiding requirement.
Definition 2.2.4 (KeyHide_{A,Π})
1. The Svr, as the challenger, runs the KeyGen algorithm of the BGKM scheme Π and gives the parameters Param to the adversary A.
2. A selects two random keys K0, K1 ∈ KS and gives them to the Svr.
3. The Svr flips a random coin b ∈ {0, 1} and selects Kb as the group key and runs
the KeyGen algorithm.
4. The Svr gives the public information PubInfo of the output of the KeyGen algorithm to A.
5. A outputs a guess b′ of b.
6. The output of the game is defined to be 1 if b′ = b, and 0 otherwise. We write
KeyHide_{A,Π} = 1 if the output is 1 and in this case we say that A wins the
game.
The advantage of A in this game is defined as Pr[KeyHide_{A,Π} = 1] − 1/2.
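The game above can be read as a small test harness. The following Python sketch is illustrative only: the names `key_hide_game`, `leaky_keygen`, `CompareAdversary` and `win_rate`, as well as the toy key space, are assumptions of this sketch and not part of the scheme. It plays the game against a deliberately broken "scheme" whose public information is the group key itself, so the adversary wins every round. For a key hiding scheme, the empirical win rate should stay close to 1/2.

```python
import secrets

Q = 2**61 - 1  # toy key space KS = F_q (assumption of this sketch)

def key_hide_game(keygen, adversary):
    """Play one round of the KeyHide game; return 1 if the adversary wins."""
    k0, k1 = adversary.choose_keys()       # step 2: adversary picks K0, K1
    b = secrets.randbelow(2)               # step 3: challenger flips a coin
    pub_info = keygen((k0, k1)[b])         # steps 3-4: KeyGen with K_b
    b_guess = adversary.guess(pub_info)    # step 5: adversary guesses b
    return int(b_guess == b)               # step 6: output of the game

def leaky_keygen(key):
    """Deliberately insecure: the public information is the key itself."""
    return key

class CompareAdversary:
    """Adversary that compares the public information with K0."""
    def choose_keys(self):
        self.k0 = secrets.randbelow(Q)
        self.k1 = (self.k0 + 1) % Q        # guarantees K0 != K1
        return self.k0, self.k1

    def guess(self, pub_info):
        return 0 if pub_info == self.k0 else 1

def win_rate(keygen, adversary, rounds=400):
    """Empirical estimate of Pr[KeyHide = 1]."""
    return sum(key_hide_game(keygen, adversary) for _ in range(rounds)) / rounds
```

Subtracting 1/2 from `win_rate` gives an empirical estimate of the advantage defined above.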
Definition 2.2.5 (Key hiding) A BGKM scheme is key hiding if given PubInfo,
any party which does not have a valid secret cannot distinguish the real group key
from a randomly chosen value in the keyspace KS with nonnegligible probability. More
specifically, a BGKM scheme Π is key hiding if any adversary A, modeled as a probabilistic
interactive Turing machine [33], has a negligible advantage in the key hiding security
game of Definition 2.2.4:
Pr[KeyHide_{A,Π} = 1] ≤ 1/2 + f(k),
where f is a negligible function in k.
Definition 2.2.6 (Forward/backward key protecting) Suppose Svr runs an Update algorithm to generate Param for a new shared group key K ′ , and a previous
member Usr is no longer a group member after the Update algorithm. Let K be a previous shared group key which can be derived by Usr with a secret. A BGKM scheme is
backward key protecting if an adversary with knowledge of the secret, K, and the new
PubInfo cannot distinguish the new key K ′ from a random value in the keyspace KS
with nonnegligible probability. Similarly, a BGKM scheme is forward key protecting
if a new group member Usr after running the Update algorithm cannot learn anything
about the previous group keys.
2.3 Our Construction: ACV-BGKM
We now provide our construction of BGKM, the ACV-BGKM scheme, under
a client-server architecture. The ACV-BGKM scheme satisfies the requirements of
minimal trust, key indistinguishability, key independence, forward secrecy, backward
secrecy and collusion resistance as described earlier.
ACV-BGKM algorithms are executed with a trusted key server Svr and a group
of users Usri , i = 1, 2, . . . , n.
Setup(ℓ): Svr initializes the following parameters: an ℓ-bit prime number q, a cryptographic hash function H(·) : {0, 1}∗ → Fq , where Fq is a finite field with q elements,
the keyspace KS = Fq , the secret space SS = {0, 1}ℓ and the set of issued secrets
S = ∅.
SecGen(Usri): Svr chooses the secret si ∈ SS uniformly at random for Usri such
that si ∉ S and adds si to S.
KeyGen(S): Svr picks a random K ∈ KS as the group key. Svr chooses n random bit strings z1, z2, . . . , zn ∈ {0, 1}^ℓ. Svr creates an n × (n + 1) Fq-matrix
\[
A = \begin{pmatrix}
1 & a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
1 & a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & a_{n,1} & a_{n,2} & \cdots & a_{n,n}
\end{pmatrix},
\]
where
\[
a_{i,j} = H(s_i \| z_j), \quad 1 \le i \le n,\ 1 \le j \le n,\ s_i \in S. \tag{2.1}
\]
Svr then solves for a nonzero (n + 1)-dimensional column Fq-vector Y such that
AY = 0. Note that such a nonzero Y always exists as the nullspace of matrix A is
nontrivial by construction. Here we require that Svr chooses Y from the nullspace of
A uniformly at random. Svr constructs an (n + 1)-dimensional Fq-vector
\[
X = K \cdot e_1^T + Y,
\]
where e1 = (1, 0, . . . , 0) is a standard basis vector of Fq^{n+1}, v^T denotes the transpose
of vector v, and K is the chosen group key. The vector X is called an ACV, access
control vector. Svr lets PI = (X, (z1, z2, . . . , zn)), and outputs public PI and private
K.
KeyDer(si , P I): Using its secret si and the public information P I, Usri computes
ai,j , 1 ≤ j ≤ n, as in formula (2.1) and sets an (n + 1)-dimensional row Fq -vector
vi = (1, ai,1 , ai,2 , . . . , ai,n ). Usri derives the group key as K ′ = vi · X.
Update(S): It runs the KeyGen(S) algorithm and outputs the new public information P I ′ and the new group key K ′ .
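The construction can be exercised end to end in a few lines. The following Python sketch is a minimal illustration under stated assumptions, not the implementation evaluated in this chapter: it uses SHA-256 reduced modulo a fixed Mersenne prime in place of H, 16-byte secrets and nonces, and hypothetical function names (`sec_gen`, `key_gen`, `key_der`, `nullspace`). The nullspace is computed by Gauss-Jordan elimination over Fq, and Y is drawn as a random nonzero combination of the basis vectors.

```python
import hashlib
import secrets

Q = 2**127 - 1  # a Mersenne prime standing in for the l-bit prime q (assumption)

def H(s: bytes, z: bytes) -> int:
    """Hash s||z into F_q (SHA-256 reduced mod q; an assumption of this sketch)."""
    return int.from_bytes(hashlib.sha256(s + z).digest(), "big") % Q

def row(s: bytes, zs: list) -> list:
    """Row (1, H(s||z_1), ..., H(s||z_n)) of the matrix A, as in formula (2.1)."""
    return [1] + [H(s, z) for z in zs]

def nullspace(A: list, q: int) -> list:
    """Basis of {Y : A Y = 0 over F_q}, computed by Gauss-Jordan elimination."""
    A = [r[:] for r in A]
    rows, cols = len(A), len(A[0])
    pivots, r = [], 0
    for c in range(cols):
        if r == rows:
            break
        p = next((i for i in range(r, rows) if A[i][c]), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        inv = pow(A[r][c], -1, q)                 # modular inverse (Python 3.8+)
        A[r] = [x * inv % q for x in A[r]]
        for i in range(rows):
            if i != r and A[i][c]:
                f = A[i][c]
                A[i] = [(x - f * y) % q for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(cols) if c not in pivots):
        v = [0] * cols
        v[free] = 1
        for i, pc in enumerate(pivots):
            v[pc] = -A[i][free] % q
        basis.append(v)
    return basis

def sec_gen(issued: set) -> bytes:
    """SecGen: draw a fresh secret that has not been issued before."""
    while True:
        s = secrets.token_bytes(16)
        if s not in issued:
            issued.add(s)
            return s

def key_gen(S: list, q: int = Q):
    """KeyGen: return the private group key K and the public PI = (X, zs)."""
    n = len(S)
    zs = [secrets.token_bytes(16) for _ in range(n)]
    basis = nullspace([row(s, zs) for s in S], q)
    Y = [0] * (n + 1)
    while not any(Y):                              # random nonzero nullspace vector
        coeffs = [secrets.randbelow(q) for _ in basis]
        Y = [sum(c * b[j] for c, b in zip(coeffs, basis)) % q for j in range(n + 1)]
    K = secrets.randbelow(q)
    X = [(Y[0] + K) % q] + Y[1:]                   # X = K * e_1^T + Y
    return K, (X, zs)

def key_der(s: bytes, PI, q: int = Q) -> int:
    """KeyDer: inner product of the user's row v_i with the ACV X."""
    X, zs = PI
    return sum(a * x for a, x in zip(row(s, zs), X)) % q
```

In this sketch, Update corresponds to calling `key_gen` again on the changed list of secrets; the secrets of the remaining users are untouched.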
2.4 Security Analysis
In the security analysis of ACV-BGKM, we will model the cryptographic hash
function H as a random oracle. We further assume q = O(2^k) is a sufficiently large
prime power. We first present two lemmas with their proofs and then prove the
theorems introduced in Section 2.1.
The following lemmas are useful for the security analysis of ACV-BGKM. Lemma 1
says that in a vector space V over a large finite field, the probability that a randomly
chosen vector is in a pre-selected subspace, strictly smaller than V , is very small.
Lemma 2 will be used in the proof of Theorem 2.4.2.
Lemma 1 Let F = Fq be a finite field of q elements. Let V be an n-dimensional
F-vector space, and W be an m-dimensional F-subspace of V, where m ≤ n. Let v
be an F-vector uniformly randomly chosen from V. Then the probability that v ∈ W
is 1/q^{n−m}.
Proof The proof is straightforward. We show it here for completeness. Let {v1, v2,
. . . , vm} be a basis of W. Then it can be extended to a basis of V by adding another
n − m basis vectors vm+1, . . . , vn. Any vector v ∈ V can be written as
v = α1 · v1 + . . . + αn · vn, αi ∈ F, 1 ≤ i ≤ n,
and v ∈ W if and only if αi = 0 for m + 1 ≤ i ≤ n. When v is uniformly randomly
chosen from V, it follows that
Pr[v ∈ W] = 1/q^{n−m}.
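For small parameters, Lemma 1 can be checked by exhaustive enumeration. The Python sketch below is only a sanity check under the assumption that q is a small prime, so that arithmetic modulo q is field arithmetic; the helper names `span` and `prob_in_subspace` are hypothetical. It counts the vectors of V = Fq^n that fall in the span of a given basis of W.

```python
from fractions import Fraction
from itertools import product

def span(basis, q):
    """All F_q-linear combinations of the given basis vectors (q a small prime)."""
    n = len(basis[0])
    return {
        tuple(sum(c * b[j] for c, b in zip(coeffs, basis)) % q for j in range(n))
        for coeffs in product(range(q), repeat=len(basis))
    }

def prob_in_subspace(basis, q):
    """Exact probability that a uniform vector of F_q^n lies in span(basis)."""
    n = len(basis[0])
    W = span(basis, q)
    hits = sum(1 for v in product(range(q), repeat=n) if v in W)
    return Fraction(hits, q ** n)
```

With q = 5, n = 3 and the two linearly independent vectors (1, 2, 3) and (0, 1, 4), the subspace has q^2 = 25 of the q^3 = 125 vectors, which matches 1/q^{n−m} = 1/5.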
Lemma 2 Let F = Fq be a finite field of q elements. Let vi = (1, vi^(2), . . . , vi^(n)), i =
1, . . . , m, and 1 ≤ m < n, be n-dimensional F-vectors. Let v = (1, v^(2), . . . , v^(n))
be an n-dimensional F-vector with v^(j), j ≥ 2 independently and uniformly randomly
chosen from F. Then the probability that v is linearly dependent of {vi, 1 ≤ i ≤ m}
is no more than 1/q^{n−m}.
Proof Let wi = (vi^(2), . . . , vi^(n)), 1 ≤ i ≤ m, and w = (v^(2), . . . , v^(n)). All wi span
an F-subspace W whose dimension is at most m in an (n − 1)-dimensional F-vector
space. w is a uniformly randomly chosen (n − 1)-dimensional F-vector. By Lemma 1,
Pr[w ∈ W] = 1/q^{n−1−dim(W)} ≤ 1/q^{n−1−m}.
It follows that
\[
\begin{aligned}
\Pr[v \text{ is linearly dependent of } \{v_i : 1 \le i \le m\}]
&= \Pr[v = \alpha_1 \cdot v_1 + \dots + \alpha_m \cdot v_m \text{ for some } \alpha_i \in F] \\
&= \Pr\Big[\sum_{i=1}^{m} \alpha_i = 1 \wedge w = \sum_{i=1}^{m} \alpha_i \cdot w_i \text{ for some } \alpha_i \in F\Big] \\
&= \Pr\Big[\sum_{i=1}^{m} \alpha_i = 1\Big] \cdot \Pr[w \in W] \\
&\le 1/q \cdot 1/q^{n-1-m} = 1/q^{n-m}.
\end{aligned}
\]
Lemma 3 Let F = Fq be a finite field of q elements. Let vi = ei^T + (0, . . . , 0, vi^(n+1), . . . , vi^(2n)),
where ei is the ith standard basis vector of Fq^{2n}, i = 1, . . . , m, and 1 ≤ m ≤
n, be 2n-dimensional F-vectors. Let v = e^T + (0, . . . , 0, v^(n+1), . . . , v^(2n)) be a 2n-dimensional
F-vector with v^(j), j ≥ n + 1 chosen independently and uniformly at
random from F and e from the 2n-dimensional standard basis vectors with the position
of the non-zero element ≤ m. Then the probability that v is linearly dependent of
{vi, 1 ≤ i ≤ m} is no more than 1/q^{n−m}.
Proof Let wi = (vi^(n+1), . . . , vi^(2n)), 1 ≤ i ≤ m, w = (v^(n+1), . . . , v^(2n)), and ui =
(vi^(1), . . . , vi^(n)). All wi span an F-subspace W whose dimension is at most m in an
n-dimensional F-vector space. w and u are uniformly randomly chosen n-dimensional
F-vectors. By Lemma 1,
Pr[w ∈ W] = 1/q^{n−dim(W)} ≤ 1/q^{n−m}.
It follows that
\[
\begin{aligned}
\Pr[v \text{ is linearly dependent of } \{v_i : 1 \le i \le m\}]
&= \Pr[v = \alpha_1 \cdot v_1 + \dots + \alpha_m \cdot v_m \text{ for some } \alpha_i \in F] \\
&= \Pr\Big[\sum_{i=1}^{m} \alpha_i \cdot u_i = e^T \wedge w = \sum_{i=1}^{m} \alpha_i \cdot w_i \text{ for some } \alpha_i \in F\Big] \\
&= \Pr\Big[\sum_{i=1}^{m} \alpha_i \cdot u_i = e^T\Big] \cdot \Pr[w \in W] \\
&\le 1/q^{n} \cdot 1/q^{n-m} = 1/q^{2n-m}.
\end{aligned}
\]
Theorem 2.4.1 ACV-BGKM is correct.
Proof The correctness of ACV-BGKM can be easily seen: Knowing its secret si and
the public values z1 , z2 , . . . , zn , a group member Usri can compute one row of matrix
A as
vi = (1, ai,1 , ai,2 , . . . , ai,n ),
where ai,j, 1 ≤ j ≤ n are as in formula (2.1). Therefore vi · Y = 0, since Y lies in the nullspace of A, and
thus the group key can be derived with probability 1 as
\[
v_i \cdot X = v_i \cdot \left(K \cdot e_1^T + Y\right) = K \cdot v_i \cdot e_1^T = K.
\]
Theorem 2.4.2 ACV-BGKM is sound.
Proof Let Y be a given access control vector. Let {vi , 1 ≤ i ≤ n} be a basis of the
nullspace of A. Let v = (1, v (2) , . . . , v (n+1) ), where v (i+1) = H(val||zi ), 1 ≤ i ≤ n. Usr
can derive the group key using v by following the KeyDer phase if and only if v is
linearly dependent of vi, 1 ≤ i ≤ n. When val is not a valid secret and H is a random
oracle, v is indistinguishable from a vector whose first entry is 1 and the other entries
are independently and uniformly chosen from Fq. By Lemma 2, the probability that
v is linearly dependent of {vi, 1 ≤ i ≤ n} is no more than 1/q^{n+1−n} = 1/q, which is
negligible. This proves the soundness of ACV-BGKM.
Theorem 2.4.3 ACV-BGKM is key hiding.
Proof Let PubInfo = (X, (z1 , . . . , zn )) be the public information broadcast from Svr.
This is the only piece of information seen by the adversary that is related to the group
key. By construction, X must be linearly independent of the standard basis vector
eT1 , i.e., X has a nonzero entry after the first position. For any K ∈ KS = Fq , let
Y = X − K · eT1 .
Then it is clear that all Fq -vectors v such that v · Y = 0 form an n-dimensional
Fq -vector space, say W . It follows that the n basis vectors of W can be chosen in
such a way that they all have nonvanishing first entries. Therefore, the number of
vectors v with 1 as their first entry such that v · X = K is q^{n−1}, for all K ∈ KS.
When the cryptographic hash function H(·) is modeled as a random oracle and a
valid secret is unknown, every such vector v assumes the same probability when
computed as specified in the KeyDer algorithm. This implies that every K ∈ KS has
the same probability, 1/q, to be the designated group key in the view of the adversary.
The key hiding property of ACV-BGKM follows as a direct consequence. Note that
ACV-BGKM is key hiding against a computationally unbounded adversary.
It is clear that “forward/backward key protecting” is a stronger condition than
“key hiding.” However, we will use the proof of the latter to show the former.
Theorem 2.4.4 ACV-BGKM is forward/backward key protecting.
Proof (Sketch) We first consider the backward key protecting property of ACV-BGKM. Suppose that after the Update algorithm, an adversary has one secret s from
the previous session S0 which does not propagate to the new session S1. As the choices
of s and the nullspace of the ACV in session S0 can be viewed as (statistically) jointly
independent of the determination of the nullspace of the ACV in session S1 , when H is
modeled as a random oracle and by design of the Update algorithm, Usr cannot learn
the group key for session S1 with non-negligible probability due to the key hiding
property of ACV-BGKM. Similarly, ACV-BGKM is forward key protecting.
Other related GKM security aspects mentioned in Section 2.1 are briefly discussed
as follows.
Minimal trust. In order to protect the shared group key from an adversary outside
of the group, ACV-BGKM only requires to use a private channel once between Svr
and each Usr, during the SecGen algorithm. The security of the ephemeral private
channels needs to be guaranteed. Any other communications, including the ones for
key issuance and rekeying, are executed via an open broadcast channel.
Key independence. It is clear that the group keys (of different sessions) are independent by ACV-BGKM construction. Furthermore, the secrets are also independent
of each other, because they are randomly generated.
Collusion resistance. For BGKM, it only makes sense to consider collusion attacks from outside the group. The case that a valid group member passes its secret
or the derived group key to others is not addressed by BGKM. Similar to the analysis
for ACV-BGKM’s forward/backward key protecting property, ACV-BGKM is resistant to polynomially computationally bounded adversaries. In particular, colluding
group members are not able to get the secrets of other members to derive group keys
of earlier or later sessions.
2.5 Improving the Performance of ACV-BGKM
In this section, we improve the performance of our basic ACV-BGKM scheme
using two techniques: bucketization and subset cover.
2.5.1 Bucketization
The proposed key management scheme works efficiently even when there are thousands of users. However, as the upper bound n of the number of involved users gets
large, solving the linear system AY = 0 over a large finite field Fq becomes the most
computationally expensive operation in our scheme. Solving this linear system with
the method of Gauss-Jordan elimination [34] takes O(n^3) time. Although this
computation is executed at the Svr, which is usually capable of carrying out computationally expensive operations, when n is very large, e.g., n = 100,000, the resulting
costs may be too high for the Svr. Due to the non-linear cost associated with solving a linear system, we can reduce the overall computational cost by breaking the
linear system into a set of smaller linear systems. We follow a two-level approach.
In this case, the Svr divides all the involved Usrs into multiple “buckets” (say m) of
a suitable size (e.g., 1000 each), computes an intermediate key for each bucket by
executing the KeyGen algorithm, and then computes the actual group key for all the
users by executing the KeyGen algorithm with the intermediate keys as the secrets.
Note that the intermediate key generation can be parallelized as each bucket is independent.
The Svr executes m + 1 KeyGen algorithms of smaller size. The complexity
of the KeyGen algorithm is proportional to O(n^3/m^2 + m^3). It can be shown that the
optimal solution is achieved when m reaches close to n^{3/5}.
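The optimal bucket count can be checked numerically. Setting the derivative of n^3/m^2 + m^3 with respect to m to zero gives m^5 = (2/3) · n^3, i.e. m = (2/3)^{1/5} · n^{3/5}, which is close to n^{3/5}. The Python sketch below, with the hypothetical helper names `keygen_cost` and `best_bucket_count`, evaluates this cost model directly; the model ignores constant factors and any parallelism across buckets.

```python
def keygen_cost(n: int, m: int) -> float:
    """Cost model from the text: m buckets of size n/m, plus one top-level system."""
    return n**3 / m**2 + m**3

def best_bucket_count(n: int) -> int:
    """Bucket count m in 1..n that minimises the cost model, by brute force."""
    return min(range(1, n + 1), key=lambda m: keygen_cost(n, m))

def closed_form(n: int) -> float:
    """Stationary point of the cost model: m = (2/3)^(1/5) * n^(3/5)."""
    return (2 / 3) ** 0.2 * n ** 0.6
```

For n = 100,000 the brute-force minimiser and the closed form agree to within one bucket, and both are slightly below n^{3/5} = 1000.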
Each intermediate key is associated with a marker so that Usrs can identify if they
have derived a valid intermediate key. For deriving the actual group key, Usrs are
required to execute m + 1 KeyDer algorithms in the worst case and 2 in the best case.
Since the KeyDer algorithm is linear in n, in general, the bucketization optimization
still improves the performance of the KeyDer algorithm. The complexity of the KeyDer
algorithm is proportional to O(n/m + m), but the average case runs faster.
2.5.2 Subset Cover
The bucketization approach becomes inefficient as the bucket size increases. The
issue is that the bucketization still utilizes the basic ACV-BGKM scheme. In our basic
ACV-BGKM scheme, as each user is given a single secret, it makes the complexity of
PubInfo and all algorithms proportional to n, the number of users in the group. We
utilize the result from previous research on broadcast encryption [35, 36] to improve
the complexity to sub-linear in n. Based on that, one can make the complexity sublinear in the number of users by giving more than one secret during SecGen for each
attribute users possess. The secrets given to each user overlap with different subsets
of users. During the KeyGen, Svr identifies the minimum number of subsets to which
all the users belong and uses one secret per the identified subset. During KeyDer, a
user identifies the subset it belongs to and uses the corresponding secret to derive the
group key. Group dynamics are handled by making some of the secrets given to users
invalid.
We give a high-level description of the basic subset-cover approach. In the basic
scheme, n users are organized as the leaves of a balanced binary tree of height log n.
A unique secret is assigned to each vertex in the tree. Each user is given log n secrets
that correspond to the vertices along the path from its leaf node to the root node.
In order to provide backward secrecy when a single user is revoked, the updated tree
is described by log n subtrees formed after removing all the vertices along the path
from the user leaf node to the root node. To rekey, Svr executes Update using the
log n secrets corresponding to the roots of these subtrees. Naor et al. [35] improve
this technique to simultaneously revoke r users and describe the existing users using
r log (n/r) subtrees. Since then, there have been many improvements to the basic
scheme. We implement Naor et al.’s complete subset scheme [35] in our experiments.
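The basic tree-based cover described above can be sketched as follows. This Python fragment is an illustrative sketch, not the implementation used in our experiments: it assumes n is a power of two and numbers the vertices heap-style (root 1, children of v are 2v and 2v + 1, leaf i is vertex n + i), and the names `path_to_root` and `complete_subtree_cover` are hypothetical. The cover consists of the vertices that hang off the paths from the revoked leaves to the root.

```python
def path_to_root(n_leaves: int, leaf: int) -> list:
    """Vertices from leaf `leaf` (0-based) up to the root; one secret per vertex."""
    v = n_leaves + leaf
    path = []
    while v >= 1:
        path.append(v)
        v //= 2
    return path

def complete_subtree_cover(n_leaves: int, revoked: set) -> list:
    """Roots of the subtrees that together contain exactly the non-revoked leaves."""
    if not revoked:
        return [1]                      # nobody revoked: the root covers everyone
    marked = set()                      # every vertex on a revoked leaf's path
    for leaf in revoked:
        marked.update(path_to_root(n_leaves, leaf))
    cover = []
    for v in marked:
        if v < n_leaves:                # internal vertex: inspect both children
            for child in (2 * v, 2 * v + 1):
                if child not in marked:
                    cover.append(child)
    return sorted(cover)
```

With 8 users and user 0 revoked, the cover has log 8 = 3 vertices. Each remaining user holds exactly one secret on its path that appears in the cover, while a revoked user holds none.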
In our experimental results in Section 2.7, we show that combining the bucketization and the subset cover techniques, we can very efficiently execute ACV-BGKM
algorithms and can support very large user groups.
2.6 ACV-BGKM-2
The modified ACV-BGKM works under similar conditions as ACV-BGKM, but
instead of giving the same key k to all the users, the KeyDer algorithm gives each
Usri a different key ki when the public information tuple P I is combined with their
unique secret si .
The algorithms are executed with a trusted key server Svr and a group of users
Usri , i = 1, 2, · · · , n with the attribute universe A = {attr1 , attr2 , · · · , attrm }. The
construction is as follows:
Setup(ℓ): Svr initializes the following parameters: an ℓ-bit prime number q, the
maximum group size N (≥ n), a cryptographic hash function H(·) : {0, 1}∗ → Fq ,
where Fq is a finite field with q elements, the key space KS = Fq , the secret space
SS = {0, 1}ℓ and the set of issued secret tuples S = ∅. Each Usri is given a unique
secret index 1 ≤ i ≤ N .
SecGen(): The Svr chooses the secret si ∈ SS uniformly at random for Usri such
that si is unique among all the users, adds the secret tuple (i, si ) to S, and outputs
(i, si ).
KeyGen(S, K): Given the set of secret tuples S = {(i, si )|1 ≤ i ≤ N } and a random
set of keys K = {ki |1 ≤ i ≤ N }, it outputs the public information tuple P I which
allows each Usri to derive the key ki using its secret si . The details follow.
Svr chooses N random bit strings z1, z2, . . . , zN ∈ {0, 1}^ℓ and creates an N × 2N
Fq-matrix A where for a given row i, 1 ≤ i ≤ N,
\[
a_{i,j} =
\begin{cases}
1 & \text{if } i = j \\
0 & \text{if } 1 \le j \le N \text{ and } i \ne j \\
H(s_i \| z_{j-N}) & \text{if } N < j \le 2N
\end{cases}
\]
As in the ACV-BGKM scheme, Svr computes the null space of A with a set of
its N basis vectors, and selects a vector Y as one of the basis vectors. Svr constructs
a 2N-dimensional Fq-vector
\[
ACV = \Big(\sum_{i=1}^{N} k_i \cdot e_i^T\Big) + Y,
\]
where ei is the ith standard basis vector of Fq^{2N}. Notice that, unlike ACV-BGKM, a
unique key corresponding to Usri, ki ∈ K, is embedded into each location corresponding to a valid index i. As in ACV-BGKM, Svr sets PI = (ACV, (z1, z2, . . . , zN)), and
outputs PI via the broadcast channel.
KeyDer(si , P I): Usri , using its secret si and public P I, derives the 2N -dimensional
row Fq -vector vi which corresponds to a row in A. Then Usri derives the specific key
as ki = vi · ACV .
Update(S, K′): If a user leaves or joins the group, a new set of keys K′ is selected.
KeyGen(S, K′) is invoked to generate the updated public information PI′. Notice
that the secrets shared with existing users are not affected by the group change. It
outputs the public P I ′ .
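A compact way to see why each user recovers only its own key is to write the construction out. The Python sketch below is illustrative only and rests on several assumptions: SHA-256 reduced modulo a fixed Mersenne prime stands in for H, the names `key_gen2` and `key_der2` are hypothetical, and Y is drawn as a random nullspace vector rather than a basis vector. Because A has the block form [I | B] with B[i][j] = H(si||zj), a vector Y = (u, w) lies in the nullspace exactly when u = −B · w, so no elimination is needed here. Slots without an assigned user are filled with random values.

```python
import hashlib
import secrets

Q = 2**127 - 1  # a Mersenne prime standing in for the l-bit prime q (assumption)

def H(s: bytes, z: bytes) -> int:
    """Hash s||z into F_q (SHA-256 reduced mod q; an assumption of this sketch)."""
    return int.from_bytes(hashlib.sha256(s + z).digest(), "big") % Q

def key_gen2(S: dict, keys: dict, N: int):
    """S maps an index i (1..N) to secret s_i; keys maps the same i to k_i."""
    zs = [secrets.token_bytes(16) for _ in range(N)]
    w = [secrets.randbelow(Q) for _ in range(N)]        # last N entries of Y
    u = [secrets.randbelow(Q) for _ in range(N)]        # unused slots stay random
    for i, s in S.items():
        # Row i of A times Y must vanish: u_i + sum_j H(s_i||z_j) * w_j = 0.
        u[i - 1] = -sum(H(s, z) * wj for z, wj in zip(zs, w)) % Q
    # ACV = sum_i k_i * e_i^T + Y, with k_i embedded only at valid indices.
    head = [(u[i - 1] + keys[i]) % Q if i in S else u[i - 1] for i in range(1, N + 1)]
    return head + w, zs

def key_der2(i: int, s: bytes, PI) -> int:
    """KeyDer: v_i . ACV with v_i = e_i^T + (0, ..., 0, H(s||z_1), ..., H(s||z_N))."""
    acv, zs = PI
    N = len(zs)
    return (acv[i - 1] + sum(H(s, z) * acv[N + j] for j, z in enumerate(zs))) % Q
```

A user with index i and secret si obtains ki from `key_der2(i, si, PI)`; using the same secret at another index r returns a value unrelated to kr except with negligible probability.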
2.6.1 Security Analysis
In this section, we prove the security of the modified ACV-BGKM scheme. Specifically we prove the soundness of the modified ACV-BGKM scheme. We will model the
cryptographic hash function H as a random oracle. We further assume that q = O(2^ℓ)
is a sufficiently large prime power and N is relatively small. We first present an additional lemma with its proof and then prove that the modified ACV-BGKM scheme
is indeed sound.
Lemma 4 Let F = Fq be a finite field of q elements. Let vi = ei^T + (0, . . . , 0, vi^(n+1), . . . , vi^(2n)),
where ei is the ith standard basis vector of Fq^{2n}, i = 1, . . . , m, and 1 ≤ m ≤
n, be 2n-dimensional F-vectors. Let v = e^T + (0, . . . , 0, v^(n+1), . . . , v^(2n)) be a 2n-dimensional
F-vector with v^(j), j ≥ n + 1 chosen independently and uniformly at
random from F and e from the 2n-dimensional standard basis vectors with the position
of the non-zero element ≤ m. Then the probability that v is linearly dependent of
{vi, 1 ≤ i ≤ m} is no more than 1/q^{n−m}.
Proof Let wi = (vi^(n+1), . . . , vi^(2n)), 1 ≤ i ≤ m, w = (v^(n+1), . . . , v^(2n)), and ui =
(vi^(1), . . . , vi^(n)). All wi span an F-subspace W whose dimension is at most m in an
n-dimensional F-vector space. w and u are uniformly randomly chosen n-dimensional
F-vectors. By Lemma 1, we have Pr[w ∈ W] = 1/q^{n−dim(W)} ≤ 1/q^{n−m}. It follows
that
\[
\begin{aligned}
\Pr[v \text{ is linearly dependent of } \{v_i : 1 \le i \le m\}]
&= \Pr[v = \alpha_1 \cdot v_1 + \dots + \alpha_m \cdot v_m \text{ for some } \alpha_i \in F] \\
&= \Pr\Big[\sum_{i=1}^{m} \alpha_i \cdot u_i = e^T \wedge w = \sum_{i=1}^{m} \alpha_i \cdot w_i \text{ for some } \alpha_i \in F\Big] \\
&= \Pr\Big[\sum_{i=1}^{m} \alpha_i \cdot u_i = e^T\Big] \cdot \Pr[w \in W] \\
&\le 1/q^{n} \cdot 1/q^{n-m} = 1/q^{2n-m}.
\end{aligned}
\]
Definition 2.6.1 (Soundness of the modified ACV-BGKM scheme) Let Usri
be an individual without a valid secret and Usrj with a valid secret sj , 1 ≤ i, j ≤ N .
The modified ACV-BGKM is sound if
• The probability that Usri can obtain the correct key ki by substituting the secret
with a value val that is not one of the valid secrets and then running the key
derivation algorithm KeyDer is negligible.
• The probability that Usrj can obtain a correct key kr, where j ≠ r and 1 ≤ r ≤ N,
by substituting sj and then running the key derivation algorithm KeyDer is
negligible.
Theorem 2.6.1 The modified ACV-BGKM scheme is sound.
Proof Let P I = (ACV, (z1 , . . . , zN )) be the public information broadcast from Svr.
Case 1: Usri does not have a valid secret and tries to derive ki .
Let Y be a vector orthogonal to the access control matrix A.
Let {vi , 1 ≤ i ≤ N }, be a basis of the nullspace of Y .
Let $v = e^T + (0, \dots, 0, v^{(N+1)}, \dots, v^{(2N)})$, where $v^{(i+N)} = H(val \| z_i)$, $1 \le i \le N$.
Usri can derive the key using v by running the KeyDer algorithm if and only if v
is linearly dependent on {vi , 1 ≤ i ≤ N }. When val is not a valid secret and H is
a random oracle, v is indistinguishable from a vector whose first N entries are from
eT and whose remaining N entries are independently and uniformly chosen from Fq .
By Lemma 4, the probability that v is linearly dependent on {vi , 1 ≤ i ≤ N } is
no more than $1/q^{2N-N} = 1/q^{N}$, which is negligible. This proves that the modified
ACV-BGKM scheme is sound in case 1.
Case 2: Usrj has a valid secret sj and tries to derive kr , where r ≠ j and 1 ≤ r ≤ N .
Since Usrj has a valid secret sj , it can construct the j th row of A as follows:
$v_j = e_j^T + (0, \dots, 0, v_j^{(N+1)}, \dots, v_j^{(2N)})$, where $v_j^{(i+N)} = H(s_j \| z_i)$, $1 \le i \le N$.
Usrj can obtain the key kj using vj :
kj = ACV · vj .
In order to obtain the key kr , Usrj needs to compute ACV · vr where vr is defined
as follows.
$v_r = e_r^T + (0, \dots, 0, v_r^{(N+1)}, \dots, v_r^{(2N)})$, where $v_r^{(i+N)} = H(val \| z_i)$, $1 \le i \le N$.
By construction, vr is linearly independent from vj . When val is not a valid secret
and H is a random oracle, vr is indistinguishable from a vector whose first N entries
are from eTr and the rest of the N entries are independently and uniformly chosen
from Fq . Thus, knowing vj does not provide an advantage for Usrj to compute vr .
Therefore, the probability of deriving kr by running the KeyDer algorithm remains
the same negligible value $1/q^{N}$ as in case 1. This proves that the modified ACV-BGKM scheme is sound in case 2.
2.7 Experimental Results
In this section, we present experimental results for the optimized ACV-BGKM.
The experiments were performed on a machine running GNU/Linux kernel version
2.6.32 with an Intel® Core™ 2 Duo CPU T9300 2.50GHz and 4 Gbytes memory.
Only one processor was used for computation. The code is built with 32-bit gcc
version 4.4.3, optimization flag -O2. For the ACV-BGKM scheme, we use V. Shoup’s
NTL library [37] version 5.4.2 for finite field arithmetic, and SHA-1 implementation
of OpenSSL [38] version 0.9.8 for cryptographic hashing.
We implemented the ACV-BGKM scheme with both the bucketization and the
subset cover optimizations. We utilized the complete subset algorithm introduced by
Naor et al. [35] for the subset cover. We assumed that 5% of the users satisfying a
given Pc are revoked. With the bucketization optimization, we assumed the average
case for the KeyDer algorithm, where Usrs must derive half of the intermediate
keys before deriving the group key. For the experiments involving a fixed number of
buckets, 10 buckets are utilized. All finite field arithmetic operations in our scheme
are performed in a 512-bit prime field.
Figure 2.1 reports the average time spent to execute the KeyGen algorithm of
the ACV-BGKM scheme without any optimizations, with bucketization, and with
subset cover optimization for different group sizes. The bucketization outperforms
the base scheme as it divides the non-linear KeyGen algorithm into smaller and more
efficient computations. Subset-cover optimization provides even better performance
as it reduces the effective group size considerably by sharing secrets among multiple
Usrs. As shown in Figure 2.2, the KeyDer algorithm has similar results.
Figure 2.1.: Average time to generate keys (x-axis: group size, 100 to 1000; y-axis: time in seconds; series: Base, Bucketization, Subset Cover)
Figure 2.3 shows the average time to execute the KeyGen algorithm for 2500 and
5000 user groups with an increasing number of buckets. When more buckets are
utilized, the size of the problem that KeyGen has to solve shrinks and, hence,
bucketization provides better performance. However, as mentioned in Section 2.5.1,
the performance starts to degrade once the number of buckets exceeds the
optimal number of buckets. For n = 2500 and 5000, the optimal numbers of buckets
are around 100 and 150, respectively. These values are consistent with the theoretical
minimum overhead. Under similar settings, Figure 2.4 shows the time to execute the
KeyDer algorithm. The key derivation time slowly increases as the number of buckets
increases because the complexity of the second level KeyDer function increases.
Figure 2.2.: Average time to derive keys (x-axis: group size, 100 to 1000; y-axis: time in ms; series: Base, Bucketization, Subset Cover)
Figure 2.3.: Average time to generate keys with different bucket sizes (x-axis: number of buckets, 0 to 400; y-axis: time in seconds; one series per group size, 2500 and 5000 users)
We closely analyzed the two optimizations. Figure 2.5 shows the average time
to execute the KeyGen algorithm with the bucketization, with the subset cover, and with both,
where the bucketization is applied after the subset cover technique. Both techniques
together provide a substantial performance improvement. Under similar settings, as
shown in Figure 2.6, the KeyDer algorithm also performs much better than with the individual
optimizations.
Figure 2.4.: Average time to derive keys with different bucket sizes (x-axis: number of buckets, 0 to 200; y-axis: time in ms; series: 5000 Users, 2500 Users)
Figure 2.5.: Average time to generate keys with the two optimizations (x-axis: group size, 200 to 2000; y-axis: time in seconds; series: Subset Cover, Bucketization, Both)
Figure 2.6.: Average time to derive keys with the two optimizations (x-axis: group size, 200 to 2000; y-axis: time in ms; series: Subset Cover, Bucketization, Both)
3 ATTRIBUTE BASED GROUP KEY MANAGEMENT
While BGKM schemes provide efficient rekeying, they do not support expressive
group membership policies over a set of attributes. In their basic form, they can only
support 1-out-of-n threshold policies by which a group member possessing 1 attribute
out of the possible n attributes is able to derive the group key. In order to address this
issue, in this chapter, we develop novel expressive attribute based GKM (AB-GKM)
schemes which allow one to express any threshold or monotonic policies over a set of
attributes.
A possible approach to construct an AB-GKM scheme is to utilize attribute-based
encryption (ABE) primitives [16–18]. Such an approach would work as follows. A
key generation server issues each group member a private key (a set of secret values)
based on the attributes and the group membership policies. The group key, typically a symmetric key, is then encrypted under a set of attributes using the ABE
encryption algorithm and broadcast to all the group members. The group members
whose attributes satisfy the group membership policy can obtain the group key by
using the ABE decryption primitive. One can use such an approach to implement an
expressive collusion-resistant AB-GKM scheme. However, such an approach suffers
from some major drawbacks. Whenever the group membership changes, the rekeying
operation must update the private keys given to existing members in order to
provide backward/forward secrecy. This in turn requires establishing private communication channels with each group member, which is not desirable in a large group
setting. Further, in applications involving stateless members where it is not possible
to update the initially given private keys and the only way to revoke a member is to
exclude it from the public information, an ABE based approach does not work. Another limitation is that whenever the group membership policy changes, new private
keys must be re-issued to members of the group. Our constructions address these
shortcomings.
Our AB-GKM schemes are able to support a large variety of conditions over a
set of attributes. When the group changes, the rekeying operations do not affect the
private information of existing group members and thus our schemes eliminate the
need of establishing private communication channels. Our schemes provide the same
advantage when the group membership conditions change. Furthermore, the group
key derivation is very efficient as it only requires a simple vector inner product and/or
polynomial interpolation. Additionally, our schemes are resistant to collusion attacks.
Multiple group members are unable to combine their private information in a useful
way to derive a group key which they cannot derive individually.
Our AB-GKM constructions are based on an optimized version of the ACV-BGKM
(Access Control Vector BGKM) scheme presented in Chapter 2, a provably secure
BGKM scheme, and Shamir’s threshold scheme [29]. In this chapter, we construct three
AB-GKM schemes, each of which suits a different scenario better than the others.
The first construction, inline AB-GKM, is based on the ACV-BGKM scheme. Inline
AB-GKM supports arbitrary monotonic policies over a set of attributes. In other
words, a user whose attributes satisfy the group policies is able to derive the symmetric
group key. However, inline AB-GKM does not efficiently support d-out-of-m (d ≤ m)
attribute threshold policies over m attributes. The second construction, threshold
AB-GKM, addresses this requirement. The third construction, access tree AB-GKM,
is an extension of threshold AB-GKM and is the most expressive scheme. It efficiently
supports arbitrary policies. The second and third schemes are constructed by using
a modified version of ACV-BGKM, presented in Section 2.6.
3.1 Scheme 1: Inline AB-GKM
Recall that in its basic form, a BGKM scheme can be considered as a 1-out-of-m
AB-GKM scheme. If Usri possesses the attribute attrj , Svr shares a unique secret
si,j with Usri . Usri is thus able to derive the symmetric group key if and only if Usri
shares at least one secret with Svr and that secret is included in the computation
of the public information tuple P I. In order for Svr to revoke Usrj , it only needs
to remove the secrets it shares with Usrj from the computation of P I; the secrets
issued to other group members are not affected. We extend this scheme to support
arbitrary monotonic policies, ACPs, over a set of attributes. A user is able to derive
the symmetric group key if and only if the set of attributes the user possesses satisfies
ACP.
As in the basic BGKM scheme, Usri having attrj is associated with a unique secret
value si,j . However, unlike the basic BGKM scheme, P I is generated by using the
aggregated secrets that are generated combining the secrets issued to users according
to ACP. For example, if ACP is a conjunction of two attributes, that is attrr ∧ attrs ,
the corresponding secrets si,r and si,s for each Usri are combined as one aggregated
secret si,r ||si,s and P I is computed using these aggregated secrets. By construction,
the aggregated secrets are unique since the constituent secrets are unique. Any Usri is
able to derive the symmetric group key if and only if Usri has at least one aggregated
secret used to compute P I. Notice that multiple users cannot collude to create an
aggregated secret which they cannot individually create since si,j ’s are unique and
each aggregated secret is tied to one specific user. Hence, colluding users cannot derive
the group symmetric key. Now we give a detailed description of our first AB-GKM
scheme, inline AB-GKM.
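The aggregation step described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the helper names `sec_gen` and `aggregated_secrets`, the dictionary standing in for the user-attribute matrix, and the 16-byte secret length are all assumptions made for the example.

```python
import secrets

def sec_gen(ua, issued, user, attrs, nbytes=16):
    """Issue one fresh random secret per attribute of `user` (hypothetical helper)."""
    for attr in attrs:
        s = secrets.token_bytes(nbytes)
        while s in issued:              # keep every issued secret unique
            s = secrets.token_bytes(nbytes)
        issued.add(s)
        ua[(user, attr)] = s            # plays the role of UA(i, j) = s_{i,j}

def aggregated_secrets(ua, users, conjunct):
    """Map each user holding every attribute of `conjunct` to s_{i,r} || s_{i,s} || ..."""
    out = {}
    for user in users:
        parts = [ua.get((user, attr)) for attr in conjunct]
        if all(p is not None for p in parts):
            out[user] = b"".join(parts)
    return out

# Illustrative data: only usr1 holds both attributes of the conjunct role AND dept.
ua, issued = {}, set()
sec_gen(ua, issued, "usr1", ["role", "dept"])
sec_gen(ua, issued, "usr2", ["role"])
sec_gen(ua, issued, "usr3", ["dept"])
agg = aggregated_secrets(ua, ["usr1", "usr2", "usr3"], ["role", "dept"])

# usr2 and usr3 pooling their secrets obtain a string that is not a valid aggregate.
colluded = ua[("usr2", "role")] + ua[("usr3", "dept")]
```

In this sketch the pooled value `colluded` differs from every entry of `agg` because each aggregate is built from the secrets of a single user, which mirrors the collusion argument given in the text.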
3.1.1 Our Construction
Inline AB-GKM consists of the following five algorithms:
Setup(ℓ): The Svr initializes the following parameters: an ℓ-bit prime number q, a
cryptographic hash function H(·) : {0, 1}∗ → Fq , where Fq is a finite field with q
elements, the keyspace KS = Fq , the secret space SS = {0, 1}ℓ , and the set of issued
secrets S = ∅. The user-attribute matrix U A is initialized with empty elements and
the maximum group size N is decided in the KeyGen. It defines the universe of
attributes A = {attr1 , attr2 , · · · , attrm }.
SecGen(γi ): For each attribute attrj ∈ γi , where γi ⊂ A and γi is the attribute
set of Usri , the Svr chooses the secret si,j ∈ SS uniformly at random for Usri such
that si,j ∉ S, adds si,j to S, sets U A(i, j) = si,j , where U A(i, j) is the (i, j)th element
of the user-attribute matrix U A, and finally outputs si,j .
KeyGen(ACP): We first give a high-level description of the algorithm and then
the details. Svr transforms the policy ACP to disjunctive normal form (DNF). For
each conjunctive clause of ACP in DNF, it creates an aggregated secret (ŝ) from the
secrets corresponding to each of the attributes in the conjunctive clause. ŝ is formed
by concatenation only if secrets exist for all the attributes of the clause in a given row of the
user-attribute matrix U A. The construction creates a unique aggregated secret ŝ
since the corresponding secrets are unique. For example, if the conjunctive clause is
attrp ∧ attrq ∧ attrr , for each row i in U A, the aggregated secret ŝi is formed only
if all elements U A(i, p), U A(i, q) and U A(i, r) have secrets assigned. All the aggregated
secrets are added to the set AS. Finally, Svr invokes algorithm KeyGen(AS)
from the underlying BGKM scheme to output the public information P I and the
symmetric group key k.
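The DNF transformation at the start of KeyGen can be sketched as follows. This is an illustrative Python sketch under our own assumptions: policies are encoded as nested tuples such as `("and", "x", ("or", "y", "z"))`, attributes are plain strings, and the function names `to_dnf` and `absorb` are hypothetical. It distributes AND over OR and then drops every clause that is a strict superset of another clause, which is the set form of the absorption law x ∨ (x ∧ y) = x.

```python
from itertools import product

def absorb(clauses):
    """Absorption law: drop any clause that is a strict superset of another clause."""
    return {c for c in clauses if not any(other < c for other in clauses)}

def to_dnf(policy):
    """Return the DNF of a monotone policy as a set of frozensets of attribute names."""
    if isinstance(policy, str):                   # a single attribute
        return {frozenset([policy])}
    op, *args = policy
    parts = [to_dnf(arg) for arg in args]
    if op == "or":                                # union of the children's clauses
        clauses = set().union(*parts)
    elif op == "and":                             # distribute AND over OR
        clauses = {frozenset().union(*combo) for combo in product(*parts)}
    else:
        raise ValueError(f"unknown operator: {op}")
    return absorb(clauses)

# x AND (y OR z) becomes (x AND y) OR (x AND z)
dnf_distributed = to_dnf(("and", "x", ("or", "y", "z")))
# x OR (x AND y) collapses to x
dnf_absorbed = to_dnf(("or", "x", ("and", "x", "y")))
```

Each resulting frozenset corresponds to one conjunct, that is, to one group of attributes whose secrets are concatenated into an aggregated secret.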
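Now we give the details of the algorithm. Svr converts ACP to DNF as follows:
\[
ACP = \bigvee_{i=1}^{\alpha} \mathit{conjunct}_i ,
\]
where there are $\alpha$ conjuncts, and
\[
\mathit{conjunct}_i = \bigwedge_{j=1}^{\phi_i} \mathit{cond}_j^{(i)} ,
\]
where each $\mathit{conjunct}_i$ has $\phi_i$ conditions.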
A simple multiplication of clauses (x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)) and then
application of the absorption law (x ∨ (x ∧ y) = x) are sufficient to convert monotone
policies to DNF. Even though there can be an exponential blow up of clauses during
multiplication, it has been shown that with the application of the absorption law
the number of clauses in the DNF, at the end, is always polynomially bounded. Svr
selects N such that
\[
N \ge \sum_{i=1}^{\alpha} NU_i = NU ,
\]
where $NU_i$ is the number of users satisfying $\mathit{conjunct}_i$ (see footnote 1). Svr creates NU ŝi ’s and adds
them to AS. Svr picks a random k ∈ KS as the shared group key. Svr chooses N
random bit strings $z_1, z_2, \dots, z_N \in \{0,1\}^{\ell}$. Svr creates an $NU \times (N+1)$ $F_q$-matrix $A$
such that for $1 \le i \le NU$
\[
a_{i,j} =
\begin{cases}
1 & \text{if } j = 1 \\
H(\hat{s}_i \| z_j) & \text{if } 2 \le j \le N;\ \hat{s}_i \in AS
\end{cases}
\tag{3.1}
\]
Svr then solves for a nonzero $(N+1)$-dimensional column $F_q$-vector $Y$ such that
$AY = 0$ and sets
\[
ACV = k \cdot e_1^T + Y, \quad \text{and} \quad PI = (ACV, (z_1, z_2, \dots, z_N)).
\]
KeyDer(βi , P I): Given βi , the set of secrets for Usri , it computes the aggregated
secret ŝ. Using ŝ and the public information P I, it computes ai,j , 1 ≤ j ≤ N, as in formula 3.1
and sets an (N +1)-dimensional row Fq -vector vi = (1, ai,1 , ai,2 , . . . , ai,N ). Usri
derives the group key k′ by the inner product of the vectors vi and ACV : k′ = vi · ACV .
The derived group key k′ is equal to the actual group key k if and only if the computed aggregated secret ŝ ∈ AS.
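The KeyGen and KeyDer steps can be made concrete with a small Python sketch. It is a simplified illustration, not the implementation evaluated in Chapter 2: the 127-bit Mersenne prime, SHA-256 as the hash H, the 16-byte strings z_j, and the function names `H`, `null_vector`, `key_gen` and `key_der` are assumptions chosen for the example. Rows follow the vector form vi = (1, ai,1 , . . . , ai,N ) used in KeyDer.

```python
import hashlib
import secrets

Q = 2**127 - 1  # illustrative prime modulus (assumption; not the 512-bit field of the experiments)

def H(secret: bytes, z: bytes) -> int:
    """Hash (secret || z) into F_Q; SHA-256 is an assumption of this sketch."""
    return int.from_bytes(hashlib.sha256(secret + z).digest(), "big") % Q

def null_vector(rows, ncols):
    """Return a nonzero Y with rows * Y = 0 (mod Q), using Gauss-Jordan elimination."""
    rows = [r[:] for r in rows]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        inv = pow(rows[r][c], -1, Q)                 # modular inverse (Python 3.8+)
        rows[r] = [x * inv % Q for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(x - f * y) % Q for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    free = [c for c in range(ncols) if c not in pivots]
    y = [0] * ncols
    for c in free:
        y[c] = 1 + secrets.randbelow(Q - 1)          # random nonzero free variables
    for row, pc in zip(rows, pivots):
        y[pc] = -sum(row[c] * y[c] for c in free) % Q
    return y

def key_gen(agg_secrets, N):
    """Build PI = (ACV, (z_1..z_N)) hiding a random key k; requires len(agg_secrets) <= N."""
    zs = [secrets.token_bytes(16) for _ in range(N)]
    A = [[1] + [H(s, z) for z in zs] for s in agg_secrets]
    acv = null_vector(A, N + 1)
    k = secrets.randbelow(Q)
    acv[0] = (acv[0] + k) % Q                        # ACV = k * e_1 + Y
    return k, (acv, zs)

def key_der(agg_secret, pi):
    """Derive k' = v . ACV with v = (1, H(s||z_1), ..., H(s||z_N))."""
    acv, zs = pi
    v = [1] + [H(agg_secret, z) for z in zs]
    return sum(a * b for a, b in zip(v, acv)) % Q

valid = [secrets.token_bytes(32) for _ in range(3)]  # three aggregated secrets in AS
k, pi = key_gen(valid, N=5)
```

With these definitions, `key_der(s, pi)` returns `k` for every `s` in `valid`, because the corresponding row of A is orthogonal to Y; a secret outside AS yields an unrelated field element except with negligible probability.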
Footnote 1: It should be noted that NU can be reduced to n, the number of users in the group, by exploiting
the relationships between conjuncts and letting the users know the conjunct, out of the many they
satisfy, they have to use to derive the key. We leave out this optimization to keep the scheme simple.
Update(S): The composition of the user group changes when one of the following occurs:
• Identity attributes are added or removed, resulting in a change in S and U A (see footnote 2).
• The underlying policy ACP changes.
When such a change occurs, a new symmetric key k ′ is selected and KeyGen(ACP)
is invoked to generate the updated public information P I ′ . Notice that the secrets
shared with existing users are not affected by the group change. It outputs the public
P I ′ and private k ′ .
3.1.2 Security
We can easily show that if an unbounded adversary A can break the inline AB-GKM
scheme in the random oracle model, a simulator S can be constructed to break
the ACV-BGKM scheme.
Definition 3.1.1 (Security game for AB-GKM)
Setup The challenger runs the Setup algorithm of AB-GKM and gives the public
parameters to the adversary.
Phase 1 The adversary is allowed to request secrets for any set of attributes γi
and the public information tuples for a policy satisfying these attributes. The public
information along with the secrets allows the adversary to derive the private key.
Challenge The adversary declares the set of attributes γ that it wishes to be challenged
upon. γ is different from any of the attribute sets γi that the adversary queried earlier.
The adversary submits two keys k0 and k1 . The challenger flips a random coin b and
chooses kb . The challenger generates public information for a policy P that γ, but not
any γi , satisfies, using the KeyGen algorithm and gives it to the adversary. The public
information hides the group key kb .
Phase 2 Phase 1 is repeated as many times as desired, provided that the adversary’s attribute
set does not satisfy P .
Guess The adversary outputs a guess b′ of b.
The advantage of an adversary A in this game is defined as Pr[b′ = b] − 1/2.
Footnote 2: A change in a user attribute is viewed as two events: removing the existing attribute and adding a
new attribute.
Definition 3.1.2 (Security under the random oracle model) An AB-GKM
scheme is secure under the random oracle model of security if all adversaries have at
most a negligible advantage in the above game.
Shang et al. [20, 39] have shown that the probability of breaking ACV-BGKM is
a negligible 1/q, where q is the ℓ bit large prime number initialized in Setup. We
capture the hardness of the ACV-BGKM scheme in the following assumption:
Definition 3.1.3 (ACV-BGKM Assumption) No adversary without any valid
secrets in the random oracle model can break the ACV-BGKM scheme with more
than a negligible probability.
Theorem 3.1.1 If an adversary can break the inline AB-GKM scheme in the random
oracle model, then a simulator can be constructed to break the ACV-BGKM scheme
with non-negligible advantage.
Proof Suppose that there exists an adversary A that can break our scheme in the
random oracle model with advantage ε. We build a simulator B that can break
the ACV-BGKM scheme with the same advantage ε. The simulation proceeds as
follows:
The challenger runs the setup algorithm of ACV-BGKM and generates secrets for
each attribute per user outside of B’s view. The simulator B runs A. B is given an
instance of ACV-BGKM and gives the public parameters to A. We assume that all
policies are in DNF such that each conjunctive term has only one attribute. The intuition
behind the assumption is that inline AB-GKM is an extension of ACV-BGKM
to support aggregate secrets and, therefore, in the absence of aggregate secrets, inline
AB-GKM is equivalent to ACV-BGKM.
Phase 1 A submits sets of attributes γi to B and B sends the secrets using the
ACV-BGKM instance.
Challenge A submits the attribute set γ ≠ γi as the challenge and two keys k0
and k1 . B flips a random coin b and chooses kb and then, using the ACV-BGKM instance,
it generates the public information for a policy P that only γ satisfies, hiding kb .
Phase 2 A and B repeat Phase 1 as many times as desired, provided A’s attribute sets do
not satisfy P .
Guess Using the public information and the information gathered from the two
phases, A outputs a guess b′ of b. Notice that the view of A when it is run as a
subroutine of B and when it is run directly with the inline AB-GKM scheme is identical.
In other words, B simulates an instance of the inline AB-GKM for A using
an instance of the ACV-BGKM scheme. The simulation is trivial as the aggregate
secrets in AB-GKM are the same as the secrets in ACV-BGKM. It should be noted that
A does not gain an advantage greater than ε from the information gathered from the
repeated execution of Phase 1 due to the key indistinguishability and key independence
properties of the ACV-BGKM scheme [39].
It can easily be seen that B has the same advantage of breaking the ACV-BGKM
scheme as A has on the inline AB-GKM scheme. As per the definitions, B breaks the
ACV-BGKM with Pr[b′ = b] = 1/2 + ε. According to the assumption on the hardness
of the ACV-BGKM scheme in Definition 3.1.3, it follows that ε must be negligible.
3.1.3 Performance
Now, we discuss the efficiency of inline AB-GKM with respect to computational
costs and required bandwidth for rekeying.
For any Usri in the group, deriving the shared group key requires N hashing
operations (evaluations of H(·)) and an inner product computation vi · ACV of two
(N + 1)-dimensional Fq -vectors, where N is the maximum group size. Therefore the
overall computational complexity is O(n).
For every rekeying operation, Svr needs to form a matrix A by performing $N^2$
hashing operations, and then solve a linear system of size N × (N + 1). Solving the
linear system is the most costly operation on Svr as N gets large.
It requires $O(n^3)$ field operations in Fq when the method of Gauss-Jordan elimination [34]
is applied. Experimental results about the ACV-BGKM scheme [20] have
shown that this can be performed in a short time when N is small.
When a rekeying process takes place, the new information to be broadcast is
P I = (ACV, (z1 , . . . , zN )), where ACV is a vector consisting of (N + 1) elements in
Fq , and without loss of generality we can pick zi to be strings of fixed length. This
gives an overall communication complexity O(n). An advantage of inline AB-GKM
is that no peer-to-peer private channel is needed for any persisting group members
when rekeying is executed.
Storage costs are generally a lesser concern for both Svr and Usrs. Nevertheless,
for a group of maximum N users, in the worst case, inline AB-GKM only
requires each Usr to store O(|A|) secrets, one secret per attribute that Usr possesses,
and Svr to keep track of all O(n|A|) secrets.
3.2 Scheme 2: Threshold AB-GKM
Consider now the case of policies by which a user can derive the symmetric group
key k, if it possesses at least d attributes out of the m attributes associated with the
group. We refer to such policies as threshold policies. Under the inline AB-GKM
scheme presented in Section 3.1, with such threshold policies the size of the access
control matrix (A) increases exponentially if users are not informed which attributes
to use. Specifically, to support d-out-of-m, the inline AB-GKM scheme may require
creating a matrix of dimension up to $O(n m^d)$ where n is the number of users in the
group. Thus, the inline AB-GKM scheme is not suitable for threshold policies. In
this section, we construct a new scheme, threshold AB-GKM, which overcomes this
shortcoming.
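A short worked count illustrates the blow-up. This is our own reading of the bound rather than a derivation taken from the thesis: after absorption, the DNF of a d-out-of-m policy contains one conjunct per d-element subset of the m attributes, and each of the n users can contribute at most one aggregated secret per conjunct.

```latex
\[
  \#\text{conjuncts} = \binom{m}{d} \le m^{d},
  \qquad
  NU = \sum_{i=1}^{\binom{m}{d}} NU_i \le n \binom{m}{d} .
\]
\[
  m = 10,\ d = 5: \quad \binom{10}{5} = 252,
  \qquad\text{so up to } 252\,n \text{ rows in } A .
\]
```

Even for this modest policy, a user holding all ten attributes satisfies all 252 conjuncts, which is why the inline scheme is avoided for threshold policies.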
An initial construction to enforce threshold policies is to associate each user with
a random d − 1 degree polynomial, q(x), with the restriction that each polynomial
has the same value at x = 0 and q(0) = k, where k is the symmetric group key. For
each attribute users have, they are given a secret value. The secret values given to
a user are tied to its random polynomial q(x). A user having d or more secrets can
perform a Lagrange interpolation to obtain q(x) and thus the symmetric group key
k = q(0). Since the secrets are tied to random polynomials, multiple users are unable
to combine their secrets in any way that enables collusion attacks. However,
revocation is difficult in this simple approach and requires re-issuing all the secrets
again.
Our approach to address the revocation problem is to use a layer of indirection between the secrets given to users and the random polynomials such that revocations do
not require re-issuing all the secrets again. We use a modified ACV-BGKM construction as the indirection layer. We cannot directly use the ACV-BGKM construction
since multiple instances of ACV-BGKM allow collusion attacks in which colluding
users can recover the group key which they cannot obtain individually. We first
show the details of the modified ACV-BGKM scheme and then present the threshold
AB-GKM which uses the modified ACV-BGKM scheme and Shamir’s secret sharing
scheme.
3.2.1 Our Construction
Now we provide our construction of the threshold AB-GKM scheme which utilizes
the modified ACV-BGKM scheme, ACV-BGKM-2, presented in Section 2.6.
Recall that in this scheme, we wish to allow a user to derive the symmetric group
key k if the user possesses at least d attributes out of m. For each user Usri we associate
a random d − 1 degree polynomial qi (x) with the restriction that each polynomial
has the same value k, the symmetric group key, at x = 0, that is, qi (0) = k. We
associate a random secret value with each user attribute. For each attribute attri ,
we generate a public information tuple (P Ii ) using the modified ACV-BGKM scheme
with the restriction that the temporary key that each Usrj derives is tied to its random
polynomial qj (x), that is qj (i) = ki . Notice that each user obtains different temporary
keys from the same P I. If a user can derive d temporary keys corresponding to d
attributes, it can compute its random polynomial q(x) and obtain the group symmetric
key k. Notice that, since the temporary keys are tied to a unique polynomial, multiple
users are unable to collude and combine their temporary keys in order to obtain the
symmetric group key which they are not allowed to obtain individually. Thus, our
construction prevents collusion attacks.
A detailed description of our threshold AB-GKM scheme follows.
Setup(ℓ) Svr initializes the parameters of the underlying modified ACV-BGKM
scheme: the ℓ-bit prime number q, the maximum group size N (≥ n), the cryptographic hash function H, the key space KS, the secret space SS, the set of issued secrets S, the user-attribute matrix U A and the universe of attributes A =
{attr1 , attr2 , · · · , attrm }.
Svr defines the Lagrange coefficient $\Delta_{i,Q}$ for $i \in F_q$ and a set $Q$ of elements in $F_q$
as
\[
\Delta_{i,Q}(x) = \prod_{j \in Q,\, j \ne i} \frac{x - j}{i - j} .
\]
SecGen(γi ) For each attribute attrj ∈ γi , where γi ⊂ A and γi is the attribute set of
Usri , Svr invokes SecGen() of the modified ACV-BGKM scheme in order to obtain
the random secret si,j . It returns βi , the set of secrets for all the attributes in γi .
KeyGen(α, d) Taking α, a subset of attributes from the attribute universe A and d,
the threshold value, for each user Usri , Svr assigns a random degree d − 1 polynomial
qi (x) with qi (0) set to the group symmetric key k. For each attribute attrj in the set of
attributes α (α ⊂ A and |α| ≥ d), it selects the set of secrets corresponding to attrj ,
Sj and invokes KeyGen(Sj , {q1 (j), q2 (j), · · · , qN (j)}) of the modified ACV-BGKM
scheme to obtain P Ij , the public information tuple for attrj . It outputs the private
group key k and the set of public information tuples PI = {P Ij | for each attrj ∈ α}.
KeyDer(βi , PI) Using the set of d secrets βi = {si,j |1 ≤ j ≤ N } for the d attributes
attrj , 1 ≤ j ≤ N , and the corresponding d public information tuples P Ij ∈ PI,
1 ≤ j ≤ N , it derives the group symmetric key k as follows.
First, it derives the temporary key kj for each attribute attrj using the underlying
modified ACV-BGKM scheme as KeyDer(si,j , P Ij ). Then, using the set of d points
Qi = {(j, kj )|1 ≤ j ≤ N }, it computes qi (x) as follows:
\[
q_i(x) = \sum_{j \in Q_i} k_j \, \Delta_{j,Q_i}(x), \qquad
\Delta_{j,Q_i}(x) = \prod_{t \in Q_i,\, t \ne j} \frac{x - t}{j - t} .
\]
It outputs the group key k = qi (0).
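The polynomial part of KeyGen and KeyDer can be sketched in Python as follows. The sketch deliberately omits the modified ACV-BGKM layer and hands the temporary keys kj = qi (j) to the user directly; the prime modulus and the function names `random_poly`, `evaluate` and `lagrange_at` are assumptions made for illustration.

```python
import secrets

Q = 2**127 - 1  # illustrative prime modulus (assumption, not the thesis parameter)

def random_poly(k, d):
    """Random polynomial of degree at most d-1 over F_Q with q(0) = k (coefficient list)."""
    return [k] + [secrets.randbelow(Q) for _ in range(d - 1)]

def evaluate(poly, x):
    """Evaluate the polynomial at x using Horner's rule, mod Q."""
    acc = 0
    for coeff in reversed(poly):
        acc = (acc * x + coeff) % Q
    return acc

def lagrange_at(points, x):
    """Evaluate at x the polynomial interpolating `points` = [(j, k_j), ...], mod Q."""
    total = 0
    for j, kj in points:
        num, den = 1, 1
        for t, _ in points:
            if t != j:                       # Lagrange coefficient for index j
                num = num * (x - t) % Q
                den = den * (j - t) % Q
        total = (total + kj * num * pow(den, -1, Q)) % Q
    return total

k, d = secrets.randbelow(Q), 3               # group key and threshold
q1, q2 = random_poly(k, d), random_poly(k, d)  # polynomials of two different users

# User 1 holds attributes 1, 2 and 4 and therefore d = 3 temporary keys.
own = [(j, evaluate(q1, j)) for j in (1, 2, 4)]
# Colluders mix two temporary keys of user 1 with one of user 2.
mixed = [(1, evaluate(q1, 1)), (2, evaluate(q1, 2)), (4, evaluate(q2, 4))]
```

Here `lagrange_at(own, 0)` recovers `k`, whereas `lagrange_at(mixed, 0)` does not (except with negligible probability), since the mixed points do not lie on a single user's polynomial.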
Update(α, d) The Update algorithm is invoked whenever α, the attribute set considered, or d, the threshold value, or the group members satisfying the threshold policy
change. The group membership changes for reasons similar to those mentioned under the
Update algorithm in Section 3.1.1. In such a situation, a new symmetric group key
k′ is selected and KeyGen(α, d) is invoked to generate the set of new public information
tuples PI′. Notice that the secrets shared with existing users are not affected
by the group change.
3.2.2 Security
If an unbounded adversary can break our threshold AB-GKM scheme, a simulator
can be constructed to break the modified ACV-BGKM scheme. We give only a high-level
sketch of the reduction-based proof, as it is similar to the proof for the
inline AB-GKM scheme.
Proof Suppose that an unbounded adversary A having a set of d − 1 attributes α
can break our scheme in the random oracle model with advantage ε. Note that this
is the most powerful adversary as it possesses d − 1 attributes out of the d attributes
required to derive the group key. We build a simulator B that can derive the key kd
from P Id corresponding to attrd ∉ α with the same advantage ε using A as a subroutine.
In other words, we build a simulator to break the modified ACV-BGKM scheme.
The intuition behind our proof is that, by construction, the modified ACV-BGKM
instances corresponding to the attributes are independent. In other words, a user who
can access the key for one attribute only has a negligible advantage in obtaining the
key for another attribute using the known attributes due to the key indistinguishability and independence properties of the ACV-BGKM scheme.
The challenger creates an instance of the modified ACV-BGKM scheme for each
of the n attributes. A obtains secrets {si |i = 1, 2, · · · , d−1} for the attributes α it has
from B. The challenger constructs the public information tuples {P Ii |i = 1, 2, · · · , d},
each having a random key ki and gives them to B. B in turn gives them to A. Notice
that the view of A is identical to that of A interacting directly with an instance
of the threshold AB-GKM scheme, even though it is simulated. The random keys
correspond to a random degree d−1 polynomial q(x). Notice that A possesses secrets
to obtain the random keys ki , 1 ≤ i ≤ d − 1 and can derive the secret key kd with an
advantage ε from the public information tuples.
We omit the details of the security game defined in the previous section. As mentioned
in the game, A may execute the threshold AB-GKM scheme for different sets of
attributes that do not satisfy the challenge threshold policy and do not include attrd .
As mentioned earlier, A does not gain any additional advantage by such executions.
After executing Phase 1 of the security game as many times as desired, A outputs k,
which is equal to q(0). This allows B to fully determine q(x) as it now has d points
and derive the key kd = q(d). In other words, it allows B to break the modified ACV-BGKM
scheme to recover the intermediate key kd from the public information tuple
P Id without the knowledge of the secret sd . In our technical report [40], we show
that the probability of breaking the modified ACV-BGKM scheme is a negligible
$1/q^{N}$, where q is the ℓ-bit prime number and N is the maximum number of users.
Therefore, it follows that ε must be negligible.
3.2.3 Performance
We now discuss the efficiency of the threshold AB-GKM with respect to computational costs and required bandwidth for rekeying.
For any Usri in the group, deriving the shared group key requires $\sum_{i=1}^{d} N_i$ hashing
operations (evaluations of H(·)), where Ni is the maximum number of users having
attri ; d inner product computations vi · ACVi of two (2Ni )-dimensional Fq -vectors;
and the Lagrange interpolation, which costs $O(m \log^2 m)$, where m = |A|. Therefore, the
overall computational complexity is $O(dn + m \log^2 m)$. Notice that the inner product
computations are independent and can be parallelized to improve performance.
For every rekeying phase, for each attri , Svr needs to form a matrix Ai by performing
$N_i^2$ hashing operations, and then solve a linear system of size Ni × (2Ni ). Solving
the linear system is the most costly operation on Svr as Ni gets large;
it requires $O(\sum_{i=1}^{m} n^3)$ field operations in Fq .
When a rekeying process takes place, the new information to be broadcast is
P Ii = (ACVi , (z1 , . . . , zNi )), i = 1, 2, · · · , m, where ACVi is a vector consisting of
(2Ni ) elements in Fq , and without loss of generality we can pick zi to be strings with
a fixed length. This gives an overall communication complexity $O(\sum_{i=1}^{m} n)$.
For a group of maximum N users, in the worst case, the threshold AB-GKM only
requires each Usr to store O(m) secrets, one secret per attribute that Usr possesses,
and Svr to keep track of all O(nm) secrets.
3.3 Scheme 3: Access Tree AB-GKM
In the inline AB-GKM scheme, the policy ACP is embedded into the BGKM
scheme itself. As discussed in Section 3.2, while this approach works for many different types of policies, such an approach is not able to efficiently support threshold
access control policies. Scheme 2, threshold AB-GKM, on the other hand, is able to
efficiently support threshold policies, but it is unable to support other policies. In
order to support more expressive policies, we extend the threshold AB-GKM scheme.
Like threshold AB-GKM, instead of embedding ACP in the BGKM scheme, we construct a separate BGKM instance for each attribute. Then, we embed ACP in an
access structure T . T is a tree with the internal nodes representing threshold gates
and the leaves representing attributes. The construction of T is similar to that of the
approach by Goyal et al. [17]. However, unlike Goyal et al.’s approach, the goal of
our construction is to derive the group key for the users whose attributes satisfy the
access structure T .
3.3.1 Access Tree
Let T be a tree representing an access structure. Each internal node of the tree
represents a threshold gate. A threshold gate is described by its child nodes and a
threshold value. If nx is the number of children of a node x and tx is its threshold
value, then 0 < tx ≤ nx . Notice that when tx = 1, the threshold gate is an OR gate
and when tx = nx , it is an AND gate. Each leaf node x of the tree is described by an attribute, a corresponding BGKM instance and a threshold value tx = 1. The children of each node x are indexed from 1 to nx .

Table 3.1: Access tree functions
Function      Description
index(x)      Returns the index of node x
parent(x)     Returns the parent node of node x
attr(x)       Returns the index of the attribute associated with a leaf node x
qx            The polynomial assigned to node x
sat(Tx , α)   Returns 1 if the set of attributes α satisfies Tx , the subtree rooted at node x, and 0 otherwise

We define the functions in Table 3.1 in order to construct our scheme. All the
functions except sat are straightforward to implement. A brief description of sat
follows:
The function sat(Tx , α) works as a recursive function. If x is a leaf node, it returns
1, provided that the attribute associated with x is in the set of attributes α and 0
otherwise. If x is an internal node, if at least tx child nodes of x return 1, then
sat(Tx , α) returns 1 and 0 otherwise.
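The recursion described above can be sketched in a few lines of Python. This is only an illustrative sketch: the Node class, its field names and the small example tree are assumptions introduced here and are not part of the scheme's specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Access tree node: a leaf carries an attribute, an internal node a threshold."""
    threshold: int = 1                      # t_x; always 1 for a leaf
    attribute: Optional[str] = None         # set only for leaf nodes
    children: List["Node"] = field(default_factory=list)


def sat(x: Node, alpha: Set[str]) -> int:
    """Return 1 if the attribute set alpha satisfies the subtree rooted at x, else 0."""
    if not x.children:                      # leaf node: check attribute membership
        return 1 if x.attribute in alpha else 0
    satisfied = sum(sat(child, alpha) for child in x.children)
    return 1 if satisfied >= x.threshold else 0


# Hypothetical tree: "a" AND (1-out-of-{"b", "c"})
tree = Node(threshold=2, children=[
    Node(attribute="a"),
    Node(threshold=1, children=[Node(attribute="b"), Node(attribute="c")]),
])
```

With this hypothetical tree, sat(tree, {"a", "c"}) evaluates to 1, while sat(tree, {"b", "c"}) evaluates to 0 because the AND gate at the root is missing attribute "a".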
3.3.2 Our Construction
The access tree AB-GKM scheme consists of five algorithms:
Setup(ℓ): Svr initializes the parameters of the underlying modified ACV-BGKM
scheme: the prime number q, the maximum group size N (≥ n), the cryptographic
hash function H, the key space KS, the secret space SS, the set of issued secrets S, the
user-attribute matrix U A and the universe of attributes A = {attr1 , attr2 , · · · , attrm }.
Svr defines the Lagrange coefficient ∆i,Q for i ∈ Fq and a set Q of elements in Fq :
$$\Delta_{i,Q}(x) = \prod_{j \in Q,\, j \neq i} \frac{x - j}{i - j}.$$
SecGen(γi ): Taking γi , the attribute set of Usri , as input, for each attribute attrj ∈
γi , where γi ⊂ A, Svr invokes SecGen() of the modified ACV-BGKM scheme to
obtain the random secret si,j . It returns βi , the set of secrets for all the attributes in
γi .
KeyGen(ACP): Svr transforms the policy ACP into an access tree T . The algorithm outputs the public information which a user can use to derive the group key
if and only if the user’s attributes satisfy the access tree T built for the policy ACP.
The algorithm constructs the public information as follows.
For each user Usri having the intermediate set of keys Ki = {ki,j |1 ≤ j ≤ m},
where ki,j represents the intermediate key for Usri and attrj , the following construction
is performed. For each attribute attri , there is a leaf node in T . The construction of
the tree is performed top-down. Each node x in the tree is assigned a polynomial qx .
The degree dx of the polynomial qx is set to tx − 1, that is, one less than the threshold
value of the node. For the root node r, qr (0) is set to the group key k and dr other
points are chosen uniformly at random so that qr is a unique polynomial of degree dr
fully defined through Lagrange interpolation. For any other node x, qx (0) is set to
qparent(x) (index(x)) and dx other points are chosen uniformly at random to uniquely
define qx . For each leaf node x corresponding to a unique attribute attrj , qx (0) is set
to qparent(x) (1) and ki,j = qx (0).
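The top-down assignment of polynomials can be illustrated with the following Python sketch for a single user. Several choices here are assumptions made for illustration: the toy modulus Q, the Node class, sampling random coefficients (equivalent to choosing dx random points), and applying the general rule qx(0) = qparent(x)(index(x)) to leaf nodes as well. In the scheme, each resulting leaf value would then be passed to the modified BGKM KeyGen for the corresponding attribute.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Q = 2**61 - 1  # toy prime modulus (assumption); the scheme uses an l-bit prime q


@dataclass
class Node:
    threshold: int = 1
    attribute: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def eval_poly(coeffs: List[int], y: int) -> int:
    """Evaluate the polynomial with coefficients [c0, c1, ...] at y over F_Q (Horner)."""
    result = 0
    for c in reversed(coeffs):
        result = (result * y + c) % Q
    return result


def assign_keys(x: Node, secret: int, leaf_keys: Dict[str, int]) -> None:
    """Fix q_x(0) = secret, pick q_x of degree t_x - 1, and recurse into the children."""
    if not x.children:
        leaf_keys[x.attribute] = secret     # intermediate key for this attribute
        return
    # q_x(0) is the constant term; the remaining t_x - 1 coefficients are random.
    coeffs = [secret] + [secrets.randbelow(Q) for _ in range(x.threshold - 1)]
    for index, child in enumerate(x.children, start=1):
        assign_keys(child, eval_poly(coeffs, index), leaf_keys)


# Hypothetical policy: "nurse" AND "senior" (a 2-out-of-2 gate at the root).
tree = Node(threshold=2, children=[Node(attribute="nurse"), Node(attribute="senior")])
group_key = secrets.randbelow(Q)
leaf_keys: Dict[str, int] = {}
assign_keys(tree, group_key, leaf_keys)
```

For the 2-out-of-2 root in this example the polynomial is a line, so the group key can be recovered from the two leaf values as 2·q(1) − q(2) mod Q, which is a quick sanity check on the assignment.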
At the end of the above computation, we have all the sets of intermediate keys
K = {Ki |Usri , 1 ≤ i ≤ N }. For each leaf node x, the modified BGKM algorithm
KeyGen(Sx , Kx ), where Sx is the set of secrets corresponding to the attribute associated with the node x and Kx = {ki,j |1 ≤ i ≤ N, attrj }, j = attr(x), is invoked to generate the public information tuple P Ix . We denote the set of all the public information tuples by PI = {P Ij |attrj , 1 ≤ j ≤ m}.
KeyDer(βi , PI): Given βi , a set of secret values corresponding to the attributes
of Usri , and the set of public information tuples PI, it outputs the group key k.
The key derivation is a recursive procedure that takes βi and PI to derive k
bottom-up. Note that a user can obtain the key if and only if its attributes satisfy
the access tree T , i.e., sat(Tr , βi ) = 1. The high-level description of the key derivation
is as follows.
For each leaf node x corresponding to the attribute with the user’s secret value
sx ∈ βi , the user derives the intermediate key kx using the underlying modified
BGKM scheme KeyDer(sx , P Ix ). Using Lagrange interpolation, the user recursively
derives the intermediate key kx for each internal ancestor node x until the root node
r is reached and kr = k. Notice that since intermediate keys are tied to unique
polynomials, users cannot collude to derive the group key k if they are unable to
derive it individually. A detailed description follows.
If x is a leaf node, it returns an empty value ⊥ if attr(x) ∉ βi , otherwise it returns
the key kx = vx · ACVx , where vx is the key derivation vector corresponding to the
attribute attrattr(x) and ACVx the access control vector in P Ix .
If x is an internal node, it returns an empty value ⊥ if the number of children
nodes having a non-empty key is less than tx , otherwise it returns kx as follows:
Let the set Qx contain the indices of tx children nodes having non-empty keys {ki |i ∈ Qx }. Then
$$\Delta_{i,Q_x}(y) = \prod_{j \in Q_x,\, j \neq i} \frac{y - j}{i - j},$$
$$q_x(y) = \sum_{i \in Q_x} k_i \, \Delta_{i,Q_x}(y),$$
$$k_x = q_x(0).$$
The above computation is performed recursively until the root node is reached.
If Usri satisfies T , Usri gets k = qr (0), where r is the root node. Otherwise, Usri gets
an empty value ⊥.
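The bottom-up derivation can be sketched as follows. This is a simplified illustration under stated assumptions: the toy modulus Q and the Node class are hypothetical, None stands in for the empty value ⊥, and the dictionary leaf_keys stands in for the per-attribute keys kx = vx · ACVx that the user would obtain from the underlying modified BGKM KeyDer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Q = 2**61 - 1  # toy prime modulus (assumption); the scheme uses an l-bit prime q


@dataclass
class Node:
    threshold: int = 1
    attribute: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def lagrange_at_zero(points: Dict[int, int]) -> int:
    """Interpolate q(0) over F_Q from the points {child index i: q(i)}."""
    total = 0
    for i, k_i in points.items():
        num, den = 1, 1
        for j in points:
            if j != i:
                num = (num * (0 - j)) % Q
                den = (den * (i - j)) % Q
        total = (total + k_i * num * pow(den, -1, Q)) % Q
    return total


def derive(x: Node, leaf_keys: Dict[str, int]) -> Optional[int]:
    """Return k_x, or None (the empty value) if the subtree rooted at x is unsatisfied."""
    if not x.children:
        return leaf_keys.get(x.attribute)
    points: Dict[int, int] = {}
    for index, child in enumerate(x.children, start=1):
        if len(points) == x.threshold:      # t_x non-empty child keys are enough
            break
        k = derive(child, leaf_keys)
        if k is not None:
            points[index] = k
    if len(points) < x.threshold:
        return None
    return lagrange_at_zero(points)
```

For instance, for a hypothetical 2-out-of-3 root whose polynomial is q(y) = 42 + 7y, a user holding the leaf keys q(1) = 49 and q(3) = 63 recovers q(0) = 42, whereas a user holding only q(2) = 56 obtains the empty value.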
Update(ACP): The group membership changes for reasons similar to those mentioned for the Update algorithm in Section 3.1.1. In such a situation, a new symmetric group key k ′ is selected and KeyGen(ACP) is invoked to generate the set of new public information tuples PI′. Like the previous two schemes, the secrets shared with existing users are not affected by the group change.
3.3.3 Security
If an unbounded adversary can break our access tree AB-GKM scheme, a simulator
can be constructed to break the modified ACV-BGKM scheme. Like the previous
scheme, we only give a high-level sketch of the reduction-based proof.
Proof Suppose that an unbounded adversary A, using as the challenge set a set of attributes α that does not satisfy the access tree T , breaks our scheme in the random oracle model with advantage at most ε. Let the root node of T be r and the group key k = qr (0). Notice that since A does not satisfy T and qr (x) is a tr -out-of-nr threshold scheme, which represents any type of threshold node, A satisfies no more than tr − 1 subtrees rooted at children of r out of the nr subtrees. By inference, it is easy to see that A does not satisfy at least one leaf node.
The challenger constructs modified ACV-BGKM instances for each of the attributes and gives them to the simulator B. A obtains secrets for each of the attributes in α. B sends the public information tuples and the access tree T to A. Notice that A can easily derive the keys for any attribute in α, but it can derive the keys for any other attribute only with an advantage of ε. According to the assumption, A does not satisfy at least one attribute required to satisfy T . Let that attribute be attrx . A derives kx from P Ix corresponding to one such unsatisfied leaf node with advantage ε. Therefore, A derives the group key k with an advantage of at most ε.
Like the proof in Section 3.2, A derives the group key k after executing phase 1 of the security game as many times as required and gives k to B. Now, B works down T to recover the keys for nodes originally unsatisfied by A using Lagrange interpolation. For example, using k and the keys of tr − 1 children, B obtains the key ktr for the tr-th child node of r. Finally, B obtains the key kx for an unsatisfied leaf node x corresponding to attrx . In other words, it allows B to break the modified ACV-BGKM scheme to recover the key kx from the public information tuple P Ix without the knowledge of the secret sx . As mentioned earlier, the probability of breaking the modified ACV-BGKM scheme by applying the KeyDer algorithm is a negligible $1/q^N$, where q is the ℓ-bit prime number and N is the maximum number of users. Therefore, it follows that ε must be negligible.
3.3.4 Performance
We now discuss the efficiency of access tree AB-GKM with respect to computational costs and required bandwidth for rekeying.
For any Usri in the group, deriving the shared group key requires $\sum_{i=1}^{d} N_i$ hashing operations (evaluations of H(·)), where d = |βi | and Ni is the maximum number of users having attri ; d inner product computations vi · ACVi of two (2Ni )-dimensional Fq -vectors; and M Lagrange interpolations costing $O(M m \log^2 m)$, where M is equal to the number of internal nodes in T and m = |A|. Therefore, the overall computational complexity is $O(dn + M m \log^2 m)$. Notice that the inner product computations are
independent and can be parallelized to improve performance.
The cost of rekeying, communication and storage are comparable to those of the
threshold scheme presented in Section 3.2.
3.4 Example Application
Among other applications, fine-grained access control in a group setting using
broadcast encryption is an important application of the AB-GKM schemes. We illustrate the access-tree AB-GKM scheme using a healthcare scenario [20, 41]. We refer
the reader to our technical report [40] for more examples. A hospital (Svr) supports
fine-grained access control on electronic health records (EHRs) [42, 43] by encrypting
and making the encrypted records available to hospital employees (Usrs). Typical
hospital users include employees playing different roles such as receptionist, cashier,
doctor, nurse, pharmacist, system administrator and non-employees such as patients.
An EHR document is divided into data items including BillingInfo, ContactInfo, Medication, PhysicalExam, LabReports and so on. In accordance with regulations such
as the Health Insurance Portability and Accountability Act (HIPAA), the hospital policies
specify which users can access which data item(s). A cashier, for example, need not
have access to data in EHRs except for the BillingInfo, while a doctor or a nurse need
not have access to BillingInfo. These policies can be based on the content of EHRs
themselves. An example of such a policy is that “information about a patient with cancer
can only be accessed by the primary doctor of the patient”. In addition, patients
define their own privacy policies to protect their EHRs. For example, a patient’s
policy may specify that “only the doctors and nurses who support her insurance plan
can view her EHR”.
In order to support content-based access control, the hospital maintains some
associations among users and data. Table 3.2 shows the insurance plans supported
by each doctor and nurse, identified by the pseudonym “Employee ID”.
The hospital runs the Setup algorithm to initialize system parameters and issues
secrets to employees by running the SecGen algorithm. Table 3.3 shows the content
of the user attribute matrix U A that the hospital maintains. (Small numbers are
used for illustrative purposes.)
Table 3.2: Insurance plans supported by doctors/nurses
EmployeeID   Role/level     Insurance Plan(s)
emp1         doctor         MedB, ACME
emp2         doctor         ACME
emp3         nurse/junior   ACME
emp4         nurse/senior   MedA
emp5         nurse/senior   MedC
emp6         doctor         MedA
emp7         doctor         MedB, ACME
emp8         nurse/senior   MedA
emp9         nurse/senior   MedA, MedB, ACME
Table 3.3: User attribute matrix
Emp ID   doctor   nurse   senior   junior   MedA   MedB   MedC   ACME
emp1     100      ⊥       ⊥        ⊥        ⊥      111    ⊥      102
emp2     120      ⊥       ⊥        ⊥        ⊥      ⊥      ⊥      105
emp3     ⊥        106     ⊥        120      ⊥      ⊥      ⊥      121
emp4     ⊥        103     150      ⊥        175    ⊥      ⊥      ⊥
emp5     ⊥        133     151      ⊥        ⊥      ⊥      161    ⊥
emp6     129      ⊥       ⊥        ⊥        141    ⊥      ⊥      ⊥
emp7     119      ⊥       ⊥        ⊥        ⊥      133    ⊥      137
emp8     ⊥        143     152      ⊥        115    ⊥      ⊥      ⊥
emp9     ⊥        109     156      ⊥        117    119    ⊥      124
Now we illustrate the use of the access tree AB-GKM scheme. Consider the
following policy specification on the Medication data item of the EHR. “A senior
nurse supporting at least two insurance plans can access Medication of any patient”.
In order to implement this access control policy, we need to consider attributes role,
level and insurance plan. The access control policy looks as follows:
ACP = (“role = nurse” ∧ “level = senior” ∧ “2-out-of-{MedA, MedB, MedC,
ACME}”)
Table 3.4: List of employees satisfying each insurance plan
Attribute   Employee IDs
MedA        emp4 , emp6 , emp8 , emp9
MedB        emp1 , emp7 , emp9
MedC        emp5
ACME        emp1 , emp2 , emp3 , emp7 , emp9
In addition to Table 3.4 containing the list of employees satisfying insurance plans,
the hospital maintains the list of employees satisfying the attributes nurse and senior
as shown in Table 3.5.
Table 3.5: List of employees satisfying attributes
Attribute   Employee IDs
nurse       emp3 , emp4 , emp5 , emp8 , emp9
senior      emp4 , emp5 , emp8 , emp9
The above policy can be represented using an access tree with two internal nodes
and six leaf nodes. The root node is an AND gate and has three children. The
first and second children of the root node represent the attributes nurse and senior, respectively, and the third child of the root node is a 2-out-of-4 threshold gate which
has four children representing the four insurance plans.
The hospital executes the KeyGen algorithm to generate six P I tuples and encrypts the Medication data items with the group symmetric key k:
PI_MedA = (ACV_MedA , (z1 , z2 , z3 , z4 ))
PI_MedB = (ACV_MedB , (z5 , z6 , z7 ))
PI_MedC = (ACV_MedC , (z8 ))
PI_ACME = (ACV_ACME , (z9 , z10 , z11 , z12 , z13 ))
PI_nurse = (ACV_nurse , (z14 , z15 , z16 , z17 , z18 ))
PI_senior = (ACV_senior , (z19 , z20 , z21 , z22 ))
Expressive access control. Notice that only one employee, emp9 , can derive the group
key k using KeyDer algorithm to decrypt Medication data items.
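The claim that only emp9 qualifies can be checked directly against the attribute data of Tables 3.2 and 3.3. The short Python sketch below evaluates the policy as a plain predicate over attribute sets; it illustrates the policy logic only, not the cryptographic key derivation, and the names EMPLOYEES, PLANS and satisfies_acp are introduced here for illustration.

```python
# Attributes held by each employee, transcribed from Tables 3.2 and 3.3.
EMPLOYEES = {
    "emp1": {"doctor", "MedB", "ACME"},
    "emp2": {"doctor", "ACME"},
    "emp3": {"nurse", "junior", "ACME"},
    "emp4": {"nurse", "senior", "MedA"},
    "emp5": {"nurse", "senior", "MedC"},
    "emp6": {"doctor", "MedA"},
    "emp7": {"doctor", "MedB", "ACME"},
    "emp8": {"nurse", "senior", "MedA"},
    "emp9": {"nurse", "senior", "MedA", "MedB", "ACME"},
}
PLANS = {"MedA", "MedB", "MedC", "ACME"}


def satisfies_acp(attrs):
    """ACP: nurse AND senior AND 2-out-of-{MedA, MedB, MedC, ACME}."""
    return "nurse" in attrs and "senior" in attrs and len(attrs & PLANS) >= 2


authorized = sorted(e for e, attrs in EMPLOYEES.items() if satisfies_acp(attrs))
print(authorized)  # ['emp9']
```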
Collusion resistance. Notice that emp4 supports MedA and emp5 supports MedC and both of them are senior nurses. It may appear that these two employees can collude to derive the group key k. However, in this particular example, the access tree AB-GKM scheme associates each user with two unique polynomials, one for the AND gate and another for the threshold gate. Since neither of them individually satisfies the access tree, KeyDer results in an incorrect key even if they pool their intermediate keys.
Handling user dynamics. Assume that emp4 starts to support the insurance plan
ACME in addition to MedA. The hospital re-generates the public information by
adding emp4 to the calculation of P IACM E and associating a new group key k ′ . Now
emp4 is able to derive k ′ using KeyDer as its attributes satisfy the access tree.
Notice that the change in the user attributes does not affect the secret information that each existing employee holds. A similar approach is taken when one or more of these attributes are revoked from an existing employee. It should be noted that, like the first two schemes, this scheme has the added flexibility to support changes to the access tree by requiring only changes to the public information.
3.5 Experimental Results
In this section we provide experimental results for the underlying optimized ACV-BGKM scheme used with all three AB-GKM schemes presented earlier. We compare our results with the CP-ABE scheme with comparable security parameters.
The experiments were performed on a machine running GNU/Linux kernel version 2.6.32 with an Intel® Core™ 2 Duo CPU E8400 3.00GHz and 3.2 Gbytes memory.
Only one processor was used for computation. Our prototype system is implemented
in C/C++. We use V. Shoup’s NTL library [37] version 5.4.2 for finite field arithmetic, and SHA-1 and AES-128 implementations of OpenSSL [38] version 1.0.0d for
cryptographic hashing and symmetric key encryption. We use Bethencourt et al.’s cpabe [44] library to gather experimental results for CP-ABE. The cpabe library uses the PBC library [45] for pairing based cryptography.
We implemented the ACV-BGKM scheme with subset cover optimization. We
utilized the complete subset algorithm introduced by Naor et al. [35] as the subset
cover. All finite field arithmetic operations in ACV-BGKM scheme are performed
in a 512-bit prime field. We used comparable and efficient pairing parameters for
CP-ABE. The size of the base finite field is set to the 512-bit prime number
8780710799663312522437781984754049815806883199414208211028653399266475630
8802229570786251794226622214231558587695823174592777133673174813249251299
98224791
and the group order to the 160-bit number
730750818665451621361119245571504901405976559617.
Following the well-known security practice, we generate symmetric keys and use
them for encrypting documents. Then we encrypt such encryption keys with either
the ACV-BGKM generated symmetric keys or the CP-ABE generated public keys.
Table 3.6: Average time for CP-ABE algorithms
Algorithm        Time (ms)
Setup            34.395
Key generation   26.725
Encryption       24.453
Decryption       13.415
Therefore, in the experiments we measure the time to encrypt and decrypt the document encryption keys only. For all the ACV-BGKM experiments, we assume that
5% of users have left the group after executing the setup.
First we give experimental results for the simplest case where a single attribute condition is considered. Then we provide experimental results for multiple attribute conditions.
Table 3.6 shows the average time required to execute the setup, key generation, encryption and decryption algorithms of the CP-ABE scheme for one attribute condition.
Figure 3.1.: Average key generation time for different group sizes (plot of time in seconds against group size, 100 to 1000; series: ACV-BGKM, CP-ABE)
Figure 3.1 reports the average time required to execute the key generation algorithm of ACV-BGKM and CP-ABE with different group sizes. In both ACV-BGKM
and CP-ABE the time increases linearly with the group size. However, ACV-BGKM is much more efficient as it does not involve any expensive pairing operations. It only
uses efficient hashing and binary operations over a finite field. Further, the subset
cover technique applied to ACV-BGKM reduces the computational complexity of the
underlying scheme. Without the subset cover optimization, ACV-BGKM has a nonlinear computational complexity and becomes inefficient for large groups. We omit
the comparison experimental result due to lack of space.
Figure 3.2.: Average encryption/decryption time for different group sizes (plot of time in ms against group size, 100 to 1000; series: ACV-BGKM encryption, ACV-BGKM decryption, CP-ABE encryption, CP-ABE decryption)
Figure 3.2 reports the average time required to perform encryption and decryption
in ACV-BGKM and CP-ABE schemes for one attribute condition with different group
sizes. The decryption time of ACV-BGKM is taken as the time to derive the key as
well as to decrypt the encryption key. The encryption and decryption times of CP-ABE remain constant whereas the decryption time of ACV-BGKM increases linearly with the group size. As the group size increases, the key derivation algorithm of ACV-BGKM needs more time to build larger KEVs. The encryption time of
ACV-BGKM is negligible and remains constant as it involves an efficient symmetric
encryption only. The average encryption time of ACV-BGKM is 8.8 microseconds (as
these times are very small, the line plotting them is very close to zero in the graph in
Figure 3.2 and thus overlaps with the x-axis). It should be noted that if one caches
the KEVs, the decryption time of ACV-BGKM also becomes negligible as it involves
only modular multiplications.
Figure 3.3.: Average key generation time for varying attribute counts (plot of time in ms against the number of attribute conditions, 1 to 10; series: ACV-BGKM, CP-ABE)
Figure 3.3 reports the average time required to execute the key generation algorithm with varying number of attribute conditions with the group size set to 1000.
The time of both techniques increases linearly with the number of attribute conditions. However, similar to Figure 3.1, the ACV-BGKM key generation is much more
efficient than the CP-ABE key generation.
As can be seen from the experiments, our constructs are more efficient in handling
scenarios where the key generation algorithm has to be executed frequently due to
changes in user dynamics.
4 PRIVACY PRESERVING PULL BASED SYSTEMS: SINGLE LAYER APPROACH
We apply the GKM schemes constructed in Chapter 3 to build privacy preserving
pull based systems. Consistent with the current technological trends, we refer to the
third party server as the Cloud.
An approach to support fine-grained selective attribute-based access control before
uploading the data to the Cloud is to encrypt each data item to which the same ACP
(or set of ACPs) applies with the same key. One approach to deliver the correct keys to
the users based on the policies they satisfy is to use a hybrid solution where the keys
are encrypted using a public key cryptosystem such as attribute based encryption
(ABE) and/or proxy re-encryption (PRE). However, such an approach has several
weaknesses: it cannot efficiently handle adding/revoking users or identity attributes,
and policy changes; it requires keeping multiple encrypted copies of the same key; and it
incurs high computational cost. Therefore, a different approach is required.
It is worth noting that a simplistic group key management (GKM) scheme in
which the Owner directly delivers the symmetric keys to corresponding users has some
major drawbacks with respect to user privacy and key management. On one hand,
user private information encoded in the user identity attributes is not protected in the
simplistic approach. On the other hand, such a simplistic key management scheme
does not scale well as the number of users becomes large and when multiple keys need
to be distributed to multiple users. The goal of this chapter is to develop an approach
which does not have these shortcomings.
We observe that, without utilizing public key cryptography and by allowing users
to dynamically derive the symmetric keys at the time of decryption, one can address
the above weaknesses. Based on this idea, in Chapter 2, we first formalized a new GKM scheme called broadcast GKM (BGKM) and then gave a secure construction of the BGKM scheme and formally proved its security. The idea is to give secrets to users
based on the identity attributes they have and later allow them to derive actual symmetric keys based on their secrets and some public information. A key advantage
of the BGKM scheme is that adding users/revoking users or updating access control
policies can be performed efficiently and only requires updating the public information. As shown in Chapter 2, our BGKM scheme satisfies the requirements of minimal
trust, key indistinguishability, key independence, forward secrecy, backward secrecy
and collusion resistance as described in [15] with minimal computational, space and
communication cost.
In Chapter 3, using the ACV-BGKM scheme as a key building block, we constructed a more expressive GKM scheme called AB-GKM. Using our Inline AB-GKM
scheme, we develop an attribute-based access control mechanism whereby a user is
able to decrypt the data if and only if its identity attributes satisfy the Owner’s policies, whereas the Owner and the Cloud learn nothing about user’s identity attributes.
The mechanism is fine-grained in that different policies can be associated with different data items. A user can derive only the encryption keys associated with the data
items that the user is entitled to access.
The rest of the chapter is organized as follows. Section 4.1 provides an overview of
our overall SLE approach. Section 4.2 shows how to preserve the privacy of identity
attributes from both the data owner and the third-party. Section 4.3 provides detailed
description of our scheme. Section 4.4 proposes utilizing incremental unforgeable
encryption to improve the efficiency at the Owner when the re-encryption operation
is performed. Section 4.6 presents experimental results on the OCBE protocols and
key management.
4.1 Overview of the SLE Approach
As shown in Figure 4.1, our scheme for policy based content sharing in the cloud involves four main entities: the Data Owner (Owner), the Users (Usrs), the Identity Providers (IdPs), and the Cloud Storage Service (Cloud). The interactions are numbered in the figure. Our approach is based on three main phases: identity token issuance, identity token registration, and data management.
Figure 4.1.: Overall system architecture (entities: User, IdP, Owner, Cloud; labeled interactions: (1) Identity attribute, (2) Identity token, (1) Register identity tokens, (2) Secrets, (3) Selectively encrypt & upload, (4) Download & decrypt, (5) Download to re-encrypt)
1) Identity token issuance
IdPs issue identity tokens for certified identity attributes to Usrs. An identity token is
a Usr’s identity in a specified electronic format in which the involved identity attribute
value is represented by a semantically secure cryptographic commitment.1 We use the
Pedersen commitment scheme and it is described in Section 4.2.2. Identity tokens are
used by Usrs during the registration phase.
2) Identity token registration
In order to be able to decrypt the data that will be downloaded from the Cloud, Usrs
have to register at the Owner. During the registration, each Usr presents its identity
tokens and receives from the Owner a set of secrets for each identity attribute based
on the SecGen algorithm of the AB-GKM scheme. These secrets are later used by
Usrs to derive the keys to decrypt the data items for which they satisfy the ACP using the KeyDer algorithm of the AB-GKM scheme. The Owner delivers the secrets to the Usrs using a privacy-preserving approach based on the OCBE protocols [46]. The OCBE protocols ensure that a Usr can obtain secrets if and only if the Usr’s committed identity attribute value (within Usr’s identity token) satisfies the matching condition in the Owner’s ACP, while the Owner learns nothing about the identity attribute value. Note that the Owner learns neither the actual values of Usrs’ identity attributes nor which policy conditions are satisfied by which Usrs; thus the Owner cannot infer the values of Usrs’ identity attributes, and Usrs’ privacy is preserved in our scheme. We give more details about the OCBE protocols in Section 4.2.3.
Footnote 1: A cryptographic commitment allows a user to commit to a value while keeping it hidden and preserving the user’s ability to reveal the committed value later.
3) Data Management
The Owner groups the ACPs into policy configurations (Pcs). The data are divided
into data items based on the Pcs. The Owner generates the keys based on the ACPs in
each Pc using the KeyGen algorithm of the AB-GKM scheme and selectively encrypts
the data. These encrypted data are then uploaded to the Cloud. Usrs download
encrypted data from the Cloud. The KeyDer algorithm of the AB-GKM scheme allows
Usrs to derive the key K for a given Pc using their secrets in an efficient and secure
manner. With this scheme, our approach efficiently handles new users and revocations
to provide forward and backward secrecy. The system design also ensures that ACPs
can be flexibly updated and enforced by the Owner without changing any information
given to Usrs.
4.2 Preserving the Privacy of Identity Attributes
We observe that by preserving the privacy of the SecGen algorithm of the AB-GKM scheme we can preserve the privacy of the whole AB-GKM scheme. We utilize cryptographic techniques to protect the privacy of the identity attributes of the users from the Svr while executing the SecGen algorithm. Our technique makes sure that Usrs receive secrets only for valid identity attributes while the Svr does not learn the actual identity attribute values. We now give an overview of the two cryptographic constructs, Pedersen commitments and oblivious commitment based envelope protocols, that we use in this regard. Further, we introduce the notion of configurable privacy for the identity attributes.
4.2.1 Discrete Logarithm Problem and Computational Diffie-Hellman Problem
Definition 4.2.1 Let G be a (multiplicatively written) cyclic group of order q and let g be a generator of G. The map $\varphi : \mathbb{Z} \to G$, $\varphi(n) = g^n$ is a group homomorphism with kernel $q\mathbb{Z}$. The problem of computing the inverse map of ϕ is called the discrete logarithm problem (DLP) to the base of g.
Definition 4.2.2 For a cyclic group G (written multiplicatively) of order q, with a generator g ∈ G, the Computational Diffie-Hellman problem (CDH) is the following problem: Given $g^a$ and $g^b$ for randomly-chosen secret a, b ∈ {0, . . . , q − 1}, compute $g^{ab}$.
4.2.2 Pedersen Commitment
First introduced in [47], the Pedersen Commitment scheme is an unconditionally
hiding and computationally binding commitment scheme which is based on the intractability of the discrete logarithm problem. We describe how it works as follows.
Setup
A trusted third party T chooses a finite cyclic group G of large prime order p so that
the computational Diffie-Hellman problem is hard in G. Write the group operation
in G as multiplication. T chooses two generators g and h of G such that it is hard to
find the discrete logarithm of h with respect to g, i.e., an integer α such that $h = g^{\alpha}$.
Note that T may or may not know the number α. T publishes (G, p, g, h) as the
system’s parameters.
Commit
The domain of committed values is the finite field Fp of p elements, which can be implemented as the set of integers Fp = {0, 1, . . . , p − 1}. For a party U to commit a value x ∈ Fp , U chooses r ∈ Fp at random, and computes the commitment $c = g^x h^r \in G$.
Open
U shows the values x and r to open a commitment c. The verifier checks whether $c = g^x h^r$.
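The Commit and Open steps can be illustrated with a small Python sketch. The parameters below are toy values assumed purely for illustration and are far too small to be secure: the group is the subgroup of prime order ORDER = 1019 of the integers modulo the safe prime P = 2039 (ORDER plays the role of p in the text), and G and H are two fixed generators. In a real setup the discrete logarithm of h with respect to g must be unknown to the committing party.

```python
import secrets

# Toy parameters (assumptions for illustration only; far too small to be secure).
# P = 2 * ORDER + 1 is a safe prime; G and H generate the subgroup of order ORDER.
P, ORDER = 2039, 1019
G, H = 4, 9


def commit(x: int, r: int) -> int:
    """Commit: c = g^x * h^r in the prime-order subgroup."""
    return (pow(G, x % ORDER, P) * pow(H, r % ORDER, P)) % P


def open_commitment(c: int, x: int, r: int) -> bool:
    """Open: the verifier recomputes g^x * h^r and compares it with c."""
    return c == commit(x, r)


x = 42                          # committed value
r = secrets.randbelow(ORDER)    # random blinding value
c = commit(x, r)
```

Opening with the original pair (x, r) succeeds, while opening with a different value and the same r fails, since g^x and g^(x+1) differ in the subgroup.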
4.2.3 OCBE Protocols
The Oblivious Commitment-Based Envelope (OCBE) protocols, proposed by Li
and Li [46], provide the capability of delivering information to qualified users in an
oblivious way. There are three communicating parties involved in OCBE protocols:
a receiver R, a sender S, and a trusted third party T. The OCBE protocols make sure
that the receiver R can decrypt a message sent by S if and only if R’s committed value
satisfies a condition given by a predicate in S’s access control policy, while S learns
nothing about the committed value. Note that S does not even learn whether R is
able to correctly decrypt the message or not. The predicates supported by OCBE are the comparison predicates >, ≥, <, ≤, = and ≠.
The OCBE protocols are built with several cryptographic primitives:
1. The Pedersen commitment scheme.
2. A semantically secure symmetric-key encryption algorithm E, for example, AES,
with key length k-bits. Let EKey [M ] denote the encrypted message M under the
encryption algorithm E with symmetric encryption key Key.
3. A cryptographic hash function H(·). When we write H(α) for an input α in a
certain set, we adopt the convention that there is a canonical encoding which encodes α as a bit string, i.e., an element in {0, 1}∗ , without explicitly specifying the encoding.
Given the notation above, we summarize the OCBE protocols for the = (EQ-OCBE) and ≥ (GE-OCBE) predicates as follows. The OCBE protocols for other
predicates can be derived and described in a similar fashion. The protocols’ description is tailored to our work, and is stated in a slightly different way than in [46].
EQ-OCBE Protocol
Parameter generation
T runs a Pedersen commitment setup protocol to generate system parameters $Param = (G, g, h)$. T outputs the order of G, p, and $P = \{EQ_{x_0} : x_0 \in F_p\}$, where
$EQ_{x_0} : F_p \to \{\text{true}, \text{false}\}$
is an equality predicate such that $EQ_{x_0}(x)$ is true if and only if $x = x_0$.
Commitment
T first chooses an element $x \in F_p$ for R to commit. T then randomly chooses $r \in F_p$,
and computes the Pedersen commitment $c = g^x h^r$. T sends x, r, c to R, and sends c
to S.
Alternatively, in an offline version, T digitally signs c and sends x, r, c together
with the signature of c to R. Then the validity of the commitment c can be ensured
by verifying T’s signature. In this way, after S obtains T’s public key for signature
verification, no further communication is needed between T and S.
Interaction
• R makes a data request to S.
• Based on this request, S sends an equality predicate $EQ_{x_0} \in P$.
• Upon receiving this predicate, R sends S a Pedersen commitment $c = g^x h^r$.
• S picks $y \in F_p^*$ at random, computes $\sigma = (c g^{-x_0})^y$, and sends R a pair $(\eta = h^y, C = E_{H(\sigma)}[M])$, where M is a message containing the requested data.
Open
Upon receiving $(\eta, C)$ from S, R computes $\sigma' = \eta^r$, and decrypts C using $H(\sigma')$. If $x = x_0$, then $c g^{-x_0} = h^r$, so $\sigma = h^{ry} = \eta^r = \sigma'$ and R recovers M.
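The EQ-OCBE interaction and open steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation evaluated in this thesis: the group is a toy subgroup of order 1019 modulo 2039 with g = 4 and h = 9, SHA-256 plays the role of H, and the symmetric cipher E is replaced by XOR with the hash output (so messages are limited to 32 bytes). All function names are hypothetical.

```python
import hashlib
import secrets

Q, P, G, H = 2039, 1019, 4, 9   # toy Pedersen parameters (assumption)


def kdf(sigma):
    """H(sigma): hash a group element into a 32-byte key."""
    return hashlib.sha256(str(sigma).encode()).digest()


def toy_cipher(key, data):
    """Stand-in for E: XOR with the key (its own inverse, at most 32 bytes)."""
    assert len(data) <= len(key)
    return bytes(d ^ k for d, k in zip(data, key))


def sender_envelope(c, x0, message):
    """S: compute sigma = (c * g^{-x0})^y and return (eta, C)."""
    y = 1 + secrets.randbelow(P - 1)                # y in F_p^*
    sigma = pow(c * pow(G, -x0 % P, Q) % Q, y, Q)
    eta = pow(H, y, Q)
    return eta, toy_cipher(kdf(sigma), message)


def receiver_open(eta, ciphertext, r):
    """R: compute sigma' = eta^r and decrypt with H(sigma')."""
    return toy_cipher(kdf(pow(eta, r, Q)), ciphertext)
```

When R's committed value equals x0, `receiver_open` returns the message. Otherwise sigma' differs from sigma and R obtains unrelated bytes, while S sees the same messages in both cases.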
The GE-OCBE Protocol works in a bit-by-bit fashion, for attribute values
at most ℓ bits long, where ℓ is a system parameter which specifies an upper bound
for the bit length of attribute values such that $2^\ell < p/2$. The GE-OCBE protocol is
more complex in terms of description and computation compared to EQ-OCBE (=).
It works as follows.
GE-OCBE Protocol
Parameter generation
T runs a Pedersen commitment setup protocol to generate system parameters $Param = (G, g, h)$, and outputs the order of G, p. In addition, T chooses another parameter ℓ,
which specifies an upper bound for the length of attribute values, such that $2^\ell < p/2$.
T outputs $V = \{0, 1, \ldots, 2^\ell - 1\} \subset F_p$, and $P = \{GE_{x_0} : x_0 \in V\}$, where
$GE_{x_0} : V \to \{\text{true}, \text{false}\}$
is a predicate such that $GE_{x_0}(x)$ is true if and only if $x \ge x_0$.
Commitment
T chooses an integer $x \in V$ for R to commit. T then randomly chooses $r \in F_p$, and
computes the Pedersen commitment $c = g^x h^r$. T sends x, r, c to R, and sends c to S.
Similarly, an offline alternative also works here.
Interaction
• R makes a data request to S.
• Based on the request, S sends to R a predicate GEx0 ∈ P.
• Upon receiving this predicate, R sends to S a Pedersen commitment $c = g^x h^r$.
• Let $d = (x - x_0) \pmod p$. R picks $r_1, \ldots, r_{\ell-1} \in F_p$, and sets $r_0 = r - \sum_{i=1}^{\ell-1} 2^i r_i$. If $GE_{x_0}(x)$ is true, let $d_{\ell-1} \ldots d_1 d_0$ be d's binary representation, with $d_0$ the lowest bit. Otherwise, if $GE_{x_0}(x)$ is false, R randomly chooses $d_{\ell-1}, \ldots, d_1 \in \{0, 1\}$, and sets $d_0 = d - \sum_{i=1}^{\ell-1} 2^i d_i \pmod p$. R computes ℓ commitments $c_i = g^{d_i} h^{r_i}$ for $0 \le i \le \ell-1$, and sends all of them to S.
• S checks that $c g^{-x_0} = \prod_{i=0}^{\ell-1} (c_i)^{2^i}$. S randomly chooses ℓ bit strings $k_0, \ldots, k_{\ell-1}$, and sets $k = H(k_0 \| \ldots \| k_{\ell-1})$. S picks $y \in F_p^*$, and computes $\eta = h^y$, $C = E_k[M]$, where M is the message containing requested data. For each $0 \le i \le \ell-1$ and $j = 0, 1$, S computes $\sigma_i^j = (c_i g^{-j})^y$, $C_i^j = H(\sigma_i^j) \oplus k_i$. S sends to R the tuple $(\eta, C_0^0, C_0^1, \ldots, C_{\ell-1}^0, C_{\ell-1}^1, C)$.
Open
After R receives the tuple $(\eta, C_0^0, C_0^1, \ldots, C_{\ell-1}^0, C_{\ell-1}^1, C)$ from S as above, R computes $\sigma_i' = \eta^{r_i}$, and $k_i' = H(\sigma_i') \oplus C_i^{d_i}$, for $0 \le i \le \ell-1$. R then computes $k' = H(k_0' \| \ldots \| k_{\ell-1}')$, and decrypts C using key $k'$.
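The bit-by-bit structure of GE-OCBE can be sketched in Python as below. As with any toy example, the parameters are assumptions: a subgroup of order p = 1019 modulo 2039 with g = 4, h = 9, ℓ = 8 (so that 2^ℓ < p/2), SHA-256 as H, and XOR with k in place of the cipher E (messages of at most 32 bytes). The function names are hypothetical and the code is not the thesis implementation.

```python
import hashlib
import secrets

Q, P, G, H = 2039, 1019, 4, 9    # toy Pedersen parameters (assumption)
ELL = 8                          # 2**ELL = 256 < P / 2


def h(sigma):
    return hashlib.sha256(str(sigma).encode()).digest()


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def receiver_bit_commitments(x, r, x0):
    """R: split d = x - x0 into 'bits' d_i and commit to each with r_i."""
    d = (x - x0) % P
    rs = [0] + [secrets.randbelow(P) for _ in range(1, ELL)]
    rs[0] = (r - sum((1 << i) * rs[i] for i in range(1, ELL))) % P
    if x >= x0:
        ds = [(d >> i) & 1 for i in range(ELL)]
    else:  # predicate false: d_0 is forced to a value outside {0, 1}
        ds = [0] + [secrets.randbelow(2) for _ in range(1, ELL)]
        ds[0] = (d - sum((1 << i) * ds[i] for i in range(1, ELL))) % P
    cs = [pow(G, ds[i], Q) * pow(H, rs[i], Q) % Q for i in range(ELL)]
    return cs, ds, rs


def sender_envelope(c, cs, x0, message):
    """S: check c * g^{-x0} = prod c_i^(2^i), then build the envelope."""
    prod = 1
    for i, ci in enumerate(cs):
        prod = prod * pow(ci, 1 << i, Q) % Q
    if prod != c * pow(G, -x0 % P, Q) % Q:
        raise ValueError("bit commitments are inconsistent with c")
    ks = [secrets.token_bytes(32) for _ in range(ELL)]
    k = hashlib.sha256(b"".join(ks)).digest()
    y = 1 + secrets.randbelow(P - 1)
    eta = pow(H, y, Q)
    table = [[xor(h(pow(ci * pow(G, -j % P, Q) % Q, y, Q)), ks[i])
              for j in (0, 1)] for i, ci in enumerate(cs)]
    return eta, table, xor(k, message)


def receiver_open(eta, table, ciphertext, ds, rs):
    """R: recover each k_i from C_i^{d_i}, rebuild k, and decrypt."""
    ks = [xor(h(pow(eta, rs[i], Q)), table[i][ds[i] & 1]) for i in range(ELL)]
    return xor(hashlib.sha256(b"".join(ks)).digest(), ciphertext)
```

If x ≥ x0, every d_i is a genuine bit and R recovers all k_i. If x < x0, the commitment c_0 hides a value outside {0, 1}, so neither C_0^0 nor C_0^1 yields k_0 and the derived key is wrong, although S's consistency check still passes.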
The EQ-OCBE protocol is simpler and more efficient than the GE-OCBE protocol.
The OCBE protocol for the ≤ predicates (LE-OCBE) can be constructed in a similar
way as GE-OCBE. Other OCBE protocols (for ≠, <, > predicates) can be built on
EQ-OCBE, GE-OCBE and LE-OCBE.
All these OCBE protocols guarantee that the receiver R can decrypt the message sent by S if and only if the corresponding predicate evaluates to true at R’s
committed value, and that S does not learn anything about R’s committed value.
4.2.4 Configurable Privacy
In order to assure maximum privacy, a Usr should register its identity token for
all attribute conditions whose attribute names match the id-tag field in the identity
token. While this provides maximum privacy for the Usr, it also inevitably increases the
number of OCBE protocol executions and the complexity of the AB-GKM algorithms
in almost all cases. However, in an application scenario where it is not crucial for a
Usr to achieve maximum privacy for certain identity attributes, Usrs are allowed to
register for as few attribute conditions as possible for an id-tag, while still
feeling comfortable about the level of guaranteed privacy. In this way, the complexity of
the AB-GKM algorithms can be effectively reduced. We introduce a notion similar
to the idea of k-anonymity [48]. The following formula (4.1) shows an example of
computing the privacy level for an id-tag.
Let privacy be measured by a number from 0 to 1, where 0 means “no privacy” and
1 means “maximum privacy”. Let M ≥ 2 be the total number of attribute conditions which
apply to an id-tag in the system. Suppose all attribute conditions corresponding to
one id-tag have the same level of privacy. Let m be the number of attribute conditions
a Usr registers for an identity token that it holds. Suppose a Usr holding an identity
token always registers for the attribute condition which this identity token satisfies.
Then the level of privacy for this registered identity token of Usr can be calculated as
Formula 1 (Privacy formula)
$P = \frac{m - 1}{M - 1}$. (4.1)
The above formula can be easily verified: for example, if there are overall M = 2
attribute conditions “role = doc” and “role = nur” for id-tag = role, then registering
for m = 1 attribute condition reveals the attribute value, i.e., P = 0, and registering
for both (m = 2) attribute conditions gives maximum privacy P = 1. Usrs may use
such a quantitative measure to gauge the level of privacy they have, and the system may use the
same measure to impose a minimum privacy requirement, for example, to maintain
organizational privacy policies.
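Formula (4.1) is simple enough to check mechanically. The helper below is a small illustrative sketch; the function name privacy_level and the use of exact fractions are our own choices, not part of the system described here.

```python
from fractions import Fraction


def privacy_level(m, M):
    """Privacy level P = (m - 1) / (M - 1) for an id-tag.

    m: number of attribute conditions the Usr registers for (1 <= m <= M)
    M: total number of attribute conditions for the id-tag (M >= 2)
    """
    if M < 2 or not 1 <= m <= M:
        raise ValueError("require M >= 2 and 1 <= m <= M")
    return Fraction(m - 1, M - 1)
```

With the two conditions “role = doc” and “role = nur”, `privacy_level(1, 2)` is 0 and `privacy_level(2, 2)` is 1, matching the example above. A system-wide minimum privacy requirement could then be enforced by rejecting registrations whose level falls below a threshold.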
4.3 Single Layer Encryption Approach
As described in Section 4.1, our scheme has three phases: identity token issuance, identity token
registration and data management. We did not consider the technical details and
privacy in Section 4.1. In this section we make our scheme privacy preserving using
the techniques introduced in Section 4.2. We explain our approach using the AB-GKM scheme with the subset cover optimization as a key building block.
4.3.1 Identity Token Issuance
The IdP runs a Pedersen commitment setup algorithm to generate system parameters $Param = (G, g, h)$. The IdP publishes Param as well as the order p of the finite
group G. The IdP also publishes its public key for the digital signature algorithm it
is using. Such parameters are used by the IdP to issue identity tokens to Usrs. We
assume that the IdP first checks the validity of the identity attributes Usrs hold.² Usrs
present to the IdP their identity attributes to receive identity tokens as follows. For
each identity attribute shown by a Usr, the IdP encodes the identity attribute value
as $x \in F_p$ in a standard way, and issues the Usr an identity token. An identity token
is a tuple
IT = (nym, id-tag, c, σ),
where nym is a pseudonym for uniquely identifying the Usr in the system, id-tag is the
tag of the identity attribute under consideration, $c = g^x h^r$ is a Pedersen commitment
for the value x, and σ is the IdP’s digital signature for nym, id-tag and c. The IdP
passes values x and r to the Usr for the Usr’s private use. We require that all identity
tokens of the same Usr have the same nym,³ so that the Usr and its identity tokens
can be uniquely matched with a nym. Once the identity tokens are issued, they are
used by Usrs for proving that they satisfy the Owner’s ACPs; Usrs keep their identity
attribute values hidden, and never disclose them in clear during the interactions with
other parties.
² The IdP can verify the validity of a Usr’s identity either in a traditional way, e.g., through an on-the-spot registration, or digitally over computer networks. We will not dive into the details of the identity
validity check in this thesis.
³ In practice, this can be achieved by requesting the Usr to present a strong identifier that correlates
with the identity being registered. Again, we will not discuss this process in this thesis.
Example 1
Suppose a Usr Bob presents his driver’s license to the IdP to receive an identity token for
his age. The IdP assigns Bob a pseudonym pn-1492. The IdP deduces from the birth date on
Bob’s driver’s license that Bob’s age is x = 28. The IdP randomly chooses a value
r = 9270, and computes a Pedersen commitment $c = g^x h^r$. The IdP then digitally
signs the message containing Bob’s pseudonym, a tag for “age” and the commitment
c. The identity token Bob receives from the IdP may look like this:
IT = (pn-1492, age, 6267292101, 949148425702313975).
4.3.2 Identity Token Registration
We assume that the Owner defines a set of ACPs denoted as ACPB that specifies
which data items Usrs are authorized to access. ACPs are formally defined as follows.
Definition 4.3.1 (Attribute Condition).
An attribute condition cond is an expression of the form: “nameA op l”, where nameA
is the name of an identity attribute A, op is a comparison operator such as =, <, >,
≤, ≥, ≠, and l is a value that can be assumed by attribute A.
Definition 4.3.2 (Access control policy).
An access control policy (ACP) is a tuple (s, o, D) where: o denotes a set of data
items $\{D_1, \ldots, D_t\}$ of data D; and s is a Boolean formula of attribute conditions
$cond_1, \ldots, cond_n$ that must be satisfied by a Usr to have access to o.⁴
Different ACPs can apply to the same data items because such data items may
have to be accessed by different categories of Usrs. We refer to the set of ACPs that
apply to a data item as its policy configuration.
Definition 4.3.3 (Policy configuration).
A policy configuration (Pc) for a data item D1 of data D is a set of policies {ACP1 , . . . ,
ACPk } where ACPi , i = 1, . . . , k is an ACP (s, o, D) such that D1 ∈ o.
⁴ In what follows we use the dot notation to denote the different components of an ACP.
Example 2
The ACP (“level ≥ 58” ∧ “role = nurse”, {physical exam, treatment plan}, “EHR.xml”)
states that a Usr of level no lower than 58 and holding a nurse position has access to
the data items “physical exam” and “treatment plan” of document EHR.xml.
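To pin down the semantics of Definitions 4.3.1 and 4.3.2, the following Python sketch represents an attribute condition and an ACP whose Boolean formula s is given in disjunctive normal form. The class and field names are hypothetical. Note that the sketch evaluates conditions on plaintext attribute values purely for illustration; in our scheme the Owner never sees these values and the check is performed obliviously through OCBE.

```python
import operator
from dataclasses import dataclass

OPS = {"=": operator.eq, "≠": operator.ne, "<": operator.lt,
       ">": operator.gt, "≤": operator.le, "≥": operator.ge}


@dataclass(frozen=True)
class Condition:
    name: str       # identity attribute name, e.g. "level"
    op: str         # one of the keys of OPS
    value: object   # the constant l in "name op l"

    def holds(self, attrs):
        return self.name in attrs and OPS[self.op](attrs[self.name], self.value)


@dataclass(frozen=True)
class ACP:
    dnf: tuple        # s: a tuple of conjunctive terms (tuples of Condition)
    items: frozenset  # o: the data items the policy applies to
    document: str     # D

    def satisfied_by(self, attrs):
        # True if at least one conjunctive term is fully satisfied.
        return any(all(c.holds(attrs) for c in term) for term in self.dnf)


# The ACP of Example 2.
example_acp = ACP(
    dnf=((Condition("level", "≥", 58), Condition("role", "=", "nurse")),),
    items=frozenset({"physical exam", "treatment plan"}),
    document="EHR.xml",
)
```

A Usr with attributes `{"level": 58, "role": "nurse"}` satisfies `example_acp`, whereas a nurse of level 57 or a doctor of level 60 does not.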
There can be multiple data items in D which have the same Pc. For each Pc of D,
the Owner randomly chooses a key K for a symmetric key encryption algorithm (e.g.,
AES), and uses K to encrypt all data items associated with this policy configuration.
Therefore, if a Usr satisfies ACP1 , . . . , ACPm , Owner must make sure that the Usr
can derive all the symmetric keys to decrypt those data items to which a policy
configuration containing at least one ACPi (i = 1, . . . , m) applies.
As in our AB-GKM based scheme the actual symmetric keys are not delivered
along with the encrypted data, a Usr has to register its identity tokens at the Owner
in order to derive the symmetric encryption key from the PubInfo stored at the Cloud.
The SecGen algorithm of the AB-GKM scheme and the OCBE techniques are used to
register user identity tokens in a privacy preserving manner. During the registration,
a Usr receives a set of secrets, based on the identity attribute names corresponding
to the attribute names in the identity tokens. Note that secrets are generated by
the Owner only based on the names of identity attributes and not on their values.
Therefore, a Usr may receive an encrypted set of secrets corresponding to a condition
whose value the Usr’s identity attribute does not satisfy. However, in this
case, the Usr will not be able to extract the secrets from the message delivering them, as
shown in Section 4.2.3. Proper secrets are later used by a Usr to compute symmetric
decryption keys for particular data items of the encrypted data, as discussed in the
data management phase. The delivery of secrets is performed in such a way that
the Usr can correctly receive secrets if and only if the Usr has an identity token whose
committed identity attribute value satisfies an attribute condition in the Owner’s ACP,
while the Owner does not learn any information about the Usr’s identity attribute
value and does not learn whether the Usr has been able to obtain the secret.
To enable Usr registration, the Owner first chooses the OCBE parameters: an ℓ′-bit prime number q, a cryptographic hash function H(·) whose output bit length is no
shorter than ℓ′, and a semantically secure symmetric-key encryption algorithm with
key length ℓ′ bits. The Owner publishes these parameters. For each distinct
attribute condition in the ACPs, the Owner also constructs a subset cover tree with n leaf nodes, one per Usr. Let $SC_j$ be the subset cover for the attribute condition
$cond_j$. Then, for an ACP in ACPB that a subscriber $Usr_i$ under pseudonym $nym_i$
wants to satisfy, $Usr_i$ selects and registers an identity token $IT = (nym_i, \text{id-tag}, c, \sigma)$
with respect to each attribute condition $cond_j$ in the ACP. Note that $Usr_i$ does not
register only for the attribute condition which its identity token satisfies; to
assure privacy, $Usr_i$ also registers its identity token for other attribute conditions whose
identity attribute name matches the id-tag contained in the identity token. In this
way, the Owner cannot infer from $Usr_i$’s registration which condition $Usr_i$ is actually
interested in. Such measures greatly reduce the leaking of identity attributes due to
insider threats.
The Owner checks if id-tag matches the name of the identity attribute in $cond_j$,
and verifies the IdP’s signature σ using the IdP’s public key. If either of the above
steps fails, the Owner aborts the interaction. Otherwise, the Owner selects the corresponding secrets from the subset cover $SC_j$ for $Usr_i$. The Owner then starts an
OCBE session as a sender (S) to obliviously transfer these secrets to $Usr_i$, who acts
as a receiver (R). The Owner maintains a matrix T to record whether secrets have been delivered
to each $Usr_i$ for each $cond_j$. Upon the completion of the OCBE session the Owner
performs the following actions:
• If $nym_i$ does not exist in the matrix, it first creates a row for it.
• It sets the cell $r_{i,j}$ of T corresponding to $nym_i$ and $cond_j$.
We remark that all secrets are independent, so the above secret delivery process
can be executed in parallel. Matrix T is used by the Owner to execute the KeyGen
algorithm of the AB-GKM scheme.
Example 3
Table 4.1 shows an example of matrix T. A Usr under pseudonym pn-0012 who has
an identity token with respect to identity tag role registers for all attribute conditions
(“role = doc” and “role = nur” are shown in Table 4.1) involving identity attribute
role. This Usr does not register for attribute conditions “level ≥ 59”, “YoS ≥ 5”⁵
and “YoS < 5”, either because it does not hold an identity token with identity tag
level or YoS, and thus cannot register, or because it chooses not to register as it only
needs to access data items whose associated ACP does not require conditions for
these attributes. A drawback of registering only for the conditions required is that it
may allow an attacker to infer certain attributes about the Usr with high confidence.
To protect against such attacks the Usr may choose to register for more than one
condition as explained earlier. Note that the Usr under pn-0829 registers for both
conditions YoS ≥ 5 and YoS < 5, which are mutually exclusive and thus cannot both
be satisfied by any Usr. The registration for both conditions is crucial for privacy
in that it prevents the Owner from inferring from the Usr’s registration behavior which
condition the Usr is actually interested in. A Usr under pn-1492 registers for all five
attribute conditions.
Table 4.1: A table of secrets maintained by the Owner
nym     | level ≥ 59 | YoS ≥ 5 | YoS < 5 | role = doc | role = nur | ...
pn-0012 | ⊥          | ⊥       | ⊥       | 1          | 1          | ...
pn-0829 | 1          | 1       | 1       | ⊥          | ⊥          | ...
pn-1492 | 1          | 1       | 1       | 1          | 1          | ...
...     | ...        | ...     | ...     | ...        | ...        | ...
⁵ YoS means “years of service”.
4.3.3 Data Management
Recall that the Owner encrypts all data items to which the same Pc applies with
the same symmetric key. Therefore, the Owner executes the KeyGen algorithm of the
AB-GKM scheme for each Pc. For a given Pc, the Owner first identifies the secrets to be
considered as follows.
• The Owner first converts each ACP into DNF (Disjunctive Normal Form). For
each unique conjunctive term, it executes the remaining steps.
• Let the ith conjunctive term be $\bigwedge_{j=1}^{\phi_i} cond_j$, where the term has $\phi_i$ conditions. The
Owner iterates through the secrets matrix T, and finds the set of users who
satisfy all the conditions in each conjunctive term.
• At the end of the previous step, the Owner has the list of Usrs who satisfy the
Pc, and their association with the subset covers $SC_i$ for each applicable $cond_i$. The
Owner identifies the covers in each $SC_i$ and the secrets corresponding to the covers.
The Owner aggregates by concatenating secrets in the order of the conditions
in the conjunctive terms to produce a single secret for each user satisfying the
conjunctive terms. For example, if the conjunctive term is $cond_1 \wedge cond_3$ and
$Usr_5$ satisfies the term, the Owner obtains the cover secrets $s_1$ and $s_3$ from $SC_1$
for $Usr_5$ and $SC_3$ for $Usr_5$ respectively. The aggregated secret is $s_1 \| s_3$.
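The steps above can be sketched in Python as follows. For brevity the sketch makes a simplifying assumption: the table T maps each pseudonym and condition directly to that Usr's secret (as a string), rather than to a flag plus a lookup in the subset cover SC_j. The function name and the sample secrets, borrowed from the worked example in Section 4.5, are for illustration only.

```python
def aggregate_secrets(T, dnf_terms):
    """Return (nym, term, aggregated_secret) for every Usr satisfying a term.

    T:         dict nym -> dict condition -> secret string (missing = not delivered)
    dnf_terms: list of conjunctive terms, each a tuple of condition names
    """
    aggregated = []
    for term in dnf_terms:
        for nym, row in T.items():
            parts = [row.get(cond) for cond in term]
            if all(p is not None for p in parts):
                # Concatenate in the order of the conditions in the term.
                aggregated.append((nym, term, "".join(parts)))
    return aggregated


T = {
    "pn-0012": {"role = doc": "86571"},
    "pn-1492": {"role = doc": "13011", "role = nur": "11109", "level ≥ 59": "60987"},
}
# DNF terms of a policy configuration {ACP3, ACP4}.
pc_terms = [("role = doc",), ("role = nur", "level ≥ 59")]
inputs_to_keygen = aggregate_secrets(T, pc_terms)
```

Here `inputs_to_keygen` contains three aggregated secrets, one per (Usr, conjunctive term) pair, which would then be passed to KeyGen.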
The set of aggregated secrets from the above algorithm is used as the input to the
KeyGen algorithm, which produces the public information PubInfo and the symmetric group key k. The Owner creates an index of the public information tuples,
associates it with the encrypted data, and uploads them to the Cloud.
If a Usr with $nym_i$ wants to view the data item $D_1$, it first downloads the encrypted
data item along with the PubInfo. It then picks an $ACP_k$ that it satisfies and derives
the key using the KeyDer algorithm.
Now we look at how to handle system dynamics such as adding/revoking credentials and ACP updates.
When a new user Usr registers at the Owner, the Owner delivers corresponding
secrets to Usr, and updates the matrix T . The Owner then performs a rekey process
for all involved data items (or equivalently, policy configurations) using the Update
algorithm. When Owner uploads new data, it also uploads the updated PubInfo index.
The conditions under which a Usr’s credentials need to be revoked are outside the scope of this thesis. We assume that the Owner will be notified
when a Usr with a pseudonym $nym_i$ is revoked from those who may satisfy $cond_j$. In
this case, the Owner simply resets the value $r_{i,j}$ in matrix T, and performs a rekey
process for all involved data items. Allowing particular secrets to be deleted from T
enables fine-grained user management.
A Usr’s credentials may have to be updated over time for various reasons such as
promotions, change of responsibilities, etc. In this case, the Usr with a pseudonym
$nym_i$ submits the updated credential for $cond_j$ to the Owner. The Owner simply resets the
old $r_{i,j}$ entry and sets a new entry in the matrix T, and performs a rekey process only
for the data items involved.
When a Usr with a pseudonym nymi needs to be removed, the Owner removes the
row corresponding to nymi from the matrix T , and performs a rekey process only for
the data items involved.
Note that in all cases of new subscription, credential revocation, credential update
and subscription revocation, the rekey process does not introduce any cost to Usrs:
except for those whose identity attributes are added, updated or revoked, no
Usr needs to directly communicate with the Owner to update secrets. New encryption/decryption keys can be derived by using the original secrets and updated public
values stored at the Cloud. The ability to derive the secret encryption/decryption keys
using public values is a key point to achieve transparency in subscription handling.
Most existing GKM schemes fail to achieve this objective.
4.4 Improving Efficiency of Re-Encryption
In the current SLE scheme, the Owner has to download the full encrypted data to
perform re-encryption whenever the group dynamics change. In order to improve the
efficiency of the re-encryption operation, in this section, we propose to utilize the incremental unforgeable encryption [49, 50] technique. It requires re-encrypting only
the modified blocks of data instead of all the blocks. We give an overview of the
technique below and later provide experimental results to show that it does improve
the efficiency of the overall system where frequent re-encryptions of data items are
performed.
The main motivation for incremental cryptography [49] is to devise cryptographic
algorithms whose output can be updated very efficiently when the underlying input
changes. Incremental cryptography has been applied to hashing, signing, message
authentication, and encryption. Since in our work we utilize existing incremental
encryption algorithms [50] only, we limit our discussion to incremental encryption.
We view a message M as a set of blocks $m_1, m_2, \cdots, m_n$, where the block size
b is decided by a security parameter ℓ. Our system should be able to perform the
following modification operations:
• Insert operation: (insert, i, m) inserts the message block m between the ith
and (i + 1)th blocks.
• Delete operation: (delete, i) deletes the ith message block.
• Replace operation: (replace, i, m) replaces the ith message block with the message block m.
Definition 4.4.1 (Modification Space) The modification space, denoted by U, is
defined as the set of all possible modification operations that can be performed on any
block of a message.
Definition 4.4.2 (Incremental Encryption) An incremental (private-key) encryption scheme $\Pi$
defined over modification space U is a symmetric key block cipher
scheme that consists of the following four algorithms: KeyGen, Enc, Dec and IncEnc. The first three algorithms are defined as in traditional block cipher schemes.
We give an overview of the algorithms below.
KeyGen(ℓ):
The key generation algorithm is a probabilistic poly(ℓ)-time algorithm that takes as
input security parameter ℓ and generates a random symmetric key k. The security
parameter also fixes a block size b.
Enc(k, M ):
The encryption algorithm is a probabilistic poly(ℓ, |M |)-time algorithm that takes as
input the symmetric key k and the plaintext message M ∈ ({0, 1}b )+ , and produces
the ciphertext C.
Dec(k, C):
The decryption algorithm is a deterministic poly(ℓ, |C|)-time algorithm that takes as
input the symmetric key k and the ciphertext C, and produces either the plaintext
message M or a special symbol ⊥ to indicate that the ciphertext C is invalid.
IncEnc(k, U , C):
The incremental encryption algorithm is a probabilistic poly(ℓ, |C|, |M |)-time algorithm that takes as input the symmetric key k, the modification operation U ∈ U, the
previous ciphertext C corresponding to M , and produces the modified ciphertext C ′
which is the encryption of the plaintext M with the modification operation U applied.
Security requirements for the incremental encryption scheme are as follows:
• Indistinguishability: The encryption algorithm should be semantically secure.
• Unforgeability (integrity): A malicious adversary who views a sequence of encryptions and incremental update operations should be unable to generate any
new ciphertext which decrypts to a valid plaintext.
• Obliviousness: The ciphertext should not reveal information about the revision
history of the underlying plaintext.
A practical incremental encryption scheme should at least satisfy the indistinguishability and obliviousness requirements. We call such a scheme a confidentiality-only
scheme. If a data integrity guarantee is required, the incremental encryption scheme
should satisfy all three of the above security requirements. We call such a scheme a confidentiality-and-integrity scheme.
An incremental encryption scheme $\Pi$ is called ideal if the running time of its
incremental encryption algorithm is independent of |M| and |C| and depends on the
type of modification only. In practice, when data integrity must also be verified, it
is not possible to construct an ideal incremental encryption scheme. However, if the
incremental encryption algorithm can run in time sublinear in |M|, it is still better
than the conventional encryption schemes, which require time O(|M|) to compute
the ciphertext from scratch. With such incremental schemes, when large messages
change frequently, considerable efficiency improvements are possible.
Algorithm 1 rECB mode
1: Break the message M into b-bit blocks $m_1, m_2, \cdots, m_n$
2: Select random value $r_0 \leftarrow \{0, 1\}^b$
3: $\mathrm{Enc}(k, r_0)$
4: for each block $m_i$, i = 1 to n do
5:   $r_i \leftarrow \{0, 1\}^b$
6:   $c_i = (\mathrm{Enc}(k, m_i \oplus r_i), \mathrm{Enc}(k, r_i \oplus r_0))$
7: end for
8: Return $c_1, c_2, \cdots, c_n$
In our work, we implement two incremental encryption schemes, one for confidentiality
only and one for both confidentiality and integrity. We use randomized ECB (rECB) and
RPC modes with a block cipher [50] for the confidentiality-only and the confidentiality-and-integrity
schemes, respectively. We give a high-level description of these two modes of
encryption below.
Randomized ECB (rECB) Mode
Recall that rECB mode provides confidentiality only. Algorithm 1 describes encrypting with this mode.
Decryption is performed by computing Dec(k, ci ), i = 1, 2, · · · , n. It is easy to see
that it supports replace, delete and insert operations. Incremental update operations
result in only small changes to the ciphertext as each block is encrypted independently.
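The following Python sketch illustrates rECB encryption, decryption and an incremental replace. It rests on several assumptions that are ours rather than the thesis implementation's: a 4-round Feistel construction keyed with HMAC-SHA256 stands in for the block cipher (a real deployment would use a cipher such as AES), blocks are 16 bytes, and the encryption of r_0 is stored as a header block so that the masks can be recovered on decryption. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK = 16  # block size b in bytes (assumption)


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key, i, half):
    """Round function of the toy Feistel cipher."""
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def enc_block(key, block):
    l, r = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        l, r = r, xor(l, _f(key, i, r))
    return l + r


def dec_block(key, block):
    l, r = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        l, r = xor(r, _f(key, i, l)), l
    return l + r


def _enc_pair(key, r0, m):
    r = secrets.token_bytes(BLOCK)
    return enc_block(key, xor(m, r)), enc_block(key, xor(r, r0))


def recb_encrypt(key, blocks):
    """Return (header, body): header = Enc(k, r0), body[i] = c_{i+1} of Algorithm 1."""
    r0 = secrets.token_bytes(BLOCK)
    return enc_block(key, r0), [_enc_pair(key, r0, m) for m in blocks]


def recb_decrypt(key, header, body):
    r0 = dec_block(key, header)
    out = []
    for c_msg, c_mask in body:
        r = xor(dec_block(key, c_mask), r0)
        out.append(xor(dec_block(key, c_msg), r))
    return out


def recb_replace(key, header, body, i, m):
    """Incremental (replace, i, m): only ciphertext block i is recomputed."""
    body[i] = _enc_pair(key, dec_block(key, header), m)
```

A replace touches a single ciphertext block regardless of the message length; insert and delete correspond to inserting into or deleting from the `body` list in the same way.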
RPC Mode
RPC mode provides both confidentiality and integrity. Algorithm 2 describes
encrypting with this mode.
Algorithm 2 RPC mode
1: Break the message M into (b − 2r)-bit blocks $m_1, m_2, \cdots, m_n$
2: for i = 0 to n do
3:   Select random value $r_i \leftarrow \{0, 1\}^r$
4: end for
5: $c_0 = \mathrm{Enc}(k, r_0 \| START \| r_1)$
6: for each block $m_i$, i = 1 to n − 1 do
7:   $c_i = \mathrm{Enc}(k, r_i \| m_i \| r_{i+1})$
8: end for
9: $c_n = \mathrm{Enc}(k, r_n \| m_n \| r_0)$
10: $r^* = \bigoplus_{i=1}^{n} r_i$
11: $c^* = \mathrm{Enc}(k, r^* \oplus r_0 \| 0^{b-2r} \| r^*)$
12: Return $c_0, c_1, c_2, \cdots, c_n, c^*$
We assume that the keyword “START” is not part of the valid message space.
$c_0$ identifies the start of the message and $c^*$ identifies the end of the message and
also contains the checksum. Decryption is performed by computing $\mathrm{Dec}(k, c_i)$, $i = 0, 1, \cdots, n$, and $\mathrm{Dec}(k, c^*)$. The following checks are performed to verify the integrity:
• The first block contains the keyword “START”.
• The $r_i$ values are chained correctly.
• The decryption of $c^*$ contains the correct $r_0$ and the checksum.
If the integrity checks succeed, the decryption algorithm outputs the message M ,
otherwise ⊥.
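RPC encryption and the three integrity checks can be sketched in Python as below. The sketch is illustrative only and relies on assumptions of our own: a toy 4-round Feistel cipher keyed with HMAC-SHA256 replaces the block cipher, b = 128 bits and r = 32 bits (so message blocks are 8 bytes), and an 8-byte marker stands for the keyword START. Decryption returns None in place of ⊥.

```python
import hashlib
import hmac
import secrets

BLOCK, RLEN = 16, 4            # b and r in bytes (assumption)
MLEN = BLOCK - 2 * RLEN        # message block length: 8 bytes
START = b"\x00START\x00\x00"   # marker assumed to lie outside the message space


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key, i, half):
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def enc_block(key, block):
    l, r = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        l, r = r, xor(l, _f(key, i, r))
    return l + r


def dec_block(key, block):
    l, r = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        l, r = xor(r, _f(key, i, l)), l
    return l + r


def rpc_encrypt(key, blocks):
    """Algorithm 2 for n >= 1 message blocks of MLEN bytes each."""
    n = len(blocks)
    rs = [secrets.token_bytes(RLEN) for _ in range(n + 1)]      # r_0 .. r_n
    cts = [enc_block(key, rs[0] + START + rs[1])]
    for i in range(1, n):
        cts.append(enc_block(key, rs[i] + blocks[i - 1] + rs[i + 1]))
    cts.append(enc_block(key, rs[n] + blocks[n - 1] + rs[0]))
    r_star = bytes(RLEN)
    for r in rs[1:]:
        r_star = xor(r_star, r)
    cts.append(enc_block(key, xor(r_star, rs[0]) + bytes(MLEN) + r_star))
    return cts


def rpc_decrypt(key, cts):
    """Return the message blocks, or None if any integrity check fails."""
    *body, tail = [dec_block(key, c) for c in cts]
    if not body:
        return None
    r0, marker, nxt = body[0][:RLEN], body[0][RLEN:-RLEN], body[0][-RLEN:]
    if marker != START:                      # check 1: START keyword
        return None
    blocks, r_star = [], bytes(RLEN)
    for p in body[1:]:
        left, m, right = p[:RLEN], p[RLEN:-RLEN], p[-RLEN:]
        if left != nxt:                      # check 2: r_i values chain
            return None
        r_star, nxt = xor(r_star, left), right
        blocks.append(m)
    if nxt != r0 or tail != xor(r_star, r0) + bytes(MLEN) + r_star:
        return None                          # check 3: r_0 and checksum
    return blocks
```

Reordering or dropping ciphertext blocks breaks the chain of r_i values, so `rpc_decrypt` rejects the result except with small probability over the random r_i.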
Similar to rECB mode, this mode supports replace, insert and delete operations.
A main challenge in implementing an incremental encryption scheme is to manage
the blocks in order to efficiently support insert, delete and replace operations.
4.5 An Example Application
We now illustrate how the internals of our inline AB-GKM scheme work through
a simplified example in a healthcare scenario. This discussion is based on the information available at [42].
A hospital’s data center Owner has to broadcast an XML file “EHR.xml” which
contains the electronic health record (EHR) of a patient to the hospital’s employees.
<PatientRecord>
  <ContactInfo>
    ... ...
  </ContactInfo>
  <BillingInfo>
    ... ...
  </BillingInfo>
  <ClinicalRecord>
    <HistoryOfPresentIllness>
      ... ...
    </HistoryOfPresentIllness>
    <PastMedicalHistory>
      ... ...
    </PastMedicalHistory>
    <Medication>
      <!-- This has the current prescription -->
      ... ...
    </Medication>
    <AlergiesAndAdverseReactions>
      ... ...
    </AlergiesAndAdverseReactions>
    <FamilyHistory>
      ... ...
    </FamilyHistory>
    <SocialHistory>
      <!-- Smoking, drinking, etc. -->
      ... ...
    </SocialHistory>
    <PhysicalExams>
      <!-- Weight, body temperature, skin tests, etc. -->
      ... ...
    </PhysicalExams>
    <LabRecords>
      <!-- X-rays, etc. -->
      ... ...
    </LabRecords>
    <Plan>
      <!-- What needs to be done, etc. -->
      ... ...
    </Plan>
  </ClinicalRecord>
</PatientRecord>
The subdocuments of “EHR.xml”, marked with different XML tags, need to be
accessed by different employees based on their roles and other identity attributes.
Suppose the roles for the hospital’s employees are: receptionist (rec), cashier (cas),
doctor (doc), nurse (nur), data analyst (dat), and pharmacist (pha). The involved
access control policies for “EHR.xml” are
1. ACP1 = (“role = rec”, {(ContactInfo)}, “EHR.xml”)
2. ACP2 = (“role = cas”, {(BillingInfo)}, “EHR.xml”)
3. ACP3 = (“role = doc”, {(ClinicalRecord)}, “EHR.xml”)
4. ACP4 = (“role = nur ∧ level ≥ 59”, {(ContactInfo), (Medication), (PhysicalExams),
(LabRecords), (Plan)}, “EHR.xml”)
5. ACP5 = (“role = dat”, {(ContactInfo), (LabRecords)}, “EHR.xml”)
6. ACP6 = (“role = pha”, {(BillingInfo), (Medication)}, “EHR.xml”)
“EHR.xml” is divided into subdocuments based on these access control policies:
• (ContactInfo): ACP1, ACP4, ACP5
• (BillingInfo): ACP2, ACP6
• (Medication): ACP3, ACP4, ACP6
• (PhysicalExams): ACP3, ACP4
• (LabRecords): ACP3, ACP4, ACP5
• (Plan): ACP3, ACP4
• Other XML tags: none
The policy configurations and their associated subdocuments are:
• Pc1 = {ACP1, ACP4, ACP5} ↔ (ContactInfo)
• Pc2 = {ACP2, ACP6} ↔ (BillingInfo)
• Pc3 = {ACP3, ACP4, ACP6} ↔ (Medication)
• Pc4 = {ACP3, ACP4} ↔ (PhysicalExams), (Plan)
• Pc5 = {ACP3, ACP4, ACP5} ↔ (LabRecords)
• Pc6 = {} ↔ Other XML tags
Assume that the involved hospital employees have already obtained their identity
tokens and have received their secrets through the delivery phase described earlier,
and that the secret table T has been created by Owner. Owner chooses an encryption
key Ki for each policy configuration Pci to encrypt the associated subdocuments.
Without loss of generality, we focus on the case of Pc4 = {ACP3 , ACP4 } and use
the visible records in Table 4.1 for demonstration. An SQL-styled database query
SELECT * FROM T WHERE "role = doc" IS NOT NULL
returns two rows containing pseudonyms pn-0012 and pn-1492, corresponding to the
employees which can potentially access subdocuments to which ACP3 applies. Similarly, it can be easily seen that an employee under pn-1492 is the only one who may
satisfy ACP4 . The Owner then chooses N = 3, and random values z1 , z2 , z3 . For
the employee under pn-0012 whose secret for the attribute condition “role = doc” is
86571, the Owner computes values
a1,1 = H(86571||z1 ), a1,2 = H(86571||z2 ), a1,3 = H(86571||z3 ).
The Owner executes a similar computation for the user under pn-1492 thus obtaining
the values
a2,1 = H(13011||z1 ), a2,2 = H(13011||z2 ), a2,3 = H(13011||z3 ).
By now the Owner has computed both required rows of matrix A for ACP3 , and
will process ACP4 . In this case, for pn-1492 whose secrets corresponding to the two
conditions “role = nur” and “level ≥ 59” are r3,1 and r3,2 , respectively, the Owner
computes
a3,1 = H(11109||60987||z1 ), a3,2 = H(11109||60987||z2 ),
a3,3 = H(11109||60987||z3 ).
For simplicity and illustration purposes, assume q = 17, and the resulting matrix over
$F_{17}$ is
$A = \begin{pmatrix} 1 & 15 & 3 & 4 \\ 1 & 4 & 13 & 3 \\ 1 & 12 & 5 & 6 \end{pmatrix}$.
The Owner solves AY = 0 for a non-trivial $Y = (4, 4, 3, 3)^T$. Let $K_4 = 11$. The Owner
sets
$X = Y + (K_4, 0, 0, 0)^T = (15, 4, 3, 3)^T$.
The Owner publishes $X, z_1, z_2, z_3$ with the associated subdocuments (PhysicalExams),
(Plan), which are encrypted with a symmetric encryption key $K_4 = 11$.
Suppose that the employee under pn-0012 is a doctor, thus satisfies ACP3 and has
correctly received the secret during the delivery process. To obtain the decryption
key $K_4$, the doctor computes $a_{1,1} = 15$, $a_{1,2} = 3$ and $a_{1,3} = 4$ as the Owner did, then
calculates
$K_4 = (1, a_{1,1}, a_{1,2}, a_{1,3}) \cdot X = (1, 15, 3, 4) \cdot (15, 4, 3, 3)^T = 11 \pmod{17}$.
The doctor can now use this key to decrypt the subdocuments (PhysicalExams),
(Plan).
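The arithmetic of this example can be checked with a few lines of Python. The snippet uses only the numbers given above (q = 17, the rows of A, Y and K4 = 11); the helper names dot and derive_key are ours.

```python
Q = 17
A = [
    [1, 15, 3, 4],    # KEV of pn-0012 for ACP3 ("role = doc")
    [1, 4, 13, 3],    # KEV of pn-1492 for ACP3
    [1, 12, 5, 6],    # KEV of pn-1492 for ACP4
]
Y = [4, 4, 3, 3]      # non-trivial solution of A * Y = 0 over F_17
K4 = 11


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % Q


# The Owner hides K4 in the first coordinate of the public vector X.
X = [(Y[0] + K4) % Q] + Y[1:]


def derive_key(kev, public_x):
    """KeyDer: inner product of a key extraction vector with X."""
    return dot(kev, public_x)
```

Because every row of A starts with 1 and is orthogonal to Y, `derive_key(row, X)` returns K4 = 11 for each authorized row, while a vector that is not orthogonal to Y, such as (1, 2, 3, 4), yields a different value.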
Suppose that the employee under pn-1492 is a nurse of level 58. Then it satisfies
neither ACP3 nor ACP4; therefore it cannot receive the secrets 11109 or 13011. Although this nurse has the correct secret 60987 for attribute condition “role = nur”,
it is not able to compute any of $a_{2,i}$ or $a_{3,i}$, i = 1, 2, 3, and thus is not able to obtain
a KEV to derive the decryption key $K_4$. Hence it cannot access the subdocuments
(PhysicalExams), (Plan).
The process is similar for the other policy configurations. It is worth remarking,
though, that for the policy configuration Pc6 , which is an empty set, the Owner can
just encrypt the associated subdocuments with an encryption key K6 without the
need of publishing X or zi , because in this case no employee is authorized to access
this portion of data.
4.6 Experimental Results
In this section, we present experimental results for various parameters in our
system. We have built a fully functioning system in C/C++ that incorporates our
techniques for privacy preserving secret delivery based on the OCBE protocols, and
efficient key management using the inline AB-GKM scheme.
The experiments were performed on a machine running GNU/Linux kernel version
2.6.27 with an Intel® Core™ 2 Duo CPU T9300 2.50GHz and 4 Gbytes memory.
Only one processor was used for computation. The code is built with 64-bit gcc version
4.3.2, optimization flag -O2. The code is built over the G2HEC C++ library [51],
which implements the arithmetic operations in the Jacobian groups of genus 2 curves.
For the secret delivery and group key management phases, we use V. Shoup’s NTL
library [37] version 5.4.2 for finite field arithmetic, and SHA-1 implementation of
OpenSSL [38] version 0.9.8 for cryptographic hashing.
4.6.1 Privacy Preserving Secret Delivery
The secret delivery phase uses the OCBE protocols, which consist of three major
steps: 1) extra commitments generation (OCBE for inequality conditions only) at
the Usr, 2) envelope composition at the Owner, and 3) envelope opening at the Usr.⁶
In this section, we evaluate the performance of these three steps for both EQ- and
GE-OCBE protocols.
We choose the group G to be the rational points of the Jacobian variety (a.k.a. the
Jacobian group) of a genus 2 curve
\[ C : y^2 = x^5 + 2682810822839355644900736\,x^3 + 226591355295993102902116\,x^2 + 2547674715952929717899918\,x + 4797309959708489673059350 \]
over the prime field F_q , with q = 5 · 10^24 + 8503491 (83 bits). The Jacobian group of
this curve has a prime order
p = 24999999999994130438600999402209463966197516075699 (164 bits).⁷
Table 4.2: Average computation time for running one round of the EQ-OCBE protocol
Computation                       Time (in ms)
Create Extra Commitments (Usr)    0.00
Open Envelope (Usr)               35.25
Compose Envelope (Owner)          11.80
The OCBE parameter generation program chooses non-unit points g and h in the
Jacobian group as the base points for constructing the Pedersen commitments.
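To make the role of the two base points concrete, the following Python sketch computes a Pedersen commitment c = g^x h^r. It is only an illustration under loud assumptions: instead of the Jacobian group of the genus 2 curve, it uses the order-11 subgroup of Z_23^* written multiplicatively, and the constants and function names (`commit`, `verify`) are hypothetical and not taken from our implementation.

```python
# Toy parameters (assumptions for illustration only): the order-11 subgroup
# of Z_23^*, standing in for the Jacobian group used in the experiments.
P = 23        # modulus of the toy group
ORDER = 11    # prime order of the subgroup
G = 4         # base point g (non-unit element of the subgroup)
H = 9         # base point h (non-unit element of the subgroup)
# In a real deployment nobody may know log_g(h); in this toy group it is
# trivially computable, so the sketch offers no security whatsoever.

def commit(x, r):
    """Pedersen commitment c = g^x * h^r in the toy group."""
    return (pow(G, x % ORDER, P) * pow(H, r % ORDER, P)) % P

def verify(c, x, r):
    """Check that (x, r) opens the commitment c."""
    return c == commit(x, r)

c1 = commit(5, 3)
c2 = commit(2, 7)
# Multiplying commitments adds both the committed values and the randomness.
c_sum = (c1 * c2) % P
```

The homomorphic property shown in the last line is what allows commitments to be combined without opening them.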
We use attribute values that satisfy the attribute conditions in the policy. We
expect a similar running time if the attribute values do not satisfy the attribute
conditions in the policy. For GE-OCBE, we vary the value of the ℓ parameter, which
controls the range of the difference between the committed value x and the value x0
specified in the policy, from 5 to 40, and perform the evaluation accordingly. In this
experiment, we run both the EQ- and GE-OCBE protocols on randomly chosen data for
50 rounds and take the average values. Figure 4.2 and Table 4.2 report the average
running time of one round of the GE-OCBE protocol and the EQ-OCBE protocol,
respectively.
⁶ Interested readers may refer to [46, 52] for details.
⁷ The data is taken from [53].
The experimental results show that the overall computation takes at most a few
seconds for the privacy preserving registration through the OCBE protocols when all
possible identity attribute values lie within an interval of width up to 2^40. Because of
the impact of the values of ℓ on the performance of the secret delivery, it is important
to choose ℓ as small as possible, while at the same time large enough to upper-bound
the attribute values. For example, the identity attribute “age” (in years) usually has
values from 0 to 200 and can be represented using 8 bits. In this case, it is sufficient
to choose ℓ to be 8. We expect other OCBE protocols for inequality predicates to
have a performance similar to that of GE-OCBE, because the design and operations
are similar.
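The choice of ℓ for a bounded attribute can be computed directly from the largest value the attribute may take. The sketch below is illustrative only; the function name `min_ell` is ours and does not appear in the implementation.

```python
def min_ell(max_value):
    """Smallest bit length ell such that every value in [0, max_value] fits in ell bits."""
    return max(1, max_value.bit_length())

# The identity attribute "age" (in years) with values from 0 to 200.
ell_age = min_ell(200)
```

For the age example this gives ℓ = 8, matching the value suggested above.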
[Plot: time (in milliseconds) versus ℓ (5 to 40) for Create Extra Commitments (Sub), Compose Envelope (Pub) and Open Envelope (Sub).]
Figure 4.2.: Average computation time for running one round of GE-OCBE protocol
4.6.2 Data and Key Management
In Chapter 3, we provided experimental results only for the Access Tree AB-GKM. In this section, we report experimental results for the Inline AB-GKM, which
is the AB-GKM scheme used in this work. We perform experiments to evaluate the
performance of generation of the ACVs at the Owner and the key derivation from the
ACVs at the Usr, and the size of the ACVs for different system parameters including
the maximum number of users and the number of attribute conditions. All finite field
arithmetic operations are performed in an 80-bit prime field.
The following experiments are performed with different user configurations. A
user configuration indicates the number of current Usrs and the maximum user limit
N . For example, the configuration ‘25% Usrs’ with N = 1000, has 250 Usrs. We use
25 policies, each on average containing two conditions. Each Usr satisfies the policy
in the policy configuration under consideration. We illustrate the experiments for
one data item, as computations related to different data items are independent and
similar, and thus can be performed in parallel.
[Plot: time (in seconds) versus maximum users (100 to 1000) for the 25%, 50%, 75% and 100% Subs configurations.]
Figure 4.3.: Time to generate an ACV for different user configurations
Figure 4.3 reports the average time spent in computing an ACV corresponding to
the matrix A for different user configurations. An ACV is a random vector in the null
space of matrix A. We generate an ACV by first computing a basis of the null space
of A, then choosing the ACV as a random linear combination of the basis vectors. For
a given N , the ACV computation time increases with the number of current users.
This is consistent with the fact that as the number of current users increases, the
number of rows in the matrix A (and consequently the rank of A) increases, requiring an
increasing number of elementary matrix operations to compute the null space for the
linear solver of NTL. As shown in Figure 4.3, this computation is efficient (less than
45 seconds on a personal computer) for reasonably large N values.
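The ACV generation and key derivation steps described above can be sketched in a few lines of Python. This is a hedged illustration, not our NTL-based implementation: the field size `Q`, the string encoding of the concatenation inside `h`, and all function names are assumptions made for the example.

```python
import hashlib
import random

Q = 2**61 - 1  # toy prime field (assumption; the experiments use an 80-bit prime)

def h(secret, z):
    """H(secret || z) reduced into F_Q; the string encoding is an assumption."""
    digest = hashlib.sha1(f"{secret}||{z}".encode()).digest()
    return int.from_bytes(digest, "big") % Q

def kev(secret, zs):
    """Key extraction vector (1, a_1, ..., a_N) for one secret."""
    return [1] + [h(secret, z) for z in zs]

def null_space_basis(A, q):
    """Basis of {y : Ay = 0 (mod q)} via Gauss-Jordan elimination over F_q."""
    rows = [[x % q for x in row] for row in A]
    n_cols, pivots, r = len(rows[0]), [], 0
    for c in range(n_cols):
        p = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        inv = pow(rows[r][c], -1, q)
        rows[r] = [(x * inv) % q for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(a - f * b) % q for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        v = [0] * n_cols
        v[free] = 1
        for row_idx, pc in enumerate(pivots):
            v[pc] = (-rows[row_idx][free]) % q
        basis.append(v)
    return basis

def generate_acv(A, key, q, rng):
    """Random non-trivial null-space vector of A with the key added to entry 0."""
    basis = null_space_basis(A, q)
    if not basis:
        raise ValueError("A has a trivial null space")
    y = [0] * len(A[0])
    while not any(y):
        coeffs = [rng.randrange(q) for _ in basis]
        y = [sum(c * b[j] for c, b in zip(coeffs, basis)) % q
             for j in range(len(A[0]))]
    y[0] = (y[0] + key) % q
    return y

def derive_key(kev_vec, acv, q):
    """Inner product KEV . ACV over F_q, as computed by a Usr."""
    return sum(a * b for a, b in zip(kev_vec, acv)) % q

rng = random.Random(0)
zs = [rng.randrange(Q) for _ in range(3)]   # N = 3 public random values
A = [kev(secret, zs) for secret in (86571, 13011)]
key = rng.randrange(Q)
acv = generate_acv(A, key, Q, rng)
```

Each Usr whose row was included in A recovers `key` with a single inner product, which is consistent with the low key derivation cost reported below.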
[Plot: time (in milliseconds) versus maximum users (100 to 1000) for the 25%, 50%, 75% and 100% Subs configurations.]
Figure 4.4.: Key derivation time for different user configurations
Figure 4.4 reports the average time for Usrs to derive the symmetric keys from
ACVs and KEVs for different user configurations. Key derivation is performed by Usrs
whose computational capabilities may be limited. Therefore, an efficient decryption
key derivation process is desired. As Figure 4.4 shows, key derivation not only incurs a minimal
computational cost (a few milliseconds), but its cost also increases only linearly with N.
[Plot: ACV size (in Kbytes) versus maximum users (100 to 1000) for the 25%, 50%, 75% and 100% Subs configurations.]
Figure 4.5.: Size of ACV for different user configurations
Figure 4.5 shows the average size of ACVs for different user configurations. Another design goal of our approach is to keep the additional communication overhead
to a minimum. In order to achieve this goal, the Owner compresses the ACVs before
broadcasting them with the encrypted data. As Figure 4.5 indicates, our approach
only requires a few kilobytes to transmit these vectors, and the size increases only
linearly with N .
In the following experiment, we measure the time for ACV generation (at Owner)
and key derivation (at Usr) by varying the average number of attribute conditions per
policy, and keeping the number of policies and the maximum number of users fixed
at 25 and 500, respectively.
[Plot: time (in milliseconds) versus average number of conditions per policy (1 to 10) for ACV generation and key derivation.]
Figure 4.6.: ACV generation and key derivation for different number of conditions
per policy
Figure 4.6 shows the average running time for ACV generation at the Owner and
symmetric decryption key derivation at the Usr, for different numbers of conditions per
policy. As the number of conditions per policy increases, the key derivation time
remains almost constant but the ACV generation time slightly increases (by less than
100 milliseconds).
4.6.3 Encryption Management
In this section, we compare the incremental encryption, proposed as an improvement to the SLE approach, against traditional encryption.
[Plot: time (in milliseconds) versus block size (in bytes) for the rECB and RPC modes.]
Figure 4.7.: Different incremental encryption modes
Figure 4.7 shows the average overall encryption time as the block size varies while
the size of the document remains at 1K. The RPC mode requires more time as it adds
integrity checks in addition to encrypting each block. The average time decreases as
the size of the block increases since the number of blocks that have to be handled
decreases.
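The relationship between block size and the number of blocks to be handled is simple ceiling division. The following sketch assumes that the "1K" document is 1024 bytes, which the text does not state explicitly; the function name `block_count` is ours.

```python
def block_count(doc_size, block_size):
    """Number of blocks needed to cover doc_size bytes (ceiling division)."""
    return -(-doc_size // block_size)

DOC_SIZE = 1024  # assumption: "1K" is read as 1024 bytes
counts = {b: block_count(DOC_SIZE, b) for b in (8, 16, 32, 64)}
```

Doubling the block size halves the number of blocks, which is in line with the decreasing trend described above.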
[Plot: time (in seconds) versus data size (in bytes) for the rECB, RPC and Conventional modes.]
Figure 4.8.: Average time to perform insert operation
Figure 4.8 reports the average time to perform a random insert operation of data
of different sizes while the block size remains at 16 bytes. The time remains almost
constant for different data sizes. The RPC mode requires more time than the rECB
mode since it also has to read additional blocks and update the checksum. It
is clear that with large data, incremental encryption can save a considerable amount
of time. Other modification operations demonstrate a similar pattern.
5 PRIVACY PRESERVING PULL BASED SYSTEMS: TWO LAYER
ENCRYPTION APPROACH
In the previous chapter, we proposed an approach called single layer encryption (SLE) that
follows the conventional data outsourcing scenario where the Owner enforces all ACPs
through selective encryption and uploads encrypted data to the untrusted Cloud. The
SLE approach supports fine-grained attribute based ACPs and preserves the privacy
of users from the Cloud. However, in such an approach, the Owner is in charge
of encrypting the data before uploading them on the third-party server as well as re-encrypting the data whenever user credentials or authorization policies change and
managing the encryption keys. The Owner has to download all affected data
before performing the selective encryption. The Owner thus incurs high communication and computation costs, which then negate the benefits of using a third party
service. A better approach should delegate the enforcement of fine-grained access
control to the Cloud, so as to minimize the overhead at the Owner, while at the same
time assuring data confidentiality from the third-party server.
In this chapter, we propose an approach, based on two layers of encryption, that
addresses this requirement. Under our approach, referred to as two layer encryption
(TLE), the Owner performs a coarse grained encryption, whereas the Cloud performs a
fine grained encryption on top of the data encrypted by the coarse grained encryption.
A challenging issue in our approach is how to decompose attribute based access
control policies (ACPs) such that the two layer encryption can be performed. In
order to delegate as much access control enforcement as possible to the Cloud, one
needs to decompose the ACPs such that the Owner manages the minimum number of
attribute conditions in those ACPs that assures the confidentiality of data from the
Cloud. Each ACP should be decomposed into two sub ACPs such that the conjunction
of the two sub ACPs results in the original ACP. The two layer encryption should
be performed such that the Owner first encrypts the data based on one set of sub
ACPs and the Cloud re-encrypts the encrypted data using the other set of ACPs. The
two encryptions together enforce the ACP as users should perform two decryptions
to access the data. For example, if the ACP is (C1 ∧ C2 ) ∨ (C1 ∧ C3 ), the ACP can
be decomposed as two sub ACPs C1 and C2 ∨ C3 . Notice that the decomposition is
consistent; that is, (C1 ∧C2 )∨(C1 ∧C3 ) = C1 ∧(C2 ∨C3 ). The Owner enforces the former
by encrypting the data for the users satisfying the former and the Cloud enforces the
latter by re-encrypting the Owner encrypted data for the users satisfying the latter.
Since the Cloud does not handle C1 , it cannot decrypt Owner encrypted data and thus
confidentiality is preserved. Notice that users should satisfy the original ACP to access
the data by performing two decryptions. We show that the problem of decomposing
ACPs for coarse and fine grained encryption while assuring the confidentiality of
data from the third party and the two encryptions together enforcing the ACPs is
NP-complete. We propose novel optimization algorithms to construct near optimal
solutions to this problem. Under our approach, the third party server supports two
services - the storage service, which stores encrypted data, and the access control
service, which performs the fine grained encryption.
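The consistency of the example decomposition can be confirmed by enumerating all truth assignments. The Python sketch below is purely illustrative; the function names are ours.

```python
from itertools import product

def original(c1, c2, c3):
    """The original ACP (C1 AND C2) OR (C1 AND C3)."""
    return (c1 and c2) or (c1 and c3)

def owner_sub_acp(c1, c2, c3):
    """Sub ACP enforced by the Owner: C1."""
    return c1

def cloud_sub_acp(c1, c2, c3):
    """Sub ACP enforced by the Cloud: C2 OR C3."""
    return c2 or c3

# The decomposition is consistent if the conjunction of the two sub ACPs
# agrees with the original ACP on every assignment of C1, C2, C3.
consistent = all(
    original(*v) == (owner_sub_acp(*v) and cloud_sub_acp(*v))
    for v in product([False, True], repeat=3)
)
```

With only three attribute conditions there are eight assignments to check, so the exhaustive comparison is immediate.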
We utilize the efficient Access Tree AB-GKM scheme introduced in Chapter 3, which
allows users whose attributes satisfy a certain ACP to derive the group key and decrypt the content they are allowed to access from the Cloud. Our system assures the
confidentiality of the data and preserves the privacy of users from the access control
service as well as the cloud storage service while delegating as much of the access
control enforcement as possible to the third party through the two layer encryption
technique.
The TLE approach has many advantages. When policies or user dynamics
change, only the outer layer of the encryption needs to be updated. Since the outer
layer encryption is performed at the third party, no data transmission is required
between the Owner and the third party. Further, both the Owner and the third party
service utilize the AB-GKM scheme introduced in Chapter 3 for key management,
whereby the actual keys do not need to be distributed to the users. Instead, users
are given one or more secrets which allow them to derive the actual symmetric keys
for decrypting the data.
The rest of the chapter is organized as follows. An overview of the TLE approach
is given in Section 5.1. Section 5.2 provides a detailed treatment of the policy decomposition for the purpose of two layer encryption. Section 5.3 gives a detailed
description of the TLE approach. We briefly analyze the trade-offs, the security and
the privacy of the overall systems in Section 5.4. Section 5.5 reports experimental
results for policy decomposition algorithms and the SLE vs. the TLE approaches.
5.1 Overview
We now give an overview of our solution to the problem of delegated access control
to outsourced data in the cloud. A detailed description is provided in Section 5.3. Like
the SLE system described in Section 4.3, the TLE system consists of the four entities,
Owner, Usr, IdP and Cloud. However, unlike the SLE approach, the Owner and the
Cloud collectively enforce ACPs by performing two encryptions on each data item.
This two layer enforcement allows one to reduce the load on the Owner and delegates
as much access control enforcement duties as possible to the Cloud. Specifically, it
provides a better way to handle data updates, user dynamics, and policy changes.
Figure 5.1 shows the system diagram of the TLE approach. The system goes through
one additional phase compared to the SLE approach. We give an overview of the six
phases below:
Identity token issuance: IdPs issue identity tokens to Usrs based on their identity
attributes.
Policy decomposition: The Owner decomposes each ACP into at most two sub
ACPs such that the Owner enforces the minimum number of attributes to assure confidentiality of data from the Cloud. It is important to make sure that the decomposed
ACPs are consistent so that the sub ACPs together enforce the original ACPs. The
Owner enforces the confidentiality related sub ACPs and the Cloud enforces the remaining sub ACPs.
[Diagram: entities User, IdP, Owner and Cloud, with the labeled steps (1) Identity attribute, (2) Identity token, (1) Decompose policies, (2) Register identity tokens, (3) Secrets, (4) Selectively encrypt & upload docs & modified policies, (5) Re-encrypt to enforce policies, (6) Download & decrypt twice.]
Figure 5.1.: Two layer encryption approach
Identity token registration: Usrs register their identity tokens in order to obtain secrets to decrypt the data that they are allowed to access. Usrs register with the Owner only
those identity tokens related to the Owner’s sub ACPs and register the remaining
identity tokens with the Cloud in a privacy preserving manner. It should be noted
that the Cloud does not learn the identity attributes of Usrs during this phase.
Data encryption and uploading: The Owner first encrypts the data based on
the Owner’s sub ACPs in order to hide the content from the Cloud and then uploads
them along with the public information generated by the AB-GKM::KeyGen algorithm and the remaining sub ACPs to the Cloud. The Cloud in turn encrypts the data
based on the keys generated using its own AB-GKM::KeyGen algorithm. Note that
the AB-GKM::KeyGen at the Cloud takes the secrets issued to Usrs and the sub ACPs
given by the Owner into consideration to generate keys.
Data downloading and decryption: Usrs download encrypted data from the Cloud
and decrypt the data using the derived keys. Usrs decrypt twice to first remove the
encryption layer added by the Cloud and then by the Owner. As access control is
enforced through encryption, Usrs can decrypt only those data for which they have
valid secrets.
Encryption evolution management: Over time, either ACPs or user credentials
may change. Further, already encrypted data may go through frequent updates. In
such situations, data already encrypted must be re-encrypted with a new key. As
the Cloud performs the access control enforcing encryption, it simply re-encrypts the
affected data without the intervention of the Owner.
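The layering described in the phases above can be illustrated with a toy Python sketch. It is not our implementation: a SHA-256 based XOR keystream stands in for the symmetric cipher, the keys are fixed byte strings rather than keys derived through AB-GKM, and all names are hypothetical. The sketch shows the Owner adding the inner layer, the Cloud adding the outer layer, a Usr decrypting twice, and the Cloud replacing the outer layer without involving the Owner.

```python
import hashlib

def keystream(key, n):
    """Toy keystream: SHA-256 in counter mode (illustration only, not secure practice)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor_layer(key, data):
    """Add or remove one encryption layer; applying it twice with the same key is the identity."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

inner_key = b"owner-coarse-grained-key"      # hypothetical Owner (inner layer) key
outer_key = b"cloud-fine-grained-key-v1"     # hypothetical Cloud (outer layer) key

record = b"PhysicalExam: sample content"
inner = xor_layer(inner_key, record)         # Owner encrypts before uploading
stored = xor_layer(outer_key, inner)         # Cloud adds the outer layer

# A Usr holding both keys removes the outer layer first, then the inner layer.
plain = xor_layer(inner_key, xor_layer(outer_key, stored))

# Encryption evolution: the Cloud swaps the outer layer on its own.
new_outer_key = b"cloud-fine-grained-key-v2"
restored = xor_layer(new_outer_key, xor_layer(outer_key, stored))
```

Note that the Cloud only ever handles `inner`, never `record`, so re-keying the outer layer does not expose the plaintext to it.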
5.2 Policy Decomposition
Recall that in the SLE approach, the Owner incurs a high communication and
computation overhead since it has to manage all the authorizations when user dynamics or ACPs change. If the access control related encryption is somehow delegated
to the Cloud, the Owner can be freed from the responsibility of managing authorizations through re-encryption and the overall performance would thus improve. Since
the Cloud is not trusted for the confidentiality of the outsourced data, the Owner has
to initially encrypt the data and upload the encrypted data to the cloud. Therefore, in
order to allow the Cloud to enforce authorization policies through encryption and
avoid re-encryption by the Owner, the data may have to be encrypted again to have
two encryption layers. We call the two encryption layers the inner encryption layer
(IEL) and the outer encryption layer (OEL). The IEL assures the confidentiality of the data
with respect to the Cloud and is generated by the Owner. The OEL is for fine-grained
authorization for controlling accesses to the data by the users and is generated by the
Cloud.
An important issue in the TLE approach is how to distribute the encryptions between the Owner and the Cloud. There are two possible extremes. The first approach
is for the Owner to encrypt all data items using a single symmetric key and let the
Cloud perform the complete access control related encryption. The second approach
is for the Owner and the Cloud to perform the complete access control related encryption twice. The first approach has the least overhead for the Owner, but it has
the highest information exposure risk due to collusions between Usrs and the Cloud.
Further, IEL updates require re-encrypting all data items. The second approach has
the least information exposure risk due to collusions, but it has the highest overhead
on the Owner as the Owner has to perform the same task initially as in the SLE approach and, further, needs to manage all identity attributes. An alternative solution
is based on decomposing ACPs so that the information exposure risk and key management overhead are balanced. The problem is then how to decompose the ACPs such
that the Owner has to manage the minimum number of attributes while delegating
as much access control enforcement as possible to the Cloud without allowing it to
decrypt the data. In what follows, we propose such a decomposition approach and we
also show that the policy decomposition problem is hard.
5.2.1 Policy Cover
We define the policy cover problem as the optimization problem of finding the
minimum number of attribute conditions that “covers” all the ACPs in the ACPB.
We say that a set of attribute conditions covers the ACPB if in order to satisfy any
ACP in the ACPB, it is necessary that at least one of the attribute conditions in the
set is satisfied. We call such a set of attribute conditions an attribute condition
cover. For example, if ACPB consists of the three simple ACPs {C1 ∧C2 , C2 ∧C3 , C4 },
the minimum set of attribute conditions that covers ACPB is {C2 , C4 }. C2 should be satisfied
in order to satisfy the ACPs C1 ∧ C2 and C2 ∧ C3 . Notice that satisfying C2 is not
sufficient to satisfy the ACPs. The set is minimum since the set obtained by removing
either C2 or C4 does not satisfy the cover relationship.
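For small policy bases the minimum cover can be found by exhaustive search. The Python sketch below assumes, as in the example, that every ACP is a conjunction of attribute conditions, so that covering an ACP means containing at least one of its conditions; the function name `minimum_cover` is ours.

```python
from itertools import combinations

def minimum_cover(policies):
    """Smallest set of conditions that intersects every conjunctive ACP (brute force)."""
    conditions = sorted(set().union(*policies))
    for size in range(1, len(conditions) + 1):
        for candidate in combinations(conditions, size):
            chosen = set(candidate)
            if all(chosen & policy for policy in policies):
                return chosen
    return set()

# The three simple ACPs C1 AND C2, C2 AND C3, and C4.
acpb = [{"C1", "C2"}, {"C2", "C3"}, {"C4"}]
cover = minimum_cover(acpb)
```

The search examines subsets in order of increasing size, so its running time grows exponentially with the number of attribute conditions.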
Algorithm 3 GEN-GRAPH
  C = φ
  for Each ACPi ∈ ACPB, i = 1 to Np do
    ACP′i ← Convert ACPi to DNF
    for Each conjunctive term c of ACP′i do
      Add c to C
    end for
  end for
  //Represent the conditions as a graph
  G = (V, E), E = φ, V = φ
  for Each conjunctive term ci ∈ C, i = 1 to Nc do
    Create vertex v, if v ∉ V , for each AC in ci
    Add an edge ei between vi and each vertex already added for ci
  end for
  Return G
We define the related decision problem as follows.
Definition 5.2.1 (POLICY-COVER) Determine whether ACPB has a cover of
k attribute conditions.
The following theorem states that this problem is NP-complete.
Theorem 5.2.1 The POLICY-COVER problem is NP-complete.
Proof We first show that POLICY-COVER ∈ NP. Suppose that we are given a
set of ACPs ACPB which contains the attribute condition set AC, and integer k.
For simplicity, we assume that each ACP is a conjunction of attribute conditions.
However, the proof can be trivially extended to ACPs having any monotonic Boolean
expression over attribute conditions. The certificate we choose is a cover of attribute
conditions AC ′ ⊂ AC. The verification algorithm affirms that |AC ′ | = k, and then it
checks, for each policy in the ACPB, that at least one attribute condition in AC ′ is
in the policy. This verification can be performed trivially in polynomial time. Hence,
POLICY-COVER ∈ NP.
Now we prove that the POLICY-COVER problem is NP-hard by showing that
the vertex cover problem, which is NP-Complete, is polynomial time reducible to the
POLICY-COVER problem. Given an undirected graph G = (V, E) and an integer k,
we construct a set of ACPs ACPB that has a cover set of size k if and only if G has
a vertex cover of size k.
Suppose G has a vertex cover V ′ ⊂ V with |V ′ | = k. We construct a set of ACPs
ACPB that has a cover of k attribute conditions as follows. For each vertex vi ∈ V ,
we assign an attribute condition Ci . For each vertex vj ∈ V ′ , we construct an access
control policy by obtaining the conjunction of attribute conditions as follows.
• Start with the attribute condition Cj as the ACP Pj
• For each edge (vj , vr ), add Cr to the ACP as a conjunctive literal (For example,
if the edges are (vj , va ), (vj , vb ) and (vj , vc ), we get Pj = Cj ∧ Ca ∧ Cb ∧ Cc )
At the end of the construction we have a set of distinct access control policies
ACPB with size k. We construct the attribute condition set AC = {C1 , C2 , · · · , Ck }
such that Ci corresponds to each vertex in V ′ . In order to satisfy all access control
policies, the attribute conditions in AC must be satisfied. Hence, AC is an attribute
condition cover of size k for the ACPs ACPB.
Conversely, suppose that ACPB has an attribute condition cover of size k. We
construct G such that each attribute condition corresponds to a vertex in G and an
edge between vi and vj if they appear in the same access control policy. Let this
vertex set be V1 . Then we add the remaining vertices to G corresponding to other
attribute conditions in the access control policies and add the edges similarly. Since
the access control policies are distinct there will be at least one edge (vi , u) for each
vertex vi in attribute condition cover such that u ∈ V1 . Hence G has a vertex cover
of size |V1 | = k.
Since the POLICY-COVER problem is NP-complete, no polynomial
time algorithm for finding the minimum attribute condition cover exists unless P = NP. In the following,
we present two approximation algorithms for the problem.
The APPROX-POLICY-COVER1 algorithm 4 takes as input the set of ACPs
ACPB and returns a set of attribute conditions whose size is guaranteed to be no
more than twice the size of an optimal attribute condition cover. APPROX-POLICY-COVER1 utilizes the GEN-GRAPH algorithm 3 to first represent ACPB as a graph.
Algorithm 4 APPROX-POLICY-COVER1
  G = GEN-GRAPH(ACPB)
  ACC = φ
  for Each disconnected subgraph Gi = (Vi , Ei ) of G do
    if |Vi | == 1 then
      Add ACi corresponding to the vertex to ACC
    else
      while Ei ≠ φ do
        Select a random edge (u, v) of Ei
        Add the attribute conditions ACu and ACv corresponding to {u, v} to ACC
        Remove from Ei every edge incident on either u or v
      end while
    end if
  end for
  Return ACC
We give a high-level overview of the GEN-GRAPH algorithm 3. It takes the
ACPB as the input and converts each ACP into DNF (disjunctive normal form). The
unique conjunctive terms are added to the set C. For each attribute condition in
each conjunctive term in C, it creates a new vertex in G and adds edges between the
vertices corresponding to the same conjunctive term. Depending on the ACPs, the
algorithm may create a graph G with multiple disconnected subgraphs.
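A possible Python rendering of GEN-GRAPH is sketched below. It assumes that each ACP is already given in DNF as a list of sets of attribute conditions, and it reads "adds edges between the vertices corresponding to the same conjunctive term" as connecting every pair of conditions in a term; both the input representation and the function names (`gen_graph`, `components`) are assumptions made for illustration.

```python
from itertools import combinations

def gen_graph(acpb):
    """Build (vertices, edges) from ACPs given in DNF as lists of condition sets."""
    terms = {frozenset(term) for acp in acpb for term in acp}  # unique conjunctive terms
    vertices, edges = set(), set()
    for term in terms:
        vertices.update(term)
        # Connect every pair of attribute conditions of the same term.
        edges.update(frozenset(pair) for pair in combinations(sorted(term), 2))
    return vertices, edges

def components(vertices, edges):
    """Disconnected subgraphs of the graph, returned as a list of vertex sets."""
    adjacency = {v: set() for v in vertices}
    for edge in edges:
        u, v = tuple(edge)
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, parts = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            node = stack.pop()
            if node in part:
                continue
            part.add(node)
            stack.extend(adjacency[node] - part)
        seen |= part
        parts.append(part)
    return parts

# Two illustrative ACPs in DNF: one with a single-condition term, one conjunction.
acpb = [
    [{"role=rec"}, {"role=nur", "type>=junior"}],
    [{"role=doc", "ip=2-out-4"}],
]
vertices, edges = gen_graph(acpb)
subgraphs = components(vertices, edges)
```

Single-condition terms produce isolated vertices, which is why the resulting graph may consist of several disconnected subgraphs.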
As shown in the APPROX-POLICY-COVER1 algorithm 4, it takes the ACPB as
the input and outputs a near-optimal attribute condition cover ACC. First the algorithm converts the ACPB to a graph G as shown in the GEN-GRAPH algorithm 3.
Then for each disconnected subgraph Gi of G, it finds a near-optimal attribute condition cover and adds it to the ACC. The attribute conditions to be added are selected at
random by selecting a random edge in Gi . Once an edge is selected, all edges incident on its endpoints
are removed from Gi . The algorithm continues until all edges are removed from
each Gi . The running time of the algorithm is O(V + E) using adjacency lists to
represent G. It can be shown that the APPROX-POLICY-COVER1 algorithm is a
polynomial-time 2-approximation algorithm as follows.
Theorem 5.2.2 APPROX-POLICY-COVER1 is a polynomial-time 2-approximation
algorithm.
Proof The above running time analysis already shows that the algorithm runs in
polynomial time. We prove that the AC cover ACC returned by the algorithm is at
most twice the size of an optimal AC cover ACC∗.
Let Ei′ denote the set of edges picked at random by the algorithm for each disconnected subgraph Gi . In order to cover the edges in Ei′ , any AC cover must include at
least one endpoint of each edge in Ei′ . Since once an edge is selected, all the incident
edges are removed, no two edges in Ei′ share an endpoint. Therefore, no two edges in
Ei′ are covered by the same vertex from ACC∗ and we have the following lower bound
on the size of the optimal AC cover. Note that if Ei′ is empty, i.e., Gi has only one
vertex, the only attribute condition is included in the AC cover.
\[ |ACC^{*}| \ge \sum_{i=1}^{t} \left( |E_i'| + 1 \right) \]
Each execution of the random edge selection picks an edge for which neither of its
endpoints is already in ACC. Thus, it gives an upper bound on the size of the AC
cover.
\[ |ACC| \le 2 \left( \sum_{i=1}^{t} |E_i'| \right) + 1 \]
Combining the two inequalities above, we get
\[ |ACC| \le 2\,|ACC^{*}| \]
Hence, we prove the theorem.
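The random edge selection of APPROX-POLICY-COVER1 can be sketched in Python as follows. The sketch takes the graph directly as a vertex set and an edge list instead of calling GEN-GRAPH, and it processes all subgraphs in one loop, which selects edges in the same way as handling each subgraph separately; the function signature and the small example graph are assumptions for illustration.

```python
import random

def approx_policy_cover1(vertices, edges, rng):
    """Random edge selection: add both endpoints, then drop all incident edges."""
    remaining = {frozenset(e) for e in edges}
    touched = {v for e in remaining for v in e}
    cover = set(vertices) - touched            # single-vertex subgraphs are included
    while remaining:
        # Sort before choosing so that a seeded generator gives repeatable runs.
        u, v = tuple(rng.choice(sorted(remaining, key=sorted)))
        cover.update((u, v))
        remaining = {e for e in remaining if u not in e and v not in e}
    return cover

# Hypothetical graph: a path a-b-c-d plus the isolated vertex e.
vertices = {"a", "b", "c", "d", "e"}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
cover = approx_policy_cover1(vertices, edges, random.Random(1))
```

For this graph an optimal cover has three elements (for example b, c and e), so any run returns at most six attribute conditions.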
We now present the idea behind our second approximation algorithm, APPROX-POLICY-COVER2, which uses a heuristic to select the attribute conditions. This
algorithm is similar to the APPROX-POLICY-COVER1 algorithm 4 except that
instead of randomly selecting the edges to be included in the cover, it selects the
vertex of highest degree and removes all of its incident edges.
Example 4
A hospital (Owner) supports fine-grained access control on electronic health records
(EHRs) and makes these records available to hospital employees (Usrs) through a
public cloud (Cloud). Typical hospital employees include Usrs playing different roles
such as receptionist (rec), cashier (cas), doctor (doc), nurse (nur), pharmacist (pha),
and system administrator (sys). An EHR document consists of data items including BillingInfo (BI), ContactInfo (CI), MedicationReport (MR), PhysicalExam (PE),
LabReports (LR), Treatment Plan (TP) and so on. In accordance with regulations
such as the Health Insurance Portability and Accountability Act (HIPAA), the hospital
policies specify which users can access which data item(s). In our example system,
there are four attributes, role (rec, cas, doc, nur, pha, sys), insurance plan, denoted
as ip, (ACME, MedA, MedB, MedC), type (assistant, junior, senior) and year of service,
denoted as yos, (integer). The following is the re-arranged set of ACPs of the hospital
such that each data item has a unique ACP.
(“role = rec” ∨ (“role = nur” ∧ “type ≥ junior”), CI)
(“role = cas” ∨ “role = pha”, BI)
(“role = doc” ∧ “ip = 2-out-4”, CR)
((“role = doc” ∧ “ip = 2-out-4”) ∨ “role = pha”, TR)
((“role = doc” ∧ “ip = 2-out-4”) ∨ (“role = nur” ∧ “yos ≥ 5”) ∨ “role = pha”, MR)
((“role = nur” ∧ “type ≥ junior”) ∨ (“role = dat” ∧ “type ≥ junior”) ∨ (“role = doc” ∧ “yos ≥ 2”), LR)
((“role = nur” ∧ “type = senior”) ∨ (“role = dat” ∧ “yos ≥ 4”), PE)
[Graph with vertices: role = rec, role = cas, role = pha, role = nur, role = doc, role = dat, type = senior, type ≥ junior, yos ≥ 2, yos ≥ 4, yos ≥ 5, ip = 2-out-4.]
Figure 5.2.: The example graph
Figure 5.2 shows the graph generated by the GEN-GRAPH algorithm for our running example. Notice that there are 5 disconnected subgraphs. Assume that the APPROX-POLICY-COVER2 algorithm is used to construct the AC cover. As mentioned in
the approximation algorithm, single vertex graphs are trivially included in the AC
cover. The remaining attribute conditions are selected using the greedy heuristic.
That gives us the AC cover ACC = { “role = rec”, “role = cas”, “role = pha”, “role
= doc”, “role = nur”, “role = dat”}.
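The greedy heuristic can be replayed on the running example with a short Python sketch. The edge list below is derived by us from the conjunctive terms of the example ACPs, and the function name and tie-breaking rule (lexicographic order among vertices of equal degree) are assumptions; the thesis does not prescribe a tie-breaking rule.

```python
def approx_policy_cover2(vertices, edges):
    """Greedy heuristic: repeatedly pick the vertex of highest remaining degree."""
    remaining = {frozenset(e) for e in edges}
    touched = {v for e in remaining for v in e}
    cover = set(vertices) - touched            # single-vertex subgraphs are included
    while remaining:
        degree = {}
        for edge in remaining:
            for v in edge:
                degree[v] = degree.get(v, 0) + 1
        best = max(sorted(degree), key=degree.get)  # ties broken lexicographically
        cover.add(best)
        remaining = {e for e in remaining if best not in e}
    return cover

# Edges derived from the multi-condition conjunctive terms of the example ACPs.
edges = [
    ("role=nur", "type>=junior"), ("role=nur", "yos>=5"), ("role=nur", "type=senior"),
    ("role=dat", "type>=junior"), ("role=dat", "yos>=4"),
    ("role=doc", "ip=2-out-4"), ("role=doc", "yos>=2"),
]
vertices = {"role=rec", "role=cas", "role=pha"} | {v for e in edges for v in e}
acc = approx_policy_cover2(vertices, edges)
```

The greedy choices are "role = nur" (degree 3), followed by "role = dat" and "role = doc" (degree 2 each), which together with the three isolated vertices reproduce the cover given above.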
5.2.2 Policy Decomposition
The Owner manages only those attribute conditions in ACC. The Cloud handles
the remaining set of attribute conditions, ACB/ACC. The Owner re-writes its ACPs
such that they cover ACC. In other words, the Owner enforces the parts of the ACPs
related to the ACs in ACC and the Cloud enforces the remaining ACs along with some
ACs in ACC. The POLICY-DECOMPOSITION algorithm 5 shows how the ACPs
are decomposed into two sub ACPs based on the attribute conditions in ACC.
Algorithm 5 takes the ACPB and ACC as input and produces the two sets of
ACPs ACPB_Owner and ACPB_Cloud that are to be enforced at the Owner and the Cloud
respectively. It first converts each policy into DNF and decomposes each conjunctive
term into two conjunctive terms such that one conjunctive term has only those ACs in
ACC and the other term may or may not have the ACs in ACC. It can be easily shown
that the policy decomposition is consistent. That is, the conjunction of corresponding
sub ACPs in ACPB_Owner and ACPB_Cloud respectively produces an original ACP in
ACPB.
Example 5
For our example ACPs, the Owner handles the following sub ACPs.
(“role = rec” ∨ “role = nur” , CI)
(“role = cas” ∨ “role = pha”, BI)
(“role = doc”, CR)
(“role = doc” ∨ “role = pha”, TR)
(“role = doc” ∨ “role = nur” ∨ “role = pha”, MR)
(“role = nur” ∨ “role = dat” ∨ “role = doc”, LR)
(“role = nur” ∨ “role = dat”, PE)
Algorithm 5 POLICY-DECOMPOSITION
 1: ACPB Owner = φ
 2: ACPB Cloud = φ
 3: for each ACPi in ACPB do
 4:     Convert ACPi to DNF, denoted ACP′i
 5:     ACPi(owner) = φ
 6:     ACPi(cloud) = φ
 7:     if ACP′i has only one conjunctive term then
 8:         Decompose the conjunctive term c into c1 and c2 such that ACs in c1 ∈ ACC,
            ACs in c2 ∉ ACC and c = c1 ∧ c2
 9:         ACPi(owner) = c1
10:         ACPi(cloud) = c2
11:     else if at most one term has more than one AC then
12:         for each single AC term c of ACP′i do
13:             ACPi(owner) ∨= c
14:             ACPi(cloud) ∨= c
15:         end for
16:         Decompose the multi AC term c into c1 and c2 such that ACs in c1 ∈ ACC,
            ACs in c2 ∉ ACC and c = c1 ∧ c2
17:         ACPi(owner) ∨= c1
18:         ACPi(cloud) ∨= c2
19:     else
20:         for each conjunctive term c of ACP′i do
21:             Decompose c into c1 and c2 such that ACs in c1 ∈ ACC, ACs in c2 ∉ ACC
                and c = c1 ∧ c2
22:             ACPi(owner) ∨= c1
23:         end for
24:         ACPi(cloud) = ACP′i
25:     end if
26:     Add ACPi(owner) to ACPB Owner
27:     Add ACPi(cloud) to ACPB Cloud
28: end for
29: Return ACPB Owner and ACPB Cloud
As shown in Algorithm 5, the Owner re-writes the ACPs that the Cloud should
enforce such that the conjunction of the two decomposed sub ACPs yields the original
ACP. In our example, the sub ACPs that the Cloud enforces are as follows.
(“role = rec” ∨ “type ≥ junior”, CI)
(“role = cas” ∨ “role = pha”, BI)
(“ip = 2-out-4”, CR)
(“ip = 2-out-4” ∨ “role = pha”, TR)
((“role = doc” ∧ “ip = 2-out-4”) ∨ (“role = nur” ∧ “yos ≥ 5”) ∨ “role = pha”, MR)
((“role = nur” ∧ “type ≥ junior”) ∨ (“role = dat” ∧ “type ≥ junior”) ∨ (“role = doc” ∧ “yos
≥ 2”), LR)
((“role = nur” ∧ “type = senior”) ∨ (“role = dat” ∧ “yos ≥ 4”), PE)
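The decomposition above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis prototype (which is written in C/C++): policies are assumed to be in DNF already, each conjunctive term is modelled as a frozenset of attribute-condition strings, an empty term stands for "true", and the names split, decompose, satisfies and consistent are hypothetical. CR is reconstructed here as the conjunction of its Owner and Cloud sub ACPs; PE is taken from the policy list.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Term = FrozenSet[str]   # one conjunctive term: a set of attribute conditions
DNF = List[Term]        # a policy in DNF: a disjunction of conjunctive terms


def split(term: Term, acc: Set[str]) -> Tuple[Term, Term]:
    """Split c into c1 (ACs in ACC) and c2 (ACs not in ACC), so c = c1 AND c2."""
    c1 = frozenset(ac for ac in term if ac in acc)
    return c1, term - c1


def decompose(policy: DNF, acc: Set[str]) -> Tuple[DNF, DNF]:
    """Return (owner sub ACP, cloud sub ACP) for one policy given in DNF."""
    multi = [t for t in policy if len(t) > 1]
    if len(policy) == 1:                      # only one conjunctive term
        c1, c2 = split(policy[0], acc)
        return [c1], [c2]
    if len(multi) <= 1:                       # at most one multi-AC term
        owner = [t for t in policy if len(t) == 1]
        cloud = list(owner)                   # single-AC terms go to both sides
        for term in multi:
            c1, c2 = split(term, acc)
            owner.append(c1)
            cloud.append(c2)
        return owner, cloud
    owner: DNF = []                           # several multi-AC terms
    for term in policy:
        c1, _ = split(term, acc)
        if c1 not in owner:
            owner.append(c1)
    return owner, list(policy)                # the Cloud keeps the whole policy


def satisfies(policy: DNF, held: Set[str]) -> bool:
    """True if a user holding the conditions in `held` satisfies the policy."""
    return any(term <= held for term in policy)


def consistent(policy: DNF, acc: Set[str]) -> bool:
    """Check owner AND cloud == original for every subset of the policy's ACs."""
    owner, cloud = decompose(policy, acc)
    acs = sorted(set().union(*policy))
    for r in range(len(acs) + 1):
        for held in map(set, combinations(acs, r)):
            both = satisfies(owner, held) and satisfies(cloud, held)
            if both != satisfies(policy, held):
                return False
    return True


ACC = {"role = rec", "role = cas", "role = pha",
       "role = doc", "role = nur", "role = dat"}
CR = [frozenset({"role = doc", "ip = 2-out-4"})]
PE = [frozenset({"role = nur", "type = senior"}),
      frozenset({"role = dat", "yos >= 4"})]

for name, policy in (("CR", CR), ("PE", PE)):
    owner_part, cloud_part = decompose(policy, ACC)
    print(name, owner_part, cloud_part, consistent(policy, ACC))
```

The consistent helper brute-forces the consistency claim on small policies: a user satisfies the original ACP exactly when it satisfies both sub ACPs.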
5.3 Two Layer Encryption Approach
In this section, we provide a detailed description of the six phases of the TLE
approach introduced in Section 5.1. The system consists of four entities: Owner,
Usr, IdP and Cloud. Let the maximum number of users in the system be N, the
current number of users be n (< N), and the number of attribute conditions be Na.
5.3.1 Identity Token Issuance
IdPs are trusted third parties that issue identity tokens to Usrs based on their
identity attributes. It should be noted that IdPs need not be online after they issue
identity tokens. An identity token, denoted by IT has the format { nym, id-tag, c,
σ }, where nym is a pseudonym uniquely identifying a Usr in the system, id-tag is
the name of the identity attribute, c is the Pedersen commitment for the identity
attribute value x and σ is the IdP’s digital signature on nym, id-tag and c.
5.3.2 Policy Decomposition
Using Algorithm 5 (POLICY-DECOMPOSITION), the Owner decomposes each ACP into
at most two sub ACPs such that the Owner enforces the minimum number of attributes
to assure confidentiality of data from the Cloud. The algorithm produces two sets of
sub ACPs, ACPB Owner and ACPB Cloud . The Owner enforces the confidentiality related
sub ACPs in ACPB Owner and the Cloud enforces the remaining sub ACPs in ACPB Cloud .
5.3.3 Identity Token Registration
Usrs register their ITs to obtain secrets in order to later decrypt the data they are
allowed to access. Usrs register their ITs related to the attribute conditions in ACC
with the Owner, and the rest of the identity tokens related to the attribute conditions
in ACB/ACC with the Cloud using the AB-GKM::SecGen algorithm.
When Usrs register with the Owner, the Owner issues them two sets of secrets for
the attribute conditions in ACC that are also present in the sub ACPs in ACPB Cloud .
The Owner keeps one set and gives the other set to the Cloud. Two different sets are
used in order to prevent the Cloud from decrypting the Owner encrypted data.
5.3.4 Data Encryption and Upload
The Owner encrypts the data based on the sub ACPs in ACPB Owner and uploads
them along with the corresponding public information tuples to the Cloud. The Cloud
in turn encrypts the data again based on the sub ACPs in ACPB Cloud. Both parties
execute the AB-GKM::KeyGen algorithm individually to first generate the symmetric
key, the public information tuple $PI$ and the access tree $T$ for each sub ACP. We now
give a detailed description of the encryption process.
The Owner arranges the sub ACPs such that each data item has a unique ACP.
Note that the same policy may be applicable to multiple data items. Assume that
the set of data items is $D = \{d_1, d_2, \ldots, d_m\}$ and the set of sub ACPs is ACPB Owner =
$\{ACP_1, ACP_2, \ldots, ACP_n\}$. The Owner assigns a unique symmetric key $K_i^{ILE}$, called an ILE
key, to each sub $ACP_i$ ∈ ACPB Owner, encrypts all related data with that
key and executes AB-GKM::KeyGen to generate the public $PI_i$ and $T_i$. The
Owner uploads the encrypted data $(id, E_{K_i^{ILE}}(d_i), i)$ along with the indexed public
information tuples $(i, PI_i, T_i)$, where $i = 1, 2, \ldots, n$, to the Cloud. The Cloud handles
the key management and encryption based access control for the ACPs in ACPB Cloud.
For each sub $ACP_j$ ∈ ACPB Cloud, the Cloud assigns a unique symmetric key $K_j^{OLE}$,
called an OLE key, encrypts each affected data item $E_{K_i^{ILE}}(d_i)$ and produces the tuple
$(id, E_{K_j^{OLE}}(E_{K_i^{ILE}}(d_i)), i, j)$, where $i$ and $j$ give the indices of the public information
generated by the Owner and the Cloud, respectively.
5.3.5 Data Downloading and Decryption
Usrs download encrypted data from the Cloud and decrypt twice to access the
data. First, the Cloud generated public information tuple is used to derive the OLE
key and then the Owner generated public information tuple is used to derive the ILE
key using the AB-GKM::KeyDer algorithm. These two keys allow a Usr to decrypt a
data item only if the Usr satisfies the original ACP applied to the data item.
For example, in order to access a data item $d_i$, Usrs download the encrypted data
item $E_{K_j^{OLE}}(E_{K_i^{ILE}}(d_i))$ and the corresponding two public information tuples $PI_i$ and
$PI_j$. $PI_j$ is used to derive the key of the outer layer encryption, $K_j^{OLE}$, and $PI_i$ is used to
derive the key of the inner layer encryption, $K_i^{ILE}$. Once those two keys are derived,
two decryption operations are performed to access the data item.
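The layering of the two encryptions, and the order in which a Usr removes them, can be illustrated with a short Python sketch. Several things here are assumptions made for illustration only: the toy SHA-256 keystream cipher is an insecure stand-in for the AES-based encryption of the prototype, the ILE and OLE keys are passed in directly rather than derived from the public information tuples with AB-GKM::KeyDer, and the function names owner_upload, cloud_store and usr_read are hypothetical.

```python
import hashlib
import os
from typing import Tuple

NONCE_LEN = 16


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy SHA-256 counter-mode keystream; a stand-in for AES, not secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Prefix a fresh random nonce so equal plaintexts encrypt differently."""
    nonce = os.urandom(NONCE_LEN)
    return nonce + _xor_stream(key, nonce, plaintext)


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    return _xor_stream(key, nonce, body)


def owner_upload(item_id: str, data: bytes, i: int,
                 k_ile: bytes) -> Tuple[str, bytes, int]:
    """Inner layer, done by the Owner: (id, E_{K_i^ILE}(d), i)."""
    return item_id, encrypt(k_ile, data), i


def cloud_store(record: Tuple[str, bytes, int], j: int,
                k_ole: bytes) -> Tuple[str, bytes, int, int]:
    """Outer layer, done by the Cloud: (id, E_{K_j^OLE}(E_{K_i^ILE}(d)), i, j)."""
    item_id, inner, i = record
    return item_id, encrypt(k_ole, inner), i, j


def usr_read(stored: Tuple[str, bytes, int, int],
             k_ole: bytes, k_ile: bytes) -> bytes:
    """Decrypt twice: the outer (OLE) layer first, then the inner (ILE) layer."""
    _, outer, _, _ = stored
    return decrypt(k_ile, decrypt(k_ole, outer))


k_ile, k_ole = os.urandom(32), os.urandom(32)
stored = cloud_store(owner_upload("rec-1", b"lab report", 1, k_ile), 2, k_ole)
print(usr_read(stored, k_ole, k_ile))
```

A Usr that can derive only one of the two keys recovers either nothing or the inner ciphertext, which is why both sub ACPs must be satisfied to read the data item.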
5.3.6 Encryption Evolution Management
After the initial encryption is performed, affected data items need to be re-encrypted with a new symmetric key if credentials are added/removed or ACPs are
modified. Unlike in the SLE approach, when credentials are added or revoked or ACPs
are modified, the Owner does not have to be involved. The Cloud generates a new symmetric key and re-encrypts the affected data items. The Cloud uses the following
conditions to decide whether re-encryption is required.
1. For any ACP, the new group of Usrs is a strict superset of the old group of Usrs,
and backward secrecy is enforced.
2. For any ACP, the new group of Usrs is a strict subset of the old group of Usrs,
and forward secrecy is enforced for the already encrypted data items.
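The two conditions can be read as a simple predicate over the old and new user groups of an ACP. The sketch below is one such reading, with assumptions flagged: groups are modelled as sets of pseudonyms, the function name needs_reencryption is hypothetical, and a change that both adds and removes Usrs is not covered by the two conditions as stated, so the sketch leaves that case to the caller.

```python
from typing import Set


def needs_reencryption(old_group: Set[str], new_group: Set[str],
                       backward_secrecy: bool, forward_secrecy: bool) -> bool:
    """Apply the two conditions to the Usr groups of one ACP."""
    joined = new_group > old_group   # strict superset: Usrs were added
    left = new_group < old_group     # strict subset: Usrs were removed
    # Mixed changes (some Usrs added, others removed) match neither condition.
    return (joined and backward_secrecy) or (left and forward_secrecy)


old = {"nym1", "nym2"}
print(needs_reencryption(old, {"nym1", "nym2", "nym3"}, True, True))
print(needs_reencryption(old, {"nym1"}, True, False))
```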
5.4 Analysis
In this section, we first compare the SLE and the TLE approaches, and then give
a high level analysis of the security and the privacy of both approaches.
5.4.1 SLE vs. TLE
Recall that in the SLE approach, the Owner enforces all ACPs by fine-grained
encryption. If the system dynamics change, the Owner updates the keys and encryptions. The Cloud merely acts as a storage repository. Such an approach has the
advantage of hiding the ACPs from the Cloud. Further, since the Owner performs
all access control related encryptions, a Usr colluding with the Cloud is unable to
access any data item that it is not allowed to access. However, the SLE approach
incurs high overhead. Since the Owner has to perform all re-encryptions when user
dynamics or policies change, the Owner incurs a high overhead in communication
and computation. Further, it is unable to perform optimizations such as delayed AB-GKM::ReKey or re-encryption, as the Owner has to download, decrypt, re-encrypt
and re-upload the data, which could considerably increase the response time if such
optimizations are to be performed.
The TLE approach reduces the overhead incurred by the Owner during the initial
encryption as well as subsequent re-encryptions. In this approach, the Owner handles
only the minimal set of attribute conditions and most of the key management tasks are
performed by the Cloud. Further, when identity attributes are added or removed, or
the Owner updates the Cloud’s ACPs, the Owner does not have to re-encrypt the data
as the Cloud performs the necessary re-encryptions to enforce the ACPs. Therefore, the
TLE approach reduces the communication and computation overhead at the Owner.
Additionally, the Cloud has the opportunity to perform delayed encryption during
certain dynamic scenarios as the Cloud itself manages the OLE keys and encryptions.
However, the improvements in performance come at the cost of security and
privacy. In this approach, the Cloud learns some information about the ACPs.
5.4.2 Security and Privacy
The SLE approach correctly enforces the ACPs through encryption. In the SLE
approach, the Owner itself performs the attribute based encryption based on ACPs.
The AB-GKM scheme makes sure that only those Usrs who satisfy the ACPs can
derive the encryption keys. Therefore, only the authorized Usrs are able to access the
data.
The TLE approach correctly enforces the ACPs through two encryptions. Each
ACP is decomposed into two ACPs such that the conjunction of them is equivalent
to the original ACP. The Owner enforces one part of the decomposed ACPs through
attribute based encryption. The Cloud enforces the counterparts of the decomposed
ACPs through another attribute based encryption. A Usr can access a data item only
if it can decrypt both encryptions. As the AB-GKM scheme makes sure that only
those Usrs who satisfy these decomposed policies can derive the corresponding keys,
a Usr can access a data item by decrypting twice only if it satisfies the two parts of
the decomposed ACPs, that is, the original ACPs.
In both approaches, the privacy of the identity attributes of Usrs is assured. Recall
that the AB-GKM::SecGen algorithm issues secrets to users based on the identity
tokens which hide the identity attributes. Further, at the end of the algorithm neither
the Owner nor the Cloud knows if a Usr satisfies a given attribute condition. Therefore,
neither the Owner nor the Cloud learns the identity attributes of Usrs. Note that the
privacy does not weaken the security as the AB-GKM::SecGen algorithm makes sure
that Usrs can access the issued secrets only if their identity attributes satisfy the
attribute conditions.
5.5 Experimental Results
In this section we first present experimental results concerning the policy decomposition algorithms. We then present an experimental comparison between the SLE
and TLE approaches.
The experiments were performed on a machine running GNU/Linux kernel version
2.6.32 with an Intel® Core™ 2 Duo CPU T9300 2.50GHz and 4 Gbytes memory.
Only one processor was used for computation. Our prototype system is implemented
in C/C++. We use V. Shoup’s NTL library [37] version 5.4.2 for finite field arithmetic, and SHA-1 and AES-256 implementations of OpenSSL [38] version 1.0.0d for
cryptographic hashing and incremental encryption. We use boolstuff library [54] version 0.1.13 to convert policies into DNF. Adjacency list representation is used to
construct policy graphs used in the two approximation algorithms for finding a near
optimal attribute condition cover.
We utilized the AB-GKM scheme with the subset cover optimization. We used
the complete subset algorithm introduced by Naor et al. [35] as the subset cover.
We assumed that 5% of attribute credentials are revoked for the AB-GKM related
experiments. All finite field arithmetic operations in our scheme are performed in a
512-bit prime field.
For our experiments, we selected the total number of attribute conditions and
the number of attribute conditions per policy based on past case studies [55, 56].
According to the case studies, the number of attribute conditions varies from 50 for
a web based conference management system to 1300 for a major European bank.
These real systems have up to about 20 attribute conditions per policy. We set the
total attribute condition count between 100 and 1500 and the attribute conditions
per policy count between 2 and 20. We generate random Boolean expressions consisting
of conjunctions and disjunctions as policies. Each term in the Boolean expression
represents an attribute condition.
[Plot: cover size against the number of ACs per policy (2 to 20) for the Random and Greedy algorithms]
Figure 5.3.: Size of ACCs for 100 attributes
[Plot: cover size against the number of ACs per policy for the Random and Greedy algorithms]
Figure 5.4.: Size of ACCs for 500 attributes
Figures 5.3, 5.4, 5.5 and 5.6 show the size of the attribute condition cover, that is, the
number of attribute conditions the data owner enforces, for systems having 100, 500,
1000 and 1500 attribute conditions as the number of attribute conditions per policy
is increased. In all experiments, the greedy policy cover algorithm performs better than the random one.
[Plot: cover size against the number of ACs per policy (2 to 20) for the Random and Greedy algorithms]
Figure 5.5.: Size of ACCs for 1000 attributes
[Plot: cover size against the number of ACs per policy for the Random and Greedy algorithms]
Figure 5.6.: Size of ACCs for 1500 attributes
As the number of attribute conditions per policy increases, the size of the attribute
condition cover also increases. This is due to the fact that as the number of attribute
conditions per policy increases, the number of distinct disjunctive terms in the DNF
increases.
Figures 5.7 and 5.8 show the breakdown of the running time for the complete policy
decomposition process for the random and greedy cover algorithms, respectively. In
this experiment, the number of attribute conditions is set to {100, 500, 1000} and the
maximum number of attribute conditions per policy is set to 5. The total execution
time is divided into the execution times of three different components of our scheme.
[Plot: time (ms) of the “DNF + Graph”, “Cover” and “Decompose” components against the number of ACs (100, 500, 1000)]
Figure 5.7.: Policy decomposition time breakdown with the random cover algorithm
The “DNF + Graph” time refers to the time required to convert the policies to DNF
and construct an in-memory graph of policies using an adjacency list. The “Cover”
time refers to the time required to find a near optimal cover and the “Decompose”
time refers to the time required to create the updated policies for the data owner and
the cloud based on the cover. As can be seen from the graphs, most of the time is
spent on finding a near optimal attribute condition cover. It should be noted that the
random approximation algorithm runs faster than the greedy algorithm. One reason
for this behavior is that each time the latter algorithm selects a vertex it iterates
through all the unvisited vertices in the policy graph, whereas the former algorithm
simply picks a pair of unvisited vertices at random. Consistent with the worst-case
running times, the “DNF + Graph” and “Decompose” components demonstrate near
linear running time, and the “Cover” component shows a non-linear running time.
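The difference in per-step work between the two cover heuristics can be seen in a small Python sketch. These are textbook vertex-cover heuristics chosen to match the description above (random edge picking versus greedy highest-degree selection, with single-vertex components placed directly in the cover); the thesis's own approximation algorithms are defined outside this section and may differ in detail. The graph, its vertex labels and all function names are hypothetical.

```python
import random
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def isolated(vertices: Iterable[str], edges: List[Edge]) -> Set[str]:
    """Single-vertex components, which go straight into the cover."""
    touched = {v for edge in edges for v in edge}
    return set(vertices) - touched


def random_cover(vertices: Iterable[str], edges: List[Edge],
                 rng: random.Random) -> Set[str]:
    """Pick an uncovered edge at random and take both of its endpoints."""
    cover = isolated(vertices, edges)
    remaining = list(edges)
    while remaining:
        u, v = rng.choice(remaining)
        cover.update((u, v))
        remaining = [e for e in remaining if u not in e and v not in e]
    return cover


def greedy_cover(vertices: Iterable[str], edges: List[Edge]) -> Set[str]:
    """Repeatedly take the vertex that touches the most uncovered edges."""
    cover = isolated(vertices, edges)
    remaining = list(edges)
    while remaining:
        degree = {}
        for u, v in remaining:             # full rescan on every selection
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        best = max(sorted(degree), key=degree.get)
        cover.add(best)
        remaining = [e for e in remaining if best not in e]
    return cover


def is_cover(cover: Set[str], edges: List[Edge]) -> bool:
    """Every edge must have at least one endpoint in the cover."""
    return all(u in cover or v in cover for u, v in edges)


V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("a", "c"), ("a", "d")]   # a star plus the isolated vertex "e"
print(sorted(greedy_cover(V, E)))
print(sorted(random_cover(V, E, random.Random(0))))
```

On a star-shaped component the greedy heuristic needs only the centre, while the random heuristic always adds both endpoints of the edge it picks, which is in line with the smaller covers reported for the greedy algorithm.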
[Plot: time (ms) of the “DNF + Graph”, “Cover” and “Decompose” components against the number of ACs (100, 500, 1000)]
Figure 5.8.: Policy decomposition time breakdown with the greedy cover algorithm
[Plot: time (in seconds) against group size (100 to 1000) for SLE Owner, TLE Owner and TLE Cloud]
Figure 5.9.: Average time to generate keys for the two approaches
Figure 5.9 reports the average time spent executing AB-GKM::KeyGen with the
SLE and TLE approaches for different group sizes. We set the number of attribute
conditions to 1000 and the maximum number of attribute conditions per policy to
5. We utilize the greedy algorithm to find the attribute condition cover. As seen in
the diagram, the running time at the Owner in the SLE approach is higher since the
Owner has to enforce all the attribute conditions. Since the TLE approach divides
the enforcement cost between the Owner and the Cloud, the running time at the
Owner is lower compared to the SLE approach. The running time at the Cloud in the
TLE approach is higher than that at the Owner since the Cloud performs fine grained
encryption whereas the Owner only performs coarse grained encryption. As shown in
Figure 5.10, a similar pattern is observed for AB-GKM::KeyDer as well.
[Plot: time (in milliseconds) against group size (100 to 1000) for SLE Owner, TLE Owner and TLE Cloud]
Figure 5.10.: Average time to derive keys for the two approaches
6 PRIVACY PRESERVING SUBSCRIPTION BASED SYSTEMS
In the last two chapters, our focus was on pull based systems where users pull the content from the third party server. Another popular dissemination model is subscription
based publish-subscribe systems. The solutions we propose for pull based systems
cannot directly be applied to subscription based systems as they have the additional
requirement of letting the third party server perform content based filtering.
Many systems, including online news delivery, stock quote report dissemination
and weather channels, have been or can be modeled as Content-Based Publish-Subscribe (CBPS) systems. Full decoupling of the involved parties, that is, Content Publishers (Pubs), Content Brokers (Brokers) and Subscribers (Subs), in time,
space, and synchronization has been the key [57] to seamlessly scale these systems
on demand. Hence, CBPS systems have huge potential to be enabled over cloud
computing infrastructures. In a CBPS system, each Sub selectively subscribes to
some Brokers to receive different messages. In the most common setting, when Pubs
publish messages to some Brokers, these Brokers, in turn, selectively distribute these
messages to other Brokers and finally to Subs based on their subscriptions, that is,
what they subscribed to. These systems, in general, follow a push based dissemination
approach, that is, whenever new messages arrive, Brokers selectively distribute the
messages to Subs. Figure 6.1 shows an example CBPS system.
[Diagram: data owners (Pub1, Pub2), a third party broker network (Bro1 to Bro5) and users (Sub1, Sub2, Sub3), with arrows marking notifications and subscriptions]
Figure 6.1.: An example CBPS system
It is not feasible to have a private Broker network for each CBPS system, and most
CBPS systems utilize third-party Broker networks which may not be trusted for the
confidentiality of the content flowing through them. Because content represents the
critical resource in many CBPS systems, its confidentiality from third-party Brokers
is important. Consider the popular example of publishing stock market quotes where
Subs pay the Pub, that is, the stock exchange, either for the types of quotes they wish to
receive or on a per-usage basis. In such a domain, whenever a new stock quote, referred to
in general as a notification, is published, Brokers selectively send such a notification
only to authorized Subs. Confidentiality is important here because Pubs want to make
sure that only paying customers have access to the quotes. We say that a CBPS
system provides publication confidentiality if Brokers can neither identify the content
of the messages published by Pubs nor infer the distribution of attribute values of the
message.¹ For the stock quote example, in the absence of publication confidentiality,
Brokers may collect stock quotes, re-sell them to others, and/or sell derived market data
without any economic incentive to Pubs.
¹ We assume that a message consists of a set of attribute-value pairs.
At the same time, the privacy of subscribers is also crucial for many reasons, like
business confidentiality or personal privacy. We say that a CBPS system provides
subscription privacy if Brokers can neither identify what subscriptions Subs made nor
relate a set of subscriptions to a specific Sub. Consider again the stock quote example.
Suppose, for example, that a Sub subscribes to some Brokers for receiving stock quotes
characterized by certain attribute values (e.g. bid price < 2438, 1000 < bid size
< 2000, symbol = “MSFT”, etc.). In the absence of subscription privacy, such a
subscription can reveal the business strategy of the Sub. Further, Brokers may profile
the subscriptions of each Sub and sell them to third parties.
Privacy and confidentiality issues in CBPS have long been identified [6], but little
progress has been made to address these issues in a holistic manner. Most prior
work on data confidentiality techniques in the context of CBPS systems is based on the
assumption that Brokers are trusted with respect to the privacy of the subscriptions
made by Subs [7–9]. However, when such an assumption does not hold, both publication
confidentiality and subscription privacy are at risk; in the absence of subscription
privacy, subscriptions are available in clear text to Brokers. Brokers can infer the content of the notifications by comparing and matching notifications with subscriptions,
since CBPS systems must allow them to make such decisions to route notifications.
As more subscriptions become available to Brokers, the inference is likely to be more
accurate. It should also be noted that the above approaches restrict Brokers’ ability
to make routing decisions based on the content of the messages and thus fail to provide a CBPS system as expressive as a CBPS system that does not address security or
privacy issues. Approaches have also been proposed to assure confidentiality/privacy
in the presence of untrusted third-party Brokers. These approaches, however, suffer
from one or two of the following major limitations [12–14, 58]: inaccurate content delivery, because
of the limited ability of Brokers to make routing decisions based on content; weak
security protocols; and lack of privacy guarantees. For example, some of these approaches
are prone to false positives, that is, sending irrelevant content to Subs.
In this chapter, we propose a novel cryptographic approach that, along with our AB-GKM scheme, addresses those shortcomings in CBPS systems. To the best of
our knowledge, no existing cryptographic solution protects both publication
confidentiality and subscription privacy in CBPS systems while addressing the above
shortcomings. A key design goal of our privacy-preserving approach is to design a
system which is as expressive as a system that does not consider privacy or security
issues. We implement our scheme on top of a popular CBPS system, SIENA [19],
and provide several experimental results in order to show that our approach is practical.
In summary, our CBPS system exhibits the following properties:
• Notifications and subscriptions are randomized and hidden from Brokers and
secure under chosen-ciphertext attacks.
• Both publication confidentiality and subscription privacy are assured as Brokers
are able to make routing decisions without decrypting subscriptions and notifications. It is the first system to achieve these properties without sharing keys
with Brokers or Subs.
• It supports any type of subscription query, including equality, inequality and
range queries, at Brokers.
• The computational cost at Brokers is minimized by judiciously distributing
the work among Pubs and Subs.
The rest of the chapter is organized as follows. Section 6.1 overviews the CBPS
model and the protocols supported by our system. Section 6.2 provides some background knowledge about the main cryptographic primitives used. Section 6.3 provides
a detailed description of the proposed protocols. Section 6.4 reports experimental results for the main protocols as well as the system developed on top of SIENA using
the main protocols.
6.1 Overview
In this section we give an overview of our proposed scheme by showing the interactions between Pubs, Subs and Brokers, and the trust model. Unless otherwise stated,
we describe our approach for one Pub, mainly for brevity. However, our approach can
be trivially applied to a system with any number of Pubs. In practice, all the parties
in a CBPS system are software programs that act on behalf of real entities like actual
organizations or end users, and therefore many of the operations of the protocols we
propose are performed transparently to real entities.
Each notification is characterized by a set of Attribute-Value Pairs (AVPs). It
consists of two parts: the actual message in encrypted form, which we call the
payload message, and a set of blinded AVPs derived from the payload message. As
mentioned earlier, the payload message also consists of a set of AVPs. In a blinded
AVP, the value is blinded, but the attribute name remains in clear text. The blinding
encrypts the value in a special way such that it is computationally infeasible to obtain
the value from the blinded values, and such that the blinded values are secure under
chosen-ciphertext attacks. We provide details on the blind operation in Section 6.3.
The payload is encrypted using the AB-GKM scheme based on the ACPs of the Pub.
The AB-GKM scheme makes sure that only those Subs that have valid credentials
can access payload messages. The blinded AVPs are placed in the header and the
payload message is in the body of the notification. There is a one-to-one mapping
between the AVPs in the payload message and the blinded AVPs.
In an XML-like syntax, a notification has the following format:
<notification>
<header> -- blinded AVPs -- </header>
<body> -- enc. payload message -- </body>
</notification>
Depending on the representation, each attribute name and its corresponding
value may be interpreted differently. For example, the payload could be in a simple property-value format or a complex XML format. If the payload is in XML,
attribute names could be the XPaths and values could be the immediate child nodes
of XPaths. We use the latter for the examples.
A subscription specifies a condition on one of the attributes² of the AVPs associated with the notifications. It is an expression of the form (attr, bval1, bval2, bval3,
op) where attr is the name of the attribute, bval1, bval2, bval3 are the blinded values
derived from the actual content v and its additive inverse,³ and op is a comparison
operator, either ≥ or <. All the other comparison operators are derived from op.
Note that our approach supports a wide array of conditions including range queries
for numerical attributes and keyword queries for numerical and string attributes.
² Note that our approach can easily be extended to subscriptions having multiple attributes.
Example 6
In the stock market quote dissemination system, a payload message, that is, a quote,
looks like:
<q>
<symbol>MSFT</symbol>
<bid>
<price>2328</price>
<size>10000</size>
...
</bid>
<offer>
<price>2355</price>
<size>5000</size>
...
</offer>
</q>
The set of AVPs, as a collection of pairs,
(“/q/symbol”, “MSFT”), (“/q/bid/price”, 2328), (“/q/bid/size”, 10000),
(“/q/offer/price”, 2355), (“/q/offer/size”, 5000)
from the payload message is blinded and placed in the header of the notification.
The notification for the above quote includes these blinded values and the encrypted
quote.
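The mapping from an XML payload to its set of AVPs can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration of the XPath-as-attribute-name convention used in the example: the function name extract_avps is hypothetical, the elided fields ("...") of the quote are left out, and the blinding step itself is not shown.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

Value = Union[int, str]

QUOTE = """<q>
  <symbol>MSFT</symbol>
  <bid><price>2328</price><size>10000</size></bid>
  <offer><price>2355</price><size>5000</size></offer>
</q>"""


def extract_avps(xml_text: str) -> List[Tuple[str, Value]]:
    """Return one (path, value) pair per leaf element, in document order."""
    root = ET.fromstring(xml_text)
    avps: List[Tuple[str, Value]] = []

    def walk(node: ET.Element, path: str) -> None:
        children = list(node)
        if not children:                   # leaf element: emit an AVP
            text = (node.text or "").strip()
            avps.append((path, int(text) if text.isdigit() else text))
        for child in children:
            walk(child, f"{path}/{child.tag}")

    walk(root, f"/{root.tag}")
    return avps


for attr, value in extract_avps(QUOTE):
    print(attr, value)
```

In the scheme, each value in these pairs would then be blinded before being placed in the notification header, while the attribute names stay in clear text.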
³ The additive inverse of a number v ∈ Zm can be represented by the number m − v.
6.1.1 Interactions
We now present an overview of the protocols proposed in our CBPS system. The
motivation behind constructing a set of protocols is that they can easily be implemented on top of an existing CBPS infrastructure in order to satisfy privacy and security
requirements. In summary, the Initialize protocol initializes the system parameters.
The Register protocol registers Subs with Pubs. The Subscribe protocol subscribes Subs
to Brokers. The Publish protocol publishes notifications from Pubs to Brokers. The Match
protocol matches notifications with subscriptions at Brokers. The Cover protocol finds relationships among subscriptions at Brokers. An important property of the two most
frequently used protocols, Match and Cover, is that they are non-interactive. The
following gives more details of each protocol.
Initialize:
There is a set of system defined public parameters that all Pubs, Brokers and Subs
use. In addition to these parameters, Pubs also generate some public and private parameters that are used for subsequent protocols and publish the public parameters.
If there are several Pubs, each Pub generates its own public and private parameters.
Register:
Subs register themselves with the Pub to obtain a secret value and access tokens. An
access token includes Sub’s identity (id) and allows a Sub to subsequently authenticate itself to the Broker from which it intends to request notifications. An identity is
a pseudonym that uniquely identifies a Sub in the system. The secret value allows a
Sub to derive the key using the KeyDer algorithm of AB-GKM and then decrypt the
payload of notifications.
Subscribe:
In order to assure confidentiality and privacy, unlike in a typical CBPS system, Subs
need to perform an additional communication step with the Pub to get the subscription
blinded before submitting the subscription to the Broker.⁴
After authenticating themselves to Pubs using access tokens, Subs receive the content in their subscriptions blinded by the corresponding Pubs. In this step, Subs
perform as much computation as they can before sending the subscriptions to the Pub so
that the overhead on Pubs is minimized. Further, this overhead on Pubs is negligible
as subscriptions are fairly stable and the rate of subscriptions is usually much lower than
that of notifications in a typical CBPS system. Once this step is done, Subs authenticate themselves to Brokers without revealing their identities and present these
blinded subscriptions to Brokers. These subscriptions are blinded in such a way that
Brokers do not learn the actual subscription criteria, that is, Brokers cannot decrypt
the blinded values. However, they can perform the Match (or Filter)⁵ and Cover protocols based on the blinded subscriptions. Furthermore, no two subscriptions for the
same value are distinguishable by Brokers. In order to prevent Brokers from linking
different subscriptions from the same Sub, Subs may request multiple access tokens
such that all these access tokens have the same identity but are indistinguishable. For
each subscription, Subs may present these different valid access tokens so that Subs’
identities are further protected from Brokers.
Publish:
Using the counterparts of the secret values used to blind subscriptions, Pubs blind the
notifications and publish them to some Brokers. A blinded notification has a set of
blinded AVPs and an encrypted payload message. These notifications are blinded in
such a way that Brokers do not learn actual values in the messages, but can perform
Match and Cover protocols based on the subscriptions. Further, no two notifications
for the same content are distinguishable by Brokers.
⁴ Instead of Pub, a trusted third party may be utilized to blind subscriptions in order to reduce the
load on Pub.
⁵ We use the terms Match and Filter interchangeably.
Match:
For each notification from Pubs, Brokers compare it with Subs’ subscriptions. If there
is a match, that is, the notification satisfies the subscription, Brokers forward the
notification to the correct Subs. The outcome of the Match protocol allows Brokers
to learn neither the notification nor the publication values. It also prevents Brokers
from learning the distribution of the values.
Cover:
For each subscription received from Subs, Brokers check whether a covering relationship holds
with the existing subscriptions. A subscription S1 covers another subscription S2 if
all notifications that match S2 also match S1. Finding covering relationships among
subscriptions makes it possible to reduce the size of the subscription tables maintained by each
Broker, and hence improves the efficiency of matching. Like the Match protocol, the
outcome of the Cover protocol reveals to Brokers neither the subscription
values nor their distribution.
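To make the covering definition concrete, here is a minimal plaintext sketch in Python. It ignores blinding entirely (in the actual scheme Brokers never see the values), and the filter representation `(op, v)` with operators `"<"` and `">="` is an assumption made only for this illustration.

```python
# Plaintext illustration only: Brokers in the actual scheme never see v1 or v2.
def matches(op, v, x):
    """Return True if notification value x satisfies the filter (op, v)."""
    return x < v if op == "<" else x >= v

def covers(s1, s2):
    """S1 covers S2 if every x matching S2 also matches S1 (same operator only)."""
    (op1, v1), (op2, v2) = s1, s2
    if op1 != op2:
        return False
    # "x < v1" covers "x < v2" when v1 >= v2; "x >= v1" covers "x >= v2" when v1 <= v2.
    return v1 >= v2 if op1 == "<" else v1 <= v2

S1, S2 = ("<", 30), ("<", 22)
print(covers(S1, S2), covers(S2, S1))
```

A Broker that already stores S1 can place S2 in the same group, since any notification forwarded for S2 would also have to be checked against S1.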
6.1.2 Trust Model
In the system design, we consider threats and assumptions from the point of
view of Pubs and Subs with respect to third-party Brokers. We assume that Brokers
are honest but curious; they perform PS protocols correctly, but are curious to know
what Pubs publish and Subs consume. In other words, they are trusted to execute these PS
protocols but not with the content of the notifications and subscriptions, nor with the
privacy of Subs when they make one or more subscription requests. Further, Brokers may
collude. Pubs are trusted to maintain the privacy of Subs. However, our approach can
be easily modified to relax this trust assumption. Pubs are also trusted to correctly
perform PS protocols and not to collude with any other parties.
6.2 Background
Some of the mathematical notions and the cryptographic building blocks which
inspired our approach are described below.
6.2.1 Pedersen Commitment
A cryptographic “commitment” is a piece of information that allows one to commit to a value while keeping it hidden, and preserving the ability to reveal the value
at a later time. The Pedersen commitment [47] is an unconditionally hiding and
computationally binding commitment scheme which is based on the intractability of
the discrete logarithm problem.
Pedersen Commitment
Setup A trusted third party T chooses a multiplicatively written finite cyclic group
G of large prime order p so that the computational Diffie-Hellman problem is hard in
G.⁶ T chooses two generators g and h of G such that it is hard to find the discrete
logarithm of h with respect to g, i.e., an integer x such that $h = g^{x}$. It is not required
that T know the secret number x. T publishes (G, p, g, h) as the system parameters.
Commit The domain of committed values is the finite field $F_p$ of p elements, which
can be represented as the set of integers $F_p = \{0, 1, \ldots, p - 1\}$. For a party U to
commit a value $\alpha \in F_p$, U chooses $\beta \in F_p$ at random, and computes the commitment
$c = g^{\alpha} h^{\beta} \in G$.
Open U shows the values α and β to open a commitment c. The verifier checks
whether $c = g^{\alpha} h^{\beta}$.
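As a rough illustration of the Commit and Open steps, the following Python sketch works in a deliberately tiny group. The modulus 2039, the subgroup order 1019, and the generators 4 and 9 are assumptions chosen for readability, not parameters from this work; in such a small group the discrete logarithm of h with respect to g is trivial to compute, so the sketch shows only the algebra, not the security.

```python
import secrets

# Toy parameters (assumed for illustration): P_MOD = 2p + 1 is a safe prime and
# G is the subgroup of squares modulo P_MOD, which has prime order p.
P_MOD = 2039
p = 1019
g, h = 4, 9          # both are squares different from 1, hence generators of G

def commit(alpha, beta):
    """Pedersen commitment c = g^alpha * h^beta in G."""
    return pow(g, alpha, P_MOD) * pow(h, beta, P_MOD) % P_MOD

def open_commitment(c, alpha, beta):
    """Verifier's check when the committer reveals (alpha, beta)."""
    return c == commit(alpha, beta)

alpha = 42                       # the committed value
beta = secrets.randbelow(p)      # fresh randomness hides alpha
c = commit(alpha, beta)
print(open_commitment(c, alpha, beta))
```

Because β is drawn uniformly from $F_p$, the commitment c is uniformly distributed in G regardless of α, which is the unconditional hiding property mentioned above.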
6
For a multiplicatively written cyclic group G of order q, with a generator g ∈ G, the Computational
Diffie-Hellman problem (CDH) is the following problem: Given $g^{a}$ and $g^{b}$ for randomly-chosen secret
a, b ∈ {0, . . . , q − 1}, compute $g^{ab}$.
6.2.2 Zero-Knowledge Proof of Knowledge (Schnorr’s Scheme)
The zero-knowledge proof of knowledge (ZKPK) protocol used in this paper can
be viewed as a natural extension of Schnorr’s scheme [11]. In our proposed approach, we
use ZKPK as a privacy-preserving means of subscriber authentication to the brokers.
As in the case of the Pedersen commitment scheme, a trusted party T generates
public parameters G, p, g, h. A Prover which holds private knowledge of values α and
β can convince a Verifier that Prover can open the Pedersen commitment $c = g^{\alpha} h^{\beta}$ as
follows.
1. Prover randomly chooses $y, s \in F_p^{*}$, and sends Verifier the element $d = g^{y} h^{s} \in G$.
2. Verifier picks a random value $e \in F_p^{*}$, and sends e as a challenge to Prover.
3. Prover sends $u = y + e\alpha$, $v = s + e\beta$, both in $F_p$, to Verifier.
4. Verifier accepts the proof if and only if $g^{u} h^{v} = d \cdot c^{e}$ in G.
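The four steps can be traced with a short Python sketch. It reuses the same toy group as a Pedersen example would (modulus 2039, subgroup order 1019, generators 4 and 9); these numbers and all function names are assumptions for illustration, and a group this small offers no security.

```python
import secrets

P_MOD, p = 2039, 1019      # assumed toy safe prime P_MOD = 2p + 1; G has prime order p
g, h = 4, 9                # assumed generators of G (squares modulo P_MOD)

def commit(a, b):
    return pow(g, a, P_MOD) * pow(h, b, P_MOD) % P_MOD

def prover_announce():
    """Step 1: choose y, s in F_p^* and send d = g^y h^s."""
    y, s = 1 + secrets.randbelow(p - 1), 1 + secrets.randbelow(p - 1)
    return (y, s), commit(y, s)

def verifier_challenge():
    """Step 2: random challenge e in F_p^*."""
    return 1 + secrets.randbelow(p - 1)

def prover_respond(alpha, beta, y, s, e):
    """Step 3: u = y + e*alpha, v = s + e*beta, both reduced modulo p."""
    return (y + e * alpha) % p, (s + e * beta) % p

def verifier_check(c, d, e, u, v):
    """Step 4: accept iff g^u h^v = d * c^e in G."""
    return commit(u, v) == d * pow(c, e, P_MOD) % P_MOD

def run_zkpk(alpha, beta, c):
    (y, s), d = prover_announce()
    e = verifier_challenge()
    u, v = prover_respond(alpha, beta, y, s, e)
    return verifier_check(c, d, e, u, v)

c = commit(123, 456)
print(run_zkpk(123, 456, c))
```

The check succeeds because $g^{u} h^{v} = g^{y} h^{s} \cdot (g^{\alpha} h^{\beta})^{e}$, while the Verifier only ever sees d, u and v, each of which is masked by the fresh values y and s.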
6.2.3 Euler’s Totient Function φ(·) and Euler’s Theorem
Let Z be the set of integers. Let Z+ denote the set of all positive integers. Let m ∈ Z+.
Euler’s totient function φ(m) is defined as the number of integers in Z+ less than or
equal to m and relatively prime to m.
Theorem 6.2.1 (Euler’s Theorem) Let m ∈ Z+ and a ∈ Z. If
gcd(a, m) = 1, then $a^{\varphi(m)} \equiv 1 \pmod{m}$.
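A small numeric check may help fix the definition and the theorem. The Python sketch below counts φ(m) directly, which is only feasible for tiny m, and also confirms the identity $\varphi(n^2) = n(p-1)(q-1)$ for n = pq on one example; the choice p = 5, q = 7 is an arbitrary assumption.

```python
from math import gcd

def phi(m):
    """Euler's totient by direct counting (practical for small m only)."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)

def euler_holds(m):
    """Check a^phi(m) ≡ 1 (mod m) for every a in [1, m) coprime to m."""
    t = phi(m)
    return all(pow(a, t, m) == 1 for a in range(1, m) if gcd(a, m) == 1)

p, q = 5, 7
n = p * q
print(euler_holds(n), phi(n * n) == n * (p - 1) * (q - 1))
```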
6.2.4 Composite Square Root Problem
Definition 6.2.1 (Composite square root problem) Let n = pq be a product of
two distinct large primes. The composite square root problem is the computational
problem defined as follows: given w ∈ QR, where $QR = \{y \mid y = x^2 \pmod{n},\ x \in (Z/(n))^{\times}\}$,
compute x ∈ {1, 2, . . . , n − 1} such that $w = x^2 \pmod{n}$.
It is well known that for each w ∈ QR, there are four x ∈ {1, 2, . . . , n − 1} such
that $x^2 = w \pmod{n}$. If the prime factorization of n is known, then there are efficient
algorithms to solve the above problem [59]. However, the problem seems difficult if
the factorization of n is hard. In the construction of our CBPS system, we make use
of the composite square root assumption which is based on this difficulty.
Conjecture 1 (Composite square root assumption) There exists no polynomial
time algorithm to solve the composite square root problem.
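To illustrate why knowing the factorization makes the problem easy, the Python sketch below recovers all four square roots by solving modulo each prime and recombining with the Chinese Remainder Theorem. It assumes both primes are congruent to 3 modulo 4, so that a square root modulo a prime is a single exponentiation; this restriction and the tiny primes 7 and 11 are simplifications for the example, not requirements of the definition.

```python
def square_roots(w, p, q):
    """All square roots of w modulo n = p*q, given primes p, q ≡ 3 (mod 4).

    Assumes w is a quadratic residue modulo n and coprime to n.
    """
    n = p * q
    rp = pow(w, (p + 1) // 4, p)          # a root of w modulo p
    rq = pow(w, (q + 1) // 4, q)          # a root of w modulo q
    inv_q = pow(q, -1, p)                 # q^{-1} mod p (Python 3.8+)
    inv_p = pow(p, -1, q)                 # p^{-1} mod q
    roots = set()
    for sp in (rp, p - rp):
        for sq in (rq, q - rq):
            # CRT: the unique x mod n with x ≡ sp (mod p) and x ≡ sq (mod q)
            roots.add((sp * q * inv_q + sq * p * inv_p) % n)
    return sorted(roots)

print(square_roots(4, 7, 11))   # [2, 9, 68, 75]
```

Without p and q, no comparably direct route is known; the assumption that follows from this is exactly what the conjecture above states.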
6.2.5 Paillier Homomorphic Cryptosystem
The Paillier homomorphic cryptosystem is a public key cryptosystem by Paillier [10] based on the “Composite Residuosity assumption (CRA).” The Paillier cryptosystem is additively homomorphic in that, by using only the public key, an encryption of the sum
$m_1 + m_2$ of two messages $m_1$ and $m_2$ can be computed from the encryptions of $m_1$
and $m_2$. Our approach and protocols are inspired by how the Paillier cryptosystem
works. Hence, we provide some internal details of the cryptosystem below so that
readers can follow the rest of the paper.
Key generation
Set n = pq, where p and q are two large prime numbers. Set λ = lcm(p − 1, q − 1), i.e.,
the least common multiple of p − 1 and q − 1. Randomly select a base $g_p \in (Z/(n^2))^{\times}$
such that the order of $g_p$ is a multiple of n. Such a $g_p$ can be efficiently found by
randomly choosing $g_p \in (Z/(n^2))^{\times}$, then verifying that
$$\gcd\bigl(L(g_p^{\lambda} \bmod n^2),\ n\bigr) = 1, \quad \text{where } L(u) = (u - 1)/n \tag{6.1}$$
for $u \in S_n = \{u < n^2 \mid u \equiv 1 \pmod{n}\}$. In this case, set $\mu = \bigl(L(g_p^{\lambda} \bmod n^2)\bigr)^{-1} \bmod n$.
The public encryption key is a pair $(n, g_p)$. The private decryption key is
(λ, µ), or equivalently (p, q, µ).
Encryption E(m, r)
Given plaintext m ∈ {0, 1, . . . , n − 1}, select a random r ∈ {1, 2, . . . , n − 1}, and
encrypt m as $E(m, r) = g_p^{m} \cdot r^{n} \pmod{n^2}$. When the value of r is not important to
the context, we sometimes simply write a short-hand E(m) instead of E(m, r) for the
Paillier ciphertext of m.
Decryption D(c)
Given ciphertext $c \in (Z/(n^2))^{\times}$, decrypt c as
$$D(c) = L(c^{\lambda} \bmod n^2) \cdot \mu \pmod{n}. \tag{6.2}$$
More specifically, the homomorphic properties of the Paillier cryptosystem are:
$$\begin{aligned}
D\bigl(E(m_1, r_1)\, E(m_2, r_2) \bmod n^2\bigr) &= m_1 + m_2 \pmod{n}, \\
D\bigl(g_p^{m_2}\, E(m_1, r_1) \bmod n^2\bigr) &= m_1 + m_2 \pmod{n}, \\
D\bigl(E(m_1, r_1)^{k} \bmod n^2\bigr) &= k\, m_1 \pmod{n}.
\end{aligned}$$
Also note that the Paillier cryptosystem described above is semantically secure against
chosen-plaintext attacks (IND-CPA).
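The key generation, encryption, decryption and homomorphic properties described in this section can be exercised with a compact Python sketch. It is a toy under stated assumptions: the base is fixed to $g_p = n + 1$ (a common choice whose order modulo $n^2$ is n) rather than selected at random, and the key is built from two known Mersenne primes that are far too small for real use. Function names are illustrative.

```python
import math
import secrets

def L(u, n):
    """L(u) = (u - 1) / n, defined for u ≡ 1 (mod n)."""
    return (u - 1) // n

def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    g_p = n + 1                                          # assumed base of order n mod n^2
    mu = pow(L(pow(g_p, lam, n * n), n), -1, n)          # modular inverse, Python 3.8+
    return (n, g_p), (lam, mu)

def random_unit(n):
    """Random r in {1, ..., n-1} that is coprime to n."""
    while True:
        r = 1 + secrets.randbelow(n - 1)
        if math.gcd(r, n) == 1:
            return r

def encrypt(pub, m, r=None):
    n, g_p = pub
    r = random_unit(n) if r is None else r
    return pow(g_p, m % n, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return L(pow(c, lam, n * n), n) * mu % n

pub, priv = keygen(2**31 - 1, 2**61 - 1)   # toy primes, illustration only
n, g_p = pub
c1, c2 = encrypt(pub, 20), encrypt(pub, 22)
print(decrypt(pub, priv, c1 * c2 % (n * n)))   # sum of the two plaintexts
```

Multiplying ciphertexts adds plaintexts, and raising a ciphertext to a power k multiplies its plaintext by k; both facts are used by the blinding functions later in the chapter.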
In the construction of our CBPS system, the Paillier homomorphic cryptosystem
is used in a way that public and private keys are judiciously distributed among Pubs,
Subs, and Brokers such that the confidentiality and privacy are assured based on
homomorphic encryption. A detailed description of the construction is presented in
Section 6.3.
6.3 Proposed Scheme
In this section, we provide a detailed description of the privacy preserving CBPS
system we propose. As introduced in Section 6.1, the system consists of 6 protocols:
1) Initialize, 2) Register, 3) Subscribe, 4) Publish, 5) Match, and 6) Cover.
6.3.1 Initialize
A trusted party, which could be one of the Pubs, runs a Pedersen commitment
setup algorithm [47] to generate system wide parameters (G, p, g, h). These parameters have the same meaning and purpose as mentioned in Section 6.2. The same party
also runs a key generation algorithm similar to Paillier [10] to generate the parameters
(n, p, q, gp , λ, µ). Only Pubs know the parameters (p, q, λ). The parameters (n, gp , µ)
are public. Note that unlike in Paillier, µ is public in our scheme. The system parameter l is the upper bound on the number of bits required to represent any data
values published, and we refer to it as the domain size. For example, if an attribute can
take values from 0 up to 500 ($< 2^{9}$), l should be at least 9 bits long. For reasons that
will soon become clear in this section, we choose l such that $2^{2l} \ll n$.⁷ In addition
to these parameters, each Pub has a key pair (Kpub , Kpri ) where Kpri is the private
key used to sign access tokens of Subs and Kpub is the public key used by Brokers to
verify authenticity and integrity of them. Each Pub also runs the Setup algorithm of
the AB-GKM scheme to initialize the key management system for encrypting payload
messages to Subs. Each Pub computes two pairs of secret values $(e_m, d_m)$ and $(e_c, d_c)$
such that $e_m + d_m \equiv 0 \pmod{\varphi(n^2)}$ and $e_c + d_c \equiv 0 \pmod{\varphi(n^2)}$, where φ(·)
is Euler’s totient function and $e_m \neq e_c$. Note that, by Euler’s theorem, we have $g^{e_m} g^{d_m} \equiv g^{e_c} g^{d_c} \equiv 1
\pmod{n^2}$. Pub uses $e_m$ to blind Paillier encrypted notifications and $d_m$, $d_c$, $e_c$ to blind
Paillier encrypted subscriptions.⁸ Let s be the largest integer such that $2^{s} < n$, and let u be an integer such that $l < u < s - 1$. Finally, each Pub chooses two secret random
values $r_m, r_c \in Z$ such that $1 < r_m, r_c < 2^{u-l}$ and $r_m \neq r_c$. These values are used to
prevent Brokers from learning the distribution of the difference of the values that are
being matched. In summary, $(G, p, g, h, n, g_p, \mu, K_{pub})$ are the public parameters that
all the parties know, and $(p, q, \lambda, K_{pri}, r_m, r_c, (e_m, d_m), (e_c, d_c))$ are private parameters of
Pubs. Note that in a practical implementation, most of these parameters can be auto-generated by a computer program, which usually only requires Pub to pre-determine
l depending on the domain of the content of notifications.
7
We use notation a ≪ b to denote that “a is sufficiently smaller than b.”
8
The “blind” operation will be introduced in Section 6.3.3.
6.3.2 Register
As shown in Figure 6.2, each Sub registers itself with Pub by presenting an id
(identity), a pseudonym uniquely identifying Sub. In a real-world system, registration may involve Subs presenting other credentials and/or making payment. Upon
successful registration, Sub executes the SecGen algorithm of the AB-GKM scheme
to obtain a secret s. We omit the details of the AB-GKM based key management
as a detailed application of it is provided in the previous two chapters. During this
protocol, each Sub also obtains its initial access token, a Pedersen commitment signed
by Pub.
An access token allows Sub to authenticate itself to the Broker from which it intends
to request notifications, as well as to create additional access tokens in consultation
with Pub. To create the first access token, Sub encodes its id as an element ⟨id⟩ ∈ $F_p$,
chooses a random a ∈ $F_p$, and sends Pub the commitment $com(\langle id \rangle) = g^{\langle id \rangle} h^{a}$ together with the values (⟨id⟩, a). The Pub signs com(⟨id⟩) and sends the digital signature $K_{pri}(com(\langle id \rangle))$
back to the Sub.
Figure 6.2.: Sub registering with Pub
6.3.3 Subscribe
During this protocol, Subs inform Brokers of their interests in the form of subscriptions. Before
subscribing to messages, as Figure 6.3 illustrates, Subs must authenticate themselves
to Brokers. Sub gives a zero-knowledge proof of knowledge (ZKPK) of the ability to
open the commitment com(⟨id⟩) signed by Pub:
$ZKPK\{(\langle id \rangle, a) : com(\langle id \rangle) = g^{\langle id \rangle} h^{a}\}$
Figure 6.3.: Sub authenticating itself to Broker
Notice that the ZKPK of the commitment opening does not reveal the identity
of Sub. Further, Sub may use different access tokens by having different random a
values for different subscriptions to prevent Brokers from linking its subscriptions to
one access token.⁹ ¹⁰
If the ZKPK is successful, Sub may submit one or more subscriptions. Recall
that subscriptions are blinded by Pub before being sent to Broker. The subscription
“blinding” functions, $\mathrm{bval}_m$, $\mathrm{bval}_{c1}$, $\mathrm{bval}_{c2}$, are defined as follows:
9
One may use a randomized signature scheme on a committed value [60] to achieve the same objective
at the expense of additional computation cost.
10
Our scheme only provides application level privacy, but not network level privacy. For example,
it does not hide IP addresses. In order to provide network level privacy/anonymity, one needs to
utilize other orthogonal techniques such as Tor [61]
Let v be the original subscription.
$$E(v) = g_p^{v} \cdot r_1^{n} \pmod{n^2}$$
$$\mathrm{bval}_m(E(-v)) = g^{d_m} \cdot (E(-v))^{r_m \lambda} \pmod{n^2} \tag{6.3}$$
$$\mathrm{bval}_{c1}(E(-v)) = g^{d_c} \cdot (E(-v))^{r_c \lambda} \pmod{n^2} \tag{6.4}$$
$$\mathrm{bval}_{c2}(E(v)) = g^{e_c} \cdot (E(v))^{r_c \lambda} \cdot (E(r))^{\lambda} \pmod{n^2} \tag{6.5}$$
where $d_m$, $e_m$, $r_m$, $d_c$, $e_c$, $r_c$ are generated during Initialize, and r in Formula 6.5 is a
random number such that $r \le \min\{r_c, 2^{s-1-u}\}$.
Sub sends E(v) and E(−v), where v is the original subscription for the attribute
attr, to Pub. Pub sends back the blinded subscription to Sub and Sub sends the
tuple $(attr, \mathrm{bval}_{c1}(E(-v)), \mathrm{bval}_{c2}(E(v)), \mathrm{bval}_m(E(-v)), op)$ to Broker. The first two
blinded values in the subscription are used by Broker for the Cover protocol and the third
one for the Match protocol. Note that Sub performs these encryptions to reduce the load
on Pubs. It should also be noted that equality filters in our protocols are treated as
range filters, preventing Brokers from distinguishing equality filters from range filters.
For example, in order to subscribe for v = 5, Sub subscribes for a range filter where
v ≤ 5 and v > 4. Except for range filters, subscriptions from the same Sub are
treated as disjunctive conditions.
Example 7
Sub wants to get all the notifications with bid price less than 22. The subscription
has the format (“/quote/bid/price”, 346213, 152311, 453280, <) where the second and
third parameters are the blinded values of −22 and 22, respectively, for the Cover protocol
to use, and the fourth is the blinded value of −22 for the Match protocol to use.
6.3.4 Publish
Using em , the counterpart of dm which is used to blind subscriptions for Match
protocol, and other private parameters, Pubs blind the notifications using the function
bval n as defined below.
Let x be one value in the notification.
$$\begin{aligned}
\mathrm{bval}_n(x) &= g^{e_m} \cdot (E(x))^{r_m \lambda} \cdot E(r)^{\lambda} \pmod{n^2} \\
&= g^{e_m} \cdot E((r_m x + r)\lambda) \pmod{n^2},
\end{aligned}$$
where $e_m$ and $r_m$ are generated during Initialize, and r is selected uniformly at random
such that $r \le \min\{r_m, 2^{s-1-u}\}$.
Pubs publish the blinded notifications to Brokers. A notification has a set of
blinded AVPs and an encrypted payload message. For illustration, let us
assume these AVPs are numbered from 1 to t, where t is the number of attributes
of the payload message M being considered. The blinded notification looks like
$((attr_1, \mathrm{bval}_n(x_1)), \ldots, (attr_t, \mathrm{bval}_n(x_t)))$, where $attr_i$ and $x_i$ are the ith attribute name
and value, respectively.
6.3.5 Match
For each notification from Pub, Broker compares it with Subs’ subscriptions to
make routing decisions. We explain the Match operation for one attribute in the
message, but it can be naturally extended to perform on multiple attributes. If at
least one of the attributes in the message matches, we say that the subscription
matches the notification, and in this case Broker forwards the notification to the
corresponding Subs. For range filters, the conjunction of two corresponding Match
operations is taken.
Let the blinded values be $\mathrm{bval}_n(x)$ and $\mathrm{bval}_m(E(-v))$ that Broker has received
from Pub and Sub, respectively, for an attribute attr with subscription value v
and notification value x. Broker computes the value diff as follows.
$$\mathit{diff} = L\bigl(\mathrm{bval}_n(x) \cdot \mathrm{bval}_m(E(-v)) \bmod n^2\bigr) \cdot \mu \pmod{n},$$
where L, µ are public parameters derived from Paillier. Using diff, Broker makes
the matching decision based on Table 6.1.
Table 6.1: Matching decision
diff      Decision
< n/2     x ≥ v
> n/2     x < v
Before we show that the above computation gives a diff equal to $r_m \cdot (x - v) + r$,
we describe how Match protocol gives the correct matching decision while outputting
a (controlled) random diff value to Broker. Recall that in Initialize, the domain of
the input values is set to $0 \sim 2^{l}$. Therefore, $0 \le x, v \le 2^{l}$. Notice that the difference of
any two values x and v, taken modulo n, is either between $0 \sim 2^{l}$ if the difference is positive, or between
$(n - 2^{l}) \sim n$ if the difference is negative. Also, notice that the range $2^{l} \sim (n - 2^{l})$ is not
utilized. In order to randomize the difference, we take advantage of this unused range
and multiply the actual difference with a random secret value $r_m$ and add another
random value r, both selected by Pub. The idea behind $r_m$ and r is to first expand the
$0 \sim 2^{l}$ range to $0 \sim 2^{u}$ and $(n - 2^{l}) \sim n$ to $n - 2^{s} \sim n - n_m$, and then expand
them to $0 \sim n/2$ and $n/2 \sim n$ respectively. Thus the difference is randomized, yet it
allows Broker to make correct matching decisions without resulting in false positives
or negatives.
During Match protocol, Broker does not learn the content under comparison. This
is because, without knowing λ, Broker cannot perform decryption
freely, but is forced to engage in the protocol. Not knowing the
values $r_m$ and r, Broker does not learn the exact difference of the two values under
comparison either.
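The following shows the correctness of diff. Let
$$y = \mathrm{bval}_n(x) \cdot \mathrm{bval}_m(E(-v)) \pmod{n^2}.$$
Then
$$\begin{aligned}
y &= g^{e_m} \cdot E((r_m x + r)\lambda) \cdot g^{d_m} \cdot (E(-v))^{r_m \lambda} \pmod{n^2} \\
&= g^{e_m + d_m} \cdot \{E(r_m x + r) \cdot E(-r_m v)\}^{\lambda} \pmod{n^2} \\
&= (E(r_m (x - v) + r))^{\lambda} \pmod{n^2},
\end{aligned}$$
$$\mathit{diff} = L(y) \cdot \mu \pmod{n} = r_m (x - v) + r. \tag{6.6}$$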
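The correctness argument above can also be checked numerically. The following Python sketch runs a toy version of Initialize, the Match-related blinding, and the Broker-side decision. Several choices are assumptions made for the illustration only: the base used for both the encryption and the blinding factors is $n + 1$, the modulus is built from two known Mersenne primes (far too small for real use), u is fixed to 40, and r is drawn strictly below $r_m$ so that the boundary case v − x = 1 cannot wrap around to 0.

```python
import math
import secrets

# Toy Initialize (Pub side). Every concrete number here is illustrative.
p, q = 2**31 - 1, 2**61 - 1
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lambda = lcm(p-1, q-1)
g_p = n + 1                                          # assumed base, order n mod n^2
phi_n2 = n * (p - 1) * (q - 1)                       # Euler's phi(n^2)

def L(u_val):
    return (u_val - 1) // n

mu = pow(L(pow(g_p, lam, n2)), -1, n)   # public in the scheme (Python 3.8+)
l = 10                                  # domain size in bits
s = n.bit_length() - 1                  # largest s with 2^s < n
u = 40                                  # any u with l < u < s - 1
e_m = 1 + secrets.randbelow(phi_n2 - 1)
d_m = phi_n2 - e_m                      # e_m + d_m ≡ 0 (mod phi(n^2))
r_m = 2 + secrets.randbelow(2**(u - l) - 2)   # 1 < r_m < 2^(u-l)

def E(m):
    """Paillier encryption of m mod n with fresh randomness."""
    while True:
        r = 1 + secrets.randbelow(n - 1)
        if math.gcd(r, n) == 1:
            break
    return pow(g_p, m % n, n2) * pow(r, n, n2) % n2

def blind_subscription(enc_neg_v):
    """bval_m, computed by Pub on E(-v) received from Sub."""
    return pow(g_p, d_m, n2) * pow(enc_neg_v, r_m * lam, n2) % n2

def blind_notification(x):
    """bval_n, computed by Pub; r is kept strictly below r_m in this sketch."""
    r = secrets.randbelow(min(r_m, 2**(s - 1 - u)))
    return pow(g_p, e_m, n2) * pow(E(x), r_m * lam, n2) * pow(E(r), lam, n2) % n2

def match(blinded_notification, blinded_subscription):
    """Broker side: uses only n and mu; True means x >= v (Table 6.1)."""
    diff = L(blinded_notification * blinded_subscription % n2) * mu % n
    return 2 * diff < n

print(match(blind_notification(23), blind_subscription(E(-22))))
```

Note that the Broker-side function performs one modular multiplication modulo $n^2$, one division by n, and one multiplication modulo n, with no exponentiation.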
6.3.6 Cover
Subscriptions are categorized into groups based on the covering relationships so
that Brokers can perform Match protocol efficiently. For each subscription received
from Subs, Brokers check whether a covering relationship holds with any of the existing subscriptions. If one exists, they add the new subscription to the group with the covering
subscription; otherwise a new group is created for the new subscription.
Notice that we have not used the blinded values $\mathrm{bval}_{c1}(E(-v))$ and $\mathrm{bval}_{c2}(E(v))$ in
subscriptions yet. These two values are used in the Cover protocol. In what follows,
we explain how the Cover protocol works.
Let S1 and S2 be two subscriptions for the same attr and compatible op. Two
op’s are compatible if both of them are of the same type. $\mathrm{bval}_{c1}(E(-v_1))$ and
$\mathrm{bval}_{c2}(E(v_1))$ refer to the so far unused blinded values of the additive inverse of $v_1$ and of $v_1$
itself, respectively, in the subscription S1. The blinded values $\mathrm{bval}_{c1}(E(-v_2))$ and
$\mathrm{bval}_{c2}(E(v_2))$ have similar interpretations.
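Broker computes one of the following two values in order to decide the covering
relationship.
$$\begin{aligned}
\mathit{diff}_1 &= L\bigl(\mathrm{bval}_{c2}(E(v_1)) \cdot \mathrm{bval}_{c1}(E(-v_2)) \bmod n^2\bigr) \cdot \mu \pmod{n} \\
\mathit{diff}_2 &= L\bigl(\mathrm{bval}_{c2}(E(v_2)) \cdot \mathrm{bval}_{c1}(E(-v_1)) \bmod n^2\bigr) \cdot \mu \pmod{n}
\end{aligned} \tag{6.7}$$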
$\mathit{diff}_1$ and $\mathit{diff}_2$ give results $r_c \cdot (v_1 - v_2) + r$ and $r_c \cdot (v_2 - v_1) + r'$ respectively, where
r, r′ are random numbers. To make the covering decision, Broker uses the same Table 6.1 that is used for
making the matching decision. The covering decision for
range filters is performed in a similar way, but we omit the details due to lack of
space. Similar to Match, Brokers do not learn the actual subscription values.
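A toy run of the Cover computation, in the same spirit as the Match check, is sketched below in Python. The parameters are again illustrative assumptions (Mersenne primes, base $n + 1$, u = 40, r drawn strictly below $r_c$). The mapping from the sign of $\mathit{diff}_1$ to a covering decision for "<" filters, namely that "x < v1" covers "x < v2" when $v_1 \ge v_2$, is this sketch's reading of Table 6.1 rather than a rule spelled out in the text.

```python
import math
import secrets

p, q = 2**31 - 1, 2**61 - 1             # toy primes, illustration only
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
g_p = n + 1                             # assumed base
phi_n2 = n * (p - 1) * (q - 1)

def L(u_val):
    return (u_val - 1) // n

mu = pow(L(pow(g_p, lam, n2)), -1, n)
l, u = 10, 40
s = n.bit_length() - 1
e_c = 1 + secrets.randbelow(phi_n2 - 1)
d_c = phi_n2 - e_c                      # e_c + d_c ≡ 0 (mod phi(n^2))
r_c = 2 + secrets.randbelow(2**(u - l) - 2)

def E(m):
    while True:
        r = 1 + secrets.randbelow(n - 1)
        if math.gcd(r, n) == 1:
            break
    return pow(g_p, m % n, n2) * pow(r, n, n2) % n2

def blind_for_cover(v):
    """Return (bval_c1(E(-v)), bval_c2(E(v))) as Pub would compute them."""
    c1 = pow(g_p, d_c, n2) * pow(E(-v), r_c * lam, n2) % n2
    r = secrets.randbelow(min(r_c, 2**(s - 1 - u)))
    c2 = pow(g_p, e_c, n2) * pow(E(v), r_c * lam, n2) * pow(E(r), lam, n2) % n2
    return c1, c2

def covers_lt(sub1, sub2):
    """Broker side, for two '<' filters: True when v1 >= v2 (diff_1 < n/2)."""
    diff1 = L(sub1[1] * sub2[0] % n2) * mu % n
    return 2 * diff1 < n

s30, s22 = blind_for_cover(30), blind_for_cover(22)
print(covers_lt(s30, s22), covers_lt(s22, s30))
```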
6.3.7 The Distribution of Load
We now briefly explain the rationale behind the distribution of work load among
Pubs, Subs and Brokers. If there are O(N ) notifications and O(S) subscriptions, in
the worst case, Broker needs to perform O(N S) Match protocols. Thus, Brokers have
to perform significantly more work compared to Pubs and Subs in a typical CBPS
system. This is one of the key reasons why the performance of Brokers degrades as the
number of notifications and/or subscriptions in the system increases. By optimizing
for the frequent case, one can achieve a significant overall system improvement. We
followed this well-known design principle to redistribute the load on Brokers partly to
Pubs and Subs. Notice that there are no exponentiation operations in either the Match or the
Cover protocol. Hence, these protocols can be performed very efficiently. This is
made possible at the cost of extra work at Pubs and Subs. Since the protocols at Pubs
and Subs are executed less frequently compared to those at Brokers, our distribution
leads to a better overall system performance. The experimental results show that the
protocols at Brokers are very efficient and those at Pubs and Subs also run fast.
6.4 Experimental Results
In this section, we present experimental results for various operations and the two
main protocols, Match and Cover, in our system as well as our privacy preserving
CBPS (PP-CBPS) system itself which extends an enhanced SIENA system by implementing privacy preserving matching and covering using our protocols. For the
protocol experiments, we have built a prototype system in Java that incorporates
our techniques for privacy preserving Match and Cover protocols as described in Section 6.3.
The experiments are performed on an Intel® Core™ 2 Duo CPU T9300 2.50GHz
machine running GNU/Linux kernel version 2.6.27 with 4 Gbytes memory. We utilize
only one processor for computation. The code is built with Java version 1.6.0 along
with Bouncy Castle lightweight APIs [62] for most cryptographic operations including
the symmetric-key encryption. The Paillier cryptosystem is implemented as in the
paper [10], except that we modified the algorithms to fit our scheme. We first look at
the experiments mainly on the two important protocols, Match and Cover, and then
describe the system experiments performed on PP-CBPS system.
6.4.1 Protocol Experiments
Table 6.2: Average computation time for general operations
Computation                        Time (in ms)
Create access token (Sub)          4.21
Open access token (Pub)            4.17
Sign access token (Pub)            4.10
Verify token signature (Broker)    0.36
ZKP of access token (Sub)          4.18
ZKP of access token (Broker)       6.31
Encrypt payload message (Pub)      34.56
Decrypt payload message (Sub)      0.36
In our experiments we vary values of n in Paillier cryptosystem and the domain
size l, and fix the parameters for Pedersen commitment generation, digital signature
generation/verification, zero-knowledge proof of knowledge protocol, and symmetric
key encryption/decryption. In all our experiments we only measure computational
cost, and assume the communication cost to be negligible. All data obtained by
our experiments correspond to the average time taken over 1000 executions of the
protocols with varying values for the bit length of n in the Paillier cryptosystem and
the domain size l. We first show the computation time for the general operations in
order to provide a comparative assessment of our protocols.
We compare our protocol results with the well established computations to show
that our approach is efficient and practical.
Table 6.2 shows the average running time for various operations for which we kept
the system parameters constant. Access token creation, opening, signing are performed during Register protocol and based on Pedersen commitment scheme. Pub
signs the access token using SHA-1 and RSA with 1024-bit long private key Kpri .
Verification of the signature on the access token using the public key Kpub , and the
ownership proof of the access token via the ZKPK are performed during Subscribe
protocol. Zero-Knowledge Proof (ZKP) protocols are generally considered time consuming, but in our approach ZKP computation is comparable to other operations in
the system, in that it takes merely a few milliseconds. For the experiments, we set the
payload size to 4 Kbytes and used AES-128 as the symmetric key algorithm. These
performance results demonstrate that the constructs we use and the computations
are very efficient.
In the experiment shown in Figure 6.4, we vary the bit length of n in the Paillier
cryptosystem. Figure 6.4 shows the time to generate blinded subscriptions and notifications whose values are less than 2l where l, the domain size, is fixed at 100, a
reasonably large value. The time to generate blinded values increases as the bit length
of n increases, but even for large bit lengths, it takes only a few milliseconds. The
time required to blind subscription is split into two tasks with the Sub performing
the encryption and the Pub performing the blinding, but to blind notifications, the
Pub performs both operations as one task. We remark that the overall computational
cost can be reduced by employing well-known caching techniques.
[Plot: Time (in ms) versus bit length of n (Paillier); series: Encrypt Subscription (Sub), Blind Encrypted Subscription (Pub), Blind Notification (Pub)]
Figure 6.4.: Time to blind subscriptions/notifications for different bit lengths of n
We measure in our experiment the performance impact on blinding when l, the
domain size, is changed. We fix n to be of length 1024 bits and measure the time to
blind subscriptions and notifications for l = 10, 20, · · · , 100. As shown in Figure 6.5,
the domain size does not significantly affect the performance of the blinding operations. Further, as indicated by both Figure 6.4 and Figure 6.5, the time for either
component of the subscription blinding is less than that for notification blinding.
Since for each subscription, the overhead at the Pub is less compared to the time
required to blind a notification, our decision to blind part of the subscription at the
Pub is comparable to blinding additional notifications.
In a CBPS, Match is the most executed protocol. Hence, it should be very efficient
so as not to overload Brokers. For each Subscribe protocol, Brokers may need to
invoke the Cover protocol and, therefore, we want to have a very efficient Cover
protocol as well. In the following two experiments, we observe the time to perform
these protocols.
[Plot: Time (in ms) versus bit length of content (l); series: Encrypt Subscription (Sub), Blind Encrypted Subscription (Pub), Blind Notification (Pub)]
Figure 6.5.: Time to blind subscriptions/notifications for different l
Figure 6.6 shows the execution time of Match and Cover protocols as the bit
length of n in the Paillier cryptosystem is changed while the domain size l is fixed
at 100 bits. The time for both protocols increases approximately linearly with the
bit length of n. Note that they take only a fraction of a millisecond (less than 100
microseconds) even for large bit lengths of n. This indicates that our Match and
Cover protocols are very efficient for large bit lengths of n.
[Plot: Time (in microseconds) versus bit length of n (Paillier); series: Match (Broker), Cover (Broker)]
Figure 6.6.: Time to perform match/cover for different bit lengths of n
Figure 6.7 shows the time to execute Match and Cover protocols as the domain
size l is changed while the bit length of n is fixed at 1024. Similar to the blind
computations, computational times remain largely unchanged for different l values.
[Plot: Time (in microseconds) versus bit length of content (l); series: Match (Broker), Cover (Broker)]
Figure 6.7.: Time to perform match/cover for different l
An observation made through all our protocol experiments is that the domain size
l does not significantly affect the computational time of the key protocols Publish,
Subscribe, Match and Cover, but the bit length of n in the Paillier cryptosystem does.
However, even for large bit lengths of n, our protocols take only a few microseconds
or milliseconds and thus they are efficient and practical.
6.4.2 System Experiments
In this section, we describe the experiments performed on our PP-CBPS system.
PP-CBPS is built on SIENA, a freely available and popular wide-area event notification
implementation. SIENA provides a pluggable architecture that allows us to incorporate our protocols to provide Match and Cover operations. All the testing data
are generated uniformly at random. In all the experiments, the average time to match
a notification with a subscription is measured where 1000 notifications are generated
each time and the system groups the subscriptions according to the covering relationships at the time of subscription. It should be noted that the matching time does not
include the time to create notifications and subscriptions which is measured in our
protocol experiments in Section 6.4.1.
Figure 6.8 shows the time to perform equality filtering in PP-CBPS (secure matching) and SIENA (plain matching) for different numbers of subscriptions in the system.
Notifications and subscriptions are drawn uniformly from 10-bit random integers. We
use a small domain size to demonstrate the effect of covering on the overall system
with and without security. As can be seen, PP-CBPS performs the matching within
10x of that of SIENA and is still efficient enough to match thousands of subscriptions
within 10 ms. In both cases, the increase in matching time with the number of subscriptions is sub-linear since the covering operation groups similar subscriptions
together, reducing the number of Match protocols that need to be executed.
[Plot: Time (in ms) versus No. of subscriptions; series: SIENA, PP-CBPS]
Figure 6.8.: Equality filtering time
Figure 6.9 shows the time to perform equality filtering in PP-CBPS for two different domain sizes, 10 and 25 bits, of notifications and subscriptions for different
numbers of subscriptions in the system. It should be noted that SIENA currently does
not support domain sizes larger than 27 bits, but our protocols can work under much
larger domains. As can be seen, the matching is more efficient with smaller domains.
This is due to the fact that smaller domains create more covering relationships than
larger domains and, hence, fewer matching protocols need to be executed to match a
notification against all the subscriptions. Further, observe that the rate of increase
of the overall matching cost decreases as the number of subscriptions increases. This,
again, is due to the covering protocol.
[Plot: Time (in ms) versus No. of subscriptions; series: l = 25 bits, l = 10 bits]
Figure 6.9.: Equality filtering time for different domain sizes
Figure 6.10 shows the time to perform inequality filtering in PP-CBPS for two
different domain sizes, 10 and 25 bits, of notifications and subscriptions for different
numbers of subscriptions in the system. We observe results similar to those of equality
filtering in Figure 6.9. However, notice that inequality filtering is much more
efficient than equality filtering for the same domain size. This is due to the fact that
inequality subscriptions create more covering relationships than equality subscriptions,
requiring far fewer matching operations.
Even though, according to the protocol experiments in Section 6.4.1, the time to
perform individual Match or Cover operations remains largely constant for different
domain sizes, the overall system performs better with smaller domain sizes. As the
domain size is reduced, there is a higher probability of having subscriptions satisfying covering relationships. Hence, the number of matching operations that need to be
performed is reduced considerably, leading to better performance.
[Plot: Time (in microsec) versus No. of subscriptions; series: l = 25 bits, l = 10 bits]
Figure 6.10.: Inequality filtering time for different domain sizes
7 SURVEY OF RELATED WORK
Approaches closely related to our work have been investigated in different areas: group
key management, functional encryption, selective publication of documents, secure
data outsourcing, secret sharing schemes, proxy re-encryption systems, searchable
encryption, secure multiparty computation, and private information retrieval. We
compare our work with these areas below.
7.1 Group Key Management (GKM)
GKM is a widely investigated topic in the context of group-oriented multicast
applications [15,28]. Early work on GKM relied on a key server to share a secret with
users to distribute keys to decrypt documents [22, 23]. Such approaches suffer from
the drawback of sending O(n) rekey information, where n is the number of users, in
the event of join or leave to provide forward and backward secrecy. Hierarchical key
management schemes [24, 25], where the key server hierarchically establishes secure
channels with different sub-groups instead of with individual users, were introduced to
reduce this overhead. However, they only reduce the size of the rekey information to
O(log n), and furthermore each user needs to manage at worst O(log n) hierarchically
organized redundant keys. Similar to the spirit of our approach, there have been
efforts to make rekey a one-off process [27, 28]. It should be noted that the secure
lock approach [26] based on the Chinese Remainder Theorem (CRT) is not a true
broadcast key management scheme. Even though the session key can be updated with
a single broadcast, the scheme still incurs O(n) communication cost for rekeying. To
the best of our knowledge, the approach based on “n out of m” secret sharing [29,30]
proposed by Berkovits [27] is the first true broadcast scheme. The paper presents
two variants. In both variants, each of the n users is given a secret share and
another n + r (where r > 0) shares are given to all the users in the system. In
other words, it creates an n + r + 1 out of 2n + r + 1 secret sharing scheme. A valid
user who has n + r + 1 shares can recover the secret, but others cannot. In the first
variant, each user evaluates n + r + 1 equations [29] whereas, in the second variant, the
common n + r shares are pre-evaluated and only the results are given to users, to reduce the load
on them [30]. Both variants are correct, but it is not clear what security penalties
the proposed variants incur due to certain assumptions made about the properties of secret
shares. A recent research effort introduces a related BGKM approach based on access
control polynomials [28]. This approach encodes the secrets given to users at the registration
phase in a special polynomial of order at least n in such a way that users can derive
the secret key from this polynomial. The special polynomials used in this approach
represent only a small subset of the domain of all polynomials of order n, and the
security of the approach is neither fully analyzed nor proven. Further, it appears that
the security of the scheme weakens as n increases.
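To make the remark about the secure lock approach [26] concrete, the following is a minimal sketch of a CRT-based lock. It is a toy illustration under loud assumptions: XOR stands in for the per-user encryption (which is not secure), the session key and user keys are assumed to be smaller than 64 so that every residue is below its modulus, and the function names are hypothetical. The single broadcast value lies in the range of the product of all n moduli, which is why its size still grows linearly with the number of users.

```python
from math import prod


def crt(residues, moduli):
    """Solve x = residues[i] (mod moduli[i]) for pairwise coprime moduli."""
    total = prod(moduli)
    x = 0
    for r, n in zip(residues, moduli):
        partial = total // n
        # pow(partial, -1, n) is the modular inverse of partial modulo n
        x += r * partial * pow(partial, -1, n)
    return x % total


def make_lock(session_key, user_keys, moduli):
    """Key server: build one broadcast value that every listed user can open."""
    # Toy "encryption": XOR with the user's key (NOT secure, illustration only).
    residues = [session_key ^ k for k in user_keys]
    assert all(r < n for r, n in zip(residues, moduli))
    return crt(residues, moduli)


def open_lock(lock, user_key, modulus):
    """User: reduce the lock modulo the own modulus and undo the toy encryption."""
    return (lock % modulus) ^ user_key
```

With user keys 17, 33 and 58 and moduli 101, 103 and 107, each user recovers the session key from the same broadcast value by one modular reduction and one XOR.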
7.2 Functional Encryption
Functional encryption [63] is a popular public key cryptographic construct used
to support fine-grained encryption of data. Functional encryption allows one to encode
an arbitrarily complex access control policy with the encrypted message and allows only
those satisfying the encoded policy to decrypt the message. There are two subclasses of functional encryption: predicate encryption with public index [16, 64, 65]
and predicate encryption without public index [66, 67].
In predicate encryption with public index schemes, the policy under which the
encryption is performed is public. Unlike in conventional public key cryptosystems, the public key is not
a random string but some publicly known value that is bound to the user. The simplest
scheme is called identity based encryption (IBE), where the user identity (e.g., email
address) is used as the public key. The idea of IBE was proposed by Shamir [68],
but the first practical constructs were proposed by Boneh and Franklin [64] and by Cocks [65]. Attribute
based encryption (ABE) is a more expressive predicate encryption with public index
scheme. The concept of ABE, introduced by Sahai and Waters [16], can be considered
a generalization of IBE. In ABE, the public key of a user is described by a set of
identity attributes the user has. ABE has two popular variations: Key Policy ABE
(KP-ABE), where encrypted documents are associated with attributes and user keys
with policies [17]; and Ciphertext Policy ABE (CP-ABE), where user keys are associated
with attributes and encrypted documents with policies [18]. In either case, the cost
of key management is minimized by using attributes that can be associated with
users. Further, an ABE based approach supports expressive ACPs. However, such an
approach suffers from some major drawbacks. Whenever the group membership changes,
the rekeying operation requires updating the private keys given to existing members
in order to provide backward/forward secrecy. This in turn requires establishing
private communication channels with each group member, which is not desirable in a
large group setting.
In predicate encryption without public index schemes, the policy under which the
encryption is performed is hidden from users. In other words, such schemes preserve
the privacy of the access control policies. Anonymous IBE [69, 70], Hidden Vector
Encryption [66], and Inner Product Predicate Encryption [67] all fall under such schemes.
Even though they preserve the privacy of the policy, they have limited expressibility
compared to the former schemes and also suffer from the same limitations as the
former approach. Our AB-GKM schemes address this limitation.
7.3 Selective Publishing of Documents
The database and security communities have carried out extensive research concerning techniques for the selective dissemination of documents based on access control policies [71–73]. These approaches fall in the following two categories.
1. Encryption of different subdocuments with different keys, which are provided to
users at the registration phase, and broadcasting the encrypted subdocuments
to all users [71, 72].
2. Selective multicast of different subdocuments to different user groups [73], where
all subdocuments are encrypted with one symmetric encryption key.
The latter approaches assume that the users are honest and do not try to access the subdocuments to which they do not have access authorization. Therefore,
these approaches provide neither backward nor forward key secrecy. In the former
approaches, users are able to decrypt the subdocuments for which they have the keys.
However, such approaches require all [71] or some [72] keys be distributed in advance
during user registration phase. This requirement makes it difficult to assure forward
and backward key secrecy when user groups are dynamic with frequent join and leave
operations. Further, the rekey process is not transparent, thus shifting the burden of
acquiring new keys onto existing users when others leave or join. Having identified these
problems, our preliminary work [20] proposes an approach to make rekey transparent
to users by not distributing actual keys during the registration phase. However, the
security of that approach is not analyzed and it cannot handle large user groups.
7.4 Secure Data Outsourcing
With the increasing utilization of cloud computing services, there has been a real
need to enforce access control on encrypted data stored at an untrusted third party. Our
work falls into this category. There have been some recent research efforts [74, 75] to
construct privacy preserving access control systems by combining oblivious transfer
and anonymous credentials. The goal of such work is similar to ours but we identify
the following limitations. Each transfer protocol allows one to access only one record
from the database, whereas our approach does not have any limitation on the number
of records that can be accessed at once since we separate the access control from the
authorization. Another drawback is that the size of the encrypted database is not
constant with respect to the original database size. Redundant encryption of the same
record is required to support ACPs involving disjunctions. In contrast, our approach
encrypts each data item only once as we have made the encryption independent of
ACPs. Yu et al. [76] proposed an approach based on ABE utilizing PRE (Proxy Re-Encryption) to handle the revocation problem of ABE. While it solves the revocation
problem to some extent, it does not preserve the privacy of the identity attributes as
our approach does.
7.5 Secret Sharing Schemes
Secret sharing schemes split a shared secret among a group of users by giving
secret shares to users and allowing them to combine their shares in a specific way to
obtain the shared secret. Shamir [29] proposed the first secret sharing scheme, the (n, k)-threshold scheme, where any k users out of n can construct a unique polynomial f(x) of
degree k − 1 and recover the shared secret f(0). Since the definition of this scheme,
several extensions have been proposed [30, 77, 78]. A major difference between GKM
protocols and secret sharing schemes is that the former are designed to allow any
individual group member to obtain a shared secret by itself, and no persistent secure
communication channel is assumed between valid group members, whereas the latter
are to prevent a single group member from gaining the secret alone, and require a
secure communication channel, when group members combine the secret shares, to
protect the shared secret from being learned by parties outside the group.
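The threshold scheme described above can be sketched in a few lines. This is a minimal illustration, assuming a prime field defined by the Mersenne prime 2^127 − 1 and share points x = 1, ..., n; the field size and the function names `split_secret` and `recover_secret` are choices made for this sketch only.

```python
import secrets

# Field modulus; 2**127 - 1 is a Mersenne prime (an assumption of this sketch).
PRIME = 2**127 - 1


def split_secret(secret, k, n):
    """Create n shares such that any k of them recover the secret."""
    # f(x) = secret + a_1 x + ... + a_{k-1} x^{k-1}, so f(0) is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]


def recover_secret(shares):
    """Recover f(0) by Lagrange interpolation over the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Any k of the n shares determine the degree k − 1 polynomial and hence f(0), which is why the shares must be combined and why a single group member cannot obtain the secret alone.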
7.6 Proxy Re-Encryption Systems
In a proxy re-encryption system, one party A delegates its decryption rights to
another party B via a third party called a “proxy.” More specifically, the proxy
transforms a ciphertext computed under party A’s public key into a different ciphertext which can be decrypted by party B with B’s private key. In such a system,
neither the proxy nor party B alone can obtain the plaintext. A direct application of
the proxy re-encryption system does not solve the problem of CBPS: with the proxy
as the Broker, it does not by default have the capability of selectively making content-based routing decisions. However, it might still be possible to use proxy re-encryption
as a building block in the construction of a CBPS system for data confidentiality.
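The ciphertext transformation performed by the proxy can be illustrated with a toy ElGamal-style construction. This is a sketch under stated assumptions and not a scheme used in this dissertation: the parameters (p = 23, q = 11, g = 4) are far too small for any real use, the re-encryption key is computed from both secret keys, and all function names are hypothetical.

```python
import secrets

# Toy parameters: G generates the subgroup of prime order Q in Z_P*.
P, Q, G = 23, 11, 4


def keygen():
    sk = 1 + secrets.randbelow(Q - 1)  # secret exponent in [1, Q-1]
    return sk, pow(G, sk, P)           # (secret key a, public key g^a)


def encrypt(m, pk):
    r = 1 + secrets.randbelow(Q - 1)
    # Ciphertext (m * g^r, g^(a*r)) under the public key g^a
    return (m * pow(G, r, P)) % P, pow(pk, r, P)


def rekey(sk_a, sk_b):
    # Re-encryption key b / a mod Q, handed to the proxy
    return (sk_b * pow(sk_a, -1, Q)) % Q


def reencrypt(ct, rk):
    c1, c2 = ct
    # Proxy turns g^(a*r) into g^(b*r) without learning m
    return c1, pow(c2, rk, P)


def decrypt(ct, sk):
    c1, c2 = ct
    g_r = pow(c2, pow(sk, -1, Q), P)   # recover g^r from g^(sk*r)
    return (c1 * pow(g_r, -1, P)) % P
```

In this sketch the proxy only exponentiates the second ciphertext component, so it never handles the plaintext, while B decrypts the transformed ciphertext with its own secret key exactly as A would decrypt the original one.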
7.7 Searchable Encryption
Search in encrypted data is a privacy-preserving technique used in the outsourced
storage model where a user’s data are stored on a third-party server and encrypted
using the user’s public key. The user can use a query in the form of an encrypted
token to retrieve relevant data from the server, whereas the server does not learn any
more information about the query other than whether the returned data matches the
search criteria. There have been efforts to support simple equality queries [79, 80]
and more recently complex ones involving conjunctions and disjunctions of range
queries [81]. These approaches cannot be applied directly to the CBPS model.
7.8 Secure Multiparty Computation (SMC)
SMC allows a set of participants to compute the value of a public function using
their private values as input, but without revealing their individual private values to
other participants. The problem was initially introduced by Yao. Since then, improvements have been proposed [82,83]. SMC solutions rely on some
form of zero-knowledge proof of knowledge (ZKPK) or oblivious transfer protocols,
which are in general interactive. Interactive protocols are not suitable for the CBPS
model; hence, SMC solutions do not work for it. Further, these solutions usually have a higher computational and/or communication cost, which may
not be acceptable for a CBPS system.
7.9 Private Information Retrieval (PIR)
A PIR scheme allows a client to retrieve an item from a database server without
revealing which item is retrieved. PIR approaches assume either that the server is
computationally bounded, where the problem reduces to oblivious transfer, or that there
are multiple non-cooperating servers, each having a copy of the same database. Since they involve only two
communicating parties, PIR schemes are not directly applicable to the Pub-Sub-Broker
architecture of the CBPS model. Moreover, similar to SMC solutions, PIR schemes
in general have a higher communication complexity, which may not be acceptable for
a CBPS system.
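The multi-server setting mentioned above can be illustrated with the classic two-server XOR construction. This is a minimal sketch, assuming two servers that hold identical copies of a database of integers and do not collude; the function names are hypothetical. Each query is a set of indices whose size is linear in the database size, which illustrates the communication cost noted above.

```python
import secrets
from functools import reduce
from operator import xor


def make_queries(n, index):
    """Return two index sets; each one on its own is uniformly random."""
    q1 = {j for j in range(n) if secrets.randbits(1)}
    q2 = q1 ^ {index}  # symmetric difference flips membership of `index`
    return q1, q2


def answer(database, query):
    """Server side: XOR of all requested records."""
    return reduce(xor, (database[j] for j in query), 0)


def retrieve(database_copy_1, database_copy_2, index):
    """Client side: query both servers and combine their answers."""
    q1, q2 = make_queries(len(database_copy_1), index)
    # All records except the one at `index` appear in both answers or in
    # neither, so they cancel out under XOR.
    return answer(database_copy_1, q1) ^ answer(database_copy_2, q2)
```

Neither server can tell which record is wanted from its own query, since both index sets are uniformly distributed when viewed in isolation.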
8 SUMMARY
In this dissertation, we defended our thesis that with novel group key management
and cryptographic techniques we can construct privacy preserving fine grained access
control on third party data management systems while assuring the confidentiality
of data and preserving the privacy of users. We proposed solutions under two of
the most popular dissemination models: pull based service model and subscription
based publish-subscribe model. Having identified the drawbacks and issues in the
existing key management systems for supporting privacy preserving attribute based
access control, we first proposed a novel key management scheme called AB-GKM.
Using the AB-GKM scheme along with existing and new cryptographic constructs,
we constructed privacy preserving access control on both pull and subscription based
models based on encryption.
While this dissertation provides an extensive investigation of privacy preserving
access control for pull and subscription based dissemination systems, there are a number of problems and challenges that still need to be solved. We briefly look at some of
them below:
Privacy preserving in the relational model:
Under the relational model, generally referred to as Database-as-a-service (DBaaS),
the third party server provides a relational database to store data. With the popularity of third party services such as Amazon RDS and Microsoft SQL Azure there is
a timely need to assure the confidentiality of sensitive data and the privacy of users
while supporting relational functions. The challenge is to use encryption that enforces
ACPs and also allows relational queries to be performed on encrypted data.
Content based access control in the pull model:
In the pull based model, we investigated mechanisms supporting only content independent ACPs. More expressive systems support access control based on both identity
and content attributes. An example policy may look like “A doctor can access the
data belonging to her patients only”. Additional mechanisms are required to support
content based ACPs while assuring the confidentiality of sensitive data and the privacy of users.
Providing accountability while preserving privacy:
Another important issue is how to build accountability into third party dissemination systems while preserving privacy. The problem is challenging as it involves
the conflicting goals of privacy and traceability. In order to balance privacy and
accountability, we need new traitor tracing schemes. The solution to the problem
should preserve the privacy of benign users (i.e. writes cannot be traced to the user
who made them) as long as they follow the third party service provider’s terms of use.
However users should become traceable (i.e. an illegal write can be traced to a user)
if they deviate from those terms of use. Previous research addressing this problem is
very limited [84, 85]. Further, these approaches rely on a trusted third-party (TTP)
which escrows the identity of the user to the service provider. For example, each user
write is accompanied with the identity encrypted with TTP’s public key. If the service
provider finds an illegal write, it asks the TTP to escrow the identity by decrypting
the message. In such a setting, users need to trust the TTP to reveal their identity
to the service provider only if their writes violate the terms of use and need to trust the
service provider not to make false identity escrow requests to the TTP. Having a TTP
(or a set of TTPs) is the ideal model, and it is well known that relying on this ideal
model leaves the system vulnerable if the above trust assumptions cease to hold (for example, if one of
the parties is controlled by an adversary). Answers need to be found to the questions
“How to identify a breach of terms of use and encode it as a well-defined rule?” and
“How to preserve the privacy of good users while providing accountability?”.
Exploiting the relationship among ACPs/attribute conditions:
In many systems, ACPs and attribute conditions exhibit partial order relationships. For
example, hierarchical policies are used in many domains. The most common example
of such hierarchies is Role Based Access Control (RBAC) models [86]. Our AB-GKM
scheme does not consider relationships among ACPs or attribute conditions. Due to
the non-linear cost associated with the KeyGen algorithm of AB-GKM, one can improve
the efficiency of KeyGen by breaking the problem into a set of smaller problems and
using the relationship among ACPs to derive keys. It is challenging to exploit the
relationships among ACPs while preserving the privacy of users.
Privacy preserving access control on big data systems:
Big Data technologies such as Apache Hadoop are increasingly being used to store
and/or analyze sensitive data. In order to comply with various regulations and organizational policies, such data needs to be stored encrypted and access to it
needs to be controlled based on the identity attributes of users. However, most of
the existing third party systems utilizing traditional key management schemes provide either no or limited assurance of confidentiality and privacy. The challenge is to
handle large volumes of data and many users in an efficient manner while assuring the
confidentiality of data and preserving the privacy of users who use such services.
LIST OF REFERENCES
[1] Liberty Alliance. http://www.projectliberty.org/ [Last accessed: July 18,
2012].
[2] OpenID. http://openid.net/ [Last accessed: July 18, 2012].
[3] Microsoft Windows CardSpace. http://windows.microsoft.com/en-us/windows-vista/Windows-CardSpace
[Last accessed: July 18, 2012].
[4] Higgins Open Source Identity Framework. http://www.eclipse.org/higgins/
[Last accessed: July 18, 2012].
[5] R. Richardson. CSI Computer Crime and Security Survey. Technical report,
Computer Security Institute, 2008.
[6] W. Chenxi, A. Carzaniga, D. Evans, and A.L. Wolf. Security issues and requirements for internet-scale publish-subscribe systems. In HICSS 2002: Proceedings
of the 35th Annual Hawaii International Conference on System Sciences, pages
3940–3947, Jan 2002.
[7] E. Bertino, B. Carminati, E. Ferrari, B. Thuraisingham, and A. Gupta. Selective
and authentic third-party distribution of XML documents. IEEE Transactions
on Knowledge and Data Engineering, 16(10):1263–1278, Oct. 2004.
[8] M. Srivatsa and L. Liu. Securing publish-subscribe overlay services with eventguard. In CCS 2005: Proceedings of the 12th ACM conference on Computer and
Communications Security, pages 289–298, 2005.
[9] M. Nabeel and E. Bertino. Secure delta-publishing of XML content. In ICDE,
2008. Proceedings of the IEEE 24th International Conference on Data Engineering, pages 1361–1363, Apr 2008.
[10] P. Paillier. Public-key cryptosystems based on composite degree residuosity
classes. In EUROCRYPT 1999: Proceeding of the 18th International Conference on the Theory and Application of Cryptographic Techniques, pages 223–238,
1999.
[11] C.P. Schnorr. Efficient identification and signatures for smart cards. In Proceedings of the 8th CRYPTO Conference on Advances in Cryptology, pages 239–252,
1989.
[12] C. Raiciu and D. S. Rosenblum. Enabling confidentiality in content-based publish/subscribe infrastructures. In Proceedings of the Securecomm and Workshops,
pages 1–11, 2006.
[13] M. Srivatsa and L. Liu. Secure event dissemination in publish-subscribe networks.
In ICDCS 2007: Proceedings of the 27th International Conference on Distributed
Computing Systems, pages 22–33, 2007.
[14] K. Minami, A. J. Lee, M. Winslett, and N. Borisov. Secure aggregation in a
publish-subscribe system. In WPES 2008: Proceedings of the 7th ACM workshop
on Privacy in the electronic society, pages 95–104, 2008.
[15] Y. Challal and H. Seba. Group key management protocols: A novel taxonomy.
International Journal of Information Technology, 2(2):105–118, 2006.
[16] A. Sahai and B. Waters. Fuzzy identity-based encryption. In EUROCRYPT
2005: Proceedings of the 25th Annual International Cryptology Conference on
Advances in Cryptology, pages 457–473, 2005.
[17] V. Goyal, O. Pandey, A. Sahai, and B. Waters. Attribute-based encryption for
fine-grained access control of encrypted data. In CCS 2006: Proceedings of the
13th ACM Conference on Computer and Communications Security, pages 89–98,
2006.
[18] J. Bethencourt, A. Sahai, and B. Waters. Ciphertext-policy attribute-based
encryption. In SP 2007: Proceedings of the 28th IEEE Symposium on Security
and Privacy, pages 321–334, 2007.
[19] A. Carzaniga, D. S. Rosenblum, and A. L. Wolf. Design and evaluation of a
wide-area event notification service. ACM Transaction on Computer Systems,
19(3):332–383, 2001.
[20] N. Shang, M. Nabeel, F. Paci, and E. Bertino. A privacy-preserving approach
to policy-based content dissemination. In ICDE 2010: Proceedings of the 2010
IEEE 26th International Conference on Data Engineering, 2010.
[21] M. Nabeel, N. Shang, and E. Bertino. Privacy preserving policy based content
sharing in public clouds. IEEE Transactions on Knowledge and Data Engineering, 2012.
[22] H. Harney and C. Muckenhirn. Group key management protocol (GKMP) specification. Technical report, Network Working Group, United States, 1997.
[23] H. Chu, L. Qiao, K. Nahrstedt, H. Wang, and R. Jain. A secure multicast protocol with copyright protection. SIGCOMM Computer Communication Review,
32(2):42–60, 2002.
[24] C.K. Wong and S.S. Lam. Keystone: A group key management service. In ICT
2000: Proceedings of the International Conference on Telecommunications, 2000.
[25] A.T. Sherman and D.A. McGrew. Key establishment in large dynamic groups
using one-way function trees. IEEE Transactions on Software Engineering,
29(5):444–458, May 2003.
[26] G. Chiou and W. Chen. Secure broadcasting using the secure lock. Software
Engineering, IEEE Transactions on, 15(8):929–934, Aug 1989.
[27] S. Berkovits. How to broadcast a secret. In EUROCRYPT 1991: Proceedings
of the 10th annual international conference on Advances in Cryptology, pages
535–541, 1991.
[28] X. Zou, Y. Dai, and E. Bertino. A practical and flexible key management mechanism for trusted collaborative computing. In INFOCOM 2008: The 27th Conference on Computer Communications, pages 538–546, 2008.
[29] A. Shamir. How to share a secret. ACM Communications, 22(11):612–613, 1979.
[30] E. F. Brickell. Some ideal secret sharing schemes. In EUROCRYPT 1989: Proceedings of the workshop on the theory and application of cryptographic techniques
on Advances in cryptology, pages 468–475, 1990.
[31] O. Goldreich. Foundations of cryptography: Basic tools. Cambridge University
Press, New York, NY, USA, 2000.
[32] M. Bellare and P. Rogaway. Random oracles are practical: A paradigm for designing efficient protocols. In CCS 1993: Proceedings of the 1st ACM conference
on Computer and communications security, pages 62–73, 1993.
[33] S Goldwasser, S Micali, and C Rackoff. The knowledge complexity of interactive
proof-systems. In STOC 1985: Proceedings of the seventeenth annual ACM
symposium on Theory of computing, pages 291–304, 1985.
[34] D. Dummit and R. Foote. Gaussian-Jordan elimination. In Abstract Algebra,
page 404. Wiley, 2nd edition, 1999.
[35] D. Naor, M. Naor, and J. B. Lotspiech. Revocation and tracing schemes for stateless receivers. In CRYPTO 2001: Proceedings of the 21st Annual International
Cryptology Conference on Advances in Cryptology, pages 41–62, 2001.
[36] D. Halevy and A. Shamir. The LSD broadcast encryption scheme. In CRYPTO
2001: Proceedings of the 22nd Annual International Cryptology Conference on
Advances in Cryptology, pages 47–60, 2002.
[37] V. Shoup. NTL library for doing number theory. http://www.shoup.net/ntl/
[Last accessed: July 18, 2012].
[38] OpenSSL the open source toolkit for SSL/TLS. http://www.openssl.org/
[Last accessed: July 18, 2012].
[39] N. Shang, M. Nabeel, E. Bertino, and X. Zou. Broadcast group key management
with access control vectors. Technical report, Department of Computer Science,
Apr 2010.
[40] M. Nabeel and E. Bertino. Attribute based group key management. Technical
Report CERIAS TR 2010, Purdue University, 2010.
[41] M. Pirretti, P. Traynor, P. McDaniel, and B. Waters. Secure attribute-based
systems. In CCS 2006: Proceedings of the 13th ACM Conference on Computer
and Communications Security, pages 99–112, 2006.
[42] XML in clinical research and healthcare industries. http://xml.coverpages.
org/healthcare.html [Last accessed: July 18, 2012].
[43] M. Eichelberg, T. Aden, J. Riesmeier, A. Dogac, and G. B. Laleci. A survey
and analysis of electronic healthcare record standards. ACM Computer Survey,
37(4):277–315, 2005.
[44] J. Bethencourt, A. Sahai, and B. Waters. Ciphertext policy attribute based encryption library. http://acsc.cs.utexas.edu/cpabe/ [Last accessed:
July 18, 2012].
[45] B. Lynn. Pairing based cryptography library. http://crypto.stanford.edu/
pbc/ [Last accessed: July 18, 2012].
[46] J. Li and N. Li. OACerts: Oblivious attribute certificates. IEEE Transactions
on Dependable and Secure Computing, 3(4):340–352, 2006.
[47] T.P. Pedersen. Non-interactive and information-theoretic secure verifiable secret sharing. In CRYPTO 1991: Proceedings of the 11th Annual International
Cryptology Conference on Advances in Cryptology, pages 129–140, 1992.
[48] L. Sweeney. k-anonymity: A model for protecting privacy. International Journal
of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557–570, 2002.
[49] Mihir Bellare, Oded Goldreich, and Shafi Goldwasser. Incremental cryptography:
The case of hashing and signing. In Proceedings of the 14th Annual International
Cryptology Conference on Advances in Cryptology, pages 216–233, 1994.
[50] Enrico Buonanno, Jonathan Katz, and Moti Yung. Incremental unforgeable
encryption. In FSE 2001: Revised Papers from the 8th International Workshop
on Fast Software Encryption, pages 109–124, 2001.
[51] N. Shang. G2HEC: A Genus 2 Crypto C++ Library. http://www.math.purdue.
edu/~nshang/libg2hec.html [Last accessed: July 18, 2012].
[52] F. Paci, N. Shang, E. Bertino, K. Steuer Jr., and J. Woo. Secure transactions’
receipts management on mobile devices. In Symposium on Identity and Trust on
the Internet (IDtrust Symposiums), Apr 2009.
[53] P. Gaudry and É. Schost. Construction of secure random curves of genus 2 over
prime fields. In EUROCRYPT 2004: Advances in Cryptology, pages 239–256,
2004.
[54] Boolstuff, A boolean expression tree toolkit. http://sarrazip.com/dev/boolstuff.html
[Last accessed: July 18, 2012].
[55] A. Schaad, J. Moffett, and J. Jacob. The role-based access control system of a
european bank: a case study and discussion. In SACMAT 2001: Proceedings of
the sixth ACM symposium on Access control models and technologies, pages 3–9,
2001.
[56] K. Fisler, S. Krishnamurthi, L. A. Meyerovich, and M. C. Tschantz. Verification
and change-impact analysis of access-control policies. In ICSE 2005: Proceedings
of the 27th international conference on Software engineering, pages 196–205,
2005.
[57] P. Eugster, P.A. Felber, R. Guerraoui, and A. Kermarrec. The many faces of
publish/subscribe. ACM Computing Survey, 35(2):114–131, 2003.
[58] S. Choi, G. Ghinita, and E. Bertino. A privacy-enhancing content-based publish/subscribe system using scalar product preserving transformations. In DEXA
2010: Proceedings of the 21st Conference on Database and Expert Systems Applications, 2010.
[59] H. Cohen. A course in computational algebraic number theory, chapter 1.5, pages
31–36. Springer-Verlag, 1993.
[60] J. Camenisch and A. Lysyanskaya. Signature schemes and anonymous credentials
from bilinear maps. In Proceedings of the 23rd CRYPTO Conference on Advances
in Cryptology, pages 56–72, 2004.
[61] R. Dingledine, N. Mathewson, and P. Syverson. Tor: The second-generation onion router. In
USENIX 2004: Proceedings of the 13th USENIX Security Symposium, 2004.
[62] Bouncycastle. Bouncy Castle Crypto APIs. http://www.bouncycastle.org/
[Last accessed: July 18, 2012].
[63] D. Boneh, A. Sahai, and B. Waters. Functional encryption: definitions and
challenges. In TCC 2011: Proceedings of the 8th conference on Theory of cryptography, pages 253–273, 2011.
[64] D. Boneh and M. Franklin. Identity-based encryption from the weil pairing. In
CRYPTO 2001: Proceedings of the 21st Annual International Cryptology Conference on Advances in Cryptology, pages 213–229, 2001.
[65] C. Cocks. An identity based encryption scheme based on quadratic residues.
In Proceedings of the 8th IMA International Conference on Cryptography and
Coding, pages 360–363, 2001.
[66] D. Boneh and B. Waters. Conjunctive, subset, and range queries on encrypted
data. In TCC 2007: Proceedings of the 4th conference on Theory of cryptography,
pages 535–554, 2007.
[67] J. Katz, A. Sahai, and B. Waters. Predicate encryption supporting disjunctions,
polynomial equations, and inner products. In EUROCRYPT 2008: Proceedings
of the theory and applications of cryptographic techniques 27th annual international conference on Advances in cryptology, pages 146–162, 2008.
[68] A. Shamir. Identity-based cryptosystems and signature schemes. In Proceedings
of CRYPTO 84 on Advances in cryptology, pages 47–53, 1985.
[69] M. Abdalla, M. Bellare, D. Catalano, E. Kiltz, T. Kohno, T. Lange, J. Malone-Lee, G. Neven, P. Paillier, and H. Shi. Searchable encryption revisited: Consistency properties, relation to anonymous IBE, and extensions. Journal of Cryptology, 21(3):350–391, March 2008.
[70] C. Gu, Y. Zhu, and H. Pan. Information security and cryptology. In Dingyi
Pei, Moti Yung, Dongdai Lin, and Chuankun Wu, editors, Inscrypt, chapter
Efficient Public Key Encryption with Keyword Search Schemes from Pairings,
pages 372–383. 2008.
[71] E. Bertino and E. Ferrari. Secure and selective dissemination of XML documents.
ACM Transaction Information System Security, 5(3):290–331, 2002.
[72] G. Miklau and D. Suciu. Controlling access to published data using cryptography.
In VLDB ’2003: Proceedings of the 29th international conference on Very large
data bases, pages 898–909. VLDB Endowment, 2003.
[73] A. Kundu and E. Bertino. Structural signatures for tree data structures. Proceeding of VLDB Endowment, 1(1):138–150, 2008.
[74] S. Coull, M. Green, and S. Hohenberger. Controlling access to an oblivious
database using stateful anonymous credentials. In Irvine: Proceedings of the 12th
International Conference on Practice and Theory in Public Key Cryptography,
pages 501–520, 2009.
[75] J. Camenisch, M. Dubovitskaya, and G. Neven. Oblivious transfer with access
control. In CCS 2009: Proceedings of the 16th ACM conference on Computer
and communications security, pages 131–140, 2009.
[76] S. Yu, C. Wang, K. Ren, and W. Lou. Attribute based data sharing with attribute
revocation. In ASIACCS 2010: Proceedings of the 5th ACM Symposium on
Information, Computer and Communications Security, pages 261–270, 2010.
[77] J.C. Benaloh and J. Leichter. Generalized secret sharing and monotone functions. In CRYPTO 1988: Proceedings of the 8th Annual International Cryptology
Conference on Advances in Cryptology, pages 27–35, 1990.
[78] T. Pedersen. Non-interactive and information-theoretic secure verifiable secret
sharing. In CRYPTO 1991: Proceeding of 1991 CRYPTO Conference on Advances in Cryptology, volume 576 of Lecture Notes in Computer Science, pages
129–140, 1992.
[79] D. X. Song, D. Wagner, and A. Perrig. Practical techniques for searches on
encrypted data. In SP 2000: Proceedings of the 2000 IEEE Symposium on
Security and Privacy, pages 44–55, 2000.
[80] D. Boneh, G. Crescenzo, R. Ostrovsky, and G. Persiano. Public-key encryption
with keyword search. In EUROCRYPT 2004: Proceedings of the 2004 EUROCRYPT on Advances in Cryptology, 2004.
[81] D. Boneh and B. Waters. Conjunctive, subset, and range queries on encrypted
data. Theory of Cryptography, pages 535–554, May 2007.
[82] M. J. Freedman, K. Nissim, and B. Pinkas. Efficient private matching and
set intersection. In EUROCRYPT 2004: Proceeding of the 2004 EUROCRYPT
Conference on Advances in Cryptology, 2004.
[83] I. Damgård, M. Geisler, and M. Kroigard. Homomorphic encryption and secure
comparison. International Journal on Applied Cryptology, 1(1):22–31, 2008.
[84] L. Buttyán and J. Hubaux. Accountable anonymous access to services in mobile
communication systems. In SRDS 1999: Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems, pages 384–394, 1999.
[85] M. Backes, J. Camenisch, and D. Sommer. Anonymous yet accountable access
control. In WPES 2005: Proceedings of the 4th ACM Workshop on Privacy in
the Electronic Society, 2005.
[86] R. Sandhu, E. Coyne, H. Feinstein, and C. Youman. Role-based access control
models. IEEE Computer, 29(2):38–47, 1996.
VITA
Contact Information
Department of Computer Science
Purdue University
305 N. University St., W. Lafayette, Indiana 47907
Voice: 765-337-2645 (Mobile)
Email: nabeel(at)cs.purdue.edu
http://www.cs.purdue.edu/ nabeel
Research/Development Interests
Data privacy, context-aware security, distributed systems & security, database
systems & security, and information security and applied cryptography in general
Education
Purdue University, West Lafayette, IN,
Aug. 2008 - Aug. 2012
Ph.D. in Computer Science
Advisor: Elisa Bertino
Dissertation: Privacy Preserving Access Control for Third-Party Data Management
Systems
Purdue University, West Lafayette, IN,
Aug. 2006 - May 2008
M.S. in Computer Science, GPA: 3.8/4.0
Advisor: Elisa Bertino
University of Moratuwa, Moratuwa, Sri Lanka,
Feb. 2000 - Mar. 2004
B.Sc. with Honors in Computer Science & Engineering, GPA: 4.0/4.0, Rank:1/500
Honors and Awards
• Recipient of Purdue Research Foundation grant.
2011 - 2012
• Recipient of Purdue Cyber Center research grant.
2010 - 2011
• Fulbright Fellow at Purdue University.
2006 - 2008
• PHP PECL Axis2/C Committer (the project was later moved to wso2.org).
2006
• Apache Committer for the Axis2/C project.
2006
• UNESCO Team Gold Medal Award for the Highest Class Average in B.Sc.
Engineering.
2004
• TP De S Munasinghe Award for the Highest Class Average in B.Sc. Computer Sci. & Eng.
2004
• Silver Medal, National Best Quality Software Competition, Sri Lanka.
2005
• Gold Medal Award for the All-Island Highest Aggregate (rank = 1) in Mathematics Stream, G.C.E A/L, Sri Lanka.
1998
Publications
Conference Publications
1. Mohamed Nabeel, Elisa Bertino, Privacy Preserving Delegated Access Control in
the Data-as-a-Service Model. In IEEE International Conference on Information
Reuse and Integration (IRI), 2012.
2. Mohamed Nabeel, Ning Shang, Elisa Bertino, Efficient Privacy-Preserving Publish Subscribe Systems. In ACM Symposium on Access Control Models and
Technologies (SACMAT), 2012.
3. Mohamed Nabeel, David Stork, Oblivious Tree-based Classification in the Cloud.
Under Review.
4. Mohamed Nabeel, Elisa Bertino, Murat Kantarcioglu, Bhavani Thuraisingham,
Towards Privacy Preserving Access Control in the Cloud. In IEEE International
Conference on Collaborative Computing (CollaborateCom), 2011.
5. Mohamed Nabeel, Elisa Bertino, Towards Attribute Based Group Key Management. In ACM Conference on Computer and Communications Security (CCS),
2011 (Poster paper).
6. Mohamed Nabeel, Ning Shang, John Zage, Elisa Bertino, Mask: A System for
Privacy-Preserving Policy-Based Access to Published Content. In ACM International Conference on Management of Data (SIGMOD), 2010 (Demo paper).
7. Ning Shang, Mohamed Nabeel, Federica Paci, Elisa Bertino, A Privacy-Preserving
Approach to Policy-Based Content Dissemination. International Conference on
Data Engineering (ICDE), 2010.
8. Mohamed Nabeel, Elisa Bertino, Secure Delta-Publishing of XML Content. In
International Conference on Data Engineering (ICDE), 2008 (Poster paper).
Journal Publications
1. Mohamed Nabeel, Elisa Bertino, Privacy Preserving Delegated Access Control
in the Cloud. Under review in IEEE Transactions on Knowledge and Data
Engineering (TKDE).
2. Mohamed Nabeel, Elisa Bertino, Attribute Based Group Key Management.
Under review in IEEE Transactions on Dependable and Secure Computing
(TDSC).
3. Mohamed Nabeel, Elisa Bertino, Privacy Preserving Policy Based Content Sharing in Public Clouds. Under review in IEEE Transactions on Knowledge and
Data Engineering (TKDE).
Projects
Secure Advanced Metering Infrastructure Project
Current
An industry-collaborative project to secure the communication links in Advanced Metering Infrastructure (AMI).
CloudMask Project
Current
A research project to build a privacy preserving cloud based storage/data service that
protects the privacy of the users who access the service as well as the data stored in
the cloud.
Ionomics Atlas
2011
Ionomics Atlas is a research project that provides a Google Maps based interface to find
relationships among ionomic, genetic, and environmental information for Arabidopsis
thaliana plant populations. It is available to the public at
http://ibnkhaldun.cs.purdue.edu:8348/ionomicsatlas/.
Mask Project
2010
A research project to build the first system addressing the seemingly-unsolvable problem of how to selectively share contents among a group of users based on access control
policies expressed as conditions against the identity attributes of these users while
at the same time assuring the privacy of these identity attributes from the content
publisher. (C/C++/Java/Abstract Algebra)
Cancer Care Engineering Project
2010
A research project to model cancer-care systems, build educational tools and an interactive community. Was involved in the project as a research assistant,
building certain components.
Smart Pump Informatics
2009
A research project to mine patterns in sensitive information collected from infusion
devices (smart pumps) installed in different hospitals. Involved in the project as a
summer intern in 2009, building certain components.
An Efficient Group Key Management (GKM) Scheme
2009
Designed and implemented a new GKM scheme which is efficient and secure under
frequent join and leave operations. C/C++, NTL library for implementing a novel
GKM scheme, OpenSSL for cryptographic functions. Developed as part of a research
paper for ICDE 2010.
A Scalable Routing Protocol to Distribute Hierarchically Organized Data
2008 - 2009
Designed and implemented a complete system which introduces the novel concept of
hierarchically organized routing tables. Java, XML and related technologies, overlay
networks. Developed as part of a research paper for DocEng 2009.
Secure Delta-Publishing of XML Documents
2008
Designed and implemented a complete Publish-Subscribe system to incrementally disseminate XML documents while preserving confidentiality and integrity. Java, XML
and related technologies including XML encryption and digital signatures, overlay
networks. Developed as part of the ICDE 2008 conference paper.
Apache Axis2/C
2006 - 2007
A high-performance open source Web Services middleware in C. Was part of the team
in 2006 (Earned the Apache committership for my work). C, Web Services standards,
middleware.
WSO2 WSF/PHP
2006
A high-performance open source Web Services middleware for PHP built on Apache
Axis2/C in C. Initiated the project in early 2006 and was part of the team in 2006.
C, PHP, Web Services standards, middleware, extension development for PHP.
Electronic Trading System
2004 - 2005
Responsible for the design and development of several components of a trading system
which is deployed in multiple high-profile stock exchanges. C/C++, various data
specifications used to disseminate trades and quotes, trading business logic.
PHPlus Web Application Development Framework
2004
A PHP based web application development framework that allows designing and
developing application logic in parallel. Developed as an undergraduate research project
as part of a team of 4 in 2004. C/C++, PHP, XML, HTML, framework
development.
Work Experience
Purdue University, West Lafayette, IN, USA.
Research Assistant
Aug. 2011 - Present
Have been involved in projects on privacy preserving group key management, secure
and privacy preserving cloud storage services, and privacy preserving publish subscribe systems.
Rosen Center for Advanced Computing, West Lafayette, IN, USA.
Research & Development Intern.
May 2011 - Aug. 2011
Involved in devising policies to make a healthcare project HIPAA compliant and implementing the policies for the project. Also involved in research projects to analyze
the efficiency of electric vehicles in Indiana and to analyze recent earthquakes in Chile.
Cyber Center, West Lafayette, IN, USA.
Graduate Research Assistant
Aug. 2010 - May 2011
Designed and developed a web-based system to find correlations among ionomic, genetic and environmental information of plant populations. The system is available at
http://ibnkhaldun.cs.purdue.edu:8348/ionomicsatlas/.
Ricoh Innovations Inc., Menlo Park, CA, USA.
Research & Development Intern.
May 2010 - Aug. 2010
Designed and developed techniques and complete systems to obliviously perform classification of data on an untrusted remote third-party server.
Rosen Center for Advanced Computing, West Lafayette, IN, USA.
Graduate Research Assistant
Aug. 2009 - May 2010
Involved in a health-care research project called ccehub.org, the goal of which is to
model cancer-care systems, build educational tools and an interactive community.
Rosen Center for Advanced Computing, West Lafayette, IN, USA.
Research & Development Intern.
May 2009 - Aug. 2009
Involved in a health-care research project called Smart Pump Informatics to mine
patterns in sensitive information collected from infusion devices (smart pumps) installed in different hospitals in Indiana.
Purdue University, West Lafayette, IN, USA.
Teaching Assistant
Aug. 2008 - May 2009
Conducted labs and designed and graded assignments for the courses CS 426
(Computer Security), CS 251 (Data Structures & Algorithms), and CS 541 (Database
Management Systems) under different instructors.
Purdue University, West Lafayette, IN, USA.
Research Assistant
May 2008 - Aug. 2008
Conducted research to find an efficient and scalable approach to selectively disseminate portions of XML documents to different users conforming to access control policies. Developed a prototype to demonstrate the approach.
WSO2 Inc., Colombo, Sri Lanka.
Senior Software Engineer
Jan. 2006 - Jul. 2006
Actively participated in the development of the popular open source Apache Axis2/C Web
Services engine. Earned Apache committership for my work. Initiated the
PHP Web Services project (WSF/PHP).
Millennium Information Technologies Inc., Malabe, Sri Lanka.
Software Engineer
Mar 2004 - Dec. 2005
Actively participated in the design and development of back-end software for international capital markets. Was mainly responsible for designing and developing external
feed gateways that must handle high volumes of data and very high data rates.
Guided several employees in this area.
Colombo University, Colombo, Sri Lanka.
Part-time Instructor
Mar 2004 - Dec. 2005
Taught the undergraduate-level courses Network & System Administration and Object
Oriented Programming to several groups of students in an external bachelor’s program
offered by Colombo University.
CodegenIT Inc., Colombo, Sri Lanka.
Software Engineering Intern
Jan. 2003 - Jun. 2003
Actively participated in the design and development of back-end software for the travel
and hospitality industry.
Organizations and Clubs
• Fulbright Association, Purdue University.
Aug. 2006 - Present
Secretary/Web Master
Aug. 2007 - May 2008
Treasurer/Web Master
Aug. 2006 - May 2007
• Graduate Student Board (GSB), Purdue University.
First year representative
Aug. 2006 - May 2007
• Web Team, University of Moratuwa, Sri Lanka.
Web Developer
Jan. 2002 - Dec. 2002
• Computer Society, University of Moratuwa, Sri Lanka.
Committee member
Jan. 2003 - Dec. 2003
Professional Activities
• ACM, student member
• IEEE, student member
• CODASPY poster track committee member
• Conference and Journal reviewer
– ACM Symposium on Access Control Models and Technologies (SACMAT)
– International Conference on Distributed Computing Systems (ICDCS)
– Very Large Data Bases (VLDB)
– International Conference on Data Engineering (ICDE)
– Annual International Conference on Financial Cryptography and Data Security (FC)
– ACM Symposium on Information, Computer and Communications Security
(ASIACCS)
– Annual Computer Security Applications Conference (ACSAC)
– Extending Database Technology (EDBT)
– ACM Conference on Data and Application Security and Privacy (CODASPY)
– IEEE International Symposium on Policies for Distributed Systems and Networks (POLICY)
– IEEE Transactions on Knowledge and Data Engineering (TKDE)
– IEEE Transactions on Dependable and Secure Computing (TDSC)
– IEEE Transactions on Information Forensics and Security
– International Journal of Information Security