WO2018080857A1 - Systems and methods for creating, storing, and analyzing secure data

Info

Publication number: WO2018080857A1
Application number: PCT/US2017/057075
Authority: WO
Inventors: Joseph Yannaccone
Original assignee: Panoptex Technologies, Inc.
Priority date: 2016-10-28
Filing date: 2017-10-18
Publication date: 2018-05-03

Abstract

This invention relates to systems and methods for creating, storing, and analyzing secure data.

Description

SYSTEMS AND METHODS FOR CREATING, STORING, AND ANALYZING SECURE DATA

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority from U.S. Provisional Patent Application Ser. No. 62/414,081, filed October 28, 2016, which is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

[0002] This invention relates to systems and methods for creating, storing, and analyzing secure data stores. In one form of a method embodiment, a user uploads a set of client data and a set of correlations between data fields from a client subsystem to a gateway subsystem. The gateway subsystem applies a schema to generate a data structure from the client data that is then transformed. The transformed data structure is then transmitted and stored in a secure database on a server subsystem that is administratively separate from the gateway subsystem. Coded keys and schema solutions are only stored on the gateway subsystem, and are not accessible by the client subsystem or the server subsystem.

BACKGROUND OF THE INVENTION

[0003] Data is being collected and stored at unprecedented volumes. This trend is forecast to continue for the foreseeable future. Further, this data and the information within can be the lifeblood of the organizations that generate and use it. Thus, data analytics has become an important component of both everyday operations and the research and development work that enables organizations to evolve as demands change.

[0004] Complicating the issue of data analytics is the fact that the data to be analyzed may contain personal, financial, or other types of information that raises confidentiality concerns. Further, government regulations regarding data privacy are becoming increasingly strict and security breaches of data subject to such regulations can be extremely costly, and continue to grow in frequency and severity from an expanding variety of attack vectors. Added to these issues are insider threats, which render existing data security practices vulnerable. Insider threats have been exposed as a critical weakness with limited well-tested methods for proactive defense.

[0005] To balance the need to utilize data analytics against the need to protect the confidentiality and privacy of the data to be analyzed, improved systems and methods of performing analytics on big data sets are needed. Organizations are seeking methods that enable them to perform meaningful and timely analytics on massive-scale data sets without compromising security or violating privacy restrictions. This creates a conundrum as data hiding and meaningful analysis have generally been mutually exclusive. The problem this presents is to provide systems and methods for performing analytics on data that has been secured prior to the commencement of the analytics process so that a breach of the secure data will not compromise the confidentiality and privacy of the underlying data.

[0006] Several methods for securing data are known. For example, hashing and encryption are two methods for securing information when it is being transmitted on the Internet or stored at rest. They can both help satisfy regulatory requirements such as those under PCI DSS, HIPAA-HITECH, GLBA, ITAR, and the EU GDPR. While hashing and encryption are both effective data obfuscation technologies, they are not the same thing, and each technology has its own strengths and weaknesses. In some cases, such as with electronic payment data, both encryption and hashing are used to secure the end-to-end process.

Tokenization/Data Hashing

[0007] Traditionally, hashing, when applied to data security, is the process of substituting a sensitive data element with a non-sensitive equivalent from which it is mathematically infeasible to reverse engineer the equivalent to determine the semantic content of the original data element. In traditional hashing, a very minor change in a data element is likely to result in a dramatic change to the hashed value. As a result, where secure hashing is used, two values with the same semantic meaning, but slight variations in form, will result in very different hashed values. Tokenization is the process of substituting a sensitive data element with a non-sensitive equivalent, referred to as a token, that has no extrinsic or exploitable meaning or value. Where such techniques are used, creation and maintenance of a token lookup table (a token store or token vault) has historically been required to store mappings between tokens and actual values for de-tokenization to be possible.
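The sensitivity of traditional hashing to small changes in form can be illustrated with a minimal sketch. SHA-256 from Python's standard hashlib module is assumed here purely for illustration; the helper names are hypothetical and no particular algorithm is prescribed by the text above.

```python
import hashlib

def sha256_hex(value: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as a hex string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def differing_bits(a_hex: str, b_hex: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")

# Two values with the same semantic meaning but a slight variation in form.
h1 = sha256_hex("Main Street")
h2 = sha256_hex("main street")
changed = differing_bits(h1, h2)

# Normalizing the form before hashing (here, lowercasing) makes the digests agree.
h1_normalized = sha256_hex("Main Street".lower())
```

The two raw digests differ in a large share of their 256 bits, whereas the normalized digest equals h2. This is one reason a normalization or discretization step may be applied before hashing when semantically equal values need to match.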

[0008] U.S. Patent Number 9,336,256 to Boukobza shows a data tokenization system that includes a token vault. Boukobza discloses a database network router ("DNR"), which serves as an intermediary node between an application and a tokenized database. The database communicates directly with a token vault through the use of a DNR software agent running on the database that parses and executes commands received as part of database access requests from the DNR. Through the use of the DNR and the DNR software agent, the application can be decoupled from the database, and the burden of integrating tokenization APIs and performing tokenization or de-tokenization functions can be shifted to the DNR and the DNR software agent. Additionally, by removing the tokenization and de-tokenization functions from the application, multiple tokenization vendors can utilize the DNR and a DNR software agent as the interface between their own databases and applications.

[0009] Tokenization methods like that disclosed in Boukobza have drawbacks. Such methods require a lookup table or token store/vault that contains mappings between tokens and actual values. These token stores or vaults may contain sensitive data in cleartext and represent an alternate breach target. And, for large databases, translation tables can become costly to scale and maintain due to the need for synchronization and high availability. It is also difficult (and slow) to exchange data, since such methods require direct access to a token vault mapping token values.
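The dependence on a token vault can be sketched as follows. This is a generic, hypothetical in-memory vault written for illustration; it is not the Boukobza system, and the class and method names are assumptions.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory token vault mapping tokens to cleartext values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so that equal values map to equal tokens.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = secrets.token_hex(8)  # random; carries no information about value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # De-tokenization is only possible with access to the vault.
        return self._token_to_value[token]

vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")
original = vault.detokenize(token)
```

The vault itself holds the sensitive value in cleartext, which is why it becomes an alternate breach target, and every party that needs to de-tokenize must be able to reach it.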

[0010] As described above, hashing is the process of transforming a string of characters or other value into a (usually) shorter fixed-length value or key that represents the original string. Hashing can be used to determine if there has been a minor change in the original since every such change will result in a different hashed value. Hashing can also be used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value.

[0011] A hash function is any function that can be used to map data of arbitrary size to data of fixed size. The values returned by a hash function are called hash values, hash codes, digests, or simply "hashes." One use of hashed values is a data structure called a hash table, widely used in computer software for rapid data lookup. Hashing, however, is a one-way function that scrambles plain text to produce a unique message digest. With a properly designed algorithm, there is no way to reverse the hashing process to reveal the original data. Popular hashing algorithms include MD5, SHA-0, SHA-1, and SHA-2. Encryption, as discussed below, is a two-way function; what is encrypted can be decrypted with the proper key. Hashing, by contrast, transforms data in a one-way fashion. If the universe of values being hashed is known, an attacker that has access to the same hashing algorithm can build rainbow tables (i.e., pre-calculated tables for a specific hash) of possible known values to search for target data. Hashing, thus, can be insecure where there is a reasonably finite quantity of input values and a known hash function. Hashing can be secure, however, when either the hash function is unknown or the universe of input values is too large for the creation of a useful rainbow table.
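The weakness of a known hash function over a small input universe can be sketched as below. The example uses a plain precomputed lookup table, which is a simplification of a true rainbow table, and it assumes a keyed hash (HMAC-SHA-256 from Python's standard library) as one way of making the function effectively unknown to an attacker. The key value and function names are hypothetical.

```python
import hashlib
import hmac

def plain_hash(value: str) -> str:
    """Unkeyed SHA-256: anyone who knows the algorithm can recompute it."""
    return hashlib.sha256(value.encode()).hexdigest()

def keyed_hash(secret: bytes, value: str) -> str:
    """HMAC-SHA-256: recomputation requires the secret key."""
    return hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()

# A small, known universe of inputs: all four-digit PINs.
universe = [f"{n:04d}" for n in range(10_000)]

# The attacker precomputes hash -> value for the whole universe.
lookup = {plain_hash(v): v for v in universe}

# An unkeyed hash leaked from storage is reversed by a table lookup.
stolen = plain_hash("4821")
recovered = lookup.get(stolen)

# With a secret key the attacker does not hold, the same table finds nothing.
secret = b"hypothetical-gateway-secret"
stolen_keyed = keyed_hash(secret, "4821")
recovered_keyed = lookup.get(stolen_keyed)
```

Here recovered is the original PIN while recovered_keyed is None. The keyed variant stays safe only as long as the key is stored apart from the hashed data.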

Encryption/Homomorphic Encryption

[0012] Encryption is the process of using an algorithm to transform plain text information into a non-readable form called ciphertext. An algorithm and an encryption key are required to decrypt the information and return it to its original plain text format. Today, SSL encryption is commonly used to protect information as it is transmitted on the Internet. Using built-in encryption capabilities of operating systems or third-party encryption tools, millions of people encrypt data on their computers to protect against the accidental loss of sensitive data in the event their computer is stolen. Encryption can be used to thwart government surveillance and theft of sensitive corporate data when a secure algorithm and reasonable encryption keys are used. Most encryption techniques involve encrypting data at rest but not in use (i.e., data is encrypted while stored on a disk, but decrypted before or during processing). Popular encryption algorithms include DES, RSA, AES, Blowfish, and Twofish, as well as others known to those of skill in the art. Encryption can be performed with software such as PGP, or with custom software or functions built into operating systems or encryption APIs.

[0013] Traditional encryption methods have drawbacks. For example, these techniques often co-locate the data and the keys in the same security domain, which allows a single breach to access both keys and data. Security methods and systems that do not co-locate data and keys in the same security domain are therefore desired.

[0014] For example, U.S. Patent Number 9,087,212 to Balakrishnan et al. discusses a sequence of steps for implementing database confidentiality. A database proxy intercepts SQL queries and rewrites the queries to execute on encrypted data. When faced with a first threat, the system guards against an external attacker with full access to the data stored in a database management system ("DBMS") server. The attacker is assumed to be passive, i.e., wants to learn confidential data, but does not change queries issued by the application, query results, or the data in the DBMS. That threat includes DBMS software compromises, root access to DBMS machines, and even access to the RAM of physical machines. The proxy stores a secret master key MK, a database schema, and the current encryption layers of all columns. The system executes SQL queries over encrypted data. The DBMS machines and administrators are not trusted, but the application and the proxy are trusted. The system enables the DBMS server to execute SQL queries on encrypted data almost as if it were executing the same queries on plaintext data so that existing DBMSes do not need to be changed. The DBMS query plan for an encrypted query is typically the same as for the original query, except that the operators comprising the query, such as selections, projections, joins, aggregates, and orderings, are performed on ciphertexts, and use modified operators in some cases. The DBMS server returns the (encrypted) query result, which the proxy decrypts and returns to the application.

[0015] Homomorphic encryption is a form of encryption that allows computations to be carried out on ciphertext, thus generating an encrypted result which, when decrypted, matches the result of operations performed on the plaintext. These techniques imply the requirement of one or more (partially) homomorphic encryption schemes to manipulate column (field) values in a meaningful way while encrypted.
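The homomorphic property can be illustrated with a toy example. The sketch below assumes unpadded ("textbook") RSA with very small parameters, which is multiplicatively homomorphic; it is insecure and is shown only to make the property concrete. It is not the scheme used in the references discussed here.

```python
# Toy textbook-RSA parameters (p = 61, q = 53); far too small for real use.
n, e, d = 3233, 17, 2753

def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < n with the public exponent."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private exponent."""
    return pow(c, d, n)

a, b = 7, 12

# Multiply the ciphertexts only; the plaintexts are never combined directly.
c_product = (encrypt(a) * encrypt(b)) % n

# Decrypting the combined ciphertext yields the product of the plaintexts.
result = decrypt(c_product)
```

The decrypted result equals a * b (modulo n), which matches the result of performing the operation on the plaintext.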

[0016] For example, the presentation "Private Database Queries Using Somewhat Homomorphic Encryption" by Boneh, et al. discloses a private database query system. A database server is split into two entities (called "server" and "proxy"), and privacy holds only so long as these two entities do not collude. Boneh encodes the database as one or more polynomials, manipulating these polynomials using a client's query so as to obtain a new polynomial whose roots are the indexes of the matching records. For every attribute-value pair (a, v) in the database, the inverted index contains a record (tg, Enc(A(x))) where tg is a tag, computed as tg=Hash("a=v"), and A(x) is a polynomial whose roots are exactly the record indexes r that contain this attribute-value pair. In the basic three-party protocol, given a query SELECT * FROM db WHERE a1=v1 AND ... AND at=vt, the client (with oblivious help from the server) computes the tags tgi=Hash("ai=vi") and sends them to the proxy.
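The inverted-index idea can be sketched in the clear, leaving out the encryption Enc(.) and the three-party protocol entirely. The sketch assumes SHA-256 for the tag hash and represents each polynomial A(x) by its list of roots; all names and the sample database are hypothetical. A conjunctive query is answered by finding the record indexes at which every selected polynomial evaluates to zero.

```python
import hashlib

def tag(attr: str, value: str) -> str:
    """Tag for an attribute-value pair, computed as Hash("a=v")."""
    return hashlib.sha256(f"{attr}={value}".encode()).hexdigest()

def evaluate(roots, x: int) -> int:
    """Evaluate A(x) = product of (x - r) over the polynomial's roots."""
    result = 1
    for r in roots:
        result *= (x - r)
    return result

def build_index(db: dict) -> dict:
    """Map each tag to the roots (record indexes) of its polynomial."""
    index = {}
    for rid, record in db.items():
        for attr, value in record.items():
            index.setdefault(tag(attr, value), []).append(rid)
    return index

def query(index: dict, conditions: dict, record_ids) -> list:
    """Return the indexes that are roots of every polynomial the query selects."""
    polys = [index.get(tag(a, v), []) for a, v in conditions.items()]
    return [r for r in record_ids if all(evaluate(p, r) == 0 for p in polys)]

database = {
    1: {"city": "Oslo", "dept": "ops"},
    2: {"city": "Oslo", "dept": "sales"},
    3: {"city": "Lima", "dept": "ops"},
}
index = build_index(database)
matches = query(index, {"city": "Oslo", "dept": "ops"}, database.keys())
```

In this cleartext form the index reveals which records share a tag; the cited scheme stores the polynomials encrypted so that the parties cannot read them directly.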

[0017] Homomorphic encryption techniques like that described in Boneh also have several drawbacks. Computations are highly compute-intensive and have been demonstrated to perform poorly even at small scale. Further, values in these techniques are mutable and may be changed or have encrypted values computed as the result of homomorphic computations.

[0018] A security model that does not suffer from these and the foregoing drawbacks is therefore desired, particularly for use in data analytics applications. More particularly, a model that relies on proven, strong encryption algorithms, includes simple logic operations that have been demonstrated to work efficiently at massive scale so that they can be easily distributed for parallel computing across many systems, and does not operate on the original data, but rather, a network of encrypted values that represent the original data, is desired.

SUMMARY OF THE INVENTION

[0019] Embodiments of solutions to the problems described above include improved systems and methods of performing data analytics.

[0020] In one form, a system according to the present invention comprises a database designed to support data encryption in transit, at rest, and in use. The system applies a transformation process to represent user data as a network of ciphered facts. This network of facts can be queried and analyzed using instructions that have been ciphered using the same process that ciphered the data itself. The transformed data is a complete representation of the original pre-transformed data and can be transformed back into its original cleartext format without the need to maintain cumbersome translation tables such as those required by data masking and tokenization mechanisms.

[0021] In another form, a system according to the present invention is divided into three administratively distinct subsystems that ensure no single domain breach can result in the loss of useable data. The subsystems are referred to as "client," "gateway," and "server."

[0022] The client subsystem, for example, creates and submits queries and analytical routines in user-readable format. The gateway subsystem, for example, stores schema and secret information, performs data transformation processes, and enforces access control policies. The server subsystem, for example, writes the transformed encrypted data to storage, and executes distributed read queries and analysis operations on encrypted data. In one embodiment, queries and analysis routines originate in the client subsystem, are transformed in the gateway subsystem, and are then executed by the server subsystem. The server subsystem collects the results of read queries and analyses and relays them to the gateway subsystem, where they may be transformed and filtered using data access policies before being delivered in user-readable format to the requesting client subsystem.

[0023] In yet another form, a system according to the present invention comprises client, gateway, and server subsystems. The client subsystem is the connection point between users or external applications and the system. The gateway subsystem stores and secures a database schema and secret keys, performs data transformation and encryption, and enforces data access policies. The server subsystem provides bulk data storage and distributed-computing resources to perform advanced queries and analysis upon stored, secure data.

[0024] The gateway subsystem serves as the interconnection service between the client subsystem and server subsystem. The users and applications accessing the client subsystem create queries and analysis routines and forward them to the gateway subsystem. The gateway subsystem transforms queries and analysis routines received from the client subsystem into a ciphered representation and relays the representation to the server subsystem. The server subsystem executes the queries and analysis operations and returns one or more result sets to the gateway subsystem. Upon receipt of each result set, the gateway subsystem may transform the ciphered data back into its original form, apply any fine-grained data access policies that are configured, and relay the policy-filtered results to the client subsystem and on to the requesting users and applications.

[0025] The three-subsystem architecture makes it possible to distribute the components necessary for a threat actor to obtain sensitive data into separate administrative domains. The gateway subsystem does not share secret keys or schema information with the server subsystem. While such an embodiment can work with one client, multiple clients each having access to a subset of the source data can also be used. In such embodiments, each client subsystem has access to a subset of the schema, but never has access to the secret keys. All operations within the server subsystem are performed using a ciphered network of facts, ciphered query and analysis parameters, and pre-defined collections of logic instructions. As a result, users and systems within the server subsystem never have access to any cleartext data, secret keys or schema information; users and systems within the client subsystem never have access to bulk data or secret keys; and users and systems within the gateway subsystem do not have the ability to create queries or directly access bulk data. Collectively, these benefits ensure that even a complete breach of any single subsystem cannot yield any useable data.

[0026] In addition to the security benefits provided by the multiple-subsystem architecture, systems according to the present invention may provide audit reporting in both the gateway and server subsystems. For example, the gateway subsystem audit may report the original query and a non-repudiated query signature provided by the originating client. The query signature is included with the transformed query when it is relayed to the server subsystem. The server subsystem report includes the query signature and the identity of the gateway subsystem that relayed it. Reports from the two subsystems can be correlated to identify and alert security staff of discrepancies or unusual activity.
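The correlation of the two audit reports can be sketched as a comparison of query signatures. The log layout, field names, and signature strings below are hypothetical; how the client produces a non-repudiable signature is outside the scope of this sketch, so signatures are treated as opaque strings.

```python
def correlate(gateway_log: list, server_log: list) -> dict:
    """Compare query signatures seen by the gateway and by the server."""
    gateway_sigs = {entry["signature"] for entry in gateway_log}
    server_sigs = {entry["signature"] for entry in server_log}
    return {
        # Executed on the server with no matching gateway record: suspicious.
        "executed_without_gateway_record": sorted(server_sigs - gateway_sigs),
        # Recorded by the gateway but never seen by the server.
        "relayed_but_never_executed": sorted(gateway_sigs - server_sigs),
    }

# Gateway report: original query plus the client-provided signature.
gateway_log = [
    {"signature": "sig-001", "query": "state = NY"},
    {"signature": "sig-002", "query": "plan = gold"},
]

# Server report: signature plus the identity of the relaying gateway.
server_log = [
    {"signature": "sig-001", "gateway_id": "gw-east"},
    {"signature": "sig-999", "gateway_id": "gw-east"},
]

discrepancies = correlate(gateway_log, server_log)
```

Any non-empty list in the result is a candidate for an alert to security staff.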

[0027] In another form, the system may include a gateway subsystem that possesses one or more schemas (correlations between data fields within a set of data) in a schema store and secret key information. For example, the gateway subsystem comprises a schema that provides a mapping of cleartext data field descriptors together with a set of encryption keys. The gateway subsystem further includes a transformation function, which uses a schema to create a schema-based data structure and then one-way encrypts the structure without use of a token table to create secure data. The secure data has semantic meaning only within the confines of the gateway subsystem, where the schema and keys are present, digitally stored, and/or accessible. The gateway subsystem does not need to otherwise store the cleartext data after it has been transformed.

[0028] The gateway subsystem communicates the secure data to a server subsystem that is administratively separate from the gateway subsystem. The server subsystem does not have access to the schema or secret keys. The server subsystem may be used to perform analytics in a secure manner by analyzing the secure data for patterns, which can correspond to patterns in the cleartext data. In this manner, the analytics are performed on a dataset that comprises secure data (in particular, transformed values) and need not comprise any cleartext data. Were a breach of the server subsystem and secure data to occur, the secure data would likely be useless to the attacker.

[0029] In another form, the system of the present invention allows a user to upload client data to a gateway subsystem, and either be provided with, or select, a schema. The gateway subsystem has a transformation function that takes the client data, along with the data fields, and identification values assigned to designated correlations, and generates a schema-based data structure. The transformation function then one-way encrypts the schema-based unique data structure (i.e., makes it mathematically infeasible to recover the cleartext data from the encrypted values without semantic meaning) to create secure data. The encryption may in some instances be the result of a secure hashing algorithm applied to the schema-based data structure. The values that are hashed may not be the entire original data object, but instead, some selected subset of the original data object (i.e., some combination of fields). The exact fields that are extracted and discretized prior to hashing would be defined as part of the schema. The secure data is then uploaded to a separate but secure server subsystem for storage in a secured database (e.g., a cloud storage service) and/or analysis. In such embodiments the input values may be single values, strings, or n-tuples of values, preferably discretized as part of the transformation process.
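A minimal sketch of such a transformation function follows. It assumes HMAC-SHA-256 as the secure hashing step, a schema expressed as a Python dictionary, and fixed-width bucketing as the discretization; the key, schema layout, and function names are all hypothetical and are not taken from the disclosure.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-gateway-key"  # assumption: held only by the gateway

# Hypothetical schema: which field combinations (n-tuples) are hashed, and
# the bucket width used to discretize numeric fields before hashing.
SCHEMA = {
    "facts": [("state",), ("age",), ("state", "age")],
    "bins": {"age": 10},
}

def discretize(field: str, value) -> str:
    """Normalize text fields; map numeric fields to a fixed-width bucket."""
    width = SCHEMA["bins"].get(field)
    if width is None:
        return str(value).strip().lower()
    low = (int(value) // width) * width
    return f"{low}-{low + width - 1}"

def transform(record: dict) -> set:
    """One-way transform a record into a set of keyed hash values."""
    facts = set()
    for fields in SCHEMA["facts"]:
        text = "&".join(f"{f}={discretize(f, record[f])}" for f in fields)
        digest = hmac.new(SECRET_KEY, text.encode(), hashlib.sha256).hexdigest()
        facts.add(digest)
    return facts

a = transform({"state": "NY", "age": 34})
b = transform({"state": "ny ", "age": 38})
c = transform({"state": "NY", "age": 41})
```

Records a and b produce identical hashed values because their fields discretize to the same state and age bucket, while a and c share only the state fact. Matching patterns in the secure data can therefore correspond to patterns in the cleartext without the cleartext being stored alongside them.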

[0030] Analysis may be conducted on the secure data (e.g., the hashed values) as they exist in the database of the server subsystem. This is accomplished by virtue of a user having an understanding of what correlations and data fields were defined in the schema, and structuring a query based on that understanding. The gateway subsystem, which has the schema, can further include a query engine that receives a query from a user, and transforms the query into a query of the secure data. This may be referred to as the "transformation" of the user's query (as opposed to transformation of the input data), and results in a query format consistent with the format of the secure data as it is stored in the secure database of the server subsystem. The original raw or cleartext input client data is not available for that query, because the schema and keys are only available on the gateway subsystem where the data and queries were transformed. The results of the analysis performed on the secure data are then returned to the gateway subsystem, which can then either "reverse transform" the results, to provide raw data correlation to the querying user, or pass the results to the querying user without decoding them. Correlation is possible because each normalized set of transformed values has a unique identifier that can be used by the gateway subsystem to retrieve the cleartext data that resulted in the transformed data in question. Because the database of the server subsystem contains only secure data, the database may be stored in a less secure cloud-hosting or other location if desired, while the gateway subsystem and client subsystem are in different security domains.
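The query path can be sketched as follows, under the assumption that data and queries are ciphered with the same keyed hash (HMAC-SHA-256 here) and that the server matches opaque values by set containment. The key, record identifiers, and function names are hypothetical, and the reverse-transform step is omitted because its mechanism is not specified in this passage.

```python
import hashlib
import hmac

GATEWAY_KEY = b"hypothetical-gateway-key"  # assumption: never leaves the gateway

def cipher_fact(field: str, value: str) -> str:
    """Gateway side: keyed hash of a normalized field=value fact."""
    text = f"{field}={value.strip().lower()}"
    return hmac.new(GATEWAY_KEY, text.encode(), hashlib.sha256).hexdigest()

def gateway_transform_record(record: dict) -> set:
    """Gateway side: transform input data into ciphered facts."""
    return {cipher_fact(f, str(v)) for f, v in record.items()}

def gateway_transform_query(conditions: dict) -> set:
    """Gateway side: transform a user query with the same process as the data."""
    return {cipher_fact(f, str(v)) for f, v in conditions.items()}

def server_execute(store: dict, ciphered_query: set) -> list:
    """Server side: match opaque values only; no key or schema is needed."""
    return sorted(rid for rid, facts in store.items() if ciphered_query <= facts)

records = {
    "r1": {"state": "NY", "plan": "gold"},
    "r2": {"state": "NY", "plan": "basic"},
    "r3": {"state": "CA", "plan": "gold"},
}

# The server stores only identifiers and ciphered facts.
server_store = {rid: gateway_transform_record(r) for rid, r in records.items()}

ciphered = gateway_transform_query({"state": "ny", "plan": "gold"})
matches = server_execute(server_store, ciphered)
```

The server returns the unique identifiers of the matching sets of transformed values. The gateway can then either resolve them for the querying user or pass them on undecoded.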

BRIEF DESCRIPTION OF THE DRAWINGS

[0031] Other features in the invention will become apparent from the attached drawings, which illustrate certain embodiments of the apparatus of this invention, wherein:

[0032] Fig. 1 is a system diagram of an exemplary embodiment according to an aspect of the invention.

[0033] Fig. 2 illustrates a process flow for transforming and storing data in a database according to one embodiment of the invention.

[0034] Fig. 3 illustrates a process flow for querying data stored in a database according to one embodiment of the invention.

[0035] Fig. 4 is a system diagram of an exemplary embodiment for querying already-stored data according to an aspect of the invention.

[0036] Fig. 5 is a system diagram of another exemplary embodiment according to an aspect of the invention.

DETAILED DESCRIPTION

[0037] In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be evident, however, to one skilled in the art, that the present disclosure as expressed in the claims may include some or all of the features in these examples alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.

[0038] Referring now to Fig. 1, an exemplary embodiment of a system according to an aspect of the invention is illustrated by a system having a client subsystem 100, a gateway subsystem 101, and a server subsystem 102.

[0039] Client subsystem 100 is a subsystem that may be used directly by clients for querying and analytic operations. Through the client subsystem 100, users are capable of sending queries to the gateway subsystem 101 in order to read or write data. The client subsystem 100 may be accessed, for example, by a user on the Internet using a client application. Alternatively, the client subsystem 100 may be accessed by a user using a client console deployed at the user's premises.

[0040] The client subsystem 100 comprises a client agent that is used to establish a connection between the client subsystem and the gateway subsystem for interactions to take place. In some embodiments, once a connection is established, the client agent is able to send query requests for writing and reading to the gateway subsystem. In one embodiment, this may take place after an authentication service has been used to clear that the user trying to establish the session has access to that gateway subsystem. The authentication service may be an external service with connectors built into the system for different functions that require authentication. In some embodiments, each service has an authentication client, which allows these services to request the individual authentication service required by each service, which would be specified in the configuration of that service. In some embodiments, the authentication service is run by a third party to make the system more secure.

[0041] Client subsystem 100 may include one or more client instances 110. Client instance 110 may be adapted for a user to receive and upload data (e.g., client data or input data) to be securely stored and made query-able by the system. Client instance 110 comprises a client data gathering agent 104, a data store 103, a client query engine 105, a user interface 106, a display (not illustrated), a communication mechanism for communicating information (not illustrated), and inputs (e.g., keyboard and mouse) (not illustrated).

[0042] Data store 103 may include a memory coupled to a bus for storing client data. This memory may also be used for storing variables or other intermediate information during execution of instructions to be executed by agents or engines. Possible implementations of this memory may be, but are not limited to, random access memory (RAM), read only memory (ROM), or both. Databases and file systems may also be used to store and retrieve data from data store 103. Common physical forms of data store 103 include, for example, a hard drive, a magnetic disk, an optical disk, a CD-ROM, a DVD, a flash memory, a USB memory card, or any other medium from which a computer can read.

[0043] Client instance 110 is in communication with client service 120 in gateway subsystem 101. Client subsystem 100 and gateway subsystem 101 each include a network interface (not illustrated). The network interfaces may provide two-way data communication between, e.g., client instance 110 and client service 120, as well as client service 120 and server subsystem 102. The network interface sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information across a local network, an Intranet, or the Internet.

[0044] For a local network, client instance 110 may communicate with a plurality of other computer machines. Software components or services may reside on multiple different computer systems or servers across a network. The processes described here may be implemented on one or more servers, for example. A server may transmit actions or messages from one component, through the Internet, local network, and network interfaces to a component on client instance 110. The software components and processes described herein may be implemented on any computer system (including without limitation on custom hardware devices or specially programmed computers) and send and/or receive information across a network, for example. Further, the dashed lines shown on Fig. 1 indicate administrative separation of subsystems, but are not intended to specify architecture or deployment requirements, which may be adjusted depending on the needs and circumstances of the system provider and clients. An alternative embodiment is illustrated in Fig. 5, for example, which shows that the gateway subsystem 101 and the server subsystem 102 may be deployed, for example, through a common cloud provider, so long as they remain administratively separate.

[0045] Turning back to Fig. 1, through the user interface 106, a user may access and/or create a client data set using data gathering agent 104, which may have access to unsecure data having fields corresponding to unsecure data field definitions that are stored in, e.g., data store 103. In one embodiment, data gathering agent 104 is adapted to define a first client data set comprising at least a first data structure. In other embodiments, client instance 110 (through, e.g., data gathering agent 104) may encrypt the client data set, or configure (and optionally normalize) the client data set based on a schema that defines associations between different fields or categories within the data set. In some embodiments, user interface 106 may also provide a user with schema options that allow the user to customize a schema based on a client data set, or a desired query. Client instance 110 then transmits the client data set (in either a clear format, or in a decryptable format) to client service 120 in gateway subsystem 101 using, e.g., a secure tunnel.

[0046] In some embodiments, client instance 110 may have multiple data gathering agents 104. In the same or other embodiments, the client instance 110 that generates the client data set may be distinct or different from, such as, without limitation, at a different physical location, the client instance that communicates with client service 120. For example, the first client instance may be a business system that simply returns raw data, and the second client instance may be a system that receives or gathers such data, creates a client data set, and communicates it to client service 120. In further embodiments, the system may comprise a plurality of client instances 110 in communication with client service 120, as shown in Fig. 1.

[0047] In some embodiments, the gateway subsystem 101 links the client subsystem 100 and the server subsystem 102 as a means of sending information back and forth between the two subsystems. In one exemplary embodiment, the gateway subsystem 101 is deployed on a secure public cloud managed by a cloud provider. The server subsystem 102 may be deployed on the same or a different secure public cloud, and separated from the gateway subsystem 101 by a virtual firewall (e.g., see Fig. 5).

[0048] The gateway subsystem 101 may be responsible for performing a transformation function, detailed below. The gateway subsystem 101 may further comprise a group of services designed to ensure that data is ciphered securely. These services may include one or more of a policy service, a schema service, a key service, and an authentication service.

[0049] The policy service may hold all the policies used by the gateway subsystem 101. These policies are applied to data returned from the server subsystem 102 and used to filter the data, removing any data that is inaccessible or invisible to the user. These policies may be created and maintained by system administrators. Policies can be used to ensure that users cannot see any data that they are not permitted to access, while still allowing them to use this invisible data to query for other necessary information.
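The filtering described in paragraph [0049] may be sketched as follows. This is a minimal illustration only: the policy table `POLICIES`, the function `apply_policy`, and the role names are hypothetical and are not defined by the specification, which does not prescribe how policies are represented.

```python
from typing import Any

# Hypothetical policy table: role -> fields that role is permitted to see.
POLICIES: dict[str, set[str]] = {
    "analyst": {"callStart", "duration"},
    "admin": {"callingParty", "calledParty", "callStart", "duration"},
}

def apply_policy(role: str, records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Strip every field the role is not permitted to see from each record."""
    visible = POLICIES.get(role, set())  # unknown roles see nothing
    return [{k: v for k, v in rec.items() if k in visible} for rec in records]

# Results as they might come back from the server subsystem after decryption.
results = [{"callingParty": "1234", "calledParty": "4321", "duration": 1}]
filtered = apply_policy("analyst", results)
```

In this sketch the analyst role could still have used `callingParty` as a query criterion, but the field is removed before the results are returned.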

[0050] The schema service, in one aspect, is responsible for storing schema information. These schemas are used to format queries that are sent by the client subsystem 100, as detailed further below.

[0051] The key service, in one aspect, is responsible for holding keys used to access other services and their respective configuration files in the gateway subsystem. This service may also be responsible for decrypting keys. In one embodiment, the key service does not contain the keys responsible for encrypting the data.

[0052] The gateway subsystem 101 may further contain secret information such as secret salts and keys used to, e.g., access configuration for the services and data within the server subsystem 102.

[0053] In one embodiment, gateway subsystem 101 may comprise a client service 120, transformation function 130, secure query generator 140, and schema store 150. As discussed above, gateway subsystem 101 may be administratively-isolated from client subsystem 100 and server subsystem 102.

[0054] Client service 120 receives a first client data set from client instance 110. Transformation function 130 applies a schema from, e.g., schema store 150, to the first client data set to create a second data set, referred to as a schema-based data structure. In one embodiment, the invention includes a discretization function as part of the first client data set transformation process that produces a complete semantic description of the client data set in the form of multiple hashed values.

[0055] Transformation function 130 then one-way encrypts or hashes the schema-based data structure to generate a third data set, referred to as secure data. Preferably, the hashing portion of the transformation process will be performed using a secure, one-way hashing algorithm such as SHA-2. A salt may be used in the hashing function to improve the security of the hashed result. Alternatively, transformation function 130 causes a separate hashing function to hash the schema-based data structure to create the secure data.
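By way of illustration, the salted one-way hashing of paragraph [0055] may be sketched as below. The specification names only SHA-2 and a salt; combining them through HMAC-SHA-256 is an assumption made for this sketch, and `one_way_transform` and `SECRET_SALT` are hypothetical names.

```python
import hashlib
import hmac

def one_way_transform(value: str, salt: bytes) -> str:
    """Return a hex digest of `value` keyed with a secret salt (HMAC-SHA-256)."""
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Placeholder salt; in the described system it would be held only by the gateway.
SECRET_SALT = b"example-salt-held-only-by-the-gateway"

token_a = one_way_transform("1234", SECRET_SALT)
token_b = one_way_transform("1234", SECRET_SALT)
token_c = one_way_transform("4321", SECRET_SALT)
```

Because the transformation is deterministic for a given salt, equal inputs yield equal tokens, which is what allows equality tests to be performed on the secure data.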

[0056] Because the secure data may not be conveniently converted back into clear data, transformation function 130 may also generate a unique identifier (e.g., a value) to provide for mapping from the secure data back to the client data set.

[0057] Schema store 150 stores a digital representation of the schema. In some embodiments, the schema may comprise a generalization hierarchy adapted to define identification values to portions of a client data set based on analysis parameters related to the generalization hierarchy. Where generalization hierarchies are used, the schema is agnostic as to queries run during data analytics processes. In other embodiments, a specific schema may be generated in a manner that is optimized for a specific type of analysis. For example, a specific schema may be generated based on a specific type of data set, such as, without limitation, in a format and/or in a manner that is optimized for specific analytics queries.

[0058] In one embodiment, the schema-based structure comprises a primary data structure type, a vector of first secondary-data structures, and a vector of second secondary-data structures. Each first secondary-data structure comprises an identity, type, and value. Each second secondary-data structure comprises an identity, type, and value derived from a first data subset. The first data subset comprises a type and a vector of a second data subset. Each second data subset comprises an identity, a type, and a subset value. The first data subset corresponds to the primary data structure type. It is derived from at least one first secondary data structure value. Data from the first data subset of the same first data subset structure type have identical vectors as the second data subsets. The primary data structure is capable of deriving at least one said first data subset(s).

[0059] The creation and application of the schema according to the invention may be further illustrated by additional examples. In one embodiment, the schema is a customer-provided definition of the data structures ("StructuredDef"), data descriptors ("QualityDef"), and rules that describe how data descriptors are to be derived from data structures ("QualitizerDef," "QualitizedDef," and "DiscretizerDef"). These definitions may provide the structural information needed for the system of the invention to transform, persist, and query user data. Each StructuredDef, for example, describes a "StructuredType" for one or more "Structured" instances that contain user data. Each QualityDef describes a "QualityType" for one or more "Quality" instances that contain descriptor data.

[0060] In one embodiment, each user operates an independent gateway subsystem. Each gateway subsystem instance has its own schema, which is created by the user. In this example, the schema has at least one QualityDef, one QualitizerDef, one DiscretizerDef, one QualitizedDef, and one StructuredDef. Operational deployments, however, may include many instances of each definition type. A schema may be created by assembling a collection of data types that the system stores, and a collection of descriptors may be used to characterize each data type.

[0061] For example, a common data type in the telecommunications industry is the telephone call. In its simplest form, a telephone call consists of a calling party telephone number, a called party telephone number, the date and time that the call started and the duration of the call. The descriptors that could be used to characterize a telephone call include: a telephone number (the calling party), another telephone number (the called party) and one or more representations of the date and time the call started, and the date and time span over which the call was active. The telephone call example could thus be represented in a schema as follows:

StructuredDef: {

TYPE: TelephoneCall ,

FIELDS : [

FieldDef : { NAME : callingParty, TYPE : VarChar } ,

FieldDef : { NAME : calledParty, TYPE : VarChar } ,

FieldDef : { NAME : callStart, TYPE : Timestamp } ,

FieldDef : { NAME : duration, TYPE : Int }

] } ;

QualityDef : {

TYPE : TelephoneNumber ,

FIELDS : [

FieldDef : { NAME : number, TYPE : VarChar }

] } ;
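For readers who prefer a programmatic view, the two definitions above may be mirrored in Python roughly as follows. The classes `FieldDef`, `StructuredDef`, and `QualityDef` are a hypothetical rendering for illustration; the specification defines the schema notation but not any particular in-memory representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldDef:
    name: str
    type: str  # e.g. "VarChar", "Timestamp", "Int"

@dataclass(frozen=True)
class StructuredDef:
    type: str
    fields: tuple[FieldDef, ...]

@dataclass(frozen=True)
class QualityDef:
    type: str
    fields: tuple[FieldDef, ...]

# The TelephoneCall data structure from the example schema.
TELEPHONE_CALL = StructuredDef(
    type="TelephoneCall",
    fields=(
        FieldDef("callingParty", "VarChar"),
        FieldDef("calledParty", "VarChar"),
        FieldDef("callStart", "Timestamp"),
        FieldDef("duration", "Int"),
    ),
)

# The TelephoneNumber descriptor from the example schema.
TELEPHONE_NUMBER = QualityDef(
    type="TelephoneNumber",
    fields=(FieldDef("number", "VarChar"),),
)
```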

[0062] Once all of the StructuredDef(s) and QualityDef(s) have been defined, the rules for how Quality(s) will be derived from Structured(s) may then be defined.

[0063] Before detailing the rules themselves, though, it may be useful to describe the concepts of "Quantization" and "Discretization." In one form, Quantization is the process of deriving one or more qualitative descriptors, known as "Quality(s)," from a user data structure, known as a "Structured." Collectively, the Quality(s) derived from a Structured may comprise a usable representation of the data contained within the Structured instance. Discretization is the process that, in some embodiments, the system may use to convert one or more input field values from a Structured instance into a set of output field values, in which each output value corresponds to one field in a distinct descriptor instance. Each "Discretizer" can take one or more input fields from a Structured instance, and produce values for multiple Quality instances (i.e., each output from a Discretizer becomes a field value in a separate Quality instance). Many Discretizer implementations are known in the art, and may be used to discretize different types of values and represent them in different ways.
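As one possible illustration of a Discretizer, the sketch below converts a single timestamp input into several progressively coarser values, each of which could populate a field in a separate Quality instance. The function `timestamp_discretizer` and its choice of year, month, day, and hour buckets are assumptions made for this example; the specification does not define this particular Discretizer.

```python
from datetime import datetime, timezone

def timestamp_discretizer(iso_timestamp: str) -> list[str]:
    """Discretize one timestamp into progressively coarser bucket labels."""
    ts = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return [
        ts.strftime("%Y"),            # year bucket
        ts.strftime("%Y-%m"),         # month bucket
        ts.strftime("%Y-%m-%d"),      # day bucket
        ts.strftime("%Y-%m-%dT%H"),   # hour bucket
    ]

buckets = timestamp_discretizer("2017-01-01T12:01:02Z")
# buckets == ["2017", "2017-01", "2017-01-01", "2017-01-01T12"]
```

Under this assumption, a query for all calls on a given day reduces to an equality test against the day bucket rather than a range comparison on the raw timestamp.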

[0064] In this example, the rules to derive Quality(s) from a Structured instance are specified in the form of one QualitizerDef, one DiscretizerDef, and one QualitizedDef. The QualitizerDef defines a mapping between one or more inputs and the fields of a QualityDef. Each QualitizerDef contains one or more DiscretizerDef(s). The DiscretizerDef defines a mapping between the inputs defined in its parent QualitizerDef, and the parameters of a specific Discretizer function implementation. In this instance, there is one DiscretizerDef for each field in the QualityType specified in the QualitizerDef, and the mappings are performed in corresponding order (i.e., the first DiscretizerDef maps to the first field of the QualityType, the second DiscretizerDef maps to the second field of the QualityType, etc.). The QualitizedDef maps the fields of a StructuredDef to the inputs of a QualitizerDef, and associates the Quality that is output by the Quantization process with an identity that is distinct within the scope of its associated StructuredType.

[0065] For example, TelephoneCall StructuredDef has two fields that could be used to construct TelephoneNumber Quality(s): the first field is the "calling party," and the second is the "called party." The rules to derive Quality(s) from these two fields may be defined by one QualitizerDef, one DiscretizerDef and two QualitizedDef(s), which could be specified in a schema of the invention as follows:

QualitizerDef: {

TYPE : TelephoneNumber,

INPUTS : [

Input: {NAME : telephoneNumber , TYPE : VarChar }

] ,

DISCRETIZER_DEFS : [

DiscretizerDef: { NAME : NumericStringDiscretizer, INPUTS: [ telephoneNumber ] }

] ,

QUALITY_TYPE : TelephoneNumber

}

StructuredDef: {

TYPE : TelephoneCall,

QUALITIZED_DEFS : [

QualitizedDef : { TYPE : TelephoneNumber, IDENTITY : callingParty, FIELDS: [ callingParty ] } ,

QualitizedDef : { TYPE : TelephoneNumber, IDENTITY : calledParty, FIELDS: [ calledParty ] }

]

}

[0066] In this example, the schema may be used in the transformation process (detailed herein) to validate data and provide the Quantization rules to derive Quality(s) from data or query parameters prior to encryption. Data and query parameters may be validated by checking their structure and values against definitions present in the schema.

[0067] In one example, the following process steps may be used to derive Quality(s):

1. The fields are first compared to those in the StructuredDef for type TelephoneCall to ensure that all required fields are present, and that each field type in the submitted data corresponds to the type specified in the FieldDef for that field in the StructuredDef.

2. If validated, the QualitizedDef(s) specified in the QUALITIZED_DEFS attribute of the corresponding StructuredDef are loaded, e.g., one at a time, and each is used to obtain the QualitizerDef specified by its TYPE attribute and the field values that correspond to the field names provided in the FIELDS attribute of the QualitizedDef.

3. The field values obtained by the QualitizedDef serve as the values for each input specified in the INPUTS attribute of the QualitizerDef. The field values and inputs are each organized as ordered items in an array. They may then be mapped by matching the field value at each index in the fields array to the input with the same index in the inputs array and placing each corresponding pair into a key-value dictionary, in which the key is the name of the input and the value is the field value.

4. Each DiscretizerDef in the DISCRETIZER_DEFS attribute of the QualitizerDef may be used to discretize one or more of the inputs to one or more outputs, each of which will be placed into a field in a distinct Quality. The Quality will be of the type specified by the QUALITY_TYPE attribute of the QualitizerDef. In this example, there must be one DiscretizerDef for each field in the QualityDef specified by the QualityType. If the QualityDef specifies more than one field for the QualityType, then the Cartesian product of the fields must be created and all possible combinations of the discretized values must be mapped to a corresponding number of Quality instances.

5. Each resulting Quality that corresponds to each QualitizedDef is packaged within a Quantized data structure that also includes the identity value specified by the IDENTITY attribute of the QualitizedDef. This makes it possible to distinguish values of similar structure, type, or appearance that were derived from different fields, or using different Quantization or Discretization processes (e.g., the ability to differentiate between the callingParty and the calledParty in the foregoing example).
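Steps 1, 3, and 4 above may be sketched as follows. This is an illustrative sketch only: the helper names `validate`, `map_inputs`, and `combine` are hypothetical, Python types stand in for the FieldDef types, and the duration value of 1 is assumed for the sample record.

```python
from itertools import product

# Field name -> expected Python type (a stand-in for the FieldDef TYPE attribute).
STRUCTURED_FIELDS = {
    "callingParty": str, "calledParty": str, "callStart": str, "duration": int,
}

def validate(record: dict) -> bool:
    """Step 1: every required field is present and has the expected type."""
    return all(
        name in record and isinstance(record[name], py_type)
        for name, py_type in STRUCTURED_FIELDS.items()
    )

def map_inputs(input_names: list[str], field_values: list) -> dict:
    """Step 3: pair inputs and field values by index into a key-value dictionary."""
    return dict(zip(input_names, field_values))

def combine(discretized_per_field: list[list[str]]) -> list[tuple[str, ...]]:
    """Step 4: Cartesian product of the discretized values, one tuple per Quality."""
    return list(product(*discretized_per_field))

record = {"callingParty": "1234", "calledParty": "4321",
          "callStart": "2017-01-01T12:01:02Z", "duration": 1}  # duration is assumed
inputs = map_inputs(["telephoneNumber"], [record["callingParty"]])
combos = combine([["a", "b"], ["x", "y"]])  # two fields, two discretized values each
```

With two fields that each discretize to two values, `combine` yields four combinations, and therefore four Quality instances.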

In one example, if a TelephoneCall record with values (callingParty:"1234", calledParty:"4321", callStart:"2017-01-01T12:01:02Z", duration:1) is submitted to the system according to the present invention to be persisted, the following operations may be performed:

"Quantization of callingParty"

1. QualitizedDef with identity callingParty specifies QualitizerType TelephoneNumber

2. QualitizedDef with identity callingParty specifies callingParty field from StructuredType TelephoneCall should be mapped as the first and only input to QualitizerType TelephoneNumber

3. QualitizerType TelephoneNumber specifies that the first and only input should be mapped as the first and only parameter to the NumericStringDiscretizer

4. QualitizerType TelephoneNumber specifies that the output of NumericStringDiscretizer maps to the first and only field of the output Quality of type TelephoneNumber

5. QualitizedDef specifies that each Quality output by its Quantization process will be packaged within a Qualitized instance with identity callingParty

6. The final result will be the following object:

Qualitized : {

IDENTITY: callingParty,

QUALITY: { TYPE : TelephoneNumber, FIELDS : ["1234"] }

}

"Quantization of calledParty"

7. QualitizedDef with identity calledParty specifies QualitizerType TelephoneNumber

8. QualitizedDef with identity calledParty specifies calledParty field from StructuredType TelephoneCall should be mapped as the first and only input to QualitizerType TelephoneNumber

9. QualitizerType TelephoneNumber specifies that the first and only input should be mapped as the first and only parameter to the NumericStringDiscretizer

10. QualitizerType TelephoneNumber specifies that the output of NumericStringDiscretizer maps to the first and only field of the output Quality of type TelephoneNumber

11. QualitizedDef specifies that each Quality output by its Quantization process will be packaged within a Qualitized instance with identity calledParty

12. The final result will be the following object:

Qualitized: {

IDENTITY : calledParty,

QUALITY: {TYPE : TelephoneNumber, FIELDS : [ "4321"] }

}
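The two Quantization walk-throughs above may be reproduced with the following sketch. The behaviour of `numeric_string_discretizer` (passing a numeric string through as a single discrete value) is an assumption chosen so that the output matches the objects shown above; the specification names the NumericStringDiscretizer but does not define its internals. The dictionary layout and the function `quantize` are likewise hypothetical.

```python
def numeric_string_discretizer(value: str) -> list[str]:
    """Assumed behaviour: pass a numeric string through as one discrete value."""
    if not value.isdigit():
        raise ValueError(f"not a numeric string: {value!r}")
    return [value]

# The two QualitizedDef(s) of the TelephoneCall StructuredDef.
QUALITIZED_DEFS = [
    {"TYPE": "TelephoneNumber", "IDENTITY": "callingParty", "FIELDS": ["callingParty"]},
    {"TYPE": "TelephoneNumber", "IDENTITY": "calledParty", "FIELDS": ["calledParty"]},
]

def quantize(record: dict) -> list[dict]:
    """Derive one Qualitized object per discretized value of each QualitizedDef."""
    out = []
    for qdef in QUALITIZED_DEFS:
        (field_name,) = qdef["FIELDS"]  # single-input qualitizer in this example
        for value in numeric_string_discretizer(record[field_name]):
            out.append({
                "IDENTITY": qdef["IDENTITY"],
                "QUALITY": {"TYPE": qdef["TYPE"], "FIELDS": [value]},
            })
    return out

call = {"callingParty": "1234", "calledParty": "4321",
        "callStart": "2017-01-01T12:01:02Z", "duration": 1}  # duration is assumed
qualitized = quantize(call)
```

The IDENTITY value is what keeps the two TelephoneNumber Quality(s) distinguishable even though they share a type and structure.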

[0068] Turning back to gateway subsystem 101, the subsystem is adapted to transmit the secure data (together with the unique identifier) to server subsystem 102. The server subsystem 102 may itself hold, or be in communication with, a store of encrypted data saved by users. The server subsystem 102 may further perform query operations that are sent by the client subsystem 100. These operations filter the data requested by the user and return the results to the gateway subsystem 101 so that they can be transformed back into cleartext format.

[0069] Server subsystem 102 may include a secure server service 160 and secure data store 170, and may preferably have a secure connection (e.g., over a network) with gateway subsystem 101, e.g., interface 180. Interface 180 may provide a secure connection between gateway subsystem 101 and server subsystem 102 so that the first client data set that is stored temporarily in client service 120 before the schema is applied by transformation function 130 (and the results one-way encrypted) is not accessible through server subsystem 102. As will be appreciated by one of ordinary skill in the art, the secret keys and secret salts are only present within the gateway subsystem 101 and never exposed in unencrypted form to the server subsystem 102. Unsecured client data is preferably only available in temporary memory storage in the gateway subsystem 101 before it is transformed and one-way encrypted. After the unsecured client data has been transformed, it may cease to exist in the gateway subsystem 101, and at that point, the system would only be storing the secure data in the secure data store 170.

[0070] Server service 160 may provide connection to and management of secure data store 170, which comprises compute cluster 171 and database 172, and which may be, in one form, a NoSQL cloud-based database. Once a schema-based data structure has been transformed to create secure data, it may be passed through interface 180 to server service 160 and then channeled to secure data store 170 for, e.g., secure storage in database 172. Secure data store 170 may be deployed, for example, on the same or a different secure cloud.

[0071] This data stored in database 172 can be acted upon by query routines. In some embodiments, query routines consist of (a) hierarchies of logic operations and (b) criteria parameters that have been transformed using, e.g., the same secret keys and secret salts that were used to transform the first client data set.

[0072] More particularly, returning now to client instance 110, client query engine 105 is adapted to create queries to be run on secure data (e.g., secure data stored in database 172). Client query engine 105 may use an object query database language or any appropriate query language known to those of skill in the art. Query types may include "read" and "write," in order to, for example, read data from storage, perform a manipulation on the data (e.g., filter, extract, join), or return data as a set of results.

[0073] A user may use the user interface 106 to access client query engine 105 to create and deliver queries to client service 120, and in particular, secure query generator 140. The queries can include search requests to locate data within database 172 that meet one or more search parameters. Each search parameter can be a value, or a range of values where the search request is to locate database entries that satisfy or are likely to satisfy the value or range of values. For example, each search parameter can be a time or time interval. A database entry satisfies a single value search parameter when the database entry has a corresponding time field that contains the single value. A database entry satisfies a range value search parameter when the database entry has a corresponding time field that contains a time value that is within the range of values specified by the search parameter, which can include the outer boundaries of the range of values. The search parameters can be any appropriate parameter evident to one of ordinary skill in the art, e.g., a unit of distance or some other unit of measurement.

[0074] After receiving the client query from the client instance 110, secure query generator 140 may apply a schema to the client query, thereby creating a converted query. Secure query generator 140 may then hash the converted query to create a transformed query.
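The reason a transformed query can be executed against secure data may be illustrated with the sketch below, in which query criteria are hashed with the same secret salt that was used for the stored data, so that the server only ever compares tokens for equality. HMAC-SHA-256, the token layout, the in-memory `secure_store`, and the function names are all assumptions made for this sketch.

```python
import hashlib
import hmac

SALT = b"gateway-only-secret"  # placeholder; never shared with the server

def token(identity: str, value: str) -> str:
    """One-way token for a (field identity, value) pair."""
    msg = f"{identity}\x00{value}".encode("utf-8")
    return hmac.new(SALT, msg, hashlib.sha256).hexdigest()

# Secure data as the server might hold it: record id -> set of tokens.
secure_store = {
    "rec-1": {token("callingParty", "1234"), token("calledParty", "4321")},
    "rec-2": {token("callingParty", "5555"), token("calledParty", "1234")},
}

def transform_query(criteria: dict[str, str]) -> set[str]:
    """Gateway side: hash each criterion with the same salt as the stored data."""
    return {token(identity, value) for identity, value in criteria.items()}

def execute(transformed: set[str]) -> list[str]:
    """Server side: equality matching on tokens only (logical AND of criteria)."""
    return sorted(rid for rid, toks in secure_store.items() if transformed <= toks)

hits = execute(transform_query({"callingParty": "1234"}))
```

Here `hits` contains only `rec-1`: the value "1234" also appears in `rec-2`, but under the calledParty identity, which produces a different token.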

[0075] Client service 120 transmits the transformed query to server subsystem 102 to be executed by compute cluster 171. Compute cluster 171 accesses the relevant secure data in secure database 172 and executes the query. The query may produce results. Compute cluster 171 returns the results to the secure query generator 140 in client service 120. Secure query generator 140 may then return the secure results directly to the client instance 110. Alternatively, secure query generator 140 uses the unique identifier(s) associated with the secure data to "reverse transform" the secure results in order to identify the cleartext values that correlate to the secure results prior to transmission to client instance 110. In one embodiment, the various descriptors that are used in the query and analysis process in the server subsystem are one-way encrypted using a hash function and hence cannot be decrypted. While those values are used for query and analysis processing in the server subsystem, they are replaced with the symmetrically encrypted complete data records with which they are associated prior to being returned as results.

[0076] In some embodiments, the client instance 110 that receives the results may be distinct from the initial client instance that uploaded the client data set. In this case, only correlation information could be made available to the distinct client instance, and not cleartext data. This arrangement may conveniently allow the viewing of analysis results by users of the distinct client instance who never had access to the raw data of the first client data set.

[0077] Turning now to Fig. 2, an exemplary method 200 for storing secure data is shown. In step 201, a user accesses a user interface of a client instance and generates a first client data set using a data gathering agent, which may have access to unsecure data having fields corresponding to unsecure data field definitions that are stored in, e.g., a data store. Step 201 may optionally include discretization of the values in the first client data set as part of the gathering process. In one embodiment, shown in step 202, the first client data set is then optionally encrypted. In another embodiment, shown in step 203, the user optionally configures the first client data set based on a schema that defines associations between different fields or categories within the data set. The user may also customize the schema based upon the first client data set or a desired query. In step 204, the first client data set (in either a clear format or a decryptable format) is transmitted to a client service using a secure tunnel. Upon receipt, a transformation function applies a schema to the first client data set to create a schema-based data structure, as shown in step 205. In step 206, the transformation function one-way encrypts (e.g., hashes) the schema-based data structure to generate secure data. In one embodiment, the transformation function generates the secure data by applying a (preferably secure) hashing algorithm to the schema-based data structure as part of the transformation process. Alternatively, the transformation function causes a separate encrypter to hash the schema-based data structure to create the secure data. In step 207, the transformation function generates a unique identifier to provide a mapping from the secure data back to the data that resulted in the hashed values. In step 208, the secure data (together with the unique identifier) is transmitted to a server subsystem through a secure connection, and then channeled to secure storage in a database, as shown in step 209.

[0078] An exemplary method 300 for querying secure data is shown in Fig. 3. A user accesses a client query engine to create a query in step 301 to be run on secure data. The client query is transmitted to a secure query generator in step 302. In some embodiments, a non-repudiated signature is transmitted along with the queries. Both the query and the corresponding signature are then forwarded to an externally-administered audit service.

[0079] After receiving the client query, the secure query generator applies a schema to the client query as shown in step 303, thereby creating a converted query. The secure query generator then one-way encrypts the parameters of the converted query (e.g., data types, condition values, and field identities) to create a transformed query in step 304 and transmits the transformed query in step 305 to be executed by a compute cluster on relevant secure data in step 306. In some embodiments the transformed query is accompanied by the corresponding signature. A notification is sent to an audit service; the audit service verifies that the query was previously forwarded. Any discrepancies or inconsistencies between audit information reported by the two subsystems result in a security notification.
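The audit cross-check described in paragraphs [0078] and [0079] may be sketched as follows. The `AuditService` class and its methods are hypothetical, and signatures are treated as opaque strings: the sketch shows only the consistency check between the two reports, not how a non-repudiated signature would be produced or verified.

```python
class AuditService:
    """Hypothetical externally administered audit log that cross-checks two reports."""

    def __init__(self) -> None:
        self.forwarded: dict[str, str] = {}  # query id -> signature seen at the gateway
        self.notifications: list[str] = []

    def record_forwarded(self, query_id: str, signature: str) -> None:
        """Called when the gateway forwards a client query and its signature."""
        self.forwarded[query_id] = signature

    def verify_executed(self, query_id: str, signature: str) -> bool:
        """Called when execution is reported; any mismatch raises a notification."""
        expected = self.forwarded.get(query_id)
        if expected is None or expected != signature:
            self.notifications.append(
                f"security notification: query {query_id} is inconsistent"
            )
            return False
        return True

audit = AuditService()
audit.record_forwarded("q-1", "sig-abc")
ok = audit.verify_executed("q-1", "sig-abc")            # consistent report
mismatch = audit.verify_executed("q-1", "sig-tampered")  # signature differs
unknown = audit.verify_executed("q-2", "sig-xyz")        # never forwarded
```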

[0080] If secure results are generated from the execution of the query, as shown in step 306, the secure results are returned to the secure query generator in step 307. Secure query generator may then return the secure results directly to the client instance as shown in step 309, or, alternatively, in step 308, use a unique identifier to "reverse transform" the secure results in order to identify the cleartext values that correlate to the secure results, prior to transmission to a user in step 308. The user is presented with the two-way encrypted original record that was associated with the encrypted values.

[0081] Turning now to Fig. 4, in a further embodiment, the system may be configured as a system 400 for querying already-stored data. Such an exemplary system for searching secure data according to the present invention may comprise a schema store 401, a transformation function 402, a secure query generator 403, and a secure data store 404. In some embodiments, the schema store 401 may comprise a plurality of unsecure data field definitions and an identifier. The transformation function 402 may be adapted to receive unsecure data 405 and generate a transformed data set 406 comprising a token and an identifier and to store the transformed data set in the secure data store 404. Further, the secure data store may be adapted to receive secure pattern-matching queries 407, and additionally in some embodiments, the transformed data sets. The secure query generator 403 may be adapted to receive unsecure queries 408 based on the unsecure data field definitions and convert the unsecure data field definitions into secure queries 407, and then transmit the secure queries 407 to the secure data store 404. Secure data store 404 may execute the secure queries on the secure data 406, and generate results. Secure data store 404 may then return the results 409 to secure query generator 403, which may then channel the results to the client subsystem, or, using the identifier, "reverse transform" the results and return, e.g., cleartext results 410 to the client instance.

[0082] In an alternate embodiment, the system may comprise a system for securely storing data for data analytics. The system may comprise a schema store 401, a transformation function 402, and a secure data store 404. The schema store 401 may comprise a plurality of unsecure data field definitions with the transformation function being adapted to receive unsecure data 405 based on those unsecure data field definitions. After receiving the unsecure data 405, the transformation function may generate a one-way encrypted data set by securely hashing the unsecure data, generate an identifier for the one-way encrypted data, and then store both the one-way encrypted data and the identifier in the secure data store 404. In a preferred embodiment, the secure data store 404 may be searched for patterns among the one-way encrypted data set without the secure data store containing any unsecure data.

[0083] As will be appreciated by one of ordinary skill in the art, the secret keys and secret salts are only present within the gateway subsystem and never exposed in unencrypted form to other subsystems. And after the unsecured client data has been one-way encrypted, it may cease to exist in the gateway or server subsystems, and at that point, those subsystems would only be storing the secure data in the secure data store 404.

[0084] In another embodiment, a hashing key used for the transformation process may be stored in the gateway subsystem, and thus is never available to either the client subsystem or the server subsystem, and may preferably prevent any decoding of the secure data without gaining access to the hashing key from the gateway subsystem.

[0085] In a further embodiment, a user may choose to store encrypted raw data within the system along with the secure data. In such an instance, both data would remain securely stored, because the encryption key and the hashing keys may be stored exclusively in the gateway subsystem. Such embodiments allow querying of the transformed data, but return of the encrypted data with the transformed data as part of the query results.

[0086] Embodiments of the inventions herein described may be further illustrated by examples. In one example, the schema applied by client service 120 is comprised of a data organization schema analogous to a biological olfactory system, in which case such a schema may be referred to as the olfactic model. In the natural world, olfaction (the process of smelling) is made possible by a collection of receptor cells within the nasal cavity. Each receptor is able to bind with specific types of molecules. A scent is detected when molecules pass through the nasal cavity and bind to matching receptor cells. This binding causes the neurons that correspond to the receptor to fire. A neuron firing is analogous to setting a bit to one in a binary sequence. A unique combination of firing neurons indicates a distinct scent to the brain.

[0087] Such an olfactic model can comprise "aromatics," "arotopes," "odorants," "odotopes," and "odorizers."

[0088] An aromatic is analogous to a data record or row in a relational database. An arotope is a sub-secondary data structure (e.g., an attribute of an aromatic) comprising an arotope identity value, arotope data type, and arotope value. An odorant is another sub-secondary data structure comprising an odorant identity, odorant type, and odorant value, and represents a single descriptive characteristic derived from an aromatic. In some embodiments, one or more odorants are derived from every aromatic, and collectively, they form a complete description of the aromatic from which they were derived. Odorants can be one-way ciphered such that the ciphered values of any two odorants with the exact same odorant type and identical vectors of odotopes will be equal and can be efficiently tested for such equality. In some embodiments, the odorant may be derived from the arotope values, wherein the odorants of the same odorant type have identical vectors of odotopes.

[0089] An odotope is another sub-secondary data structure comprising an odotope identity, odotope type, and odotope value. An odotope may correspond to a category of data fields within an initial client data structure. An odorizer is another sub-secondary data structure comprising an odorizer identity value and a vector of category functions, such as, without limitation, discretizer functions adapted to create a discrete generalization hierarchy of the data fields within a first client data set. In some embodiments, the odorizers may comprise a second set of data identifiers, and the generalization hierarchy may correspond to category functions. The category functions may be customized or user-defined.

[0090] In this example, the client data set comprises at least one aromatic. An aromatic represents the combination of data type and field values in the client data set. Each record in the client data set is treated like an aromatic molecule. Two aromatics possess identical "shapes" if, and only if, they are of the same type and have identical field values.

[0091] An aromatic can be characterized by odorants, each of which is a discrete representation of one distinct characteristic of the aromatic and can be tested for equality with other odorants. For example, the aromatic designates a data record and the odorant designates a descriptive characteristic of the data record. This is analogous to the binding of molecules to receptor cells in biological olfaction, as two surface sections of an aromatic molecule must be identical in shape in order to be considered a match.

[0092] Each aromatic is stored in symmetrically-encrypted form, which must be decrypted before it can be used in any meaningful way. Encrypted query and analysis is made possible by the association of one or more odorants with each aromatic. The odorants derived from an aromatic effectively describe it as a collection of discrete values that can be compared with other odorants as criteria values using one or more logic operations. Because odorant values are only tested for equality with other odorants, odorants can be transformed using any available cipher algorithm that will consistently produce the same output value from a given input value.

[0093] Each encrypted aromatic is indexed using one or more odorants that have been encrypted using a one-way cipher algorithm. A reverse index is available so that a known encrypted aromatic value can be used to obtain all encrypted odorants that were derived from it. Query and analysis operations can be performed upon the aromatic by using the associated encrypted odorant values as a ciphered network of facts. The ciphered network of facts that describe and represent an aromatic serves as its proxy for query and analysis operations that must be performed while the aromatic remains encrypted. Upon completion of a query or analysis routine, one or more aromatics are returned as the result set. Throughout the entire query and analysis process, the aromatics in the database and their corresponding odorants (as well as the odorants that were provided as criteria), remain encrypted. Once transferred to an administratively separate secure system like the gateway or client subsystems described above, the final result set can be decrypted using secret key information.
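The index and reverse index described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: HMAC-SHA256 stands in for the one-way cipher algorithm, the encrypted aromatics are represented by opaque placeholder tokens ("ct-001", "ct-002"), and the names SECRET, cipher_odorant, and SecureStore are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

SECRET = b"gateway-only-secret"  # hypothetical key; never held by the data store


def cipher_odorant(odorant: str) -> str:
    # Assumed one-way transform: deterministic, so equal odorants match.
    return hmac.new(SECRET, odorant.encode("utf-8"), hashlib.sha256).hexdigest()


class SecureStore:
    """Holds only ciphered odorants and opaque encrypted aromatics."""

    def __init__(self):
        self.index = defaultdict(set)    # ciphered odorant -> encrypted aromatics
        self.reverse = defaultdict(set)  # encrypted aromatic -> ciphered odorants

    def put(self, encrypted_aromatic: str, ciphered_odorants):
        for odorant in ciphered_odorants:
            self.index[odorant].add(encrypted_aromatic)
            self.reverse[encrypted_aromatic].add(odorant)

    def query_all(self, criteria):
        # Logical AND over ciphered criteria using equality tests only.
        sets = [self.index.get(c, set()) for c in criteria]
        return set.intersection(*sets) if sets else set()


store = SecureStore()
store.put("ct-001", [cipher_odorant("city:Boston"), cipher_odorant("age:30-39")])
store.put("ct-002", [cipher_odorant("city:Boston"), cipher_odorant("age:40-49")])

criteria = [cipher_odorant("city:Boston"), cipher_odorant("age:30-39")]
matches = store.query_all(criteria)
print(matches)
```

In this sketch the store never sees a cleartext odorant or aromatic; the criteria are ciphered before they reach it, and the returned tokens would be decrypted only on a separate system that holds the secret key information.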

[0094] In another example, an embodiment of the invention comprises a transformation process whereby a client data set is transformed to a representation in the form of a ciphered network of facts. This representation of the client data set can be queried and analyzed using instructions and parameters that have undergone the same transformation process, without the need to decrypt the data being analyzed. The transformed client data set is a complete representation of the original pre-transformed client data set, and can be transformed back into its original cleartext format without the need to maintain cumbersome translation tables such as those required by data masking and traditional tokenization mechanisms.

[0095] Arotopes from an aromatic that are specified for an odorizer within a schema are input into the odorizer. The odorizer outputs one or more odorants. Each odorant output by a single odorizer will have the same odorant type and odorizer identity, which is an identifier that is unique within the scope of an aromatic type and uniquely identifies the collection of arotopes and the odorizer that is discretizing them into odorant(s).
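The flow from arotopes through an odorizer to odorants can be sketched as follows. This Python fragment is illustrative only; the class layouts, the arotope identifiers ("a1", "a2"), and the two example category functions are assumptions rather than part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Arotope:
    identity: str
    arotope_type: str
    value: object


@dataclass(frozen=True)
class Odorant:
    odorant_type: str
    odorizer_identity: str
    value: str


@dataclass
class Odorizer:
    identity: str
    odorant_type: str
    arotope_ids: Sequence[str]               # arotopes the schema assigns to this odorizer
    functions: Sequence[Callable[..., str]]  # one odorant is produced per function

    def run(self, arotopes: Dict[str, Arotope]) -> List[Odorant]:
        inputs = [arotopes[i].value for i in self.arotope_ids]
        # Every odorant from this odorizer shares its odorant type and identity.
        return [Odorant(self.odorant_type, self.identity, fn(*inputs))
                for fn in self.functions]


arotopes = {
    "a1": Arotope("a1", "city", "Boston"),
    "a2": Arotope("a2", "state", "MA"),
}
location = Odorizer(
    "odz-location", "location", ["a1", "a2"],
    [lambda city, state: f"{city},{state}", lambda city, state: state],
)
odorants = location.run(arotopes)
```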

[0096] The original aromatic along with all of the outputted odorants are put into an odorized-aromatic object. The aromatic is then serialized. A secret salt that is stored with the set of secret salts (used in the transformation process for each aromatic type) is appended to the serialized aromatic. The resulting value is ciphered using a secure hash function to form a value called the Ate. The aromatic is then serialized again, and a symmetric cipher is applied to the serialized aromatic to form a variable-length binary value called the Ac.

[0097] The aromatic type is then serialized. A secret salt is appended to the serialized aromatic type. The resulting value is ciphered using a secure hash function to form a value called the AtO. Each odorant is then serialized. A secret salt that is stored with the set of secret salts used in the transformation process for each odorant type is appended to the serialized odorant. The resulting value is ciphered using a secure hash function to form a value called the Ot.

[0098] The odorant type is then serialized. A secret salt is appended to the serialized odorant type. The resulting value is ciphered using a secure hash function to form a value called the OtO. The odorizer identity is then serialized. A secret salt that is stored with the set of secret salts used in the transformation process for each aromatic type is appended to the serialized odorizer identity. The resulting value is ciphered using a secure hash function to form a value called the Psi. The odorant discretization order is obtained from the odorant as an array of one-byte signed values called the Rho.

[0099] Each Ot, OtO, Psi and Rho are put into an Aot.Vt object. The Ate, Ac, AtO, and all Aot.Vt(s) are put into an Aot object. The Aot object is the final result of the transformation process from query Q_c object to query Q_s object.
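The sequence in paragraphs [0096] through [0099] can be sketched end to end as follows. This is a minimal Python illustration under several assumptions that are not in the disclosure: JSON with sorted keys is used as the serialization, SHA-256 as the secure hash function, one salt per role held in a dictionary rather than per aromatic type or odorant type, and a toy XOR keystream (toy_cipher, which is not secure) as a stand-in for a real symmetric cipher. All function, class, and key names are hypothetical.

```python
import hashlib
import json
from array import array
from dataclasses import dataclass
from typing import List


def serialize(obj) -> bytes:
    # Assumed canonical serialization so equal inputs give equal bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def salted_hash(serialized: bytes, salt: bytes) -> bytes:
    # Append the secret salt to the serialized value, then apply a secure hash.
    return hashlib.sha256(serialized + salt).digest()


def toy_cipher(data: bytes, key: bytes) -> bytes:
    # Stand-in for a real symmetric cipher (NOT secure): XOR with a keystream
    # derived from SHA-256. Applying it twice with the same key restores the input.
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


@dataclass
class Odorant:
    odorant_type: str
    odorizer_identity: str
    value: str
    discretization_order: List[int]  # assumed to be small signed integers


def transform(aromatic_type, fields, odorants, salts, key, cipher=toy_cipher):
    ser_aromatic = serialize({"type": aromatic_type, "fields": fields})
    aot = {
        "Ate": salted_hash(ser_aromatic, salts["aromatic"]),
        "Ac": cipher(ser_aromatic, key),
        "AtO": salted_hash(serialize(aromatic_type), salts["aromatic_type"]),
        "Vt": [],
    }
    for o in odorants:
        aot["Vt"].append({
            "Ot": salted_hash(serialize([o.odorant_type, o.value]), salts["odorant"]),
            "OtO": salted_hash(serialize(o.odorant_type), salts["odorant_type"]),
            "Psi": salted_hash(serialize(o.odorizer_identity), salts["odorizer"]),
            "Rho": array("b", o.discretization_order),  # one-byte signed values
        })
    return aot


salts = {name: name.encode() + b"-salt"
         for name in ("aromatic", "aromatic_type", "odorant", "odorant_type", "odorizer")}
key = b"demo-key"
odorants = [Odorant("age", "age-odorizer", "30-39", [0, 1])]
aot = transform("patient", {"age": 34, "city": "Boston"}, odorants, salts, key)
```

Because each hashed component is deterministic for a given salt, two records with the same odorant yield the same Ot value and can be matched by equality, while the Ac value can be recovered only by a party holding the symmetric key.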

[00100] The above description illustrates various embodiments of the present invention along with examples of how aspects of the present invention may be implemented. The above examples and embodiments should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of the present invention as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents will be evident to those skilled in the art and may be employed without departing from the spirit and scope of the invention as defined by the claims.

Claims

I claim:

1. A system for the analysis of secure data comprising:

one or more client instances in communication with a gateway subsystem,

wherein each said client instance is adapted to,

define a client data set configured based on a first schema comprising at least a first data structure,

transmit said client data set to said gateway subsystem,

define a client query based on said client data set and said first schema, and,

transmit said client query to said gateway subsystem;

said gateway subsystem, comprising,

a schema store comprising said first schema,

wherein said first schema comprises a generalization hierarchy adapted to define identification values to portions of said client data set based on analysis parameters related to said generalization hierarchy,

a transformation function adapted to

receive said client data set from said client instance and apply said first schema to said client data set to create a schema-based data structure,

a query generator adapted to

receive said client query from said client instance, and apply said first schema to said client query to create a converted query, and

a transformation function adapted to generate secure data by applying a hashing algorithm to said schema-based data structure,

generate a transformed query by applying a hashing algorithm to said converted query, and,

transmit said secure data and/or transformed query to a cloud-based server system, said cloud-based server system being in communication with said client service; and

said cloud-based server system, comprising,

a server service adapted to receive said secure data, and

a secure data store adapted to

receive and store said secure data,

receive and execute said transformed query to provide at least a first result, said first analysis result comprising at least a first value identified within said secure data, and

transmit said at least first analysis result to said client service, wherein said client service transmits said at least first analysis result to said client instance.

2. The system of claim 1 wherein said schema-based structure comprises:

a primary data structure type, and

a vector of first secondary data structures, said vector of first secondary data structures comprising,

at least one first secondary data structure identity,

at least one first secondary data structure type,

at least one first secondary data structure value; and

a vector of second secondary data structures, said vector of second secondary data structures comprising,

at least one second secondary data structure identity,

at least one second secondary data structure type, and

at least one second secondary data structure value derived from a first data subset, said first data subset comprising,

a first data subset structure type, and

a vector of a second data subset, said vector of a second data subset comprising,

at least one second data subset identity,

at least one second data subset structure type, and

at least one second data subset value, and

wherein the first data subset corresponds to said primary data structure type, wherein said first data subset is derived from at least one first secondary data structure value, wherein data of said first data subset of the same first data subset structure type have identical vectors of said second data subsets, and said at least one primary data structure is capable of deriving at least one said first data subset.

3. The system of claim 1, further comprising at least a second client instance in communication with said gateway subsystem.

4. The system of claim 1, further comprising a plurality of client instances in communication with said gateway subsystem.

5. A system for securely storing and analyzing secure data, comprising:

at least one client instance comprising a data store, a client query engine, and a data gathering engine;

a gateway subsystem comprising a transformation function, a secure query generator, and a schema store comprising one or more unsecure data field definitions, wherein said transformation function comprises a transformation function adapted to receive unsecure data based on said unsecure data field definitions, generate a transformed data set by securely hashing said unsecure data, and generate an identifier for said transformed data set; and

a server subsystem comprising a searchable secure data store adapted to store said transformed data set and said identifier.

6. The system of claim 5, wherein said data gathering agent accesses unsecure data having fields corresponding to said unsecure data field definitions.

7. The system of claim 6, wherein said data gathering agent is adapted to transmit unsecure data to said transformation function based on said unsecure data field definitions.

8. The system of claim 7, wherein said data gathering agent is adapted to encrypt said unsecure data prior to transmission and said transformation function.

9. The system of claim 5, wherein said transformation function is adapted to discretize said unsecure data.

10. A system for performing data analytics on secure data comprising:

a schema store comprising a plurality of unsecure data field definitions,

a secure data store comprising secure data and identifiers,

a secure query generator adapted to,

receive a query based upon said unsecure data field definitions,

convert said query to a secure query adapted to return identifiers of secure data,

return results from said secure query,

whereby said secure query executes on said secure data and patterns identified in said secure data correspond to patterns in unsecure data stored according to said unsecure data field definitions.

11. The system of claim 10 wherein said secure data further comprises encrypted data.

12. The system of claim 10 further comprising a client query agent having access to unsecure data having fields corresponding to said unsecure data field definitions, said client query agent being adapted to transmit said query to said secure query generator and receive said results.

13. A system for searching secure data comprising:

a schema store comprising a plurality of unsecure data field definitions and an identifier;

a transformation function adapted to receive unsecure data and generate a transformed data set comprising a token and one or more identifier;

a secure query generator adapted to

receive unsecure queries based on said unsecure data field definitions,

convert said unsecure data field definitions into said secure pattern-matching queries, and

transmit said secure pattern-matching queries to said secure data store;

a secure data store adapted to receive secure pattern-matching queries, said transformed data sets, and said one or more identifiers; and

a secure data store adapted to store said transformed data sets and said one or more identifiers.

14. A method for storing secure data, comprising the steps of:

generating a client data set using a data gathering agent in communication with a data store having unsecure data having fields corresponding to unsecure data field definitions;

configuring said client data set based on a schema that defines associations between said fields corresponding to unsecure data field definitions;

encrypting said client data set;

applying a schema to said client data set to create a schema-based data structure;

one-way encrypting said schema-based data structure to create secure data;

generating an identifier to provide a mapping from said secure data back to said client data set;

storing said secure data together with said identifier in a database.

15. The method of claim 14, wherein the one-way encrypting step includes applying a hashing algorithm.

16. A method for querying secure data comprising the steps of:

accessing a client query engine to create a query to be executed on secure data;

transmitting said query to a secure query generator;

applying a schema to said query to create a converted query;

transforming the converted query to create a transformed query;

executing said transformed query on said secure data;

returning results from the executed query to said secure query generator; and

using an identifier to reverse-transform the results to identify the cleartext values that correlate to said results.

17. The method of claim 16, further comprising the step of transmitting a non-repudiated signature with the query.

18. The method of claim 17, further comprising the step of transmitting said non-repudiated signature and query to an audit service.

19. A system for querying stored data, comprising:

at least one aromatic comprising,

a data structure type,

a vector of arotopes, said arotopes comprising an arotope identity, arotope type, and arotope value, and

a vector of odorizers, said odorizers comprising an odorizer identity and a vector of category functions, said odorizers being derived from a collection of odorants, said odorants comprising an odorant type and vector of odotopes, said odotopes comprising an odotope identity, odotope type, and odotope value,

wherein said collection of odorants correspond to said data structure type, wherein said odorants are derived from at least one arotope value, wherein odorants of the same odorant type have identical vectors of odotopes, and said at least one aromatic is capable of deriving at least one odorant.

20. The system of claim 19, wherein said aromatic comprises a first data structure, and said arotopes comprise a first set of data identifiers from said first data structure.

21. The system of claim 20, wherein said odorizers comprise at least a second set of data identifiers from said first data structure, and wherein said category functions define said generalization hierarchy of said first schema.

22. The system of claim 21, wherein said category functions are user-defined.