Hashed Organizational Sensitive Information
The Hashed Organizational Sensitive Information database stores the hashes of the sensitive information that the organization wants checked for in LLM responses. As the name suggests, it should contain only hashes of the strings that the company wants to check for, and never the data itself in clear text (rationale for this can be found here).
The schema for Hashed Organizational Sensitive Information is a single table with a single column of hashes. Below is an example of the schema:
si_hash |
---|
410943463d1786da4b258d5113a29d3dd7119ea86002729c27482c5ad9d4150d |
b1b2fc3f32e4b1d48adb45270e3265a0a7a429d3d94ab1f96a576463b03759a2 |
317c96b8eada2d689086708d341bb4dce4ee833177a2ffa76a5a0e781fa7f03e |
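To make the single-column schema concrete, here is a minimal sketch of how such a table could be created, populated, and queried. It assumes SQLite as the storage engine and uses a hypothetical table name (`sensitive_hashes`), hypothetical helper names, and made-up sample strings; none of these come from the PropScreen proof of concept. Only the `si_hash` column name and the use of SHA-256 follow the example above.

```python
import hashlib
import sqlite3


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_store(sensitive_strings):
    """Create an in-memory table holding only hashes, never the clear text.

    The table name `sensitive_hashes` is an assumption for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensitive_hashes (si_hash TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO sensitive_hashes (si_hash) VALUES (?)",
        [(sha256_hex(s),) for s in sensitive_strings],
    )
    conn.commit()
    return conn


def is_sensitive(conn, candidate: str) -> bool:
    """Hash the candidate string and check whether that hash is stored."""
    row = conn.execute(
        "SELECT 1 FROM sensitive_hashes WHERE si_hash = ?",
        (sha256_hex(candidate),),
    ).fetchone()
    return row is not None


# Made-up sample values, used only to exercise the sketch.
conn = build_store(["Project Falcon", "ACME-4471"])
```

With this setup, `is_sensitive(conn, "Project Falcon")` hashes the candidate and looks the hash up in the table, so the clear text is never written to the database.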
Note that in this example and in the PropScreen proof of concept, SHA-256 was used to hash the data; however, this is not a strict requirement. Additionally, in a live setting, any hash salting and peppering techniques employed would need to be applied consistently wherever hashes are produced. This secure-by-design measure would be necessary to harden Hashed Organizational Sensitive Information in case it is subjected to hash cracking, dictionary, or rainbow table attacks. For the purposes of the proof of concept, neither salting nor peppering was deployed, but the team would like to acknowledge the importance of such practices. A graceful way to deploy these techniques would be one of the next steps in the development of the project.
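As an illustration of what peppering could look like, the sketch below uses HMAC-SHA256 with a secret key acting as the pepper. This is one possible approach under stated assumptions, not the method used by PropScreen, which deployed neither salting nor peppering. The function names and the hard-coded pepper value are hypothetical; in practice the pepper would be kept outside the database, for example in a secrets manager. A per-entry random salt is left out here because it would require rehashing each candidate once per stored salt.

```python
import hashlib
import hmac

# Hypothetical pepper for illustration only; a real one would be a
# randomly generated secret stored separately from the hash database.
PEPPER = b"example-pepper-not-for-production"


def peppered_hash(text: str, pepper: bytes) -> str:
    """Return the HMAC-SHA256 of `text`, keyed with the pepper, as hex."""
    return hmac.new(pepper, text.encode("utf-8"), hashlib.sha256).hexdigest()


# Hashes as they would be stored (made-up sample value).
stored_hashes = {peppered_hash(s, PEPPER) for s in ["Project Falcon"]}


def is_sensitive_peppered(candidate: str, pepper: bytes = PEPPER) -> bool:
    """Check a candidate using the same pepper that produced the stored hashes."""
    return peppered_hash(candidate, pepper) in stored_hashes
```

The lookup only matches when the candidate is hashed with the same pepper as the stored entries. An attacker who obtains the table but not the pepper cannot use precomputed SHA-256 tables against it.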