Architecture Overview

This section describes the proposed high-level architecture of the solution that will be presented in the proof of concept. The solution runs inside the AWS ecosystem, and most of its functionality is written in Python.

The process begins with the LLM User, who first accesses the organization's LLM interface through an application or a web client. Once the user has successfully authenticated (a responsibility of the organization, not of PropScreen), they can use the interface to write prompts to send to the LLM. When the user sends a prompt intended for the company LLM, the prompt is first sent to PropScreen, which forwards it to the LLM. The LLM receives the prompt, generates a response, and sends that response back to PropScreen. It is at this stage that PropScreen begins its checks for sensitive information.
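The request path described above can be sketched as a small function. This is a minimal illustration, not PropScreen's actual code: `handle_prompt`, `call_llm`, and `screen_response` are hypothetical names, and the LLM call and the checks are passed in as plain callables so the ordering of the steps is the only thing shown.

```python
from typing import Callable


def handle_prompt(
    prompt: str,
    call_llm: Callable[[str], str],
    screen_response: Callable[[str], str],
) -> str:
    """Forward the prompt to the LLM, then screen the response.

    Both callables are hypothetical stand-ins: `call_llm` represents the
    organization's LLM, and `screen_response` represents the checks that
    run before anything is returned to the user.
    """
    # The prompt itself is forwarded unchanged; checks apply to the response.
    response = call_llm(prompt)
    return screen_response(response)
```

For example, `handle_prompt("hi", lambda p: p.upper(), lambda r: r)` passes the prompt through a fake LLM and a screening step that allows everything.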

The LLM Guard Sensitive Information Check then scans the LLM's response for sensitive information. This scan is not company specific; it looks for general archetypes of sensitive information such as names, email addresses, and bank numbers. The check determines whether sensitive information is present in the model's response, and the results of the scan are saved for later comparison. After the LLM Guard Sensitive Information Check is complete, the response is scanned by the Context Strings Check, and the results of that scan are saved as well.
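A rough sketch of running the two scans and saving both results is shown below. It makes several assumptions that are not taken from the design: a single email regular expression stands in for the LLM Guard scan (it does not use the LLM Guard library), the Context Strings Check is assumed to be a case-insensitive substring match against an organization-supplied list, and the names `ScanResults` and `run_first_checks` are hypothetical.

```python
import re
from dataclasses import dataclass

# Stand-in for the archetype scan: detects email addresses only.
# The real check is performed by LLM Guard and covers more archetypes.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class ScanResults:
    archetype_flagged: bool   # result of the stand-in archetype scan
    context_hits: list[str]   # context strings found in the response


def run_first_checks(response: str, context_strings: list[str]) -> ScanResults:
    """Run both scans and keep both results for the later comparison."""
    archetype_flagged = EMAIL_PATTERN.search(response) is not None
    lowered = response.lower()
    # Assumed semantics: a "hit" is a case-insensitive substring match.
    context_hits = [s for s in context_strings if s.lower() in lowered]
    return ScanResults(archetype_flagged, context_hits)
```

Both results are kept rather than short-circuiting on the first positive, because the next step compares the two.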

The results of the two scans are then sent to a check that determines PropScreen's next action. If the LLM Guard Sensitive Information Check finds no indication of sensitive information in the response and the Context Strings Check returns no hits, the response is judged to contain no organizational sensitive information and is returned to the user. If either check returns a positive result, the response is sent to the Hashed Organization Sensitive Information Check.
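The routing rule above reduces to a logical OR over the two saved results. A minimal sketch, with a hypothetical function name and argument names:

```python
def needs_hashed_check(llm_guard_flagged: bool, context_string_hits: int) -> bool:
    """Return True when the response must go to the hashed check.

    The response is returned directly to the user only when the LLM Guard
    scan is negative AND the Context Strings Check has zero hits.
    """
    return llm_guard_flagged or context_string_hits > 0
```

Only the case where both inputs are negative returns the response directly; every other combination escalates it.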

The Hashed Organization Sensitive Information Check first takes the set of tokens in the model's response and queries them against the Hashed Organization Sensitive Information database. This database contains data that the organization explicitly wants to ban from appearing in LLM responses. The check then proceeds based on the results of the query. If at least one token matches data in the Hashed Organization Sensitive Information database, the check replaces the original LLM response with an error message. That error message is what is returned to the LLM User's interface, preventing the unauthorized disclosure of sensitive information. If no token matches, the Hashed Organization Sensitive Information Check returns the LLM's response to the web client, which then returns it to the user.
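The lookup can be sketched as follows. The details here are assumptions made for illustration and are not specified by the design: SHA-256 as the hash, whitespace tokenization with punctuation stripping and lowercasing, an in-memory set in place of the database, and the names `hash_token`, `hashed_check`, and `ERROR_MESSAGE`.

```python
import hashlib

# Hypothetical wording; the actual error text is not specified.
ERROR_MESSAGE = "The response was withheld because it may contain sensitive information."


def hash_token(token: str) -> str:
    """Normalize a token and hash it (SHA-256 is an assumed choice)."""
    normalized = token.strip(".,;:!?\"'()").lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hashed_check(response: str, banned_hashes: set[str]) -> str:
    """Return the response, or an error if any token matches a banned hash.

    `banned_hashes` stands in for the hashed sensitive-information database.
    """
    tokens = set(response.split())  # assumed tokenization: whitespace split
    if any(hash_token(t) in banned_hashes for t in tokens):
        return ERROR_MESSAGE
    return response
```

With `banned = {hash_token("bluebird")}`, a response mentioning "Bluebird." is replaced by the error message, while an unrelated response passes through unchanged. Storing only hashes means the database itself does not hold the sensitive values in plain text.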

In addition to the interdiction of sensitive information, PropScreen will also create logs of every check it performs regardless of the outcome of the check. The logs will contain the following information:

  • A unique ID for the event

  • The time the log entry was created

  • The prompt that was sent to the LLM

  • The response of the LLM

  • The decision that PropScreen made regarding the response

    • True Negative - Response passed the first two checks and was returned to the user

    • False Positive - Response failed at least one of the first two checks, but passed the Hashed Organization Sensitive Information Check and was returned to the user

    • True Positive - Response failed at least one of the first two checks and failed the Hashed Organization Sensitive Information Check. An error message was returned to the user instead of the original response

    • Error - Denotes any failed responses from the LLM

PropScreen will save these logs to a database named Interdiction Logs, which will be accessible to the security team of the organization deploying PropScreen.
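One possible shape for a log entry, using the fields and decision labels listed above, is sketched here. The class, field, and function names are hypothetical, as are the choices of a UUID for the event ID and an ISO 8601 UTC timestamp; how entries are written to the Interdiction Logs database is not shown.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Decision(Enum):
    TRUE_NEGATIVE = "True Negative"
    FALSE_POSITIVE = "False Positive"
    TRUE_POSITIVE = "True Positive"
    ERROR = "Error"


def classify(first_checks_flagged: bool, hashed_match: bool,
             llm_failed: bool = False) -> Decision:
    """Map the check outcomes onto the four logged decisions."""
    if llm_failed:
        return Decision.ERROR
    if not first_checks_flagged:
        return Decision.TRUE_NEGATIVE
    return Decision.TRUE_POSITIVE if hashed_match else Decision.FALSE_POSITIVE


@dataclass
class InterdictionLog:
    prompt: str
    response: str
    decision: Decision
    # Assumed formats: random UUID and ISO 8601 timestamp in UTC.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_record(self) -> dict:
        """Flatten the entry into a plain dict suitable for storage."""
        return {
            "event_id": self.event_id,
            "created_at": self.created_at,
            "prompt": self.prompt,
            "response": self.response,
            "decision": self.decision.value,
        }
```

An entry is created for every response regardless of outcome, for example `InterdictionLog(prompt, response, classify(True, False))` for a response that was flagged by the first checks but cleared by the hashed check.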
