PropScreen's Checks

LLM Guard Sensitive Information Check

LLM Guard Sensitive Information Check is a scan that uses LLM Guard's Sensitive Scanner to determine whether any broad categories of sensitive information exist in an LLM's response message. The Sensitive Scanner searches for the following types of data:

  • CREDIT_CARD - Credit Card Numbers

  • CRYPTO - Bitcoin wallet addresses (currently only BTC is supported)

  • EMAIL_ADDRESS - Email addresses

  • URL - Uniform Resource Locator

  • IBAN_CODE - International Bank Account Numbers (IBANs)

  • IP_ADDRESS - IPv4 and IPv6 Addresses

  • PERSON - Names of people, must be in NAME SURNAME format and can include middle names or middle initials

  • PHONE_NUMBER - Phone Numbers

  • US_SSN - United States Social Security Numbers

  • US_BANK_NUMBER - United States bank account numbers

  • UUID - Universally Unique Identifiers

In PropScreen's case, the LLM Guard Sensitive Scanner uses a Named Entity Recognition (NER) model, specifically mdeberta-v3-base_finetuned_ai4privacy_v2, together with a library of regex patterns to determine whether sensitive information is present in the model response.
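To make the regex half of this scan concrete, here is a minimal sketch of pattern-based detection. The patterns below are simplified stand-ins for illustration, not LLM Guard's actual pattern library, and the NER model step is omitted entirely:

```python
import re

# Hypothetical, simplified patterns for three of the entity types above.
# Real detection (e.g. in LLM Guard) uses a much richer pattern library
# plus an NER model for entities like PERSON that regex cannot capture.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_response(text: str) -> list[str]:
    """Return the entity types whose patterns match the response."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]

print(scan_response("Contact me at jane.doe@example.com, SSN 123-45-6789"))
# -> ['EMAIL_ADDRESS', 'US_SSN']
```

A response is flagged as containing sensitive information whenever any entity type matches; the check does not need to know in advance which specific values it is looking for.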

The purpose of the LLM Guard Sensitive Information Check is to perform a broad check of a model's response. Because the other checks run against specific databases, the Sensitive Information Check is necessary to expand the range of data types that can be flagged as sensitive information.

Additionally, the LLM Guard Sensitive Information Check allows PropScreen to identify sensitive information such as Social Security numbers or names without actually knowing them. Take the following scenario as an example: an LLM response containing a customer's credit card information can be interdicted through the following steps:

  1. LLM Guard Sensitive Information Check determines that there is sensitive information in the form of a credit card number in the response.

  2. The model response is sent to the Hashed Organizational Sensitive Information Check, where the tokens are hashed and checked against the hashes in the database.

  3. A match is found between a hashed token and a hash in the database.

  4. The interdiction functionality of PropScreen takes effect and the model response is blocked.

In this scenario, the LLM Guard Sensitive Information Check gives PropScreen the ability to determine that a client's credit card number is in the response without knowing the number in cleartext, and without resorting to a brute-force approach of checking every response against the Sensitive Information Hash Database.
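The hashed-matching step in the flow above can be sketched as follows. The scheme shown (unsalted SHA-256 over whitespace-delimited tokens) is an assumption for illustration, not necessarily PropScreen's actual construction:

```python
import hashlib

def sha256_hex(token: str) -> str:
    """Hash a single response token; assumed unsalted SHA-256 for illustration."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()

# Hashes the organization enrolled ahead of time; the check never sees
# the cleartext values behind them.
sensitive_hashes = {sha256_hex("4111111111111111")}  # hypothetical card number

def check_response(response: str) -> bool:
    """Return True if any token in the response matches an enrolled hash,
    i.e. the response should be blocked."""
    return any(sha256_hex(tok) in sensitive_hashes for tok in response.split())

print(check_response("Your card 4111111111111111 is on file"))  # True -> block
print(check_response("Your card ending in 1111 is on file"))    # False -> pass
```

Because only hashes are stored and compared, a match confirms the presence of the sensitive value without ever reconstructing it.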

In short, the LLM Guard Sensitive Information Check allows dynamic scanning of the LLM's response to determine whether any well-known types of sensitive information exist in it.

Context Strings Check

The purpose of the Context Strings Check is to provide another, narrower but more contextually relevant check of the LLM's response. This check is conducted by comparing the LLM response against a database of context strings: data that are not themselves a form of sensitive information, but that could be present in a response that includes an unauthorized disclosure of sensitive information.

An example can illustrate the functionality of the Context Strings Check. Consider an organization that wants to prevent the disclosure of the names of its corporate clients. Corporations and other entities usually do not possess names that would trigger a positive result from the Sensitive Information Check. However, if the database that the Context Strings Check is configured to check against contains context words such as client, ltd, corporate, business, corp, inc, or customers, there is a higher likelihood that the response will be flagged for further inspection:

  1. A response is generated by the LLM containing the name of a business that the organization considers sensitive information.

  2. The response passes the LLM Guard Sensitive Information Check, as no well-known forms of sensitive information are disclosed in it.

  3. The response fails the Context Strings Check, as words in the response are present in the Context Flag Database.

  4. The response is sent to the Hashed Organizational Sensitive Information Check, fails that check, and is blocked.
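The escalation decision in step 3 can be sketched as a simple membership test against the organization's context flags. The word list here is taken from the example above; how PropScreen actually tokenizes and normalizes responses is an assumption:

```python
# Illustrative context flags; a real deployment would load these from the
# Context Flag Database configured by the organization.
CONTEXT_FLAGS = {"client", "ltd", "corporate", "business", "corp", "inc", "customers"}

def context_strings_check(response: str) -> bool:
    """Return True if the response contains any context flag and should be
    escalated to the Hashed Organizational Sensitive Information Check."""
    tokens = {tok.strip(".,;:").lower() for tok in response.split()}
    return not tokens.isdisjoint(CONTEXT_FLAGS)

print(context_strings_check("Acme Corp remains our largest client."))  # True
print(context_strings_check("The weather is nice today."))             # False
```

Note that a positive result here does not block the response on its own; it only routes the response to the deterministic hashed check.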

In short, the Context Strings Check flags responses that do not contain well-known forms of sensitive information for further screening, based on the context flags provided by the organization in its database.

Hashed Organizational Sensitive Information Check

The Hashed Organizational Sensitive Information Check serves as a deterministic check that verifies the existence of data the organization has determined it does not want present in any LLM response. Every token in the model response is checked against every hash stored in the Sensitive Information Hash Database.

A trade-off to consider with the Hashed Organizational Sensitive Information Check is that, although it is a more secure approach with regard to the confidentiality of the organization's sensitive information, it is less robust than checking for plaintext strings inside a model response. A token in the response must match a hashed string in the database exactly. An organization should therefore be extensive and thorough with the hashes it provides, considering variations of the sensitive information and any edge cases.

The Hashed Organizational Sensitive Information Check gives PropScreen the ability to securely determine whether specific organizational sensitive information exists inside the model response without PropScreen ever having access to that data in plaintext.
