Text Moderation - Toxicity Classification

This notebook covers different methods of performing text moderation, with a particular focus on toxicity classification. It is divided into three parts:

- Part 1: Text Moderation - Toxicity Classification using Amazon Comprehend API
- Part 2: Text Moderation - Toxicity Classification using Amazon Comprehend Custom Model
- Part 3: Text Moderation - Toxicity Classification using Large Language Models

This notebook is adapted from the following sources:

- aws-samples/amazon-rekognition-code-samples

Text Moderation - Toxicity Classification using Amazon Comprehend API

Amazon Comprehend can perform toxicity analysis through the detect_toxic_content API.

detect_toxic_content performs toxicity analysis on the list of text strings that you provide as input. The API response contains a results list that matches the size of the input list.

The toxicity content labels are: GRAPHIC | HARASSMENT_OR_ABUSE | HATE_SPEECH | INSULT | PROFANITY | SEXUAL | VIOLENCE_OR_THREAT

The response syntax for the API call is shown below.

{
    "ResultList": [
        {
            "Labels": [
                {
                    "Name": "string",
                    "Score": number
                }
            ],
            "Toxicity": number
        }
    ]
}

import boto3


# Setting variables
boto3session = boto3.Session(profile_name='marcasbr+genai-Admin')
region = boto3session.region_name
comprehendrole = boto3session.client('iam').get_role(RoleName='AmazonComprehendServiceRole-access-role')['Role']['Arn']
comprehend = boto3session.client('comprehend', region_name=region)

THRESHOLD = 0.2
response = comprehend.detect_toxic_content(
    TextSegments=[
        {
            "Text": "You can go through the door go, he's waiting for you on the right."
        },
        {
            "Text": "***"
        },
        {
            "Text": "***"
        },
        {
            "Text": "Elon March is a piece of shit, greedy capitalis"
        }
        
    ],
    LanguageCode='en'
)

result_list = response['ResultList']

print(result_list)

for i, result in enumerate(result_list):
    labels = result['Labels']
    detected = [ l for l in labels if l['Score'] > THRESHOLD ]
    if len(detected) > 0:
        print("Text segment {}".format(i + 1))
        for d in detected:
            print("{} score {:.2f}".format(d['Name'], d['Score']))
[{'Labels': [{'Name': 'PROFANITY', 'Score': 0.046799998730421066}, {'Name': 'HATE_SPEECH', 'Score': 0.056699998676776886}, {'Name': 'INSULT', 'Score': 0.10109999775886536}, {'Name': 'GRAPHIC', 'Score': 0.01860000006854534}, {'Name': 'HARASSMENT_OR_ABUSE', 'Score': 0.08330000191926956}, {'Name': 'SEXUAL', 'Score': 0.07109999656677246}, {'Name': 'VIOLENCE_OR_THREAT', 'Score': 0.03519999980926514}], 'Toxicity': 0.08330000191926956}, {'Labels': [{'Name': 'PROFANITY', 'Score': 0.353300005197525}, {'Name': 'HATE_SPEECH', 'Score': 0.15479999780654907}, {'Name': 'INSULT', 'Score': 0.2046000063419342}, {'Name': 'GRAPHIC', 'Score': 0.0812000036239624}, {'Name': 'HARASSMENT_OR_ABUSE', 'Score': 0.12559999525547028}, {'Name': 'SEXUAL', 'Score': 0.16859999299049377}, {'Name': 'VIOLENCE_OR_THREAT', 'Score': 0.08150000125169754}], 'Toxicity': 0.15479999780654907}, {'Labels': [{'Name': 'PROFANITY', 'Score': 0.353300005197525}, {'Name': 'HATE_SPEECH', 'Score': 0.15479999780654907}, {'Name': 'INSULT', 'Score': 0.2046000063419342}, {'Name': 'GRAPHIC', 'Score': 0.0812000036239624}, {'Name': 'HARASSMENT_OR_ABUSE', 'Score': 0.12559999525547028}, {'Name': 'SEXUAL', 'Score': 0.16859999299049377}, {'Name': 'VIOLENCE_OR_THREAT', 'Score': 0.08150000125169754}], 'Toxicity': 0.15479999780654907}, {'Labels': [{'Name': 'PROFANITY', 'Score': 0.6852999925613403}, {'Name': 'HATE_SPEECH', 'Score': 0.5633000135421753}, {'Name': 'INSULT', 'Score': 0.968999981880188}, {'Name': 'GRAPHIC', 'Score': 0.07450000196695328}, {'Name': 'HARASSMENT_OR_ABUSE', 'Score': 0.2046000063419342}, {'Name': 'SEXUAL', 'Score': 0.26249998807907104}, {'Name': 'VIOLENCE_OR_THREAT', 'Score': 0.10050000250339508}], 'Toxicity': 0.9890000224113464}]
Text segment 2
PROFANITY score 0.35
INSULT score 0.20
Text segment 3
PROFANITY score 0.35
INSULT score 0.20
Text segment 4
PROFANITY score 0.69
HATE_SPEECH score 0.56
INSULT score 0.97
HARASSMENT_OR_ABUSE score 0.20
SEXUAL score 0.26

Text Moderation - Toxicity Classification using Amazon Comprehend Custom Model

The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. Platforms struggle to effectively facilitate conversations, leading many communities to limit or completely shut down user comments.

Toxicity classification allows customers in Gaming, Social Media, and many other industries to automatically classify user-generated text content and filter out toxic content, keeping the online environment inclusive.

In this lab, we will use the Amazon Comprehend Custom Classification feature to train a custom model to classify toxic text messages.

(Architecture diagram: Toxicity Classification)

Step 1: Setup Notebook

Run the cell below to install/update Python dependencies if you run the lab in a local IDE. This is optional if you use a SageMaker Studio Jupyter notebook, which already includes the dependencies in the kernel.

# First, let's get the latest installations of our dependencies
%pip install -qU pip
%pip install boto3 -qU
%pip install sagemaker -qU
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.

You can skip the cell below if you are using the SageMaker Studio Data Science kernel or if these packages are already installed in your environment.

# install pandas if you are using a local IDE and it is not installed in your env
# (datetime is part of the Python standard library and needs no installation)
%pip install pandas -qU
Note: you may need to restart the kernel to use updated packages.
import boto3
import sagemaker as sm
import os
import io
import datetime
import pandas as pd

# variables
boto3session = boto3.Session(profile_name='marcasbr+genai-Admin')
smsession = sm.Session(boto_session=boto3session)
data_bucket = smsession.default_bucket()
region = boto3session.region_name

os.environ["BUCKET"] = data_bucket
os.environ["REGION"] = region
role = sm.session.get_execution_role(sagemaker_session=smsession)

# Get the Comprehend service role ARN. For help check https://docs.aws.amazon.com/comprehend/latest/dg/tutorial-reviews-create-role.html
comprehendrole = boto3session.client('iam').get_role(RoleName='AmazonComprehendServiceRole-access-role')['Role']['Arn']


print(f"SageMaker role is: {role}\nDefault SageMaker Bucket: s3://{data_bucket}")
print(f"Comprehend role is: {comprehendrole}")
print(f"AWS region: {region}")



s3 = boto3session.client('s3')
comprehend = boto3session.client('comprehend', region_name=region)
SageMaker role is: arn:aws:iam::754992829378:role/Admin
Default SageMaker Bucket: s3://sagemaker-us-east-1-754992829378
Comprehend role is: arn:aws:iam::754992829378:role/AmazonComprehendServiceRole-access-role
AWS region: us-east-1

Step 2: Prepare custom classification training dataset

Unzip the sample data toxicity.zip into a local folder. The -o flag overwrites existing files so the cell does not block on an interactive prompt if it is re-run.

!unzip -o ./datasets/toxicity.zip -d toxicity_dataset
Archive:  ./datasets/toxicity.zip

This CSV file contains 500 toxic and 500 non-toxic comments from a variety of popular social media platforms. Click on toxicity_en.csv to see a spreadsheet of 1000 English examples.

Columns:

- text: the text of the comment
- is_toxic: whether or not the comment is toxic

(The dataset contained in ../datasets/toxicity.zip is an unaltered redistribution of the toxicity dataset made available by Surge AI under MIT License.)

df = pd.read_csv('./toxicity_dataset/toxicity_en.csv')
df
text is_toxic
0 Elon Musk is a piece of shit, greedy capitalis... Toxic
1 The senile credit card shrill from Delaware ne... Toxic
2 He does that a lot -- makes everyone look good... Toxic
3 F*ck Lizzo Toxic
4 Epstein and trump were best buds!!! Pedophiles... Toxic
... ... ...
995 My maternal abuelita taught me how to make pla... Not Toxic
996 Funnily enough I was looking online last week ... Not Toxic
997 I can't bear how nice this is.\n \n I guess it... Not Toxic
998 Going to buy a share of Tesla just to ensure i... Not Toxic
999 I only saw a couple of these throughout the mo... Not Toxic

1000 rows × 2 columns

We will use this dataset to train a Comprehend Custom Classification model to classify toxic sentences.

Comprehend custom classification supports two modes: multi-class and multi-label. Multi-class mode accepts training datasets in two formats: CSV or augmented manifest file. In this lab, we will train a model in multi-class mode with the training dataset in CSV format.

Refer to this doc for more details about the multi-class data format.

Comprehend custom classifiers require the CSV's first column to be the label and the second column to be the text. The CSV file doesn't require a header. The code below creates a CSV file in the expected format.

df.to_csv('toxicity-custom-classification.csv', header=False, index=False, columns=['is_toxic','text'])
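As a quick sanity check (a minimal sketch, not part of the original lab), you can read the file back headerless to confirm the label-first layout Comprehend expects:

# Read the headerless CSV back; column 0 should be the label, column 1 the text
check_df = pd.read_csv('toxicity-custom-classification.csv', header=None, names=['label', 'text'])
print(check_df['label'].value_counts())  # expect 500 Toxic / 500 Not Toxic
print(check_df.head(2))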

Now, let’s upload the training data to the S3 bucket, ready for Comprehend to access.

s3_key = 'content-moderation-im/text-moderation/toxicity-custom-classification.csv'
s3.upload_file(f'toxicity-custom-classification.csv', data_bucket, s3_key)
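Optionally, you can confirm the object landed in S3 before starting training (a small sketch using the s3 client created earlier):

# Optional: head_object raises a ClientError if the key is missing
obj = s3.head_object(Bucket=data_bucket, Key=s3_key)
print(f"s3://{data_bucket}/{s3_key} uploaded ({obj['ContentLength']} bytes)")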

Step 3: Create Amazon Comprehend Classification training job

Once we have a labeled dataset ready, we are going to create and train an Amazon Comprehend custom classification model with the dataset.

This job can take ~40 minutes to complete. Once the training job is completed, move on to the next step.

import datetime

# Create a Toxicity classifier
account_id = boto3session.client('sts').get_caller_identity().get('Account')

document_classifier_name = 'Sample-Toxicity-Classifier-Content-Moderation'
document_classifier_version = 'v5'
document_classifier_arn = ''
response = None

try:
    create_response = comprehend.create_document_classifier(
        InputDataConfig={
            'DataFormat': 'COMPREHEND_CSV',
            'S3Uri': f's3://{data_bucket}/{s3_key}'
        },
        DataAccessRoleArn=comprehendrole,
        DocumentClassifierName=document_classifier_name,
        VersionName=document_classifier_version,
        LanguageCode='en',
        Mode='MULTI_CLASS'
    )
    
    document_classifier_arn = create_response['DocumentClassifierArn']
    
    print(f"Comprehend Custom Classifier created with ARN: {document_classifier_arn}")
except comprehend.exceptions.ResourceInUseException:
    # Catch the modeled boto3 exception directly; a bare `except Exception`
    # would fail on `error.response` for exceptions without that attribute
    print(f'A classifier with the name "{document_classifier_name}" already exists.')
    document_classifier_arn = f'arn:aws:comprehend:{region}:{account_id}:document-classifier/{document_classifier_name}/version/{document_classifier_version}'
    print(f'The classifier ARN is: "{document_classifier_arn}"')
A classifier with the name "Sample-Toxicity-Classifier-Content-Moderation" already exists.
The classifier ARN is: "arn:aws:comprehend:us-east-1:754992829378:document-classifier/Sample-Toxicity-Classifier-Content-Moderation/version/v5"

Check status of the Comprehend Custom Classification Job

%%time
# Loop and wait for the training to complete. Can take ~40 mins.
from IPython.display import clear_output
import time
from datetime import datetime

# document_classifier_arn was set above whether the classifier was just created or already existed
jobArn = document_classifier_arn

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")
    describe_custom_classifier = comprehend.describe_document_classifier(
        DocumentClassifierArn = jobArn
    )
    status = describe_custom_classifier["DocumentClassifierProperties"]["Status"]
    clear_output(wait=True)
    print(f"{current_time} : Custom document classifier: {status}")
    
    if status == "TRAINED" or status == "IN_ERROR":
        break
        
    time.sleep(60)
10:56:52 : Custom document classifier: TRAINED
CPU times: user 1.72 s, sys: 662 ms, total: 2.38 s
Wall time: 34min 21s

Alternatively, to create a Comprehend Custom Classifier Job manually using the console, go to the Amazon Comprehend Console:

- On the left menu, click “Custom Classification”
- In the “Classifier models” section, click on “Create new model”
- In Model Settings, for Model name, enter a name
- In Data Specification, select “Using Single-label” mode and for Data format select CSV file
- For Training dataset, browse to your data bucket created above and select the file toxicity-custom-classification.csv
- For IAM role, select “Create an IAM role” and specify a prefix (this will create a new IAM Role for Comprehend)
- Click Create

Step 4: Create Amazon Comprehend real time endpoint

Once our Comprehend custom classifier is fully trained (i.e., status = TRAINED), we can create a real-time endpoint. We will use this endpoint to classify text inputs in real time. The following code cells use the Comprehend Boto3 client to create an endpoint, but you can also create one manually via the console; instructions can be found in the subsequent section.

#create comprehend endpoint
model_arn = document_classifier_arn
ep_name = 'toxicity-endpoint'

try:
    endpoint_response = comprehend.create_endpoint(
        EndpointName=ep_name,
        ModelArn=model_arn,
        DesiredInferenceUnits=1,    
        DataAccessRoleArn=comprehendrole
    )
    ENDPOINT_ARN=endpoint_response['EndpointArn']
    print(f'Endpoint created with ARN: {ENDPOINT_ARN}')    
except comprehend.exceptions.ResourceInUseException:
    print(f'An endpoint with the name "{ep_name}" already exists.')
    ENDPOINT_ARN = f'arn:aws:comprehend:{region}:{account_id}:document-classifier-endpoint/{ep_name}'
    print(f'The classifier endpoint ARN is: "{ENDPOINT_ARN}"')
    %store ENDPOINT_ARN
Endpoint created with ARN: arn:aws:comprehend:us-east-1:754992829378:document-classifier-endpoint/toxicity-endpoint
display(endpoint_response)
{'EndpointArn': 'arn:aws:comprehend:us-east-1:754992829378:document-classifier-endpoint/toxicity-endpoint',
 'ResponseMetadata': {'RequestId': '375b1283-0960-46d6-93a6-7cf64927b4db',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '375b1283-0960-46d6-93a6-7cf64927b4db',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '106',
   'date': 'Wed, 27 Mar 2024 10:10:02 GMT'},
  'RetryAttempts': 0}}

Alternatively, use the steps below to create a Comprehend endpoint using the AWS console:

- Go to Comprehend on the AWS Console and click on Endpoints in the left menu
- Click on “Create endpoint”
- Give an Endpoint name; for Custom model type select Custom classification; for Version select no version or the latest version of the model
- For Classifier model, select from the drop-down menu
- For Inference Unit, select 1
- Check “Acknowledge”
- Click “Create endpoint”

It may take ~10 minutes for the endpoint to get created. The code cell below checks the creation status.

%%time
# Loop and wait for the endpoint creation to complete. Takes up to 10 mins.
from IPython.display import clear_output
import time
from datetime import datetime

# ENDPOINT_ARN was set above whether the endpoint was just created or already existed
ep_arn = ENDPOINT_ARN

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")
    describe_endpoint_resp = comprehend.describe_endpoint(
        EndpointArn=ep_arn
    )
    status = describe_endpoint_resp["EndpointProperties"]["Status"]
    clear_output(wait=True)
    print(f"{current_time} : Custom document classifier: {status}")
    
    if status == "IN_SERVICE" or status == "FAILED":
        break
        
    time.sleep(10)
11:16:22 : Comprehend endpoint: IN_SERVICE
CPU times: user 26.5 ms, sys: 8.7 ms, total: 35.2 ms
Wall time: 533 ms

Step 5: Classify message using the real-time endpoint

Once the endpoint has been created, we will classify some sample text messages into toxic or non-toxic categories.

response = comprehend.classify_document(
    Text="Why don't you shoot him?! I hate you all!",  # double quotes so the apostrophe survives
    EndpointArn=ENDPOINT_ARN
)
display(response)
{'Classes': [{'Name': 'Toxic', 'Score': 0.9951152801513672},
  {'Name': 'Not Toxic', 'Score': 0.004884751979261637}],
 'ResponseMetadata': {'RequestId': '60078846-5eda-4b85-a6be-8be582564703',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '60078846-5eda-4b85-a6be-8be582564703',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '107',
   'date': 'Wed, 27 Mar 2024 10:16:25 GMT'},
  'RetryAttempts': 0}}

The inference result returned by the Comprehend endpoint contains a “Classes” node: a list of labeled classes, each with a “Score” representing the confidence of the inference result.

The above response shows that the text message “Why don’t you shoot him?! I hate you all!” has a high confidence score (> 99%) for the “Toxic” category. You can try different inputs to test the Toxicity classification result.
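For instance, to pull out just the highest-scoring class from the response above (a minimal sketch):

# Pick the class with the highest confidence score
top = max(response['Classes'], key=lambda c: c['Score'])
print(f"{top['Name']} ({top['Score']:.2%})")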

Step 6: Classify messages using the real-time endpoint and capture metrics across a random selection of texts

import time
import pandas as pd

# Renamed from `display` to avoid shadowing IPython's display(), which is used in later cells
def classify_text(text, ground_truth):
    start = time.time()
    response = comprehend.classify_document(
        Text=text,
        EndpointArn=ENDPOINT_ARN
    )
    elapsed = time.time() - start

    # Build one result row: keep the class with the highest confidence score
    top = max(response['Classes'], key=lambda c: c['Score'])
    content_length = response['ResponseMetadata']['HTTPHeaders'].get('content-length')
    return {'Text': text[:50], 'ground_truth': ground_truth, 'Prediction': top['Name'],
            'Score': top['Score'], 'Content Length': content_length, 'elapsed_time_sec': elapsed}

# Get random samples from the dataset
nsel = 5
df = pd.read_csv('./toxicity_dataset/toxicity_en.csv')
df_selection = df.sample(n=nsel)

# Collect rows in a list and build the DataFrame once; concatenating onto an
# empty DataFrame in a loop raises a pandas FutureWarning and is slower
rows = [classify_text(row['text'], row['is_toxic']) for _, row in df_selection.iterrows()]
my_df = pd.DataFrame(rows)

my_df.head(nsel)
Text ground_truth Prediction Score Content Length elapsed_time_sec
0 Right!! I was already excited for X Choice, & ... Not Toxic Not Toxic 0.999895 107 1.273570
1 If I learned anything is those teachers are li... Not Toxic Not Toxic 0.999918 107 0.197692
2 Save them babies from their murderous mothers Toxic Toxic 0.999855 108 2.644358
3 FOCH DemoCraps!!! "LET'S GO BRANDON" Toxic Toxic 0.999886 108 0.234662
4 If you build ap with lethal tempo on him you c... Not Toxic Not Toxic 0.999921 107 0.270135
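With the results in a DataFrame, summarizing accuracy and latency takes only a couple of lines (a small follow-up sketch, not part of the original lab):

# Aggregate metrics over the sampled texts
accuracy = (my_df['Prediction'] == my_df['ground_truth']).mean()
print(f"Accuracy on this sample: {accuracy:.0%}")
print(f"Mean latency: {my_df['elapsed_time_sec'].mean():.3f}s, max: {my_df['elapsed_time_sec'].max():.3f}s")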

Cleanup

Cleanup is optional; skip it if you plan to execute the subsequent notebooks, which reuse these resources.

# Delete the Comprehend Endpoint
resp = comprehend.delete_endpoint(EndpointArn=ENDPOINT_ARN)
display(resp)

You will need to wait a few minutes, until the Comprehend endpoint is deleted and the classifier is no longer in use, before running the cell below.
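Rather than waiting a fixed amount of time, you can poll until the endpoint is gone; a minimal sketch, relying on describe_endpoint raising ResourceNotFoundException once deletion completes:

# Optional: poll until the endpoint no longer exists before deleting the classifier
import time

while True:
    try:
        status = comprehend.describe_endpoint(EndpointArn=ENDPOINT_ARN)["EndpointProperties"]["Status"]
        print(f"Endpoint status: {status}")
        time.sleep(30)
    except comprehend.exceptions.ResourceNotFoundException:
        print("Endpoint deleted.")
        break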

# Delete the Comprehend Custom Classifier 
resp = comprehend.delete_document_classifier(DocumentClassifierArn=document_classifier_arn)
display(resp)

Conclusion

In this lab, we trained an Amazon Comprehend custom classifier using a sample toxicity dataset and deployed the custom classifier to a Comprehend endpoint to serve real-time inference.