Working on the Hard Parts

This tutorial follows the third principle of David Perkins' Seven Principles of Teaching: Work on the Hard Parts. Here, we will become familiar with the HuggingFace library and integrate a pre-trained machine learning model into a pygame-based video game. We will then sharpen our skills by practicing with three more types of models. By the end of this tutorial, you should feel confident exploring the HuggingFace library on your own.

Installation Requirements

It is recommended to set up a virtual environment for the installations below. See Installing packages using pip and virtual environments.

To ensure all libraries are installed correctly, see the HuggingFace Quicktour.

Installation | Version | Links
Python | 3.9.13 or above | Python Downloads page
Pygame | 2.4.0 or above | Pygame Getting Started wiki
Pytorch | 2.0.1 with computing platform CUDA 11.8 | Pytorch website
Requests | 2.31.0 or above | Installing Packages
HuggingFace Transformers | 4.29.2 or above | Transformers installation

Try This Out!

This game uses the models in this tutorial to power non-player characters the player can talk to:

To see the game above without AI models, check out this simple pygame example:

To learn about the basics of pygame, check out this blog post: Intro to Pygame: Pygame basics for your first video game!

HuggingFace and Natural Language Processing

The goals of this tutorial are to:

  • Explore a variety of language models from the HuggingFace library
  • Load the models into a pygame-based video game
  • Use the models to generate text for non-player characters (NPCs) that a player can interact with

Natural Language Processing (NLP) is the practice of training and using machine learning models on linguistic data to perform a task. Tasks may include text classification (assigning a label to text), question answering, text generation, and more. HuggingFace is a great source for all kinds of models and datasets, including those for NLP.

Pipelines

There are two main ways to use a publicly-available model:

  1. The Slow Way - Manually loading a model and tokenizer into variables. This requires encoding text data (converting it to numeric values) before it can be inputted into the model, and decoding the model's output.
  2. The Fast Way - Pipelines. The HuggingFace pipeline() function is a wrapper for models that automatically encodes and decodes data. It also lets you specify a task, i.e. what you want the model to do. Each task has an out-of-the-box default model and tokenizer, or a specific model can be supplied. See the pipeline API reference for more information.

The section below will use both methods to implement a conversational NLP model.
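Before that, here is a quick taste of the fast way: a single pipeline() call runs a task end to end, using the task's default model. This is only a minimal illustration (the sentiment-analysis task and example sentence are not part of the game):

# a minimal sketch of "the fast way": one pipeline() call handles
# tokenization, inference, and decoding using the task's default model
from transformers import pipeline

classifier = pipeline(task="sentiment-analysis")
classifier("I love writing games in pygame!")
# returns a list like [{'label': 'POSITIVE', 'score': ...}]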

Conversational Models

"Conversational response modelling is the task of generating conversational text that is relevant, coherent and knowledgeable given a prompt. These models have applications in chatbots, and as a part of voice assistants." - HuggingFace Guide on Conversational NLP Tasks.

In our video game, the goal is to make a character that the player can chat back-and-forth with. We will use a conversational model to do so.

The Slow Way - Manually Loading a Model and Tokenizer

First, we need to load the conversational model facebook/blenderbot-400M-distill into a tokenizer and a model:

  • The tokenizer takes text data and turns it into a list of numbers (tokens), where each token represents a certain word or character. This step is needed for the model to process the data.
  • The model takes a tokenized input and generates a response that is also tokenized. This response must be decoded (converted from numbers into words) using the tokenizer.

Note: Models can be found at https://huggingface.co/models. On each model’s page, look for the "</> Use in Transformers" button for the code needed to initialize the model.
# set up a chatbot with the model facebook/blenderbot-400M-distill
# code to initialize model found at: https://huggingface.co/facebook/blenderbot-400M-distill?text=Hi.

# import libraries
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# set up tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/blenderbot-400M-distill")

Tokenizers

Let's write a sentence for the model to respond to:

# a sentence for our model to respond to
utterance = "What is your favourite colour?"
utterance
'What is your favourite colour?'

Convert the sentence into a format the model can process (PyTorch tensors):

# return_tensors='pt' makes the inputs into pytorch tensors
# otherwise, the tokenizer will return lists
inputs = tokenizer(utterance, return_tensors='pt')
inputs
{'input_ids': tensor([[ 714,  315,  414, 6179, 7796,   38,    2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}

The tokenizer has encoded the sentence into input_ids. Note that the end-of-sequence marker, </s>, gets its own token. Ignore attention_mask for now.

Word input_id
'What' 714
'is' 315
'your' 414
'favourite' 6179
'colour' 7796
'?' 38
'</s>' 2
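To check this mapping yourself, the tokenizer can convert the ids back into their individual tokens (a quick sketch; the exact token strings depend on the tokenizer's vocabulary):

# map each input_id back to its token string
ids = inputs.input_ids[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(ids)
print(list(zip(tokens, ids)))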

We can also use the tokenizer as a decoder:

# decode the sentence we just encoded
tokenizer.decode(inputs.input_ids[0])
' What is your favourite colour?</s>'

Let's get the model's response to our question:

# unpack (**) the inputs variable into the model
response = model.generate(**inputs)
response
c:\Users\Christina\Desktop\Python\Digital Engineering Fellowship 2023\Christina-Kampel-Draft-2023\ai-game-env\lib\site-packages\transformers\generation\utils.py:1346: UserWarning: Using `max_length`'s default (60) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
tensor([[   1,  863, 2297, 3183,  315, 3002,   21,  228,  714,  315, 4228,   38,
          228,  946,  304,  360,  265, 2297, 3183,   38,    2]])

Decode the response into words:

# The response is a 2D tensor (a batch containing one sequence), so we use [0] to access the encoded data
tokenizer.decode(response[0])
'<s> My favorite color is blue.  What is yours?  Do you have a favorite color?</s>'

Padding and Attention Masks

Above, we gave the model one sentence to respond to. What if we want to give it a batch of a few sentences? Then, we need to do two things:

  1. Make all of the encoded tensors the same length by padding them (adding filler tokens to the shorter sentences).
  2. Give the model an attention mask - a tensor that tells the model which tokens are important and which tokens are padding.

For more information on padding and attention masks, see https://lukesalamone.github.io/posts/what-are-attention-masks/.

Let's make a batch of sentences for the model to respond to:

# a batch of sentences for our model to respond to
utterance_batch = ["What is your favourite colour?",
                   "I like coding. What do you like to do?",
                   "What time is dinner?"]

We can control which side of the input the padding tokens are added to:

# tell the tokenizer to pad from the left
tokenizer.padding_side = 'left'

We can also specify which token is used for padding. This is not always needed. Here, we are using the "end of sequence" token for padding:

tokenizer.pad_token = tokenizer.eos_token

Use the updated tokenizer to encode the batch:

# encode the batch
input_batch = tokenizer(utterance_batch, return_tensors='pt', padding=True)
input_batch
{'input_ids': tensor([[   2,    2,    2,    2,    2,    2,  714,  315,  414, 6179, 7796,   38,
            2],
        [ 281,  398, 6601,  278,   21,  714,  361,  304,  398,  287,  361,   38,
            2],
        [   2,    2,    2,    2,    2,    2,    2,  714,  552,  315, 5048,   38,
            2]]), 'attention_mask': tensor([[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]])}

Notice:

  • the padding token is 2, which is added on the left of the shorter sequences
  • each tensor in input_ids has a corresponding tensor in attention_mask (see below)

The attention mask tells the model if a token in input_ids is important (1), or is a padding value, and therefore not important (0).

# Show the data for only the first sentence in the batch

# encoded ids
first_sentence_ids = input_batch.input_ids[0]
# attention mask
first_sentence_mask = input_batch.attention_mask[0]
print(f"input_ids = {first_sentence_ids}\nattention_mask = {first_sentence_mask}")
input_ids = tensor([   2,    2,    2,    2,    2,    2,  714,  315,  414, 6179, 7796,   38,
           2])
attention_mask = tensor([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1])

Now, let's pass the entire batch to the model and get its responses:

# Unpacking (**) is important here because it gives the model the attention_mask
response_batch = model.generate(**input_batch)
response_batch
tensor([[   1,  863, 2297, 3183,  315, 3002,   21,  228,  714,  315, 4228,   38,
          228,  946,  304,  360,  265, 2297, 3183,   38,    2,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0],
        [   1,  281,  398,  287,  525, 1620, 1012,  298, 1484, 2842,   21,  714,
          906,  306, 6601,  278,  361,  304,  361,   38,  228,    2,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0],
        [   1,    1,  417,  267, 1336,  315,  403, 1226,   33, 2527,   21,  228,
          281,  632,  655,  287,  627,  265,  893, 1718,  306,  508,  558, 2595,
           91,   80,  298, 3597, 1884,   90,   21,    2]])

Decode the responses:

for item in response_batch:
    print(tokenizer.decode(item))
<s> My favorite color is blue.  What is yours?  Do you have a favorite color?</s><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>
<s> I like to play video games and watch movies. What kind of coding do you do? </s><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>
<s><s> Dinner is at 8:30.  I am going to make a big pot of spaghetti and meatballs.</s>

The Fast Way - Pipelines

The pipeline() wrapper performs the same tasks as above, but automatically encodes and decodes text!

Let's start with the same model and tokenizer as before:

# set up the model and tokenizer (same as before)
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/blenderbot-400M-distill")
# set the tokenizer to left padding using the eos token (same as before)
tokenizer.padding_side = 'left'
tokenizer.pad_token = tokenizer.eos_token

Use the pipeline wrapper on the model and tokenizer to create a chatbot:

from transformers import pipeline

# the chatbot - since task="conversational", pipeline returns a ConversationalPipeline
blenderbot = pipeline(task="conversational", model=model, tokenizer=tokenizer)

The chatbot is a ConversationalPipeline object, which accepts a Conversation object as its input:

from transformers import Conversation

# start a conversation with a chatbot - no need for encoding!
# conversation_id is manually set for reproducibility
# if conversation_id is not set, an id is randomly generated
conversation = Conversation("Hi. How are you?", conversation_id="100")
conversation
Conversation id: 100 
user >> Hi. How are you? 

Our conversation has unprocessed user input, so we can pass it to the chatbot to get a response:

# get the bot's response
blenderbot(conversation)
Conversation id: 100 
user >> Hi. How are you? 
bot >>  I'm doing well, thank you. How about yourself? Do you have any plans for the weekend? 

The bot's response has been appended to the Conversation object! This way, the object stores the conversation history:

# show the updated conversation (chat history)
conversation
Conversation id: 100 
user >> Hi. How are you? 
bot >>  I'm doing well, thank you. How about yourself? Do you have any plans for the weekend? 

The past_user_inputs attribute returns a list of everything the user said:

conversation.past_user_inputs
['Hi. How are you?']

The generated_responses attribute returns a list of everything the bot said:

conversation.generated_responses
[" I'm doing well, thank you. How about yourself? Do you have any plans for the weekend?"]

The add_user_input() method allows us to add new user input to the conversation:

Note: The chatbot can only respond to conversations that have unprocessed user input.
# add user input
conversation.add_user_input("What do you want to do this weekend?")
conversation
Conversation id: 100 
user >> Hi. How are you? 
bot >>  I'm doing well, thank you. How about yourself? Do you have any plans for the weekend? 
user >> What do you want to do this weekend? 
# chatbot responds to the new input
blenderbot(conversation)
Conversation id: 100 
user >> Hi. How are you? 
bot >>  I'm doing well, thank you. How about yourself? Do you have any plans for the weekend? 
user >> What do you want to do this weekend? 
bot >>  I'm going to a concert with some friends. I've never been to one before. 

Trimming a Conversation

The "Conversation input is too long" warning may appear after only a few back-and-forth exchanges. The pipeline automatically trims the input, but manual trimming is also an option. This is useful if you only want to show the most recent few lines of a conversation, not the entire chat history.

# define trimming function
def trim_convo(conversation):
    """Trim the earliest user and bot lines from a Conversation.

    Parameters:
    - conversation (transformers.pipelines.conversational.Conversation object): conversation to trim

    Returns:
    - Trimmed conversation (transformers.pipelines.conversational.Conversation object)
    """
    try:
        conversation.past_user_inputs.pop(0)
        conversation.generated_responses.pop(0)
        return conversation
    except IndexError:
        print("Conversation is too short to be trimmed.")
# test out the function
trim_convo(conversation=conversation)

# see results
conversation
Conversation id: 100 
user >> What do you want to do this weekend? 
bot >>  I'm going to a concert with some friends. I've never been to one before. 

Model Caveats

While it is useful for producing a back-and-forth conversation, the blenderbot model does not store information from the entire conversation in its responses. For example, if you tell the blenderbot that your favourite colour is blue, and then ask it what your favourite colour is, it will not remember the answer. It may also lose the context of the conversation and give answers that are nonsensical or unrelated to the question.
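You can check this yourself with a short exchange (a sketch only; the bot's replies vary from run to run, so no outputs are shown):

# tell the bot a fact, then ask it about that fact
memory_test = Conversation("My favourite colour is blue.")
blenderbot(memory_test)

memory_test.add_user_input("What is my favourite colour?")
blenderbot(memory_test)

# inspect the replies - the second one typically does not repeat the fact given above
print(memory_test.generated_responses)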

Question-Answering Models

"Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document. Some question answering models can generate answers without context!" - HuggingFace Guide on Question-Answering Tasks

For our video game, the goal is to make a non-player-character (NPC) that can answer questions about the game. This means that context - the information the model uses in its responses - is important.

For this tutorial, let's compare two question-answering models:

  • distilbert-base-cased-distilled-squad: an extractive model, meaning that it extracts the answer out of the given context
  • t5-base: a text-to-text generation model that has a wide range of applications such as question-answering, translating and summarizing. This model generates new text based on the given context.

Extractive Model

Set up the model using a pipeline and task="question-answering". Note that distilbert-base-cased-distilled-squad is the default model for this task, so there is no need to specify the model and tokenizer when we are just testing the model out.

Note: In production, it’s good practice to specify the model and tokenizer as was done for the Conversational Model.
# imports
from transformers import pipeline

# set up model
qa_model = pipeline(task="question-answering")
No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
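If you do want to pin the model, as the warning suggests, it can be passed explicitly. A brief sketch (the variable name qa_model_pinned is just for illustration):

# pin the model name (and optionally the revision from the warning above)
# so the pipeline does not depend on the library's changing defaults
qa_model_pinned = pipeline(task="question-answering",
                           model="distilbert-base-cased-distilled-squad",
                           revision="626af31")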

Let's give the model a question and some context, and see what its response is:

# set up question and context
question = "Where is the key?"
context = "The key is at the top of the tree."

# get model's response
qa_model(question=question, context=context)
{'score': 0.2849337160587311,
 'start': 14,
 'end': 33,
 'answer': 'the top of the tree'}
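The start and end values are character offsets into the context string, so the extracted answer can be recovered by slicing:

# 'start' and 'end' are character positions within the context string
context[14:33]
'the top of the tree'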

Text-to-Text Generation Model

Set up the model using a pipeline and task="text2text-generation". Note that t5-base is the default model for this task.

t2t_model = pipeline(task="text2text-generation")
No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
c:\Users\Christina\Desktop\Python\Digital Engineering Fellowship 2023\Christina-Kampel-Draft-2023\ai-game-env\lib\site-packages\transformers\models\t5\tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
  warnings.warn(

Give the model a question and some context. For this model, the "question" and "context" labels are used inside a string as shown below:

t2t_model("question: Where can I find the key? context: The key is at the top of the tree.")
[{'generated_text': 'the top of the tree'}]

Adding to the Context

So far, the two models have given the same responses. To test out their differences, let's give the models a larger chunk of information as their context so they can answer a wider range of questions:

context_large = """This game has the following objects in it: Player Bear, Wall, Tree, Key, Lock and Polar Bear.
                The Player Bear is a character controlled by you, the user. You can use the arrow keys to make the Player Bear move around,
                 and the RETURN or ENTER keys to talk with other chatbots. The wall is an impassible obstacle.
                  The tree and the lock are interactive objects. You can climb the tree to find the key at the top.
                   Once you have the key, you can use the key to open or unlock the lock.
                    You can talk to the Polar Bear as well. "NPC" stands for "non-player character". The Polar Bear is a conversational chatbot NPC that uses the
                     facebook/blenderbot-400M-distill model. """

Let's compare the models' responses to the same questions:

Question 1: How do I move around?

# extractive model
qa_model(question="How do I move around?", context=context_large)
{'score': 0.5670624375343323, 'start': 186, 'end': 196, 'answer': 'arrow keys'}
# text-to-text model
t2t_model(f"question: How do I move around? context: {context_large}")
[{'generated_text': 'arrow keys'}]

Question 2: How do I get to the key?

# extractive model
qa_model(question="How do I get to the key?", context=context_large)
{'score': 0.3596245348453522,
 'start': 526,
 'end': 549,
 'answer': 'open or unlock the lock'}
# text-to-text model
t2t_model(f"question: How do I get to the key? context: {context_large}")
[{'generated_text': 'climb the tree'}]

Question 3: How many bears are there?

# extractive model
qa_model(question="How many bears are there?", context=context_large)
{'score': 0.3472032845020294,
 'start': 43,
 'end': 92,
 'answer': 'Player Bear, Wall, Tree, Key, Lock and Polar Bear'}
# text-to-text model
t2t_model(f"question: How many bears are there? context: {context_large}")
[{'generated_text': 'Polar Bear'}]

Question 4: Who is the Polar Bear?

# extractive model
qa_model(question="Who is the Polar Bear?", context=context_large)
{'score': 0.5299234986305237,
 'start': 670,
 'end': 698,
 'answer': 'a conversational chatbot NPC'}
# text-to-text model
t2t_model(f"question: Who is the Polar Bear? context: {context_large}")
[{'generated_text': 'conversational chatbot NPC'}]

Model Caveats

As shown above, both models give similar answers. However, neither model can correctly answer Question 3 ("How many bears are there?"). The answer should be "two", which can be inferred from the context but is not explicitly stated. This shows that neither model is good at inferring information from the context. To solve this problem, a different model could be used, or more information could be included in the context to make answers easier for the model to find.
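For example, appending one explicit sentence to the context gives the extractive model a span it can actually find (a sketch; the returned score and exact answer may vary):

# add an explicit statement so the answer exists as a span in the context
context_with_count = context_large + " There are two bears in the game: the Player Bear and the Polar Bear."
qa_model(question="How many bears are there?", context=context_with_count)
# the model can now extract an answer such as 'two' from the added sentence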

Replicating the Conversation Object

When we put the Question-Answering model into production, we may want to store conversation data in a similar way as the Conversation object used for the Conversational Model. This can be done using a dictionary:

# create a conversation dictionary to hold the chat history
# like the Conversation object, past_user_inputs will store the user's input and generated_responses will store the chatbot's responses
conversation2 = {"past_user_inputs": [], "generated_responses": []}

Lines of text can be added using .append():

# add a line to the list of chatbot's responses
conversation2["generated_responses"].append("Hi, I'm a question-answering bot. Ask me a question!")

# add a line to the user input
conversation2["past_user_inputs"].append("How do I get the key?")

# show the conversation
conversation2
{'past_user_inputs': ['How do I get the key?'],
 'generated_responses': ["Hi, I'm a question-answering bot. Ask me a question!"]}

We can get the chatbot's response to the question and print it out:

# get the question from the conversation history
question = conversation2["past_user_inputs"][-1]
question
'How do I get the key?'
# get chatbot's response to the question given the context
qa_model(question=question, context=context_large)
{'score': 0.35603067278862,
 'start': 526,
 'end': 549,
 'answer': 'open or unlock the lock'}
# we only want the 'answer'
response = qa_model(question=question, context=context_large)["answer"]
response
'open or unlock the lock'
# format the answer so the text looks like a sentence - capitalize the first word and add a period at the end
response = response.capitalize() + "."
response
'Open or unlock the lock.'

Finally, add the chatbot's response to the conversation history:

# add response to conversation history
conversation2["generated_responses"].append(response)

# show results
conversation2
{'past_user_inputs': ['How do I get the key?'],
 'generated_responses': ["Hi, I'm a question-answering bot. Ask me a question!",
  'Open or unlock the lock.']}

If needed, we can show the back-and-forth conversation:

# set counters - used to index into the lists
i = 0
j = 0

while i < len(conversation2["generated_responses"]):
    # print bot response
    print("Bot: " + conversation2["generated_responses"][i])

    if j < len(conversation2["past_user_inputs"]):
        # print user input
        print("User: " + conversation2["past_user_inputs"][j])
    
    # increment counters
    i += 1
    j += 1
Bot: Hi, I'm a question-answering bot. Ask me a question!
User: How do I get the key?
Bot: Open or unlock the lock.
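A more compact version of the same loop uses itertools.zip_longest to pair each bot line with the user line that follows it (a sketch; it assumes the bot speaks first, as in conversation2 above):

from itertools import zip_longest

# pair each bot response with the user input that follows it
for bot_line, user_line in zip_longest(conversation2["generated_responses"],
                                       conversation2["past_user_inputs"]):
    if bot_line is not None:
        print("Bot: " + bot_line)
    if user_line is not None:
        print("User: " + user_line)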

Fill-Mask Models

"Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. These models are useful when we want to get a statistical understanding of the language in which the model is trained in." - HuggingFace Guide on Fill-Mask Tasks

For our video game, we will make an NPC that fills in the blanks of a sentence using a fill-mask model.

We will use the distilroberta-base model, the default model for task='fill-mask' when using a pipeline.

Let's set up the model:

Note: Since distilroberta-base is the default model for this task, we do not need to specify the model and tokenizer. We do so anyway because explicitly loading the model and tokenizer is the conventional approach in production.
# import libraries
from transformers import pipeline, AutoTokenizer, AutoModelForMaskedLM

# set up model and tokenizer
fm_tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
fm_model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")

# create chatbot
fm_chatbot = pipeline(task="fill-mask", model=fm_model, tokenizer=fm_tokenizer)

Now, let's test it out. Give the model a sentence containing the mask token ("<mask>") in the place of a missing word:

Note: The input must contain the mask token or the pipeline will raise an error.
# input sentence with missing word
sentence = "Paris is the <mask> of France."

# get the model's output
result = fm_chatbot(sentence)
result
[{'score': 0.6790177226066589,
  'token': 812,
  'token_str': ' capital',
  'sequence': 'Paris is the capital of France.'},
 {'score': 0.05177992954850197,
  'token': 32357,
  'token_str': ' birthplace',
  'sequence': 'Paris is the birthplace of France.'},
 {'score': 0.03825283423066139,
  'token': 1144,
  'token_str': ' heart',
  'sequence': 'Paris is the heart of France.'},
 {'score': 0.024348977953195572,
  'token': 29778,
  'token_str': ' envy',
  'sequence': 'Paris is the envy of France.'},
 {'score': 0.022851353511214256,
  'token': 1867,
  'token_str': ' Capital',
  'sequence': 'Paris is the Capital of France.'}]

The model returned five sentences containing the five words most likely to fill in the mask.
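Note that the exact spelling of the mask token differs between models. Instead of hard-coding "<mask>", it can be read from the tokenizer (a small sketch):

# use the tokenizer's own mask token instead of hard-coding "<mask>"
masked_sentence = f"Paris is the {fm_tokenizer.mask_token} of France."
fm_chatbot(masked_sentence)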

We can select only certain results if needed:

# show only the most likely sentence - the one with the highest score
result[0]['sequence']
'Paris is the capital of France.'

We can also give a summary of the most likely tokens by iterating through the model's output:

# counter for indexing into the model's output
i = 0
# string to hold the most likely words
print_string = ""

while i < len(result):
    # if we have reached the last word, insert a period
    if i == len(result) - 1:
        print_string += result[i]['token_str'] + "."
    # otherwise, insert a comma and space
    else:
        print_string += result[i]['token_str'] + ", "

    i += 1

# show results
print("The most likely words are:" + print_string)
The most likely words are: capital,  birthplace,  heart,  envy,  Capital.
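The same summary can be built more compactly with str.join():

# join the predicted words into one comma-separated string
# .strip() removes the leading space on each token_str
summary = ", ".join(item['token_str'].strip() for item in result)
print("The most likely words are: " + summary + ".")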

Model Caveats

Since this model gets its data from the Internet, the output of certain phrases may include harmful stereotypes. A good example is shown in the Bias, Risks, and Limitations section of the model's information page, where the creators compare the model's responses to the phrases, "The man worked as a <mask>", and, "The woman worked as a <mask>".

Text-Generating Models

"Generating text is the task of producing new text. These models can, for example, fill in incomplete text or paraphrase." - HuggingFace Guide on Text Generation Tasks

For our video game, we will use the text-generating model gpt2 to complete the phrase, "Once upon a time,".

Let's set up the model and tokenizer, and pass them to the pipeline object along with task="text-generation":

from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed

# set up model and tokenizer
tg_tokenizer = AutoTokenizer.from_pretrained("gpt2")
tg_model = AutoModelForCausalLM.from_pretrained("gpt2")

# create chatbot
tg_chatbot = pipeline(task="text-generation", model=tg_model, tokenizer=tg_tokenizer, do_sample=True)

Above, we set do_sample=True. This is not required for text generation, but it enables various decoding strategies when new text is generated. From the HuggingFace transformers documentation:

"do_sample: if set to True, this parameter enables decoding strategies such as multinomial sampling, beam-search multinomial sampling, Top-K sampling and Top-p sampling. All these strategies select the next token from the probability distribution over the entire vocabulary with various strategy-specific adjustments."

Next, let's ask the model to complete a story starting with "Once upon a time,". Note that since the generation relies on randomness, we need to set a seed for reproducibility:

# set the seed for reproducibility
set_seed(50)
# have the model fill in the story
story = tg_chatbot("Once upon a time,")
# show results
story
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
c:\Users\Christina\Desktop\Python\Digital Engineering Fellowship 2023\Christina-Kampel-Draft-2023\ai-game-env\lib\site-packages\transformers\generation\utils.py:1346: UserWarning: Using `max_length`'s default (50) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
[{'generated_text': 'Once upon a time, in the case of Mr. Pate, the most important part of our work, and especially when the subject of the present discussion is taken into consideration, is that the individual and the particular case are quite separate. The particular'}]

To only see the generated text, do the following:

# get only the string of generated text
story_text = story[0]['generated_text']
# show results
story_text
'Once upon a time, in the case of Mr. Pate, the most important part of our work, and especially when the subject of the present discussion is taken into consideration, is that the individual and the particular case are quite separate. The particular'

To continue the story, we can take this output and input it back into the model:

# use the previous output as the new input for the model
story2 = tg_chatbot(story_text)
# get only the string of generated text
story2_text = story2[0]['generated_text']
# show results
story2_text
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Input length of input_ids is 50, but `max_length` is set to 50. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
'Once upon a time, in the case of Mr. Pate, the most important part of our work, and especially when the subject of the present discussion is taken into consideration, is that the individual and the particular case are quite separate. The particular'

What happened here? It looks like no new text was added.

By default, the model has a max_length of 50 output tokens (roughly, words), including the input. To fix this, we could do one of two things:

  1. Increase max_length: A good short-term solution, but not useful if we want to keep expanding on the same text, since this number includes the input text.
  2. Set the max_new_tokens: Controls the maximum number of new words the model generates, not including the input text. A good long-term solution if we want the model to continue expanding one block of text.
# Try again, but this time using max_new_tokens
story2 = tg_chatbot(story_text, max_new_tokens=20)
# get only the string of generated text
story2_text = story2[0]['generated_text']
# show results
story2_text
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
'Once upon a time, in the case of Mr. Pate, the most important part of our work, and especially when the subject of the present discussion is taken into consideration, is that the individual and the particular case are quite separate. The particular case is one which has been dealt with by the Courts so far in this Court, without reference to'

Model Caveats

In production, using gpt2 to continuously expand on the same block of text can result in the model repeating itself after a certain number of iterations. This may look like:

Iteration 1:
> Input: "Once upon a time,"
> Output: "Once upon a time, there was a snake"
Iteration 2:
> Input: "Once upon a time, there was a snake"
> Output: "Once upon a time, there was a snake in the garden"
Iteration 3:
> Input: "Once upon a time, there was a snake in the garden"
> Output: "Once upon a time, there was a snake in the garden in the garden"

As a result, we may need to have the option to reset the story when using this model in production.

Resetting the Story

To reset the story while using the same loaded model, we need to:

  1. Change the input text back to "Once upon a time,".
  2. Change the seed.

If the seed remains the same, the model will generate the same results as before.

Here's proof:

# Restarting the story with the same model, same seed, and original input will generate the same result as before

# set the seed for reproducibility
set_seed(50)
# have the model fill in the story
story = tg_chatbot("Once upon a time,")
# show results
story
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
c:\Users\Christina\Desktop\Python\Digital Engineering Fellowship 2023\Christina-Kampel-Draft-2023\ai-game-env\lib\site-packages\transformers\generation\utils.py:1346: UserWarning: Using `max_length`'s default (50) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
[{'generated_text': 'Once upon a time, in the case of Mr. Pate, the most important part of our work, and especially when the subject of the present discussion is taken into consideration, is that the individual and the particular case are quite separate. The particular'}]
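To actually start a fresh story, keep the original prompt but change the seed (the seed value below is arbitrary, and the output will differ from the one above):

# reset the story: same prompt, different (arbitrary) seed
set_seed(123)
new_story = tg_chatbot("Once upon a time,", max_new_tokens=40)
new_story[0]['generated_text']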

References and Companion Files

References:

Companion Files: