Pygame with AI
Using language models in a pygame-based video game!
- Working on the Hard Parts
- Installation Requirements
- Try This Out!
- HuggingFace and Natural Language Processing
- Pipelines
- Conversational Models
- Question-Answering Models
- Fill-Mask Models
- Text-Generating Models
- References and Companion Files
Working on the Hard Parts
This tutorial follows the third principle of David Perkins' Seven Principles of Teaching: Work on the Hard Parts. Here, we will become familiar with the HuggingFace library and integrate a pre-trained machine learning model into a pygame-based video game. We will then improve our skills by practicing with three more types of models. By the end of this tutorial, you should feel confident exploring the HuggingFace library on your own.
Installation Requirements
It is recommended to set up a virtual environment for the installations below. See Installing packages using pip and virtual environments.
To ensure all libraries are installed correctly, see the HuggingFace Quicktour.
| Installation | Version | Links |
|---|---|---|
| Python | 3.9.13 or above | Python Downloads page |
| Pygame | 2.4.0 or above | Pygame Getting Started wiki |
| Pytorch | 2.0.1 with computing platform CUDA 11.8 | Pytorch website |
| Requests | 2.31.0 or above | Installing Packages |
| HuggingFace Transformers | 4.29.2 or above | Transformers installation |
Try This Out!
This game uses the models in this tutorial to power non-player characters the player can talk to:
- Video Game Playthrough (may require downloading): ai_game_playthrough.mp4
- Video Game Code: ai_game.py
To see the game above without AI models, check out this simple pygame example:
- Video Game Playthrough (may require downloading): simple_pygame_playthrough.mp4
- Video Game Code: simple_pygame.py
To learn about the basics of pygame, check out this blog post: Intro to Pygame: Pygame basics for your first video game!
HuggingFace and Natural Language Processing
The goals of this tutorial are to:
- Explore a variety of language models from the HuggingFace library
- Load the models into a pygame-based video game
- Use the models to generate text for non-player characters (NPCs) that a player can interact with
Natural Language Processing (NLP) is the use of machine learning models trained on linguistic data to perform a task. Tasks may include text classification (assigning a label to text), question answering, text generation, and more. HuggingFace is a great source for all kinds of models and datasets, including those for NLP.
Pipelines
There are two main ways to use a publicly-available model:
- The Slow Way - Manually loading a model and tokenizer into variables. This requires encoding text data (converting it to numeric values) before it can be inputted into the model, and decoding the model's output.
- The Fast Way - Pipelines. The HuggingFace `pipeline()` function is a wrapper for models that automatically encodes and decodes data. It also allows a `task` to be specified, a.k.a. what you want the model to do. Each task has an out-of-the-box default model and tokenizer, or a model can be specified. See the pipeline API reference for more information, and the minimal sketch below.
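For a quick taste of the fast way, here is a minimal sketch (the "sentiment-analysis" task and example sentence are our own illustration, not part of the game):

```python
from transformers import pipeline

# with no model specified, the task's default model and tokenizer are downloaded
classifier = pipeline(task="sentiment-analysis")

# the pipeline encodes the input, runs the model, and decodes the output for us
classifier("I love this game!")
```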
The section below will use both methods to implement a conversational NLP model.
Conversational Models
"Conversational response modelling is the task of generating conversational text that is relevant, coherent and knowledgeable given a prompt. These models have applications in chatbots, and as a part of voice assistants." - HuggingFace Guide on Conversational NLP Tasks.
In our video game, the goal is to make a character that the player can chat back-and-forth with. We will use a conversational model to do so.
The Slow Way - Manually Loading a Model and Tokenizer
First, we need to load the conversational model facebook/blenderbot-400M-distill into a `tokenizer` and a `model`:
- The `tokenizer` takes text data and turns it into a list of numbers (`tokens`), where each token represents a certain word or character. This step is needed for the model to process the data.
- The `model` takes a tokenized input and generates a response that is also tokenized. This response must be decoded (converted from numbers into words) using the `tokenizer`.
# set up a chatbot with the model facebook/blenderbot-400M-distill
# code to initialize model found at: https://huggingface.co/facebook/blenderbot-400M-distill?text=Hi.
# import libraries
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# set up tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/blenderbot-400M-distill")
Let's write a sentence for the model to respond to:
# a sentence for our model to respond to
utterance = "What is your favourite colour?"
utterance
Convert the sentence into a format the model can process (PyTorch tensors):
# return_tensors='pt' makes the inputs into pytorch tensors
# otherwise, the tokenizer will return lists
inputs = tokenizer(utterance, return_tensors='pt')
inputs
The tokenizer has encoded the sentence into `input_ids`. Note that the end of the sequence, `</s>`, is its own token. Ignore `attention_mask` for now.
| Word | input_id |
|---|---|
| 'What' | 714 |
| 'is' | 315 |
| 'your' | 414 |
| 'favourite' | 6179 |
| 'colour' | 7796 |
| '?' | 38 |
| '</s>' | 2 |
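We can check this mapping ourselves; `convert_ids_to_tokens()` is a standard tokenizer method:

```python
# map each input_id back to the token string it represents
tokenizer.convert_ids_to_tokens(inputs.input_ids[0])
```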
We can also use the tokenizer as a decoder:
# decode the sentence we just encoded
tokenizer.decode(inputs.input_ids[0])
Let's get the model's response to our question:
# unpack (**) the inputs variable into the model
response = model.generate(**inputs)
response
Decode the response into words:
# the response is a 2-D tensor (note the double brackets), so we use [0] to access the encoded data
tokenizer.decode(response[0])
Padding and Attention Masks
Above, we gave the model one sentence to respond to. What if we want to give it a batch of a few sentences? Then, we need to do two things:
- Make all of the encoded tensors the same length by padding them (adding filler tokens to the shorter sequences until all of the encoded sentences are the same length).
- Give the model an attention mask - a tensor that tells the model which tokens are important and which tokens are padding.
For more information on padding and attention masks, see https://lukesalamone.github.io/posts/what-are-attention-masks/.
Let's make a batch of sentences for the model to respond to:
# a batch of sentences for our model to respond to
utterance_batch = ["What is your favourite colour?",
                   "I like coding. What do you like to do?",
                   "What time is dinner?"]
We can control which side the padding tokens are applied to:
# tell the tokenizer to pad from the left
tokenizer.padding_side = 'left'
We can also specify which token is used for padding. This is not always needed. Here, we are using the "end of sequence" token for padding:
# use the "end of sequence" token as the padding token
tokenizer.pad_token = tokenizer.eos_token
Use the updated tokenizer to encode the batch:
# encode the batch
input_batch = tokenizer(utterance_batch, return_tensors='pt', padding=True)
input_batch
Notice:
- the padding token is `2`, which appears on the left of each tensor
- each tensor in `input_ids` has a corresponding tensor in `attention_mask` (see below)

The attention mask tells the model if a token in `input_ids` is important (`1`), or is a padding value, and therefore not important (`0`).
# Show the data for only the first sentence in the batch
# encoded ids
first_sentence_ids = input_batch.input_ids[0]
# attention mask
first_sentence_mask = input_batch.attention_mask[0]
print(f"input_ids = {first_sentence_ids}\nattention_mask = {first_sentence_mask}")
Now, let's pass the entire batch to the model and get its responses:
# Unpacking (**) is important here because it gives the model the attention_mask
response_batch = model.generate(**input_batch)
response_batch
Decode the responses:
for item in response_batch:
    print(tokenizer.decode(item))
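The printed responses may still contain special tokens such as the padding and `</s>` markers. To hide them, the tokenizer's standard `skip_special_tokens` argument can be used (a small sketch):

```python
# decode again, dropping special tokens such as padding and </s>
for item in response_batch:
    print(tokenizer.decode(item, skip_special_tokens=True))
```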
The Fast Way - Using a Pipeline
Let's start with the same model and tokenizer as before:
# set up the model and tokenizer (same as before)
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/blenderbot-400M-distill")
# set the tokenizer to left padding using the eos token (same as before)
tokenizer.padding_side = 'left'
tokenizer.pad_token = tokenizer.eos_token
Use the `pipeline` wrapper on the `model` and `tokenizer` to create a chatbot:
from transformers import pipeline
# the chatbot - since task="conversational", pipeline returns a ConversationalPipeline
blenderbot = pipeline(task="conversational", model=model, tokenizer=tokenizer)
The chatbot is a `ConversationalPipeline` object, which accepts a `Conversation` object as its input:
from transformers import Conversation
# start a conversation with a chatbot - no need for encoding!
# conversation_id is manually set for reproducibility
# if conversation_id is not set, an id is randomly generated
conversation = Conversation("Hi. How are you?", conversation_id="100")
conversation
Our conversation has unprocessed user input, so we can pass it to the chatbot to get a response:
# get the bot's response
blenderbot(conversation)
The bot's response has been appended to the `Conversation` object! This way, the object stores the conversation history:
# show the updated conversation (chat history)
conversation
The `past_user_inputs` attribute returns a list of everything the user said:
conversation.past_user_inputs
The `generated_responses` attribute returns a list of everything the bot said:
conversation.generated_responses
The `add_user_input()` method allows us to add new user input to the conversation:
# add user input
conversation.add_user_input("What do you want to do this weekend?")
conversation
# chatbot responds to the new input
blenderbot(conversation)
Trimming a Conversation
The "Conversation input is too long" warning may appear after only a few back-and-forth exchanges. The pipeline automatically trims the input, but manual trimming is also an option. This is useful if you only want to show the most recent few lines of a conversation, not the entire chat history.
# define trimming function
def trim_convo(conversation):
    """Trim the earliest user and bot lines from a Conversation.
    Parameters:
    - conversation (transformers.pipelines.conversational.Conversation object): conversation to trim
    Returns:
    - Trimmed conversation (transformers.pipelines.conversational.Conversation object)
    """
    try:
        # remove the oldest user line and the oldest bot line
        conversation.past_user_inputs.pop(0)
        conversation.generated_responses.pop(0)
        return conversation
    except IndexError:
        print("Conversation is too short to be trimmed.")
# test out the function
trim_convo(conversation=conversation)
# see results
conversation
Model Caveats
While it is useful for producing a back-and-forth conversation, the blenderbot model does not store information from the entire conversation in its responses. For example, if you tell the blenderbot that your favourite colour is blue, and then ask it what your favourite colour is, it will not remember the answer. It may also lose the context of the conversation and give answers that are nonsensical or unrelated to the question.
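A minimal sketch of this caveat, using the `blenderbot` pipeline and `Conversation` class from above:

```python
# a sketch demonstrating the caveat: blenderbot may not recall facts stated earlier
memory_test = Conversation("My favourite colour is blue.")
blenderbot(memory_test)

# ask about the fact we just stated - the answer may not mention "blue"
memory_test.add_user_input("What is my favourite colour?")
blenderbot(memory_test)
```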
Question-Answering Models
"Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document. Some question answering models can generate answers without context!" - HuggingFace Guide on Question-Answering Tasks
For our video game, the goal is to make a non-player-character (NPC) that can answer questions about the game. This means that context - the information the model uses in its responses - is important.
For this tutorial, let's compare two question-answering models:
- distilbert-base-cased-distilled-squad: an extractive model, meaning that it extracts the answer out of the given context
- t5-base: a text-to-text generation model that has a wide range of applications such as question-answering, translating and summarizing. This model generates new text based on the given context.
Extractive Model
Set up the model using a pipeline
and task="question-answering"
. Note that distilbert-base-cased-distilled-squad
is the default model for this task, so there is no need to specify the model and tokenizer when we are just testing the model out.
# imports
from transformers import pipeline
# set up model
qa_model = pipeline(task="question-answering")
Let's give the model a question and some context, and see what its response is:
# set up question and context
question = "Where is the key?"
context = "The key is at the top of the tree."
# get model's response
qa_model(question=question, context=context)
Text-to-Text Model
Set up the model using a `pipeline` and `task="text2text-generation"`. The t5-base model is the default for this task.
# set up model
t2t_model = pipeline(task="text2text-generation")
Give the model a question and some context. For this model, the "question" and "context" labels are used inside a string as shown below:
t2t_model("question: Where can I find the key? context: The key is at the top of the tree.")
context_large = """This game has the following objects in it: Player Bear, Wall, Tree, Key, Lock and Polar Bear.
The Player Bear is a character controlled by you, the user. You can use the arrow keys to make the Player Bear move around,
and the RETURN or ENTER keys to talk with other chatbots. The wall is an impassible obstacle.
The tree and the lock are interactive objects. You can climb the tree to find the key at the top.
Once you have the key, you can use the key to open or unlock the lock.
You can talk to the Polar Bear as well. "NPC" stands for "non-player character". The Polar Bear is a conversational chatbot NPC that uses the
facebook/blenderbot-400M-distill model. """
Let's compare the models' responses to the same questions:
Question 1: How do I move around?
# extractive model
qa_model(question="How do I move around?", context=context_large)
# text-to-text model
t2t_model(f"question: How do I move around? context: {context_large}")
Question 2: How do I get to the key?
# extractive model
qa_model(question="How do I get to the key?", context=context_large)
# text-to-text model
t2t_model(f"question: How do I get to the key? context: {context_large}")
Question 3: How many bears are there?
# extractive model
qa_model(question="How many bears are there?", context=context_large)
# text-to-text model
t2t_model(f"question: How many bears are there? context: {context_large}")
Question 4: Who is the Polar Bear?
# extractive model
qa_model(question="Who is the Polar Bear?", context=context_large)
# text-to-text model
t2t_model(f"question: Who is the Polar Bear? context: {context_large}")
Model Caveats
As shown above, both models give similar answers. However, neither model can correctly answer Question 3 ("How many bears are there?"). The answer should be "two", which can be inferred from the context but is not explicitly stated. This shows that neither model is good at inferring information from the context. To solve this problem, a different model could be used, or more information could be included in the context to make answers easier for the model to find.
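A sketch of the second option: stating the answer explicitly lets the extractive model find it verbatim (`context_extra` is a hypothetical name for this illustration):

```python
# add an explicit statement so the answer appears verbatim in the context
context_extra = context_large + " There are two bears in the game."
qa_model(question="How many bears are there?", context=context_extra)
```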
Building a Chat History
The question-answering pipeline does not track chat history on its own, so we can store it in a dictionary:
# create a conversation dictionary to hold the chat history
# like the Conversation object, past_user_inputs will store the user's input and generated_responses will store the chatbot's responses
conversation2 = {"past_user_inputs": [], "generated_responses": []}
Lines of text can be added using `.append()`:
# add a line to the list of chatbot's responses
conversation2["generated_responses"].append("Hi, I'm a question-answering bot. Ask me a question!")
# add a line to the user input
conversation2["past_user_inputs"].append("How do I get the key?")
# show the conversation
conversation2
We can get the chatbot's response to the question, and print it out:
# get the question from the conversation history
question = conversation2["past_user_inputs"][-1]
question
# get chatbot's response to the question given the context
qa_model(question=question, context=context_large)
# we only want the 'answer'
response = qa_model(question=question, context=context_large)["answer"]
response
# format the answer so the text looks like a sentence - capitalize the first word and add a period at the end
response = response.capitalize() + "."
response
Finally, add the chatbot's response to the conversation history:
# add response to conversation history
conversation2["generated_responses"].append(response)
# show results
conversation2
If needed, we can show the back-and-forth conversation:
# set counters - used to index into the lists
i = 0
j = 0
while i < len(conversation2["generated_responses"]):
    # print bot response
    print("Bot: " + conversation2["generated_responses"][i])
    if j < len(conversation2["past_user_inputs"]):
        # print user input
        print("User: " + conversation2["past_user_inputs"][j])
    # increment counters
    i += 1
    j += 1
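The same printout can be written more idiomatically with `itertools.zip_longest`; an equivalent sketch, assuming the bot always speaks first:

```python
from itertools import zip_longest

# pair each bot response with the user input that followed it
for bot_line, user_line in zip_longest(conversation2["generated_responses"],
                                       conversation2["past_user_inputs"]):
    print("Bot: " + bot_line)
    if user_line is not None:
        print("User: " + user_line)
```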
Fill-Mask Models
"Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. These models are useful when we want to get a statistical understanding of the language in which the model is trained in." - HuggingFace Guide on Fill-Mask Tasks
For our video game, we will make an NPC that fills in the blanks of a sentence using a fill-mask model.
We will use the distilroberta-base model, the default model for `task='fill-mask'` when using a `pipeline`.
Let's set up the model. Since `distilroberta-base` is the default model for this task, we do not need to specify the model and tokenizer. We do so anyway because it is the conventional way of loading a model into production.
# import libraries
from transformers import pipeline, AutoTokenizer, AutoModelForMaskedLM
# set up model and tokenizer
fm_tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
fm_model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")
# create chatbot
fm_chatbot = pipeline(task="fill-mask", model=fm_model, tokenizer=fm_tokenizer)
Now, let's test it out. Give the model a sentence containing the mask token (`<mask>`) in the place of a missing word:
# input sentence with missing word
sentence = "Paris is the <mask> of France."
# get the model's output
result = fm_chatbot(sentence)
result
The model returned five sentences containing the five words most likely to fill in the mask.
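The number of candidates can be changed with the fill-mask pipeline's `top_k` argument:

```python
# return only the two most likely completions instead of the default five
fm_chatbot(sentence, top_k=2)
```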
We can select only certain results if needed:
# show all model results from the sentence "Paris is the <mask> of France."
result
# show only the most likely sentence - the one with the highest score
result[0]['sequence']
We can also give a summary of the most likely tokens by iterating through the model's output:
# counter for indexing into the model's output
i = 0
# string to hold the most likely words
print_string = ""
while i < len(result):
    # if we have reached the last word, insert a period
    if i == len(result) - 1:
        print_string += result[i]['token_str'] + "."
    # otherwise, insert a comma and space
    else:
        print_string += result[i]['token_str'] + ", "
    i += 1
# show results
print("The most likely words are: " + print_string)
Model Caveats
Since this model gets its data from the Internet, the output of certain phrases may include harmful stereotypes. A good example is shown in the Bias, Risks, and Limitations section of the model's information page, where the creators compare the model's responses to the phrases, "The man worked as a <mask>
", and, "The woman worked as a <mask>
".
Text-Generating Models
"Generating text is the task of producing new text. These models can, for example, fill in incomplete text or paraphrase." - HuggingFace Guide on Text Generation Tasks
For our video game, we will use the text-generating model gpt2 to complete the phrase, "Once upon a time,".
Let's set up the model and tokenizer, and pass them to the `pipeline()` function along with `task="text-generation"`:
from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
# set up model and tokenizer
tg_tokenizer = AutoTokenizer.from_pretrained("gpt2")
tg_model = AutoModelForCausalLM.from_pretrained("gpt2")
# create chatbot
tg_chatbot = pipeline(task="text-generation", model=tg_model, tokenizer=tg_tokenizer, do_sample=True)
Above, we set `do_sample=True`. This is not required for text generation, but it enables various decoding strategies when new text is generated. From the HuggingFace transformers documentation:
"do_sample: if set to True, this parameter enables decoding strategies such as multinomial sampling, beam-search multinomial sampling, Top-K sampling and Top-p sampling. All these strategies select the next token from the probability distribution over the entire vocabulary with various strategy-specific adjustments."
Next, let's ask the model to complete a story starting with "Once upon a time,". Note that since the generation relies on randomness, we need to set a seed for reproducibility:
# set the seed for reproducibility
set_seed(50)
# have the model fill in the story
story = tg_chatbot("Once upon a time,")
# show results
story
To only see the generated text, do the following:
# get only the string of generated text
story_text = story[0]['generated_text']
# show results
story_text
To continue the story, we can take this output and input it back into the model:
# use the previous output as the new input for the model
story2 = tg_chatbot(story_text)
# get only the string of generated text
story2_text = story2[0]['generated_text']
# show results
story2_text
What happened here? It looks like no new text was added.
By default, the model has a `max_length` of 50 output tokens (words), including the input. To fix this, we could do one of two things:
- Increase `max_length`: A good short-term solution, but not useful if we want to keep expanding on the same text, since this number includes the input text.
- Set `max_new_tokens`: Controls the maximum number of new words the model generates, not including the input text. A good long-term solution if we want the model to continue expanding one block of text.
# Try again, but this time using max_new_tokens
story2 = tg_chatbot(story_text, max_new_tokens=20)
# get only the string of generated text
story2_text = story2[0]['generated_text']
# show results
story2_text
Model Caveats
In production, using gpt2 to continuously expand on the same block of text can result in the model giving the same output after a certain number of iterations. This may look like:
Iteration 1:
> Input: "Once upon a time,"
> Output: "Once upon a time, there was a snake"

Iteration 2:
> Input: "Once upon a time, there was a snake"
> Output: "Once upon a time, there was a snake in the garden"

Iteration 3:
> Input: "Once upon a time, there was a snake in the garden"
> Output: "Once upon a time, there was a snake in the garden in the garden"
As a result, we may need to have the option to reset the story when using this model in production.
# Restarting the story with the same model, same seed, and original input will generate the same result as before
# set the seed for reproducibility
set_seed(50)
# have the model fill in the story
story = tg_chatbot("Once upon a time,")
# show results
story
References and Companion Files
References:
- Education at Bat: Seven Principles for Educators: https://www.gse.harvard.edu/news/uk/09/01/education-bat-seven-principles-educators
- HuggingFace:
  - Website: https://huggingface.co/
  - Quicktour: https://huggingface.co/docs/transformers/quicktour
  - Pipelines API Reference: https://huggingface.co/docs/transformers/main_classes/pipelines
  - Models Page: https://huggingface.co/models
  - Transformers Documentation: https://huggingface.co/docs/transformers/generation_strategies#:~:text=do_sample%20%3A%20if%20set%20to%20True,with%20various%20strategy%2Dspecific%20adjustments.
- Conversational Models:
  - HuggingFace Guide on Conversational NLP Tasks: https://huggingface.co/tasks/conversational
  - facebook/blenderbot-400M-distill Model Card: https://huggingface.co/facebook/blenderbot-400M-distill?text=Hi
  - Blenderbot tutorial video: https://www.youtube.com/watch?v=FfywuRCPmqY
  - Blenderbot tutorial GitHub: https://github.com/nicknochnack/Blenderbot/blob/main/Blenderbot-Tutorial.ipynb
  - What are Attention Masks? by Luke Salamone: https://lukesalamone.github.io/posts/what-are-attention-masks/
- Question-Answering Models:
  - HuggingFace Guide on Question-Answering Tasks: https://huggingface.co/tasks/question-answering
  - distilbert-base-cased-distilled-squad Model Card: https://huggingface.co/distilbert-base-cased-distilled-squad
  - t5-base Model Card: https://huggingface.co/t5-base
- Fill-Mask Models:
  - HuggingFace Guide on Fill-Mask Tasks: https://huggingface.co/tasks/fill-mask
  - distilroberta-base Model Card: https://huggingface.co/distilroberta-base
  - Bias, Risks, and Limitations of the distilroberta-base Model: https://huggingface.co/distilroberta-base#bias-risks-and-limitations
- Text-Generation Models:
  - HuggingFace Guide on Text Generation Tasks: https://huggingface.co/tasks/text-generation
  - gpt2 Model Card: https://huggingface.co/gpt2?text=Once+upon+a+time%2C
  - ChatGPT-at-Home GitHub Repository: https://github.com/Sentdex/ChatGPT-at-Home/blob/main/app.py
Companion Files:
- Public Repository: Intro-to-Pygame-and-AI
- A pygame-based video game with AI: ai_game.py
- Video game playthrough (may require downloading): ai_game_playthrough.mp4
- Natural language processing models used in ai_game.py: chat_models.py
- Intro to Pygame tutorial: Intro to Pygame: Pygame basics for your first video game!
- A simple pygame example: simple_pygame.py