From Text to Test: Automated MCQ Generation Using NLP

INTRODUCTION

In an era of education where AI has become a crucial part of the classroom, assessing students effectively is paramount. However, manually creating high-quality multiple choice questions (MCQs) is a time-consuming and challenging task for educators. To address this, I consulted the research paper "An Automated Multiple Choice Generation Using NLP Techniques" and built a project that automates the entire process. The approach not only streamlines the creation of MCQs but also ensures the generation of diverse and challenging questions from a given lesson text, saving educators significant time while enhancing the overall learning experience. In this article, I will walk you through the journey of developing this project, highlighting the methodology and implementation details.


METHODOLOGY

Intuitively, when MCQs need to be prepared from a piece of text, we look for the most important and relevant keywords; by replacing those keywords with blanks, we can create the questions. The same idea is applied here with NLP to automate the process.

Before diving into the implementation, it is important to be clear about what needs to be done to achieve the goal of this particular use-case. The following are the high-level steps used in the development of this project.

  1. Text Collection
  2. Sentence Separation
  3. Noise Removal and Word Normalization
  4. Keyword Extraction
  5. MCQ generation


IMPLEMENTATION

Text Collection:

The first step involves collecting relevant text materials that will serve as the basis for generating MCQs. These materials can include textbooks, lecture notes, articles, and other educational resources. The quality and comprehensiveness of the collected text directly impact the relevance and accuracy of the generated questions.

As an example, I have taken a sample text from ChatGPT to save time. The fetched text is converted to lower-case to make processing more convenient.

text = "Modern operating systems, such as Windows, macOS, and Linux, offer a wide range of features and functionalities to cater to different user needs. Windows, developed by Microsoft, is known for its user-friendly interface and extensive software compatibility. It is widely used in personal computers and enterprise environments. macOS, created by Apple, is renowned for its sleek design, robust security features, and seamless integration with other Apple products. It is the preferred choice for creative professionals and those who value a polished user experience."

text = text.lower()        


Sentence Separation:

Once the text materials are collected, the next step is to divide the text into individual sentences. This process, known as sentence segmentation, is essential for isolating meaningful units of information. Sentence separation helps in the subsequent steps of keyword extraction and question formation by breaking down the text into manageable pieces.

For this purpose, the spaCy library has been used, as demonstrated below:

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
sentences  = []
for sent in doc.sents:
    print(sent.text)
    sentences.append(sent.text)        
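
Note that spacy.load("en_core_web_sm") requires the small English model to be installed; if it is missing, it can be downloaded once with:

python -m spacy download en_core_web_sm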

The above code builds a separate list of the sentences in the text. This list will be used for further processing. The output list looks something like this:

[Image: Segregated sentences]


Noise Removal and Word Normalization:

Text data often contains various forms of noise, such as punctuation, special characters, stop words, and irrelevant information. Noise removal means cleaning the text to eliminate these unwanted elements. Word normalization, in the context of the referenced research paper, involves transforming words into a standard format. This includes lemmatization (grouping inflected forms of a word together). These preprocessing steps ensure that the text is in a uniform format, which enhances the accuracy of keyword extraction and subsequent NLP tasks.

The preprocess function below performs this clean-up. It will be applied to every sentence in the list created in the previous step.

def preprocess(text):
  filtered = []
  for token in nlp(text):
    # skip stop words and punctuation marks
    if token.is_stop or token.is_punct:
      continue
    # keep the lemma (base form) of each remaining token
    filtered.append(token.lemma_)
  return " ".join(filtered)

filtered_sentences = [preprocess(sent) for sent in sentences]         

The loop in the above code iterates through every token in the given text, filters out stop words and punctuation marks, and lemmatizes what remains. The last line applies the function to every sentence in the sentences list and collects the results into a new filtered_sentences list, giving the processed sentences of the original text.

The processed text makes it very much easier to extract the keywords for generating the MCQs based on the important concepts relevant to the lesson text.
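
For instance, running one of the example sentences through the function yields roughly the following (the odd lemma "renowne" is produced by the spaCy lemmatizer on lower-cased text, as reflected in the vocabulary later on):

print(preprocess("macos, created by apple, is renowned for its sleek design, robust security features, and seamless integration with other apple products."))
# macos create apple renowne sleek design robust security feature seamless integration apple product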

[Image: Processed segregated sentences]


Keyword Extraction:

Here comes the most important part: extracting the key terms from the text document so that the MCQs are based on the most important and relevant information in the given lesson text.

For the extraction, TF-IDF (Term Frequency - Inverse Document Frequency) is used. This statistical measure evaluates the importance of a word in a document relative to a collection of documents. It helps identify keywords that are both frequent and unique within the given text and creates a vector of the words in a sentence. Each word is represented by a "weight" which basically shows the importance of that word in the text document.

Scikit-Learn's "TfidfVectorizer" class is used to construct these vectors as shown in the code:

from sklearn.feature_extraction.text import TfidfVectorizer
v = TfidfVectorizer()
vectorized_sentences = v.fit_transform(filtered_sentences)        

This code snippet first initializes a TfidfVectorizer object and then fits it on the filtered_sentences list. In other words, the vectorizer takes all of the words (the keywords) as its vocabulary and transforms each sentence into a vector. The vocabulary can be inspected through the "vocabulary_" attribute, as shown:

Code:
print(v.vocabulary_)

Output:
{'modern': 23, 'operating': 26, 'system': 39, 'window': 44, 'macos': 21, 'linux': 20, 'offer': 25, 'wide': 42, 'range': 32, 'feature': 14, 'functionality': 16, 'cater': 1, 'different': 9, 'user': 40, 'need': 24, 'develop': 8, 'microsoft': 22, 'know': 19, 'friendly': 15, 'interface': 18, 'extensive': 13, 'software': 38, 'compatibility': 3, 'widely': 43, 'personal': 27, 'computer': 4, 'enterprise': 10, 'environment': 11, 'create': 5, 'apple': 0, 'renowne': 33, 'sleek': 37, 'design': 7, 'robust': 34, 'security': 36, 'seamless': 35, 'integration': 17, 'product': 30, 'preferred': 29, 'choice': 2, 'creative': 6, 'professional': 31, 'value': 41, 'polished': 28, 'experience': 12}
        

The output of the above code is a dictionary with each keyword as the key and the index of that keyword as its value. We can also get the list of keywords using "get_feature_names_out()", which returns a NumPy array containing only the keyword names, each placed at the index given in the dictionary above.

Code:
feat_names = v.get_feature_names_out()

Output:
array(['apple', 'cater', 'choice', 'compatibility', 'computer', 'create',
       'creative', 'design', 'develop', 'different', 'enterprise',
       'environment', 'experience', 'extensive', 'feature', 'friendly',
       'functionality', 'integration', 'interface', 'know', 'linux',
       'macos', 'microsoft', 'modern', 'need', 'offer', 'operating',
       'personal', 'polished', 'preferred', 'product', 'professional',
       'range', 'renowne', 'robust', 'seamless', 'security', 'sleek',
       'software', 'system', 'user', 'value', 'wide', 'widely', 'window'],
      dtype=object)        

Now that we have the array of keywords placed at their respective indices, we can create a list of scores (weights), one per keyword, stored at the same index as the keyword but in a separate list, as shown:

Code:
scores = []
for word in feat_names:
  index_v = v.vocabulary_.get(word)
  scores.append(v.idf_[index_v])        

Here, "idf_" is an attribute of the vectorizer: an array of IDF weights, so indexing it with a word's vocabulary index returns the weight of that word.
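
For context, scikit-learn computes these weights with smoothing enabled by default (smooth_idf=True): idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents (here, five sentences) and df(t) is the number of sentences containing the term. A quick sanity check explains the repeated value in the output further below:

Code:
import math

n_docs = 5   # the example text has five sentences
df = 1       # a keyword that occurs in only one sentence
print(math.log((1 + n_docs) / (1 + df)) + 1)   # 2.0986..., matching most of the scores below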

Since the main goal is to choose the top-n keywords for creating n MCQs, both the scores list and the feat_names array must be sorted in descending order of score. Bubble sort is used here, swapping the entries of both lists in parallel:

Code:
n = len(feat_names)

for i in range(n):
  for j in range(0, n-i-1):
    if scores[j] < scores[j+1]:
      scores[j], scores[j+1] = scores[j+1], scores[j]
      feat_names[j], feat_names[j+1] = feat_names[j+1], feat_names[j]        
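
As an aside, the same descending sort can be written more idiomatically with NumPy's argsort; a minimal sketch:

Code:
import numpy as np

order = np.argsort(scores)[::-1]          # indices that sort scores, highest first
feat_names = feat_names[order]            # reorder keywords to match
scores = list(np.array(scores)[order])    # reorder scores the same way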

It can be verified as follows:

Code:
n = len(feat_names)
for i in range(n):
  print(feat_names[i], "|", scores[i])

Output:
apple | 2.09861228866811
cater | 2.09861228866811
choice | 2.09861228866811
compatibility | 2.09861228866811
computer | 2.09861228866811
create | 2.09861228866811
creative | 2.09861228866811
design | 2.09861228866811
develop | 2.09861228866811
different | 2.09861228866811
enterprise | 2.09861228866811
environment | 2.09861228866811
experience | 2.09861228866811
extensive | 2.09861228866811
friendly | 2.09861228866811
functionality | 2.09861228866811
integration | 2.09861228866811
interface | 2.09861228866811
know | 2.09861228866811
linux | 2.09861228866811
microsoft | 2.09861228866811
modern | 2.09861228866811
need | 2.09861228866811
offer | 2.09861228866811
operating | 2.09861228866811
personal | 2.09861228866811
polished | 2.09861228866811
preferred | 2.09861228866811
product | 2.09861228866811
professional | 2.09861228866811
range | 2.09861228866811
renowne | 2.09861228866811
robust | 2.09861228866811
seamless | 2.09861228866811
security | 2.09861228866811
sleek | 2.09861228866811
software | 2.09861228866811
system | 2.09861228866811
value | 2.09861228866811
wide | 2.09861228866811
widely | 2.09861228866811
feature | 1.6931471805599454
macos | 1.6931471805599454
window | 1.6931471805599454
user | 1.4054651081081644        

MCQ Generation:

At this point, everything needed for MCQ generation is available. All that remains is to take the top-n keywords by score, locate each keyword in its original sentence, and replace that word with a "____". Note that the original answer is stored along with the other options, which are picked randomly from the keywords.

First, a class is created as follows:

class mcqQuestion:
  def __init__(self, ques, choices, correct_answer):
    self.ques = ques
    self.choices = choices
    self.correct_answer = correct_answer        

This class is used to store each multiple-choice question as an object that can later be displayed.
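
For instance, a single question object could be built and inspected like this (the values here are hypothetical):

q = mcqQuestion(
    ques="____ is developed by microsoft.",
    choices=["linux", "macos", "windows", "apple"],
    correct_answer="windows",
)
print(q.ques, q.choices, q.correct_answer)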

Finally, coming to the generation part, the total number of MCQs to be generated is specified by taking the top-n keywords; "n" in this case is 5. This means five MCQs will be generated, such that the sentences in which these keywords occur are presented as questions with the keyword replaced by a blank, to be filled from the given choices.

Code:
num = 5
mcqs = feat_names[:num]
print(mcqs)

Output:
array(['apple', 'cater', 'choice', 'compatibility', 'computer'],
      dtype=object)        

The following code takes care of mapping each extracted keyword back to the original sentence it appears in (where it is replaced by a blank) and of creating the MCQ from that sentence.

One bug that came up: if a keyword appeared multiple times in the same sentence, the question would show two blanks to be filled by a single choice, which did not make sense. As a simple fix, only the first occurrence of the keyword in a sentence is replaced. That in turn could produce two identical questions for the same sentence, so the Questions list is also checked and any duplicate questions are skipped.

import numpy as np

Questions = []
for mcq in mcqs:
  for doc in filtered_sentences:
    if mcq in doc:
      fil_index = filtered_sentences.index(doc)  # index of the sentence containing the keyword

      for token in nlp(sentences[fil_index]):
        if token.is_stop or token.is_punct:
          continue
        if token.lemma_ == mcq:
          correctAns = token.text
          # replace only the first occurrence of the keyword with a blank
          ques = sentences[fil_index].replace(token.text, "____", 1)

          # draw three distinct wrong options from the keyword pool
          choices = np.array([])
          while len(choices) < 3:
            option = np.random.choice(feat_names, 1, replace=False)
            if option not in choices and not np.equal(option, correctAns):
              choices = np.append(choices, option)

          # add the correct answer and shuffle the four choices
          choices = np.append(choices, correctAns)
          np.random.shuffle(choices)

          objectQues = mcqQuestion(ques, choices, correctAns)

          # skip duplicates: only append if no question with the same text exists
          for q in Questions:
            if q.ques == objectQues.ques:
              break
          else:
            Questions.append(objectQues)

Now that the questions are prepared, let's check the list and see the question, choices and the correct answer:

Code:
for ques in Questions:
  print(ques.ques)
  print(ques.choices)
  print(ques.correct_answer)
  print("\n")

Output:

macos, created by ____, is renowned for its sleek design, robust security features, and seamless integration with other apple products.
['interface' 'enterprise' 'preferred' 'apple']
apple


modern operating systems, such as windows, macos, and linux, offer a wide range of features and functionalities to ____ to different user needs.
['design' 'cater' 'security' 'microsoft']
cater


it is the preferred ____ for creative professionals and those who value a polished user experience.
['experience' 'compatibility' 'choice' 'creative']
choice


windows, developed by microsoft, is known for its user-friendly interface and extensive software ____.
['microsoft' 'seamless' 'compatibility' 'linux']
compatibility


it is widely used in personal ____ and enterprise environments.
['macos' 'seamless' 'computers' 'experience']
computers        

Hence, it is verified that exactly five unique MCQ objects have been created and stored in the list, which, when iterated, gives the actual question, the list of available choices, and the correct answer.

With the base model for generating MCQs in place, it can be integrated with a front-end framework/library such as Streamlit or Flask.

I have implemented the front-end of this model in Streamlit, deployed it on Streamlit Cloud, and added an option for the user to specify the number of questions to generate (for more details, feel free to check out my GitHub repo).
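
This is not the deployed app itself, but a minimal sketch of what such a Streamlit front-end might look like, assuming the Questions list produced by the pipeline above:

import streamlit as st

st.title("From Text to Test: Automated MCQ Generator")
num = st.number_input("Number of questions", min_value=1, max_value=len(Questions), value=5)

# Questions is assumed to come from the generation pipeline above
for i, q in enumerate(Questions[:num], start=1):
    st.write(f"Q{i}. {q.ques}")
    answer = st.radio("Choose one:", list(q.choices), key=f"q{i}")
    if answer == q.correct_answer:
        st.success("Correct!")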


CONCLUSION

In this article, we have explored the development and implementation of an automatic MCQ generator using advanced NLP techniques. This project showcases the potential of incorporating technology to streamline the creation of educational assessments, providing educators with a valuable tool to enhance their teaching processes. By automating the generation of multiple-choice questions, we can save time and ensure the production of high-quality, relevant questions that effectively evaluate student understanding.

I hope you find this project insightful and beneficial. Your feedback is incredibly valuable as it helps improve and refine the application further. I encourage you to explore the GitHub repository, try out the system, and share your thoughts and experiences. Whether it's suggestions for improvement, questions about the implementation, or insights on potential applications, I would love to hear from you.

Thank you for taking the time to read about this project. Please feel free to connect with me on LinkedIn, leave your feedback, or reach out directly.


GitHub Repository

Live App Link


