Can chatbots really pass the Watson Glaser?
Updated: Mar 9
At GFB we are always assessing the impact of new technologies on the capability and security of the testing solutions we provide. This is something that all test publishers constantly review as we try to minimise the possibility of cheating on any assessment, be that SHL’s Verify or the Watson Glaser. New technology brings with it both challenges and opportunities, and we aim to make improvements to address these whenever they are needed.
Remote testing anti-cheating measures
The remote testing of candidates has always drawn some concern, such as whether the person taking the test is the person who was asked to take it. However, with well-designed and correctly administered assessments the pros have always outweighed the cons, and we would suggest this remains the case.
It is worth considering the anti-cheating measures that are already in place with the Watson Glaser:
The Watson Glaser is 'item banked', which means that each candidate is presented with a test made up of 40 questions, drawn from a large item bank. This means no two candidates will be presented with exactly the same test or set of items, dramatically reducing the potential for cheating.
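As an illustrative sketch only (the bank size, item names, and selection rule below are hypothetical assumptions, not Watson Glaser internals), item-banked test assembly can be thought of as sampling without replacement from a large pool:

```python
import random

def assemble_test(item_bank, num_items=40, seed=None):
    """Draw one candidate's test form by sampling items without replacement.

    item_bank: list of question identifiers (hypothetical).
    num_items: questions per candidate (40, per the article).
    seed: optional seed so a draw can be reproduced.
    """
    rng = random.Random(seed)
    return rng.sample(item_bank, num_items)

# Hypothetical bank of 500 items; each candidate gets a different draw,
# so sharing answers between candidates is of little use.
bank = [f"item_{i:03d}" for i in range(500)]
form_a = assemble_test(bank, seed=1)
form_b = assemble_test(bank, seed=2)
```

With a bank many times larger than the test length, the chance of two candidates receiving the same set of items becomes vanishingly small.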
Clients can opt to use remote proctoring to verify the candidate’s identity, which means individuals complete the assessment in a monitored, controlled environment.
A verification test can be conducted at the final stages of the selection process in a controlled environment. The verification test can be randomised. It is also made clear to candidates throughout the process that this will happen if they reach the final stage.
Chatbots’ variable answers
Due to recent concerns around the increasing capability of chatbots, we have conducted a review of their responses to a selection of Watson Glaser questions. Our findings suggest that although the technology has been improving, the accuracy of its responses has yet to be proven. When we put practice questions (for which the answers are provided) to a number of programmes, the conclusions drawn were not always correct, as a number of the questions require a level of understanding that such bots do not possess. An example question, and the response from one of the most sophisticated chatbots we tested, is below:
A study of vocabulary growth in children from ages eight months to six years old shows that the size of the spoken vocabulary increases from zero words at the age of eight months to 2,562 words at age six years.
The example question asks "Does the proposed conclusion follow: None of the children in this study had learned to talk by the age of 6 months."
This conclusion does follow since, according to the statement, the size of the spoken vocabulary at eight months was zero words. However, the chatbot responds:
The proposed conclusion that "none of the children in this study had learned to talk by the age of 6 months" cannot be logically inferred from the information provided in the argument.
The argument states that the size of the spoken vocabulary in children increases from zero words at the age of eight months to 2,562 words at age six years. However, the argument does not provide any information about the children's ability to talk at the age of six months. Therefore, the proposed conclusion cannot be logically derived from the given information, and it cannot be considered a valid conclusion based on the evidence provided in the argument.
Technology advancements in test security
As well as constantly reviewing new technologies we have also been implementing our own advancements.
At the beginning of this year, to further reduce the opportunity for candidates to use the unreliable answers of chatbots, updates were made to the Watson Glaser test so that individuals are not able to cut and paste questions from any section of the test into such programmes. Questions would need to be typed directly into the technology, within the set time limits of the assessment being undertaken, to reveal answers that may or may not be right.
Technology advancements will also allow us to track a user’s response pattern and how many times the test window loses focus (i.e. if someone switches tabs), both of which can help identify candidates to be flagged for further verification assessments.
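A minimal sketch of how such signals might feed a flagging rule is below. The thresholds and inputs here are illustrative assumptions, not GFB’s actual criteria:

```python
import statistics

def should_flag(focus_loss_count, response_times_seconds,
                max_focus_losses=5, min_median_seconds=3.0):
    """Flag a candidate session for a follow-up verification test.

    focus_loss_count: how many times the test window lost focus.
    response_times_seconds: time taken per question.
    Thresholds are hypothetical, chosen only for illustration.
    """
    # Frequent tab-switching may indicate the candidate is consulting
    # another tool; implausibly fast answers may indicate pasted responses.
    too_many_switches = focus_loss_count > max_focus_losses
    too_fast = statistics.median(response_times_seconds) < min_median_seconds
    return too_many_switches or too_fast
```

In practice a flag like this would not fail a candidate outright; per the process described above, it would route them to a supervised verification test.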
No pass for the chatbot
The other thing to note is that answering the question of whether chatbots can pass the Watson Glaser requires defining what a pass actually is. Trained users of the Watson Glaser will be aware that an individual’s responses need to be compared against a norm group to determine whether they have performed well. As accredited psychologists will know, it is percentile scores, not percentages, that are important to consider here. The Watson Glaser is renowned for the high calibre of those being assessed; with such strong comparison groups and the unreliable nature of chatbot responses, we would suggest the likelihood of a candidate using such a tool to successfully secure a suitable score to pass any such assessment would be remote.
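To illustrate why percentiles, not raw percentages, matter against a strong norm group, here is a simple sketch of percentile ranking. The norm-group scores below are invented for illustration:

```python
from bisect import bisect_left

def percentile_rank(candidate_score, norm_group_scores):
    """Percentage of the norm group scoring strictly below the candidate."""
    ordered = sorted(norm_group_scores)
    below = bisect_left(ordered, candidate_score)
    return 100.0 * below / len(ordered)

# Hypothetical high-calibre norm group: a raw score of 30/40 (75%)
# sounds strong, but most of this group scored at least as well.
norm_group = [28, 29, 30, 31, 31, 32, 33, 34, 35, 36]
rank = percentile_rank(30, norm_group)
```

Here only 2 of the 10 norm-group scores fall below 30, so a 75% raw score lands at just the 20th percentile against this (invented) comparison group.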
At GFB we maintain our ambition to give candidates the opportunity to demonstrate their capabilities whilst keeping the possibility of cheating to a minimum. Our ongoing research continues to suggest that, by utilising well-designed and correctly administered assessments, remote testing remains a fair and accessible way of doing so.
We are happy to speak to any organisations wishing to ensure they are testing candidates in the most accurate, fair and reliable way. If you would like to discuss this further do get in touch.