Exploring the Secondary Indicators of Generative AI in Academic Misconduct

Anthony Summers; Thea Vanags

doi:10.47739/

Review Article | Open Access | Volume 12 | Issue 1

Anthony Summers^1*, Thea Vanags^2

^1. University of the Sunshine Coast, School of Health, Discipline of Nursing, Australia
^2. University of the Sunshine Coast, Lead, Academic Integrity Unit, Australia

Corresponding Authors

Anthony Summers, University of the Sunshine Coast, School of Health, Discipline of Nursing, Sippy Downs, Qld 4558, Australia, Tel: 0774565439

Abstract

Background: The growing inappropriate use of GenAI tools has led to an increase in referrals to the Integrity and Compliance Unit of a regional university. There is no single definitive characteristic that indicates this inappropriate use.

Aim: This audit of referrals to the Integrity and Compliance Unit aims to identify secondary characteristics commonly seen in students’ assessments that indicate a GenAI tool may have been used.

Design: A retrospective descriptive audit of anonymised student assessments referred to the Integrity and Compliance Unit for investigation of potential academic misconduct.

Methods: A retrospective audit of anonymised student assessments was undertaken, looking for secondary characteristics of GenAI tool use. Each author reviewed and agreed to the characteristics identified.

Results: The secondary characteristics commonly identified related to formatting, metadata, references and citations, and terminology and language. No one characteristic was definitive in confirming the use of GenAI.

Conclusions: No one characteristic can be used to define if GenAI has been used in an assessment. However, the greater the number of secondary characteristics identified, the higher the confidence an investigator can have that on the balance of probabilities a GenAI tool has been used. Using the secondary characteristics highlighted in this paper as a guide, investigators can be confident in identifying if GenAI has been used or not.

Keywords

• Generative AI, Academic Misconduct, Healthcare Education.

Citation

Summers A, Vanags T (2025) Exploring the Secondary Indicators of Generative AI in Academic Misconduct. Ann Nurs Pract 12(1): 1138.

INTRODUCTION

Generative Artificial Intelligence tools (GenAI) such as ChatGPT, Copilot and Gemini are being pushed as valuable tools within the modern workplace [1]. Their introduction necessitates instructing users on the ethical use of these tools. Universities may not be able to influence those currently in the workplace, but they can influence the future workforce [2,3]. Universities can do this by educating undergraduate students on the ethical use of GenAI tools.

To educate students on the ethical use of GenAI tools, universities must understand how students currently view and use GenAI. Work by Summers et al. [4] highlighted that students saw a benefit to using GenAI ethically, but understood that if it is used unethically, the value of their degree is lost. The importance of their degree is lessened if a student cheats on assessments by using GenAI and does not do the work themselves [4]. This applies not just to high-stakes assessments but to any assessment submitted for marking during their degree. Thus, to ensure that the value of a degree is not undermined, markers of assessments need to be able to detect unethical use of GenAI.

How markers within the discipline of nursing at one university detected unethical use of GenAI was explored by Summers et al. (in press). Some of the common indicators these markers identified were that the prose was robotic, there were no citations, the writing style or voice changed, the text did not answer the question being asked, the response was generic or repetitive, and hallucinated references existed [5]. Once the assessment was deemed to have used GenAI unethically, it was referred to the University’s Integrity and Compliance Unit for further investigation. The role of the Integrity and Compliance Unit is to investigate the suspicions of markers and confirm if academic misconduct has occurred.

The challenge of investigating these cases of alleged misconduct involving GenAI is obtaining enough evidence to be confident that the student had used GenAI in an unethical way. This is difficult because the GenAI scores provided by different software providers are subject to false positives and false negatives, and their accuracy has been called into question [6].

As more assessments were referred to the Integrity and Compliance Unit for investigation in 2024, a robust way of deciding if academic misconduct had occurred was needed. This was required to allow the Integrity and Compliance Unit to have confidence in their determinations that academic misconduct has occurred, as an AI score alone is considered weak evidence.

During the first half of 2024, the authors audited 700 assessments submitted to the Integrity and Compliance Unit to identify common secondary indicators of GenAI use. Of these referrals, 50% came from healthcare disciplines within the university. Business-related and computer study-related disciplines accounted for a further 30%. The remaining 20% came from the other disciplines, including education, law and the arts. The largest single discipline for referrals was nursing.

This was possible as the number of students being referred for academic misconduct was around 10% of the student population. It is worth noting that, given these referrals occurred in the first half of 2024, Course Coordinators lacked confidence and knowledge in the detection of GenAI, and only referred students whose work returned relatively high GenAI scores on the detector tool used at the university (M = 61%, Mdn = 66.5%, Mode = 100%). The GenAI scores were seen as a primary indicator that GenAI may have been used. It was determined that reliance on this single primary indicator was unsafe, as many factors could influence this score. Therefore, the authors determined that secondary indicators were needed to help support the determination of GenAI use.

The secondary indicators identified by this audit fell into the following categories:

AI detection scores
Formatting
Metadata
Referencing and citations
Terminology and language

AI detection scores

Currently, two AI detection tools are used at this university. The first is the Turnitin detection tool, which is visible to the Course Coordinators. The second is ZeroGPT, which is utilised by the Integrity and Compliance Unit, often as a secondary check when Turnitin scores are lower. Both tools provide varying results and are only used as a guide to the unethical use of GenAI. For example, test runs have shown great variance in the Turnitin AI score: a student’s work may return a 0% detection score for GenAI yet contain a multitude of secondary indicators (a false negative). Conversely, a student may have a Turnitin AI score of 100% and no secondary indicators.

The Turnitin AI detection is based on predictable language patterns likely generated by AI [9]. It has a substantial database of text for “training” (approximately 20 years’ worth of text), in contrast to ZeroGPT [9]. Turnitin has stated it will correctly flag 84.2% of documents written by AI as AI-generated, with the caveat that at least 300 words of text are needed to make that evaluation [9]. ZeroGPT claims that it analyses text with “a series of complex and deep algorithms” validated and published in highly reputable papers; however, these citations are not listed on its website (https://www.zerogpt.com/). It claims an accuracy rate of “up to 98%” and an error rate of “lower than 2%”. Saqib and Zia [7] state that ZeroGPT has been trained on a variety of data through a complex deep learning model and can examine text at a sentence level.

Saqib and Zia [7] evaluated four different types of text and the ability of Turnitin and ZeroGPT (and other software) to correctly identify GenAI text. For content created by humans, both Turnitin and ZeroGPT returned AI scores of 0%. For content originally written by humans but subsequently paraphrased by AI, Turnitin (M = 0%) and ZeroGPT (M = 16.4%) failed to detect this as AI generated. For content built totally by AI, Turnitin (M = 80%) and ZeroGPT (M = 78%) performed similarly. Finally, for content that was AI generated but included considerable background and contextual information, ZeroGPT (M = 59.8%) outperformed Turnitin (M = 18.4%, skewed by 4 trials returning 0% and 1 trial returning 92%). These results suggest that there is still a great deal of variability in the accuracy of these tools. Therefore, these tools are only a guide to the potential use of GenAI and cannot be used as definitive proof that GenAI has been used.
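
The skew in the Turnitin mean reported above can be reproduced with a short calculation. The sketch below assumes that the result rests on exactly five trials (four at 0% and one at 92%), which is what the reported figures imply; the variable names are illustrative only. It shows why a mean detection score on its own can mislead: the mean is 18.4%, yet the typical (median) trial detected nothing.

```python
from statistics import mean, median

# Assumed from the description in the text: four trials at 0% and one at 92%.
turnitin_trials = [0, 0, 0, 0, 92]

trial_mean = mean(turnitin_trials)      # 92 / 5 = 18.4
trial_median = median(turnitin_trials)  # the middle value of the sorted trials

print(f"mean = {trial_mean:.1f}%, median = {trial_median:.1f}%")
```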

Formatting

The way a document is formatted can provide many secondary indicators of academic misconduct, not just GenAI misuse. When examining a document for secondary indicators of GenAI misuse, investigators need to be careful not to miss other forms of academic misconduct, such as contract cheating. While some of the following may also exist in other forms of academic misconduct, these common formatting issues were attributed to the unethical use of GenAI at this university in 2024:

Font inconsistencies. These are often subtle and easily missed. But equally, they can sometimes be the easiest way to pick up something untoward in the document. As the investigator scrolls through the document clicking randomly on text and citations with their visual focus on the style and font boxes in Word’s menu bar (rather than on the text being clicked on), subtle differences in font often not visible to the naked eye may be detected. See Figure 1 for an example of this. At other times, these inconsistencies are obvious and different fonts are seen in whole paragraphs/sections throughout the document.
Shaded background or underlining of text can be an artefact of copy/paste.
The reply prompt the student has used is left in the submitted document.
There is extensive but inconsistent use (sometimes accurate, sometimes not) of the linguistically diverse international characters that appear in authors’ names.
There are paragraph or other formatting marks left within the text. These often remain because their appearance is less obvious when the text is reviewed. For example, a carriage return that falls on the far right of a line can be missed by students but can be seen by using the show/hide button in Word to display hidden characters, paragraph markers and tabs.
The reference list is in a different font to the remainder of the document and often formatted in a totally different way.
There is inconsistency in the formatting styles used throughout the document. For example, one paragraph is centre justified and the next is right justified, or superscripts and subscripts are used in one area but not another.

Figure 1: Two Different, but Physically Similar, Fonts.
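
The manual font check described in the list above could be complemented by an automated inventory of the fonts set in a Word file. The following is a minimal sketch, not the method used in this audit. It assumes the submission is a .docx file, reads only the `w:ascii` attribute of each run's `w:rFonts` element, and groups runs that have no explicit font under a hypothetical label, `(inherited from style)`. The function name `fonts_by_character_count` is illustrative.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def fonts_by_character_count(docx_file):
    """Count characters of body text per explicitly set run font.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    counts = Counter()
    for run in root.iter(f"{W}r"):
        text = "".join(node.text or "" for node in run.iter(f"{W}t"))
        if not text:
            continue
        fonts = run.find(f"{W}rPr/{W}rFonts")
        name = fonts.get(f"{W}ascii") if fonts is not None else None
        # Runs without an explicit font take their font from a style.
        counts[name or "(inherited from style)"] += len(text)
    return counts
```

A document written in one sitting would usually show a single dominant entry. A second, visually similar font covering a large block of characters is a prompt for closer inspection rather than proof of anything.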

Metadata

When a paper is submitted using Word, the metadata of the document can provide some simple analytics about potential unethical GenAI usage. Identifying the author of the document and the person who last modified it can raise questions as to whether this is the student’s work. A name other than the student’s can be an indicator of nefarious deeds and should prompt the investigator to ask questions about the authorship of the document, which may point to more serious academic misconduct such as contract cheating.

The other aspect of the metadata to be explored is the editing time for the document. An exceptionally low editing time or an exceptionally high editing time can both be indicators of academic misconduct and potential GenAI use. Both should raise questions for the student to answer about how they produced the submitted document. For example, an editing time of >25 hours for a document created and last modified less than 24 hours earlier has been seen in some submitted documents. The reason for this is unclear and it is suspected that the document was created in one time zone and then emailed to a different time zone.
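
The metadata checks described above can be sketched in code. The example below is an illustration under stated assumptions, not the procedure used by the Integrity and Compliance Unit. It assumes a .docx file that contains both the `docProps/core.xml` and `docProps/app.xml` parts, and relies on Word recording total editing time in minutes in the `TotalTime` property. The function names, the dictionary keys and the simple name comparison are all hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def _text(root, path):
    node = root.find(path, NS)
    return node.text if node is not None else None


def _parse_timestamp(value):
    # Timestamps are stored in the form 2024-03-01T09:00:00Z.
    return datetime.fromisoformat(value.replace("Z", "+00:00")) if value else None


def read_docx_metadata(docx_file):
    """Return author, last modifier, timestamps and editing minutes."""
    with zipfile.ZipFile(docx_file) as archive:
        core = ET.fromstring(archive.read("docProps/core.xml"))
        app = ET.fromstring(archive.read("docProps/app.xml"))
    total = _text(app, "ep:TotalTime")
    return {
        "author": _text(core, "dc:creator"),
        "last_modified_by": _text(core, "cp:lastModifiedBy"),
        "created": _parse_timestamp(_text(core, "dcterms:created")),
        "modified": _parse_timestamp(_text(core, "dcterms:modified")),
        "editing_minutes": int(total) if total else None,
    }


def metadata_questions(meta, student_name):
    """List points to raise with the student; none of them is proof."""
    questions = []
    for field in ("author", "last_modified_by"):
        name = meta[field]
        # Naive comparison: account names often differ from enrolment names.
        if name and name.strip().lower() != student_name.strip().lower():
            questions.append(f"{field} is '{name}', not the student")
    if meta["created"] and meta["modified"] and meta["editing_minutes"] is not None:
        elapsed = (meta["modified"] - meta["created"]).total_seconds() / 60
        if meta["editing_minutes"] > elapsed:
            questions.append(
                "editing time exceeds the time between creation and last modification"
            )
    return questions
```

Under these assumptions, a file reporting 25 hours of editing but created and last modified 12 hours apart would produce the editing-time question, matching the pattern described in the example above. A threshold for an "exceptionally low" editing time is deliberately left out, as the text does not specify one.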

References and citations

Exploring the references and citations can be a time-consuming process, and in submissions where there are a vast number of both it can be tedious. However, this is often the most damning indication that a GenAI tool has been used unethically. These are also often the secondary indicators that Course Coordinators, as subject matter experts in the topic of the assignment, highlight. They are aware of the common references and citations the student should be using, and often instinctively know when a reference or citation is incorrectly used.

The common referencing and citation errors identified in this audit included:

The year of publication is incorrect. GenAI tends to assign a publication year later than the actual publication date. This appears to be more likely when the prompt given asks a GenAI tool to provide a referenced response from a stipulated date range, for example, “within the last 5 years”.
The use of references from fields unrelated to the student’s area of study; this is often a generic statement which is easily supported by literature within the student’s area of study.
Hallucinated references/quotes/citations. These can be easily identified by running them through plagiarism-checking software.
Multiple citations using the same common names, such as “Smith and Jones” or “Black and White”, often indicate a hallucinated reference.
Indirect/secondary citations. GenAI tools usually and incorrectly attribute the ideas to the author of the paper they are citing, rather than to the original author of the idea. Therefore, there is a failure to acknowledge the original author(s) of the work because the work is attributed to the later author(s).
The full title of the article or publication is incorporated into sentences even though the citation is at the end of the statement. For example, “…are made clear by the study Rates and Predictors of Conversion to Schizophrenia or Bipolar Disorder Following Substance-Induced Psychosis (Starzer et al., 2018)”.
Inconsistency in citations. In-text citations may give the publisher as the author and then elsewhere accurately provide the actual author.
The reference is very old and has now been superseded.
The use of “over-referencing”, such as three citations per statement.
An unusual or obscure citation is used for a common technical item from the discipline.
The use of unusual references. For example, a Master’s thesis from a remote university that requires a login to its archive section to download it. When downloaded, many of these documents are not searchable until they have been converted using OCR.
An undergraduate assignment with many references, each of which is only used once. Or a small number of references that are used multiple times.
Common statements that are general and easily found in the common literature of the profession, yet the student has cited an obscure source.
Claims made in the text are not substantiated by the material in the reference cited.
Unsubstantiated claims. There is no in-text citation, yet the claims are common knowledge in the student’s chosen profession and, therefore, should be easy to substantiate.
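
Two of the patterns in the list above, over-referencing and references that are each used only once, lend themselves to a simple count. The sketch below is illustrative only. It assumes author-date citations in parentheses separated by semicolons, as in the Starzer et al. example, and ignores narrative citations such as "Smith (2020)". The threshold of three citations per statement comes from the list above; the function name `citation_profile` and the regular expressions are assumptions.

```python
import re
from collections import Counter

# A parenthetical group containing a four-digit year,
# e.g. "(Smith & Jones, 2020; Lee, 2019)".
GROUP = re.compile(r"\(([^()]*\b(?:19|20)\d{2}[a-z]?[^()]*)\)")
# One citation inside a group: some letters followed later by a year.
CITATION = re.compile(r"[A-Za-z].*(?:19|20)\d{2}")


def citation_profile(text, over_reference_at=3):
    """Summarise how parenthetical author-date citations are used."""
    usage = Counter()
    crowded_groups = []
    for match in GROUP.finditer(text):
        citations = [
            part.strip()
            for part in match.group(1).split(";")
            if CITATION.search(part)
        ]
        usage.update(citations)
        if len(citations) >= over_reference_at:
            crowded_groups.append(match.group(0))
    used_once = sorted(c for c, n in usage.items() if n == 1)
    return {
        "usage": usage,
        "used_once": used_once,
        "crowded_groups": crowded_groups,
    }
```

The counts only describe how citations are distributed. Whether a cited source exists, or supports the claim made, still has to be checked by a person.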

Terminology and language

Each student has their own voice, and their writing will reflect this voice. Therefore, when reviewing assignments markers will see a consistency of how a student uses language. In courses where there are few students it may be possible for markers to become very familiar with individual students’ work, so when reading an assignment which is not consistent with their knowledge of the student, this is a red flag that academic misconduct may have occurred. In addition, research is starting to become available comparing linguistic similarities and differences in student work and generative AI text [8], and some of the items below are consistent with their findings (e.g., the use of gerunds). When exploring for potential GenAI use, the following secondary indicators were identified in terminology and language:

Switching between correct and incorrect acronyms for a term. GenAI tools do tend to be more accurate and student-generated acronyms are less accurate.
Contractions used are inconsistent or inappropriate. For example, GenAI will regularly contract “it is” to “it’s”, something students rarely do because they have been educated that in academic writing it is not appropriate to use contractions.
The use of a keyword repeatedly in every sentence of the paragraph. This keyword is often paired with a different word within the sentence. For example, “The works of scholars like x and y highlight the necessity for a cybersecurity strategy that is ethically informed and proactive. This strategy underscores a deep ethical commitment. Emphasising ethical considerations which require collective trust. Advocating for ethical complexities and its counterparts using ethical frameworks…”
An acronym or abbreviation is introduced and used. However, in subsequent places, and at random, the acronym or abbreviation is again “introduced”.
The use of phrases not common to the country the student is studying in. For example, the use of American terminology outside America. An example of this is the use of “Law Enforcement”; in Australia, it is more common to use the terminology “Police”.
The inconsistent use of spellings. For example, in one paragraph the use of the word “color” (American English) and later use of “colour” (Australian/ British English).
The use of gerunds (present participles – “ing” words) to start a sentence, for example, “Building a strong community…”. This is unusual for this generation and atypical of undergraduate writing.
The use of correlative conjunctions. For example, “not only … but also”. This is a formal grammatical structure that is rarely used by undergraduate students as it was more common in past generations.
Language structures typical of GenAI and atypical of undergraduate students (e.g., starting sentences with gerunds, conjunctions, imperatives and dependent clauses).
Typical GenAI word choices that are not used in everyday language. For example, words like “elucidate”, “sequelae”, “intricate”, “delve”, “landscape” and “multifaceted”.
There is evidence of text-spinning. Uncommon words are used rather than the common normal words. For example, instead of “calculator”, “number adding machine” is used or “Taylor Swift” becomes “Taylor Quick”.
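
Some of the language indicators listed above can be surfaced with a basic text scan before a closer human reading. The following is a rough sketch under stated assumptions. The flagged words and the color/colour pair are taken from the examples above, while the contraction pattern, the short list of "ing" words that are not gerunds, and the function name `language_indicators` are hypothetical and would need tuning for real use.

```python
import re

# Word choices and spelling pair taken from the examples in the text.
FLAGGED_WORDS = {"elucidate", "sequelae", "intricate", "delve", "landscape", "multifaceted"}
SPELLING_PAIRS = [("color", "colour")]
# Assumed exclusions: common sentence openers ending in "ing" that are not gerunds.
NOT_GERUNDS = {"during", "nothing", "something", "anything", "everything"}
# Assumed pattern for common contractions; possessives are deliberately left out.
CONTRACTION = re.compile(r"(?:it|that|there)['’]s|[a-z]+n['’]t|[a-z]+['’](?:re|ve|ll)")


def language_indicators(text):
    """Return simple language features for an investigator to review."""
    words = re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())
    vocabulary = set(words)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    ing_openers = []
    for sentence in sentences:
        first = re.match(r"[A-Z][a-z]+ing\b", sentence)
        if first and first.group(0).lower() not in NOT_GERUNDS:
            ing_openers.append(sentence)

    return {
        "flagged_words": sorted(FLAGGED_WORDS & vocabulary),
        "mixed_spellings": [p for p in SPELLING_PAIRS if set(p) <= vocabulary],
        "contractions": sorted(w for w in vocabulary if CONTRACTION.fullmatch(w)),
        "ing_openers": ing_openers,
    }
```

A scan like this cannot judge voice or consistency with a student's earlier work. It only points the reader to passages worth a second look, and the word lists match exact forms only (for example, "colors" would not be paired with "colours").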

DISCUSSION

The use of GenAI tools by students in their assessments is a concern. By using a GenAI tool, the student is not completing the required work themselves; they are relying on a tool to complete the work for them. Therefore, these students are failing to demonstrate that they have the required knowledge for their degree. It is through this assessment and demonstration of knowledge acquisition that universities are confident that the student has the required knowledge for their degree and the skills to enter their chosen profession. In the healthcare setting, and in nursing, patients need to be able to trust their nurse, and if students have not demonstrated skills and knowledge in a trustworthy way, how can patients trust them?

In using a GenAI tool the student is denying the opportunity for the university to be confident in the award they are granting, thus devaluing the award. If the university cannot be confident that the student has earned the degree legitimately through their study the degree is invalid, and the student is not prepared to enter their chosen profession. Healthcare services will no longer be able to trust their employees to know what they are doing, putting patients at risk. Therefore, universities need to spend time and resources investigating those who are suspected of using GenAI tools to ensure that any awards that they offer are valid and that students are properly prepared for their chosen profession.

What this paper has highlighted is that the detection of GenAI use in student assessments is complex. There is not one magic tool that is 100% reliable in detecting GenAI use, so academics must rely on other methods to detect GenAI usage in student papers. This led to the Integrity and Compliance Unit of a regional university compiling a list of the common features that indicated, on the balance of probabilities, that a GenAI tool had been used by a student in a particular assessment. However, not one of these factors alone led to a determination that the student had used a GenAI tool. A combination of the factors highlighted above was needed to make this determination. The greater the number of factors identified, the stronger the case against the student.

Students were provided with the evidence for the allegation of academic misconduct, which gave them the opportunity to provide evidence that they did not use a GenAI tool. We found that 55% of students freely admitted culpability and the inappropriate use of a GenAI tool. A further 40% were found, on the balance of probabilities, to have used a GenAI tool, as they could not offer any clear or reasonable explanation to defend themselves against the allegations and the findings of the investigation. The remaining 5% had their cases dismissed because the evidence found was inconclusive or the student was able to provide substantive evidence that refuted what was found during the investigation phase. Careful consideration was given to any evidence provided by the student, and if there was a conflict between what the investigator found and the supporting evidence the student produced, an independent subject matter expert was consulted and a decision was made on their advice.

Despite the efforts of universities to encourage and support the ethical use of GenAI tools, and to discourage the unethical use of these tools, it appears that many students are currently progressing through their courses using GenAI unethically without being investigated, because of the time and effort that investigation requires of both Course Coordinators and central academic misconduct investigation units in universities.

CONCLUSION

This paper has highlighted that, although it is difficult to rely on checkers to identify whether GenAI has been used in a student assessment, it is possible through careful examination of the paper to identify secondary characteristics that may indicate that GenAI has been used. Through an audit of assessments submitted to the Integrity and Compliance Unit at the university, the authors were able to establish a list of secondary characteristics that possibly indicate that a GenAI tool had been used in an assessment.

The secondary characteristics highlighted related to formatting, metadata, references and citations, and terminology and language. No one characteristic provided a definitive answer as to whether GenAI had been used or not. However, the more secondary characteristics identified in the assessment, the greater the likelihood, on the balance of probabilities, that a GenAI tool had been used. Only when the investigators deemed that this balance of probabilities threshold had been met was a student found in breach of academic integrity standards and the appropriate penalty applied.

Universities support the use of GenAI tools in creative ways to help students learn and develop in their chosen careers. Yet, when used inappropriately, GenAI tools hinder learning, as students fail to demonstrate the learning required for their assessment. Academics then need to spend time investigating and collating evidence to prove the inappropriate GenAI use, instead of spending this time supporting the growth of their students.

Annals of Nursing and Practice
