This study evaluates one particular type of hallucination produced by ChatGPT-3.5 and ChatGPT-4: fabricated bibliographic citations that do not represent actual scholarly works. Data were compiled on the 636 bibliographic citations found in 84 papers produced by the two versions of ChatGPT. Results showed that 55% of the GPT-3.5 citations were fabricated, compared with 18% of the GPT-4 citations. Moreover, 43% of the real (non-fabricated) GPT-3.5 citations and 24% of the real GPT-4 citations included substantive citation errors. Although GPT-4 is a major improvement over GPT-3.5, problems remain.
