GPT vs. Llama2: Which Comes Closer to Human Writing in Text Generation?
Published in EDM 2024, 2024
Large Language Models (LLMs) are now widely applied across diverse domains. In some applications, such as chatbots, human-like output is essential for user experience and credibility. Conversely, LLM use raises concerns in contexts where human authorship is crucial, notably in higher education, with materials such as Statements of Purpose and Letters of Recommendation. Despite extensive research in this area, accurately distinguishing human-written from LLM-generated content remains challenging. This study compares two leading LLMs, GPT and Llama, evaluating how closely their output resembles human writing through vocabulary and structure analysis. Additionally, we apply classification models to distinguish human-written from LLM-generated content; higher classification accuracy indicates that a model's output deviates more from human-like writing.
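To make the idea of vocabulary analysis concrete, the following is a minimal sketch of two generic lexical measures: type-token ratio and Jaccard overlap of vocabularies. It is not the study's actual pipeline; the function names, the tokenization rule, and the two sample sentences are all hypothetical and chosen only for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed, simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Distinct words divided by total words: a crude lexical-diversity score."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def vocabulary_jaccard(text_a, text_b):
    """Fraction of distinct words shared by two texts (0 = none, 1 = identical)."""
    vocab_a, vocab_b = set(tokenize(text_a)), set(tokenize(text_b))
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 0.0


# Hypothetical toy samples, not data from the study.
human_text = "I grew up fixing radios with my grandfather, and that is why I chose engineering."
model_text = "I am passionate about engineering, and I am eager to contribute to your program."

print("TTR (human):", type_token_ratio(human_text))
print("TTR (model):", type_token_ratio(model_text))
print("Vocabulary overlap:", vocabulary_jaccard(human_text, model_text))
```

Under this sketch, a model whose scores sit closer to those of human-written texts would count as more human-like on these two measures. A real comparison would need large corpora and richer features.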