@ AAAI 2025
Fourth Workshop on Multimodal Fact Checking and Hate Speech Detection
February, 2025
About : CT2 - AI Generated Text Detection
Important News: Datasets Released
CT2 - AI Generated Text Detection
Codalab: https://codalab.lisn.upsaclay.fr/competitions/20330
Form: https://forms.gle/bQcnFWv1dUoeMwZbA
Dataset:
To better understand which types of LLM-generated content are easier or harder to detect, we will include the following families of LLMs:
- Encoder models: (e.g., BERT, DeBERTa)
- Open-Source: (e.g., Llama 3.1, Yi-2)
- Closed-Source: (e.g., Claude 3.5 Sonnet, GPT 4.0/mini)
- SLMs: (e.g., Phi-3.5)
- Mixture of Experts (MoEs) (e.g., Mixtral)
- State Space Models (SSM) (e.g., Falcon-Mamba)
We will release 50K data samples for this task. The data will be structured so that each prompt has one human-written story and parallel generations from all the included LLMs. A snapshot of the data can be viewed here.
Tasks:
CT2 for text will consist of two sub-tasks:
- Task A: This is a binary classification task where the goal is to determine whether each given text document was generated by AI or created by a human.
- Task B: Building on Task A, this task requires participants to identify which specific LLM generated a given piece of AI-generated text. Participants will know that the text is AI-generated and must predict which model produced it, for example GPT 4.0, DeBERTa, Falcon-Mamba, Phi-3.5, or others.
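As a rough illustration of how the parallel data could feed both sub-tasks, the sketch below flattens one prompt record into Task A and Task B examples. The class name `PromptRecord`, its field names, the model names, and the 0/1 label encoding are all assumptions made for this example; the released files may use a different schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PromptRecord:
    """One prompt with its human-written story and parallel LLM generations.

    Hypothetical schema for illustration only; not the official data format.
    """
    prompt: str
    human_story: str
    generations: Dict[str, str]  # model name -> generated text


def task_a_examples(records: List[PromptRecord]) -> List[Tuple[str, int]]:
    """Task A: (text, label) pairs, with label 0 = human and 1 = AI (assumed encoding)."""
    examples: List[Tuple[str, int]] = []
    for rec in records:
        examples.append((rec.human_story, 0))
        examples.extend((text, 1) for text in rec.generations.values())
    return examples


def task_b_examples(records: List[PromptRecord]) -> List[Tuple[str, str]]:
    """Task B: (text, model name) pairs, built from AI-generated texts only."""
    return [
        (text, model)
        for rec in records
        for model, text in rec.generations.items()
    ]


# Made-up record with placeholder model names.
records = [
    PromptRecord(
        prompt="Write a story about a lighthouse.",
        human_story="The lamp had not been lit in years.",
        generations={
            "model_x": "Once upon a time, a lighthouse stood alone.",
            "model_y": "The lighthouse keeper woke before dawn.",
        },
    )
]

print(task_a_examples(records))
print(task_b_examples(records))
```

Because every prompt contributes one human text and several AI texts, a Task A set built this way is imbalanced toward the AI class, which may be worth accounting for during training.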
Baseline:
Several systems for detecting AI-generated text, such as GPT-2 Output Detector, GLTR, and GPTZero, have recently emerged. However, the proprietary tools among them, such as GPTZero, do not disclose their detection methods. In academic research, various techniques have been proposed to identify AI-generated text, including:
- Perplexity estimation
- Burstiness estimation
- Negative log-likelihood curvature
- Stylometric variation
- Intrinsic property based approaches
- Classifier-based approaches (RADAR, ConDA)
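The first two signals in the list above can be sketched from per-token log-probabilities. This is a minimal illustration, not part of the official baseline: it assumes the log-probabilities (natural log) have already been obtained from some scoring language model, which is not shown, and it uses one common definition of burstiness (the spread of per-sentence perplexity), whereas definitions vary across papers. The function names and the numbers in the example are made up.

```python
import math
from statistics import pstdev
from typing import List


def perplexity(token_logprobs: List[float]) -> float:
    """Exponential of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def burstiness(sentence_logprobs: List[List[float]]) -> float:
    """Population standard deviation of per-sentence perplexities.

    Assumed definition for this sketch; other definitions exist.
    """
    return pstdev([perplexity(lp) for lp in sentence_logprobs])


# Two toy "sentences": every token has probability 0.5 in the first
# and 0.25 in the second, so the second is harder for the scorer.
doc = [
    [math.log(0.5)] * 4,
    [math.log(0.25)] * 4,
]

all_tokens = [lp for sentence in doc for lp in sentence]
print("document perplexity:", perplexity(all_tokens))
print("burstiness:", burstiness(doc))
```

Detectors built on these signals typically rest on the expectation that machine-generated text shows lower perplexity and less sentence-to-sentence variation than human writing, though how well this holds depends on the scoring model and the generator.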
A recent technique proposed at ICLR 2024 finds that LLMs are more inclined to modify human-written text than AI-generated text when tasked with rewriting. This approach, named the geneRative AI Detection viA Rewriting method (Raidar), will be used as the baseline for both Tasks A and B.
In the authors' illustration of the method, character deletions are highlighted in red and character insertions in orange. Their findings indicate that human-written text generally prompts more modifications than machine-generated text when rewritten.
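The rewriting idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the `rewrite` argument stands in for a call to an LLM with a rewriting prompt (a toy function is used here so the example runs offline), the character-level change ratio from Python's `difflib` is swapped in as the edit measure, and the single fixed threshold is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable


def change_ratio(original: str, rewritten: str) -> float:
    """Share of characters changed by the rewrite.

    0.0 means the two strings are identical, 1.0 means they share nothing.
    """
    matcher = SequenceMatcher(None, original, rewritten, autojunk=False)
    return 1.0 - matcher.ratio()


def looks_human(text: str,
                rewrite: Callable[[str], str],
                threshold: float = 0.2) -> bool:
    """Flag text as human-written if the rewriter changes it a lot.

    `threshold` is a hypothetical value; in practice it would be tuned,
    or replaced by a classifier trained on edit features.
    """
    return change_ratio(text, rewrite(text)) > threshold


def toy_rewrite(text: str) -> str:
    """Stand-in for an LLM rewriting call: only strips the filler word 'very'."""
    return text.replace("very ", "")


print(change_ratio("a very very long day", toy_rewrite("a very very long day")))
print(looks_human("a very very long day", toy_rewrite, threshold=0.1))
print(looks_human("a long day", toy_rewrite, threshold=0.1))
```

For Task B, the same edit features could in principle be fed to a multi-class classifier instead of a threshold, but that extension is an assumption of this note and not something described above.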
IMPORTANT DATES
- 25 October 2024 : Release of the validation set.
- 8 November 2024 : Release of the test set.
- 30 November 2024 : Deadline for submitting the final results.
- 3 December 2024 : Announcement of the results.
- 10 December 2024 : System paper submission deadline (All teams are invited to submit a paper).
- 20 December 2024 : Notification of system papers.
- 25 December 2024 : Camera ready submission.
ORGANIZING COMMITTEE CHAIRS
Dr. Amitava Das:
Dr. Amitava Das is a Core Faculty member and Research Associate Professor at the Artificial Intelligence Institute, University of South Carolina, and an Advisory Scientist to Wipro AI.
Research interests : Code-Mixing and Social Computing.
Organizing Activities [selective] :
• Memotion @SemEval2020 • SentiMix @SemEval2020 • Computational Approaches to Linguistic Code-Switching @LREC 2020 • CONSTRAINT @AAAI2021
Dr. Amit Sheth:
Dr. Amit Sheth is the founding Director of the Artificial Intelligence Institute and a CSE Professor at the University of South Carolina.
Research interests : Knowledge Graph, NLP, Analysing Social Media.
Organizing Activities [selective] :
• Cysoc2021 @ ICWSM2021 • Emoji2021 @ICWSM2021 • KiLKGC 2021 @KGC21
Aman Chadha
Aman Chadha is an Applied Science Manager at Amazon Alexa AI and a Researcher at Stanford AI.
Research interests : Multimodal AI, On-device AI, and Human-Centered AI.
Vasu Sharma
Vasu Sharma is an Applied Research Scientist at FAIR (Meta AI).
Research interests : LLMs, Multimodal AI, Data Curation and Efficiency, and Generative AI.
Aishwarya Naresh Reganti
Aishwarya Naresh Reganti is an Applied Scientist at AWS Generative AI Innovation Center.
Research interests : NLP, Artificial Social Intelligence, Graph Learning.
Vinija Jain
Vinija Jain has worked in GenAI, NLP, and RecSys domains at Amazon, Oracle, and PANW.
Research interests : NLP, Recommender Systems.
ASSOCIATE ORGANIZERS
CONTACT US
- amitava.santu@gmail.com
- royrajarshi0123@gmail.com (For queries related to Codalab)
- parthpatwa@ucla.edu
- defactifyaaai@gmail.com