@ AAAI 2025

Fourth Workshop on ​Multimodal Fact Checking and Hate Speech Detection
February, 2025


About : CT2 - AI Generated Text Detection

Important News: Datasets Released

CT2 - AI Generated Text Detection

CodaLab: https://codalab.lisn.upsaclay.fr/competitions/20330
Form: https://forms.gle/bQcnFWv1dUoeMwZbA


Dataset:

To better understand which types of LLM-generated content are easier or harder to detect, we will include the following families of LLMs:

  • Encoder models: (e.g., BERT, DeBERTa)
  • Open-Source: (e.g., Llama 3.1, Yi-2)
  • Closed-Source: (e.g., Claude 3.5 Sonnet, GPT 4.0/mini)
  • SLMs: (e.g., Phi-3.5)
  • Mixture of Experts (MoEs) (e.g., Mixtral)
  • State Space Models (SSM) (e.g., Falcon-Mamba)

We will be releasing 50K data samples for this task. The data will be structured such that each prompt will have a human-written story and corresponding parallel generations from all the included LLMs. A snapshot of the data can be viewed here.
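As a rough illustration of the parallel structure described above, one record could be modeled as follows. This is a minimal sketch, not the released schema: the class name, the field names (`prompt`, `human_story`, `generations`), and the model keys are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ParallelRecord:
    """One prompt with a human-written story and one generation per LLM.

    All field names are hypothetical; the released files may differ.
    """
    prompt: str
    human_story: str
    # Maps a model name (hypothetical keys) to that model's generation.
    generations: dict[str, str] = field(default_factory=dict)

    def missing_models(self, expected: list[str]) -> list[str]:
        """Return the expected models that have no generation for this prompt."""
        return [m for m in expected if m not in self.generations]


# Toy record with placeholder text and placeholder model names.
record = ParallelRecord(
    prompt="Write a short story about a lighthouse keeper.",
    human_story="The lamp had not failed in forty years.",
    generations={
        "model_a": "Every evening, the keeper climbed the stairs.",
        "model_b": "On a remote island stood a lighthouse.",
    },
)
```

A check such as `record.missing_models([...])` can help confirm that every prompt really has a generation from each included model before training.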

Tasks:

CT2 for text will consist of two sub-tasks:

  1. Task A: This is a binary classification task where the goal is to determine whether each given text document was generated by AI or created by a human.
  2. Task B: Building on Task A, this task requires participants to identify which specific LLM generated a given piece of AI-generated text. Participants will know that the text is AI-generated and must predict which model produced it, such as GPT 4.0, DeBERTa, Falcon-Mamba, Phi-3.5, or others.
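The two sub-tasks can be read as two labelings of the same texts. The sketch below shows one possible way to derive both from a single parallel record; the dictionary keys and the label strings `"human"` and `"ai"` are assumptions, not the official submission format.

```python
def build_task_examples(record):
    """Flatten one parallel record into Task A and Task B examples.

    `record` is assumed to be a dict with a "human_story" string and a
    "generations" dict mapping model name -> generated text (hypothetical keys).
    """
    # Task A: binary labels over human and AI text alike.
    task_a = [(record["human_story"], "human")]
    # Task B: only AI-generated text, labeled with the generating model.
    task_b = []
    for model_name, text in record["generations"].items():
        task_a.append((text, "ai"))
        task_b.append((text, model_name))
    return task_a, task_b


# Toy record with placeholder model names.
example = {
    "human_story": "The lamp had not failed in forty years.",
    "generations": {
        "model_a": "Every evening, the keeper climbed the stairs.",
        "model_b": "On a remote island stood a lighthouse.",
    },
}
task_a, task_b = build_task_examples(example)
```

Note that the human-written story appears only in the Task A examples, since Task B assumes the text is already known to be AI-generated.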

Baseline:

Several systems for detecting AI-generated text, such as GPT-2 Output Detector, GLTR, and GPTZero, have recently emerged. However, these proprietary tools do not disclose their detection methods. In academic research, various techniques have been proposed to identify AI-generated text, including:

  • Perplexity estimation
  • Burstiness estimation
  • Negative log-likelihood curvature
  • Stylometric variation
  • Intrinsic property based approaches
  • Classifier-based approaches (RADAR, ConDA)
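To make the first two items in the list above concrete, here is a minimal sketch of perplexity and of one common way to operationalize burstiness (the spread of per-sentence perplexity). It assumes per-token natural-log probabilities have already been obtained from some scoring language model; the function names and toy inputs are illustrative and are not part of the shared task.

```python
import math
import statistics


def perplexity(token_logprobs):
    """Perplexity as exp of the mean negative log-likelihood (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def burstiness(sentence_logprobs):
    """Population standard deviation of per-sentence perplexity.

    This is one possible definition, assumed here for illustration.
    `sentence_logprobs` is a list of per-sentence token log-probability lists.
    """
    ppls = [perplexity(lp) for lp in sentence_logprobs]
    return statistics.pstdev(ppls)


# Toy inputs: every token has probability 0.5 or 0.25 under the scorer.
uniform = [[math.log(0.5)] * 4, [math.log(0.5)] * 6]
varied = [[math.log(0.5)] * 4, [math.log(0.25)] * 4]
```

Under this definition, text whose sentences are all equally predictable has zero burstiness, while text that mixes predictable and surprising sentences scores higher.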

A recent technique proposed at ICLR 2024 finds that LLMs are more inclined to modify human-written text than AI-generated text when tasked with rewriting. This approach, named the geneRative AI Detection viA Rewriting method (Raidar), will be used as the baseline for both Tasks A and B.

Raidar Example

The authors highlight character deletions in red and character insertions in orange. Their findings indicate that human-generated text generally prompts more modifications compared to machine-generated text when rewritten.
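The rewriting idea can be sketched in a few lines. This is not the authors' implementation: it swaps their learned classifier over edit features for a single threshold on a `difflib` similarity score, and `rewrite_fn` is a hypothetical stand-in for a call to an LLM asked to rewrite the text. The threshold value is arbitrary.

```python
import difflib


def change_ratio(original, rewritten):
    """1 minus difflib's similarity ratio: 0.0 means the rewrite is identical."""
    matcher = difflib.SequenceMatcher(None, original, rewritten)
    return 1.0 - matcher.ratio()


def looks_ai_generated(text, rewrite_fn, threshold=0.2):
    """Flag text as AI-generated when rewriting changes it only slightly.

    `rewrite_fn` (hypothetical) takes a string and returns the LLM's rewrite.
    `threshold` is an arbitrary illustrative value, not a tuned one.
    """
    return change_ratio(text, rewrite_fn(text)) < threshold


# Stub rewriters standing in for an LLM call.
def keep_as_is(text):
    return text


def rewrite_heavily(text):
    return "completely different text!!"
```

With the stubs above, `looks_ai_generated("hello world", keep_as_is)` flags the text because nothing changed, whereas the heavily rewritten version does not. In practice the decision would come from a model trained on such edit features rather than a fixed cutoff.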

IMPORTANT DATES

  • 25 October 2024 : Release of the validation set.
  • 8 November 2024 : Release of the test set.
  • 30 November 2024 : Deadline for submitting the final results.
  • 3 December 2024 : Announcement of the results.
  • 10 December 2024 : System paper submission deadline (All teams are invited to submit a paper).
  • 20 December 2024 : Notification of system papers.
  • 25 December 2024 : Camera ready submission.

ORGANIZING COMMITTEE CHAIRS

Dr. Amitava Das:


Dr. Amitava Das is a Core Faculty member and Research Associate Professor at the Artificial Intelligence Institute, University of South Carolina, and an Advisory Scientist to Wipro AI.

Research interests : Code-Mixing and Social Computing.

Organizing Activities [selective] : • Memotion @SemEval2020 • SentiMix @SemEval2020 • Computational Approaches to Linguistic Code-Switching @LREC 2020 • CONSTRAINT @AAAI2021

Dr. Amit Sheth:


Dr. Amit Sheth is the founding Director of the Artificial Intelligence Institute and a CSE Professor at the University of South Carolina.

Research interests : Knowledge Graph, NLP, Analysing Social Media

Organizing Activities [selective] : • Cysoc2021 @ ICWSM2021 • Emoji2021 @ICWSM2021 • KiLKGC 2021 @KGC21

Aman Chadha


Aman Chadha is an Applied Science Manager at Amazon Alexa AI and a Researcher at Stanford AI.

Research interests : Multimodal AI, On-device AI, and Human-Centered AI.

Vasu Sharma

Vasu Sharma is an Applied Research Scientist at FAIR (Meta AI).

Research interests : LLMs, Multimodal AI, Data Curation and Efficiency, and Generative AI.

Aishwarya Naresh Reganti

Aishwarya Naresh Reganti is an Applied Scientist at AWS Generative AI Innovation Center.

Research interests : NLP, Artificial Social Intelligence, Graph Learning.

Vinija Jain

Vinija Jain has worked in GenAI, NLP, and RecSys domains at Amazon, Oracle, and PANW.

Research interests : NLP, Recommender Systems.

ASSOCIATE ORGANIZERS

Rajarshi Roy

Kalyani Government Engineering College

Shwetangshu Biswas

National Institute of Technology Silchar

Ashhar Aziz

IIIT Delhi

Parth Patwa

University of California Los Angeles (UCLA)

Subhankar Ghosh

Washington State University

Shashwat Bajpai

BITS Pilani Hyderabad Campus

Shreyas Dixit

Vishwakarma Institute of Information Technology

Nilesh Ranjan Pal

Kalyani Government Engineering College

Kapil Wanaskar

San José State University, California, USA

Nasrin Imanpour

University of South Carolina

Gurpreet Singh

IIIT Guwahati (2022)

CONTACT US