Overview
Developed a transformer-based AI text detection model that distinguishes human-written from AI-generated academic writing. The model was engineered specifically to address false-positive bias in academic settings, where students’ legitimate work was being wrongly flagged as AI-generated. Using RoBERTa with LoRA (Low-Rank Adaptation), it achieved 99.6% accuracy while training only 0.82% of the parameters that full fine-tuning would update. Training followed a two-stage methodology: the model was first trained on ~1,378 essays from Kaggle, then evaluated and fine-tuned on academic abstracts from the RAID dataset, including adversarial examples designed to evade detection. To minimize unfair misclassification of real student work, I applied aggressive 10:1 class weighting and optimized for fairness-focused metrics such as human accuracy and balanced accuracy. This reduced the false-positive rate on human academic writing from 83.2% to 0.7% while maintaining 99.4% AI detection accuracy.
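The class weighting and fairness metrics described above can be illustrated with a minimal sketch in plain Python. It is not the project's training code: it assumes label 0 means human and 1 means AI, assumes the 10:1 weight is placed on the human class, and uses hypothetical function names.

```python
import math

HUMAN, AI = 0, 1
# Assumption: the 10:1 weighting favours the human class, so that
# misclassifying human text costs ten times more than misclassifying AI text.
CLASS_WEIGHTS = {HUMAN: 10.0, AI: 1.0}


def weighted_cross_entropy(probs_ai, labels, weights=CLASS_WEIGHTS):
    """Class-weighted negative log-likelihood, normalised by the sum of weights.

    probs_ai[i] is the predicted probability that sample i is AI-generated.
    """
    eps = 1e-12  # guards against log(0)
    total, weight_sum = 0.0, 0.0
    for p_ai, y in zip(probs_ai, labels):
        p_true = p_ai if y == AI else 1.0 - p_ai
        w = weights[y]
        total += -w * math.log(max(p_true, eps))
        weight_sum += w
    return total / weight_sum


def fairness_metrics(preds, labels):
    """Per-class accuracy, balanced accuracy, and false-positive rate.

    A false positive here is human text predicted as AI-generated.
    Both classes must be present in `labels`.
    """
    human = [p for p, y in zip(preds, labels) if y == HUMAN]
    ai = [p for p, y in zip(preds, labels) if y == AI]
    human_acc = sum(p == HUMAN for p in human) / len(human)
    ai_acc = sum(p == AI for p in ai) / len(ai)
    return {
        "human_accuracy": human_acc,
        "ai_accuracy": ai_acc,
        "balanced_accuracy": (human_acc + ai_acc) / 2,
        "false_positive_rate": 1.0 - human_acc,
    }
```

With these weights, a batch in which a human essay is confidently mislabeled incurs a much larger loss than one in which an AI essay is mislabeled with the same confidence, which is the pressure that pushes the false-positive rate down.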
Next Steps
Building on our initial detection framework, I plan to continue this research independently, exploring how reliably LLM-generated content can be detected and contributing to the broader discussion on ethical AI usage in education.
Enhanced Detection Methods and Benchmarking
- Conduct a thorough analysis of existing detection models and methods, expanding beyond our initial three-model comparison to include state-of-the-art approaches and commercial tools
- Implement the stylometric features we identified as future work, including perplexity and burstiness analysis, vocabulary richness metrics, and sentence complexity measurements, to create more sophisticated hybrid detection models
- Explore integration of traditional linguistic features with our transformer-based approach to improve robustness and generalizability
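
As a rough illustration of the stylometric features listed above, the sketch below computes three of them with the Python standard library only. The definitions are assumptions chosen for simplicity: vocabulary richness is taken as the type-token ratio, sentence complexity as mean sentence length in words, and burstiness as the coefficient of variation of sentence lengths. Perplexity is omitted because it requires a language model. All function names are hypothetical.

```python
import re
import statistics


def split_sentences(text):
    """Naive sentence splitter: breaks after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercased word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text.lower())


def stylometric_features(text):
    """Return simple stylometric features for one document."""
    tokens = tokenize(text)
    lengths = [len(tokenize(s)) for s in split_sentences(text)]
    lengths = [n for n in lengths if n > 0]
    if not tokens or not lengths:
        return {"type_token_ratio": 0.0,
                "mean_sentence_length": 0.0,
                "burstiness": 0.0}
    mean_len = statistics.mean(lengths)
    std_len = statistics.pstdev(lengths)  # population standard deviation
    return {
        # Vocabulary richness: unique words / total words.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Sentence complexity proxy: average words per sentence.
        "mean_sentence_length": mean_len,
        # Burstiness proxy: variation in sentence length relative to the mean.
        "burstiness": std_len / mean_len,
    }
```

In a hybrid model, features like these could be concatenated with the transformer's output before a final classification layer; whether that improves robustness is exactly what the planned experiments would need to show.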
Comprehensive RAID Dataset Evaluation
- Expand evaluation to the complete RAID test dataset beyond our focused subset of academic abstracts, testing across all domains including news articles, creative writing, technical documentation, and social media content
- Submit results to the RAID leaderboard to benchmark our approach against other state-of-the-art detection methods and establish comparative performance metrics
- Analyze performance variations across different text domains and LLM source models to understand generalization capabilities
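
One way to approach the per-domain and per-model analysis above is to group predictions by a metadata field and compute accuracy within each group. The sketch below assumes each evaluation record is a dictionary with `label`, `pred`, and grouping fields such as `domain` and `model`; these field names are illustrative assumptions, not a confirmed schema.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Accuracy of `pred` against `label`, broken down by records[i][key]."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = r[key]
        total[group] += 1
        correct[group] += int(r["pred"] == r["label"])
    return {group: correct[group] / total[group] for group in total}


# Hypothetical evaluation records, for illustration only.
records = [
    {"domain": "abstracts", "model": "human", "label": 0, "pred": 0},
    {"domain": "abstracts", "model": "gpt4", "label": 1, "pred": 1},
    {"domain": "news", "model": "human", "label": 0, "pred": 1},
    {"domain": "news", "model": "gpt4", "label": 1, "pred": 1},
]

by_domain = accuracy_by_group(records, "domain")
by_model = accuracy_by_group(records, "model")
```

Large gaps between groups would point to domains or generators where the detector generalizes poorly.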
Robustness and Real-World Application
- Evaluate detection performance against newer and evolving LLM architectures to ensure sustained effectiveness as AI text generation continues to advance
- Investigate adversarial robustness by testing against AI-generated content specifically designed to evade detection systems
- Develop practical implementation tools including web interfaces or API endpoints that educators could integrate into existing academic workflows
Ethical AI and Academic Integrity
- Conduct comprehensive fairness audits to ensure equitable performance across diverse student populations and writing styles, addressing potential biases in detection accuracy
- Implement explainable AI features to help educators understand detection decisions and support more informed academic integrity assessments
- Explore confidence scoring mechanisms that provide nuanced predictions rather than binary classifications, better serving real-world educational decision-making
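
A confidence scoring mechanism of the kind described in the last item could, as a minimal sketch, map the model's predicted AI probability to one of three bands instead of a binary label. The thresholds below are hypothetical and deliberately asymmetric: a high bar is required before text is reported as AI-generated, reflecting the cost of falsely flagging a student.

```python
def confidence_band(p_ai, low=0.2, high=0.95):
    """Map P(AI-generated) to a three-way, human-readable verdict.

    `low` and `high` are illustrative thresholds, not tuned values.
    """
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError("p_ai must be a probability in [0, 1]")
    if p_ai >= high:
        return "likely AI-generated"
    if p_ai <= low:
        return "likely human-written"
    return "inconclusive"
```

An "inconclusive" result gives educators an explicit signal that the detector should not be relied on for that document.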