Comparative Study of Large Language Model Evaluation Frameworks

This research is my capstone project for the Master’s in Data Science program at the University of Virginia. It compares LLM evaluation frameworks, with an emphasis on bias detection, response-quality assessment, and robustness testing. The study benchmarks state-of-the-art approaches to ethical and reliable AI assessment across multiple datasets and methodologies.

February 2025 · Afnan Alabdulwahab

Understanding DeepEval's Bias Evaluation Methodology

This blog post explores the three-stage bias-detection process in DeepEval, an LLM-based evaluation system that quantifies bias in AI-generated text. The methodology combines structured validation, templated prompts, and a scoring framework to assess bias across multiple categories.

February 2025 · Afnan Alabdulwahab

Accelerating Transformer Attention with Custom CUDA Kernels

This project, completed for my GPU Architectures course, profiles and optimizes attention mechanisms in transformers using custom CUDA extensions. It focuses on reducing inference and training latency through kernel-level optimizations and on improving GPU resource utilization for deep learning workloads.

February 2025 · Afnan Alabdulwahab

List of Irregular Verbs Across Romance Languages

This dataset contains all irregular verbs in the known Romance languages.

March 2013 · Patrick Fitzcarron O'Leary, Florianus Prinzel, Walter Schoeffler-Henschell, Detlev Amadeus Unterholzer, Dieter Vogelsang, Moritz-Maria von Igelfeld