Comparative Study of Large Language Model Evaluation Frameworks

This research, my capstone project for the Master's in Data Science program at the University of Virginia, evaluates LLM evaluation frameworks with an emphasis on bias detection, response quality assessment, and robustness testing. The study benchmarks state-of-the-art approaches across multiple datasets and methodologies to support ethical and reliable AI assessment.

February 2025 · Afnan Alabdulwahab

Understanding DeepEval's Bias Evaluation Methodology

This blog post examines the three-stage bias detection process in DeepEval, an LLM-based evaluation system that quantifies bias in AI-generated text. The methodology combines structured validation, templated prompts, and a scoring framework to assess bias across multiple categories.

February 2025 · Afnan Alabdulwahab

Accelerating Transformer Attention with Custom CUDA Kernels

This project, completed for my GPU Architectures course, profiles and optimizes transformer attention mechanisms using custom CUDA extensions. The focus is on reducing inference and training latency through kernel-level enhancements, improving GPU resource utilization for deep learning workloads.

February 2025 · Afnan Alabdulwahab

Speech Emotion Recognition

This project, completed for my Deep Learning course, applies convolutional and recurrent neural networks to Speech Emotion Recognition (SER). Using the RAVDESS and TESS datasets, we train models to classify emotions from audio signals, with applications in human-computer interaction, mental health monitoring, and AI-driven affective computing.

February 2025 · Afnan Alabdulwahab