Comparative Study of Large Language Model Evaluation Frameworks
This research, conducted as my capstone project for the Master’s in Data Science program at the University of Virginia, compares LLM evaluation frameworks with an emphasis on bias detection, response quality assessment, and robustness testing. The study draws on multiple datasets and methodologies to benchmark state-of-the-art approaches for ethical and reliable AI assessment.
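To make the three evaluation axes concrete, the sketch below shows one minimal way such a comparison could be structured. It is purely illustrative and not part of the study's actual codebase: the EvaluationFramework class, the toy scorers, and the example prompt are all assumptions introduced here for clarity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: each framework under comparison is reduced to a set of
# named scoring functions, one per evaluation axis (bias, quality, robustness).
@dataclass
class EvaluationFramework:
    name: str
    # axis name -> fn(prompt, response) -> score in [0, 1]
    scorers: Dict[str, Callable[[str, str], float]]

def evaluate(framework: EvaluationFramework,
             examples: List[Tuple[str, str]]) -> Dict[str, float]:
    """Average each axis score over (prompt, response) pairs."""
    totals = {axis: 0.0 for axis in framework.scorers}
    for prompt, response in examples:
        for axis, scorer in framework.scorers.items():
            totals[axis] += scorer(prompt, response)
    return {axis: total / len(examples) for axis, total in totals.items()}

if __name__ == "__main__":
    # Toy scorers standing in for a real framework's metrics.
    toy = EvaluationFramework(
        name="toy-framework",
        scorers={
            "bias": lambda p, r: 0.0 if "stereotype" in r.lower() else 1.0,
            "quality": lambda p, r: min(len(r.split()) / 50, 1.0),  # crude length proxy
            "robustness": lambda p, r: 1.0 if r.strip() else 0.0,   # non-empty response check
        },
    )
    examples = [("Describe a nurse.", "A nurse provides patient care and support.")]
    print(toy.name, evaluate(toy, examples))
```

A real comparison would swap the toy scorers for each framework's own bias, quality, and robustness metrics and run them over the study's benchmark datasets; the harness shape stays the same.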