Thesis Project Form
Title (tentative): Development and Application of Methods for Generating/Building Benchmark Datasets for Evaluating Language Models (LM)/Agentic AI in Pediatric Healthcare
Thesis advisor(s): Giacomini Mauro, Davide Cangelosi
E-mail:
Address: Via Opera Pia 13
Phone: (+39) 010 33 56546
Description
Motivation and application domain
The use of language models (LMs) and agentic AI in pediatric healthcare is growing, supporting documentation, decision making, and data extraction. Pediatrics presents unique challenges: heterogeneous data, small sample sizes, ethical constraints, and age-specific terminology. Existing benchmarks rarely reflect pediatric needs, so tailored datasets are essential for the safe, transparent, and robust evaluation of LMs and agentic AI in clinical and research settings.
General objectives and main activities
This thesis aims to design, implement, and validate methods for creating benchmark datasets to evaluate LMs and agentic AI in pediatric clinical and research workflows. Activities include: reviewing LM/agentic AI benchmarking methods and pediatric-specific gaps; defining pediatric-relevant evaluation tasks, metrics, and quality assurance techniques; systematically collecting real and synthetic benchmarks; designing a multilingual framework for dataset generation using clinical registries, synthetic data, microtasking, and public datasets, with reproducibility and bias monitoring; constructing prototype datasets validated via consistency checks, inter-annotator agreement, and bias assessment; evaluating AI systems for accuracy, robustness, and variability across algorithms and pediatric subpopulations; and analyzing infrastructure needs for dataset management and deployment.
Training Objectives (technical/analytical tools, experimental methodologies)
During the thesis, the student will gain skills in pediatric AI benchmarking; clinical data modeling and annotation; pediatric workflows and terminology; the development of end-to-end dataset generation and LM evaluation pipelines; the assessment of infrastructure and governance constraints; the evaluation of LMs for robustness, errors, and bias; and the application of decision-making frameworks for real-world clinical AI adoption.
Place(s) where the thesis work will be carried out: Gaslini Hospital - AI Unit
Additional information
Maximum number of students: 1