Gokul Srinivasagan

General Information Srinivasagan

While large closed-source LLMs perform reasonably well on widely used open-source programming languages and publicly available repositories, their effectiveness decreases in industrial environments. Companies rely on proprietary codebases, internal documentation, architectural models, and domain-specific artifacts that are largely absent from public training data. Consequently, directly applying generic LLMs often leads to limited reliability and contextual understanding. To make LLMs genuinely useful in such environments, smaller, adaptable models and agent-based workflows must be developed that can integrate structured and domain-specific knowledge in a controlled and reliable manner.
 
This project explores how structured software-engineering artifacts can enhance LLM-based systems for code understanding. Students will work with two datasets: a UML corpus containing heterogeneous diagrams and multilingual descriptions, and a dataset of vulnerability–fix pairs collected from programming-language repositories. The project focuses on two main challenges: (1) preparing high-quality structured datasets from noisy sources, including deduplication, normalization, multilingual processing, and semantic preservation, and (2) evaluating methods for integrating this information into LLM-based systems, such as parameter-efficient fine-tuning, retrieval-augmented generation, and agent-based approaches with external knowledge.
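As an illustration of the deduplication and normalization step mentioned above, the following is a minimal sketch and not the project's actual pipeline. It assumes each vulnerability–fix pair is a dictionary with the hypothetical keys `vulnerable` and `fixed`; the function names `normalize` and `deduplicate` are likewise illustrative. The normalized form is used only as a comparison key, so the original text of each record is preserved unchanged.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Return a comparison key: Unicode NFKC form with whitespace collapsed."""
    text = unicodedata.normalize("NFKC", text)
    # split() without arguments splits on any whitespace run and drops empties
    return " ".join(text.split())


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record for each distinct normalized (vulnerable, fixed) pair.

    The 'vulnerable' and 'fixed' keys are assumptions made for this sketch.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        # A NUL separator keeps the two fields from running into each other
        key_source = normalize(record["vulnerable"]) + "\x00" + normalize(record["fixed"])
        digest = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)  # the original, unnormalized record is kept
    return unique
```

Exact-match hashing of this kind only catches duplicates that differ in whitespace or Unicode representation; near-duplicates would need a similarity-based method, which is outside the scope of this sketch.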
 
The project will follow an Agile/Scrum framework to ensure iterative development and integration. Students will be organized into self-managing sub-teams responsible for planning, tracking, and version control. This structure also includes weekly sprints, milestone reviews, and MVP demonstrations to manage and achieve the defined goals. Students will develop reproducible experimental pipelines and empirically analyze how structured knowledge improves the reliability and interpretability of LLM-supported analysis.
Further Courses

After successfully completing the courses in the Language and Text Comprehension module, you will be able to:

  • explain the basic features of speech and text comprehension
  • analyze and evaluate text and speech signals
  • classify existing applications and evaluate future developments
  • use basic speech/text algorithms to solve problems
  • formulate given and self-designed algorithms for speech and text processing formally and in the Python programming language