no code implementations • 27 Jun 2025 • Junho Myung, Yeon Su Park, Sunwoo Kim, Shin Yoo, Alice Oh
Evaluating the performance and biases of large language models (LLMs) through role-playing scenarios is becoming increasingly common, as LLMs often exhibit biased behaviors in these contexts.
no code implementations • 13 Mar 2025 • Shin Yoo, Robert Feldt, SoMin Kim, Naryeong Kim
We propose the idea of semantic flow, introduce two examples using a DNN and an LLM agent, and finally sketch its properties and how it can be used to adapt existing dynamic analysis techniques for use in ML-based software systems.
no code implementations • 1 Mar 2025 • Felix Dobslaw, Robert Feldt, Juyeon Yoon, Shin Yoo
This paper presents a taxonomy for LLM test case design, informed by the research literature, our experience, and open-source tools that represent the state of practice.
no code implementations • 5 Feb 2025 • SoMin Kim, Shin Yoo
While SA has been widely adopted as a test prioritization method, its major weakness is the fact that the computation of the metric requires access to the training dataset, which is often not allowed in real-world use cases.
no code implementations • 5 Feb 2025 • Hyunjoon Cho, Sungmin Kang, Gabin An, Shin Yoo
LLMs are rapidly being adopted to build powerful tools and agents for software engineering, but most of them rely heavily on extremely large closed-source models.
no code implementations • 31 Jan 2025 • Jae Yong Lee, Sungmin Kang, Shin Yoo
LLMs, due to their training, are sensitive to how exactly a question is presented, also known as prompting.
no code implementations • 23 Jan 2025 • Juyeon Yoon, Robert Feldt, Shin Yoo
The recent surge of building software systems powered by Large Language Models (LLMs) has led to the development of various testing frameworks, primarily focused on treating prompt templates as the unit of testing.
no code implementations • 20 Dec 2024 • Gunel Jahangirova, Nargiz Humbatova, Jinhan Kim, Shin Yoo, Paolo Tonella
As the adoption of Deep Learning (DL) systems continues to rise, an increasing number of approaches are being proposed to test these systems, localise faults within them, and repair those faults.
no code implementations • 15 Dec 2024 • Nargiz Humbatova, Jinhan Kim, Gunel Jahangirova, Shin Yoo, Paolo Tonella
Results indicate that \dfd is the most effective tool, achieving an average recall of 0.61 and precision of 0.41 on our benchmark.
1 code implementation • 7 Apr 2024 • Saeyoon Oh, Shin Yoo
When applying the Transformer architecture to source code, designing a good self-attention mechanism is critical, as it affects how node relationships are extracted from the Abstract Syntax Trees (ASTs) of the source code.
no code implementations • 15 Nov 2023 • Juyeon Yoon, Robert Feldt, Shin Yoo
On average, DroidAgent achieved 61% activity coverage, compared to 51% for current state-of-the-art GUI testing techniques.
no code implementations • 31 Jul 2020 • William B. Langdon, Westley Weimer, Justyna Petke, Erik Fredericks, Seongmin Lee, Emily Winter, Michail Basios, Myra B. Cohen, Aymeric Blot, Markus Wagner, Bobby R. Bruce, Shin Yoo, Simos Gerasimou, Oliver Krauss, Yu Huang, Michael Gerten
Following the keynote by Prof. Mark Harman of Facebook and the formal presentations (which are recorded in the proceedings), there was a wide-ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the 42nd ACM/IEEE International Conference on Software Engineering on Friday 3rd July 2020).
no code implementations • 29 May 2020 • Jinhan Kim, Jeongil Ju, Robert Feldt, Shin Yoo
The development process in use consists of multiple iterations of data collection, labelling, training, and evaluation.
1 code implementation • 19 May 2020 • Sungmin Kang, Robert Feldt, Shin Yoo
The testing of Deep Neural Networks (DNNs) has become increasingly important as DNNs are widely adopted by safety-critical systems.
1 code implementation • 28 Dec 2019 • Jeongju Sohn, Sungmin Kang, Shin Yoo
The rapid and widespread adoption of Deep Neural Networks (DNNs) has called for ways to test their behaviour, and many testing approaches have successfully revealed misbehaviour of DNNs.
5 code implementations • 25 Aug 2018 • Jinhan Kim, Robert Feldt, Shin Yoo
Recently, a number of coverage criteria based on neuron activation values have been proposed.