Software testing and AI

In our last article, we explained what prompting means, i.e. giving an instruction to an artificial intelligence (AI), and what you should pay attention to when doing so. We realized that it is particularly important to describe to the AI, in as much detail as possible, what you actually want. After all, machines can do many things, but they cannot read minds. One of our recent projects demonstrated this in an exciting way.

What was the AI project about?

Our customer DocToRead came to us with the idea of developing an app in which an AI simplifies complicated facts or texts. The principle sounds simple, but as it turned out, you can’t just build an app around a single prompt. We first considered what the optimal approach for the project might be. In addition to researching technical ways of integrating the AI and thinking through the design and user-friendliness of the app, we quickly decided to consult our software testing team. When it comes to avoiding pitfalls from the outset and experimenting with a technology in order to achieve the best possible result, there is no way around a well-established and experienced testing team.

Software testing and AI

Our idea was that our team of testers, experienced in both manual and automated testing, would be best placed to formulate an expectation of the result and to work out the best way of getting there. While the initial test data was being evaluated, it quickly became clear that the prompts could be optimized considerably. Prompts are multi-layered, and the results that the AI delivers depend heavily on how a prompt is formulated. A variety of factors come into play, such as the register of the language (technical or colloquial), the context (professional or casual) and cultural influences. For example, the same prompt submitted to the AI in different languages (German, English, Spanish, etc.) led to different results.
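
To illustrate how such comparisons might be run systematically, here is a minimal sketch in Python. The wrapper function ask_model, the prompt wording and the language list are assumptions made for illustration only; they are not the actual setup used in the project.

```python
# Minimal sketch: run the same simplification prompt across several language
# variants and collect the outputs for side-by-side comparison.
# NOTE: ask_model() is a hypothetical callable standing in for whatever AI API is used;
# the prompt wording and language list are illustrative, not the project's real values.

LANGUAGES = ["de", "en", "es"]

PROMPT_TEMPLATE = (
    "Simplify the following medical text so that a layperson can understand it. "
    "Explain all technical terms and answer in {language}.\n\n{text}"
)

def compare_language_variants(text, ask_model):
    """Send the same prompt once per target language and return the raw outputs."""
    results = {}
    for lang in LANGUAGES:
        prompt = PROMPT_TEMPLATE.format(language=lang, text=text)
        results[lang] = ask_model(prompt)  # hypothetical model call, supplied by the caller
    return results
```

Collecting the outputs per language in one place makes it much easier to translate and compare them afterwards, which is exactly the routine described above.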

Quote: “The work of testers is very varied and precise. Testers are the link between the code and the user experience. This work is very complex and sometimes very convoluted. An AI language model is a very intelligent and complicated algorithm that leads to global changes in many areas of people’s lives. In order to optimize the results that the AI provides us with, it is imperative to also optimize the input to the AI, i.e. the prompts.”

Trial and error when prompting

The biggest challenge was the multilingualism of the app. It was more complex than expected to find a prompt that would lead to the desired and, above all, reliable result in all of the languages used. Our testers Marta and Maria had to test a prompt again and again, translate the results and compare them. This went on for weeks: trial and error. First, the two of them created a schema based on the expected output of the AI, with defined minimum requirements. Each output of the AI was then checked against this schema, e.g. whether scientific terms were explained or translated clearly, whether the text was actually summarized, and so on. Even between different release versions of the AI, the same prompt can produce enormously different results. In the end, the testers found exactly the instruction that produces an optimal result and renders the scientific text in question in a generally understandable way in all of the languages used.
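
A schema check of this kind could look roughly like the following sketch. The concrete criteria (length reduction, explained terms) and the helper names are assumptions for illustration; the real minimum requirements were defined by the testing team.

```python
# Minimal sketch of a schema check for AI outputs, with illustrative criteria only.
# The checks below (length ratio, explained terms) are placeholders to show the idea,
# not the project's actual minimum requirements.
import re
from dataclasses import dataclass, field

@dataclass
class OutputSchema:
    max_length_ratio: float = 0.6          # output should be noticeably shorter than the source
    required_terms: list[str] = field(default_factory=list)  # terms that must be explained

def check_output(source, output, schema):
    """Return a list of violated requirements; an empty list means the output passes."""
    violations = []
    if len(output) > schema.max_length_ratio * len(source):
        violations.append("output is not sufficiently summarized")
    for term in schema.required_terms:
        # Very rough heuristic: the term should be followed by an explanation in brackets.
        if not re.search(rf"{re.escape(term)}\s*\(", output, flags=re.IGNORECASE):
            violations.append(f"technical term '{term}' is not explained")
    return violations
```

In a setup like this, every AI output would be run through such a check for every language and every test text, and only prompts that pass across the board would make it into the next round of testing.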

Quote: “Testing the different prompts allowed us to tailor the app to the exact needs of the users. Testing refined the prompts and made them more effective, which allowed us to develop a more useful and intuitive product. The biggest challenge was to develop a prompt that was both effective and versatile. It is time-consuming to figure out which prompts work best and how the AI deals with different data. It’s hugely important to monitor the effectiveness of the prompts so that they can be optimized for the variety of users and their needs.”

Through constant testing, translating, comparing and optimizing, an app was created that, together with the AI, serves as a kind of medical interpreter. This offers enormous added value, particularly in the medical environment, for patients, most of whom are laypeople in this field.

However, the project is far from over. With every new AI release, the prompt may have to be adapted again. There are also plans to implement language-dependent prompts in the future – a really exciting topic.
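
One possible shape for such language-dependent prompts is a simple mapping from language code to a prompt template, as in the sketch below. The wording, the language codes and the fallback behaviour are illustrative assumptions, not the project’s actual prompts.

```python
# Minimal sketch of language-dependent prompts: one template per language code.
# Wording and fallback behaviour are illustrative assumptions.

PROMPTS_BY_LANGUAGE = {
    "de": "Erkläre den folgenden medizinischen Text in einfacher, allgemein verständlicher Sprache:\n\n{text}",
    "en": "Explain the following medical text in simple, generally understandable language:\n\n{text}",
    "es": "Explica el siguiente texto médico en un lenguaje sencillo y comprensible:\n\n{text}",
}

def build_prompt(text, language):
    """Pick the prompt template for the requested language, falling back to English."""
    template = PROMPTS_BY_LANGUAGE.get(language, PROMPTS_BY_LANGUAGE["en"])
    return template.format(text=text)
```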

Do you have an idea for an app or a software project? Please feel free to contact us!