
Software Testing and AI

In our last post, we explained what prompting (that is, giving an instruction to an artificial intelligence, or AI) means and what you should pay attention to when doing it. The key lesson was to describe what you want from the AI in as much detail as possible. Machines can do many things, but they cannot read minds. One of our recent projects demonstrated this vividly.


 

What was the AI project about?

Our client DocToRead came to us with the idea of developing an app in which AI simplifies complicated topics and texts. The principle sounds simple, but as it turned out, you can't just build an app around a prompt. We first considered what the best approach for the project might be. Alongside researching how to connect to the AI technically and working through basic questions about the app's design and usability, we quickly decided to bring in our software testing team. When it comes to avoiding pitfalls from the outset and experimenting with a technology to get the best possible result, there is no substitute for a well-coordinated and experienced testing team.

 

Software Testing and AI

Our idea was that our team of testers, who have experience in both manual and automated testing, would be best placed to formulate an expected result and figure out the best way to achieve it. As soon as the initial test data was evaluated, it became apparent that the prompts could be improved considerably. Prompts are complex, and the results an AI delivers depend heavily on how a prompt is formulated. Many factors play a role, such as the register (technical or colloquial), the context (professional or casual), and cultural influences. For example, the same prompt led to different results in different languages (German, English, Spanish, etc.).
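To make the combinations concrete, here is a minimal sketch of how such prompt variants could be enumerated for testing. It is not the code used in the project: the template wording, the function name build_prompt_variants, and the exact values in each list are assumptions for illustration. Only the categories themselves (register, context, language) come from the description above.

```python
from itertools import product

# Hypothetical values: the categories are taken from the post,
# the concrete lists and the template wording are assumptions.
REGISTERS = ["technical", "colloquial"]
CONTEXTS = ["professional", "casual"]
LANGUAGES = ["German", "English", "Spanish"]

TEMPLATE = (
    "Explain the following text in {language}, "
    "using {register} language for a {context} setting:\n{text}"
)


def build_prompt_variants(text):
    """Return one prompt per combination of register, context, and language."""
    variants = []
    for register, context, language in product(REGISTERS, CONTEXTS, LANGUAGES):
        variants.append(
            {
                "register": register,
                "context": context,
                "language": language,
                "prompt": TEMPLATE.format(
                    language=language,
                    register=register,
                    context=context,
                    text=text,
                ),
            }
        )
    return variants


variants = build_prompt_variants("Example source text")
print(len(variants))  # 2 registers * 2 contexts * 3 languages = 12
```

Even with only three dimensions, the number of variants to review grows multiplicatively, which is one reason comparing results by hand takes so long.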

Quote: "The work of testers is very varied and precise. Testers are the link between the code and the user experience. This work is very complex and sometimes very confusing. An AI language model is a very intelligent and complicated algorithm that is leading to global changes in many areas of people's lives. In order to optimize the results that AI provides us with, it is imperative to also optimize the inputs to the AI, i.e., the prompts."

 

Trial and Error in Prompting

The biggest challenge was the multilingual nature of the app. Finding a prompt that produces the desired and, above all, reliable results in every language used was more complex than expected. Our testers Marta and Maria had to test a prompt repeatedly, translate the results, and compare them. This trial and error took weeks. First, the two created a schema with defined minimum requirements, based on the expected output of the AI. Then each output of the AI was checked against the schema, e.g., whether scientific terms were explained or translated in an understandable way, whether a summary of the text was provided, etc. Even between release versions of the AI, the same prompt can produce enormously different results. In the end, the testers found an instruction that produces the desired results and renders the scientific text in a generally understandable way in all languages used.
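The schema check described above can be sketched roughly as follows. This is a simplified illustration, not the testers' actual schema: the Requirement class, the "Summary:" line convention, and the parenthesis heuristic for explained terms are all assumptions. In the project, the outputs were translated and compared by people.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Requirement:
    """One minimum requirement from the (hypothetical) output schema."""
    name: str
    check: Callable[[str], bool]


def has_summary(output: str) -> bool:
    # Assumed convention: the prompt asks for a line starting with "Summary:".
    return any(
        line.strip().lower().startswith("summary:")
        for line in output.splitlines()
    )


def explains_term(term: str) -> Callable[[str], bool]:
    # Crude heuristic (assumption): a term counts as explained when it is
    # directly followed by a parenthetical, e.g. "term (plain explanation)".
    pattern = re.compile(re.escape(term) + r"\s*\([^)]+\)", re.IGNORECASE)
    return lambda output: bool(pattern.search(output))


def evaluate(output: str, schema: List[Requirement]) -> Dict[str, bool]:
    """Check one AI output against every requirement in the schema."""
    return {req.name: req.check(output) for req in schema}


schema = [
    Requirement("has_summary", has_summary),
    Requirement(
        "explains_myocardial_infarction",
        explains_term("myocardial infarction"),
    ),
]

sample = (
    "Summary: The patient had a myocardial infarction (a heart attack).\n"
    "Rest is advised."
)
results = evaluate(sample, schema)
print(results)
print("passes" if all(results.values()) else "fails")
```

Automated checks like these can only flag obvious gaps, such as a missing summary. Whether an explanation is actually understandable in each language still needs human review.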

Quote: "Testing the different prompts enabled us to tailor the app precisely to the needs of the users. Through testing, the prompts were refined and made more effective, enabling us to develop a more useful and intuitive product. The biggest challenge was developing a prompt that is both effective and versatile. It is time-consuming to figure out which prompts work best and how AI handles different data. It is extremely important to monitor the effectiveness of the prompts so that they can be optimized for the diversity of users and their needs."

Through constant testing, translating, comparing, and optimizing, an app was created that, in conjunction with AI, serves as a kind of medical interpreter. This offers enormous added value, especially in the medical environment, where most patients are laypeople. However, the project is far from complete. Whenever a new AI release changes the results, the prompt will need to be adjusted again. In addition, there are plans to implement language-dependent prompts in the future – a truly exciting topic. Do you have an idea for an app or software project? Feel free to contact us!