Artificial Intelligence. Machine Learning. Neural Networks. All of these technologies are getting lots of attention in the market right now. At Significo, we have been using AI and cutting-edge technology to provide personalized experiences for some time now. So how do we do this?
Data.
More importantly, data harmonization. Data comes in various forms, and it has to be harmonized before we can get any meaning out of it. We are not only collecting specific data points; by harmonizing them, we get to the core of the overarching information they provide. Our data scientists keep up with current methods for analyzing multiple data sources together.
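To make the idea of harmonization concrete, here is a minimal Python sketch. It assumes two invented sources, a wearable and a survey, that report body weight under different field names and units. The sources, field names, and the `WeightReading` schema are illustrative assumptions, not a description of Significo's actual pipeline.

```python
from dataclasses import dataclass

# Conversion factors to kilograms for the units this sketch knows about.
TO_KG = {"kg": 1.0, "lb": 0.45359237, "g": 0.001}


@dataclass(frozen=True)
class WeightReading:
    """Common schema that every source is mapped onto (hypothetical)."""
    user_id: str
    weight_kg: float
    source: str


def harmonize(record: dict, source: str) -> WeightReading:
    """Map one raw record from a named source onto the common schema."""
    if source == "wearable":    # e.g. {"uid": "u1", "mass": 165.0, "unit": "lb"}
        user, value, unit = record["uid"], record["mass"], record["unit"]
    elif source == "survey":    # e.g. {"user": "u1", "weight_kg": 74.8}
        user, value, unit = record["user"], record["weight_kg"], "kg"
    else:
        raise ValueError(f"unknown source: {source}")
    return WeightReading(user, round(value * TO_KG[unit], 2), source)


readings = [
    harmonize({"uid": "u1", "mass": 165.0, "unit": "lb"}, "wearable"),
    harmonize({"user": "u1", "weight_kg": 74.8}, "survey"),
]
```

Once both readings share one schema and one unit, they can be compared or combined directly, which is the point of harmonizing before analyzing.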
Working with data from diverse sources requires a thorough understanding of data security and data privacy. We deeply respect the decisions of our users, so we ask for their consent first. Because of the sensitivity and importance of the data we work with, we follow GDPR and HIPAA standards and practice privacy by design. For example, we collect only data that is essential to our products, and we offer a default configuration that favors data protection. We also ensure data protection through technical measures when building our products, e.g., a standardized opt-in mechanism. And because we know what kind of information we need in order to provide support, we don’t collect information that doesn’t add value or isn’t necessary. That knowledge comes from our years of experience working with data and from our intensive, ongoing review of the scientific literature.
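As a rough illustration of data minimization combined with opt-in, the Python sketch below keeps only an allowlist of fields and stores nothing unless the user has consented. The field names and the `minimize` helper are hypothetical and chosen purely for illustration.

```python
from typing import Optional

# Hypothetical allowlist: the only fields this imaginary product needs.
ESSENTIAL_FIELDS = frozenset({"user_id", "age_band", "activity_minutes"})


def minimize(record: dict, opted_in: bool = False) -> Optional[dict]:
    """Keep essential fields only; keep nothing unless the user opted in."""
    if not opted_in:  # protective default: no consent, no data
        return None
    return {key: value for key, value in record.items() if key in ESSENTIAL_FIELDS}


raw = {
    "user_id": "u1",
    "age_band": "40-49",
    "activity_minutes": 35,
    "home_address": "not needed",
    "employer": "not needed",
}
stored = minimize(raw, opted_in=True)
```

Making `opted_in` default to `False` means a caller who forgets the flag ends up with no data rather than too much, which is one simple way to express protection by default in code.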
Machine Learning
Significo has used Machine Learning and Data Mining for a decade, and both are topics of high interest for our partners. Because we use machine learning for various purposes, we need a deep understanding of what you can do with data. The same is true for Generative AI: it gets a lot of attention, but it’s only one small part of machine learning. We are aware that creating a meaningful GenAI model takes the correct problem setting and data collection, followed by a careful model selection and training procedure. We understand how the complete data processing pipeline needs to be built so that we can be sure the outcome brings value to the users. Unlocking the full power of GenAI is a complex process, but it already supports our own internal processes; the insights we get from using it in this way let us study how best to use it for the needs of our users.
In the healthcare industry, providing a safe and tailored application for your users requires understanding the techniques behind deep neural networks (and their relatives). These are very powerful models that perform well. At Significo, we want to fully understand what is happening in pre-existing models, why they work for specific use cases, and why they do not. We want to avoid providing something to the user without knowing whether it could be harmful under specific circumstances. Thus, we currently do not use GenAI for the core of our products, but we do use it to speed up the processes behind these solutions. One example is content curation: health experts define the topics and evaluate the scientific basis for the content, and then medical experts curate the output of the GenAI machinery. This gives us some flexibility and inspiration around how users could best perceive content. Creatives use it to explore alternative ways of phrasing things, to speed up tedious processes, and to automate workflows that we would otherwise have to do by hand, so it supports us in our daily work. GenAI is a great support, but the content that we deliver is still selected and evaluated by experts, and the final versions are always crafted and quality-checked by a human.
In summary, generative AI and Machine Learning are valuable tools, but we do not use them without first understanding them. We explore what they can do. We experiment a lot to get a good understanding of what they are useful for. We are not just integrating a hyped tool to talk to the user, because our experiments have shown that such tools are not yet capable of providing what we want. We know from research and from our cross-domain experience that only a combination of multiple aspects will guide the user to success.
Trustworthy AI
In the EU, the Ethics Guidelines for Trustworthy AI, published in 2019, state that if you apply machine learning or AI, you need to follow specific principles to make it secure, well-founded, explainable, and predictable. One example is that you need to be aware of what kind of data you train and validate your model on. You want to be sure that it is appropriate for the use case you want to serve, i.e., that the training and test data come from the same population that you want to provide the service for. If you rely only on training data from older Caucasian men, your model is applicable only to people of a similar age, gender, and ethnicity. This introduces a high degree of imbalance, and injustice for your users, if you then include younger users, minorities, or women, because that population may have totally different needs and patterns. At Significo, we take this kind of data ethics seriously. We dig deep to understand the people the data comes from in order to prevent serious problems, unethical machine behavior, and social injustice within our services.
We want to provide equal health care for all of the people who use our technology, so we pay attention to where our training data may need improvement.
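One simple way to check whether training data matches the population it is meant to serve is to compare group shares. The Python sketch below is an assumption-laden illustration: the group labels, the target shares, and the 10-percentage-point tolerance are all invented, and a real audit would look at far more than headcounts.

```python
from collections import Counter


def representation_gaps(train_groups, target_shares, tolerance=0.10):
    """Return groups whose share of the training data falls short of their
    share of the target population by more than `tolerance` (absolute)."""
    counts = Counter(train_groups)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("training data is empty")
    gaps = {}
    for group, target in target_shares.items():
        shortfall = target - counts.get(group, 0) / total
        if shortfall > tolerance:
            gaps[group] = round(shortfall, 2)
    return gaps


# Invented example: training data dominated by one demographic group.
train = ["men_60plus"] * 80 + ["women_60plus"] * 12 + ["women_under_40"] * 8
target = {
    "men_60plus": 0.25,
    "women_60plus": 0.25,
    "men_under_40": 0.25,
    "women_under_40": 0.25,
}
gaps = representation_gaps(train, target)
```

In this made-up data set, every group except the over-represented one is flagged, including one that does not appear in the training data at all.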
Statistics
The understandability of machine learning models is another important point of the EU's Trustworthy AI guidelines. It’s not a good idea to take your favorite machine learning library, tune some parameters for the sake of tuning them, and expect a wonderful model that solves all of your problems. You need to understand what your problem is, which method is most likely appropriate, what the parameters mean, and what outcome you expect. This requires a thorough understanding of statistics and the theory of machine learning. It not only helps you select the appropriate model for the problem but also guides you through parameter optimization and protects you from overfitting, over-optimization, local optima, and non-interpretability. Occam's razor applies here as well: prefer the simplest model, and use only the data and information that give you stable models. This circles back to including only data that we believe is necessary and will give us the information we need, and to not using complex, training-intensive models just because they are hyped. We are aware that deep models require far more training resources and energy, so we also take this into account when selecting the appropriate model types.
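The overfitting risk described above can be shown with a small, fully synthetic Python sketch. It compares a straight-line fit with a model that simply memorizes the training points (a one-nearest-neighbor lookup). The data, the alternating noise pattern, and the choice of these two models are assumptions made for illustration only; the held-out points are scored against the noise-free signal to keep the example short.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbor(xs, ys, x):
    """Predict by copying the label of the closest training point."""
    return min(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[1]


def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Synthetic data: signal y = 2x plus alternating noise of +1.5 / -1.5.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1.5 if i % 2 == 0 else -1.5) for i, x in enumerate(train_x)]
test_x = [i + 0.25 for i in range(9)]
test_y = [2 * x for x in test_x]  # held-out points, scored against the clean signal

slope, intercept = fit_line(train_x, train_y)
line_train = mse([slope * x + intercept for x in train_x], train_y)
line_test = mse([slope * x + intercept for x in test_x], test_y)
memo_train = mse([nearest_neighbor(train_x, train_y, x) for x in train_x], train_y)
memo_test = mse([nearest_neighbor(train_x, train_y, x) for x in test_x], test_y)
```

The memorizing model has zero error on the data it was trained on, yet it does worse than the simple line on the held-out points, because it has learned the noise along with the signal.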
Success Measurement and Iterative Improvement
Finally, you need to test and validate your results across the full process to ensure that you provide a good service to your customers and, ultimately, to the people who use your products. Measuring success and iterating through the model creation process then leads to better models and better services. Here again, defining KPIs and measuring them repeatedly requires mathematical and statistical expertise. At Significo, we apply all of this knowledge to the full process so that we can say with confidence that we are providing good services to our users.
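As a hedged example of why KPI measurement needs statistics, the Python sketch below compares a completion-rate KPI between two model iterations. It uses a standard normal-approximation (Wald) confidence interval for the difference of two proportions. The KPI, the counts, and the function name `difference_ci` are invented for illustration.

```python
import math


def difference_ci(successes_a, n_a, successes_b, n_b, z=1.96):
    """Wald interval for the difference of two proportions (B minus A).

    With z = 1.96 this is an approximate 95% confidence interval.
    """
    p_a, p_b = successes_a / n_a, successes_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff, diff - z * se, diff + z * se


# Invented numbers: users who completed a program under two model versions.
diff, low, high = difference_ci(120, 400, 156, 400)
improved = low > 0  # the whole interval lies above zero
```

Looking at the interval rather than the raw difference helps avoid declaring an improvement that is really just sampling noise, and repeating the measurement after each iteration turns it into a routine check.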