From passive to active data protection

AI and data protection must not be separated

- Falk Borgmann

The leak of Meta's large language model LLaMA unleashed a wave of innovation across the IT world, and its momentum has not waned even nine months later (see: click). By the end of 2022, OpenAI's impressively powerful ChatGPT, based on GPT-3.5, had demonstrated to everyone that this is more than just a passing trend. With DALL-E 3, the next generation of multifunctional AI software is already on the horizon.

As always when something new emerges, two distinct positions have largely taken shape in the public AI discourse. Every innovation is met with either rejection or approval, and the "wait and see" majority usually remains silent, leaving critics and enthusiasts to shape public opinion.

The skeptics' camp, whose attitude ranges from doubt to near rejection and fear, covers a wide spectrum of concerns. Many, if not all, of these concerns are understandable. They revolve around issues such as data protection, copyright, competition, job losses, and the potential for targeted misinformation. In this camp, skepticism takes center stage.

The crucial question is how to assess the various risks, make them visible, and deal with them. This quickly leads to calls for legal regulation, and indeed, this process has already begun. It is at this point that the other camp takes its position: those who see excessive regulation as a threat to innovation and, consequently, to the economic and scientific landscape of the European Union, and of Germany in particular.

The fundamental problem with the entire discussion is that AI technology is evolving at such a rapid pace that even experts struggle to keep up. Legislation has always lagged behind technological development, and endless discussions and slow decision-making have unfortunately become a dubious tradition. When it comes to AI, this gap is enormous and, in my view, potentially dangerous, because we are dealing with a potentially game-changing technology that may also pose a latent threat to democracy (see: click).

There is broad agreement that AI should be regulated. The only question is how.

To grasp the problem, one must understand what AI is from a technical perspective. AI is neither just a computer nor just a software model. It is the combination of a model with a powerful IT infrastructure, and it is this combination that other applications can use. AI can therefore be operated and used as a service anywhere on Earth, requiring nothing more than an internet connection. In a highly interconnected world where people and machines are almost constantly online, users cannot possibly know which components or services answer the requests they send from a smartphone or home computer. Just as it is nearly impossible for end users to verify this, it will also be challenging for any regulatory authority to control it.

From a technical point of view, it is therefore hardly conceivable that the training methods of models, or their use, could be inspected when they lie outside the jurisdiction of German or European law. And the internet does not stop at jurisdictional borders. Even companies using cloud services cannot be certain about the software, data, and models underlying these services. It should be well known by now that U.S. corporations do not take data protection and transparency as seriously as European standards require. To believe that Chinese or Russian state hackers, or companies from such countries, adhere to our standards would be quite naive.
The bitter realization is that the use and training of AI models are very difficult, if not impossible, to control. Even within the EU, I consider effective control to be practically impossible.

As a result, our traditional legal system and legislation do not provide good mechanisms for solving the real problems we face, and this is something that neither politics nor society has truly understood. To be clear, I believe that legal regulation is absolutely sensible and necessary, but I do not see how these rules could be controlled and enforced. Our traditional branches of government, the legislature and the executive, are reaching the limits of what is feasible here.

In my view, there are two possible levers we could adjust. The first is to regulate international data traffic itself, or even to actively restrict parts of it. This would involve creating a kind of European Data Governance Policy under which supervisory authorities regulate and monitor data flows. However, this would require a broad public debate, which would be challenging in several respects, since it ultimately concerns a possible restriction of internet freedoms that may be hard to reconcile with our democratic understanding of freedom. At the end of the day, a very powerful instrument would be created, and its potential negative consequences should not be underestimated.

For this reason, a second-best alternative may come into focus. I am talking about shifting from passive data protection to active data protection.

In my opinion, the question of AI regulation must also extend to the protection of personal and copyrighted data, wherever and in whatever form it is technically stored. If data is no longer freely accessible by default, models cannot use or alter it without authorization. However, this requires a shift in the mindset of the general population and of legislators. Everyone must understand that individual data should not be stored in just any cloud; this information is an essential asset that is highly worthy of protection.

The naive, idealistic notion that the internet would make all information freely accessible, and only to good ends, now comes back to haunt us. Not only can undemocratic actors freely access the publicly available pool of information; the power of tech giants over all of us has also grown to unprecedented dimensions through their data collection and technological advances.

If efficient regulation of AI, its use, and its training is not feasible, then it is all the more important to protect data as well as possible. Just a few years ago, there was resistance to censuses and shock over the Stasi's methods of monitoring citizens. Today, in a digital world, no one seems to care that companies and even states collect personal data en masse and use it for their own benefit. It may therefore be worth considering a complete reversal of our fundamental understanding of data protection: everything that a creator or owner has not explicitly released would be considered "default-confidential."