Introducing the Memri Privacy Preserving License

koen.vanderveen · July 1, 2020, 2:03pm

Over the last few months, we have been creating our own privacy-preserving software license, the Memri Privacy Preserving License, based on the Mozilla Public License (MPL). The license adds a privacy clause to protect our users and to make sure that we take responsibility for the software that we put out in the world. This is an early version of the license, and we are actively looking for feedback on both the moral side and the implementation details so far.

You can find the latest version as a diff to the MPL here. To facilitate a fruitful discussion, we describe below our reasoning behind the biggest additions to the MPL.

As background: in general, a software license is an agreement between two parties: the original creator of the software, and a party extending or using the software in its own products. In this license we refer to the latter party as “You”. Moreover, we introduce a third party, the “Subject”: the end user of the software, who may interact with the software and thereby expose their personal data to it.

1.1. Adequately Encrypted	Explanation
means encrypted using encryption standards generally accepted for encrypting sensitive (personal) information, whereby the decryption keys may only be available to Subject.	We make a distinction between encrypted and unencrypted information. The main idea here is that only the Subject (read: the user) has access to the keys. Third parties may still provide services like storage to users, as long as these parties cannot access the plaintext data.
1.2. Aggregate Information	Explanation
means all information that is generated by combining Personal Information about a group of individuals, that no longer directly contains such Personal Information, and which was created for another purpose than extracting Personal Information from that information.	Before explaining what this is, let's make this clear: users can always opt out. Many applications will need to compute user statistics, grouped crash reports and other aggregated information. Making sure that aggregated information does not contain any information about users in the widest sense is a very hard problem, and proving that no information is leaked at all from an information-theoretic perspective can be impossible in practice. Therefore, we allow parties to generate aggregate information. As we will see later, what we do not allow is parties actively trying to extract information about individuals from this aggregate information.
1.15. Personal Information	Explanation
means all information related to or generated by a Subject, generated by the interaction between a Subject and a computer system by means of input devices and/or sensors, or generated by Processing Personal Information, excluding any Aggregate Information.	Many definitions of personal information are too narrow in our eyes. They often address direct sources of information, but forget to include information that can be deduced about you based on your personal information. *COMMENT RUBEN: all the information that Google records in its so-called "shadow text" (ref. The Age of Surveillance Capitalism) and that you therefore do not get via the GDPR* Our aim here was to have the broadest definition of personal information possible.
1.16. Process / Processing	Explanation
means any action, whether performed by a human or by a computing device, involving Personal Information. Such actions include, but are not limited to, storing, retrieving, viewing, displaying, copying, removing, editing, and showing.	Here we aim to define everything third parties may want to do with your data. We are not going into restrictions just yet.
1.19. Static Data Set	Explanation
means a fixed amount of data that is provided at one, immediate point in time.	Definition of a fixed-size dataset (e.g. "these 10 photos"), as opposed to data that streams in over time (e.g. "all my e-mails, including incoming ones").
1.20. Subject	Explanation
means the person that is a user of Your Covered Software.	This defines the user we are trying to protect with the license.
1.21. Unencrypted	Explanation
means not Adequately Encrypted.	Definition of unencrypted data. Note that we define data that is encrypted, but not *adequately*, as unencrypted.
2.1. Grants	Explanation
Provided that You comply with all the terms of this License, each Contributor hereby grants You a world-wide, royalty-free, non-exclusive license:	This is an obvious but essential part of our license. It establishes that anyone who uses or extends software covered by this license has to comply with the privacy restrictions that the license describes for the users of that software.
4.1. Protecting Personal Information	Explanation
Personal Information is not subject to ownership, by either You or the Subject. Instead, privacy is a fundamental human right. Determining if and how Personal Information is Processed is an essential part of privacy. This clause aims to guarantee that You respect this fundamental human right. You must ensure that Personal Information is Adequately Encrypted wherever possible. This clause 4 does not apply if and in so far as You are the Subject.	This is the start of our privacy-preserving restrictions. First of all, we are trying to set the tone. We believe that as a user, your data should always be yours. You have control over it, and no one else does. If other parties want to use your data, it should be encrypted where possible. This definition is not airtight from a legal perspective. The “wherever possible” in particular leaves some room for interpretation. Our aim here was to have a definition that makes sense from both 1) a practical perspective, and 2) a privacy perspective. It aims to ensure that third parties do everything that can be expected from them to protect your data, but it also makes sure users don’t expect the impossible. If some encryption algorithm has a bug, it is unreasonable to expect parties to immediately know about that. TODO: REWRITE THIS
4.2. Processing Adequately Encrypted Personal Information	Explanation
You may Process Personal Information by means of Covered Software, or by means of a Larger Work, without obtaining the prior authorization of clause 4.3, only if and in so far as necessary for providing Your service to the Subject, and provided that this Personal Information is Adequately Encrypted. In all other cases, the terms of clause 4.3 apply.	This section defines that parties may provide services for your data in encrypted form, such as storage.
4.3. Processing Unencrypted Personal Information	Explanation
You may not Process Unencrypted Personal Information by means of Covered Software or by means of a Larger Work, unless You obtain prior authorization from the Subject. You will ensure that such authorization is always: (a) For a specific period of time; (b) For a specific, pre-determined, communicated and unchangeable purpose; (c) Based on clear and understandable information about the specific period of time and the specific purpose; (d) Based on a separate agreement, and not concealed in general terms and conditions or click through agreements; (e) Accompanied by an offer to fairly compensate the Subject for the value derived from Processing Personal Information that is not Adequately Encrypted, which value must be validated in hindsight; The specific period of time listed under clause 4.3 (a) may be an unlimited period of time only if and in so far as the Personal Information is a Static Data Set. If the Personal Information is not a Static Data Set, the specific period of time may not be longer than two years, which term may be extended for another period of two years. Such extension may not occur automatically, and requires renewed authorization in accordance with this clause 4.3. The terms of this clause 4.3 apply to Processing Adequately Encrypted Personal Information for other purposes than the purposes described in clause 4.2 as well.	With this section we aim to restrict what parties can do with your data if you allow them to use it in unencrypted form. We do not want to protect users against themselves by saying that no one can ever use your unencrypted data. If you want to lend or give away your information, for instance for a good cause, that should be your choice. At the same time, we do want to prevent situations that are not acceptable under any circumstances. Our aim here has been to provide a set of clear rules that cannot easily be misused.
In all cases, when parties use your unencrypted data, it should be clearly communicated what the data is used for and for how long it is used. Also, users should be “fairly” compensated for the value that is derived from their data. Again, we leave some room for interpretation here, because calculating this value exactly is very subjective in practice. However, it makes situations less likely where companies earn billions of dollars by selling products that are built on user data while users get nothing. This section has an exception for fixed datasets in terms of time constraints. This allows parties to do research or create models (aggregate information) over data while maintaining reproducibility of their work. This exception only applies to the time constraint; the other constraints still apply.
4.4. Requirements for third party access	Explanation
If You want to provide access to Personal Information to a third party, then You must ensure that the Subject is a party to the transaction (whether that transaction is for a fee, or not) with regard to their Personal Information, so that the Subject can authorize that access. The criteria for authorization listed in clause 4.3 apply in full. In addition, You must ensure that the third party complies with the obligations of clause 4.3 and 4.4, even if the third party does not use Covered Software or a Larger Work to Process the acquired Personal Information. This Clause does not apply if and in so far as the Subject initiates the process of providing access to such third party, for instance by sending a limited Static Data Set to a recipient of their choice.	This section defines which constraints apply when parties want to sell your data to other parties. This is allowed if the user authorises it, but the same privacy constraints apply to the buying party.
4.5. Revoking authorization	Explanation
If a Subject does not extend the authorization previously given, You will delete all their Personal Information and after deletion You must send proof of deletion to the Subject and subsequently delete such proof.	This section defines that access a user has lent out is revoked by default when the user does not extend it, and it ensures that parties delete the user's data when access is revoked.
4.7 Restrictions for Processing Aggregate Information	Explanation
If You want to Process Personal Information to generate Aggregate Information, the terms of clause 4.3 apply. In addition: You may only Process Personal Information to generate Aggregate Information if and in so far as it’s reasonable to assume that the chance of extracting Personal Information from the Aggregate Information is negligible. Creating Aggregate Information may not serve the purpose of obtaining more Personal Information about a Subject or extracting Personal Information from such Aggregate Information. Furthermore, You must do everything in Your power to prevent anyone, including Yourself, from trying to extract Personal Information about a Subject from Your Aggregate Information or Your Inferences. If You want to provide access to or sell Aggregate Information to a third party, You must ensure that the terms of clause 4.7 (b) and (c) apply in all material respects to such third party, even if the third party does not use Covered Software or a Larger Work to Process the Aggregate Information. Provided You comply with the terms of this clause 4.7, You may use Aggregate Information to generate Inferences.	This section restricts how parties can generate and use Aggregate Information. The main point here is that parties can make models and statistics over user data, but the purpose of doing that can never be to learn more about individual users.
4.8 General Restrictions on Processing Personal Information	Explanation
You may not Process Personal Information by means of Covered Software or by means of a Larger Work, if the purpose of such Processing is: (a) Surveillance; (b) Tracking; (c) Influencing or recording political views; (d) identifying, or obtaining information about, other persons than the Subject.	This section constrains what the purposes of processing personal information may be. We exclude some obvious activities that do not align with any privacy values.
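
To make the time rules of clause 4.3 more concrete, here is a minimal Python sketch of how a service could check an authorization before processing unencrypted data. It is an illustration only, not part of the license and not Memri code: the `Authorization` record, its field names, and the helper functions are hypothetical, "two years" is approximated as 730 days, and requirements that cannot be checked mechanically (such as whether compensation is "fair") are reduced to simple flags.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumption: "two years" is approximated as 730 days in this sketch.
MAX_PERIOD = timedelta(days=730)


@dataclass(frozen=True)  # frozen: the purpose cannot be changed afterwards
class Authorization:
    """Hypothetical record of a Subject's authorization (clause 4.3)."""
    purpose: str                # 4.3 (b): specific, pre-determined purpose
    start: date
    end: Optional[date]         # None stands for an unlimited period
    static_data_set: bool       # definition 1.19
    separate_agreement: bool    # 4.3 (d): not hidden in general terms
    compensation_offered: bool  # 4.3 (e)


def violations(auth: Authorization) -> List[str]:
    """Return the clause 4.3 conditions that this authorization fails."""
    problems = []
    if not auth.purpose.strip():
        problems.append("no specific purpose")
    if not auth.separate_agreement:
        problems.append("not based on a separate agreement")
    if not auth.compensation_offered:
        problems.append("no offer of compensation")
    if auth.end is None:
        # An unlimited period is only allowed for a Static Data Set.
        if not auth.static_data_set:
            problems.append("unlimited period requires a Static Data Set")
    elif auth.end <= auth.start:
        problems.append("period must end after it starts")
    elif not auth.static_data_set and auth.end - auth.start > MAX_PERIOD:
        problems.append("period longer than two years")
    return problems


def may_process(auth: Authorization, purpose: str, today: date) -> bool:
    """Allow processing only for the authorized purpose, within the period."""
    if violations(auth) or purpose != auth.purpose or today < auth.start:
        return False
    return auth.end is None or today <= auth.end
```

Because there is no automatic renewal in this sketch, an extension would have to be modelled as a new `Authorization` record created after the Subject has agreed again, which mirrors the requirement that an extension needs renewed authorization.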

koen.vanderveen · July 1, 2020, 1:26pm

vilius · April 26, 2021, 5:56pm

I’ve read through the license introduction, this thread and the changes against MPL and here are my concerns.

I’ll start with the usual IANAL disclaimer for the typical reason and to avoid annoying you with phrases like “I think”, “In my opinion/understanding” and similar in every other sentence.

License proliferation has been considered an issue for at least a couple of decades, and any new license inevitably makes matters worse in this regard, so the bar to justify a new license should be very high. I did not find a satisfactory justification in the aforementioned resources. The changes to MPL seem to serve at least a couple of purposes:
a) encode the goals and ethos of the Memri project;
b) specify requirements for data handling by a service provider.

Purpose a) does not belong in a software license and would be more easily discoverable on the project page. Purpose b) does not seem to be achieved by the license because a service provider would not be a party to the license. A party to the license would either be an end-user in case of self-hosting or the data hosting provider. Guarding the data from the data hosting provider seems feasible using technical means, so provisions for that in the license are superfluous. Provisions for purpose b) should be in the agreement between the user and service provider.
The introduction of section 4 makes MPPL incompatible with the other free/open source licenses, which makes any code from the Memri project impossible to reuse in projects licensed under those licenses. That means the value of Memri as an open source project is diminished. However, this could be mitigated by minimizing the use of MPPL, i.e. using copyleft or permissive licenses for any generally useful libraries and other components developed as part of the Memri project.
Section 4. of MPPL seems likely to have the equivalent effect to the morality clause of the JSON license - any covered work will be non-free software (freedom 0 of The Free Software Definition is “to run the program as you wish, for any purpose”) and non-open source (clause 6. of the Open Source Definition stipulates “No Discrimination Against Fields of Endeavor”). Even if compliance with those definitions is not an explicit goal of the Memri project, it may have counter-productive consequences. For example, MPPL-licensed software would not be acceptable to mainstream distributions like Debian and Fedora, which would hinder adoption of the Memri project.
If the introduction of a new license in general, or the changes introduced in MPPL in particular, are frowned upon in the FOSS community, this may discourage potential volunteer contributors.

If these concerns have already been adequately considered, my feedback will hopefully serve as an indication of which questions should be elaborated on for the community at large.

Ruben · April 26, 2021, 6:48pm

Hi @vilius,

Welcome to the community and thanks for taking the time to provide feedback. We super appreciate any dialog around the license. Allow me to respond by clarifying our intentions, in the hope of starting a healthy dialog and creating understanding. There may be better ways to achieve our goals, but so far we haven’t found them.

We realize that license proliferation is a problem. We have chosen this path anyway because we believe that a privacy focussed license will prevent some of the damage that is currently done in the tech industry to our society at large. We see technology as neutral but the way it is used as non-neutral. The license therefore is a way for us to prevent abusive use of the software we create. This is important to us as we do not wish our software to be used as a tool for oppression.

The changes to the MPL serve to protect a user of our software when that software is offered by someone other than themselves. We limit the ways in which a SaaS provider offering the software can treat the data of a user, and the business models it can enable for itself and others using the software. For instance, the license states that when the data of the user is sold, there must be a direct relationship between the acquirer of the data and the user. Furthermore, it forwards the restrictions on that relationship (with the provider as proprietor of a potential marketplace) to the data buyers in such a marketplace. This binds the SaaS provider that is using the software and/or their customers in a chain that extends without end.

I would love to hear other ways that you think we could protect the end user in this way and prevent our software from being used in ways that are detrimental to the user.

The question of incompatibility is not a straightforward one. The MPPL is obviously compatible with BSD- and MIT-like licenses. It just means that section 4 applies to that software as well. If we do write software that is generally applicable, we will release that as MIT/BSD.
Indeed, we explicitly do not want to grant the freedom “to run the program as you wish, for any purpose”. We do not want the software to be used in cases where it harms the privacy of its users.
We hope that there will be enough contributors who do not want their software to be used by those who infringe on users’ privacy. How do you feel about that?

I look forward to your response! Thanks again for engaging here.

vilius · April 28, 2021, 6:26pm

The core concern is whether MPPL will have practical value, which I’m not sure of yet, but that may be due to my limited understanding of the foreseen typical deployments and other important factors. Weighing all the potential consequences and complications of using MPPL is only relevant once the value of it is established.

I can see at least 3 potential licensees of Memri, that is entities that would deploy and run MPPL-licensed software:

1. end-user (self-hosting);
2. data hosting provider (specialised service for simplified Memri provisioning);
3. service provider of a service that processes the data managed by Memri and in addition hosts Memri.

From the privacy point of view, the first two options seem like the obvious choices to me, and for those the additional protections defined in MPPL seem to have no practical value. The third option is a losing proposition from the privacy perspective, and relying on protection in MPPL is grasping at straws in my opinion. The third deployment option can be imagined if users focus on the value of having access to data instead of the privacy aspect. So for the sake of discussion, let’s assume that the service provider that could not be trusted before has somehow found his long lost conscience or is somehow otherwise compelled to abide by the terms of MPPL. In that case the most realistic outcome in my mind is not that users’ privacy would be successfully safeguarded thanks to MPPL, but that the service provider would choose to reimplement something akin to Memri for the sake of not being bound by the license. If we think of the big players like the FAANG gang, it does not seem realistic to outcompete them on development resources.

The way I understand the vision of Memri, it will require radical changes to the status quo. That will require as broad support as possible. A license that is ineffective in practice, but brings about the complications outlined in my first post, may become a reason to fragment the wider community working on similar goals, instead of Memri potentially becoming a rallying point and reference implementation. A non-controversial license could turn the service provider in the hypothetical scenario above into a contributor instead of a competitor, which would consequently benefit the end-users.

I wholeheartedly support the goals of Memri as far as I understand them right now, that’s why I think it’s important to make very careful decisions to lay groundwork for success in the long-term. My guess is this would be a long game, but an interesting one.

Ruben · May 8, 2021, 7:10am

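It took me a while to parse your message, but I think that this sentence is potentially where we can resolve a disconnect between our mental models.

The third deployment option can be imagined if users focus on the value of having access to data instead of the privacy aspect.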

My interpretation of this sentence is that you suggest that users are given the choice between either gaining access to some data (I can only guess at which data, but some data that the provider has) and having privacy, but not both at the same time. This is certainly the reality of today. Yet, Memri is designed with a different situation in mind. We believe that privacy is a fundamental human right and that that right should extend into the digital realm.

Memri’s architecture is designed such that a user’s data is stored in a pod with keys that only they have. With a proper hosting architecture these keys will only exist in an encrypted form in any other context (i.e. that of the service provider(s) — at rest, in transit and in memory) and therefore the service provider cannot access the user’s data. Any processing that would happen on the user’s data would happen within this encrypted context and the output of that processing would be stored within that context as well, only accessible by the user holding the keys.

The license ensures that the service provider adheres to this agreement and encrypts things properly, at the risk of losing the ability to host the software (and, more importantly, of reputational damage) when not providing that level of security/privacy. A large amount of trust is required for a user to put their data in a pod, and they should only want to do that if they can trust the provider. We believe this will be a common requirement of users in the future.

Although large players have many orders of magnitude more developer resources, these are not applied in the same qualitative ways. The innovator's dilemma describes this issue really well. We have seen this play out in the past with YouTube, Facebook and other services where Google was not able to compete. Memri aims to mobilize a large community of developers that want to contribute towards a common goal of data sovereignty and data privacy, and to do that in a values-aligned way. We believe that this has the potential to be an unstoppable force, regardless of the amount of developer resources available in absolute terms.

vilius · May 16, 2021, 6:43pm

My previous post was an attempt to find arguments for using MPPL and present the assumptions I was making for you to possibly correct/amend. The first two deployment options don’t seem to offer any arguments for MPPL - technical measures are sufficient to ensure privacy protection and MPPL offers no tangible benefit. The additional protection by MPPL is only relevant in the third case.

My interpretation of this sentence is that you suggest that users are given the choice between either gaining access to some data

Sorry if it came out a bit too terse. The sentence was an attempt to formulate my assumption that the third option would only be attractive to users for whom the selling point is not privacy, but the ability to access/use their data using Memri (“focus” was the key word in that sentence, i.e. I am not proposing a false dichotomy). If the assumption on your side is that this would be the predominant motivation of users and hence the typical deployment scenario, then MPPL may make sense. Provided it would be enforced, of course, which is problematic for any open source license.

Memri aims to mobilize a large community of developers

One of the reasons for me to engage in this thread was to raise the concern that MPPL may work counter to this very aim. Although, given the sample size of “outsiders” who have voiced their opinion here is currently equal to 1, it’s anyone’s guess if this sentiment is widely shared in the target audience of the developers you are aiming at.

Ruben · May 17, 2021, 5:41am

One of the reasons for me to engage in this thread was to raise the concern that MPPL may work counter to this very aim.

I think this is true for an open source project that does not reward the open source contributors properly. In our case we believe that it is unfair for core workers to be the only beneficiaries of the financial gains of an ecosystem that is co-created by a larger community of contributors.

We are therefore organizing ourselves around the principle that some of the income should flow back to these contributors, thereby increasing the quality and the number of contributions and aligning the interests of the contributors with the income generated by the ecosystem as a whole. We expect that this will then also cause a stronger alignment around the value of privacy, which we believe will be a main attractor of users to the ecosystem in the first place. If that is the case, the MPPL will be seen as a way to protect the income of the contributors rather than as a limitation of freedom.