You often don't want to totally delete data the moment a user asks. Why?
1. Hackers may have taken over the account and deleted everything for funsies.
2. People accidentally delete their own data.

Users ask for undeletion surprisingly often. So you may want a grace period. /2
Now, mind you, in that grace period you should:
a) obviously not serve that data
b) stop using that deleted data for things like training ML models, etc.
Act like it's deleted. If you can't make every piece of software understand that (e.g. via a shared library or filter service) then...
/3
... you need to delete the data from the main data store for real and keep a copy in a special "grace period" data store for possible later restoration. This same sequestration strategy is useful for legal holds. /4
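Here's a rough sketch of that sequestration idea, using in-memory dicts as stand-ins for real data stores. Everything named here (`SequesteringStore`, `GRACE_PERIOD_SECONDS`, the 30-day window) is made up for illustration; a real system would need durable storage, access controls on the grace store, and so on.

```python
import time

# Assumed value for illustration only; pick whatever your policy requires.
GRACE_PERIOD_SECONDS = 30 * 24 * 3600


class SequesteringStore:
    """Toy model: a main store plus a separate grace-period store."""

    def __init__(self):
        self.main = {}   # record_id -> data; the only store other systems read
        self.grace = {}  # record_id -> (data, deleted_at); restoration only

    def put(self, record_id, data):
        self.main[record_id] = data

    def get(self, record_id):
        # Readers only ever see the main store, so deleted data is never served.
        return self.main.get(record_id)

    def delete(self, record_id, now=None):
        now = time.time() if now is None else now
        if record_id in self.main:
            # Remove from the main store for real, keep a sequestered copy.
            self.grace[record_id] = (self.main.pop(record_id), now)

    def restore(self, record_id, now=None):
        now = time.time() if now is None else now
        entry = self.grace.get(record_id)
        if entry is None:
            return False
        data, deleted_at = entry
        if now - deleted_at > GRACE_PERIOD_SECONDS:
            return False  # too late; this copy is due for purging
        del self.grace[record_id]
        self.main[record_id] = data
        return True

    def purge_expired(self, now=None):
        # Drop sequestered copies whose grace period has run out.
        now = time.time() if now is None else now
        expired = [rid for rid, (_, deleted_at) in self.grace.items()
                   if now - deleted_at > GRACE_PERIOD_SECONDS]
        for rid in expired:
            del self.grace[rid]
        return expired
```

The point of the split is that nothing reading `main` has to know the grace period exists, so the ML pipeline or shared library that can't be taught about "soft deleted" flags never sees the data in the first place.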
Keeping a list of the IDs (but not the contents!) of particular accounts or pieces of data (e.g. post IDs) that should have been deleted is also a good idea. Hopefully those IDs are random. Don't even ask what you have to do if your IDs aren't random, I'm having bad memories right now. /5
Why keep a list of the things which are supposed to be deleted? Because you should assume that computers, especially distributed systems, are OUT TO GET YOU and things fail in all kinds of funky ways, which is not acceptable for deletion. Trust me, it happens. 😭 /6
Just because the job that goes around and removes deleted data failed or there's a bug in the code doesn't mean it is OK to not delete data. Having a list of what is supposed to be deleted lets you go back and fix things. /7
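A minimal sketch of what "go back and fix things" could look like: keep a ledger of IDs only, then periodically check every store for anything on the ledger that is still hanging around. `DeletionLedger` and `find_undeleted` are hypothetical names, and plain dicts stand in for whatever stores, caches, and pipelines you actually have.

```python
class DeletionLedger:
    """Holds only the IDs of things that must be gone, never their contents."""

    def __init__(self):
        self.ids = set()

    def record(self, record_id):
        self.ids.add(record_id)


def find_undeleted(ledger, stores):
    """Return {store_name: [ids]} for ledger IDs still present in each store.

    `stores` maps a name to any container supporting `in` (dicts here).
    An empty result means every recorded deletion actually took effect.
    """
    leftovers = {}
    for name, store in stores.items():
        still_there = sorted(rid for rid in ledger.ids if rid in store)
        if still_there:
            leftovers[name] = still_there
    return leftovers


# Usage: the cache missed a deletion, and the check catches it.
ledger = DeletionLedger()
ledger.record("post-a")
ledger.record("post-b")
stores = {
    "main": {"post-c": "kept"},
    "cache": {"post-a": "stale copy", "post-c": "kept"},
}
leftovers = find_undeleted(ledger, stores)
```

In this toy run `leftovers` points at `post-a` in the cache, which is exactly the kind of thing a failed or buggy cleanup job leaves behind.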
Once you want to fully remove data from storage, things get complicated. There are probably copies or derived data all over in pipelines and ML models and caches.

Overview here:
https://iapp.org/news/a/data-retention-in-a-distributed-system/

Once you've gotten rid of the copies, you have to get rid of the bytes. /8
Even more annoyingly, when you tell a database to delete things, it doesn't actually remove the bytes! I'm going to go do my day job now, but if you're interested I'll explain this (and how to fix it) later. It's a side effect of how a database actually works inside.
You can follow @LeaKissner.