Facebook and its parent company Meta have gobs of sensitive user data from tracking people across the internet, but no clear idea where it’s all stored, who precisely has access to it, or even what information it contains.
That’s one top-level takeaway from a newly unsealed court transcript featuring the testimony of two senior Facebook engineers who were enlisted by a U.S. district court to help clarify the company’s data retention practices.
The testimony of Eugene Zarashaw, a Facebook engineering director, and Steven Elia, a software engineering manager, was first reported by The Intercept.
Facebook made the experts available as part of an ongoing lawsuit prompted by the Cambridge Analytica data scandal, in which vast troves of user data were secretly harvested and exploited by a firm with links to Donald Trump’s 2016 presidential campaign.
A court-appointed special master, Daniel Garrie, has the unenviable job of determining where Facebook stores personal data in its 55 subsystems, a question that neither Zarashaw nor Elia could really answer.
Presented with a list of those systems, neither engineer recognized what they all were, let alone what data they collected.
Asked where Facebook stores a user’s on-platform activity, data obtained from third parties about users’ activities outside of Facebook, and other inferred user data, again, neither knew the answer.
“I don’t believe there’s a single person that exists who could answer that question,” Zarashaw told the court. “It would take a significant team effort to even be able to answer that question.”
Zarashaw added that Facebook tends to build pieces of infrastructure, “and then just leave them running for anybody at the company to use.” Other teams then “end up using other pieces of infrastructure as underlying storage,” making it difficult to fully account for who is doing what.
“It would take multiple teams on the ad side to track down exactly the ― where the [user] data flows,” Zarashaw added. “I would be surprised if there’s even a single person that can answer that narrow question conclusively.”
What’s more, Zarashaw told the court Facebook has a “somewhat strange engineering culture” in that it often doesn’t generate documentation for others to refer to later.
“Effectively the code is its own design document,” the engineer said. “For what it’s worth, this [was] terrifying to me when I first joined as well.”
A Meta spokesperson vehemently disputed the notion that the company has haphazard internal data tracking policies.
“Our systems are sophisticated and it shouldn’t be a surprise that no single company engineer can answer every question about where each piece of user information is stored,” the company said in a statement.
“We’ve built one of the most comprehensive privacy programs to oversee data use across our operations and to carefully manage and protect people’s data. We have made ― and continue making ― significant investments to meet our privacy commitments and obligations, including extensive data controls.”
The transcript bolsters an April Motherboard report, based on a leaked internal document from the company’s ads and business product team, that suggested Facebook is structurally incapable of adequately regulating user data because of how the company is built.
“We do not have an adequate level of control and explainability over how our systems use data, and thus we can’t confidently make controlled policy changes or external commitments such as ‘we will not use X data for Y purpose,’” the document reads. “And yet, this is exactly what regulators expect us to do, increasing our risk of mistakes and misrepresentation.”
The 15-page document compared Facebook’s user data to a bottle of ink that’s been emptied into a body of water.
“You pour that ink into a lake of water (our open data systems; our open culture) … and it flows … everywhere. How do you put that ink back in the bottle? How do you organize it again, such that it only flows to the allowed places in the lake?”