The database of removed content from big tech platforms: Transparency and its illusion

12 January 2026
Illustration: EC

Large online platforms in the European Union are required to moderate content in accordance with the law and to publish statements of reasons for all their moderation decisions in a single, centralised database. In practice, however, the transparency this database actually delivers remains significantly deficient.

With the adoption of the Digital Services Act (DSA), the European Union introduced a fundamental shift in the regulation of large online platforms - particularly Very Large Online Platforms (VLOPs) - with the aim of strengthening platform accountability, enhancing user protection, and reinforcing oversight of content moderation.

The DSA Transparency Database represents one of the most ambitious instruments established under the DSA, as it is designed to provide public, centralised, and continuous insight into all platform decisions concerning the removal, restriction, or labelling of illegal or harmful content (DSA, Article 24(5)).

Welcome to the DSA Transparency Database!

However, despite its undeniable importance, the implementation of the DSA Transparency Database, launched by the European Commission on 26 September 2023, continues to exhibit a number of shortcomings. In practice, it does not provide comprehensive, meaningful transparency; the quality of the information and data it contains is questionable; and it is designed in a way that makes it difficult to use without substantial resources and a high level of expertise, particularly for the broader public.


Three methods online platforms use to remove illegal content:

1. On Its Own Initiative - The platform removes illegal content proactively after identifying it through its own algorithms and the measures it implements based on its assessment of systemic risks.

2. By Order of Competent National Authorities - In Croatia, for example, the Ministry of the Interior (MUP). Platforms are required to comply with such orders and to report them to HAKOM, the Croatian Regulatory Authority for Network Industries and Croatia's Digital Services Coordinator.

3. Following Reports from Trusted Flaggers - Since no central body monitors all illegal content on the internet, the DSA provides for "trusted flaggers": entities granted that status by the national Digital Services Coordinator (in Croatia, HAKOM), whose reports platforms must process with priority.


Five key methods for accessing content in the DSA Transparency Database:

1. Through visualised summarised data on the dashboard,

2. Through search of statements of reasons,

3. By downloading aggregated data packages,

4. Via the database’s research API, primarily intended for technically proficient users,

5. Through the DSA-TDB data package for researchers, designed primarily for academic and highly technical analyses rather than for the general public.


Searching the database quickly reveals problems that block certain analyses from the outset. For example, in the statements of reasons, the territorial scope field rarely indicates a single EU Member State; most often, the entire EU territory is referenced. This could be partially mitigated by filtering statements of reasons according to the language in which the moderated content was originally published.

Digital Services Act

However, since the DSA does not require platforms to indicate in their statements of reasons the language in which the moderated content was published, whether the language is recorded varies from case to case. Consequently, filtering by language does not provide reliable data for analysis.

For example, if we attempt to investigate how much content in Croatian has been moderated across all platforms in the past 180 days, the database returns 16,346 statements of reasons. Examining individual statements, however, does not reveal what was actually removed. If we then further restrict the selection to social media posts on VLOPs, the result shows that no content in Croatian was removed from these platforms in the past 180 days - clearly a result that does not reflect reality.
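For readers who want to reproduce this kind of query programmatically, the sketch below shows the general shape of such a request. The endpoint path and parameter names are assumptions for illustration only; the authoritative names are in the database's own API documentation, and the research API requires an access token.

```python
# Minimal sketch of the query described above; endpoint path and
# parameter names are ASSUMED, not the documented API contract.
from datetime import date, timedelta

import requests

BASE = "https://transparency.dsa.ec.europa.eu"      # real database host
ENDPOINT = f"{BASE}/api/v1/statements"              # assumed research-API path
HEADERS = {"Authorization": "Bearer <YOUR_TOKEN>"}  # the research API is token-gated

since = (date.today() - timedelta(days=180)).isoformat()
params = {
    "content_language": "hr",   # assumed name of the language filter (Croatian)
    "created_at_start": since,  # assumed name of the date filter
}

resp = requests.get(ENDPOINT, params=params, headers=HEADERS, timeout=30)
resp.raise_for_status()
print(resp.json().get("total"))  # e.g. the 16,346 statements mentioned above
```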

Exploring Aggregated Data on the Dashboard

The database contains over 36 billion statements of reasons for content moderation decisions. On the Dashboard, users can filter aggregated statistics for all statements since the database's establishment across several categories, but not by the language of the post, the territorial scope within the EU, or other details, which significantly limits the possibilities for analysis.

Deeper analysis is also hindered by the poorly defined distribution of statements of reasons across categories. Around 30 billion statements are listed under the "scope of platform service" category, approximately 3 billion under "other violations of providers' terms and conditions", and about 1 billion under "illegal or harmful speech". This distribution clearly demonstrates overlapping definitions and category ambiguity, leaving platforms room for their own interpretation, which in turn complicates a more in-depth analysis of moderation practices.
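To put those figures in proportion, a quick back-of-the-envelope calculation, using only the approximate counts quoted above:

```python
# Approximate category counts quoted above, in billions of statements of reasons.
category_counts = {
    "scope of platform service": 30,
    "other violations of providers' terms and conditions": 3,
    "illegal or harmful speech": 1,
}
total = 36  # approximate size of the whole database, in billions

for name, count in category_counts.items():
    print(f"{name}: {count / total:.1%} of all statements")
# -> roughly 83%, 8%, and 3%: three loosely bounded categories
#    alone account for about 94% of the entire database.
```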

Dashboard

The root of this problem lies in the fact that a single statement of reasons can be assigned to multiple categories, with the platform itself deciding which category to place each violation in. This is a direct consequence of DSA articles that define categories but do not specify how overlapping cases should be treated. This flexibility, intended to allow platforms to adapt to different types of content, the laws of individual Member States, and technical implementation, in practice results in poorly defined category distributions. Consequently, many statements of reasons end up in broad categories, with some categories overlapping, for example, illegal or harmful speech and other violations of providers’ terms and conditions.

Such distribution complicates comparison, trend analysis, and reliable quantification of data, not only through this dashboard option but across all other methods of analysing content in the DSA Transparency Database.

Searching "Statements of reasons"

The DSA Transparency Database functionality with the greatest potential for transparency is the Statements of Reasons option, where statements of moderation decisions can be searched across almost all available dimensions: platform, content category, source of report, territorial scope, account type, language, content type, and the method of automated decision-making.

The main issues here are that search results are paginated, showing only the first 10,000 results, and that only 1,000 results can be exported in CSV format.

Research API

The European Commission clearly anticipated the consequences of this limitation when organising the database: the user instructions for the Statements of Reasons search explicitly note that users interested in programmatic research have access to a research API.

However, in addition to the issues with categories, a new problem arises: the data available through this method is limited to the past 180 days. According to the Data Retention Policy, the rationale for this limitation is to "balance the need for access to information with the obligation to protect sensitive data and comply with the established legal framework".
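For those who do use the API, retrieval is typically page by page. The sketch below shows the general pattern; the endpoint path, pagination parameters, and response fields are assumptions for illustration rather than the documented contract, and no date range can reach past the 180-day retention window.

```python
# Generic paginated retrieval loop; parameter and field names are ASSUMED.
import requests

ENDPOINT = "https://transparency.dsa.ec.europa.eu/api/v1/statements"  # assumed path
HEADERS = {"Authorization": "Bearer <YOUR_TOKEN>"}

def fetch_all(filters: dict, page_size: int = 100) -> list[dict]:
    """Collect every page of results matching the given filters."""
    results: list[dict] = []
    page = 1
    while True:
        resp = requests.get(
            ENDPOINT,
            params={**filters, "page": page, "per_page": page_size},
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json().get("data", [])
        if not batch:
            break
        results.extend(batch)
        page += 1
    return results

# Whatever filters are passed, statements older than 180 days are simply absent.
statements = fetch_all({"content_language": "hr"})
print(len(statements), "statements retrieved")
```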

DSA Transparency Database - Research API

Such a limitation is paradoxical given that no statement of reasons in the DSA Transparency Database contains any information that could identify the users (whether natural or legal persons, or other organizations) who published the moderated content.

Under this restriction, a platform may, for example, remove content posted by a political party for hate speech, and the statement of reasons that enters the DSA Transparency Database will contain no indication of whom or what the content concerned. Despite this strict anonymization, the statement still becomes inaccessible via the API 180 days after the moderation decision, in the name of protecting sensitive data.

Downloading Data Packages

Older data, however, is not deleted. It is stored in daily dump files, accessible to users through the Download option. Here the temporal limitation is set at five years, but a new problem arises: statements of reasons can be filtered only by date and platform, while other criteria - language, territorial scope, category, or content type - cannot be applied before the data is downloaded, even though they exist in the metadata of the CSV dumps.

DSA Transparency Database - Download

Subsequent filtering after downloading is possible, but only after retrieving zip packages of CSV files sometimes exceeding 1 GB, which makes them particularly challenging to process and filter. Even after applying filters (by platform and date), the system does not generate a new dataset but instead downloads the standard daily package and removes rows that do not match the criteria. The result can be hundreds or thousands of CSV files, mostly empty or nearly empty, requiring additional local processing and significant data storage capacity.
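A minimal sketch of that local post-processing step is shown below. The dump file name and the column names (content_language, category) are assumptions based on the published statement-of-reasons schema; the CSV header of an actual downloaded dump should be checked first.

```python
# Stream a downloaded daily dump and keep only rows the Download page itself
# cannot filter on. File name and column names are ASSUMED; verify them
# against a real dump before relying on this.
import csv
import io
import zipfile

def filter_dump(zip_path: str, language: str, category_substring: str) -> list[dict]:
    """Return rows matching a language and category from every CSV in the zip."""
    matches: list[dict] = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.endswith(".csv"):
                continue
            with zf.open(name) as raw:
                reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8"))
                for row in reader:
                    if (row.get("content_language") == language
                            and category_substring in (row.get("category") or "")):
                        matches.append(row)
    return matches

rows = filter_dump("sor-global-2025-06-01-full.zip", "hr", "ILLEGAL_OR_HARMFUL_SPEECH")
print(len(rows), "matching statements of reasons")
```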

DSA‑TDB toolbox

The DSA Transparency Database also offers the DSA-TDB Python package as a research tool, designed for highly technically proficient users and advanced analyses, and requiring substantial technical resources. Civil society organizations typically do not have such capacities, which limits their ability to monitor the implementation of the Digital Services Act.

This tool allows the full dataset to be downloaded and processed exactly as recorded in the statements of reasons, which means, among other things, that it contains no information about whose content was moderated or what that content actually was. As a result, even the DSA-TDB Toolbox cannot overcome the fundamental limitations in data quality, consistency, and filtering.

Conclusion

All of this results in a formally open database that, in practice, does not provide sufficient transparency or analytical capabilities. Although the DSA Transparency Database offers valuable insights into trends and regulatory practices, for researchers and the public, actual insight into content moderation remains fragmented and technically demanding.

In other words, the database is an important but underutilized tool - it points the way toward more accountable platforms but does not allow the public to truly verify, analyze, and understand the impact of moderation on content and users in practice. From the perspective of Gong, which advocates for public transparency and civic oversight of digital platforms, this represents a step forward, but also a warning that formal instruments are insufficient if they are not operationally accessible and comprehensible.
