The Impact of AI-Generated Text on the Internet

Jonas Dolezal ¹, Sawood Alam ², Mark Graham ², Maty Bohacek ³*

¹ Imperial College London ² Internet Archive ³ Stanford University

The proliferation of AI-generated and AI-assisted text on the internet is feared to degrade semantic and stylistic diversity and factual accuracy, among other negative developments. We find that by mid-2025, roughly 35% of newly published websites were classified as AI-generated or AI-assisted, up from zero before ChatGPT's launch in late 2022. We also find evidence suggesting that increases in AI-generated text on the internet bring about a decrease in semantic diversity and an increase in positive sentiment. We do not, however, find statistically significant evidence supporting the hypothesis that an increased rate of AI-generated text on the internet decreases factual accuracy or stylistic diversity. Notably, our findings diverge from public perception of AI's impact on the internet.

How much new text on the internet is AI-generated?

Answering this question is harder than it might seem. Constructing a statistically representative sample of the internet is difficult, as there is no central index, popular domains are vastly over-represented in most crawls, and archival coverage has shifted considerably over time. To work around this, we draw on the Internet Archive's Wayback Machine and apply a multi-dimensional stratified sampling approach, approximating a uniform random draw from publicly accessible web pages published between 2022 and 2025 (see Section 3.1 in our paper).
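As a rough illustration of why stratification helps, the sketch below groups a toy crawl into strata and draws the same number of pages from each, so that a single popular domain cannot dominate the sample. The stratum dimensions (`month`, `tier`), the equal-allocation rule, and the `stratified_sample` helper are all assumptions made for this example; the actual sampling design is the one described in Section 3.1 of the paper.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` records from each stratum, without replacement."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for rec in records:
        buckets[stratum_of(rec)].append(rec)

    sample = []
    for key in sorted(buckets):  # sorted so the result is reproducible
        pool = buckets[key]
        k = min(per_stratum, len(pool))
        sample.extend(rng.sample(pool, k))
    return sample


# Hypothetical toy crawl: one popular domain makes up 90% of the raw pages.
pages = (
    [{"url": f"https://big.example/{i}", "month": "2024-01", "tier": "popular"}
     for i in range(90)]
    + [{"url": f"https://small{i}.example/", "month": "2024-01", "tier": "long_tail"}
       for i in range(10)]
)

picked = stratified_sample(pages, lambda p: (p["month"], p["tier"]), per_stratum=5)
popular_share = sum(p["tier"] == "popular" for p in picked) / len(picked)
```

In the raw toy crawl the popular domain accounts for 90% of pages; after equal allocation across the two strata it accounts for half of the sample.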

On top of this sample, we need a reliable way to tell AI-generated and AI-assisted text apart from human-written text. AI-generated text detection is itself an open problem, so rather than committing to a single detector, we experiment with four prominent methods selected based on their performance on the RAID benchmark: Binoculars, Desklib, DivEye, and Pangram v3. We then run our own robustness checks across text length, HTML versus plain text, model family, model version, and language, and choose the detector that performs best overall: Pangram v3 (see Appendix A in our paper).
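One way to turn a set of robustness checks into a single choice is to prefer the detector whose worst condition is still the best. The sketch below does this on made-up scores for two placeholder detectors. The names `detector_a` and `detector_b`, the numbers, and the worst-case selection rule are illustrative assumptions, not the criteria or results from Appendix A.

```python
from statistics import mean


def pick_most_robust(scores):
    """Return the detector with the highest worst-case score, ties broken by mean."""
    return max(
        scores,
        key=lambda name: (min(scores[name].values()), mean(scores[name].values())),
    )


# Toy accuracy values per robustness condition (invented for illustration only).
toy_scores = {
    "detector_a": {"short_text": 0.70, "html": 0.90, "non_english": 0.60},
    "detector_b": {"short_text": 0.85, "html": 0.88, "non_english": 0.80},
}

best = pick_most_robust(toy_scores)
```

Here `detector_a` has the single highest score (on HTML input) but degrades on non-English text, so the worst-case rule selects `detector_b`.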

What is the public's perception of AI's impact on the internet?

In a stratified sample of the US population, we surveyed 853 adults about their AI usage habits, their general view of AI's impact on society, and their beliefs about six specific hypothesized negative impacts of AI-generated text on the internet. Below, we present overall trends in usage frequency and view of AI impact. We then present a breakdown of belief in each of the scrutinized hypotheses based on these usage habits.

How does AI-generated text actually impact online discourse?

For each hypothesis, we show (a) the correlation between the measured signal and AI prevalence across monthly samples, (b) the overall distribution of participant survey responses, (c) responses broken down by AI usage frequency, and (d) responses by general view of AI's impact on society.
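The correlation in (a) is reported as ρ, which suggests a rank correlation. Assuming it is Spearman's ρ (this page does not name the statistic), a minimal standard-library sketch looks like the following. The monthly values are invented for illustration, and the p-value computation is omitted.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly samples: AI prevalence and some measured signal.
ai_share = [0.00, 0.05, 0.12, 0.20, 0.35]
signal = [0.10, 0.12, 0.11, 0.15, 0.18]

rho = spearman_rho(ai_share, signal)  # approximately 0.9 for this toy data
```

Because the statistic depends only on ranks, it captures any monotonic relationship between the signal and AI prevalence, not just a linear one.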

Evidence vs. Public Belief

While the majority of US adults believe in all six tested negative impacts of AI-generated text on the internet, our quantitative analysis confirms only two. Below, we compare the statistical evidence (correlation between each signal and AI prevalence) with the rate of public agreement from our participant survey.
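The evidence side of this comparison can be tabulated directly from the ρ and p values reported on this page. The sketch below assumes a conventional significance threshold of 0.05, which is not stated here but is consistent with the Confirmed and Not Confirmed labels. Survey agreement rates are left out because this page does not give them.

```python
ALPHA = 0.05  # assumed significance threshold; not stated on this page

# (hypothesis, rho, p) as reported on this page
results = [
    ("Semantic Contraction", 0.47, 0.004),
    ("Truth Decay", -0.19, 0.27),
    ("Positivity Shift", 0.56, 0.0003),
    ("Epistemic Islands", -0.12, 0.48),
    ("Entropy Dilution", -0.02, 0.89),
    ("Stylistic Monoculture", 0.24, 0.17),
]

# A hypothesis counts as confirmed here when its p-value is below ALPHA.
confirmed = [name for name, rho, p in results if p < ALPHA]

for name, rho, p in results:
    status = "Confirmed" if p < ALPHA else "Not Confirmed"
    print(f"{name:<22} rho={rho:+.2f}  p={p:<7} {status}")
```

Under this threshold, only Semantic Contraction and Positivity Shift are confirmed, matching the two confirmed hypotheses mentioned above.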

Acknowledgements

The authors thank the Internet Archive for providing the Wayback Machine data and their extraordinary support throughout this project, and Pangram for providing a research grant supporting the use of their API. We also thank Liam Dugan, Hany Farid, Daphne Ippolito, Shayne Longpre, and Alexander Wang for helpful discussions (listed in alphabetical order).

How does AI-generated text actually impact online discourse?

Hypothesis 1: Semantic Contraction • Confirmed • ρ = 0.47, p = 0.004

Hypothesis 2: Truth Decay • Not Confirmed • ρ = −0.19, p = 0.27

Hypothesis 3: Positivity Shift • Confirmed • ρ = 0.56, p = 0.0003

Hypothesis 4: Epistemic Islands • Not Confirmed • ρ = −0.12, p = 0.48

Hypothesis 5: Entropy Dilution • Not Confirmed • ρ = −0.02, p = 0.89

Hypothesis 6: Stylistic Monoculture • Not Confirmed • ρ = 0.24, p = 0.17

