{"id":13,"date":"2024-06-18T15:16:44","date_gmt":"2024-06-18T15:16:44","guid":{"rendered":"https:\/\/www.yippeekiai.com\/?p=13"},"modified":"2024-06-18T15:41:22","modified_gmt":"2024-06-18T15:41:22","slug":"ai-generated-content-a-looming-threat-to-llm-quality","status":"publish","type":"post","link":"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/","title":{"rendered":"AI-Generated Content: A Looming Threat to LLM Quality?"},"content":{"rendered":"\n<p>Is there a risk of a recursive degradation effect for LLMs over time due to training on subpar AI-generated content from the web?<\/p>\n\n\n\n<p><br>Some estimates say that <a href=\"https:\/\/www.vice.com\/en\/article\/y3w4gw\/a-shocking-amount-of-the-web-is-already-ai-translated-trash-scientists-determine\" target=\"_blank\" rel=\"noreferrer noopener\">around 50%<\/a> of all content on the web is now AI-generated, and that could rise to <a href=\"https:\/\/futurism.com\/the-byte\/experts-90-online-content-ai-generated\" target=\"_blank\" rel=\"noreferrer noopener\">90% by 2026<\/a>. A more precise number depends on the methodology used for measuring. However, given how long it takes a human to write a quality blog post compared to an AI generating text that is then published automatically or by a human, the trend is quite obvious. 
By 2030 or so, 99% of all content on the web will probably be AI-generated to some extent.<\/p>\n\n\n\n<figure class=\"wp-block-image size-medium\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"251\" src=\"https:\/\/www.yippeekiai.com\/wp-content\/uploads\/2024\/06\/ai-recursive-e1718723592590-300x251.png\" alt=\"AI Recursive learning\" class=\"wp-image-11\" srcset=\"https:\/\/www.yippeekiai.com\/wp-content\/uploads\/2024\/06\/ai-recursive-e1718723592590-300x251.png 300w, https:\/\/www.yippeekiai.com\/wp-content\/uploads\/2024\/06\/ai-recursive-e1718723592590-768x644.png 768w, https:\/\/www.yippeekiai.com\/wp-content\/uploads\/2024\/06\/ai-recursive-e1718723592590.png 1024w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/figure>\n\n\n\n<p><br>Google&#8217;s search results <a href=\"https:\/\/gizmodo.com\/google-search-results-are-getting-worse-study-finds-1851172943\" target=\"_blank\" rel=\"noreferrer noopener\">have deteriorated<\/a> over time, and the company <a href=\"https:\/\/www.bbc.com\/future\/article\/20240524-how-googles-new-algorithm-will-shape-your-internet\" target=\"_blank\" rel=\"noreferrer noopener\">recently took steps<\/a> to filter out content that it judges to be lower quality, much of it AI-generated. Google&#8217;s latest move is just one in a long line of efforts to filter lower-quality texts, like SEO link farms, out of search results, and the company has extensive experience in this field.<\/p>\n\n\n\n<p><br>So, if AI-generated content on the web is rising so fast that Google, with all that experience, has had to take active steps against it, reportedly cutting low-quality content in its search results by 45%, how can the teams training new LLMs on billions or even trillions of tokens ensure the quality of that training data? 
They will probably have an increasingly hard time doing so.<\/p>\n\n\n\n<p><br>GPT-4, for instance, reportedly has around 1.76 trillion parameters (OpenAI has not confirmed the figure), and the parameter count has grown with each GPT version, which in turn calls for ever more training data.<\/p>\n\n\n\n<p><br>Thus, subpar AI content could make up an ever larger share of the web and be used as training data for new LLMs, which then generate more AI content for the web, which becomes training data for the next LLMs, and so on.<\/p>\n\n\n\n<p><br>In an experiment where an AI was trained on its own output, without fresh quality data added, <a href=\"https:\/\/arxiv.org\/pdf\/2307.01850\" target=\"_blank\" rel=\"noreferrer noopener\">it went MAD<\/a> (Model Autophagy Disorder) after 5 iterations.<\/p>\n\n\n\n<p><br>Given the accelerating growth of AI content on the web, ensuring the quality of the training data will be extremely important for new LLMs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Is there a risk of a recursive degradation effect for LLMs over time due to training on subpar AI-generated content from the web? Some estimates say that around 50% of all content on the web is now AI-generated, and that could rise to 90% by 2026. A more precise number depends on the methodology used [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[6,4,5],"class_list":["post-13","post","type-post","status-publish","format-standard","hentry","category-ai","tag-ai-content","tag-llm","tag-training"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>AI-Generated Content: A Looming Threat to LLM Quality? 
- yippeekiAI.com<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"AI-Generated Content: A Looming Threat to LLM Quality? - yippeekiAI.com\" \/>\n<meta property=\"og:description\" content=\"Is there a risk of a recursive degradation effect for LLMs over time due to training on subpar AI-generated content from the web? Some estimates say that around 50% of all content on the web is now AI-generated, and that could rise to 90% by 2026. A more precise number depends on the methodology used [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/\" \/>\n<meta property=\"og:site_name\" content=\"yippeekiAI.com\" \/>\n<meta property=\"article:published_time\" content=\"2024-06-18T15:16:44+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-06-18T15:41:22+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.yippeekiai.com\/wp-content\/uploads\/2024\/06\/ai-recursive-e1718723592590-300x251.png\" \/>\n<meta name=\"author\" content=\"mcclane\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"mcclane\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/\",\"url\":\"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/\",\"name\":\"AI-Generated Content: A Looming Threat to LLM Quality? - yippeekiAI.com\",\"isPartOf\":{\"@id\":\"https:\/\/www.yippeekiai.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.yippeekiai.com\/wp-content\/uploads\/2024\/06\/ai-recursive-e1718723592590-300x251.png\",\"datePublished\":\"2024-06-18T15:16:44+00:00\",\"dateModified\":\"2024-06-18T15:41:22+00:00\",\"author\":{\"@id\":\"https:\/\/www.yippeekiai.com\/#\/schema\/person\/0f77819abc97a306d09de01460b093cd\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/#primaryimage\",\"url\":\"https:\/\/www.yippeekiai.com\/wp-content\/uploads\/2024\/06\/ai-recursive-e1718723592590.png\",\"contentUrl\":\"https:\/\/www.yippeekiai.com\/wp-content\/uploads\/2024\/06\/ai-recursive-e1718723592590.png\"
,\"width\":1024,\"height\":858,\"caption\":\"AI Recursive learning\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.yippeekiai.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI-Generated Content: A Looming Threat to LLM Quality?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.yippeekiai.com\/#website\",\"url\":\"https:\/\/www.yippeekiai.com\/\",\"name\":\"yippeekiAI.com\",\"description\":\"Welcome to the AI Party, Pal\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.yippeekiai.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.yippeekiai.com\/#\/schema\/person\/0f77819abc97a306d09de01460b093cd\",\"name\":\"mcclane\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.yippeekiai.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/47d0d58a07412a81dfa7aeaf2f2e1d9d9d3b09b7f5b3281eba03e1331cbc1a9a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/47d0d58a07412a81dfa7aeaf2f2e1d9d9d3b09b7f5b3281eba03e1331cbc1a9a?s=96&d=mm&r=g\",\"caption\":\"mcclane\"},\"sameAs\":[\"https:\/\/www.yippeekiai.com\"],\"url\":\"https:\/\/www.yippeekiai.com\/index.php\/author\/mcclane\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"AI-Generated Content: A Looming Threat to LLM Quality? 
- yippeekiAI.com","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/","og_locale":"en_US","og_type":"article","og_title":"AI-Generated Content: A Looming Threat to LLM Quality? - yippeekiAI.com","og_description":"Is there a risk of a recursive degradation effect for LLMs over time due to training on subpar AI-generated content from the web? Some estimates say that around 50% of all content on the web is now AI-generated, and that could rise to 90% by 2026. A more precise number depends on the methodology used [&hellip;]","og_url":"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/","og_site_name":"yippeekiAI.com","article_published_time":"2024-06-18T15:16:44+00:00","article_modified_time":"2024-06-18T15:41:22+00:00","og_image":[{"url":"https:\/\/www.yippeekiai.com\/wp-content\/uploads\/2024\/06\/ai-recursive-e1718723592590-300x251.png","type":"","width":"","height":""}],"author":"mcclane","twitter_card":"summary_large_image","twitter_misc":{"Written by":"mcclane","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/","url":"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/","name":"AI-Generated Content: A Looming Threat to LLM Quality? 
- yippeekiAI.com","isPartOf":{"@id":"https:\/\/www.yippeekiai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/#primaryimage"},"image":{"@id":"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/#primaryimage"},"thumbnailUrl":"https:\/\/www.yippeekiai.com\/wp-content\/uploads\/2024\/06\/ai-recursive-e1718723592590-300x251.png","datePublished":"2024-06-18T15:16:44+00:00","dateModified":"2024-06-18T15:41:22+00:00","author":{"@id":"https:\/\/www.yippeekiai.com\/#\/schema\/person\/0f77819abc97a306d09de01460b093cd"},"breadcrumb":{"@id":"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/#primaryimage","url":"https:\/\/www.yippeekiai.com\/wp-content\/uploads\/2024\/06\/ai-recursive-e1718723592590.png","contentUrl":"https:\/\/www.yippeekiai.com\/wp-content\/uploads\/2024\/06\/ai-recursive-e1718723592590.png","width":1024,"height":858,"caption":"AI Recursive learning"},{"@type":"BreadcrumbList","@id":"https:\/\/www.yippeekiai.com\/index.php\/2024\/06\/18\/ai-generated-content-a-looming-threat-to-llm-quality\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.yippeekiai.com\/"},{"@type":"ListItem","position":2,"name":"AI-Generated Content: A Looming Threat to LLM Quality?"}]},{"@type":"WebSite","@id":"https:\/\/www.yippeekiai.com\/#website","url":"https:\/\/www.yippeekiai.com\/","name":"yippeekiAI.com","description":"Welcome to the AI Party, 
Pal","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.yippeekiai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.yippeekiai.com\/#\/schema\/person\/0f77819abc97a306d09de01460b093cd","name":"mcclane","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.yippeekiai.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/47d0d58a07412a81dfa7aeaf2f2e1d9d9d3b09b7f5b3281eba03e1331cbc1a9a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/47d0d58a07412a81dfa7aeaf2f2e1d9d9d3b09b7f5b3281eba03e1331cbc1a9a?s=96&d=mm&r=g","caption":"mcclane"},"sameAs":["https:\/\/www.yippeekiai.com"],"url":"https:\/\/www.yippeekiai.com\/index.php\/author\/mcclane\/"}]}},"_links":{"self":[{"href":"https:\/\/www.yippeekiai.com\/index.php\/wp-json\/wp\/v2\/posts\/13","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.yippeekiai.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.yippeekiai.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.yippeekiai.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.yippeekiai.com\/index.php\/wp-json\/wp\/v2\/comments?post=13"}],"version-history":[{"count":5,"href":"https:\/\/www.yippeekiai.com\/index.php\/wp-json\/wp\/v2\/posts\/13\/revisions"}],"predecessor-version":[{"id":21,"href":"https:\/\/www.yippeekiai.com\/index.php\/wp-json\/wp\/v2\/posts\/13\/revisions\/21"}],"wp:attachment":[{"href":"https:\/\/www.yippeekiai.com\/index.php\/wp-json\/wp\/v2\/media?parent=13"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.yippeekiai.com\/index.php\/wp-json\/wp\/v2\/categories?post=13"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www
.yippeekiai.com\/index.php\/wp-json\/wp\/v2\/tags?post=13"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}