In February 2024, Google and Reddit announced a licensing agreement reportedly worth $60 million a year. Under the deal, Google gained enhanced access to Reddit's Data API, enabling real-time indexing of Reddit content and the use of Reddit discussions to train Google's AI models. This was not a routine business partnership. It was a signal that brands need to rethink their digital presence.

The deal confirmed what many in the industry had suspected: Reddit's user-generated content has become one of the most valuable training datasets in the world for artificial intelligence systems. The implications for brands are profound and still widely underappreciated.

What the Deal Actually Includes

The agreement gives Google access to Reddit's corpus of conversations (billions of posts and comments spanning nearly two decades) to train its AI models, including Gemini. In return, Reddit receives a reliable revenue stream and deeper integration with Google's search ecosystem.

Practically, this means Reddit content now appears more prominently in Google Search results than ever before. Google's AI Overviews — the AI-generated summaries that appear at the top of search results — draw heavily from Reddit discussions. When a user asks Google "best CRM for small business" or "is Notion worth it," the AI Overview frequently cites or paraphrases Reddit threads.

But the implications go far beyond search rankings. Whatever Gemini learns from Reddit data is encoded in the model itself. When Gemini recommends products, explains concepts, or compares alternatives, it draws on patterns learned from millions of Reddit discussions. Brands that appear frequently and positively in those discussions gain a persistent advantage in AI-generated recommendations.

Reddit as AI Training Data: Why It Matters

AI models need training data that reflects how real people discuss, evaluate, and recommend things. Reddit is unusually well suited to this purpose. Unlike other social media platforms, where content is primarily self-promotional or performative, Reddit discussions tend to be candid, detailed, and focused on problem-solving.

When someone on r/skincare writes a 500-word review of a moisturizer, describing how it performed on their specific skin type over three months, that content is extraordinarily valuable for training AI systems to understand product quality signals. The upvote/downvote system provides an additional quality signal that helps AI models distinguish between genuine recommendations and low-quality content.

Google is not the only company that recognizes this value. OpenAI, the company behind ChatGPT, signed a similar deal with Reddit in 2024. Anthropic trains its models in part on publicly available web data, which may include Reddit content. Perplexity AI, which has gained millions of users as an AI-powered search engine, draws extensively on Reddit threads when generating answers.

The Brand Visibility Equation Has Changed

For two decades, brand visibility online was primarily a function of two things: how much you spent on advertising and how well you optimized for search engines. The Google-Reddit deal — and the broader trend of AI systems learning from user-generated content — has introduced a third variable: how your brand appears in authentic online discussions.

This is a fundamentally different challenge. You cannot buy your way into a Reddit thread the way you can buy a Google ad. You cannot optimize a Reddit comment the way you can optimize a webpage for SEO. The only way to appear positively in Reddit discussions is to have real users genuinely recommending your product, or to participate authentically in community conversations in ways that add value.

What This Means for AI Recommendations

Research from Profound reportedly found that AI chatbots cite sources in approximately 40% of their responses, and that Reddit threads are among the most frequently referenced when they do. If that holds, the brand mentions accumulating on Reddit today are not just driving current traffic. They are also shaping how AI systems will recommend products for years to come.

Consider a concrete example. If someone asks ChatGPT "What's the best email marketing platform for an e-commerce store?", the model generates its response based on patterns learned from its training data. If Klaviyo appears in hundreds of positive Reddit discussions about e-commerce email marketing, while a competitor appears in only a handful, the AI is more likely to recommend Klaviyo. This is not because someone paid for placement. It is because the training data reflects a genuine pattern of user preference.
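The intuition in this example can be sketched as a rough "share of positive mentions" calculation. The snippet below is only an illustration under stated assumptions: the mention data is invented, CompetitorX and the function name positive_mention_share are hypothetical, and weighting by upvotes is a simplification. It does not describe how any AI model actually weighs its training data.

```python
# Hypothetical toy data: (brand, sentiment, upvotes). None of these
# figures come from Reddit; they exist only to illustrate the idea.
MENTIONS = [
    ("Klaviyo", "positive", 120),
    ("Klaviyo", "positive", 45),
    ("Klaviyo", "negative", 10),
    ("CompetitorX", "positive", 30),
    ("CompetitorX", "negative", 25),
]


def positive_mention_share(mentions):
    """Return each brand's share of upvote-weighted positive mentions."""
    weighted = {}
    for brand, sentiment, upvotes in mentions:
        # Register every brand, even one with no positive mentions.
        weighted.setdefault(brand, 0)
        if sentiment == "positive":
            # Clamp at zero so a downvoted comment cannot subtract weight.
            weighted[brand] += max(upvotes, 0)
    total = sum(weighted.values())
    if total == 0:
        return {brand: 0.0 for brand in weighted}
    return {brand: score / total for brand, score in weighted.items()}


shares = positive_mention_share(MENTIONS)
for brand, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{brand}: {share:.0%}")
```

A brand team could run something like this on its own sample of threads to get a rough sense of where it stands relative to competitors. The sentiment labels would have to come from manual review or a separate classifier, which this sketch does not include.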

The Urgency for Brands

The $60 million deal is a lagging indicator. It confirmed a trend that had been building for years. Google had already been favoring Reddit content in search results throughout 2023. AI companies had already been training on Reddit data. The deal simply formalized and accelerated what was already happening.

For brands, the implication is urgent. The AI models being trained right now will influence purchasing decisions for the next several years. The Reddit discussions happening today are becoming part of the training data that AI systems draw on to generate recommendations. Brands that are absent from these discussions are not just missing out on current traffic. They are ceding ground in the AI-driven future of product discovery.

The window to build a strong Reddit presence is narrowing as more brands recognize this opportunity. Those that act now by building an authentic presence, earning genuine recommendations, and participating meaningfully in community discussions will have a compounding advantage that becomes increasingly difficult for competitors to replicate.