# ============================================================================
# florist.ca - ROBOTS.TXT FILE
# Shopify E-commerce Platform
# Last Updated: October 2025
# ============================================================================
# This file controls how search engines and AI bots crawl this website.
# It balances SEO optimization with strategic AI visibility.
# We use Shopify as our e-commerce platform.
# ============================================================================

# ============================================================================
# SECTION 1: DEFAULT USER-AGENT RULES (All Bots)
# ============================================================================
# These rules apply to all crawlers that don't have specific rules below.
# They block administrative pages, duplicate content, and security-sensitive pages.

User-agent: *

# --- EXPLICIT ALLOW RULES FOR SHOPIFY E-COMMERCE ---
# These rules make crawl intent clearer for search engines
Allow: /collections/
Allow: /products/
Allow: /pages/
Allow: /blogs/
Allow: /collections/all
Allow: /sitemap.xml

# --- ADMINISTRATIVE & BACKEND PAGES ---
# Reason: Backend pages are password-protected and inaccessible to bots anyway.
# Crawling these wastes crawl budget and provides no SEO value.
Disallow: /a/downloads/-/*
Disallow: /admin
Disallow: /account

# --- CHECKOUT & CART PAGES ---
# Reason: These are user-specific, session-based pages with no SEO value.
# They contain sensitive information and should never be indexed.
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /92981985565/checkouts
Disallow: /92981985565/orders
Disallow: /carts

# --- COLLECTION FILTERS & SORTING PAGES ---
# Reason: These create duplicate content issues. Filter/sort combinations
# generate thousands of URL variations with identical or near-identical content.
# This dilutes crawl budget and can cause keyword cannibalization.
Disallow: /collections/*sort_by*
Disallow: /*/collections/*sort_by*
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /*/collections/*+*
Disallow: /*/collections/*%2B*
Disallow: /*/collections/*%2b*
Disallow: */collections/*filter*&*filter*

# --- BLOG FILTERS ---
# Reason: Blog tag/category filters create duplicate content.
Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /*/blogs/*+*
Disallow: /*/blogs/*%2B*
Disallow: /*/blogs/*%2b*

# --- PARAMETER-BASED PAGES ---
# Reason: Query parameters create duplicate versions of pages.
Disallow: /*?*oseid=*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
Disallow: /*?pr_prod

# --- POLICY PAGES ---
# Reason: Auto-generated policy pages are similar across all Shopify stores
# and don't provide unique SEO value. Block to focus crawl budget on products.
Disallow: /policies/
Disallow: /*/policies/

# --- DUPLICATE QUERY STRINGS ---
# Reason: These create infinite duplicate URL variations.
Disallow: /*/*?*ls=*&ls=*
Disallow: /*/*?*ls%3D*%3Fls%3D*
Disallow: /*/*?*ls%3d*%3fls%3d*

# --- INTERNAL SEARCH PAGES ---
# Reason: Search result pages are thin content with no SEO value.
# Each search query creates a unique URL that dilutes crawl budget.
Disallow: /search
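# Illustrative examples for the wildcard rules above (hypothetical URLs on
# this store, shown only to document intent; these comment lines are not
# directives):
#   /collections/roses?sort_by=price-ascending -> blocked by /collections/*sort_by*
#   /collections/roses+red                      -> blocked by /collections/*+*
#   /blogs/news/tagged/weddings+spring          -> blocked by /blogs/*+*
#   /search?q=red+roses                         -> blocked by /search
#   /collections/roses and /products/dozen-red-roses (no filter, sort, or
#   duplicate parameters) remain fully crawlable.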
# --- SHOPIFY SYSTEM FILES ---
# Reason: These are system/API endpoints not meant for public indexing.
Disallow: /apple-app-site-association
Disallow: /.well-known/shopify/monorail
Disallow: /cdn/wpm/*.js
Disallow: /recommendations/products
Disallow: /*/recommendations/products

# --- REMOTE PRODUCT HANDLES ---
# Reason: These are system-generated product variations with hex codes
# that create duplicate product pages. Block to prevent duplicate content.
# Note: bracket character classes such as [a-f0-9] are not part of the
# robots.txt standard and are treated as literal text by major crawlers, so
# these patterns may need to be simplified (for example, to /products/*-remote)
# to take effect.
Disallow: /products/*-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-remote
Disallow: /*/products/*-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-remote
Disallow: /collections/*/products/*-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-remote
Disallow: /*/collections/*/products/*-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-remote

# --- ATOM FEEDS ---
# Reason: Blocks automated feed readers from creating unnecessary traffic.
Disallow: /*.atom$

# IMPORTANT NOTE: /collections/all is NOT blocked
# Reason: Unlike the bloomex.co.nz example, we're allowing /collections/all
# because it's a valuable page for SEO and user navigation. It helps search
# engines discover all products and provides a comprehensive product listing.
# Only filtered/sorted variations are blocked to prevent duplicate content.

# ============================================================================
# SECTION 2: GOOGLE ADS BOT OPTIMIZATION
# ============================================================================
# Reason: Google's ad bots need access to pages for Shopping campaigns and
# product ads, but should avoid backend APIs and system pages.

User-agent: adsbot-google
Disallow: /checkouts/
Disallow: /checkout
Disallow: /carts
Disallow: /orders
Disallow: /92981985565/checkouts
Disallow: /92981985565/orders
Disallow: /*?*oseid=*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
Disallow: /cdn/wpm/*.js
Disallow: /products/*-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-remote
Disallow: /*/products/*-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-remote
Disallow: /collections/*/products/*-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-remote
Disallow: /*/collections/*/products/*-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-remote
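# Note: per Google's crawler documentation, AdsBot-Google ignores the global
# "User-agent: *" group and only obeys groups that name it explicitly, which
# is why the sensitive paths above are repeated here rather than inherited
# from Section 1.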
# ============================================================================
# SECTION 3: NUTCH BOT (Block Completely)
# ============================================================================
# Reason: Nutch is an open-source crawler often used for data harvesting
# without providing SEO benefit. Block entirely to save server resources.

User-agent: Nutch
Disallow: /

# ============================================================================
# SECTION 4: AHREFS BOTS (Crawl Delay) - CONSOLIDATED
# ============================================================================
# Reason: Ahrefs is a useful SEO tool, but can be aggressive. Apply crawl
# delay to prevent server overload while still allowing SEO analysis.
# Both AhrefsBot and AhrefsSiteAudit use identical rules, so they share one
# group: consecutive User-agent lines apply the rule set below to both bots.
# (A comma-separated list on a single User-agent line is not valid robots.txt
# syntax.)

User-agent: AhrefsBot
User-agent: AhrefsSiteAudit
Crawl-delay: 10
Disallow: /a/downloads/-/*
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /92981985565/checkouts
Disallow: /92981985565/orders
Disallow: /carts
Disallow: /account
Disallow: /collections/*sort_by*
Disallow: /*/collections/*sort_by*
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /*/collections/*+*
Disallow: /*/collections/*%2B*
Disallow: /*/collections/*%2b*
Disallow: */collections/*filter*&*filter*
Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /*/blogs/*+*
Disallow: /*/blogs/*%2B*
Disallow: /*/blogs/*%2b*
Disallow: /*?*oseid=*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
Disallow: /policies/
Disallow: /*/policies/
Disallow: /*/*?*ls=*&ls=*
Disallow: /*/*?*ls%3D*%3Fls%3D*
Disallow: /*/*?*ls%3d*%3fls%3d*
Disallow: /search
Disallow: /apple-app-site-association
Disallow: /.well-known/shopify/monorail
Disallow: /cdn/wpm/*.js
Disallow: /products/*-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-remote
Disallow: /*/products/*-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-remote
Disallow: /collections/*/products/*-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-remote
Disallow: /*/collections/*/products/*-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-remote

# ============================================================================
# SECTION 5: SEMRUSH BOT (Crawl Delay)
# ============================================================================
# Reason: SEMrush is another SEO tool. Apply moderate crawl delay.

User-agent: SemrushBot
Crawl-delay: 10

# ============================================================================
# SECTION 6: PINTEREST BOT (Crawl Delay)
# ============================================================================
# Reason: Pinterest needs to crawl for rich pins and product discovery,
# but should be rate-limited to prevent overload.

User-agent: Pinterest
Crawl-delay: 1

# ============================================================================
# SECTION 7: AI TRAINING CONTROL BOTS
# ============================================================================
# These bots are used specifically for AI model training and snippet generation.
# You can choose to block them if you don't want your content used for training.

# --- GOOGLE EXTENDED (AI Training & Snippets) ---
# Reason: Used by Google to train Gemini (formerly Bard) and to ground its
# AI-generated answers. Blocking this prevents your content from being used
# in Google's AI training but still allows regular Googlebot to crawl for
# search results.
User-agent: Google-Extended
Disallow: /

# --- APPLEBOT EXTENDED (AI Training) ---
# Reason: Used by Apple to train their AI models and improve Siri responses.
# Blocking this prevents AI training while still allowing regular Applebot.
User-agent: Applebot-Extended
Disallow: /
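# Note (per Google's and Apple's crawler documentation): Google-Extended and
# Applebot-Extended are control tokens rather than separate crawlers. They
# only govern whether content fetched by Googlebot and Applebot may be used
# for AI model training, so blocking them does not reduce normal search
# crawling or indexing.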
# ============================================================================
# SECTION 8: AI CHATBOT WHITELIST - ALLOW ACCESS
# ============================================================================
# STRATEGIC DECISION: Allow AI chatbots to crawl this e-commerce site
# Reason: For e-commerce businesses, AI visibility is BENEFICIAL because:
# 1. Product Discovery: Users asking AI chatbots for product recommendations
#    can discover your products organically
# 2. Citation & Links: AI tools may reference and link to your products
# 3. Brand Awareness: Being included in AI-generated shopping suggestions
#    increases brand visibility
# 4. Future-Proofing: AI search is growing rapidly; blocking means missing
#    out on this emerging traffic channel
# 5. Competitive Advantage: Many competitors block AI; allowing it gives you
#    visibility in AI-powered shopping assistants
#
# Unlike news/media companies that worry about content theft, e-commerce sites
# BENEFIT from AI exposure as it drives product discovery and sales.
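# Note: in each group below, an empty "Disallow:" value means nothing is
# disallowed, i.e. that bot may crawl the whole site. Because a crawler obeys
# only the most specific group that names it, these entries also take the
# place of the default "User-agent: *" rules in Section 1 for the listed bots.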
# --- AMAZON BOT ---
# Reason: Amazon's crawler for improving their services. Allow for potential
# partnership opportunities and product discovery.
User-agent: Amazonbot
Disallow:

# --- ANCHOR BROWSER ---
# Reason: Experimental browser bot. Allow to support emerging technologies.
User-agent: Anchor Browser
Disallow:

# --- APPLE BOT ---
# Reason: Apple's standard crawler for Siri and Spotlight. Essential for
# Apple ecosystem product discovery.
User-agent: Applebot
Disallow:

# --- ARCHIVE.ORG BOT ---
# Reason: Internet Archive provides historical records. Allow to preserve
# your website's history (beneficial for brand legacy).
User-agent: archive.org_bot
Disallow:

# --- BING BOT ---
# Reason: Microsoft's search engine. Essential for Bing search visibility
# and Microsoft ecosystem (Copilot, Edge, etc.).
User-agent: BingBot
Disallow:

# --- BYTESPIDER (TikTok/ByteDance) ---
# Reason: TikTok's crawler for content discovery. Allow for potential
# TikTok Shop integration and social commerce visibility.
User-agent: Bytespider
Disallow:

# --- COMMON CRAWL BOT ---
# Reason: CCBot creates open datasets used by many AI companies.
# Allowing helps with broad AI visibility.
User-agent: CCBot
Disallow:

# --- CHATGPT USER BOT ---
# Reason: Used when ChatGPT browses the web to answer real-time user queries.
# CRITICAL for appearing in ChatGPT shopping recommendations.
# Note: This is NOT used for training; it's for live query responses.
User-agent: ChatGPT-User
Disallow:

# --- CLAUDE BOT ---
# Reason: Anthropic's training crawler for Claude AI models.
# Allow to be included in Claude's knowledge base.
User-agent: ClaudeBot
Disallow:

# --- CLAUDE SEARCH BOT ---
# Reason: Claude's real-time search bot for answering user queries.
# Allow for product discovery in Claude conversations.
User-agent: Claude-SearchBot
Disallow:

# --- CLAUDE USER ---
# Reason: Used when Claude users trigger web searches.
# Allow for live product recommendations and information.
User-agent: Claude-User
Disallow:

# --- DUCKASSIST BOT (DuckDuckGo) ---
# Reason: DuckDuckGo's AI-assisted answer bot. Crawls in real-time
# for instant answers (not for training). Privacy-focused search engine.
User-agent: DuckAssistBot
Disallow:

# --- FACEBOOK BOT ---
# Reason: Meta's crawler for link previews, social sharing, and training
# their AI models. Essential for Facebook/Instagram shopping integration.
User-agent: FacebookBot
Disallow:

# --- GOOGLE BOT ---
# Reason: Google's primary search crawler. ABSOLUTELY ESSENTIAL.
# This is your main source of organic search traffic.
User-agent: Googlebot
Disallow:

# --- GOOGLE CLOUD VERTEX BOT ---
# Reason: Google Cloud's AI/ML crawler. Allow for inclusion in
# Google's enterprise AI solutions.
User-agent: Google-CloudVertexBot
Disallow:

# --- GPT BOT (OpenAI Training) ---
# Reason: OpenAI's bot for training GPT models. Different from ChatGPT-User.
# Allow to be included in future GPT model knowledge.
User-agent: GPTBot
Disallow:

# --- META EXTERNAL AGENT ---
# Reason: Meta's general-purpose web crawler for various Meta services.
User-agent: Meta-ExternalAgent
Disallow:

# --- META EXTERNAL FETCHER ---
# Reason: Meta's fetcher for real-time content retrieval (link previews, etc.).
User-agent: Meta-ExternalFetcher
Disallow:

# --- MISTRAL AI USER ---
# Reason: Mistral AI's user-triggered search bot. French AI company
# with growing market presence in Europe.
User-agent: MistralAI-User
Disallow:

# --- NOVELLUM AI CRAWL ---
# Reason: Emerging AI crawler. Allow to future-proof for new AI platforms.
User-agent: Novellum AI Crawl
Disallow:

# --- OAI-SEARCHBOT (OpenAI Search) ---
# Reason: OpenAI's dedicated search bot for SearchGPT and real-time queries.
# Allow for inclusion in OpenAI's search products.
User-agent: OAI-SearchBot
Disallow:

# --- PERPLEXITY BOT ---
# Reason: Perplexity AI's training and search crawler. Perplexity is
# becoming a major AI search engine; allow for product visibility.
User-agent: PerplexityBot
Disallow:

# --- PERPLEXITY SEARCH ---
# Reason: Perplexity's real-time search crawler for answering queries.
User-agent: Perplexity-Search
Disallow:

# --- PERPLEXITY USER ---
# Reason: User-triggered Perplexity searches.
User-agent: Perplexity-User
Disallow:

# --- PETAL BOT (Huawei) ---
# Reason: Huawei's search engine bot for their ecosystem.
# Important for markets where Huawei devices are popular (Asia-Pacific).
User-agent: PetalBot
Disallow:

# --- PRO RATA INC ---
# Reason: AI data company. Allow for potential AI licensing opportunities.
User-agent: ProRataInc
Disallow:

# --- PRO RATA AI ---
# Reason: ProRata's AI crawler.
User-agent: ProRata.ai
Disallow:

# --- TIMPL BOT ---
# Reason: Emerging AI/search bot. Allow to stay ahead of new platforms.
User-agent: Timplbot
Disallow:

# ============================================================================
# END OF ROBOTS.TXT FILE

Sitemap: https://florist.ca/sitemap.xml

# ============================================================================
# IMPLEMENTATION NOTES:
#
# 1. This file should be placed in your theme as templates/robots.txt.liquid
#    (see the reference sketch just below these notes)
# 2. This will override Shopify's default robots.txt
# 3. Test after implementation using the robots.txt report in Google Search
#    Console
# 4. Monitor Google Search Console for any "blocked by robots.txt" warnings
# 5. Key difference from bloomex.co.nz: /collections/all is NOT blocked
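# ----------------------------------------------------------------------------
# REFERENCE SKETCH (not part of the live rules): robots.txt.liquid
# Shopify robots.txt customizations live in the theme file
# templates/robots.txt.liquid. The sketch below, adapted from Shopify's
# documented default template, shows the alternative "extend the defaults"
# approach: re-emit Shopify's default groups and append one extra rule to the
# "*" group. The appended rule (Disallow: /search) is illustrative only; this
# annotated file instead takes the "full replacement" approach, with static
# rules written directly into the template. Verify the object names against
# current Shopify documentation before relying on this sketch.
# The sketch is wrapped in Liquid raw tags so it will not execute if this
# file is deployed verbatim.
# {% raw %}
# {% for group in robots.default_groups %}
#   {{- group.user_agent }}
#   {%- for rule in group.rules -%}
#     {{ rule }}
#   {%- endfor -%}
#   {%- if group.user_agent.value == '*' -%}
#     {{ 'Disallow: /search' }}
#   {%- endif -%}
#   {%- if group.sitemap != blank -%}
#     {{ group.sitemap }}
#   {%- endif -%}
# {% endfor %}
# {% endraw %}
# ----------------------------------------------------------------------------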
#
# CHANGES IN THIS VERSION:
# 1. CONSOLIDATED: AhrefsBot and AhrefsSiteAudit combined into a single group
# 2. ADDED: Explicit Allow rules for /collections/, /products/, /pages/,
#    /blogs/, /collections/all, and /sitemap.xml
# 3. ADDED: Google-Extended (blocks AI training for Google Gemini, formerly Bard)
# 4. ADDED: Applebot-Extended (blocks AI training for Apple AI)
#
# MONITORING RECOMMENDATIONS:
# - Check Google Search Console weekly for indexing issues
# - Monitor crawl stats to ensure bots aren't overwhelming your server
# - Track organic traffic from AI referrals (look for referrals from
#   ChatGPT, Claude, Perplexity, etc. in Google Analytics)
# - Review quarterly and update the AI bot list as new bots emerge
#
# STRATEGIC RATIONALE:
# This configuration balances four priorities:
# 1. SEO Optimization: Prevents duplicate content and focuses crawl budget
# 2. AI Visibility: Allows AI chatbots to discover and recommend products
# 3. AI Training Control: Blocks training bots (Google-Extended, Applebot-Extended)
# 4. Server Protection: Rate-limits aggressive crawlers to prevent overload
#
# For e-commerce, AI visibility = product discovery = potential sales
# ============================================================================