2026-05-29 23:38:34 [scrapy.utils.log] INFO: Scrapy 2.10.0 started (bot: retriever)
2026-05-29 23:38:34 [scrapy.utils.log] INFO: Versions: lxml 5.1.0.0, libxml2 2.12.3, cssselect 1.2.0, parsel 1.9.0, w3lib 2.0.0, Twisted 22.4.0, Python 3.10.12 (main, Mar 3 2026, 11:56:32) [GCC 11.4.0], pyOpenSSL 24.1.0 (OpenSSL 3.2.1 30 Jan 2024), cryptography 42.0.5, Platform Linux-6.8.0-1030-aws-x86_64-with-glibc2.35
2026-05-29 23:38:34 [scrapy.addons] INFO: Enabled addons:
[]
2026-05-29 23:38:34 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_ENABLED': True,
'BOT_NAME': 'retriever',
'CONCURRENT_REQUESTS_PER_DOMAIN': 3,
'DOWNLOAD_DELAY': 2.5,
'HTTPCACHE_EXPIRATION_SECS': 86400,
'IMAGES_STORE_S3_ACL': 'public-read',
'LOG_FILE': '/home/scrapyd/logs/loknow_spiders_prod/HeartwoodcalgarySpider/7e3bd2c25bb711f1bdc0c3bbae71ecef.log',
'LOG_FORMATTER': 'loknow_spiders.logging.PoliteLogFormatter',
'LOG_LEVEL': 'INFO',
'NEWSPIDER_MODULE': 'loknow_spiders.spiders',
'ROBOTSTXT_OBEY': True,
'SPIDER_MODULES': ['loknow_spiders.spiders'],
'TELNETCONSOLE_ENABLED': False,
'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor',
'USER_AGENT': 'AdRetriever (https://adretriever.com)'}
2026-05-29 23:38:34 [py.warnings] WARNING: /home/scrapyd/venv/lib/python3.10/site-packages/scrapy/utils/request.py:248: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.
It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.
See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
return cls(crawler)
2026-05-29 23:38:34 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.throttle.AutoThrottle']
2026-05-29 23:38:34 [HeartwoodcalgarySpider] INFO: Current Spider Environment set to: prod
2026-05-29 23:38:34 [HeartwoodcalgarySpider] INFO: Retrieving AWS secret: spider_prod_secrets
2026-05-29 23:38:34 [botocore.credentials] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2026-05-29 23:38:34 [HeartwoodcalgarySpider] INFO: In base spider ml_kwargs is None and scrape_type is inventory
2026-05-29 23:38:34 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'loknow_spiders.middlewares.ThrottlingRetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2026-05-29 23:38:34 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'loknow_spiders.middlewares.RetrieverSpiderMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2026-05-29 23:38:35 [scrapy.middleware] INFO: Enabled item pipelines:
['loknow_spiders.pipelines.HTMLDetailsDecruftPipeline',
'loknow_spiders.pipelines.GPTRohitPipeline',
'loknow_spiders.pipelines.PostProcessGPTPropertyPipeline',
'loknow_spiders.pipelines.ForceNonZeroPricePipeline',
'loknow_spiders.pipelines.HeartwoodcalgaryPipeline',
'loknow_spiders.pipelines.RetrieverPipeline']
2026-05-29 23:38:35 [scrapy.core.engine] INFO: Spider opened
2026-05-29 23:38:35 [HeartwoodcalgarySpider] INFO: RetrieverAPI is making requests to https://api.adretriever.com with headers {'Authorization': 'Token 96dfb640234f1d676d6fa726f8eae6e7aab44cda'}
2026-05-29 23:38:35 [loknow_spiders] INFO: Scrape 7c60f8ed has been opened.
2026-05-29 23:38:35 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2026-05-29 23:38:35 [scrapy-playwright] INFO: Starting download handler
2026-05-29 23:38:35 [scrapy-playwright] INFO: Starting download handler
2026-05-29 23:38:45 [scrapy-playwright] INFO: Launching browser chromium
2026-05-29 23:38:45 [scrapy-playwright] INFO: Browser chromium launched
2026-05-29 23:38:54 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:39:04 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:39:05 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:8e3a6613b33af69e7bea9c788a565b2c
2026-05-29 23:39:05 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/64-heartwood-lane-se
2026-05-29 23:39:13 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:39:13 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:24dd26301ebdc6f3d567c61d352927c8
2026-05-29 23:39:13 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/38-heartwood-villas-se
2026-05-29 23:39:20 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:39:20 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:4d9cf6c70ef3f34261bd50b9d5febeac
2026-05-29 23:39:20 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/68-heartwood-lane-se
2026-05-29 23:39:27 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:39:27 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:b57bb2e31cb9dc7ebaaca0c269db2372
2026-05-29 23:39:27 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/7063-rangeview-ave-se
2026-05-29 23:39:35 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:39:35 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:4705abd5bad7dab1c9a03828b03db2c4
2026-05-29 23:39:35 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/7067-rangeview-ave-se
2026-05-29 23:39:35 [scrapy.extensions.logstats] INFO: Crawled 7 pages (at 7 pages/min), scraped 5 items (at 5 items/min)
2026-05-29 23:39:45 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:39:45 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:258ae584a4eacca27b0e3083c5036eb6
2026-05-29 23:39:45 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/7055-rangeview-ave-se
2026-05-29 23:39:45 [root] INFO: Making post request to https://api.adretriever.com/api/import/7c60f8ed
2026-05-29 23:39:46 [loknow_spiders] INFO: Created: 6 items out of 6
2026-05-29 23:39:56 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:39:56 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:6edc6f0b28f4d1619bde24a0afa6e386
2026-05-29 23:39:56 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/7059-rangeview-ave-se-ry63y
2026-05-29 23:40:08 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:40:08 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:0f96dc18f77d6666ff9523bd11becec1
2026-05-29 23:40:08 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/42-heartwood-villas-se
2026-05-29 23:40:12 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:40:12 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:307c5b5effa19f38a210f7df11b9bd04
2026-05-29 23:40:12 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/21-heartwood-lane-se
2026-05-29 23:40:17 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:40:17 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:49715e89df20f660a8f1b7032aa1a63d
2026-05-29 23:40:17 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/17-heartwood-lane-se
2026-05-29 23:40:24 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:40:25 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:3f8a3795a6c31efcf9760ff9c3fff544
2026-05-29 23:40:25 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/18-heartwood-villas-se
2026-05-29 23:40:31 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:40:32 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:0e3a54b8f2fbf9a5d8fc4fd8e7299c69
2026-05-29 23:40:32 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/94-heartwood-villas
2026-05-29 23:40:35 [scrapy.extensions.logstats] INFO: Crawled 14 pages (at 7 pages/min), scraped 12 items (at 7 items/min)
2026-05-29 23:40:36 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:40:36 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:fb8c800022b65089c3bbbf1b919a4961
2026-05-29 23:40:36 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/34-heartwood-villas-se
2026-05-29 23:40:41 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:40:41 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:f28360fb937fe248a81fffd6ca9cc45d
2026-05-29 23:40:41 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/25-heartwood-lane-se
2026-05-29 23:40:46 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:40:47 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:8b8f28854a6a32985120ac731e8acae1
2026-05-29 23:40:47 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/26-heartwood-villas-se
2026-05-29 23:40:47 [root] INFO: Making post request to https://api.adretriever.com/api/import/7c60f8ed
2026-05-29 23:40:47 [loknow_spiders] INFO: Created: 9 items out of 9
2026-05-29 23:40:52 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:40:52 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:2d98355c1f0d5aaabff9e42148e57d39
2026-05-29 23:40:52 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/30-heartwood-villas-se
2026-05-29 23:40:58 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:40:58 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:da8ca1bee7aa4217d7d3a1455b65ce3d
2026-05-29 23:40:58 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/230-heartwood-parade-se
2026-05-29 23:41:02 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:41:02 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:1fa0849343bb6320205686ced2c8850c
2026-05-29 23:41:02 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/367-heartwood-gardens-se
2026-05-29 23:41:06 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:41:06 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:d1782ae3e885fc40d67a446655b53168
2026-05-29 23:41:06 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/396-heartwood-grove-se
2026-05-29 23:41:11 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:41:11 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:aee66bee2107d3a81f511f2fe4a902c8
2026-05-29 23:41:11 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/343-heartwood-gardens-se
2026-05-29 23:41:15 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:41:15 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:c2b072e8dc1f0beb89dd8694ac005e2a
2026-05-29 23:41:15 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/5-heartwood-lane-se
2026-05-29 23:41:20 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:41:20 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:5ab3ec0d414c7708efa317ad76c71a37
2026-05-29 23:41:20 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/144-heartwood-lane-se
2026-05-29 23:41:23 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:41:23 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:6bc56b1456ec51d40a7443e98ef08059
2026-05-29 23:41:23 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/6-heartwood-villas-se
2026-05-29 23:41:29 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 23:41:30 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:4d59215d6ef31e23c1bcadfdf9a2f4fa
2026-05-29 23:41:30 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/355-heartwood-gardens-se
2026-05-29 23:41:30 [scrapy.core.engine] INFO: Closing spider (finished)
2026-05-29 23:41:30 [root] INFO: Making post request to https://api.adretriever.com/api/import/7c60f8ed
2026-05-29 23:41:31 [loknow_spiders] INFO: Created: 9 items out of 9
2026-05-29 23:41:31 [loknow_spiders] INFO: GPT cache hit rate: 100.0%
2026-05-29 23:41:31 [scrapy.extensions.feedexport] INFO: Stored jsonlines feed (24 items) in: file:///home/scrapyd/items/loknow_spiders_prod/HeartwoodcalgarySpider/7e3bd2c25bb711f1bdc0c3bbae71ecef.jl
2026-05-29 23:41:31 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 10044,
'downloader/request_count': 26,
'downloader/request_method_count/GET': 26,
'downloader/response_bytes': 5085323,
'downloader/response_count': 26,
'downloader/response_status_count/200': 26,
'elapsed_time_seconds': 176.04062,
'feedexport/success_count/FileFeedStorage': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2026, 5, 29, 23, 41, 31, 652257),
'httpcompression/response_bytes': 1515,
'httpcompression/response_count': 1,
'item_scraped_count': 24,
'log_count/INFO': 101,
'log_count/WARNING': 1,
'memusage/max': 173211648,
'memusage/startup': 125239296,
'playwright/context_count': 1,
'playwright/context_count/max_concurrent': 1,
'playwright/context_count/persistent/False': 1,
'playwright/context_count/remote/False': 1,
'playwright/page_count': 25,
'playwright/page_count/max_concurrent': 2,
'playwright/request_count': 2414,
'playwright/request_count/method/GET': 2134,
'playwright/request_count/method/POST': 280,
'playwright/request_count/navigation': 50,
'playwright/request_count/resource_type/document': 50,
'playwright/request_count/resource_type/fetch': 207,
'playwright/request_count/resource_type/font': 125,
'playwright/request_count/resource_type/image': 287,
'playwright/request_count/resource_type/other': 25,
'playwright/request_count/resource_type/script': 1423,
'playwright/request_count/resource_type/stylesheet': 199,
'playwright/request_count/resource_type/xhr': 98,
'playwright/response_count': 2385,
'playwright/response_count/method/GET': 2134,
'playwright/response_count/method/POST': 251,
'playwright/response_count/resource_type/document': 50,
'playwright/response_count/resource_type/fetch': 182,
'playwright/response_count/resource_type/font': 125,
'playwright/response_count/resource_type/image': 287,
'playwright/response_count/resource_type/other': 25,
'playwright/response_count/resource_type/script': 1423,
'playwright/response_count/resource_type/stylesheet': 199,
'playwright/response_count/resource_type/xhr': 94,
'request_depth_max': 1,
'response_received_count': 26,
'robotstxt/request_count': 1,
'robotstxt/response_count': 1,
'robotstxt/response_status_count/200': 1,
'scheduler/dequeued': 25,
'scheduler/dequeued/memory': 25,
'scheduler/enqueued': 25,
'scheduler/enqueued/memory': 25,
'start_time': datetime.datetime(2026, 5, 29, 23, 38, 35, 611637)}
2026-05-29 23:41:31 [scrapy.core.engine] INFO: Spider closed (finished)
2026-05-29 23:41:31 [scrapy-playwright] INFO: Closing download handler
2026-05-29 23:41:31 [scrapy-playwright] INFO: Closing download handler
2026-05-29 23:41:31 [scrapy-playwright] INFO: Closing browser