PROJECT (loknow_spiders_prod)
SPIDER (HeartwoodcalgarySpider)

2026-05-29 22:47:33 [scrapy.utils.log] INFO: Scrapy 2.10.0 started (bot: retriever)
2026-05-29 22:47:33 [scrapy.utils.log] INFO: Versions: lxml 5.1.0.0, libxml2 2.12.3, cssselect 1.2.0, parsel 1.9.0, w3lib 2.0.0, Twisted 22.4.0, Python 3.10.12 (main, Mar  3 2026, 11:56:32) [GCC 11.4.0], pyOpenSSL 24.1.0 (OpenSSL 3.2.1 30 Jan 2024), cryptography 42.0.5, Platform Linux-6.8.0-1030-aws-x86_64-with-glibc2.35
2026-05-29 22:47:33 [scrapy.addons] INFO: Enabled addons:
[]
2026-05-29 22:47:33 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_ENABLED': True,
 'BOT_NAME': 'retriever',
 'CONCURRENT_REQUESTS_PER_DOMAIN': 3,
 'DOWNLOAD_DELAY': 2.5,
 'HTTPCACHE_EXPIRATION_SECS': 86400,
 'IMAGES_STORE_S3_ACL': 'public-read',
 'LOG_FILE': '/home/scrapyd/logs/loknow_spiders_prod/HeartwoodcalgarySpider/5f7804025bb011f1bdc0c3bbae71ecef.log',
 'LOG_FORMATTER': 'loknow_spiders.logging.PoliteLogFormatter',
 'LOG_LEVEL': 'INFO',
 'NEWSPIDER_MODULE': 'loknow_spiders.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['loknow_spiders.spiders'],
 'TELNETCONSOLE_ENABLED': False,
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor',
 'USER_AGENT': 'AdRetriever (https://adretriever.com)'}
2026-05-29 22:47:33 [py.warnings] WARNING: /home/scrapyd/venv/lib/python3.10/site-packages/scrapy/utils/request.py:248: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.

It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.

See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
  return cls(crawler)

2026-05-29 22:47:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.throttle.AutoThrottle']
2026-05-29 22:47:33 [HeartwoodcalgarySpider] INFO: Current Spider Environment set to: prod
2026-05-29 22:47:33 [HeartwoodcalgarySpider] INFO: Retrieving AWS secret: spider_prod_secrets
2026-05-29 22:47:33 [botocore.credentials] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2026-05-29 22:47:33 [HeartwoodcalgarySpider] INFO: In base spider ml_kwargs is None and scrape_type is inventory
2026-05-29 22:47:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'loknow_spiders.middlewares.ThrottlingRetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2026-05-29 22:47:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'loknow_spiders.middlewares.RetrieverSpiderMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2026-05-29 22:47:34 [scrapy.middleware] INFO: Enabled item pipelines:
['loknow_spiders.pipelines.HTMLDetailsDecruftPipeline',
 'loknow_spiders.pipelines.GPTRohitPipeline',
 'loknow_spiders.pipelines.PostProcessGPTPropertyPipeline',
 'loknow_spiders.pipelines.ForceNonZeroPricePipeline',
 'loknow_spiders.pipelines.HeartwoodcalgaryPipeline',
 'loknow_spiders.pipelines.RetrieverPipeline']
2026-05-29 22:47:34 [scrapy.core.engine] INFO: Spider opened
2026-05-29 22:47:34 [HeartwoodcalgarySpider] INFO: RetrieverAPI is making requests to https://api.adretriever.com with headers {'Authorization': 'Token 96dfb640234f1d676d6fa726f8eae6e7aab44cda'}
2026-05-29 22:47:34 [loknow_spiders] INFO: Scrape 7fda16fe has been opened.
2026-05-29 22:47:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2026-05-29 22:47:34 [scrapy-playwright] INFO: Starting download handler
2026-05-29 22:47:34 [scrapy-playwright] INFO: Starting download handler
2026-05-29 22:47:45 [scrapy-playwright] INFO: Launching browser chromium
2026-05-29 22:47:45 [scrapy-playwright] INFO: Browser chromium launched
2026-05-29 22:47:49 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:47:55 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:47:56 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:8e3a6613b33af69e7bea9c788a565b2c
2026-05-29 22:47:56 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/64-heartwood-lane-se
2026-05-29 22:48:00 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:48:01 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:24dd26301ebdc6f3d567c61d352927c8
2026-05-29 22:48:01 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/38-heartwood-villas-se
2026-05-29 22:48:07 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:48:07 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:4d9cf6c70ef3f34261bd50b9d5febeac
2026-05-29 22:48:07 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/68-heartwood-lane-se
2026-05-29 22:48:11 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:48:12 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:b57bb2e31cb9dc7ebaaca0c269db2372
2026-05-29 22:48:12 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/7063-rangeview-ave-se
2026-05-29 22:48:16 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:48:16 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:4705abd5bad7dab1c9a03828b03db2c4
2026-05-29 22:48:16 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/7067-rangeview-ave-se
2026-05-29 22:48:24 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:48:24 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:258ae584a4eacca27b0e3083c5036eb6
2026-05-29 22:48:24 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/7055-rangeview-ave-se
2026-05-29 22:48:29 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:48:29 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:6edc6f0b28f4d1619bde24a0afa6e386
2026-05-29 22:48:29 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/7059-rangeview-ave-se-ry63y
2026-05-29 22:48:34 [scrapy.extensions.logstats] INFO: Crawled 10 pages (at 10 pages/min), scraped 7 items (at 7 items/min)
2026-05-29 22:48:34 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:48:35 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:0f96dc18f77d6666ff9523bd11becec1
2026-05-29 22:48:35 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/42-heartwood-villas-se
2026-05-29 22:48:35 [root] INFO: Making post request to https://api.adretriever.com/api/import/7fda16fe
2026-05-29 22:48:35 [loknow_spiders] INFO: Created: 8 items out of 8
2026-05-29 22:48:39 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:48:40 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:307c5b5effa19f38a210f7df11b9bd04
2026-05-29 22:48:40 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/21-heartwood-lane-se
2026-05-29 22:48:46 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:48:46 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:49715e89df20f660a8f1b7032aa1a63d
2026-05-29 22:48:46 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/17-heartwood-lane-se
2026-05-29 22:48:49 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:48:49 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:3f8a3795a6c31efcf9760ff9c3fff544
2026-05-29 22:48:49 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/18-heartwood-villas-se
2026-05-29 22:48:54 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:48:54 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:0e3a54b8f2fbf9a5d8fc4fd8e7299c69
2026-05-29 22:48:54 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/94-heartwood-villas
2026-05-29 22:49:00 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:49:00 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:fb8c800022b65089c3bbbf1b919a4961
2026-05-29 22:49:00 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/34-heartwood-villas-se
2026-05-29 22:49:06 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:49:06 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:f28360fb937fe248a81fffd6ca9cc45d
2026-05-29 22:49:06 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/25-heartwood-lane-se
2026-05-29 22:49:08 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:49:08 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:8b8f28854a6a32985120ac731e8acae1
2026-05-29 22:49:08 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/26-heartwood-villas-se
2026-05-29 22:49:14 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:49:14 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:2d98355c1f0d5aaabff9e42148e57d39
2026-05-29 22:49:14 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/30-heartwood-villas-se
2026-05-29 22:49:19 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:49:20 [loknow_spiders] INFO: Cache: missed loknow_spiders:gpt-response-cache:da8ca1bee7aa4217d7d3a1455b65ce3d
2026-05-29 22:49:22 [httpx] INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2026-05-29 22:49:22 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/230-heartwood-parade-se
2026-05-29 22:49:24 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:49:24 [loknow_spiders] INFO: Cache: missed loknow_spiders:gpt-response-cache:1fa0849343bb6320205686ced2c8850c
2026-05-29 22:49:26 [httpx] INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2026-05-29 22:49:26 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/367-heartwood-gardens-se
2026-05-29 22:49:29 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:49:29 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:d1782ae3e885fc40d67a446655b53168
2026-05-29 22:49:29 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/396-heartwood-grove-se
2026-05-29 22:49:33 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:49:33 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:aee66bee2107d3a81f511f2fe4a902c8
2026-05-29 22:49:33 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/343-heartwood-gardens-se
2026-05-29 22:49:34 [scrapy.extensions.logstats] INFO: Crawled 22 pages (at 12 pages/min), scraped 20 items (at 13 items/min)
2026-05-29 22:49:38 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:49:38 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:c2b072e8dc1f0beb89dd8694ac005e2a
2026-05-29 22:49:38 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/5-heartwood-lane-se
2026-05-29 22:49:38 [root] INFO: Making post request to https://api.adretriever.com/api/import/7fda16fe
2026-05-29 22:49:38 [loknow_spiders] INFO: Created: 13 items out of 13
2026-05-29 22:49:42 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:49:43 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:5ab3ec0d414c7708efa317ad76c71a37
2026-05-29 22:49:43 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/144-heartwood-lane-se
2026-05-29 22:49:47 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:49:47 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:6bc56b1456ec51d40a7443e98ef08059
2026-05-29 22:49:47 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/6-heartwood-villas-se
2026-05-29 22:49:50 [HeartwoodcalgarySpider] INFO: Closed Playwright page
2026-05-29 22:49:50 [loknow_spiders] INFO: Cache: hit loknow_spiders:gpt-response-cache:4d59215d6ef31e23c1bcadfdf9a2f4fa
2026-05-29 22:49:50 [HeartwoodcalgarySpider] INFO: Processed https://www.heartwoodcalgary.com/quick-possessions/355-heartwood-gardens-se
2026-05-29 22:49:50 [scrapy.core.engine] INFO: Closing spider (finished)
2026-05-29 22:49:50 [root] INFO: Making post request to https://api.adretriever.com/api/import/7fda16fe
2026-05-29 22:49:50 [loknow_spiders] INFO: Created: 3 items out of 3
2026-05-29 22:49:50 [loknow_spiders] INFO: GPT cache hit rate: 91.66666666666666%
2026-05-29 22:49:50 [scrapy.extensions.feedexport] INFO: Stored jsonlines feed (24 items) in: file:///home/scrapyd/items/loknow_spiders_prod/HeartwoodcalgarySpider/5f7804025bb011f1bdc0c3bbae71ecef.jl
2026-05-29 22:49:50 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 10044,
 'downloader/request_count': 26,
 'downloader/request_method_count/GET': 26,
 'downloader/response_bytes': 5098991,
 'downloader/response_count': 26,
 'downloader/response_status_count/200': 26,
 'elapsed_time_seconds': 136.261436,
 'feedexport/success_count/FileFeedStorage': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2026, 5, 29, 22, 49, 50, 576669),
 'httpcompression/response_bytes': 1515,
 'httpcompression/response_count': 1,
 'item_scraped_count': 24,
 'log_count/INFO': 103,
 'log_count/WARNING': 1,
 'memusage/max': 173494272,
 'memusage/startup': 125464576,
 'playwright/context_count': 1,
 'playwright/context_count/max_concurrent': 1,
 'playwright/context_count/persistent/False': 1,
 'playwright/context_count/remote/False': 1,
 'playwright/page_count': 25,
 'playwright/page_count/max_concurrent': 2,
 'playwright/request_count': 2403,
 'playwright/request_count/method/GET': 2124,
 'playwright/request_count/method/POST': 279,
 'playwright/request_count/navigation': 50,
 'playwright/request_count/resource_type/document': 50,
 'playwright/request_count/resource_type/fetch': 205,
 'playwright/request_count/resource_type/font': 125,
 'playwright/request_count/resource_type/image': 275,
 'playwright/request_count/resource_type/other': 25,
 'playwright/request_count/resource_type/script': 1425,
 'playwright/request_count/resource_type/stylesheet': 199,
 'playwright/request_count/resource_type/xhr': 99,
 'playwright/response_count': 2373,
 'playwright/response_count/method/GET': 2124,
 'playwright/response_count/method/POST': 249,
 'playwright/response_count/resource_type/document': 50,
 'playwright/response_count/resource_type/fetch': 180,
 'playwright/response_count/resource_type/font': 125,
 'playwright/response_count/resource_type/image': 275,
 'playwright/response_count/resource_type/other': 25,
 'playwright/response_count/resource_type/script': 1425,
 'playwright/response_count/resource_type/stylesheet': 199,
 'playwright/response_count/resource_type/xhr': 94,
 'request_depth_max': 1,
 'response_received_count': 26,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/200': 1,
 'scheduler/dequeued': 25,
 'scheduler/dequeued/memory': 25,
 'scheduler/enqueued': 25,
 'scheduler/enqueued/memory': 25,
 'start_time': datetime.datetime(2026, 5, 29, 22, 47, 34, 315233)}
2026-05-29 22:49:50 [scrapy.core.engine] INFO: Spider closed (finished)
2026-05-29 22:49:50 [scrapy-playwright] INFO: Closing download handler
2026-05-29 22:49:50 [scrapy-playwright] INFO: Closing download handler
2026-05-29 22:49:50 [scrapy-playwright] INFO: Closing browser
