Python scrapy.spiders.CrawlSpider() Examples
The following are 3 code examples of scrapy.spiders.CrawlSpider(), drawn from open-source projects. Each example lists its source file and license.
Example #1
Source File: utils.py From scrapy-autounit with BSD 3-Clause "New" or "Revised" License | 7 votes |
from scrapy.spiders import CrawlSpider
from scrapy.utils.reqser import request_to_dict

def parse_request(request, spider):
    _request = request_to_dict(request, spider=spider)
    if not _request['callback']:
        # No callback recorded: fall back to the default parse method.
        _request['callback'] = 'parse'
    elif isinstance(spider, CrawlSpider):
        # Resolve the callback from the CrawlSpider rule that produced
        # this request, if any.
        rule = request.meta.get('rule')
        if rule is not None:
            _request['callback'] = spider.rules[rule].callback
    # clean_headers and parse_object are helpers defined elsewhere in
    # scrapy-autounit's utils module.
    clean_headers(_request['headers'], spider.settings)
    _meta = {}
    for key, value in _request.get('meta').items():
        if key != '_autounit':
            _meta[key] = parse_object(value, spider)
    _request['meta'] = _meta
    return _request
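A key step above is stripping scrapy-autounit's internal '_autounit' key from the request meta before the request is snapshotted. A minimal pure-Python sketch of that filtering step (the sample meta dict is illustrative, and the per-value parse_object() processing is omitted):

```python
# Illustrative meta dict; '_autounit' is the internal key that
# parse_request drops, every other entry is kept.
meta = {'depth': 1, 'rule': 0, '_autounit': {'cassette': 'demo'}}

# Same filtering as the loop in parse_request above.
cleaned = {key: value for key, value in meta.items() if key != '_autounit'}

assert cleaned == {'depth': 1, 'rule': 0}
```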
Example #2
Source File: utils.py From scrapy-autounit with BSD 3-Clause "New" or "Revised" License | 5 votes |
def get_filter_attrs(spider):
    attrs = {'crawler', 'settings', 'start_urls'}
    if isinstance(spider, CrawlSpider):
        # CrawlSpider instances also carry rule attributes to filter.
        attrs |= {'rules', '_rules'}
    return attrs
Example #3
Source File: haofl_spider.py From Spiders with Apache License 2.0 | 5 votes |
def parse_start_url(self, response):
    """By default, CrawlSpider first requests each start_url, then
    calls back parse_start_url with the response."""
    li_list = response.xpath('//*[@id="post_container"]/li')
    for li_div in li_list:
        link = li_div.xpath('.//div[@class="thumbnail"]/a/@href').extract_first()
        yield scrapy.Request(link, callback=self.parse_detail_url)
    next_page = response.xpath('//div[@class="pagination"]/a[@class="next"]/@href').extract_first()
    if next_page:
        yield scrapy.Request(next_page, callback=self.parse_start_url)