$ scrapy runspider sitemapspider.py
2016-07-26 10:41:29 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-07-26 10:41:29 [scrapy] INFO: Overridden settings: {}
2016-07-26 10:41:32 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
2016-07-26 10:41:34 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
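The contents of sitemapspider.py are not shown in this preview. As background, a sitemap spider starts from an XML sitemap and follows the `<loc>` URLs listed in it. Scrapy ships a `SitemapSpider` class for this. The sketch below is a standalone standard-library illustration of reading those URLs and is not `SitemapSpider`'s implementation. The helper name `sitemap_locs` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org-style sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return the <loc> URLs listed in a <urlset> or <sitemapindex> document.

    Hypothetical helper for illustration; it is not Scrapy's SitemapSpider code.
    """
    root = ET.fromstring(xml_text)
    # <urlset><url><loc> and <sitemapindex><sitemap><loc> both place <loc>
    # one level below a child of the root element.
    return [
        loc.text.strip()
        for loc in root.findall("./*/sm:loc", SITEMAP_NS)
        if loc.text
    ]


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(sitemap_locs(SAMPLE))  # → ['https://example.com/', 'https://example.com/about']
```

The same `./*/sm:loc` path covers both a plain sitemap and a sitemap index, because both nest `<loc>` one element below the root's children.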

$ scrapy runspider delayspider.py
2016-06-29 11:52:19 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-06-29 11:52:19 [scrapy] INFO: Overridden settings: {}
2016-06-29 11:52:19 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
2016-06-29 11:52:19 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
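The preview does not show what delayspider.py does. For reference, Scrapy's built-in throttle is the `DOWNLOAD_DELAY` setting, which can also be set through a spider's `download_delay` attribute. With `RANDOMIZE_DOWNLOAD_DELAY` enabled (the default), the documented wait is a random value between 0.5× and 1.5× that delay. The sketch below reproduces only that range calculation in plain Python and is not Scrapy's internal code. The function name is hypothetical.

```python
import random


def randomized_delay(download_delay, rng=random):
    """Return a wait time in [0.5 * delay, 1.5 * delay] seconds.

    Hypothetical helper mirroring the range documented for
    RANDOMIZE_DOWNLOAD_DELAY; not Scrapy's own implementation.
    """
    if download_delay < 0:
        raise ValueError("download_delay must be non-negative")
    return rng.uniform(0.5 * download_delay, 1.5 * download_delay)


rng = random.Random(42)  # seeded so repeated runs give the same waits
waits = [randomized_delay(2.0, rng) for _ in range(5)]
print([round(w, 2) for w in waits])
```

Randomizing around the base delay makes request timing less regular than a fixed sleep, while keeping the average wait close to `DOWNLOAD_DELAY`.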

# --- install system dependencies (sudo apt-get install)
scrapyuser@8fb08da8f18b:/$ sudo apt-get install python3 python-dev python3-dev \
> build-essential libssl-dev libffi-dev \
> libxml2-dev libxslt-dev \
> python-pip
[sudo] password for scrapyuser:
Reading package lists... Done
Building dependency tree
Reading state information... Done
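The `-dev` packages above supply the headers that `pip` needs to compile Scrapy's C-extension dependencies: libxml2 and libxslt for lxml, and OpenSSL plus libffi for cryptography. After `pip install scrapy`, a standard-library check like this sketch can confirm that those modules are importable. The module list is an assumption about what this environment needs, and `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on sys.path.

    Only top-level module names are expected here.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# lxml builds against libxml2/libxslt; cryptography against libssl/libffi;
# pyOpenSSL is imported as "OpenSSL".
print(missing_modules(["lxml", "cryptography", "OpenSSL"]))
```

An empty list means every module was found. Any name that is printed points at a package whose build (or install) did not complete.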

# adapted from http://stackoverflow.com/questions/25845538/using-sudo-inside-a-docker-container
FROM ubuntu:16.04
RUN apt-get update
RUN apt-get -y install sudo
RUN useradd -m scrapyuser && echo "scrapyuser:scrapypwd" | chpasswd && adduser scrapyuser sudo
USER scrapyuser
CMD /bin/bash

FROM centos:centos7
RUN yum update -y
# Install Python and dev headers
RUN yum install -y \
python-devel
# Install cryptography
# https://cryptography.io/en/latest/installation/#building-cryptography-on-linux

FROM ubuntu:trusty
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update
# Install Python3 and dev headers
RUN apt-get install -y \
python3 \
python-dev \

$ scrapy crawl httpbin
2016-04-06 00:16:58 [scrapy] INFO: Scrapy 1.1.0rc3 started (bot: mwtest)
2016-04-06 00:16:58 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'mwtest.spiders', 'SPIDER_MODULES': ['mwtest.spiders'], 'BOT_NAME': 'mwtest'}
2016-04-06 00:16:58 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2016-04-06 00:16:58 [py.warnings] WARNING: /home/paul/tmp/mwtest/mwtest/middlewares.py:1: ScrapyDeprecationWarning: Module `scrapy.log` has been deprecated, Scrapy now relies on the builtin Python library for logging. Read the updated logging entry in the documentation to learn more.
from scrapy import log, signals
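The warning above is triggered by `from scrapy import log` on line 1 of middlewares.py. As the warning says, Scrapy now logs through Python's standard `logging` module, so a middleware can use a module-level logger in place of `scrapy.log`. This is a minimal sketch. The class name is hypothetical, and the real middlewares.py is not shown here. Returning `None` from `process_request` tells Scrapy to keep processing the request.

```python
import logging

# Replaces `from scrapy import log`: a plain stdlib logger for this module.
logger = logging.getLogger(__name__)


class LoggingDownloaderMiddleware:
    """Hypothetical downloader middleware that logs each outgoing request."""

    def process_request(self, request, spider):
        # %-style arguments are only formatted if the record is emitted.
        logger.debug("Requesting %s for spider %s", request.url, spider.name)
        return None  # None: let Scrapy continue handling the request
```

The middleware would be enabled through `DOWNLOADER_MIDDLEWARES` as usual. Its messages then go through whatever logging configuration Scrapy sets up.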

$ scrapy shell
2016-02-01 12:41:35 [scrapy] INFO: Scrapy 1.1.0dev1 started (bot: scrapybot)
2016-02-01 12:41:35 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-02-01 12:41:35 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2016-02-01 12:41:35 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
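The `scrapy shell` runs override `DUPEFILTER_CLASS` with `scrapy.dupefilters.BaseDupeFilter`. In Scrapy, `BaseDupeFilter.request_seen()` always returns False, which presumably lets the shell fetch the same URL more than once. The default `RFPDupeFilter` instead drops requests whose fingerprint it has already seen. The sketch contrasts the two behaviours in plain Python. The class names are hypothetical, and `request_seen` here takes method, URL and body directly rather than a Scrapy `Request` so the block stays self-contained.

```python
import hashlib


class NoopDupeFilter:
    """Mimics BaseDupeFilter's behaviour: never reports a request as seen."""

    def request_seen(self, method, url, body=b""):
        return False


class FingerprintDupeFilter:
    """Simplified fingerprint-based filter (illustrative, not Scrapy's code).

    Scrapy's real fingerprinting also canonicalizes the URL; this sketch
    hashes the method, URL and body exactly as given.
    """

    def __init__(self):
        self.seen = set()

    def request_seen(self, method, url, body=b""):
        digest = hashlib.sha1()
        for part in (method.upper().encode("utf-8"), url.encode("utf-8"), body):
            digest.update(part)
            digest.update(b"\0")  # separator between the hashed parts
        fingerprint = digest.hexdigest()
        if fingerprint in self.seen:
            return True
        self.seen.add(fingerprint)
        return False


dupes = FingerprintDupeFilter()
print(dupes.request_seen("GET", "https://example.com/"))  # → False
print(dupes.request_seen("GET", "https://example.com/"))  # → True
print(NoopDupeFilter().request_seen("GET", "https://example.com/"))  # → False
```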

$ scrapy shell
2016-01-28 18:21:43 [scrapy] INFO: Scrapy 1.1.0dev1 started (bot: scrapybot)
2016-01-28 18:21:43 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-01-28 18:21:43 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2016-01-28 18:21:44 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',

$ scrapy shell "https://www.youtube.com/watch?v=1EFnX1UkXVU"
/usr/local/lib/python2.7/dist-packages/twisted/internet/_sslverify.py:184: UserWarning: You do not have the service_identity module installed. Please install it from <https://pypi.python.org/pypi/service_identity>. Without the service_identity module and a recent enough pyOpenSSL tosupport it, Twisted can perform only rudimentary TLS client hostnameverification. Many valid certificate/hostname mappings may be rejected.
verifyHostname, VerificationError = _selectVerifyImplementation()
2014-12-30 15:18:08+0100 [scrapy] INFO: Scrapy 0.24.4 started (bot: scrapybot)
2014-12-30 15:18:08+0100 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2014-12-30 15:18:08+0100 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-12-30 15:18:08+0100 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-12-30 15:18:08+0100 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddlewa
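The log excerpts above share one line layout: date, time, logger name in brackets, level, then message. The Scrapy 0.24.4 run adds a UTC offset (`+0100`) after the time and the 1.1 runs do not. Multi-line values such as the middleware lists continue on lines without that prefix. The sketch below is a small regex-based parser that pulls out the fields and returns `None` for continuation lines. It is a hypothetical helper written against these excerpts only, not a format guarantee from Scrapy.

```python
import re

# date, time, optional UTC offset (0.24.x style), [logger], LEVEL: message
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})(?P<tz>[+-]\d{4})?"
    r" \[(?P<logger>[^\]]+)\] (?P<level>[A-Z]+): (?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields, or None for continuation/non-log lines."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


lines = [
    "2016-07-26 10:41:29 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)",
    "2014-12-30 15:18:08+0100 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}",
    "['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']",
]
for line in lines:
    print(parse_log_line(line))
```

A `None` result can be used to attach a continuation line to the previous record when collecting multi-line values such as the enabled-middleware lists.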