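I covered a multithreading example in an earlier note: https://sulao.cn/post/606. Threads share their process's memory, so in that version we passed in a queue and let each worker thread pull tasks from it. That approach no longer works with multiple processes, because each child process gets its own copy of the parent's resources, so the code cannot be written the same way as the threaded version. The fix is that the multiprocessing module provides its own queues, which are not the same as the queue module covered earlier. Let's look at the code: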
#!/usr/bin/python3
# coding: utf-8
from urllib import request
from multiprocessing import Manager, Pool
import queue
import requests
import re
import random
import time
import sys
import os


def getImg(html):
    # Fetch the first page of the gallery; the site serves GBK-encoded HTML.
    response = request.urlopen(html)
    response = response.read()
    response = response.decode('gbk')
    # Total page count, taken from the pager text "共N页".
    reg1 = r'<span class="page-ch">共(.*?)页</span>'
    page_src = re.compile(reg1)
    page = re.findall(page_src, response)
    reg2 = r'<img alt="(.*)" src="(.*)" />'
    img_src = re.compile(reg2)
    img_list = []
    i = 1
    while True:
        if i == 1:
            img = re.findall(img_src, response)
            img_list.append(img[0][1])
        if i > 1:
            # Later pages are named xxx_2.html, xxx_3.html, ...
            html_more = html.replace(".html", "_%s.html" % str(i))
            response_more = request.urlopen(html_more)
            response_more = response_more.read()
            response_more = response_more.decode('gbk')
            img = re.findall(img_src, response_more)
            img_list.append(img[0][1])
        if i >= int(page[0]):
            break
        i = i + 1
    return img_list


def downImg(q, html):
    print("Process %s starting, parent process %s" % (os.getpid(), os.getppid()))
    headers = {
        'Referer': html,
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
    }
    while True:
        try:
            # get_nowait() instead of "qsize() > 0, then get()": another worker
            # could empty the queue between the check and a blocking get().
            url = q.get_nowait()
        except queue.Empty:
            break
        try:
            img = requests.get(url, headers=headers)
            # Random file name, as in the original; collisions are possible.
            num = random.randint(10000, 99999)
            with open("img/%d.jpg" % num, 'wb') as f:
                f.write(img.content)
        except Exception:
            print("download %s failed !" % url)


if __name__ == "__main__":
    start_time = time.time()
    html = sys.argv[1]
    img_list = getImg(html)
    print("total %d pictures" % len(img_list))
    # Make sure the output directory exists before the workers write to it.
    os.makedirs("img", exist_ok=True)
    # A Manager queue is a proxy object, so it can be passed to Pool workers.
    q = Manager().Queue()
    for i in img_list:
        q.put(i)
    p = Pool(4)
    for i in range(4):
        p.apply_async(downImg, args=(q, html))
    p.close()
    p.join()
    end_time = time.time()
    print("Images downloaded successfully, time taken %.3f s" % (end_time - start_time))
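This example creates its processes with a process pool. Passing a plain multiprocessing.Queue to Pool workers raises an error (a RuntimeError, since such a queue can only be shared with child processes through inheritance), so the queue has to be created through a Manager instead. The code above is a working scraper for an image site; feel free to download it and test it yourself!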
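To see the Manager-plus-Pool pattern on its own, without any network access, here is a minimal sketch. The function names drain and run_demo are made up for this illustration and are not part of the scraper; squaring a number stands in for downloading one image.

```python
import queue
from multiprocessing import Manager, Pool


def drain(q):
    """Worker: pull items until the shared queue is empty; return the results."""
    done = []
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            break
        done.append(item * item)  # stand-in for the real per-item work
    return done


def run_demo(items, workers=4):
    """Fill a Manager queue, let a pool of workers drain it, collect the results."""
    with Manager() as manager:
        q = manager.Queue()  # proxy object, safe to pass to Pool workers
        for item in items:
            q.put(item)
        with Pool(workers) as pool:
            pending = [pool.apply_async(drain, args=(q,)) for _ in range(workers)]
            # .get() waits for each worker and re-raises any exception it hit.
            chunks = [r.get() for r in pending]
    return sorted(x for chunk in chunks for x in chunk)


if __name__ == "__main__":
    print(run_demo(range(10)))
```

One thing worth knowing: apply_async does not report an exception raised inside a worker unless you call .get() on the result it returns, so a worker that fails can look as if it simply did nothing. Collecting the results as above makes such failures visible.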
Copyright notice: unless otherwise noted, all articles on this site are original.
When reposting, please credit the source: https://sulao.cn/post/613