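I covered a multithreading example in an earlier note (https://sulao.cn/post/609.html). Because threads share the resources of their process, we could pass in a queue and let each worker thread pull its work from that queue. This no longer works with multiple processes: each child process gets its own copy of the resources, so the code cannot be written the way the threaded version was. The solution is that the multiprocessing module also provides a queue, which is not the same queue discussed in the earlier note. Let's look at the code: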
#!/usr/bin/python3
# coding: utf-8
import os
import queue
import random
import re
import sys
import time
from multiprocessing import Manager, Pool
from urllib import request

import requests


def getImg(html):
    # Fetch the first page and read the total page count from it.
    response = request.urlopen(html).read().decode('gbk')
    page = re.findall(r'<span class="page-ch">共(.*?)页</span>', response)
    img_src = re.compile(r'<img alt="(.*)" src="(.*)" />')
    img_list = []
    i = 1
    while True:
        if i == 1:
            img = re.findall(img_src, response)
            img_list.append(img[0][1])
        if i > 1:
            # Later pages are named xxx_2.html, xxx_3.html, ...
            html_more = html.replace(".html", "_%s.html" % str(i))
            response_more = request.urlopen(html_more).read().decode('gbk')
            img = re.findall(img_src, response_more)
            img_list.append(img[0][1])
        if i >= int(page[0]):
            break
        i = i + 1
    return img_list


def downImg(q, html):
    print("Process %s starting, parent process %s" % (os.getpid(), os.getppid()))
    headers = {
        'Referer': html,
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
    }
    while True:
        try:
            # get_nowait() avoids blocking forever when another worker
            # empties the queue between a size check and the get.
            url = q.get_nowait()
        except queue.Empty:
            break
        try:
            img = requests.get(url, headers=headers)
            num = random.randint(10000, 99999)
            with open("img/%d.jpg" % num, 'wb') as f:
                f.write(img.content)
        except Exception:
            print("download %s failed!" % url)


if __name__ == "__main__":
    start_time = time.time()
    html = sys.argv[1]
    img_list = getImg(html)
    print("total %d pictures" % len(img_list))
    # Make sure the output directory exists before the workers write to it.
    os.makedirs("img", exist_ok=True)
    q = Manager().Queue()
    for i in img_list:
        q.put(i)
    p = Pool(4)
    for i in range(4):
        p.apply_async(downImg, args=(q, html))
    p.close()
    p.join()
    end_time = time.time()
    print("Images downloaded successfully, time taken %.3f s" % (end_time - start_time))
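This example creates its processes with a process pool. Using an ordinary queue together with a Pool raises an error, so the queue has to be created through Manager instead. The code above is a working example of scraping an image site; feel free to download it and test it yourself!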
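To isolate the Manager-plus-Pool pattern from the scraping logic, here is a minimal sketch. The names `drain` and `run_pool` are hypothetical helpers, not part of the script above, and squaring each number merely stands in for the real download work. It assumes the workers only need to pull items until the queue is empty.

```python
import os
import queue
from multiprocessing import Manager, Pool


def drain(q):
    """Pull items until the queue is empty; return (pid, results)."""
    handled = []
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            break
        handled.append(item * item)  # stand-in for real work (assumption)
    return os.getpid(), handled


def run_pool(items, workers=4):
    """Share one Manager queue among `workers` pool processes."""
    with Manager() as manager:
        q = manager.Queue()  # a proxy object that can be passed to pool workers
        for item in items:
            q.put(item)
        with Pool(workers) as pool:
            results = [pool.apply_async(drain, args=(q,)) for _ in range(workers)]
            # .get() waits for each worker and re-raises any exception it hit
            return [r.get() for r in results]


if __name__ == "__main__":
    for pid, handled in run_pool(range(10)):
        print(pid, handled)
```

Calling `.get()` on each result is a deliberate choice here: with `apply_async`, an exception inside a worker stays hidden unless the result is collected. How the items are split between processes depends on scheduling, so the per-process lists will vary from run to run.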