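I spent a long time trying this in PHP without success, and the Python code I found online didn't work for my image-scraping case either. After digging through some material, I finally found that the get method of the requests module can send custom headers; after that, you simply write the content of the returned response straight into an image file.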
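Images without hotlink protection can be saved locally with the request.urlretrieve method from the urllib module. Nowadays, though, many image sites use image hosts and third-party storage servers, and with nginx hotlink protection in front of them the images can no longer be downloaded directly: the request has to carry forged Referer and User-Agent headers, as a browser would send them. Below is a short piece of code for reference only. It is the main code, and with small changes it can scrape images from most sites. These are all pretty-girl picture sites, sorry about that!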
#!/usr/bin/python3
# coding: utf-8
import os
import random
import re
import ssl
from urllib import request

import requests

# Skip HTTPS certificate verification for urllib (insecure, used here only for scraping)
ssl._create_default_https_context = ssl._create_unverified_context


def fetch(url):
    """Download a page and return it decoded as UTF-8 text."""
    return request.urlopen(url).read().decode('utf-8')


def getImg(html):
    """Collect the image URL from every page of a gallery."""
    # The <h1> looks like "Title(1/N)", where N is the total number of pages
    page_src = re.compile(r'<h1 class="center">(.*)\(1/(.*)\)</h1>')
    # Non-greedy groups so that one match cannot span several attributes or tags
    img_src = re.compile(r'<img src="(.*?)" alt="(.*?)" />')

    response = fetch(html)
    page = re.findall(page_src, response)
    total = int(page[0][1])

    img_list = []
    for i in range(1, total + 1):
        if i > 1:
            # Page i of the gallery lives at <name>_i.html
            response = fetch(html.replace(".html", "_%s.html" % i))
        img = re.findall(img_src, response)
        img_list.append(img[0][0])
    return img_list


def downImg(img_list):
    """Download each image, sending forged Referer and User-Agent headers."""
    headers = {
        'Referer': 'http://www.uumnt.cc/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
    }
    os.makedirs("img", exist_ok=True)  # open() fails if the directory is missing
    for p in img_list:
        img = requests.get(p, headers=headers)
        # Random file name, as in the original; collisions are possible
        num = random.randint(10000, 99999)
        with open("img/%d.jpg" % num, 'wb') as f:
            f.write(img.content)


if __name__ == "__main__":
    html = r"https://www.uumtu.com/xinggan/30086.html"
    img_list = getImg(html)
    downImg(img_list)
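The script hard-codes the Referer and saves files under random numbers, so two images can end up with the same name. As a rough sketch only, the same header trick can also be done with the standard library, deriving the Referer from the gallery page URL and the file name from the image URL. The helper names (referer_for, filename_for, build_request, save_image) and the example.com image host are made up for illustration, and whether a given site accepts its own root URL as Referer is an assumption that has to be checked per site.

```python
import os
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36")


def referer_for(page_url):
    """Return the site root of page_url, e.g. 'https://host/' (hypothetical helper)."""
    parts = urlsplit(page_url)
    return "%s://%s/" % (parts.scheme, parts.netloc)


def filename_for(img_url, default="image.jpg"):
    """Use the last path segment of the image URL as the local file name."""
    name = os.path.basename(urlsplit(img_url).path)
    return name or default


def build_request(img_url, page_url):
    """Build a urllib Request carrying forged Referer and User-Agent headers."""
    headers = {"Referer": referer_for(page_url), "User-Agent": USER_AGENT}
    return Request(img_url, headers=headers)


def save_image(img_url, page_url, folder="img"):
    """Download one image into folder and return the path it was written to."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename_for(img_url))
    with urlopen(build_request(img_url, page_url)) as resp, open(path, "wb") as f:
        f.write(resp.read())
    return path
```

In the script above, downImg could then call save_image(p, html) for each URL instead of building the headers by hand, provided the page URL is passed along.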
Scraping scripts for the following large image sites are already written; feel free to take them, no thanks needed.
Copyright notice: unless otherwise stated, all articles on this site are original.
When reposting, please credit the source: https://sulao.cn/post/565