// The target host is served on port 443, so the request must go through https, not http.
var https = require('https');
var cheerio = require('cheerio');
// Assumption: the path of the Mongoose model is not shown in the original; adjust as needed.
var CompanysModel = require('./models/companys');

var fetch_url = '/gongsi/${index}.html';

// Assumption: neither value appears in the original snippet. 3 is a placeholder,
// and the counter starts at 1 because the code below resets it to 1.
var RETRYTIMES = 3;
var retryCount = 1;

exports.runner = function fetchPage(index) {
    startRequest(index);
};

// Skip an index that is already stored (or whose lookup failed); otherwise fetch its page.
function startRequest(index) {
    CompanysModel.findOne({index: index}, '_id', function (err, _company) {
        if (err || _company) {
            console.log(index + '------- got');
            incrAndReget(index);
            return;
        }
        handlerContent(index);
    });
}

// Fetch one company page, parse it with cheerio and save the result.
function handlerContent(index) {
    var url = fetch_url.replace('${index}', index);

    var options = {
        hostname: 'www.lagou.com',
        port: 443,
        path: url,
        method: 'GET',
        headers: {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            // The body is not decompressed below, so this relies on the server replying uncompressed.
            'Accept-Encoding': 'deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4',
            'Connection': 'keep-alive',
            'Cookie': 'user_trace_token=20171019112945-24a9d09c-bcfe-47d7-8d27-3bf53bfdb347; LGUID=20171019112947-c1eabc85-b47d-11e7-9c7e-525400f775ce; JSESSIONID=ABAAABAACDBAAIAC9A9B32CD990D08ED4BAEFDDAB14C879; TG-TRACK-CODE=index_campus; _gid=GA1.2.705151711.1508383779; _ga=GA1.2.1181508252.1508383779; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1508383779,1508383891,1508384000; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1508392375; LGSID=20171019125706-f472e429-b489-11e7-9ca5-525400f775ce; LGRID=20171019135302-c509ee47-b491-11e7-9cac-525400f775ce',
            'Host': 'www.lagou.com',
            'Origin': 'http://www.lagou.com',
            'Referer': 'http://www.lagou.com/',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36'
        }
    };

    https.get(options, function (res) {
        console.log(url + '-------' + res.statusCode);

        // 303: no page for this index, move on to the next one.
        if (res.statusCode === 303) {
            res.resume(); // discard the body so the keep-alive socket is released
            incrAndReget(index);
            return;
        }

        // 302: wait 5 seconds and retry, giving up after RETRYTIMES attempts.
        if (res.statusCode === 302) {
            res.resume();
            setTimeout(function () {
                if (retryCount === RETRYTIMES) {
                    retryCount = 1;
                    incrAndReget(index);
                } else {
                    retryCount++;
                    handlerContent(index);
                }
            }, 1000 * 5);
            return;
        }

        // Decode as UTF-8 so multi-byte characters split across chunks are not corrupted.
        res.setEncoding('utf8');
        var html = '';
        res.on('data', function (chunk) {
            html += chunk;
        });
        res.on('end', function () {
            var $ = cheerio.load(html);
            var link = $('a.hovertips');
            var product = $('.url_valid');
            var basics = $('#basic_container li');
            var review = '#interview_container .comprehensive-review ';

            var company = {
                index: index,
                simpleName: link.text().trim(),
                name: (link.attr('title') || '').trim(),
                urlLink: (link.attr('href') || '').trim(),
                companyDesc: $('span.company_content').text().trim(),
                productName: product.text().trim(),
                productType: $('.product_details ul').text().trim(),
                productDesc: $('.product_profile').text().trim(),
                productLink: (product.attr('href') || '').trim(),
                manager: $('#company_managers .item_manager_name').text().trim(),
                managerDesc: $('#company_managers .item_manager_content').text().trim(),
                type: basics.eq(0).text().trim(),
                process: basics.eq(1).text().trim(),
                number: basics.eq(2).text().trim(),
                address: basics.eq(3).text().trim(),
                score: $(review + '.score').text().trim(),
                // The text reads "( 来自 N 份评价 )" ("from N reviews"); keep only N.
                count: $(review + '.count').text().trim().replace('( 来自 ', '').replace(' 份评价 )', '')
            };

            var companysModel = new CompanysModel(company);
            companysModel.save(function (err) {
                if (err) {
                    console.log(err);
                }
            });
        });

        // Start on the next index right away, without waiting for this body to finish.
        incrAndReget(index);
    }).on('error', function (err) {
        // Network error: log it and retry the same index after 10 seconds.
        console.log(err);
        setTimeout(function () {
            handlerContent(index);
        }, 1000 * 10);
    });
}

// Advance to the next company index and continue crawling.
function incrAndReget(index) {
    index++;
    startRequest(index);
}
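
// A minimal sketch, not part of the original crawler: the `count` field above is
// obtained by stripping the literal strings "( 来自 " and " 份评价 )", which leaves
// stray text behind if the page's spacing or wording differs. Assuming the review
// count is the first run of digits in that text, a regular expression is more
// tolerant. `parseReviewCount` is a hypothetical helper name used for illustration.
function parseReviewCount(text) {
    var match = /\d+/.exec(text || '');
    return match ? match[0] : '';
}

// Possible usage with the selector from the crawler:
//   count: parseReviewCount($('#interview_container .comprehensive-review .count').text())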