Bilibili Video Info Spider



As before, the video information can be found by opening the F12 developer tools.
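To make the request concrete, here is a minimal sketch of how the stat URL is built and how the `data` object can be pulled out of a response body. The URL template is the one used in the full code below; `build_url`, `extract_stats`, and the sample body are hypothetical names and made-up values for illustration, shaped only after the fields the spider reads.

```python
import json

# Endpoint taken from the full code below; only the aid changes per video.
STAT_URL = 'https://api.bilibili.com/x/web-interface/archive/stat?aid={}'

# Hypothetical response body, shaped after the fields the spider reads.
# The numbers are made up for illustration.
SAMPLE_BODY = '''
{"data": {"aid": 2, "view": 100, "danmaku": 10, "reply": 5,
          "favorite": 7, "coin": 3, "share": 1}}
'''


def build_url(aid):
    """Return the stat URL for one video id."""
    return STAT_URL.format(aid)


def extract_stats(body):
    """Parse a JSON body and return its 'data' dict, or None if it is absent."""
    payload = json.loads(body)
    return payload.get('data') or None


print(build_url(2))
print(extract_stats(SAMPLE_BODY))
```

Because the sample is parsed from a string, this can be run without touching the network, which is handy for checking field names before starting a long crawl.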

Single-threaded

Full code

import time

import pymongo
import requests


class videoInfo_Spider(object):
    def __init__(self, aid):
        self.video_url = 'https://api.bilibili.com/x/web-interface/archive/stat?aid={}'.format(aid)
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Mobile Safari/537.36',
        }

    def get_source(self, url):
        # Pause before every request to keep the request rate low.
        time.sleep(0.4)
        # The timeout keeps one stalled connection from blocking the whole loop.
        response = requests.get(url, headers=self.headers, timeout=10)
        if response.status_code == 200:
            return response.json()
        print('Request failed')
        return None

    def parse(self, items):
        # items is None when the request failed; 'data' may also be missing.
        if items and items.get('data'):
            info = items.get('data')
            yield {
                'id': info.get('aid'),
                'views': info.get('view'),
                'danmaku': info.get('danmaku'),
                'favorites': info.get('favorite'),
                'replies': info.get('reply'),
                'coins': info.get('coin'),
                'shares': info.get('share'),
            }

    def save_to_mongo(self, item):
        client = pymongo.MongoClient('localhost', 27017)
        db = client['BILIBILI']
        collection = db['videoInfo']
        # Collection.insert() was removed in PyMongo 4; insert_one() replaces it.
        if collection.insert_one(item).acknowledged:
            print('Saved to MongoDB')

    def run(self):
        html = self.get_source(self.video_url)
        items = self.parse(html)
        for item in items:
            print(item)
            self.save_to_mongo(item)


if __name__ == '__main__':
    time1 = time.time()
    # Walk every aid in order, one request at a time.
    for aid in range(1, 40000890):
        spider = videoInfo_Spider(aid)
        spider.run()
    time2 = time.time()
    print('cost time: %s' % (time2 - time1))
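A rough estimate shows what the single-threaded loop costs in wall-clock time. The sketch below uses only the two figures that appear in the script, the `range(1, 40000890)` bounds and the 0.4 second sleep; it ignores network latency and MongoDB writes, so the real run would take longer. `estimate_sleep_days` is a hypothetical helper, not part of the spider.

```python
# Figures taken from the script above: range(1, 40000890) and time.sleep(0.4).
FIRST_AID = 1
STOP_AID = 40000890      # exclusive upper bound of range()
SLEEP_SECONDS = 0.4      # fixed delay before every request
SECONDS_PER_DAY = 86400


def estimate_sleep_days(first, stop, delay):
    """Return (request_count, days spent sleeping) for a sequential crawl."""
    count = len(range(first, stop))
    return count, count * delay / SECONDS_PER_DAY


requests_total, days = estimate_sleep_days(FIRST_AID, STOP_AID, SLEEP_SECONDS)
print('requests: %d, sleep alone: %.1f days' % (requests_total, days))
```

With these inputs the sleep calls alone add up to roughly 185 days, so in practice the aid range would need to be narrowed or the crawl split up.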

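Title: Bilibili Video Info Spider

Author: Tang

Published: 2019-01-09, 13:01

Last updated: 2019-01-20, 18:01

Original link: https://tangx1.com/bilibili-videoInfo/

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author when reposting.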
