CN111475697A - Network video information acquisition method and equipment - Google Patents

Network video information acquisition method and equipment

Info

Publication number
CN111475697A
Authority
CN
China
Prior art keywords
information
video
bloggers
blogger
crawling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010136518.3A
Other languages
Chinese (zh)
Inventor
贾璐
余家俊
林超
林倩倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University
Priority to CN202010136518.3A
Publication of CN111475697A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

An embodiment of the invention provides a method and device for acquiring network video information. The method comprises: crawling the information of bloggers on a video platform; crawling the bloggers' tipping information and filtering out the bloggers who have enabled the tipping function; and acquiring the video information of the videos of the bloggers who have enabled the tipping function. The application starts from the video bloggers' information, determines from it whether tipping is enabled, and then acquires the videos of interest accordingly. This greatly reduces the amount of data crawled and analyzed and makes the crawl more targeted.

Description

A network video information acquisition method and device

Technical field

The present invention relates to the technical field of video acquisition, and in particular to a method and device for acquiring network video information.

Background

Short video is a new form of information dissemination that has risen in recent years with the spread of smartphones and the rapid development of the mobile Internet. The industry regards it as a major growth opportunity in the Internet field. It has become an important means of innovative media reporting in the era of mobile communication and an important direction for the development of information dissemination.

Institutions led by BAT (Baidu, Alibaba, Tencent) are investing heavily to accelerate their deployment of short video platforms, and talent and capital are entering the short video field on a large scale. Toutiao, Tencent, and Miaopai have each invested 1 billion yuan to support short video content creators. Different platforms offer different tipping mechanisms for bloggers who publish videos, and the relationship between video information and a publisher's earnings can both help predict those earnings and support research into how much audiences like different videos. How to effectively acquire all the video information of bloggers who have enabled the tipping function on an online video platform is therefore of great significance.

In the prior art, all the video information of bloggers who have enabled the tipping function is obtained mainly by traversing every video on the video platform, filtering out the videos for which tipping is enabled, and then collecting the blogger information of each such video together with all of that blogger's video information. Because the prior art has to traverse every video on the platform, collecting all the video information of tipping-enabled bloggers is inefficient and time-consuming.

How to provide an efficient, fast method for acquiring network video information is therefore a problem the industry urgently needs to solve.

Summary of the invention

Embodiments of the present invention provide a method and device for acquiring network video information, to solve the prior-art problem that collecting all the video information of bloggers who have enabled the tipping function on an online video platform is inefficient and time-consuming.

In a first aspect, an embodiment of the present invention provides a method for acquiring network video information, including:

crawling the information of the bloggers on a video platform;

crawling the bloggers' tipping information and filtering out the bloggers who have enabled the tipping function;

acquiring the video information of the videos of the bloggers who have enabled the tipping function.

Optionally, crawling the information of the bloggers on the video platform specifically includes:

crawling the information of the bloggers of videos within a preset time range on the video platform; or crawling the information of the bloggers of all videos on the video platform.

Optionally, crawling the information of the bloggers on the video platform specifically includes:

crawling the basic information of a video based on the video's number on the video platform; and obtaining the information of the video's blogger based on the video's basic information.

Optionally, crawling the bloggers' tipping information and filtering out the bloggers who have enabled the tipping function specifically includes:

crawling a blogger's tipping information based on the blogger's information;

filtering out the bloggers who have enabled the tipping function based on the crawled tipping information.

Optionally, crawling the basic information of a video based on the video's number on the video platform, and obtaining the information of the video's blogger based on the video's basic information, specifically includes:

obtaining the numbers of the videos on the video platform through the platform's interface, and calculating the number of videos from those numbers;

creating a first thread pool containing at least one thread and, according to the number of videos, taking out each time as many video numbers as there are threads in the first thread pool and crawling the basic information of those videos concurrently;

obtaining the information of each video's blogger based on the video's basic information.

Optionally, crawling a blogger's tipping information based on the blogger's information specifically includes:

calculating the number of bloggers from the crawled blogger information;

creating a second thread pool containing at least one thread and, according to the number of bloggers, taking out each time the information of as many bloggers as there are threads in the second thread pool and crawling their tipping information concurrently. If a blogger's tipping information is crawled, the blogger has enabled the tipping function; if no tipping information is crawled, the blogger has not enabled the tipping function.

Optionally, acquiring the video information of the videos of the bloggers who have enabled the tipping function specifically includes:

calculating the number of bloggers who have enabled the tipping function;

creating a third thread pool containing at least one thread and, according to the number of tipping-enabled bloggers, taking out each time the information of as many tipping-enabled bloggers as there are threads in the third thread pool and crawling the video information of their videos concurrently.

In a second aspect, an embodiment of the present invention provides a video output mode switching device, including:

a first crawling module, configured to crawl the information of the bloggers on a video platform;

a second crawling module, configured to crawl the bloggers' tipping information and filter out the bloggers who have enabled the tipping function;

an acquisition module, configured to acquire the video information of the videos of the bloggers who have enabled the tipping function.

In a third aspect, an embodiment of the present invention provides an electronic device, including:

a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the network video information acquisition method according to any one of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the network video information acquisition method according to any one of the first aspect.

The method and device provided by the embodiments of the present invention crawl each blogger's tipping information based on the blogger's information, filter out the bloggers who have enabled the tipping function based on that tipping information, and finally use the information of the tipping-enabled bloggers to acquire, in a targeted way, the video information of their videos. Overall, the application starts from the video bloggers' information, determines from it whether tipping is enabled, and then acquires the videos of interest accordingly. This greatly reduces the amount of data crawled and analyzed and makes the crawl more targeted.

Description of drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of an embodiment of a network video information acquisition method of the present invention;

FIG. 2 is a schematic structural diagram of an embodiment of a network video information acquisition apparatus of the present invention;

FIG. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.

Detailed description

To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

The prior art crawls all videos and uses the tipping information in each video to obtain the video information of interest. In contrast, this application starts from the video bloggers' information: it obtains the bloggers' information, determines from it whether tipping is enabled, and then acquires the videos of interest accordingly. This greatly reduces the amount of data crawled and analyzed and makes the crawl more targeted.
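The blogger-first flow described above can be sketched in Python as three chained stages. This is a minimal sketch, not the patent's implementation: the function name `acquire_video_info` and its three callable parameters are hypothetical, introduced here so the flow can be shown without any network access.

```python
from typing import Callable, Iterable, Optional


def acquire_video_info(
    video_ids: Iterable[int],
    get_blogger_id: Callable[[int], Optional[int]],
    get_tipping_info: Callable[[int], Optional[dict]],
    get_blogger_videos: Callable[[int], list],
) -> dict:
    """Collect bloggers, keep those with tipping enabled, fetch their videos.

    All three callables are hypothetical stand-ins for platform requests.
    """
    # Step 1: collect the distinct bloggers behind the given videos.
    blogger_ids = set()
    for video_id in video_ids:
        blogger_id = get_blogger_id(video_id)
        if blogger_id is not None:
            blogger_ids.add(blogger_id)

    # Step 2: a blogger counts as tipping-enabled only if tipping
    # information could be obtained for that blogger.
    enabled = [b for b in sorted(blogger_ids) if get_tipping_info(b) is not None]

    # Step 3: fetch video information only for tipping-enabled bloggers.
    return {b: get_blogger_videos(b) for b in enabled}
```

Because tipping is checked once per blogger rather than once per video, the number of tipping lookups is bounded by the number of distinct bloggers, which is where the reduction in crawled data comes from.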

FIG. 1 is a flowchart of an embodiment of a network video information acquisition method according to the present invention. The method includes:

Step 101: crawl the information of the bloggers on a video platform.

Specifically, there are many ways to crawl the information of a video platform's bloggers. A person skilled in the art can choose among programming languages, algorithms, and programs according to the circumstances, for example Java or Python, and within Python libraries such as BeautifulSoup and requests. The method of crawling the basic video information is therefore not unique and is not limited in this embodiment. A video platform is an Internet website, mobile app, or similar service to which video bloggers can upload videos. Video platforms provide interfaces for accessing the platform. Through such an interface the information of the platform's bloggers can be crawled, and the basic information of different videos can be crawled by changing the video number in the interface. From a video's basic information the user can obtain various items, including the information of the blogger, such as the blogger ID.

For example, take the Bilibili video website. Bilibili provides the following interface for obtaining a video's basic information: https://api.bilibili.com/x/web-interface/view?aid=

Appending a video's av number after aid= accesses the video with that number and returns its basic information. The video information fields on Bilibili are listed in Table 1: video av number, video category, publication time, uploader mid, total views, number of coins, cumulative danmaku (bullet comment) count, number of comments, number of favorites, number of shares, number of likes, and so on. The uploader mid is the most basic piece of blogger information. Using the interface provided by Bilibili, the user can therefore crawl the basic information of different videos by changing the av number after aid=, and obtain the blogger's information, such as the uploader mid, from that basic information.
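As an illustration of step 101, the sketch below builds the request URL from an av number and pulls the uploader mid out of a decoded response. The URL prefix is the one quoted above. The response layout (`code`, `data`, `owner`, `mid`) is an assumption made for this example, since Table 1 is not reproduced in the text, and both function names are hypothetical.

```python
from typing import Optional

VIEW_API = "https://api.bilibili.com/x/web-interface/view?aid="


def view_url(av_number: int) -> str:
    """Build the basic-information URL for one video."""
    return f"{VIEW_API}{av_number}"


def extract_uploader_mid(payload: dict) -> Optional[int]:
    """Return the uploader mid from a decoded JSON response, or None.

    ASSUMPTION: the response looks like
    {"code": 0, "data": {"owner": {"mid": 123}}}; adjust the keys to
    whatever the interface actually returns.
    """
    if payload.get("code") != 0:
        return None
    owner = (payload.get("data") or {}).get("owner") or {}
    return owner.get("mid")
```

With the requests library mentioned above, the payload could be obtained with something like `requests.get(view_url(39454972), timeout=10).json()`. Keeping the parsing separate from the request makes it possible to check the extraction logic against saved responses.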

Table 1. Description of video information fields

Figure BDA0002397518860000051

Step 102: crawl the bloggers' tipping information and filter out the bloggers who have enabled the tipping function.

Specifically, video platforms usually offer a tipping function, which each blogger can choose to enable or not. Once a blogger has enabled it, users of the platform can tip the blogger by electronic payment in cash, platform currency, or similar means. Specific tipping information, such as the number of tippers this month and the cumulative number of tippers, can be accessed through the interface provided by the video platform. The user can therefore crawl a blogger's tipping information using the blogger's information and the platform's interface, crawl the tipping information of different bloggers by changing the blogger information in the interface, and filter out the bloggers who have enabled the tipping function based on the tipping information crawled.

For example, Bilibili provides the following interface for obtaining an uploader's "charging" information (charging is Bilibili's name for tipping): https://elec.bilibili.com/api/query.rank.do?mid=. Appending an uploader's mid after mid= accesses that uploader's charging information, which is listed in Table 2 and includes the number of chargers this month and the historical number of chargers. The user can therefore crawl an uploader's tipping information using the uploader mid and the charging information interface. If the uploader has not enabled the tipping function, no tipping information is crawled, so whether tipping information was crawled can be used to filter out the uploaders who have enabled the tipping function.
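The filtering rule in step 102 (tipping information obtained means tipping is enabled) might be sketched as follows. The URL prefix is the one quoted above. How "no tipping information" shows up in a response is not stated in the text, so the check on a non-empty `data` object is a hypothetical placeholder, as are the function names.

```python
from typing import Optional

ELEC_API = "https://elec.bilibili.com/api/query.rank.do?mid="


def elec_url(mid: int) -> str:
    """Build the charging-information URL for one uploader."""
    return f"{ELEC_API}{mid}"


def tipping_enabled(payload: Optional[dict]) -> bool:
    """Decide whether a decoded response carries tipping information.

    ASSUMPTION (hypothetical shape): tipping information, when present,
    sits under a non-empty "data" object; a missing response or empty
    "data" is treated as "tipping not enabled".
    """
    if not payload:
        return False
    return bool(payload.get("data"))


def filter_enabled(payloads_by_mid: dict) -> list:
    """Return the sorted mids whose responses contain tipping information."""
    return sorted(mid for mid, p in payloads_by_mid.items() if tipping_enabled(p))
```

For instance, `filter_enabled({2: {"data": {"count": 3}}, 1: None})` keeps only mid 2.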

Table 2. Description of uploader charging information fields

Figure BDA0002397518860000061

Step 103: acquire the video information of the videos of the bloggers who have enabled the tipping function.

Specifically, each blogger on a video platform usually uploads many videos, and the platform usually provides an interface that gives direct access to a blogger's video information. By changing the blogger information in the interface, the video information of different bloggers can be crawled in a targeted way. The user can therefore crawl the video information of a blogger's videos using the interface provided by the platform and the information of the bloggers who have enabled the tipping function.

For example, Bilibili provides the following interface for obtaining an uploader's video information: https://api.bilibili.com/x/space/arc/search?mid=17819768&pn=1&ps=25&jsonp=jsonp. Changing mid returns the video information of a different uploader. The ps parameter controls the number of videos returned per request; in our trials, setting ps too large very easily triggers Bilibili's platform security protocol, so 25 is generally used. The pn parameter selects the page to access. The user can therefore obtain the video information of the videos of an uploader who has enabled charging using that uploader's mid and the interface. The video information is described in Table 3 and includes the video av number, total views, cumulative danmaku count, number of comments, video duration, publication time, and so on.
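To make the roles of pn and ps concrete, the sketch below generates the page URLs needed to list all of one uploader's videos at 25 per page. The base URL and parameter names come from the example above. The `total_videos` argument is an assumption: the text does not say where the total comes from, so it is supplied by the caller here, and pn is assumed to start at 1 as in the example URL.

```python
import math
from urllib.parse import urlencode

SEARCH_API = "https://api.bilibili.com/x/space/arc/search"


def page_urls(mid: int, total_videos: int, page_size: int = 25) -> list:
    """Return one URL per result page for the given uploader.

    ASSUMPTION: total_videos is known to the caller and pn is 1-based.
    """
    pages = math.ceil(total_videos / page_size)
    urls = []
    for pn in range(1, pages + 1):
        query = urlencode({"mid": mid, "pn": pn, "ps": page_size, "jsonp": "jsonp"})
        urls.append(f"{SEARCH_API}?{query}")
    return urls
```

An uploader with 60 videos needs three requests at ps=25 (25 + 25 + 10), so `page_urls(17819768, 60)` returns three URLs, the first of which matches the example URL above.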

Table 3. Description of video information fields on the uploader's home page

Figure BDA0002397518860000071

As an optional embodiment, crawling the information of the bloggers on the video platform specifically includes:

crawling the information of the bloggers of videos within a preset time range on the video platform; or crawling the information of the bloggers of all videos on the video platform.

Specifically, because a video platform hosts an enormous number of videos, crawling the videos of every blogger on the platform takes a very long time, and the information of the many inactive bloggers who have not uploaded for a long time is of relatively little value. Two alternatives are therefore given for crawling the bloggers' information. One is to crawl only the blogger information of videos uploaded within a preset time range, for example one year or half a year, which usually suits platforms with an enormous number of videos. The other is to crawl the blogger information of all videos on the platform, which usually suits platforms where the number of videos is not too large or where detailed information on all bloggers is required.

For example, the Bilibili platform has about 500 million users. Traversing all of them takes too long to meet the later requirement of collecting statistics on schedule every month, and many of them are inactive uploaders, so crawling the blogger information for every video on Bilibili would be time-consuming and laborious and would add unnecessary work. Instead, the information of bloggers within a certain period can be crawled in a targeted way. For example, with a preset time of one year, crawling only the bloggers active within that year covers the vast majority of active uploaders without incurring excessive analysis and computation. We selected nearly 30 million video records from December 31, 2018 to October 15, 2019. Owing to a peculiarity of the platform, Bilibili orders each video's av number by publication time: the larger the av number, the later the video was published. This property makes it possible to find an av number range spanning about one year; the range used is [39454972, 71185529]. The crawl speed was about 2 million videos per day, and the crawl took two weeks in total.
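The figures in this example can be cross-checked with a short calculation. The sketch below only redoes the arithmetic from the av number range and the crawl speed quoted above; the function name is hypothetical.

```python
def estimate_crawl_days(first_av: int, last_av: int, per_day: int) -> float:
    """Estimate crawl time for an inclusive av number range."""
    candidate_ids = last_av - first_av + 1  # both endpoints are crawled
    return candidate_ids / per_day


# Values quoted in the text.
FIRST_AV, LAST_AV = 39454972, 71185529
candidate_ids = LAST_AV - FIRST_AV + 1
days = estimate_crawl_days(FIRST_AV, LAST_AV, 2_000_000)
print(candidate_ids, round(days, 1))
```

The range holds 31,730,558 candidate av numbers, which at about 2 million requests per day comes to roughly 15.9 days, in line with the stated two weeks. That the range is somewhat larger than the "nearly 30 million" records obtained would be consistent with some av numbers returning no video information, although the text does not say so.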

As an optional embodiment, crawling the information of the bloggers on the video platform specifically includes:

crawling the basic information of a video based on the video's number on the video platform; and obtaining the information of the video's blogger based on the video's basic information.

Specifically, the basic information of a video on a video platform is usually accessed through the video number the platform assigns. Every video on the platform has its own number, and no two videos share one. By changing the video number in the interface, the basic information of different videos can be accessed. A video's basic information usually includes the video number, video category, publication time, blogger information, total views, cumulative danmaku count, number of comments, number of favorites, number of shares, number of likes, and so on, and the blogger information in turn includes the blogger's ID and the like. The blogger's information can thus be obtained from the video's basic information.

For example, using the interface provided by Bilibili and a video's av number, the user can access the basic information of the corresponding video, which usually includes the video av number, video category, publication time, uploader mid, total views, number of coins, cumulative danmaku count, number of comments, number of favorites, number of shares, number of likes, and so on. The information of the video's uploader can then be obtained from this basic information.

As an optional embodiment, crawling the bloggers' tipping information and filtering out the bloggers who have enabled the tipping function specifically includes:

crawling a blogger's tipping information based on the blogger's information;

filtering out the bloggers who have enabled the tipping function based on the crawled tipping information.

Specifically, a blogger's tipping information can be obtained through the interface provided by the video platform, and the tipping information of different bloggers can be obtained by changing the blogger information in the interface. If a blogger has not enabled the tipping function, however, no tipping information is crawled. So if a blogger's tipping information is crawled, the blogger has enabled the tipping function; if it is not, the blogger has not. All bloggers who have enabled the tipping function can be filtered out according to whether tipping information was crawled.

For example, using an uploader's mid and the Bilibili interface, the uploader's tipping information can be obtained, including the number of chargers this month and the historical number of chargers. If the uploader's tipping information can be crawled, the uploader has enabled the charging (tipping) function; otherwise the uploader has not. Finally, all uploaders who have enabled the tipping function are filtered out.

As an optional embodiment, crawling the basic information of a video based on the number of the video on the video platform, and acquiring the information of the blogger of the video based on that basic information, specifically includes:

acquiring the numbers of the videos of the video platform through the interface of the video platform, and calculating the quantity of videos from the video numbers;

creating a first thread pool of at least one thread and, according to the quantity of videos, taking each time as many video numbers as there are threads in the first thread pool and crawling the basic information of those videos concurrently;

acquiring the information of the blogger of each video based on the basic information of the video.

Specifically, when the basic information of videos is crawled through the video platform interface, multithreading can be used: a first thread pool of at least one thread is created, and the number of threads in the pool is set according to the quantity of videos, with more threads when there are many videos and fewer when there are few. Each time, as many video numbers as there are threads in the first thread pool are taken, and the threads in the pool crawl the basic information of those videos concurrently. The blogger information is then extracted from the crawled basic information.

For example, suppose we want to crawl the information of the bloggers of 30 million videos. Crawling the basic information of videos by their av numbers involves a large volume of data (30 million videos) and takes a long time, so the crawler sets up a first thread pool of at least one thread, for example 100 threads. Each time, 100 av numbers are taken from the 30 million, and the 100 threads in the pool crawl the basic information of those 100 videos concurrently. In the end, we obtain the information of 3 million uploaders, including 3 million uploader mids.
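The batching scheme above can be sketched with Python's standard `concurrent.futures` thread pool. This is an illustrative sketch under stated assumptions: `fetch_basic_info` is a hypothetical stand-in for the platform interface, and the `aid` and `mid` keys of the returned record are assumed names. The fake fetcher below replaces the network call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, Optional, Sequence, Set

VideoInfo = Dict[str, int]  # hypothetical record: {"aid": ..., "mid": ...}


def batches(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Split items into consecutive slices of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def crawl_blogger_mids(
    video_numbers: Sequence[int],
    fetch_basic_info: Callable[[int], Optional[VideoInfo]],
    pool_size: int = 100,
) -> Set[int]:
    """Crawl basic info one batch at a time and collect the distinct blogger mids."""
    mids: Set[int] = set()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for batch in batches(video_numbers, pool_size):
            # pool.map submits the whole batch and yields results in input order.
            for info in pool.map(fetch_basic_info, batch):
                if info is not None:
                    mids.add(info["mid"])
    return mids


def fake_fetch_basic_info(av: int) -> Optional[VideoInfo]:
    """Offline stand-in for the platform interface."""
    if av == 7:  # pretend this video no longer exists
        return None
    return {"aid": av, "mid": av % 3 + 1}


video_numbers = list(range(1, 11))
found_mids = crawl_blogger_mids(video_numbers, fake_fetch_basic_info, pool_size=4)
batch_count = len(list(batches(video_numbers, 4)))
```

With ten video numbers and a pool of four threads, the loop runs three batches (4, 4, and 2 numbers). At the scale given in the example, 30 million numbers in batches of 100 would take 300,000 batches.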

As an optional embodiment, crawling the bloggers' tipping information based on the bloggers' information specifically includes:

calculating the number of bloggers from the crawled blogger information;

creating a second thread pool of at least one thread and, according to the number of bloggers, taking each time the information of as many bloggers as there are threads in the second thread pool and crawling their tipping information concurrently; if tipping information is crawled for a blogger, that blogger has enabled the tipping function, and if none is crawled, that blogger has not.

Specifically, when the tipping information of video bloggers is crawled through the video platform interface, multithreading can be used: a second thread pool of at least one thread is created, and the number of threads in the pool is set according to the number of bloggers, with more threads when there are many bloggers and fewer when there are few. Each time, the information of as many bloggers as there are threads in the second thread pool is taken, and the threads in the pool crawl those bloggers' tipping information concurrently. If no tipping information is crawled for a blogger, that blogger has not enabled the tipping function.

For example, when crawling uploaders' tipping information through the Bilibili interface and the uploader mids, we likewise set up a second thread pool in the program with at least one thread, for example 100 threads. Each time, we take 100 mids from the 3 million uploader mids and crawl the tipping information of those uploaders concurrently with the 100 threads. If no tipping information is crawled for an uploader, that uploader has not enabled the tipping function. In the end, out of the 3 million uploaders we find 400,000 who have enabled the charging function.

As an optional embodiment, crawling the video information of the videos of the bloggers who have enabled the tipping function, based on the information of those bloggers, specifically includes:

calculating the number of bloggers who have enabled the tipping function;

creating a third thread pool of at least one thread and, according to the number of bloggers who have enabled the tipping function, taking each time the information of as many of those bloggers as there are threads in the third thread pool and crawling the video information of their videos concurrently.

Specifically, when the video information of bloggers' videos is crawled through the video platform interface, multithreading can be used: a third thread pool of at least one thread is created, and the number of threads in the pool is set according to the number of bloggers who have enabled the tipping function, with more threads when there are many such bloggers and fewer when there are few. Each time, the information of as many of these bloggers as there are threads in the third thread pool is taken, and the threads in the pool crawl the video information of those bloggers' videos concurrently.

For example, when crawling, through the Bilibili interface, the video information of the uploaders who have enabled the charging function, we likewise create a third thread pool with at least one thread, for example 100 threads. Each time, we take 100 mids from the 400,000 uploaders who have enabled the charging function and crawl the video information of those uploaders' videos concurrently.

However, Bilibili controls this interface very strictly. Opening too many threads at once triggers the Bilibili platform's security protocol, after which the same IP address cannot access the interface for nearly an hour, so proxy IPs can be used for crawling.

It should be noted that the Bilibili platform does not monitor all of its interfaces equally. The interface that returns video information via an uploader's home page is monitored very strictly, whereas the interface for uploaders' charging information is monitored relatively loosely. We address this by purchasing proxy IPs: for strictly monitored interfaces, we use proxy IPs and reduce the number of threads in the thread pool to keep crawling stable, while for loosely monitored interfaces we use a larger number of threads to keep crawling efficient.
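One way to express this per-interface trade-off is a small settings table. The sketch below is illustrative only: the interface keys, the pool size of 10 for the strict interface, and the proxy addresses are assumptions, since the source gives no concrete numbers beyond the 100-thread example.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class CrawlProfile:
    """Crawl settings for one platform interface."""
    pool_size: int
    use_proxy: bool


# Illustrative values; only the general direction (strict -> fewer threads
# plus proxies, loose -> more threads) comes from the text.
PROFILES: Dict[str, CrawlProfile] = {
    "video_info": CrawlProfile(pool_size=10, use_proxy=True),    # strictly monitored
    "charge_info": CrawlProfile(pool_size=100, use_proxy=False),  # loosely monitored
}


def proxy_rotation(profile: CrawlProfile, proxies: List[str]) -> Iterator[Optional[str]]:
    """Yield a proxy address for each request, or None when no proxy is used."""
    if profile.use_proxy and proxies:
        return cycle(proxies)  # round-robin over the purchased proxies
    return cycle([None])


# Placeholder addresses, not real proxies.
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]

strict_rotation = proxy_rotation(PROFILES["video_info"], PROXIES)
first_three = [next(strict_rotation) for _ in range(3)]
loose_first = next(proxy_rotation(PROFILES["charge_info"], PROXIES))
```

Round-robin rotation is just one simple choice; the source does not say how the purchased proxies are assigned to requests.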

In addition, the uploaders' charging information and video information do not need frequent create, delete, update, and query operations. We therefore choose to sacrifice some convenience in reading the data and store it in CSV files.
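As a minimal sketch of the CSV storage choice, the snippet below writes charging records with Python's standard `csv` module. The column names are hypothetical, and an in-memory buffer is used instead of a file so the example is self-contained.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Hypothetical column names for an uploader's charging record.
FIELDS = ["mid", "month_count", "total_count"]


def write_charge_rows(rows: Iterable[Dict[str, int]], out: TextIO) -> None:
    """Write a header line followed by one CSV line per record."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_charge_rows(
    [
        {"mid": 1001, "month_count": 12, "total_count": 340},
        {"mid": 1003, "month_count": 0, "total_count": 5},
    ],
    buffer,
)
csv_text = buffer.getvalue()

# Reading back returns every value as a string, which is part of the
# convenience given up compared with a database.
read_back = list(csv.DictReader(io.StringIO(csv_text)))
```

When writing to a real file, the `csv` documentation recommends opening it with `newline=""`, for example `open(path, "w", newline="", encoding="utf-8")`.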

FIG. 2 is a structural diagram of a network video information acquisition apparatus according to an embodiment of the present invention. The apparatus includes:

a first crawling module 201, configured to crawl the information of bloggers on a video platform;

Specifically, a video platform is an Internet website, mobile app, or similar service to which video bloggers can upload videos. Video platforms provide interfaces for accessing the platform; by calling such an interface and changing its video information fields, a user can obtain different video information, including the information of the platform's bloggers.

a second crawling module 202, configured to crawl the bloggers' tipping information and screen out the bloggers who have enabled the tipping function;

Specifically, video platforms usually offer a tipping function, which each blogger can choose to enable or not. Once a blogger has enabled it, users of the platform can tip the blogger by electronic payment in cash, platform currency, or similar means. Specific tipping information, such as the number of users who tipped this month and the cumulative number of users who tipped, can be accessed through the interface provided by the video platform. A user can therefore crawl the bloggers' tipping information using the blogger information and the platform interface, and screen out the bloggers who have enabled the tipping function based on what was crawled.

an acquisition module 203, configured to acquire the video information of the videos of the bloggers who have enabled the tipping function.

Specifically, each blogger on a video platform usually uploads many videos, and video platforms usually provide an interface that gives direct access to a blogger's video information. By changing the blogger information passed to the interface, the video information of different bloggers can be accessed selectively. A user can therefore crawl the video information of the corresponding bloggers' videos using the platform interface and the information of the bloggers who have enabled the tipping function.
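How the three modules fit together can be sketched as a small class whose collaborators are passed in as functions. The class name, method names, and the lambda stand-ins are hypothetical; the reference numerals in the comments map each step to the module described above.

```python
from typing import Callable, Dict, Iterable, List, Optional


class VideoInfoAcquirer:
    """Chains the three modules: crawl bloggers, screen by tipping, fetch videos."""

    def __init__(
        self,
        crawl_blogger_mids: Callable[[], Iterable[int]],
        fetch_tip_info: Callable[[int], Optional[dict]],
        fetch_video_info: Callable[[int], List[str]],
    ) -> None:
        self.crawl_blogger_mids = crawl_blogger_mids
        self.fetch_tip_info = fetch_tip_info
        self.fetch_video_info = fetch_video_info

    def run(self) -> Dict[int, List[str]]:
        # First crawling module (201): blogger information.
        mids = self.crawl_blogger_mids()
        # Second crawling module (202): keep bloggers whose tipping info exists.
        enabled = [mid for mid in mids if self.fetch_tip_info(mid) is not None]
        # Acquisition module (203): video information for the screened bloggers only.
        return {mid: self.fetch_video_info(mid) for mid in enabled}


# Offline stand-ins for the platform interfaces.
acquirer = VideoInfoAcquirer(
    crawl_blogger_mids=lambda: [1, 2, 3],
    fetch_tip_info=lambda mid: {"total_count": 5} if mid != 2 else None,
    fetch_video_info=lambda mid: [f"av{mid}0", f"av{mid}1"],
)
result = acquirer.run()
```

Because the video interface is only called for bloggers that pass the screening step, blogger 2 in this example never triggers a video request.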

FIG. 3 is a physical structure diagram of an electronic device according to an embodiment of the present invention. The device includes:

a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the network video information acquisition method according to any of the embodiments of the present invention.

An example is as follows:

FIG. 3 illustrates the physical structure of an electronic device. As shown in FIG. 3, the electronic device may include a processor 301, a communications interface 303, a memory 302, and a communication bus 304, where the processor 301, the communications interface 303, and the memory 302 communicate with one another through the communication bus 304. The processor 301 can call logic instructions in the memory 302 to perform the following method: crawling the bloggers' tipping information based on the bloggers' information, screening out the bloggers who have enabled the tipping function based on that tipping information, and finally acquiring, in a targeted manner, the video information of the videos of the bloggers who have enabled the tipping function, based on those bloggers' information.

In addition, the logic instructions in the memory 302 may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the network video information acquisition method provided by the above embodiments, for example: crawling the bloggers' tipping information based on the bloggers' information, screening out the bloggers who have enabled the tipping function based on that tipping information, and finally acquiring, in a targeted manner, the video information of the videos of the bloggers who have enabled the tipping function, based on those bloggers' information.

The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. On this understanding, the above technical solutions, in essence, or the parts of them that contribute to the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A network video information acquisition method, characterized by comprising:
crawling information of bloggers on a video platform;
crawling tipping information of the bloggers, and screening out the bloggers who have enabled the tipping function;
and acquiring video information of the videos of the bloggers who have enabled the tipping function.
2. The network video information acquisition method according to claim 1, wherein crawling the information of bloggers on the video platform specifically comprises:
crawling information of the bloggers of videos within a preset time range on the video platform; or
crawling information of the bloggers of all videos on the video platform.
3. The network video information acquisition method according to claim 1, wherein crawling the information of bloggers on the video platform specifically comprises:
crawling basic information of a video based on the number of the video on the video platform;
and acquiring information of the blogger of the video based on the basic information of the video.
4. The network video information acquisition method according to claim 1, wherein crawling the tipping information of the bloggers and screening out the bloggers who have enabled the tipping function specifically comprises:
crawling the tipping information of the bloggers based on the information of the bloggers;
and screening out the bloggers who have enabled the tipping function based on the crawled tipping information of the bloggers.
5. The method according to claim 3, wherein crawling the basic information of the video based on the number of the video on the video platform, and acquiring the information of the blogger of the video based on the basic information of the video, specifically comprises:
acquiring the numbers of the videos of the video platform through an interface of the video platform, and calculating the quantity of videos from the video numbers;
creating a first thread pool comprising at least one thread, and, according to the quantity of videos, taking each time as many video numbers as there are threads in the first thread pool and crawling the basic information of those videos concurrently;
and acquiring the information of the blogger of each video based on the basic information of the video.
6. The method according to claim 4, wherein crawling the tipping information of the bloggers based on the information of the bloggers specifically comprises:
calculating the number of the bloggers from the crawled information of the bloggers;
and creating a second thread pool comprising at least one thread, and, according to the number of the bloggers, taking each time the information of as many bloggers as there are threads in the second thread pool and crawling the tipping information of those bloggers concurrently, wherein a blogger whose tipping information is crawled is a blogger who has enabled the tipping function, and a blogger whose tipping information is not crawled is a blogger who has not enabled the tipping function.
7. The method according to claim 1, wherein acquiring the video information of the videos of the bloggers who have enabled the tipping function specifically comprises:
calculating the number of the bloggers who have enabled the tipping function;
and creating a third thread pool comprising at least one thread, and, according to the number of the bloggers who have enabled the tipping function, taking each time the information of as many of those bloggers as there are threads in the third thread pool and crawling the video information of their videos concurrently.
8. A network video information acquisition apparatus, characterized by comprising:
a first crawling module, configured to crawl information of bloggers on a video platform;
a second crawling module, configured to crawl tipping information of the bloggers and screen out the bloggers who have enabled the tipping function;
and an acquisition module, configured to acquire video information of the videos of the bloggers who have enabled the tipping function.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the network video information acquisition method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the network video information acquisition method according to any one of claims 1 to 7.
CN202010136518.3A 2020-03-02 2020-03-02 Network video information acquisition method and equipment Pending CN111475697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010136518.3A CN111475697A (en) 2020-03-02 2020-03-02 Network video information acquisition method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010136518.3A CN111475697A (en) 2020-03-02 2020-03-02 Network video information acquisition method and equipment

Publications (1)

Publication Number Publication Date
CN111475697A true CN111475697A (en) 2020-07-31

Family

ID=71747113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010136518.3A Pending CN111475697A (en) 2020-03-02 2020-03-02 Network video information acquisition method and equipment

Country Status (1)

Country Link
CN (1) CN111475697A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106657966A (en) * 2016-12-29 2017-05-10 四川大学 Method for rapidly generating integrated imaging 3D film source based on CPU multithreading
CN110602514A (en) * 2019-09-12 2019-12-20 腾讯科技(深圳)有限公司 Live channel recommendation method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200731
