Stumbling Upon a Dead-Simple Search Engine API
Out of nowhere.
Here is how it came about: my senior schoolmate Yan's boss wanted to collect every article on the web containing a certain keyword, and she turned to me for help. For one thing, I hadn't written code in ages and my skills were thoroughly rusty; for another, I didn't want to waste time reinventing the wheel; and third, writing code in the office is risky business. By sheer luck I came across a free search engine API, SerpApi, which spared me a good deal of manual labor.

Usage
Without further ado, straight to an example. The required packages are:
# Setup
import requests # Just in case I wanna crawl each page
import pandas as pd
from bs4 import BeautifulSoup
from serpapi import GoogleSearch # Key Package
from serpapi import BaiduSearch
To install serpapi, type the following in the Anaconda Console (or CMD):
pip install google-search-results
After registering an account on the official site, you get a free API key (100 searches per month):
api_key = "ad9xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxe1413"
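Hard-coding the key into a script makes it easy to leak. A small alternative is to read it from an environment variable instead; note that the variable name `SERPAPI_KEY` below is my own choice, not anything SerpApi mandates:

```python
import os

# Read the SerpApi key from the environment rather than the script itself.
# "SERPAPI_KEY" is a name I picked, not an official convention.
api_key = os.environ.get("SERPAPI_KEY", "")
if not api_key:
    print("Set SERPAPI_KEY before running the search.")
```

This way the script can be shared or committed without exposing the key.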
Next, just set up the parameters:
# Setup for a Google search
params = {
    "engine": "google",
    "q": "青年律师如何寻找案源",  # Keyword (roughly: "how young lawyers find clients")
    "num": 50,
    "api_key": api_key
}
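If more than one batch of results is needed, the google engine also accepts a `start` offset for paging, much like Google's own URL parameters. A sketch that builds one params dict per page (the helper name is mine):

```python
def page_params(base, page, per_page=50):
    """Return a copy of `base` pointed at the given zero-indexed page."""
    params = dict(base)  # copy so the base dict is left untouched
    params["num"] = per_page
    params["start"] = page * per_page  # offset of the first result
    return params

base = {"engine": "google", "q": "青年律师如何寻找案源", "api_key": "..."}
pages = [page_params(base, p) for p in range(3)]  # offsets 0, 50, 100
```

Each dict in `pages` can then be passed to GoogleSearch in turn; just keep in mind every page costs one search against the monthly quota.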
And the same for Baidu:
# Baidu search
params = {
    "engine": "baidu",
    "q": "青年律师如何寻找案源",
    "rn": 50,  # Same idea as "num"; Baidu just calls the results-per-page parameter "rn"
    "api_key": api_key
}
All that's left is to send a GET request through the API; there is no need to set your own headers:
GSearch = GoogleSearch(params)  # Search via the API
GResults = GSearch.get_dict()  # Get the raw results as a dict
GOrganic_results = GResults["organic_results"]  # Keep only the organic results
GoogleSeachResultsDic = {}
for each in GOrganic_results:
    title = each['title']
    url = each['link']
    GoogleSeachResultsDic[title] = url  # Map each title to its URL
And for Baidu:
BSearch = BaiduSearch(params)
BResults = BSearch.get_dict()
BOrganic_results = BResults["organic_results"]
BaiduSearchResultsDic = {}
for each in BOrganic_results:
    title = each['title']
    url = each['link']
    BaiduSearchResultsDic[title] = url
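Both loops above assume every result carries `title` and `link`, and that `organic_results` is always present; a query that returns no organic results would raise a KeyError outright. A more defensive version of the same extraction (the helper name is mine):

```python
def to_title_url_dict(results):
    """Map title -> link, skipping entries that lack either field."""
    out = {}
    for each in results.get("organic_results", []):
        title = each.get("title")
        url = each.get("link")
        if title and url:
            out[title] = url
    return out

# Usage: GoogleSeachResultsDic = to_title_url_dict(GResults)
```

The same helper works for both engines, since SerpApi returns the same `organic_results` shape for each.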
Finally, merge the two result sets and export them to a single Excel file:
SearchResultsDic = GoogleSeachResultsDic | BaiduSearchResultsDic  # dict union, Python 3.9+
# Export to an excel file
ExportDF = pd.DataFrame(data=SearchResultsDic, index=[0])
ExportDF = ExportDF.T
ExportDF.to_excel('Title and Links.xlsx')
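One caveat on the merge step: the `|` operator on dicts only exists from Python 3.9. On older interpreters, an unpacking merge does the same thing; later keys win in both forms. A sketch with placeholder data of my own:

```python
# Placeholder data standing in for the real search results.
GoogleSeachResultsDic = {"标题甲": "http://example.com/a"}
BaiduSearchResultsDic = {"标题乙": "http://example.com/b"}

# Works on Python 3.5+; equivalent to the 3.9+ expression
# GoogleSeachResultsDic | BaiduSearchResultsDic.
SearchResultsDic = {**GoogleSeachResultsDic, **BaiduSearchResultsDic}
```

Note also that if Google and Baidu return a result with the exact same title, the Baidu link silently overwrites the Google one, since titles are the dictionary keys.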
The result is a spreadsheet pairing each article title with its link.

The above is just a small, simple example; the official site has plenty more examples worth consulting~