Stumbling Upon a Dead-Simple Search Engine API

Out of the Blue

Here's how it started: the boss of Yan, a senior schoolmate of mine, wanted to collect every article on the web containing a certain keyword, and she turned to me for help. For one thing, I hadn't written code in ages and my skills had gotten quite rusty; for another, I didn't want to waste time reinventing the wheel; and besides, writing code in the office felt too risky. By sheer luck I stumbled upon a free search engine API, SerpApi, which saved me a good deal of manual labor.

Source: https://serpapi.com/

Usage

Without further ado, here's an example. The required packages are:

# Setup
import requests # Just in case I wanna crawl each page
import pandas as pd
from bs4 import BeautifulSoup
from serpapi import GoogleSearch # Key Package
from serpapi import BaiduSearch

To install serpapi, type the following in the Anaconda Console (or CMD):

pip install google-search-results

After registering an account on the official site, you get a free API key (100 searches per month):

api_key = "ad9xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxe1413"

Next, all you need to do is set up the parameters:

Google

# Setup with google search
params = {
  "engine": "google",
  "q": "青年律师如何寻找案源", # Keyword
  "num": 50,  # Number of results to return
  "api_key": api_key
}

Baidu

# Baidu Search
params = {
  "engine": "baidu",
  "q": "青年律师如何寻找案源", 
  "rn": 50,  # Baidu calls the result-count parameter "rn" instead of "num"
  "api_key": api_key
}
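Since the two parameter dicts differ only in the engine name and the name of the result-count key, they can be generated from one small helper if you plan to run many keywords. A sketch (`build_params` and `"demo-key"` are my own hypothetical names, not part of SerpApi):

```python
def build_params(engine, query, count, api_key):
    """Build a SerpApi parameter dict; Baidu names the result count "rn", Google "num"."""
    count_key = "rn" if engine == "baidu" else "num"
    return {"engine": engine, "q": query, count_key: count, "api_key": api_key}

baidu_params = build_params("baidu", "青年律师如何寻找案源", 50, "demo-key")
google_params = build_params("google", "青年律师如何寻找案源", 50, "demo-key")
```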

Then all that's left is sending a GET request through the API — no need to set your own headers:

Google

GSearch = GoogleSearch(params)  # Search via the API
GResults = GSearch.get_dict()   # Raw results as a dict
GOrganic_results = GResults["organic_results"]  # Keep only the organic results
GoogleSeachResultsDic = {}
for each in GOrganic_results:
    title = each['title']
    url = each['link']
    GoogleSeachResultsDic[title] = url  # Map each title to its link

Baidu

BSearch = BaiduSearch(params)
BResults = BSearch.get_dict()
BOrganic_results = BResults["organic_results"]
BaiduSearchResultsDic = {}
for each in BOrganic_results:
    title = each['title']
    url = each['link']
    BaiduSearchResultsDic[title] = url
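The two loops above are identical apart from variable names, so they could be factored into one helper. A sketch — the `.get()` calls are my own assumption, guarding against entries that lack a title or link, which the raw loops would crash on:

```python
def to_title_url_dict(organic_results):
    """Map each result's title to its link, skipping entries missing either field."""
    out = {}
    for item in organic_results:
        title, link = item.get("title"), item.get("link")
        if title and link:
            out[title] = link
    return out

# A shortened sample in the shape SerpApi returns under "organic_results"
sample = [
    {"title": "青年律师如何寻找案源", "link": "https://example.com/a"},
    {"position": 2},  # no title or link: silently skipped
]
```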

Finally, merge the two result sets and export them to a single Excel file:

SearchResultsDic = GoogleSeachResultsDic | BaiduSearchResultsDic  # Dict merge (Python 3.9+)
# Export to an Excel file
ExportDF = pd.DataFrame(data=SearchResultsDic, index=[0])
ExportDF = ExportDF.T
ExportDF.to_excel('Title and Links.xlsx')
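Note that `to_excel` requires an Excel writer such as openpyxl to be installed. If all you need is a file Excel can open, the standard library's csv module works too — a sketch, with `export_csv` being my own helper name (utf-8-sig so Excel displays the Chinese titles correctly):

```python
import csv

def export_csv(results, path):
    """Write a {title: link} dict to a CSV file that Excel opens cleanly."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["Title", "Link"])
        writer.writerows(results.items())
```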

The results are shown in the figure below.

That was just a simple little example; the official site also has plenty of sample code worth consulting~
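Incidentally, the requests and BeautifulSoup imports at the top hint at the natural next step: fetching each collected URL and pulling out its text. A sketch of just the parsing half, run here on a fixed HTML string rather than a live `requests.get(url).text` (the `page_paragraphs` helper is my own):

```python
from bs4 import BeautifulSoup

def page_paragraphs(html):
    """Extract the visible paragraph text from a fetched page."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p")]

html = "<html><body><p>青年律师如何寻找案源?</p><p>第一,口碑。</p></body></html>"
```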