Web scraper: fetching train information for 12306 stations

Use the requests module to scrape train information from 12306.

First, the complete code:

import re
import requests
import json
a=requests.get('https://kyfw.12306.cn/otn/resources/js/framework/station_name.js')
# print(a.text)
a1=a.text
l=a1.split('@')
# print(l)
li={}
li1={}
for i in range(1,len(l)):
    s = re.findall(r"[|](.*|[A-Z]?)[|]", l[i])
    s1=s[0].split('|')
    # print(s1)
    li[s1[0].replace(' ','')]=s1[1]
    li1[s1[1]]=s1[0].replace(' ','')
# print(li)
# print(li['厦门'])
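time=input('Enter the departure date (e.g. 2000-09-21): ')
cf=input('Enter the departure station name (in Chinese, e.g. 沈阳): ')
dd=input('Enter the arrival station name (in Chinese, e.g. 锦州): ')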
# time='2021-06-10'
# cf='沈阳'
# dd='锦州'
cf=li[cf]
dd=li[dd]
print(f'Departure code {cf}, arrival code {dd}, date {time}')
print(f'Departure station {li1[cf]}, arrival station {li1[dd]}, date {time}')
hand={
'Accept':'*/*',
'Accept-Encoding':'gzip, deflate, br',
'Accept-Language':'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
'Cache-Control':'no-cache',
'Connection':'keep-alive',
'Cookie':f'_uab_collina=162260308402882393240191; JSESSIONID=A192852AEC13A9DEACE9911E49B4EB36; route=6f50b51faa11b987e576cdb301e545c4; BIGipServerotn=435159562.50210.0000; _jc_save_wfdc_flag=dc; BIGipServerpool_passport=216269322.50215.0000; RAIL_EXPIRATION=1623446044363; RAIL_DEVICEID=hO8LICSFIb2G-tCxEn6O6scXNZjc3rzN-lPYlI7UjdxWbfwvLOfw9XKkxeWTg_nF_R9CofOm_ldRAOUFBNsOwzmKkpMWNgMvA67_V0-xfQ_F455S3dTPxr4boEBXEOIDdKcCGZhqvx4FX_Gno_j1BDFuUQfEnjDa; _jc_save_fromStation=%u5317%u4EAC%2C{cf}; _jc_save_toStation=%u4E0A%u6D77%2C{dd}; current_captcha_type=C; _jc_save_fromDate={time}; _jc_save_toDate={time}',
'Host':'kyfw.12306.cn',
'If-Modified-Since':'0',
# 'Referer':f'https://kyfw.12306.cn/otn/leftTicket/init?linktypeid=dc&fs=%E5%8C%97%E4%BA%AC,{cf}&ts=%E4%B8%8A%E6%B5%B7,{dd}&date={time}&flag=N,N,Y',
'sec-ch-ua':'"Not;A Brand";v="99", "Microsoft Edge";v="91", "Chromium";v="91"',
'sec-ch-ua-mobile':'?0',
'Sec-Fetch-Dest':'empty',
'Sec-Fetch-Mode':'cors',
'Sec-Fetch-Site':'same-origin',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36 Edg/91.0.864.41',
'X-Requested-With':'XMLHttpRequest',
}
car=requests.get(f'https://kyfw.12306.cn/otn/leftTicket/query?leftTicketDTO.train_date={time}&leftTicketDTO.from_station={cf}&leftTicketDTO.to_station={dd}&purpose_codes=ADULT',headers=hand)
# print(car.json()['data']['result'])
c=car.json()['data']['result']
lo=[]
for i1 in c:
    we=i1.split('|')
    # print(we[2:12])
    lo.append(we[2:12])
for i in lo:
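    print(f'Train ID: {i[0]}, train number: {i[1]}, origin station: {li1[i[2]]}, terminal station: {li1[i[3]]}, boarding station: {li1[i[4]]}, destination: {li1[i[5]]}, travel date: {time}, departure time: {i[6]}, arrival time: {i[7]}, duration: {i[8]}, bookable: {i[9]}')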
    # print('\n')

The project mainly uses two packages, re and requests. The walkthrough below goes through the code piece by piece.

a=requests.get('https://kyfw.12306.cn/otn/resources/js/framework/station_name.js')
# print(a.text)
a1=a.text
l=a1.split('@')
# print(l)
li={}
li1={}
for i in range(1,len(l)):
    s = re.findall(r"[|](.*|[A-Z]?)[|]", l[i])
    s1=s[0].split('|')
    # print(s1)
    li[s1[0].replace(' ','')]=s1[1]
    li1[s1[1]]=s1[0].replace(' ','')

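First, fetch 12306's station-name data from 'https://kyfw.12306.cn/otn/resources/js/framework/station_name.js'.
The file is one long string of '@'-separated entries whose fields are separated by '|'. Parsing it gives each station's name and its uppercase-letter code, which the code stores in two dictionaries: li maps station name to code, and li1 maps code back to station name.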
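The same parsing can be done without a regular expression by splitting each entry on '|'. The following is a minimal sketch, assuming each entry has the form abbreviation|station name|code|... as described above. The helper name `parse_station_names` is hypothetical, and the sample string is a shortened, illustrative stand-in for the real file, so no network access is needed.

```python
def parse_station_names(js_text):
    """Build name->code and code->name dictionaries from station_name.js text."""
    name_to_code = {}
    code_to_name = {}
    # The first chunk is the JavaScript variable declaration, so skip it.
    for entry in js_text.split('@')[1:]:
        fields = entry.split('|')
        if len(fields) < 3:
            continue  # ignore entries that do not match the assumed layout
        name = fields[1].replace(' ', '')
        code = fields[2]
        name_to_code[name] = code
        code_to_name[code] = name
    return name_to_code, code_to_name


# Illustrative sample in the assumed format, not the real file contents.
sample = "var station_names ='@bjb|北京北|VAP|beijingbei|bjb|0@sya|沈阳|SYT|shenyang|sy|1';"
name_to_code, code_to_name = parse_station_names(sample)
print(name_to_code['沈阳'])
print(code_to_name['VAP'])
```

Splitting on '|' and picking fields by position makes the assumed layout explicit, whereas the regular expression relies on greedy matching between the first and last '|'.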

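time=input('Enter the departure date (e.g. 2000-09-21): ')
cf=input('Enter the departure station name (in Chinese, e.g. 沈阳): ')
dd=input('Enter the arrival station name (in Chinese, e.g. 锦州): ')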
# time='2021-06-10'
# cf='沈阳'
# dd='锦州'
cf=li[cf]
dd=li[dd]
print(f'Departure code {cf}, arrival code {dd}, date {time}')
print(f'Departure station {li1[cf]}, arrival station {li1[dd]}, date {time}')

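Next, enter the date and the two stations to query. Each station name is looked up in li and converted to its uppercase-letter code.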
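The lookup above raises a bare KeyError if a station name is mistyped, and the date is not checked at all. One possible way to validate the input first is sketched below. The function name `resolve_query` and the two-entry `stations` dictionary are illustrative assumptions, standing in for the full li dictionary.

```python
from datetime import datetime


def resolve_query(date_text, from_name, to_name, name_to_code):
    """Validate the date and translate both station names into codes."""
    # strptime raises ValueError if the text is not a valid year-month-day date;
    # strftime then normalizes it to zero-padded YYYY-MM-DD.
    date = datetime.strptime(date_text, '%Y-%m-%d').strftime('%Y-%m-%d')
    codes = []
    for name in (from_name, to_name):
        key = name.replace(' ', '')
        if key not in name_to_code:
            raise KeyError(f'unknown station: {name}')
        codes.append(name_to_code[key])
    return date, codes[0], codes[1]


# Illustrative stand-in for the full name->code dictionary.
stations = {'沈阳': 'SYT', '锦州': 'JZD'}
print(resolve_query('2021-06-10', '沈阳', '锦州', stations))
```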

hand={
'Accept':'*/*',
'Accept-Encoding':'gzip, deflate, br',
'Accept-Language':'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
'Cache-Control':'no-cache',
'Connection':'keep-alive',
'Cookie':f'_uab_collina=162260308402882393240191; JSESSIONID=A192852AEC13A9DEACE9911E49B4EB36; route=6f50b51faa11b987e576cdb301e545c4; BIGipServerotn=435159562.50210.0000; _jc_save_wfdc_flag=dc; BIGipServerpool_passport=216269322.50215.0000; RAIL_EXPIRATION=1623446044363; RAIL_DEVICEID=hO8LICSFIb2G-tCxEn6O6scXNZjc3rzN-lPYlI7UjdxWbfwvLOfw9XKkxeWTg_nF_R9CofOm_ldRAOUFBNsOwzmKkpMWNgMvA67_V0-xfQ_F455S3dTPxr4boEBXEOIDdKcCGZhqvx4FX_Gno_j1BDFuUQfEnjDa; _jc_save_fromStation=%u5317%u4EAC%2C{cf}; _jc_save_toStation=%u4E0A%u6D77%2C{dd}; current_captcha_type=C; _jc_save_fromDate={time}; _jc_save_toDate={time}',
'Host':'kyfw.12306.cn',
'If-Modified-Since':'0',
# 'Referer':f'https://kyfw.12306.cn/otn/leftTicket/init?linktypeid=dc&fs=%E5%8C%97%E4%BA%AC,{cf}&ts=%E4%B8%8A%E6%B5%B7,{dd}&date={time}&flag=N,N,Y',
'sec-ch-ua':'"Not;A Brand";v="99", "Microsoft Edge";v="91", "Chromium";v="91"',
'sec-ch-ua-mobile':'?0',
'Sec-Fetch-Dest':'empty',
'Sec-Fetch-Mode':'cors',
'Sec-Fetch-Site':'same-origin',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36 Edg/91.0.864.41',
'X-Requested-With':'XMLHttpRequest',
}
car=requests.get(f'https://kyfw.12306.cn/otn/leftTicket/query?leftTicketDTO.train_date={time}&leftTicketDTO.from_station={cf}&leftTicketDTO.to_station={dd}&purpose_codes=ADULT',headers=hand)

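Build the request headers, then use the station codes and the date to build the HTTP request. The Cookie value shown here was copied from a browser session, so it will likely need to be replaced with one from your own session.
The query endpoint https://kyfw.12306.cn/otn/leftTicket/query?leftTicketDTO.train_date={time}&leftTicketDTO.from_station={cf}&leftTicketDTO.to_station={dd}&purpose_codes=ADULT returns every train that matches the query.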
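Instead of formatting the query string by hand, the parameters can be kept in a dictionary and encoded in one step. The sketch below uses the standard library's `urlencode` to show how the URL above is assembled. The helper name `build_query_url` is hypothetical, and the parameter names are taken from the URL in this article.

```python
from urllib.parse import urlencode

QUERY_URL = 'https://kyfw.12306.cn/otn/leftTicket/query'


def build_query_url(date, from_code, to_code):
    """Assemble the ticket-query URL from a date and two station codes."""
    params = {
        'leftTicketDTO.train_date': date,
        'leftTicketDTO.from_station': from_code,
        'leftTicketDTO.to_station': to_code,
        'purpose_codes': 'ADULT',
    }
    # urlencode joins the pairs with '&' in dictionary order.
    return f'{QUERY_URL}?{urlencode(params)}'


print(build_query_url('2021-06-10', 'SYT', 'JZD'))
```

With requests, the same dictionary can be passed directly as `requests.get(QUERY_URL, params=params, headers=hand)`, which performs the equivalent encoding.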

c=car.json()['data']['result']
lo=[]
for i1 in c:
    we=i1.split('|')
    # print(we[2:12])
    lo.append(we[2:12])
for i in lo:
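    print(f'Train ID: {i[0]}, train number: {i[1]}, origin station: {li1[i[2]]}, terminal station: {li1[i[3]]}, boarding station: {li1[i[4]]}, destination: {li1[i[5]]}, travel date: {time}, departure time: {i[6]}, arrival time: {i[7]}, duration: {i[8]}, bookable: {i[9]}')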
    # print('\n')

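Finally, each record in the response is a '|'-separated string. Splitting it and taking fields 2 through 11 gives the train number, origin station, terminal station, departure time, and so on.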
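Giving the ten fields names makes the result easier to work with than positional indexes. The sketch below assumes the field order used in the print statement above. The key names in `FIELD_NAMES` are hypothetical labels, and the sample record is made up for illustration rather than taken from a real response.

```python
FIELD_NAMES = [
    'train_id', 'train_number', 'origin', 'terminal', 'boarding',
    'destination', 'depart_time', 'arrive_time', 'duration', 'bookable',
]
STATION_KEYS = ('origin', 'terminal', 'boarding', 'destination')


def parse_record(record, code_to_name):
    """Turn one '|'-separated record into a dictionary of named fields."""
    fields = record.split('|')[2:12]
    row = dict(zip(FIELD_NAMES, fields))
    for key in STATION_KEYS:
        # Fall back to the raw code when the station is not in the lookup table.
        row[key] = code_to_name.get(row[key], row[key])
    return row


# Made-up record in the assumed layout; the first two and last fields are filler.
sample = 'x|y|00000000K100|K1|BJP|SHH|SYT|JZD|08:00|11:30|03:30|Y|rest'
row = parse_record(sample, {'SYT': '沈阳', 'JZD': '锦州'})
print(row['train_number'], row['boarding'], row['destination'], row['duration'])
```

Using `dict.get` with a fallback avoids the KeyError that `li1[...]` raises when a record contains a station code missing from the station list.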
