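I recently set out to build a feature that scrapes ticket information from 12306. Automatic login kept failing, so I started with the queries that do not need a login. First up: reading the ticket price query.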
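The page in question is the ticket price query at http://dynamic.12306.cn/TrainQuery/ticketPriceByStation.jsp.
It submits its data by POST and loads a captcha image.
In IE9, pressing F12 opens the Developer Tools, and the Network tab there captures the submitted URLs and data.
You can see which pages each step requests and what data is sent, much like Firebug in Firefox. IE7 and IE8 did not have this feature when I tried them; IE9 does.
You can try the capture yourself, so I will go straight to the code.
Based on the captured requests, the captcha image can be fetched like this: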
    private static string setCookie = string.Empty;

    /// <summary>
    /// Downloads the captcha image for the ticket price query.
    /// </summary>
    /// <returns>Path of the saved image, or an empty string on failure.</returns>
    public static string DoGetTrainTicketPriceRandCodeImage()
    {
        string resultImage = string.Empty;
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(@"http://dynamic.12306.cn/TrainQuery/passCodeActi0n.do?rand=rrand");
            request.Accept = @"image/png, image/svg+xml, image/*;q=0.8, */*;q=0.5";
            request.UserAgent = @"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)";
            request.Referer = @"http://dynamic.12306.cn/TrainQuery/ticketPriceByStation.jsp";
            request.Method = "GET";
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                // Keep the cookie from this response and strip the path
                // attributes so it can be sent back as a Cookie header.
                setCookie = response.GetResponseHeader("Set-Cookie");
                setCookie = Regex.Replace(setCookie, "path(?:[^,]+),?", "", RegexOptions.IgnoreCase);
                using (Stream reader = response.GetResponseStream())
                {
                    // Random prefix so that each refresh writes a new file.
                    string path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, new Random().Next(10000, 99999) + @"queryTicketPrice.JPEG");
                    using (FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write))
                    {
                        reader.CopyTo(file);
                    }
                    resultImage = path;
                }
            }
        }
        catch
        {
            // Best effort: any failure yields an empty path.
        }
        return resultImage;
    }

The reason this looks so roundabout: I originally planned to download the file at http://dynamic.12306.cn/TrainQuery/passCodeActi0n.do?rand=rrand directly with WebClient. In testing, though, entering the captcha from an image downloaded that way came back as "captcha expired", presumably because the cookie from that response was not kept. So I download the file as shown above, save it, and display the image file in the UI.
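The setCookie field stores the cookie from this request. It is needed later when calling the price query URL: the query only returns correct results if it carries the same cookie as the captcha request.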
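As an aside, .NET also offers CookieContainer, which can carry cookies between requests without editing the Set-Cookie header by hand. The sketch below is only an assumption about how that could be wired up here, not what the code in this post does; SessionRequests, Jar and Create are hypothetical names, and I have not tested this variant against 12306.

```csharp
using System;
using System.Net;

public static class SessionRequests
{
    // One cookie jar shared by the captcha request and the price query.
    public static readonly CookieContainer Jar = new CookieContainer();

    // Creates a request bound to the shared jar. Cookies set by a response
    // are stored in the jar and sent with later requests to matching URLs.
    public static HttpWebRequest Create(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = Jar;
        return request;
    }
}
```

With this in place, both the captcha request and the POST would be created through SessionRequests.Create, and the manual request.Headers[HttpRequestHeader.Cookie] assignment would no longer be needed.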
Next, the method that posts the parameters and reads the price information:

    /// <summary>
    /// Returns the raw ticket price response as a string.
    /// </summary>
    public static string DoQueryTrainTicketPriceInfo(DateTime fromDate, string startStation, string arriveStation, string randCode)
    {
        string result = string.Empty;
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(@"http://dynamic.12306.cn/TrainQuery/iframeTicketPriceByStation.jsp");
            request.Accept = @"text/html, application/xhtml+xml, */*";
            request.UserAgent = @"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)";
            request.Referer = @"http://dynamic.12306.cn/TrainQuery/ticketPriceByStation.jsp";
            request.ContentType = @"application/x-www-form-urlencoded";
            // Same cookie that came back with the captcha image.
            request.Headers[HttpRequestHeader.Cookie] = setCookie;
            request.Method = "POST";

            // Form body as captured from the browser; {0}..{4} are the inputs.
            string parameter = "condition=&queryMode=1&nmonth2={0}&nmonth2_new_value=true&nday2={1}&nday2_new_value=false"
                + "&startStation_ticketPrice1={2}&startStation_ticketPrice1_new_value=false"
                + "&arriveStation_ticketPrice1={3}&arriveStation_ticketPrice1_new_value=false"
                + "&trainCode=&trainCode_new_value=true&rFlag=1&name_ckball=value_ckball"
                + "&tFlagDC=DC&tFlagZ=Z&tFlagT=T&tFlagK=K&tFlagPK=PK&tFlagPKE=PKE&tFlagLK=LK&randCode={4}";
            // Percent-encode the user-supplied values (station names are Chinese).
            parameter = string.Format(parameter, fromDate.Month, fromDate.Day,
                Uri.EscapeDataString(startStation), Uri.EscapeDataString(arriveStation), Uri.EscapeDataString(randCode));

            byte[] buffer = Encoding.UTF8.GetBytes(parameter);
            request.ContentLength = buffer.Length;
            using (Stream writer = request.GetRequestStream())
            {
                writer.Write(buffer, 0, buffer.Length);
                writer.Flush();
            }
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
                {
                    result = reader.ReadToEnd();
                }
            }
        }
        catch
        {
            // Best effort: any failure yields an empty result.
        }
        return result;
    }

The key part is the request body, that is, the contents of the parameter string. The parameters that matter are the ones filled in by the code: date, departure station, arrival station, and captcha. request.Headers[HttpRequestHeader.Cookie] must carry the same cookie that was obtained when the captcha image was fetched.
The remaining parameters were also taken from the captured query.
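Since the body is an application/x-www-form-urlencoded string, a small helper can assemble it from name/value pairs and percent-encode each value, which matters for Chinese station names. This is a minimal sketch under my own assumptions: FormBodyBuilder is a hypothetical helper that is not part of the code in this post, and it relies only on Uri.EscapeDataString.

```csharp
using System;
using System.Collections.Generic;

public static class FormBodyBuilder
{
    // Joins name/value pairs into "a=1&b=2", percent-encoding both sides
    // as UTF-8 so that non-ASCII values survive the POST.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        List<string> parts = new List<string>();
        foreach (KeyValuePair<string, string> field in fields)
        {
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", parts.ToArray());
    }
}
```

For example, passing startStation_ticketPrice1 = 广州 and randCode = ab12 yields startStation_ticketPrice1=%E5%B9%BF%E5%B7%9E&randCode=ab12, which can then be converted with Encoding.UTF8.GetBytes and written to the request stream.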
The complete code follows. The UI is a WPF window built in Visual Studio 2010.
Main code:
    using System;
    using System.IO;
    using System.Net;
    using System.Text;
    using System.Text.RegularExpressions;

    public class TrainTicketPriceQuery
    {
        // Cookie returned with the captcha image; sent again with the query.
        private static string setCookie = string.Empty;

        /// <summary>
        /// Returns the raw ticket price response as a string.
        /// </summary>
        public static string DoQueryTrainTicketPriceInfo(DateTime fromDate, string startStation, string arriveStation, string randCode)
        {
            string result = string.Empty;
            try
            {
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(@"http://dynamic.12306.cn/TrainQuery/iframeTicketPriceByStation.jsp");
                request.Accept = @"text/html, application/xhtml+xml, */*";
                request.UserAgent = @"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)";
                request.Referer = @"http://dynamic.12306.cn/TrainQuery/ticketPriceByStation.jsp";
                request.ContentType = @"application/x-www-form-urlencoded";
                request.Headers[HttpRequestHeader.Cookie] = setCookie;
                request.Method = "POST";

                string parameter = "condition=&queryMode=1&nmonth2={0}&nmonth2_new_value=true&nday2={1}&nday2_new_value=false"
                    + "&startStation_ticketPrice1={2}&startStation_ticketPrice1_new_value=false"
                    + "&arriveStation_ticketPrice1={3}&arriveStation_ticketPrice1_new_value=false"
                    + "&trainCode=&trainCode_new_value=true&rFlag=1&name_ckball=value_ckball"
                    + "&tFlagDC=DC&tFlagZ=Z&tFlagT=T&tFlagK=K&tFlagPK=PK&tFlagPKE=PKE&tFlagLK=LK&randCode={4}";
                parameter = string.Format(parameter, fromDate.Month, fromDate.Day,
                    Uri.EscapeDataString(startStation), Uri.EscapeDataString(arriveStation), Uri.EscapeDataString(randCode));

                byte[] buffer = Encoding.UTF8.GetBytes(parameter);
                request.ContentLength = buffer.Length;
                using (Stream writer = request.GetRequestStream())
                {
                    writer.Write(buffer, 0, buffer.Length);
                    writer.Flush();
                }
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
                    {
                        result = reader.ReadToEnd();
                    }
                }
            }
            catch
            {
                // Best effort: any failure yields an empty result.
            }
            return result;
        }

        /// <summary>
        /// Downloads the captcha image for the ticket price query.
        /// </summary>
        /// <returns>Path of the saved image, or an empty string on failure.</returns>
        public static string DoGetTrainTicketPriceRandCodeImage()
        {
            string resultImage = string.Empty;
            try
            {
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(@"http://dynamic.12306.cn/TrainQuery/passCodeActi0n.do?rand=rrand");
                request.Accept = @"image/png, image/svg+xml, image/*;q=0.8, */*;q=0.5";
                request.UserAgent = @"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)";
                request.Referer = @"http://dynamic.12306.cn/TrainQuery/ticketPriceByStation.jsp";
                request.Method = "GET";
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    setCookie = response.GetResponseHeader("Set-Cookie");
                    setCookie = Regex.Replace(setCookie, "path(?:[^,]+),?", "", RegexOptions.IgnoreCase);
                    using (Stream reader = response.GetResponseStream())
                    {
                        string path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, new Random().Next(10000, 99999) + @"queryTicketPrice.JPEG");
                        using (FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write))
                        {
                            reader.CopyTo(file);
                        }
                        resultImage = path;
                    }
                }
            }
            catch
            {
                // Best effort: any failure yields an empty path.
            }
            return resultImage;
        }
    }

UI code:
    <Window x:Class="TestQueryTrainTicketPrice.MainWindow"
            xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
            xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
            Title="MainWindow" Width="900" MinHeight="600">
        <StackPanel>
            <Grid>
                <Grid.Resources>
                    <Style TargetType="TextBlock">
                        <Setter Property="FontFamily" Value="Microsoft YaHei"/>
                        <Setter Property="FontSize" Value="20"/>
                        <Setter Property="VerticalAlignment" Value="Center"/>
                    </Style>
                    <Style TargetType="TextBox">
                        <Setter Property="FontSize" Value="20"/>
                        <Setter Property="MinWidth" Value="300"/>
                        <Setter Property="Height" Value="50"/>
                        <Setter Property="VerticalAlignment" Value="Center"/>
                    </Style>
                </Grid.Resources>
                <Grid.RowDefinitions>
                    <RowDefinition Height="auto"/>
                    <RowDefinition Height="auto"/>
                    <RowDefinition Height="auto"/>
                    <RowDefinition Height="auto"/>
                </Grid.RowDefinitions>
                <Grid.ColumnDefinitions>
                    <ColumnDefinition Width="auto"/>
                    <ColumnDefinition Width="auto"/>
                </Grid.ColumnDefinitions>
                <TextBlock Grid.Row="0" Grid.Column="0" Text="Date"/>
                <TextBox Grid.Row="0" Grid.Column="1" x:Name="txtDate" />
                <TextBlock Grid.Row="1" Grid.Column="0" Text="From"/>
                <TextBox Grid.Row="1" Grid.Column="1" x:Name="txtStartStation" />
                <TextBlock Grid.Row="2" Grid.Column="0" Text="To"/>
                <TextBox Grid.Row="2" Grid.Column="1" x:Name="txtArriveStation" />
                <TextBlock Grid.Row="3" Grid.Column="0" Text="Captcha"/>
                <StackPanel Grid.Row="3" Grid.Column="1" Orientation="Horizontal" VerticalAlignment="Center">
                    <TextBox x:Name="txtRandCode" />
                    <Image x:Name="imageRandCode" Width="auto"/>
                    <Button Content="Refresh captcha" Height="30" Width="100" Click="Button_Click_1" />
                </StackPanel>
            </Grid>
            <Button Content="Search" Width="150" Height="50" Click="Button_Click" />
            <RichTextBox x:Name="rtxResultInfo" VerticalScrollBarVisibility="Visible" Height="315"/>
        </StackPanel>
    </Window>

Code-behind:
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
            this.Loaded += new RoutedEventHandler(MainWindow_Loaded);
        }

        void MainWindow_Loaded(object sender, RoutedEventArgs e)
        {
            // Default to a query three days ahead, Guangzhou to Xi'an.
            string date = DateTime.Now.AddDays(3).ToString(@"yyyy-MM-dd");
            this.txtDate.Text = date;
            this.txtStartStation.Text = @"广州";
            this.txtArriveStation.Text = @"西安";
            InitialRandCode();
        }

        private void InitialRandCode()
        {
            // Fetch a captcha image and show it in the window.
            string randCodeImageUrl = TrainTicketPriceQuery.DoGetTrainTicketPriceRandCodeImage();
            if (!string.IsNullOrEmpty(randCodeImageUrl))
            {
                ImageSource src = (ImageSource)(new ImageSourceConverter().ConvertFromString(randCodeImageUrl));
                this.imageRandCode.Source = src;
            }
        }

        /// <summary>
        /// Runs the query.
        /// </summary>
        private void Button_Click(object sender, RoutedEventArgs e)
        {
            string startStation = this.txtStartStation.Text.Trim();
            string arriveStation = this.txtArriveStation.Text.Trim();
            string randCode = this.txtRandCode.Text.Trim();
            string startDate = this.txtDate.Text;
            if (string.IsNullOrEmpty(startStation) || string.IsNullOrEmpty(arriveStation) || string.IsNullOrEmpty(randCode) || string.IsNullOrEmpty(startDate))
            {
                MessageBox.Show("Some fields are empty. Please fill them in and try again.");
                return;
            }
            DateTime fromDate;
            if (!DateTime.TryParse(startDate, out fromDate))
            {
                MessageBox.Show("The date is not valid. Please enter it again.");
                return;
            }
            // Query with the date entered in the form.
            string queryResult = TrainTicketPriceQuery.DoQueryTrainTicketPriceInfo(fromDate, startStation, arriveStation, randCode);
            System.Windows.Documents.FlowDocument doc = rtxResultInfo.Document;
            doc.Blocks.Clear();
            rtxResultInfo.AppendText(queryResult);
        }

        private void Button_Click_1(object sender, RoutedEventArgs e)
        {
            InitialRandCode();
        }
    }

The result:
Within the returned text, lines such as:
parent.mygrid.addRow(0,"1,G832^skbcx.jsp?cxlx=cc&date=20131125&trainCode=G832_6c0000G83201,广州南^skbcx.jsp?cxlx=czjgcc&zm=&date=20131125&stationName_passTrain=%E5%B9%BF%E5%B7%9E%E5%8D%97^self,西安北^skbcx.jsp?cxlx=czjgcc&zm=&date=20131125&stationName_passTrain=%E8%A5%BF%E5%AE%89%E5%8C%97^self,--,--,1301.5,813.5,--,--,--,--,--,07:18,16:23,09:05,广州南^skbcx.jsp?cxlx=czjgcc&zm=&date=20131125&stationName_passTrain=%E5%B9%BF%E5%B7%9E%E5%8D%97^self,西安北^skbcx.jsp?cxlx=czjgcc&zm=&date=20131125&stationName_passTrain=%E8%A5%BF%E5%AE%89%E5%8C%97^self,高速,有",0);
are the actual ticket price data. The string still needs some simple processing, of course.
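Use

    Regex regex = new Regex(@"parent\.mygrid\.addRow");
    string[] arrows = regex.Split(priceContent);

to split the whole string into one piece per train. Then trim, replace, and otherwise clean up each piece, and you end up with the train ticket price information you are after.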
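To make the clean-up step concrete, here is a rough sketch of one way to do it. It is not the exact processing I used: TicketPriceRowParser is a hypothetical helper, and it assumes that every row has the addRow(n,"...",n); shape shown above, that fields are separated by commas, and that anything after a ^ in a field is link data that can be dropped. The post does not say what each column means, so the sketch returns the fields by position only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TicketPriceRowParser
{
    // Matches one addRow call and captures the quoted, comma-separated payload.
    private static readonly Regex RowPattern =
        new Regex(@"parent\.mygrid\.addRow\(\d+,""([^""]*)"",\d+\);");

    // Returns one string array per train; link suffixes after '^' are removed.
    public static List<string[]> Parse(string priceContent)
    {
        List<string[]> rows = new List<string[]>();
        foreach (Match match in RowPattern.Matches(priceContent))
        {
            string[] fields = match.Groups[1].Value.Split(',');
            for (int i = 0; i < fields.Length; i++)
            {
                int caret = fields[i].IndexOf('^');
                if (caret >= 0)
                {
                    fields[i] = fields[i].Substring(0, caret);
                }
            }
            rows.Add(fields);
        }
        return rows;
    }
}
```

Applied to the sample row above, the second field becomes G832 and the station fields become 广州南 and 西安北, while values such as 1301.5 and 813.5 pass through unchanged.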
Author: yysyangyangyangshan, posted 2013-11-21 23:04:17