MMOwned - World of Warcraft Exploits, Hacks, Bots and Guides

How to get and parse information from a web page.
(#1) Apoc - 04-14-2008

Well, since there has been a bit of an uproar lately about how exactly to get and parse information from web pages, I figured I'd write a small tutorial about how to do so. (Using MMOwned as a test dummy for the URL.)

The first thing we need to do is create a new Windows Forms application. (This tutorial assumes it's named MMOwnedRepParser.)

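First things first, we'll make it flashy: a textbox for the user name, a "Get Rep" button, and a label that starts out reading "No User Selected!".

Very simple, right?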

A few things to keep in mind, to make it easier to explain:

The form's name is "MainForm"
The "Get Rep" button's name is "btnGetRep"
The "No User Selected!" label's name is "lblUserRep"
The textbox's name is "txtUserName"


Now we need to add a simple HTTP GET method that returns a web page's source.

Create a new class named "Http" and change it to the following:

Code:
using System.IO;
using System.Net;
using System.Text;

namespace MMOwnedRepParser
{
    public class Http
    {
        public static string GetHTTP(string url)
        {
            // Build a request for the URL provided when the method was called.
            var HttpWRequest = (HttpWebRequest)WebRequest.Create(url);

            // Set some specific things needed for certain web pages to be viewed.
            HttpWRequest.Credentials = CredentialCache.DefaultCredentials;
            HttpWRequest.UserAgent = "MMOwned Wins Hard";
            HttpWRequest.KeepAlive = true;
            HttpWRequest.Headers.Set("Pragma", "no-cache");
            HttpWRequest.Timeout = 300000; // In milliseconds, so 5 minutes.

            // We are only GETting the page information. We are not passing it any.
            HttpWRequest.Method = "GET";

            // This is in a try/catch block due to some pages going offline. 
            // (If we didn't catch the error, we would crash the app)
            try
            {
                // Send the request and get the server's response to it.
                // The using blocks close the response and the reader for us,
                // even if reading fails, so we don't end up with some nasty bugs.
                using (var HttpWResponse = (HttpWebResponse)HttpWRequest.GetResponse())
                using (var sr = new StreamReader(HttpWResponse.GetResponseStream(), Encoding.ASCII))
                {
                    // Read the page we got from the response, and pass it out as our return statement.
                    return sr.ReadToEnd();
                }
            }
            catch (WebException)
            {
                // The page could not be viewed. So we return an ERROR string instead.
                return "ERROR";
            }
        }
    }
}
The code itself is documented fairly well, so I won't bother explaining it.

Just keep in mind, this is a very simple HTTP GET method. It does not handle POST requests.

Now that we have our way to grab the page information, let's create a way to find the rep using the user profile page of MMOwned.

First we need to see what kind of page source is generated. Open a profile page (I'm using my own, reached by clicking on my name, not by going to "User CP"), then right click and select "View Page Source". (The wording might be different in other browsers; you want to view the source of the page.)

Now we need to find where reputation is displayed. (Luckily, this page is mostly static, so the position of what we want is always in the same place.)

The bit of information we want to find is the following:

Code:
<span class="smallfont" style="float:right">

				202 point(s) total
				&nbsp; &nbsp;
				<a href="/forums/members/apoc.html#top" onclick="return toggle_collapse('profile_reputation')">
All we really want is the "202 point(s) total" since we just want to see how much rep a given person has.

Now we're going to use a bit of regex (regular expressions) to find that single line so we can use it.

Create a new method. (I created it right in the MainForm code file. Just double click the form to open it, or select it from the Solution Explorer.)

We first need to add the following using directive:

Code:
using System.Text.RegularExpressions;
This will allow us to use Regex.

Now we add the following method to parse the page we received and get our reputation points.

Code:
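        private string Rep(string toSearch)
        {
            // \d+ = one or more digits, \s = a whitespace character,
            // \( and \) = literal parentheses around the "s".
            var rx = new Regex(@"\d+\spoint\(s\)\stotal");
            return rx.Match(toSearch).ToString();
        }
The "var rx" is initializing a new instance of Regex using the supplied regex string. It will return a match of "<one or more digits> point(s) total" if it finds it. (The parentheses around the "s" are escaped so the regex treats them as literal characters instead of a group.)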

Then we just return the match from our page string we will be passing to it in a minute.
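
If you want to check the pattern before wiring it into the form, a small console test can help. This is only a sketch: the RegexDemo class and its SamplePage string are made up for illustration (the text is copied from the page source shown earlier), and it assumes the pattern \d+\spoint\(s\)\stotal.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Hypothetical sample input: a trimmed copy of the profile page snippet.
    public const string SamplePage =
        "<span class=\"smallfont\" style=\"float:right\">\n" +
        "\t\t\t\t202 point(s) total\n" +
        "\t\t\t\t&nbsp; &nbsp;\n";

    // Returns the matched text, or an empty string when nothing matches.
    public static string FindRep(string toSearch)
    {
        var rx = new Regex(@"\d+\spoint\(s\)\stotal");
        return rx.Match(toSearch).Value;
    }

    public static void Main()
    {
        // Prints: 202 point(s) total
        Console.WriteLine(FindRep(SamplePage));
    }
}
```

Note that a failed match gives back an empty string here, not null.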

Now, to make this do anything, we need to add some code for the button itself.

Back in the designer view for the form, double click the button to bring up its Click event handler. (Visual Studio creates it automatically when you double click.)

Code:
        private void btnGetRep_Click(object sender, EventArgs e)
        {

        }
So now we need to add in our code. First, let's make sure something has been typed in the text box for the user name.

Code:
        private void btnGetRep_Click(object sender, EventArgs e)
        {
            if (txtUserName.Text.Length == 0)
            {
                MessageBox.Show("Please enter a user name!");
            }
        }
Pretty simple right?

Now let's actually make this thing work!

Update the method as follows:

Code:
        private void btnGetRep_Click(object sender, EventArgs e)
        {
            if (txtUserName.Text.Length == 0)
            {
                MessageBox.Show("Please enter a user name!");
            }
            else
            {
                var urlSource = Http.GetHTTP(String.Format("http://www.mmowned.com/forums/members/{0}.html", txtUserName.Text));
                lblUserRep.Text = String.Format("{0} has {1}", txtUserName.Text, Rep(urlSource));
            }
        }
I split this up to make it easier to read. The first line of the else block grabs our URL source and stores it in a string variable. (The compiler infers that the implicitly typed "var" is a string.)

Next we update our lblUserRep to show the name we entered in the username box, and the Rep that we parsed using our Rep method from earlier.
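
One thing the String.Format call above does not deal with is a user name containing spaces or other characters that are not URL-safe. Here is a minimal sketch of one way to handle that. ProfileUrl.Build is a hypothetical helper, not part of the original project, and it is only an assumption that the forum accepts percent-escaped names in this URL.

```csharp
using System;

public static class ProfileUrl
{
    // Hypothetical helper: builds the profile URL for a user name.
    public static string Build(string userName)
    {
        // Trim stray whitespace, then escape anything that is not URL-safe.
        var safeName = Uri.EscapeDataString(userName.Trim());
        return String.Format("http://www.mmowned.com/forums/members/{0}.html", safeName);
    }

    public static void Main()
    {
        // Prints: http://www.mmowned.com/forums/members/apoc.html
        Console.WriteLine(Build(" apoc "));
    }
}
```

In the button handler you would then pass ProfileUrl.Build(txtUserName.Text) to Http.GetHTTP.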

Now, here's a little bit more on this tutorial: what if we searched for a member who doesn't exist?

Our label ends up saying "<SomeUser> has" with no rep. Well, we can make it look a bit prettier by making the following changes:

In the Rep method:

Code:
        private string Rep(string toSearch)
        {
            var rx = new Regex(@"\d+\spoint\(s\)\stotal");

            // Run the match once and reuse the result.
            var match = rx.Match(toSearch);
            return match.Success ? match.Value : null;
        }
Now we've changed the return statement to something you may not understand easily. In short, the conditional operator (?:) returns the matching string if the regex match was successful, and null if not.

Now in our button method we change it to the following:

Code:
        private void btnGetRep_Click(object sender, EventArgs e)
        {
            if (txtUserName.Text.Length == 0)
            {
                MessageBox.Show("Please enter a user name!");
            }
            else
            {
                var urlSource = Http.GetHTTP(String.Format("http://www.mmowned.com/forums/members/{0}.html", txtUserName.Text));

                // Parse once and reuse the result instead of running the regex twice.
                var rep = Rep(urlSource);
                if (rep != null)
                {
                    lblUserRep.Text = String.Format("{0} has {1}", txtUserName.Text, rep);
                }
                else
                {
                    lblUserRep.Text = String.Format("User {0} does not exist!", txtUserName.Text);
                    MessageBox.Show("Invalid username!");
                }
            }
        }
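Now, if our Rep method returns null, we'll get a message box, and our label will show that the user in question does not exist! (A failed request ends up here too, since the "ERROR" string from GetHTTP contains nothing for the regex to match.)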

All done! You can use this method to do a lot of other types of web parsing as well.
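
For example, if you want the rep as an actual number instead of the whole "202 point(s) total" string, a capture group around the digits will do it. This is just a sketch: RepNumber.Parse is a hypothetical helper that is not in the attached source, and it assumes the same page layout as above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepNumber
{
    // Hypothetical helper: returns the point count, or null when it is not found.
    public static int? Parse(string toSearch)
    {
        // The parentheses around \d+ capture just the digits as group 1.
        var match = Regex.Match(toSearch, @"(\d+)\spoint\(s\)\stotal");
        if (!match.Success)
        {
            return null;
        }

        // TryParse guards against digit strings too long to fit in an int.
        int points;
        return int.TryParse(match.Groups[1].Value, out points) ? points : (int?)null;
    }

    public static void Main()
    {
        // Prints: 202
        Console.WriteLine(Parse("\t202 point(s) total"));
    }
}
```

The same idea works for any other value on a page: match the text around it, and capture only the part you need.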

This is almost the same method I use in the Account Check Aisle Four program.

Enjoy folks!

Edit: Source is below. (Written in Visual Studio 2008 Team Suite and .NET 3.5. If you have problems with it, too bad. I'm not re-doing it in another IDE or .NET version.)
[Only registered and activated users can see links. ]


Edit2: Since I know someone will come complaining, this tutorial does NOT touch on thread invoking to stop the GUI from freezing while the web request method is called. That's beyond the scope of this tutorial, and will be handled elsewhere, or google'd.




Last edited by Apoc; 04-14-2008 at 03:57 PM.

(#2) 2dgreengiant - 04-14-2008

gawd just what i needed :P






(#3) -Lex - 04-14-2008

cool ........

(#4) Yeti - 04-14-2008

apoc this is awesome!
thank you for commenting the code too!

(#5) slack7219 - 07-08-2008

You don't invoke a thread, you just run that method on a different thread. If you have something to modify on the form's controls, you use a delegate and Control.Invoke it.

(#6) Caroe - 07-09-2008

Awesome

(#7) Apoc - 07-09-2008

Quote:
Originally Posted by slack7219
You don't invoke a thread, you just run that method on a different thread. If you have something to modify on the form's controls, you use a delegate and Control.Invoke it.
That's invoking a thread. -_-

And I'm fully aware of how to use cross thread calls. (I usually use AsyncCallback for these types of things) But hey, whatever floats your boat.


(#8) slack7219 - 07-10-2008

You may use an async call, but there may be cases where you would like your thread to wait for those updates to the UI thread for god knows whatever reason. On a side note, the BackgroundWorker class is a nice little helper that could be used here; it does the same thing as a normal thread would but with some bonuses.
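
For what it's worth, here is a rough console sketch of the BackgroundWorker idea. Everything in it is an assumption for illustration: WorkerDemo, SlowLookup and RunLookup are made-up names, and SlowLookup just sleeps in place of the real web request. The wait at the end is only there so the console demo does not exit early. In a form you would not block like that, you would update the label inside RunWorkerCompleted instead.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

public static class WorkerDemo
{
    // Stand-in for the slow web request; a real app would call Http.GetHTTP here.
    public static string SlowLookup(string userName)
    {
        Thread.Sleep(200);
        return "looked up " + userName;
    }

    // Runs SlowLookup on a BackgroundWorker and waits for it to report back.
    public static string RunLookup(string userName)
    {
        string result = null;
        using (var done = new ManualResetEvent(false))
        using (var worker = new BackgroundWorker())
        {
            // DoWork runs on a thread-pool thread, away from the caller.
            worker.DoWork += (sender, e) => e.Result = SlowLookup((string)e.Argument);

            // RunWorkerCompleted fires once DoWork has finished or thrown.
            worker.RunWorkerCompleted += (sender, e) =>
            {
                result = e.Error == null ? (string)e.Result : "ERROR";
                done.Set();
            };

            worker.RunWorkerAsync(userName);

            // Console-only: block until the worker is done. Do NOT do this on a UI thread.
            done.WaitOne();
        }
        return result;
    }

    public static void Main()
    {
        // Prints: looked up apoc
        Console.WriteLine(RunLookup("apoc"));
    }
}
```

In a Windows Forms app the RunWorkerCompleted handler is raised back on the UI thread when the worker is started from it, which is one of the bonuses: no Control.Invoke needed to set lblUserRep.Text there.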

(#9) Apoc - 07-10-2008

Quote:
Originally Posted by slack7219
You may use an async call, but there may be cases where you would like your thread to wait for those updates to the UI thread for god knows whatever reason. On a side note, the BackgroundWorker class is a nice little helper that could be used here; it does the same thing as a normal thread would but with some bonuses.
And drawbacks on high priority async threads. (Which a BackgroundWorker is not)

If you want to talk threading, please make a new thread in these forums. (Please excuse the pun)

