How to get and parse information from a web page.
(#1)
Apoc
How to get and parse information from a web page. - 04-14-2008

Well, since there has been a bit of an uproar lately about how exactly to get and parse information from web pages, I figured I'd write a small tutorial about how to do so. (Using MMOwned as a test dummy for the URL.)

The first thing we need to do is create a new Windows Forms application. (This tutorial assumes it's named MMOwnedRepParser.)

First things first, we'll make it flashy.



Very simple, right?

Few things to keep in mind, to make it easier to explain.

The form's name is "MainForm"
The "Get Rep" button's name is "btnGetRep"
The "No User Selected!" label's name is "lblUserRep"
The textbox name is "txtUserName"


Now, first things first, we need to add in a simple HTTPGET method to return a web page's source.

Create a new class named "Http" and change it to the following:

Code:
using System.IO;
using System.Net;
using System.Text;

namespace MMOwnedRepParser
{
    public class Http
    {
        public static string GetHTTP(string url)
        {
            // Build a request for the URL provided when the method was called.
            // Nothing is actually sent until GetResponse() is called below.
            var HttpWRequest = (HttpWebRequest)WebRequest.Create(url);

            // Set some specific things needed for certain web pages to be viewed.
            HttpWRequest.Credentials = CredentialCache.DefaultCredentials;
            HttpWRequest.UserAgent = "MMOwned Wins Hard";
            HttpWRequest.KeepAlive = true;
            HttpWRequest.Headers.Set("Pragma", "no-cache");
            HttpWRequest.Timeout = 300000; // In milliseconds (5 minutes).

            // We are only GETting the page information. We are not passing it any.
            HttpWRequest.Method = "GET";

            // This is in a try/catch block due to some pages going offline.
            // (If we didn't catch the error, we would crash the app)
            try
            {
                // Send the request and get the response to it.
                // The using blocks close the response and the reader for us, even if reading fails.
                using (var HttpWResponse = (HttpWebResponse)HttpWRequest.GetResponse())
                using (var sr = new StreamReader(HttpWResponse.GetResponseStream(), Encoding.ASCII))
                {
                    // Read the page we got from the response, and pass it out as our return value.
                    // (ASCII is enough for the digits we parse later; other characters come back as '?'.)
                    return sr.ReadToEnd();
                }
            }
            catch (WebException)
            {
                // The page could not be viewed. So we return an ERROR string instead.
                return "ERROR";
            }
        }
    }
}
The code itself is documented fairly well, so I won't bother explaining it.

Just keep in mind, this is a very simple httpGET method. It does not handle POST http methods.

Now that we have our way to grab the page information, let's create a way to find the rep using the user profile page of MMOwned.

First, we need to see what kind of page source is generated. Open a profile page (I used my own, by clicking on my name, not by going to "User CP"), then right click and select "View Page Source". (The wording might be different in other browsers; you want to view the source of the page.)

Now we need to find where reputation is displayed. (Luckily, this page is mostly static, so the position of what we want is always in the same place.)

The bit of information we want to find is the following:

Code:
<span class="smallfont" style="float:right">

				202 point(s) total
				&nbsp; &nbsp;
				<a href="/forums/members/apoc.html#top" onclick="return toggle_collapse('profile_reputation')">
All we really want is the "202 point(s) total" since we just want to see how much rep a given person has.

Now we're going to use a bit of regex (regular expressions) to find that single line so we can use it.

Create a new method. (I created it right in the MainForm code file. Just double click the form to open it, or select it from the Solution Explorer.)

We first need to add the following using directive:

Code:
using System.Text.RegularExpressions;
This will allow us to use Regex.

Now we add the following method to parse the page we received and get our reputation points.

Code:
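        private string Rep(string toSearch)
        {
            // \d+ = one or more digits, \s = one whitespace character, \( and \) = literal parentheses.
            var rx = new Regex(@"\d+\spoint\(s\)\stotal");
            return rx.Match(toSearch).ToString();
        }
The "var rx" line initializes a new instance of Regex using the supplied regex string. It will return a match of "<one or more digits> point(s) total" if it finds it. Mind the backslashes: without them, "d" and "s" are matched as literal letters, and "(s)" becomes a capture group instead of literal parentheses.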

Then we just return the match from our page string we will be passing to it in a minute.
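
If you want to try the regex without touching the network, here is a minimal console sketch that runs it against the sample markup from above. The class name RepRegexDemo and the method FindRep are made up for this example; only the pattern (written with its backslash escapes) comes from the tutorial.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepRegexDemo
{
    // One or more digits, one whitespace character, "point(s)", one whitespace character, "total".
    private static readonly Regex RepPattern = new Regex(@"\d+\spoint\(s\)\stotal");

    // Hypothetical helper: returns the matched text, or an empty string when nothing matched.
    public static string FindRep(string toSearch)
    {
        return RepPattern.Match(toSearch).ToString();
    }

    public static void Main()
    {
        // Sample markup copied from the profile page source shown earlier.
        string sample =
            "<span class=\"smallfont\" style=\"float:right\">\n" +
            "\n" +
            "\t\t\t\t202 point(s) total\n" +
            "\t\t\t\t&nbsp; &nbsp;\n";

        Console.WriteLine(FindRep(sample));               // prints "202 point(s) total"
        Console.WriteLine(FindRep("no rep here").Length); // prints 0
    }
}
```

A failed match gives back an empty string rather than null, which is why the label can end up with nothing after "has".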

Now, to make this do anything, we need to add some code for the button itself.

Back in the designer view for the form, double click the button to bring up its Click event handler. (Visual Studio creates this for you automatically when you double click.)

Code:
        private void btnGetRep_Click(object sender, EventArgs e)
        {

        }
Now we need to add in our code. First, let's make sure we have something typed in the text box for the user name.

Code:
        private void btnGetRep_Click(object sender, EventArgs e)
        {
            if (txtUserName.Text.Length == 0)
            {
                MessageBox.Show("Please enter a user name!");
            }
        }
Pretty simple right?

Now let's actually make this thing work!

Update the method as follows:

Code:
        private void btnGetRep_Click(object sender, EventArgs e)
        {
            if (txtUserName.Text.Length == 0)
            {
                MessageBox.Show("Please enter a user name!");
            }
            else
            {
                var urlSource = Http.GetHTTP(String.Format("http://www.mmowned.com/forums/members/{0}.html", txtUserName.Text));
                lblUserRep.Text = String.Format("{0} has {1}", txtUserName.Text, Rep(urlSource));
            }
        }
I split this up to make it easier to read. The first part of the else statement grabs our URL source and stores it in a string variable. (The compiler works out on its own that the implicitly typed "var" is a string.)

Next we update our lblUserRep to show the name we entered in the username box, and the Rep that we parsed using our Rep method from earlier.
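
One thing the handler does not do is clean up whatever was typed into the text box before dropping it into the URL. Below is a rough sketch of how that could look. BuildProfileUrl is a hypothetical helper, not part of the tutorial project, and I have not checked how the forum maps unusual user names to profile URLs, so this only guarantees a well-formed URL, not that the page exists.

```csharp
using System;

public static class ProfileUrlDemo
{
    // Hypothetical helper; the URL format is the one used in the button handler.
    public static string BuildProfileUrl(string userName)
    {
        // Trim stray whitespace, then percent-encode anything that is not URL-safe.
        string safeName = Uri.EscapeDataString(userName.Trim());
        return String.Format("http://www.mmowned.com/forums/members/{0}.html", safeName);
    }

    public static void Main()
    {
        Console.WriteLine(BuildProfileUrl("apoc"));
        // prints http://www.mmowned.com/forums/members/apoc.html

        Console.WriteLine(BuildProfileUrl(" some user "));
        // prints http://www.mmowned.com/forums/members/some%20user.html
    }
}
```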

Now, here's a little bit more on this tutorial, what if we searched for a member who doesn't exist?

Our label ends up saying "<SomeUser> has" with no rep. Well, we can make it look a bit prettier and easier by making the following changes:

In the Rep method:

Code:
        private string Rep(string toSearch)
        {
            var rx = new Regex(@"\d+\spoint\(s\)\stotal");
            // Run the match once and reuse the result.
            var match = rx.Match(toSearch);
            return match.Success ? match.ToString() : null;
        }
Now we've changed the return statement to something you may not understand easily. It uses the conditional operator (condition ? a : b). In short, if the regex match was successful, return the matching string; if not, return null.
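
If you would rather have the rep as a number than as the whole "N point(s) total" string, a capture group does the trick. This is only a sketch: TryGetRep and the class name are hypothetical, and the pattern is the tutorial's regex with parentheses added around the digits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepNumberDemo
{
    // The parentheses around \d+ form capture group 1, which holds only the digits.
    private static readonly Regex RepPattern = new Regex(@"(\d+)\spoint\(s\)\stotal");

    // Hypothetical helper: returns true and sets rep when the text contains a rep line.
    public static bool TryGetRep(string toSearch, out int rep)
    {
        rep = 0;
        Match match = RepPattern.Match(toSearch);
        return match.Success && int.TryParse(match.Groups[1].Value, out rep);
    }

    public static void Main()
    {
        int rep;
        if (TryGetRep("202 point(s) total", out rep))
        {
            Console.WriteLine(rep + 1); // prints 203
        }

        Console.WriteLine(TryGetRep("<html>no such member</html>", out rep)); // prints False
    }
}
```

The Try pattern avoids the null check entirely: the caller gets a bool for "found or not" and an int it can compare or sort.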

Now in our button method we change it to the following:

Code:
        private void btnGetRep_Click(object sender, EventArgs e)
        {
            if (txtUserName.Text.Length == 0)
            {
                MessageBox.Show("Please enter a user name!");
            }
            else
            {
                var urlSource = Http.GetHTTP(String.Format("http://www.mmowned.com/forums/members/{0}.html", txtUserName.Text));
                // Parse once and reuse the result instead of calling Rep twice.
                var rep = Rep(urlSource);
                if (rep != null)
                {
                    lblUserRep.Text = String.Format("{0} has {1}", txtUserName.Text, rep);
                }
                else
                {
                    lblUserRep.Text = String.Format("User {0} does not exist!", txtUserName.Text);
                    MessageBox.Show("Invalid username!");
                }
            }
        }
Now, if our Rep method returns null, we'll get a message box, and our label will show that the user in question does not exist!
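
One gap worth knowing about: GetHTTP returns the string "ERROR" when the request fails, and the handler above reports that as a missing user too. Here is a sketch of telling the two cases apart. BuildLabelText is a hypothetical helper and the "Could not load" wording is mine. Whether a missing member actually produces a failed request or a normal page without a rep line depends on the forum, which I have not verified.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepMessageDemo
{
    private static readonly Regex RepPattern = new Regex(@"\d+\spoint\(s\)\stotal");

    // Hypothetical helper: builds the label text from the user name
    // and the string that Http.GetHTTP returned for the profile page.
    public static string BuildLabelText(string userName, string pageSource)
    {
        // Http.GetHTTP returns the literal string "ERROR" when the request fails.
        if (pageSource == "ERROR")
        {
            return String.Format("Could not load the page for {0}.", userName);
        }

        Match match = RepPattern.Match(pageSource);
        return match.Success
            ? String.Format("{0} has {1}", userName, match.Value)
            : String.Format("User {0} does not exist!", userName);
    }

    public static void Main()
    {
        Console.WriteLine(BuildLabelText("Apoc", "202 point(s) total")); // prints "Apoc has 202 point(s) total"
        Console.WriteLine(BuildLabelText("Apoc", "ERROR"));              // prints "Could not load the page for Apoc."
        Console.WriteLine(BuildLabelText("Nobody", "<html></html>"));    // prints "User Nobody does not exist!"
    }
}
```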

All done! You can use this method to do a lot of other types of web parsing as well.

This is almost the same method I use in the Account Check Aisle Four program.

Enjoy folks!

Edit: Source is below. (Written in Visual Studio 2008 Team Suite and .NET 3.5. If you have problems with it, too bad. I'm not re-doing it in another IDE or .NET version.)
[Only registered and activated users can see links. ]


Edit2: Since I know someone will come complaining, this tutorial does NOT touch on thread invoking to stop the GUI from freezing while the web request method is called. That's beyond the scope of this tutorial, and will be handled elsewhere, or google'd.



VB skills is an oxymoron. - Cypher

Last edited by Apoc; 04-14-2008 at 04:57 PM..

(#2)
2dgreengiant
04-14-2008

gawd just what i needed :P

(#3)
-Lex
04-14-2008

cool ........



(#4)
Yeti
04-14-2008

apoc this is awesome!
thank you for commenting the code too!

(#5)
slack7219
07-08-2008

You don't invoke a thread, you just run that method on a different thread. If you need to modify the controls on the form, you use a delegate and Control.Invoke it.

(#6)
Caroe
07-09-2008

Awesome

(#7)
Apoc
07-09-2008

Quote:
Originally Posted by slack7219
You don't invoke a thread, you just run that method on a different thread. If you need to modify the controls on the form, you use a delegate and Control.Invoke it.
That's invoking a thread. -_-

And I'm fully aware of how to use cross thread calls. (I usually use AsyncCallback for these types of things) But hey, whatever floats your boat.


(#8)
slack7219
07-10-2008

You may use an async call, but there may be cases where you would like your thread to wait for those updates to the UI thread, for god knows whatever reason. On a side note, the BackgroundWorker class is a nice little helper that could be used here. It does the same thing as a normal thread would, but with some bonuses.
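
For anyone curious what the BackgroundWorker route looks like, here is a rough console sketch. SlowFetch is a made-up stand-in for Http.GetHTTP so it runs without a network connection, and FetchInBackground is a hypothetical wrapper. In a real form you would drop the wait handle and set lblUserRep.Text inside the RunWorkerCompleted handler, which Windows Forms raises on the UI thread when the worker was started from it.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

public static class BackgroundFetchDemo
{
    // Stand-in for Http.GetHTTP so the sketch runs offline.
    private static string SlowFetch(string url)
    {
        Thread.Sleep(200); // pretend the request takes a while
        return "202 point(s) total";
    }

    // Runs SlowFetch on a worker thread and returns what the completion handler received.
    public static string FetchInBackground(string url)
    {
        string result = null;
        using (var done = new ManualResetEvent(false))
        using (var worker = new BackgroundWorker())
        {
            // DoWork runs on a thread-pool thread, so a UI thread would stay responsive.
            worker.DoWork += delegate(object sender, DoWorkEventArgs e)
            {
                e.Result = SlowFetch((string)e.Argument);
            };

            // Check e.Error first: reading e.Result throws if DoWork failed.
            worker.RunWorkerCompleted += delegate(object sender, RunWorkerCompletedEventArgs e)
            {
                result = e.Error == null ? (string)e.Result : "ERROR";
                done.Set();
            };

            worker.RunWorkerAsync(url);
            done.WaitOne(); // console only: a form would just return and let the event fire later
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(FetchInBackground("http://www.mmowned.com/forums/members/apoc.html"));
        // prints "202 point(s) total"
    }
}
```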

(#9)
Apoc
07-10-2008

Quote:
Originally Posted by slack7219
You may use an async call, but there may be cases where you would like your thread to wait for those updates to the UI thread, for god knows whatever reason. On a side note, the BackgroundWorker class is a nice little helper that could be used here. It does the same thing as a normal thread would, but with some bonuses.
And drawbacks on high priority asynch threads. (Which a BackgroundWorker is not)

If you want to talk threading, please make a new thread in these forums. (Please excuse the pun)

