Tip.It Forum


Script I made: Scrape the entire High scores


Hey, with all the discussion going on about bots voting in the Wilderness poll, I thought I would make my own version of a high scores scraper.

I'm a person who loves dealing with lots of data, so I decided to download all 2 million users and their stats from the overall high scores.

Sadly, after about 3 hours I realized this wouldn't work. Because I have to use proxies to get the next users from the high scores, the script runs really slowly: after 2 hours, I only had 3.5k users in my database.

Either way, I thought it would be nice to show you my code.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Parallel::ForkManager;
use DBI;

# Up to 15 child processes fetch player stats in parallel.
my $pm = Parallel::ForkManager->new(15);

# Fetch a URL, optionally through an HTTP proxy.
# Returns the decoded body, or an empty string on failure.
sub get_page {
    my ($page, $proxy) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->timeout(10);
    $ua->agent('Mozilla/5.0');
    $ua->env_proxy;
    if ($proxy) {
        $ua->proxy(['http', 'ftp'], "http://" . $proxy);
    }
    my $response = $ua->get($page);
    return $response->is_success ? $response->decoded_content : '';
}

# Download one player's stats from the lite high scores and store them.
sub get_user_stats {
    my ($user) = @_;
    my $pagedata = get_page("http://hiscore.runescape.com/index_lite.ws?player=$user");
    return unless $pagedata;    # skip players whose stats could not be fetched
    my @values;
    my $i = 0;
    foreach (split /\n/, $pagedata) {
        if ($i <= 25) {
            # Lines 0-25: overall total and skills as rank,level,experience
            my ($rank, $level, $exp) = split /,/;
            ($level, $exp) = (1, 0) if $rank == -1;    # unranked
            push @values, $level, $exp;
        }
        else {
            # Later lines: rank,score
            my ($rank, $score) = split /,/;
            $score = 0 if $rank == -1;
            push @values, $score;
        }
        $i++;
    }
    # Placeholders let DBI quote the username safely.
    my $placeholders = join ',', ('?') x (@values + 1);
    do_query("INSERT INTO highscores VALUES($placeholders)", $user, @values);
}

# Each forked child needs its own connection, so connect per query.
sub do_query {
    my ($query, @bind) = @_;
    my $dbh = DBI->connect("DBI:mysql:database=runescape_stats;host=localhost",
        'trent', 'password', { RaiseError => 1 });
    $dbh->do($query, undef, @bind);
    $dbh->disconnect();
}

do_query("TRUNCATE TABLE highscores");

# Step 1: Get a list of proxies.
my @proxies;
for my $i (1 .. 15) {
    my $n = sprintf('%02d', $i);    # pages are proxy-01 .. proxy-15
    my $page_data = get_page("http://www.samair.ru/proxy/proxy-$n.htm");
    while ($page_data =~ m/<td>(\d+\.\d+\.\d+\.\d+)<script/g) {
        push @proxies, $1;
    }
}

# Step 2: Grab the first user on the overall high scores.
my $list_url = "http://services.runescape.com/m=hiscore/overall.ws?table=0&category_type=0";
my $pagedata = get_page($list_url);
my ($user) = $pagedata =~ m/<a href="hiscorepersonal\.ws\?user1=(.+?)">\1<\/a>/;
my $page = 1;
get_user_stats($user) if $user;

# Step 3: Reload the list with the last user selected; everyone listed
# after the highlighted entry is new.
while ($user && @proxies) {
    $pagedata = '';
    my $attempts = 1;
    while (!$pagedata && @proxies) {
        print "Attempt $attempts of page $page\n";
        my $rand_key = int(rand(@proxies));    # any index 0 .. $#proxies
        $pagedata = get_page("$list_url&user=$user", $proxies[$rand_key]);
        if (!$pagedata) {
            print "Removing proxy $proxies[$rand_key]\n";
            splice(@proxies, $rand_key, 1);
            $attempts++;
            next;
        }
        print "Got pagedata for page $page\n";
        my ($used_users, $new_users) = split(/<a style="color:#F3C334;"/, $pagedata);
        if (!$new_users) {
            print "Removing proxy $proxies[$rand_key] for spamming me...\n";
            splice(@proxies, $rand_key, 1);
            $pagedata = '';    # retry the same page with another proxy
            $attempts++;
            next;
        }
        $user = '';
        while ($new_users =~ m/<a href="hiscorepersonal\.ws\?user1=(.+?)">\1<\/a>/g) {
            $user = $1;
            $pm->start and next;    # parent moves on; the child fetches stats
            print "Getting stats for $user\n";
            get_user_stats($user);
            $pm->finish;
        }
    }
    $page++;
}
$pm->wait_all_children;
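The parsing half of get_user_stats can be pulled out into a pure function, so it can be checked without a network connection or a database. This sketch keeps the script's assumption that lines 0 to 25 of the lite output are rank,level,experience and every later line is rank,score; `parse_lite` is a hypothetical name and the sample input is made up:

```perl
use strict;
use warnings;

# Hypothetical helper: turn index_lite text into the flat list of values
# the script inserts. Assumes lines 0-25 are "rank,level,exp" and later
# lines are "rank,score", mirroring get_user_stats.
sub parse_lite {
    my ($text) = @_;
    my @values;
    my @lines = split /\n/, $text;
    for my $i (0 .. $#lines) {
        my @fields = split /,/, $lines[$i];
        if ($i <= 25) {
            my ($rank, $level, $exp) = @fields;
            ($level, $exp) = (1, 0) if $rank == -1;    # unranked skill
            push @values, $level, $exp;
        }
        else {
            my ($rank, $score) = @fields;
            $score = 0 if $rank == -1;                 # unranked activity
            push @values, $score;
        }
    }
    return @values;
}

# Usage: one ranked line and one unranked line.
my @v = parse_lite("12345,1500,2000000\n-1,-1,-1\n");
print join(',', @v), "\n";    # prints 1500,2000000,1,0
```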



~M

That's pretty cool. I'm no good at Perl or PHP; how are you getting the next user to grab the stats from? I'm going to try this out in C# :P

  • Author


Simple regexes. The first thing you'll notice is that I grab the first user on the list and reload the list with that user selected. That user is highlighted in a different font colour than the rest, effectively splitting the list into two sections: one that I've already processed, and one that I haven't.

Then I do a global regex search using /<a href="hiscorepersonal\.ws\?user1=(.+?)">\1<\/a>/ on the list of users I haven't processed and loop through the results.
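A sketch of that two-step idea run against a made-up snippet of HTML. The markup is invented to match the two patterns the script uses (the `<a style="color:#F3C334;"` highlight and the `hiscorepersonal.ws?user1=` links), so the real page layout may differ; `new_users_after_highlight` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: return the usernames listed after the highlighted
# (already processed) entry on a high scores page.
sub new_users_after_highlight {
    my ($html) = @_;
    # Everything before the highlight has already been processed.
    my (undef, $new_users) = split /<a style="color:#F3C334;"/, $html, 2;
    return () unless defined $new_users;
    my @users;
    # \1 requires the link text to equal the name in the href.
    while ($new_users =~ m/<a href="hiscorepersonal\.ws\?user1=(.+?)">\1<\/a>/g) {
        push @users, $1;
    }
    return @users;
}

my $html = '<a href="hiscorepersonal.ws?user1=Alice">Alice</a>'
         . '<a style="color:#F3C334;" href="hiscorepersonal.ws?user1=Bob">Bob</a>'
         . '<a href="hiscorepersonal.ws?user1=Carol">Carol</a>'
         . '<a href="hiscorepersonal.ws?user1=Dave">Dave</a>';
print join(' ', new_users_after_highlight($html)), "\n";    # prints Carol Dave
```

The highlighted user itself is skipped because the split consumes its `<a style=...` prefix, so the link regex no longer matches it.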

~M

Okay, at this point I'm just grabbing the users and adding them to a database, but with my code below I can process 1,000 users a minute, which means it's going to take 33 hours to get all of the users :P And it will probably double when I have to grab each member's stats. I'm stoked to test this out.
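The 33-hour figure follows from that rate, taking the roughly 2 million players mentioned at the start of the thread:

```latex
\frac{2\,000\,000\ \text{users}}{1\,000\ \text{users/min}} = 2\,000\ \text{min} = \frac{2\,000}{60}\ \text{h} \approx 33.3\ \text{h}
```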

It runs on a separate thread and shows the total number of usernames processed and the time elapsed.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Threading;

namespace RSGrab
{
   public partial class Form1 : Form
   {
       int CurrentPageList = 1;
       int usernamec = 0;
       System.Diagnostics.Stopwatch wat = new System.Diagnostics.Stopwatch();
       public delegate void UpdateForm(string text);
       public Form1()
       {
           InitializeComponent();
       }

       private void kryptonButton1_Click(object sender, EventArgs e)
       {

           wat.Start();
           timer1.Start();
           Thread ms = new Thread(MainStuff);
           ms.Start();


       }
       private void MainStuff()
       {
           DateTime eventtime1 = DateTime.Now;
           while (CurrentPageList < 1000000)
           {
               string aData = getPageSource(getCurrentUrl()); // Grab the page source
               aData = StripTagsCharArray(aData); // Strip out all HTML tags
               aData = aData.Trim(); // Strings are immutable: keep the trimmed copy
               string[] bData = getActualData(aData); // Keep only the 88 table fields
               // Each player takes up 4 consecutive fields and the username is the
               // second one (indexes 1, 5, 9, ...). Only the username is kept here
               // because the full high scores for each member are grabbed later.
               for (int count = 1; count < 88; count += 4)
               {
                   string name = bData[count];
                   ListBox1.Invoke(new UpdateForm(this.AddUserNameXml), new object[] { name });
               }

               CurrentPageList = CurrentPageList + 22; // 22 players per page
           }
           DateTime eventtime2 = DateTime.Now;
           TimeSpan elapsed = eventtime2 - eventtime1; // end minus start, so the span is positive
           MessageBox.Show(elapsed.ToString());
       }

       private void AddUserNameXml(string username)
       {
           usernamec++;
           label2.Text = usernamec.ToString();
       }
       private string[] getActualData(string data)
       {
           // Keep only the non-blank lines of the tag-stripped page, trimmed.
           List<string> lines = new List<string>();
           foreach (string raw in data.Split('\r', '\n'))
           {
               if (raw.Trim() != "")
               {
                   lines.Add(raw.Trim());
               }
           }
           // The player table occupies lines 122 to 209: 88 fields, 4 per player.
           string[] final = new string[88];
           lines.CopyTo(122, final, 0, 88);
           return final;
       }
       private string getCurrentUrl()
       {
           string url = "http://services.runescape.com/m=hiscore/overall.ws?rank=" + CurrentPageList.ToString() + "&table=0&scroll=true&category_type=0";
           return url;
       }
       private string getPageSource(string url)
       {
           // "using" disposes the WebClient even if the download throws.
           using (System.Net.WebClient wb = new System.Net.WebClient())
           {
               return wb.DownloadString(url);
           }
       }
       private static string StripTagsCharArray(string source)
       {
           char[] array = new char[source.Length];
           int arrayIndex = 0;
           bool inside = false;

           for (int i = 0; i < source.Length; i++)
           {
               char let = source[i];
               if (let == '<')
               {
                   inside = true;
                   continue;
               }
               if (let == '>')
               {
                   inside = false;
                   continue;
               }
               if (!inside)
               {
                   array[arrayIndex] = let;
                   arrayIndex++;
               }
           }
           return new string(array, 0, arrayIndex);
       }

       private void timer1_Tick(object sender, EventArgs e)
       {
           label4.Text = Math.Round((decimal)wat.Elapsed.TotalMinutes, 2).ToString();
       }


   }
}
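The field-walking loop in MainStuff boils down to taking every fourth field, starting at the second one. A Perl sketch of the same step, assuming the four-fields-per-player layout the C# code relies on; `names_from_fields` is a hypothetical name, and the field order in the sample (rank, name, level, experience) is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: given the flat list of table fields for one page
# (4 per player, username second), return just the usernames.
sub names_from_fields {
    my @fields = @_;
    my @names;
    for (my $i = 1; $i < @fields; $i += 4) {
        push @names, $fields[$i];
    }
    return @names;
}

# Usage with two made-up players.
my @page = (1, 'Alice', 2475, 1000000, 2, 'Bob', 2470, 900000);
print join(',', names_from_fields(@page)), "\n";    # prints Alice,Bob
```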

  • Author


How are you avoiding being banned from accessing the high scores list? Last time I tried this, I got banned from accessing the page (not the lite high scores).

~M

It's not actually grabbing the high scores yet, just each user, from this page:

http://services.runescape.com/m=hiscore/overall.ws?rank=1&table=0&scroll=true&category_type=0

For every 22 usernames processed, a global variable that starts at 1 goes up by 22, and the link changes accordingly: "http://services.runescape.com/m=hiscore/overall.ws?rank=" + globalvariable + "&table=0&scroll=true&category_type=0", which downloads the next page's source. As for banning, I don't know; I haven't been banned yet, anyway. What I'm going to do is save the usernames into XML databases, starting a new one every 100k, which will leave me with 20 of these 100k databases. Then my new program will be multithreaded (20 threads) and grab the high scores simultaneously, hopefully making this faster.
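The paging scheme described here (start at rank 1, add 22 per page) and the plan to split the names into 100k files for separate workers can both be sketched in Perl. `page_url` and `chunk` are hypothetical helpers: the URL format is copied from the post, and contiguous chunks are just one simple way to hand out the work:

```perl
use strict;
use warnings;

# Hypothetical helper: list URL for the page starting at a given rank,
# built the same way as getCurrentUrl() in the C# code.
sub page_url {
    my ($rank) = @_;
    return "http://services.runescape.com/m=hiscore/overall.ws?rank=$rank"
         . "&table=0&scroll=true&category_type=0";
}

# Hypothetical helper: split a list of names into consecutive chunks of
# $size (e.g. 100_000 names per file, one worker per chunk).
sub chunk {
    my ($size, @names) = @_;
    my @chunks;
    push @chunks, [ splice(@names, 0, $size) ] while @names;
    return @chunks;
}

# First three pages start at ranks 1, 23 and 45.
print page_url($_), "\n" for map { 1 + 22 * $_ } 0 .. 2;
```

With 2 million names and a chunk size of 100,000, `chunk` returns 20 chunks, matching the 20 files and 20 threads planned above.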

I made a mistake when I said it would double: one list page gives me 22 usernames in a single request, but the high scores need one request per user, so it will probably slow down by about 22 times :( and the final database should be around 3 GB.
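A rough check of the 22× estimate, assuming about 2 million players, the list rate of about 1,000 names per minute, 22 names per list page, and that one stats request costs about as much as one list-page request:

```latex
\begin{aligned}
\text{list requests/min} &\approx \frac{1\,000}{22} \approx 45.5 \\
t_{\text{stats}} &\approx \frac{2\,000\,000 \times 22}{1\,000}\ \text{min} = 44\,000\ \text{min} \approx 733\ \text{h} \approx 30.6\ \text{days} \\
t_{\text{stats}} / 20 &\approx 36.7\ \text{h} \quad \text{(20 threads at the same per-thread rate, if the server allows it)}
\end{aligned}
```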

  • Author


Weird that you aren't getting banned. I just did a test, and I was only able to get about 20k users before they banned me and suggested that I use the lite high scores.

~M

Couldn't you continue from where you left off? Give it 10 minutes, then restart at the point you left off?

Yeah, my program gets kicked off at 21,450 users. I'll just wait a while and restart from there.
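Restarting where the program stopped only needs the rank of the next page. A sketch, assuming 22 names per page with ranks starting at 1, as in the C# code; `resume_rank`, `save_checkpoint`, `load_checkpoint` and the checkpoint file are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: first rank of the page containing the next
# unprocessed name, given 22 names per page and ranks starting at 1.
sub resume_rank {
    my ($processed) = @_;
    return 1 + 22 * int($processed / 22);
}

# Hypothetical checkpoint file: store the next rank so a restart can
# pick up where the last run stopped.
sub save_checkpoint {
    my ($file, $rank) = @_;
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} "$rank\n";
    close $fh;
}

sub load_checkpoint {
    my ($file) = @_;
    open my $fh, '<', $file or return 1;    # no checkpoint yet: start at rank 1
    my $rank = <$fh>;
    close $fh;
    return 1 unless defined $rank;
    chomp $rank;
    return $rank;
}

print resume_rank(21450), "\n";    # prints 21451
```

Rounding down to a page boundary means a partly processed page is fetched again, so a few names may be seen twice; those are easy to clean up afterwards.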

  • Author


Yeah, I realized that I could just make it sleep for a while if it doesn't find any users. But I'm on another project now, so that'll happen some other time.
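A sketch of the "sleep for a while if it doesn't find any users" idea, assuming the page fetch and user extraction are wrapped in a code reference that returns the list of new users. `fetch_with_backoff` is a hypothetical helper, and doubling the delay (exponential backoff) is one choice; a fixed pause would also work:

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch until it returns a non-empty list of
# users, sleeping $delay seconds between attempts and doubling the delay
# each time. Gives up after $max_tries attempts.
sub fetch_with_backoff {
    my ($fetch, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my @users = $fetch->();
        return @users if @users;
        last if $try == $max_tries;
        print "No users found (attempt $try), sleeping ${delay}s\n";
        sleep $delay;
        $delay *= 2;
    }
    return ();
}

# Usage with a stub fetcher that finds a user on its second call.
my $n = 0;
my @users = fetch_with_backoff(sub { $n++ ? ('Alice') : () }, 3, 0);
print "@users\n";    # prints Alice
```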

~M

Awesome, thanks for giving me something to do anyway.

You could try collecting the users from each of the skill lists. I'd bet you'd get closer to 3-4 million unique users than the 2 million that the overall list alone provides. Of course you'd have to merge your lists, which may or may not be a problem.
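Merging the per-skill lists is a set union. A minimal Perl sketch that uses a hash to drop duplicates while keeping first-seen order; `merge_unique` is a hypothetical helper, and it compares names exactly as written, so differently capitalised spellings of the same player would count as two names:

```perl
use strict;
use warnings;

# Hypothetical helper: union of several name lists, first occurrence wins.
sub merge_unique {
    my @lists = @_;
    my %seen;
    return grep { !$seen{$_}++ } map { @$_ } @lists;
}

my @attack  = qw(Alice Bob Carol);
my @fishing = qw(Bob Dave);
print join(' ', merge_unique(\@attack, \@fishing)), "\n";    # prints Alice Bob Carol Dave
```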


  • Author


If you work it down to a text list of names for each skill, GNU sort's -u flag removes duplicate values and -o lets you specify an output file.

sort -u [list1] [list2] [list3]... -o mergedlist

I don't foresee any problem with mpm's database; just add all the users and run:

SELECT DISTINCT username FROM highscores;

(or some such)

   $dbh = DBI->connect("DBI:mysql:database=runescape_stats;host=localhost", 'trent', 'password');

I'm on another project

Is it a password generator, by chance? :D

I've made a million password generators :P

Edit: Ohhh, I just got what you meant. That's not my real password :P

~M
