
Database Question



I am currently working on a project that requires me to pull a large amount of information directly off a website. All the info is in a standard format, but it's not delimited into a table or anything. Does anyone know of a program or other method to efficiently catalog this information into a table for me?


Well, it all depends on the source and the amount of data you are trying to collect. The easiest way to gather the data, if one exists, would be an external API that the site provides for exactly this purpose. If that fails, you could use any number of programming or scripting languages to automate the process of accessing and parsing the web pages and formatting the output file. (I'm not sure which programming languages you know, if any, so I'll just suggest searching Google for how to parse HTML in the language of your choice.) If it's a fairly small amount of data, you may save time by just doing it manually.
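To make the "parse and format" step a bit more concrete, here is a minimal Java sketch. It assumes the page text has already been saved locally and that each entry sits on one line in the form `<Department> <Number>. <Title>`. That layout, the class name `CatalogParser`, and the sample lines are all made up for illustration, so the regular expression would need adjusting to the real format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CatalogParser {
    // Hypothetical entry layout: "<Department> <Number>. <Title>", one entry per line.
    private static final Pattern ENTRY =
            Pattern.compile("^([A-Za-z ]+?) (\\d+[a-z]?)\\. (.+)$");

    // Returns one {department, number, title} row per line that matches the layout.
    public static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            Matcher m = ENTRY.matcher(line.trim());
            if (m.matches()) {
                rows.add(new String[] {m.group(1), m.group(2), m.group(3)});
            }
        }
        return rows;
    }

    // Formats the rows as CSV, quoting every field and doubling embedded quotes.
    public static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append('"').append(row[i].replace("\"", "\"\"")).append('"');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample input; lines that do not fit the layout are skipped.
        String sample = "Computer Science 50. Introduction to Computer Science I\n"
                + "some unrelated line\n"
                + "Mathematics 21a. Multivariable Calculus";
        System.out.print(toCsv(parse(sample)));
    }
}
```

The resulting CSV can be opened in a spreadsheet or imported into a database table. Lines that do not match are silently dropped here, so it may be worth logging them while tuning the pattern.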



I need the database for a college schedule builder I'm working on. The website I need to pull from can be found here: http://www.registrar.fas.harvard.edu/fasro/courses/index.jsp?cat=ugrad&subcat=courses (click on each individual department name for the list). Ideally, it would pull down the description, term offered, and other relevant information as listed. No API is available to me.


What programming language are you using? Basically, you should use an XML parser to grab this data. In C#, for instance, you could use LINQ to XML.



I was thinking Java, since I'm proficient in it, but I'm still at a fairly early stage of planning at this point. Note that I don't need the system to dynamically update - the course catalog is only altered infrequently, so manual updates should not be problematic.
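Since Java is the likely choice, the XML-parser suggestion above could be sketched with the JDK's built-in DOM and XPath classes. This is only a rough illustration: the markup (`div class="course"` with `h3`, `span class="term"`, and `p` children) and the class name `CourseXmlParser` are assumptions, not the real structure of the catalog pages. Note also that the JDK parser only accepts well-formed XML, and real-world HTML often is not, so a lenient HTML parser library such as jsoup may turn out to be necessary.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CourseXmlParser {
    // Returns one {title, term, description} row per hypothetical <div class="course"> element.
    public static List<String[]> parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Assumed markup: each course lives in its own div with class "course".
        NodeList nodes = (NodeList) xpath.evaluate(
                "//div[@class='course']", doc, XPathConstants.NODESET);

        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element course = (Element) nodes.item(i);
            // Relative XPath expressions are evaluated against the current course element.
            String title = xpath.evaluate("h3", course).trim();
            String term = xpath.evaluate("span[@class='term']", course).trim();
            String description = xpath.evaluate("p", course).trim();
            rows.add(new String[] {title, term, description});
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // Made-up, well-formed sample page.
        String sample = "<html><body>"
                + "<div class='course'><h3>Computer Science 50</h3>"
                + "<span class='term'>Fall</span><p>An introductory course.</p></div>"
                + "</body></html>";
        for (String[] row : parse(sample)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Because the catalog changes only infrequently, running something like this by hand against saved copies of the department pages would probably be enough, with no scheduled updating needed.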
