Generate XML sitemaps with PHP directly from your site's database

2007-11-12

A simple PHP solution that periodically generates your website's XML sitemap from your database

Most websites' content these days is stored in databases, and for this reason I think it is better to generate your XML sitemap directly from the database.

For small sites that don't use a database, I think XML sitemaps are useless, because such sites have so little content that spiders will find every page anyway. So the approach of crawling a site to build its sitemap is somewhat obsolete.

The purpose of a sitemap is to ensure that all of your website's pages are indexed. This helps especially on big sites with a lot of changing content, where it is hard for spiders to keep up with page changes. Sitemaps also help spiders find and index pages on your site that would otherwise be unreachable.

As I said earlier, it is better to generate your sitemap from the database than by crawling the site. Generating it this way is faster and easier.

My PHP class that generates sitemaps from databases uses ADOdb, so it supports many databases. It even pings google.com, yahoo.com and ask.com when your sitemap changes. Just configure it, add a cron job that runs it at regular intervals to regenerate your sitemap, and watch how your site is indexed better.
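
To make the idea concrete, here is a minimal sketch of turning database rows into sitemap XML. It is not the code of the class itself: the function name buildSitemapXml and the row keys (loc, lastmod, changefreq, priority) are assumptions made for illustration, and the rows are passed in as a plain array so the sketch runs without a database. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper, NOT part of the sitemap class described in this post.
// Each row is an array with 'loc', 'lastmod', 'changefreq' and 'priority' keys,
// as might be produced by looping over a database result set.
function buildSitemapXml(array $rows): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($rows as $row) {
        $xml .= "<url>\n";
        // Escape &, <, > and quotes so the URL stays valid inside XML.
        $xml .= '  <loc>' . htmlspecialchars($row['loc'], ENT_QUOTES | ENT_XML1, 'UTF-8') . "</loc>\n";
        $xml .= '  <lastmod>' . $row['lastmod'] . "</lastmod>\n";
        $xml .= '  <changefreq>' . $row['changefreq'] . "</changefreq>\n";
        $xml .= '  <priority>' . $row['priority'] . "</priority>\n";
        $xml .= "</url>\n";
    }
    return $xml . "</urlset>\n";
}

// Example rows; in practice these would come from a query on your posts table.
$rows = [
    [
        'loc'        => 'http://www.example.com/A-post/?a=1&b=2',
        'lastmod'    => '2007-09-16',
        'changefreq' => 'weekly',
        'priority'   => '0.5',
    ],
];
echo buildSitemapXml($rows);
```

Escaping the URL matters because characters such as & are common in links but are not allowed unescaped in XML.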

And now an example of how I use the XML sitemap class on my blog:

<?php
require_once('sitemap.inc');

//database connection settings
define('DB_TYPE', 'mysqli');
define('DB_SERVER', '');
define('DB_NAME', 'codeassembly');

define('DB_USERNAME', 'root');
define('DB_PASSWORD', '');


//if the url_limit is reached, the remaining links are written to a new file
//google has a limit of 50,000 links per sitemap file, so this is set for compatibility
define('url_limit', 50000);


//this is the character that will replace all white space from the links, if you don't want this just put a space as the character
define('white_space', '-');

//if true the sitemaps are compressed, this speeds up sitemap downloading and reduces bandwidth usage
define('sitemap_compress', true);

//initiate a new sitemap object
$sitemap = new sitemap('sitemap','http://www.codeassembly.com');

//add a table from which to generate the links; the arguments are, in order:

//1. sql query that extracts the data used to generate the links to the posts
//2. url format, %s will be replaced with the title of each record from the database
//3. database field name that will be used to replace the "%s" in the url
//4. page change interval, one of: always, hourly, daily, weekly, monthly, yearly, never
//5. importance of the pages, useful if you have other tables from which to generate links and you want them to have different importance; the sitemap protocol allows values from 0.0 to 1.0
//6. update date, either a fixed date such as date('Y-m-d') or the name of a database field that contains the page modification date

$sitemap -> addtable(
    'SELECT title,update_date FROM posts ORDER BY update_date DESC',
    'http://www.codeassembly.com/%s/',
    array('title'),
    'weekly',
    '0.5',
    'update_date'
);

//this is useless here because there is a small number of pages and they will get indexed without the help of the sitemap
//but I use it as an example to show that it is possible to have multiple tables from which to generate links
$sitemap -> addtable(
    'SELECT name FROM categories ORDER BY id DESC',
    'http://www.codeassembly.com/%s/',
    array('name'),
    'weekly',
    '0.2',
    date('Y-m-d')
);

//generate the sitemap
echo $sitemap -> process();

//ping search engines
$sitemap -> ping();
// you can email the results returned by this function or you can log them to a file
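
The url format, white_space and url_limit settings used above can be pictured with the following sketch. The helper names buildUrl and splitIntoFiles are hypothetical and the real class may implement these steps differently; this only illustrates the behaviour described in the comments, assuming PHP 7 or later.

```php
<?php
// Hypothetical helpers illustrating the url format, white_space and url_limit
// settings; the real sitemap class may implement these differently.

// Fill each %s in $format with the named fields of $row,
// replacing every white space character with $whiteSpace.
function buildUrl(string $format, array $fields, array $row, string $whiteSpace = '-'): string
{
    $values = [];
    foreach ($fields as $field) {
        $values[] = preg_replace('/\s/', $whiteSpace, trim((string) $row[$field]));
    }
    return vsprintf($format, $values);
}

// Split a list of URLs into groups of at most $limit, one group per sitemap file.
function splitIntoFiles(array $urls, int $limit = 50000): array
{
    return array_chunk($urls, $limit);
}

$row = ['title' => 'A different approach to page caching'];
echo buildUrl('http://www.codeassembly.com/%s/', ['title'], $row), "\n";
// prints http://www.codeassembly.com/A-different-approach-to-page-caching/
```

With a limit of 50,000, a table of 120,000 posts would be split into three groups, each of which would go into its own sitemap file.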

A sample from the generated sitemap code

<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
  <loc>http://www.codeassembly.com/A-different-approach-to-page-caching/</loc>
  <changefreq>weekly</changefreq>
  <priority>0.5</priority>
  <lastmod>2007-09-16</lastmod>
</url>
<url>
  <loc>http://www.codeassembly.com/Implementing-basic-searching-for-your-website/</loc>
  <changefreq>weekly</changefreq>
  <priority>0.5</priority>
  <lastmod>2007-09-11</lastmod>
</url>
<url>
  <loc>http://www.codeassembly.com/CSS/</loc>
  <changefreq>weekly</changefreq>
  <priority>0.2</priority>
  <lastmod>2007-11-13</lastmod>
</url>
<url>
  <loc>http://www.codeassembly.com/Javascript/</loc>
  <changefreq>weekly</changefreq>
  <priority>0.2</priority>
  <lastmod>2007-11-13</lastmod>
</url>
<url>
  <loc>http://www.codeassembly.com/Mysql/</loc>
  <changefreq>weekly</changefreq>
  <priority>0.2</priority>
  <lastmod>2007-11-13</lastmod>
</url>
<url>
  <loc>http://www.codeassembly.com/Php/</loc>
  <changefreq>weekly</changefreq>
  <priority>0.2</priority>
  <lastmod>2007-11-13</lastmod>
</url>
</urlset>

For more information about the sitemap specification, see the XML sitemaps protocol at sitemaps.org.

Download the php xml sitemap generator class and all files from this example

Tip 1 You can put a Sitemap: line with the full URL of your sitemap (for example, Sitemap: http://www.example.com/sitemap.xml) in your robots.txt, so that search engines that support XML sitemaps can auto-discover it

Tip 2 Just add a cron job to run the php script periodically to keep your sitemap up to date
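
Ping addresses can change or start refusing requests, so a failed ping should not stop the sitemap from being generated. The following is a minimal sketch of a tolerant ping loop, not the ping() method of the class: the function name pingSearchEngines, the endpoint format (a URL ending in a query parameter that takes the sitemap address) and the example.com addresses are all assumptions. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper, NOT the class's ping() method. $endpoints maps a label
// to a ping URL ending in a query parameter that expects the sitemap address;
// supply whatever addresses the search engines currently document.
function pingSearchEngines(array $endpoints, string $sitemapUrl, ?callable $fetch = null): array
{
    // Default fetcher: plain HTTP GET. file_get_contents() returns false and
    // raises a warning on failure (it does not throw an exception), so the
    // warning is silenced and the return value is checked instead.
    $fetch = $fetch ?? function (string $url) {
        return @file_get_contents($url);
    };

    $results = [];
    foreach ($endpoints as $name => $endpoint) {
        $response = $fetch($endpoint . urlencode($sitemapUrl));
        $results[$name] = ($response !== false);
    }
    return $results; // true = ping succeeded, false = ping failed
}

// Usage with a stub fetcher that simulates one refused ping (no network needed).
$stub = function (string $url) {
    return strpos($url, 'refused') !== false ? false : 'OK';
};
$results = pingSearchEngines(
    [
        'good' => 'http://good.example.com/ping?sitemap=',
        'bad'  => 'http://refused.example.com/ping?sitemap=',
    ],
    'http://www.example.com/sitemap.xml',
    $stub
);
var_dump($results);
```

The returned array can be logged or emailed, and the script carries on even when an individual ping fails.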

Comments

David Bradley

That looks far too difficult for my puny php brain. Much simpler (as I'm using Wordpress) is to use the sitemap plugin, which does it all with a couple of clicks.

db

Posted on 2007-11-13 02:22:19
Agus Halim

I agree with David, the code is too complex.
I got this error message:
Parse error: parse error, unexpected T_RETURN, expecting T_OLD_FUNCTION or T_FUNCTION or T_VAR or '}' in sitemap.inc on line 274
I have already set chmod to 777.

Posted on 2008-03-21 14:02:43
miCRoSCoPiC^eaRthLinG

I got the same error too.. i.e. "Parse error: parse error, unexpected T_RETURN, expecting T_OLD_FUNCTION or T_FUNCTION or T_VAR or '}' in sitemap.inc on line 274".

If you look into the code, you'll find a misplaced "return" statement right before the closing brace '}' of the class. Commenting this out will make it work. Most likely the author didn't attach a tested copy of the class here.

After fixing the error above, I encountered a second one. The ping() function makes several calls to the file_get_contents() function, but on line 132 it is written as contents(), causing PHP to throw an error and halt. Changing that should help too.

Moreover, the URLs http://submissions.ask.com/ping and http://search.yahooapis.com/SiteExplorerService/V1/ping throw 403 Forbidden errors. Most likely the ping addresses have changed and/or now require some sort of login. It'd be wise to enclose this section in a try-catch block so as not to break your code.

Cheers,
m^e

Posted on 2008-05-01 14:23:10
miCRoSCoPiC^eaRthLinG

Forgot to thank the author :D Apart from those minor glitches this class works just fine. I got it integrated into my custom CMS in no time at all :)

Cheers,
m^e

Posted on 2008-01-10 05:21:21
miCRoSCoPiC^eaRthLinG

Got a question here for the author.

Scenario
---------
I'm using .htaccess to redirect all page requests to my script, which then parses the pretty URLs and loads the correct page. Now some of these pretty URLs are generated out of concatenating 2 different fields, i.e. a 6 digit ID and a content title.

The statement goes like...
-----------------------------------
SELECT CONCAT( id, '-', title_nicename ) AS content_title, list_date AS update_date FROM tbl_content, tbl_content_meta WHERE tbl_content.id = tbl_content_meta.id ORDER BY tbl_content.list_date DESC;
-----------------------------------

As you can see, the concatenated result is returned as a pseudo field named content_title, which isn't physically present in any of the tables.

When I pass this statement to your sitemap class, I keep getting this error message:
-----------------------------------
Fatal error: Call to a member function FetchRow() on a non-object in E:\xampp\htdocs\site\core\classes\sitemap\sitemap.php on line 185
-----------------------------------

In contrast, generating a sitemap from hard-coded field names presents no problem at all. So the reason must be that the sitemap class is running into some difficulty reading the value out of this dynamically generated field.

Any ideas how to get this thing working?

Thanks,
m^e

Posted on 2008-03-21 14:02:43
CodeAssembly

Does your query return any rows? The FetchRow() function is part of ADODB, so it could be an ADODB bug; try updating ADODB.
Also try testing your query in phpMyAdmin or another tool to see if it runs fine.

Posted on 2008-03-08 05:01:50
miCRoSCoPiC^eaRthLinG

Hey there... yep, I believe it's an ADODB bug. I traced the FetchRow() function to ADODB... but couldn't locate the cause. And yes, the query does return rows. My knowledge of MySQL statements falls in the "intermediate" region... so to avoid any problems while using them through PHP, I always make sure a statement is wholly functional by testing it first in phpMyAdmin and SQLyog. There's nothing wrong with the statement itself... anyway, I'll try the latest ADODB release and get back to you if it works out...

Thanks,
m^e

Posted on 2008-03-21 14:02:43
Jeff

After fixing the errors that m^e noted above, I still couldn't get the script to work. ADODB is good, but it appears there are still errors in the script.

Anyone else have luck?

Posted on 2008-05-29 11:43:35

© Copyright CodeAssembly

All code is licensed under LGPL, unless otherwise noted