Creating A BBCode System

posted in Rarely Spoken
Published September 26, 2005
I haven't been coding much lately. Unfortunately, school burns away every desire I have to work on my own projects. This piece of code is actually one of the last things I worked on; I finished it last month.

Creating a BBCode System In PHP


Earlier posts were about the creation of a CMS and blogging system using PHP. One of the original goals of the CMS was to output well-formed XHTML and one of the requirements for the blog was to allow user comments. Unfortunately these two ideals can come into opposition: on the one hand I want to give my visitors a way of expressing themselves in their posts by creating links, defining emphasis, etc. but on the other I need to make sure that the code that they input does not break my page's well-formedness. To rectify this problem I've decided to disable XHTML input entirely and have instead replaced it with a BBCode-like system which I have called Blog Code.

Blog Code Syntax


For the sake of familiarity I've decided that Blog Code syntax would be similar to HTML syntax. However, instead of delimiting tags with < and > I will use [ and ] to match other BBCode systems.

Blog Code will consist of different tags, each of which has zero or more attributes and some type of content. A sample tag to display an image might look like

[img alt="This is my alt text"]http://www.example.com/img.png[/img]


where 'alt' is an attribute with a value of "This is my alt text" and 'http://www.example.com/img.png' is the content of the tag.

Further, because I would like to automatically wrap my paragraphs in XHTML's p tag, I need to define which tags can exist in a paragraph. This gives me two types of tags: inline tags, which can exist in a paragraph, and block tags, which cannot.

Implementation


In order to help me manage the different types of tags that exist I've created a class, BlogCode, which will allow me to register new tags, find paragraphs, and apply all or a subset of the registered tags.

Tag Registration


When a tag is registered I simply save the name of the tag and a callback function which will handle the tag to an array within the BlogCode class.

class BlogCode {
   var $taginfo = array();

   // Registers a tag name with a callback to handle this tag
   function RegisterCode($tagname, $callback, $display = 'inline') {
      // The tag name must be just alphanumeric
      if (preg_match('/^\w+$/', $tagname) == false)
         return false;

      $this->taginfo[$tagname] = array($callback, $display);
   }
}


I've restricted my tag names to alphanumerics (strictly speaking, \w also lets the underscore through). A name must pass this validation before its tag can be registered, which ensures no wacky tags make it through.
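As a small aside, here is a sketch of how the name check behaves at the edges. The helper names below are hypothetical and not part of BlogCode; the stricter pattern is just one possible tightening, assuming you want letters and digits only.

```php
<?php
// Hypothetical helper (not in the original class): a stricter name check.
function is_strict_tag_name($tagname) {
    // [A-Za-z0-9] excludes the underscore that \w would allow;
    // \z anchors at the very end, whereas $ also matches before a final newline.
    return preg_match('/^[A-Za-z0-9]+\z/', $tagname) === 1;
}

// The pattern used by RegisterCode(), wrapped for comparison
function is_word_tag_name($tagname) {
    return preg_match('/^\w+$/', $tagname) === 1;
}

var_dump(is_word_tag_name('my_tag'));    // bool(true): underscore is a "word" character
var_dump(is_word_tag_name("em\n"));      // bool(true): $ matches before the trailing newline
var_dump(is_strict_tag_name('my_tag'));  // bool(false)
var_dump(is_strict_tag_name("em\n"));    // bool(false)
var_dump(is_strict_tag_name('em'));      // bool(true)
```

Whether the looser check matters depends on where the tag names come from; for names hard-coded in your own PHP file it is mostly a curiosity.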

Tag Application


To actually apply the tags we must be able to search for them. I've come up with this regex to hunt down any tag

/\[(\w+)( [^\]]*)?\](.*)\[\/\\1\]/sU


This describes a tag which begins with a '[' immediately followed by one or more alphanumeric characters (the tag's name). Optionally, a space may follow, and after it anything other than a ']' (the tag's attributes). Next comes a ']', then any content, and finally '[/' followed by the tag's name and ']'. The doubled backslash in \\1 is there because the pattern lives in a double-quoted PHP string; the regex engine itself sees the backreference \1. The U modifier makes the search non-greedy, so a tag ends at the first instance of its ending tag rather than the last. That is, this

[em]Test test test[/em] bla bla bla test test test[/em]


should be rendered as (shown here as the generated markup)

<em>Test test test</em> bla bla bla test test test[/em]

and not as

<em>Test test test[/em] bla bla bla test test test</em>

To make a regex which will find only select tags, we simply need to restrict the tag names it accepts. To do this I replace (\w+) with (tagname1|tagname2|...).
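To see the non-greedy match and the restricted name list in action, here is a minimal sketch. build_tag_pattern() is a hypothetical helper written for this example, not a member of BlogCode; it assumes the names are passed as a plain array.

```php
<?php
// Build a tag-matching pattern for a fixed list of tag names.
// build_tag_pattern() is a hypothetical helper, not part of the original class.
function build_tag_pattern(array $names) {
    // preg_quote() escapes any regex metacharacters in each name
    $quoted = array_map(function ($name) {
        return preg_quote($name, '/');
    }, $names);

    // Single-quoted string, so \1 reaches the regex engine unchanged
    return '/\[(' . implode('|', $quoted) . ')( [^\]]*)?\](.*)\[\/\1\]/sU';
}

$pattern = build_tag_pattern(array('em', 'strong'));
$input   = '[em]Test test test[/em] bla bla bla test test test[/em]';

preg_match($pattern, $input, $m);

// $m[1] is the tag name, $m[3] the content up to the FIRST closing tag
echo $m[1], ': ', $m[3], "\n";   // prints "em: Test test test"
```

Because the name list is closed, a tag such as [link]...[/link] is simply not matched by this pattern and stays in the text as-is.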

Handling The Attributes


Notice from the above that we don't really care what is in the attributes section of each tag. However, it would be nice if we could take these attributes and convert them into name/value pairs for easy handling. To do this we'll define a new syntax for the attributes.

Each attribute will have a name composed of only alphanumerics. The values of the attributes must all be quoted to make parsing simpler and must not contain either '"' or ']', again to make parsing simpler. This is the regex I've come up with

/(\w+)="([^"\]]+)"/


So, a valid attribute would look like

name="value"
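Here is a quick sketch of that attribute regex at work. parse_attributes() is a hypothetical standalone function for illustration; it uses PREG_SET_ORDER so each match arrives as one name/value group.

```php
<?php
// parse_attributes() is a hypothetical standalone illustration of the regex above
function parse_attributes($attrs) {
    $result = array();

    // With PREG_SET_ORDER each element of $matches is one complete match:
    // [0] => whole attribute, [1] => name, [2] => value
    preg_match_all('/(\w+)="([^"\]]+)"/', $attrs, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        $result[$match[1]] = $match[2];
    }

    return $result;
}

$parsed = parse_attributes(' alt="This is my alt text" title="Example"');
print_r($parsed);

// An empty value is skipped, because the pattern requires at least one character
print_r(parse_attributes(' alt=""'));
```

Note the consequence of the + in the value group: alt="" does not count as an attribute at all, so a handler will see it as missing rather than empty.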

Coding It In PHP


This is the implementation I have for tag application and attribute parsing

global $blog_code;

class BlogCode {
   var $taginfo = array();

   // Registers a tag name with a callback to handle this tag
   function RegisterCode($tagname, $callback, $display = 'inline') {
      // The tag name must be just alphanumeric
      if (preg_match('/^\w+$/', $tagname) == false)
         return false;

      $this->taginfo[$tagname] = array($callback, $display);
   }

   // Apply these tags on a body of text
   function ApplyCodes($str) {
      return preg_replace_callback("/\[(\w+)( [^\]]*)?\](.*)\[\/\\1\]/sU",
                                   'BlogCodeCallback', $str);
   }

   // Applies only a select subset of the tags
   function ApplySelectCodes($str, $tags) {
      $namelist = array_shift($tags);

      foreach ($tags as $tag)
         $namelist .= "|$tag";

      return preg_replace_callback("/\[($namelist)( [^\]]*)?\](.*)\[\/\\1\]/sU",
                                   'BlogCodeCallback', $str);
   }

   // Breaks up the attribute list into an associative array where the
   // attribute names are the keys and the associated values are the values
   function ConvertToArray($attrs) {
      $attrarray = array();

      preg_match_all('/(\w+)="([^"\]]+)"/', $attrs, $matches);

      $keys = $matches[1];
      $vals = $matches[2];

      for ($i = 0; $i < count($keys); $i++)
         $attrarray[$keys[$i]] = $vals[$i];

      return $attrarray;
   }
}

$blog_code = new BlogCode;

// Routes each match to the callback associated with it
function BlogCodeCallback($matches) {
   global $blog_code;

   $tagname = $matches[1];
   $attrs   = $blog_code->ConvertToArray($matches[2]);
   $content = $matches[3];

   if (isset($blog_code->taginfo[$tagname])) {
      $str = call_user_func($blog_code->taginfo[$tagname][0], $tagname,
                            $attrs, $content);

      if ($str !== false)
         return $str;
   }

   return $matches[0];
}


The new bits of code are the application functions, the conversion of the attributes to an associative array, and the main tag callback.

The application functions, ApplyCodes() and ApplySelectCodes(), work exactly as I described above. They search for the tags or a subset of the tags within a string. When a tag is found, preg_replace_callback() calls BlogCodeCallback(), which grabs the tag name, attributes, and content of the tag before sending those values to the callback associated with that tag. If that callback returns false, or no callback is registered for the tag, the matched text is left unchanged.

ConvertToArray() takes in an attribute string and searches for attribute/value pairs and turns them into an associative array.

Finally, since BlogCode is the only entity that should manage the tags I've made a global, $blog_code, which will act as the one instance of this class. I wrote BlogCodeCallback as a non-member function, so it reaches the instance through this global.
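As an alternative sketch, current PHP versions let preg_replace_callback() take any callable, including the array($object, 'method') form, which would remove the need for the global. TagRouter and em_handler below are hypothetical names for this illustration, a trimmed-down stand-in rather than the BlogCode class itself (attribute handling is left out to keep it short).

```php
<?php
// TagRouter is a hypothetical, trimmed-down stand-in for BlogCode
class TagRouter {
    private $handlers = array();

    public function register($tagname, $callback) {
        $this->handlers[$tagname] = $callback;
    }

    public function apply($str) {
        // array($this, 'route') is a callable referring to a method on this object
        return preg_replace_callback('/\[(\w+)( [^\]]*)?\](.*)\[\/\1\]/sU',
                                     array($this, 'route'), $str);
    }

    // Must be public so the callable can be invoked from outside the class
    public function route($matches) {
        $tagname = $matches[1];

        if (isset($this->handlers[$tagname])) {
            return call_user_func($this->handlers[$tagname], $tagname, $matches[3]);
        }

        return $matches[0]; // unknown tag: leave the text untouched
    }
}

function em_handler($tagname, $content) {
    return "<$tagname>$content</$tagname>";
}

$router = new TagRouter();
$router->register('em', 'em_handler');

$result = $router->apply('Some [em]emphasis[/em] and an [unknown]tag[/unknown].');
echo $result, "\n";
// prints: Some <em>emphasis</em> and an [unknown]tag[/unknown].
```

The trade-off is that handlers which want to recurse (as the inline handlers do) need a reference to the router passed in some other way, for example as an extra argument.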

Finding Paragraphs


By far, the most difficult part of this project was finding out where paragraphs begin and end. After many different tries I came up with this:

It's easy to see paragraphs being broken up by two newlines but that cannot account for block level tags. My solution was to first split the text along block level tags and then apply the simplistic view of what separates paragraphs. Finally, I paste the different sections of text back together using the block level tags that were originally between them.

Add this member to BlogCode

// Finds each paragraph and wraps it in <p> tags
function BlockParagraphs($str) {
   $blocktags = array();

   foreach ($this->taginfo as $tagname => $value) {
      if ($value[1] != 'inline')
         array_push($blocktags, $tagname);
   }

   $namelist = array_shift($blocktags);

   foreach ($blocktags as $blocktag)
      $namelist .= "|$blocktag";

   // Break up the text into array elements separated by block-level tags
   $pararray = preg_split("/\[($namelist)(?: [^\]]*)?\].*\[\/\\1\]/sU",
                          $str);

   // Capture all of the block level tags (in order)
   preg_match_all("/\[($namelist)(?: [^\]]*)?\].*\[\/\\1\]/sU", $str,
                  $delims);

   // Ignore the names of the tags
   $delims = $delims[0];

   // Wrap each paragraph in the text with p tags
   $paras = preg_replace("/(.+)(?:\n\n|\r\r|\r\n\r\n|$)/sU", "<p>\\1</p>",
                         $pararray);

   // Remove empty p tags
   $paras = preg_replace("/<p>\s*<\/p>/U", '', $paras);

   $output = '';

   // Interleave the block-level elements into the text
   foreach ($paras as $para)
      $output .= $para . array_shift($delims);

   // Put the rest of the block-level elements into the text
   $output .= implode('', $delims);

   return $output;
}


With the above code, BlockParagraphs() must be called on a piece of text before any of the tags are applied, because it looks for the block-level tags in their original bracketed form.
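The split, wrap, and re-join idea can also be sketched on its own. This is a hypothetical standalone variant, not the member function above: wrap_paragraphs() and wrap_text() are made-up names, the block tag list is passed in directly, and line endings are normalized first so that a single Windows newline is never mistaken for a paragraph break.

```php
<?php
// Wraps each blank-line-separated chunk of plain text in <p> tags.
function wrap_text($text) {
    // Normalize CRLF and lone CR to LF so only \n needs handling below
    $text = str_replace(array("\r\n", "\r"), "\n", $text);
    $out  = '';

    foreach (preg_split('/\n{2,}/', $text) as $para) {
        $para = trim($para);
        if ($para !== '') {          // skip empty paragraphs entirely
            $out .= "<p>$para</p>";
        }
    }

    return $out;
}

// Hypothetical standalone variant of BlockParagraphs().
// $blocktags lists the tag names to treat as block-level.
function wrap_paragraphs($str, array $blocktags) {
    if (count($blocktags) === 0) {
        return wrap_text($str);
    }

    $names = implode('|', array_map(function ($name) {
        return preg_quote($name, '/');
    }, $blocktags));

    // Find every block-level tag together with its byte offset
    preg_match_all("/\[($names)(?: [^\]]*)?\].*\[\/\\1\]/sU", $str, $m,
                   PREG_OFFSET_CAPTURE);

    $out = '';
    $pos = 0;

    foreach ($m[0] as $hit) {
        list($block, $offset) = $hit;
        // Wrap the text before this block, then copy the block through untouched
        $out .= wrap_text(substr($str, $pos, $offset - $pos)) . $block;
        $pos  = $offset + strlen($block);
    }

    // Whatever follows the last block-level tag
    return $out . wrap_text(substr($str, $pos));
}

$input  = "First paragraph.\n\nSecond one.[code]x = 1;[/code]Third.";
$output = wrap_paragraphs($input, array('code', 'img'));
echo $output, "\n";
// prints: <p>First paragraph.</p><p>Second one.</p>[code]x = 1;[/code]<p>Third.</p>
```

The block tags come out still in bracket form, so the tag application step runs afterwards, in the same order described above.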

The Final Code


This is my final PHP file, with some sample tag definitions at the bottom

<?php
   /************************************************************************
    *
    *    Title:   BlogCode Class
    *    Author:  Colin Jeanne (http://colinjeanne.net)
    *    Date:    August 23, 2005
    *
    *    Description:
    *       A class that represents a BBCode-like formatter
    *
    ************************************************************************/

   global $blog_code;

   class BlogCode {
      var $taginfo = array();

      // Registers a tag name with a callback to handle this tag
      function RegisterCode($tagname, $callback, $display = 'inline') {
         // The tag name must be just alphanumeric
         if (preg_match('/^\w+$/', $tagname) == false)
            return false;

         $this->taginfo[$tagname] = array($callback, $display);
      }

      // Finds each paragraph and wraps it in <p> tags
      function BlockParagraphs($str) {
         $blocktags = array();

         foreach ($this->taginfo as $tagname => $value) {
            if ($value[1] != 'inline')
               array_push($blocktags, $tagname);
         }

         //XXX There must be a better way to do this
         $namelist = array_shift($blocktags);

         foreach ($blocktags as $blocktag)
            $namelist .= "|$blocktag";

         // Break up the text into array elements separated by block-level tags
         $pararray = preg_split("/\[($namelist)(?: [^\]]*)?\].*\[\/\\1\]/sU",
                                $str);

         // Capture all of the block level tags (in order)
         preg_match_all("/\[($namelist)(?: [^\]]*)?\].*\[\/\\1\]/sU", $str,
                        $delims);

         // Ignore the names of the tags
         $delims = $delims[0];

         // Wrap each paragraph in the text with p tags
         $paras = preg_replace("/(.+)(?:\n\n|\r\r|\r\n\r\n|$)/sU", "<p>\\1</p>",
                               $pararray);

         // Remove empty p tags
         $paras = preg_replace("/<p>\s*<\/p>/U", '', $paras);

         $output = '';

         // Interleave the block-level elements into the text
         foreach ($paras as $para)
            $output .= $para . array_shift($delims);

         // Put the rest of the block-level elements into the text
         $output .= implode('', $delims);

         return $output;
      }

      // Apply these tags on a body of text
      function ApplyCodes($str) {
         return preg_replace_callback("/\[(\w+)( [^\]]*)?\](.*)\[\/\\1\]/sU",
                                      'BlogCodeCallback', $str);
      }

      // Applies only a select subset of the tags
      function ApplySelectCodes($str, $tags) {
         //XXX There must be a better way to do this
         $namelist = array_shift($tags);

         foreach ($tags as $tag)
            $namelist .= "|$tag";

         return preg_replace_callback("/\[($namelist)( [^\]]*)?\](.*)\[\/\\1\]/sU",
                                      'BlogCodeCallback', $str);
      }

      // Breaks up the attribute list into an associative array where the
      // attribute names are the keys and the associated values are the values
      function ConvertToArray($attrs) {
         $attrarray = array();

         preg_match_all('/(\w+)="([^"\]]+)"/', $attrs, $matches);

         $keys = $matches[1];
         $vals = $matches[2];

         for ($i = 0; $i < count($keys); $i++)
            $attrarray[$keys[$i]] = $vals[$i];

         return $attrarray;
      }
   }

   $blog_code = new BlogCode;

   // Routes each match to the callback associated with it
   function BlogCodeCallback($matches) {
      global $blog_code;

      $tagname = $matches[1];
      $attrs   = $blog_code->ConvertToArray($matches[2]);
      $content = $matches[3];

      if (isset($blog_code->taginfo[$tagname])) {
         $str = call_user_func($blog_code->taginfo[$tagname][0], $tagname,
                               $attrs, $content);

         if ($str !== false)
            return $str;
      }

      return $matches[0];
   }

   // Wraps the content in an XHTML element of the same name as the tag
   function inlineCallback($tagname, $attrs, $content) {
      global $blog_code;

      return "<$tagname>" .
               $blog_code->ApplySelectCodes($content,
                                            array('strong', 'em', 'link')) .
             "</$tagname>";
   }

   $blog_code->RegisterCode('strong', 'inlineCallback');
   $blog_code->RegisterCode('em', 'inlineCallback');

   // The tag content is the image URL; alt is an optional attribute
   function imgCallback($tagname, $attrs, $content) {
      if (isset($attrs['alt']))
         return '<img alt="' . $attrs['alt'] . '" src="' . $content . '" />';
      else
         return "<img src=\"$content\" />";
   }

   $blog_code->RegisterCode('img', 'imgCallback', 'block');

   // With an href attribute the content is the link text;
   // without one the content doubles as the URL
   function linkCallback($tagname, $attrs, $content) {
      global $blog_code;

      if (isset($attrs['href'])) {
         return '<a href="' . $attrs['href'] . '">' .
                $blog_code->ApplySelectCodes($content,
                                             array('strong', 'em', 'img')) .
                '</a>';
      } else {
         return "<a href=\"$content\">$content</a>";
      }
   }

   $blog_code->RegisterCode('link', 'linkCallback');

   // The element name is assumed to be pre; it did not survive in the
   // published copy of this listing
   function codeCallback($tagname, $attrs, $content) {
      return '<pre class="blog_code">' . htmlentities($content) . '</pre>';
   }

   $blog_code->RegisterCode('code', 'codeCallback', 'block');
?>