WordPress 2.6 and Captions

Last week, The Open Planning Project premiered a new blog called Gotham Schools.  For the blogging platform, we decided to give the newly released WordPress 2.6 a go.  The new feature that excited us the most was the ability to add captions to images.  Soon, however, we realized that WordPress/TinyMCE strips any HTML that is added to a caption.

I set out to fix this, and with the help of a couple of coworkers, I think I got something that works (change set). After the jump I have an overview of the problem and how it was fixed.

When you insert an image with a caption into a blog post in WordPress 2.6, markup like this is generated and put into the “Visual” tab of your TinyMCE editor:

<div class="mceTemp">
  <dl id="attachment_104" class="wp-caption alignnone" style="width: 310px;">
    <dt class="wp-caption-dt">
       <a mce_href="http://host/image.jpg" href="http://host/image.jpg">
          <img class="size-medium wp-image-104" width="300" height="240" alt="Test Caption"
               mce_src="http://host/imagethumb.jpg" src="http://host/imagethumb.jpg"
               title="img00005"/>
       </a>
    </dt>
    <dd class="wp-caption-dd">
      Test Caption
    </dd>
  </dl>
</div>

When you click on the “HTML” tab of the TinyMCE editor, JavaScript converts this code to something like this:

〈caption id="attachment_104" align="alignnone" width="300" caption="Test Caption"〉<a
href="http://host/image.jpg"><img class="size-medium wp-image-104" title="img00005"
src="http://host/image.jpg" alt="Test Caption" width="300" height="240" /></a>〈/caption〉

WordPress calls the latter of these a “short code”. It is the short code itself that is stored in the database, not the markup. When you switch back and forth between the “Visual” and “HTML” tabs, WordPress converts to and from the short code.

The first change we have to make to allow markup in captions is in the file /wp-includes/js/tinymce/plugins/wpeditimage/editor_plugin.js, in the function “_get_shcode”.  This is the JavaScript function that turns markup into a short code.  Let's look at line 120 in this file:

cap = cap.replace(/<\S[^<>]*>/gi, '').replace(/'/g, '&#39;').replace(/"/g, '&quot;');

The first “replace” in this line of code gets rid of anything between a less-than sign (<) and a greater-than sign (>).  The second and third “replace” turn single quotes (‘) and double quotes (“) into their respective HTML entities.  These three replaces together effectively remove all markup from our caption.  For this reason we have to comment the line out and replace it with this line:

cap = cap.replace(/(['"])/g, '\\$1');

This line escapes all of the single and double quotes in our caption (it turns ‘ into \’ and ” into \”).  Later on in this post, it will be obvious why we have to do this.
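
To see the difference between the two lines, here is a small standalone sketch that can be run outside WordPress (in Node, for example). The function names “stripCaption” and “escapeCaption” are made up for this illustration; only the two “replace” chains come from editor_plugin.js.

```javascript
// Original behavior: drop tags, then turn quotes into HTML entities.
function stripCaption(cap) {
  return cap.replace(/<\S[^<>]*>/gi, '').replace(/'/g, '&#39;').replace(/"/g, '&quot;');
}

// Replacement behavior: keep the markup, backslash-escape the quotes.
function escapeCaption(cap) {
  return cap.replace(/(['"])/g, '\\$1');
}

const sample = 'Test <a href="test.com">Test</a>';
console.log(stripCaption(sample));  // Test Test
console.log(escapeCaption(sample)); // Test <a href=\"test.com\">Test</a>
```

With the original line the link disappears entirely; with the replacement the link survives and only its quotes are escaped.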

The next change we have to make is in line 91 of the same file.  This line is in the “_do_shcode” function.  This function is responsible for turning the short code into the markup described earlier.  Line 91 is as follows:

b = b.replace(/\\'|\\&#39;|\\&#039;/g, '&#39;').replace(/\\"|\\&quot;/g, '&quot;');

This line basically undoes the escaping that we just added.  Let's comment it out.  The reason we want to leave the escaping in is that WordPress has to be certain where a short code attribute ends.  For example, if we had:

〈caption id="test" caption="test <a href="test.com">Test</a>"〉

WordPress would think that the caption was “test <a href=”.  However if we escape our quotes, we have something like this:

〈caption id="test" caption="test <a href=\"test.com\">Test</a>"〉

If we tell WordPress to ignore \” and \’ and only look for ” and ‘ when looking for the end of a short code attribute, it can be certain where the caption attribute ends.  So let's make that change to the second half of line 94 of the same file.  Change this:

cap = b.match(/caption=['"]([^'"]+)/i);

to this:

cap = b.match(/caption=['"]((?:[^'"\\]+(?:\\[\\'"])*)+)/i);

This change allows the caption to have backslashed single and double quotes in it.  Now that we have the caption, with its markup intact, we can get rid of the backslashes before the short code is turned back into markup.  To do this, let's add the following line (see the changeset for exact location):

cap = cap.replace(/\\(["'])/g, "$1");
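
The following standalone sketch compares the old and new caption patterns and then applies the unescaping line. The variable names are only for illustration, and the short code is written with the square brackets that WordPress short codes normally use.

```javascript
// Old and new patterns for pulling the caption attribute out of a short code.
const oldPattern = /caption=['"]([^'"]+)/i;
const newPattern = /caption=['"]((?:[^'"\\]+(?:\\[\\'"])*)+)/i;

// A short code whose caption has already had its quotes backslash-escaped.
const shortcode = '[caption id="test" caption="test <a href=\\"test.com\\">Test</a>"]';

const oldCap = shortcode.match(oldPattern)[1];
const newCap = shortcode.match(newPattern)[1];

// Remove the backslashes again before the caption is put back into markup.
const restored = newCap.replace(/\\(["'])/g, '$1');

console.log(oldCap);   // test <a href=\
console.log(newCap);   // test <a href=\"test.com\">Test</a>
console.log(restored); // test <a href="test.com">Test</a>
```

The old pattern stops at the first quote it sees, escaped or not, while the new one treats a backslash followed by a quote as part of the caption.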

OK, now we are done fixing the TinyMCE editor.  Next we have to make sure that when the WordPress backend converts short codes to markup, it knows how to deal with backslashed quotes.  To do this, let's change line 212 of wp-includes/shortcodes.php to:

$pattern = '/(\w+)\s*=\s*"((?:\\\\.|[^"])*)"(?:\s|$)|(\w+)\s*=\s*\'([^\']*)\'(?:\s|$)|(\w+)\s*=\s*([^\s\'"]+)(?:\s|$)|"([^"]*)"(?:\s|$)|(\S+)(?:\s|$)/';
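
The part of this pattern that matters for us is the first alternative, which handles double-quoted values. As a rough illustration, here is that alternative ported to JavaScript so it can be tried in isolation. This is not the WordPress code itself: “attrPattern” and “parseAttributes” are hypothetical names, and the other alternatives are left out.

```javascript
// JavaScript port of the first alternative of the pattern: name="value",
// where the value may contain backslash-escaped characters such as \".
const attrPattern = /(\w+)\s*=\s*"((?:\\.|[^"])*)"(?:\s|$)/g;

// Hypothetical helper: collect double-quoted attributes into a plain object.
function parseAttributes(text) {
  const attrs = {};
  for (const m of text.matchAll(attrPattern)) {
    attrs[m[1]] = m[2];
  }
  return attrs;
}

const attrs = parseAttributes('id="test" caption="test <a href=\\"test.com\\">Test</a>"');
console.log(attrs.id);      // test
console.log(attrs.caption); // test <a href=\"test.com\">Test</a>
```

Because a backslash followed by any character is tried before a plain non-quote character, an escaped quote no longer ends the value. The captured value still contains the backslashes at this point.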

We are almost done.  The last problem is that when we upload and insert an image into TinyMCE, it assumes that the “alt” attribute of the image tag and the “caption” attribute of the caption short code will be the same.  The problem is that we probably do NOT want markup in the alt attributes of our image tags.  How do we fix this?

When you insert an image with a caption into a WordPress blog post, the following function calls are made:

“image_media_send_to_editor” which calls “get_image_send_to_editor” which applies the filter “image_send_to_editor” which calls “image_add_caption”.

If we look at line 591 of wp-admin/includes/media.php (where “image_media_send_to_editor” calls “get_image_send_to_editor”) we see that the image “alt” text is passed through. The “alt” text is locally referred to as the ‘post_excerpt’ field in the attachment array:

return get_image_send_to_editor($attachment_id, $attachment['post_excerpt'], 
                                                  $attachment['post_title'], $align, $url, $rel,
                                                  $size);

Instead of just passing in the “alt” text, let's send an HTML-stripped version of the text as well:

$stripped = strip_tags($attachment['post_excerpt']); 
return get_image_send_to_editor($attachment_id, $attachment['post_excerpt'], $stripped,
                                                  $attachment['post_title'], $align, $url, $rel,
                                                  $size); 

This way we can use the unstripped version for the “caption” and the stripped version for the “alt” text. Since we are now passing a new variable into “get_image_send_to_editor”, we have to alter its function definition in line 54 to:

function get_image_send_to_editor($id, $caption, $alt, $title, $align, $url='', $rel = false,
                                                     $size='medium') { 

This new $caption variable should now be passed along to the “image_send_to_editor” filter in line 63:

$html = apply_filters( 'image_send_to_editor', $html, $id, $caption,  $alt, $title, $align, $url, 
                                          $size ); 

And finally passed to “image_add_caption” in line 68:

function image_add_caption( $html, $id, $caption, $alt, $title, $align, $url, $size ) { 

The “image_add_caption” function generates the image caption short code on line 81:

 $shcode = '〈caption id="' . $id . '" align="align' . $align 
                   . '" width="' . $width . '" caption="' . $alt . '"〉' . $html . '〈/caption〉';

Instead of using the $alt variable for our caption, let's use our newly created $caption variable like this:

$shcode = '〈caption id="' . $id . '" align="align' . $align 
                . '" width="' . $width . '" caption="' . $caption . '"〉' . $html . '〈/caption〉'; 
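
To make the split between caption and alt text concrete, here is a small JavaScript sketch of the same idea (the real change is in PHP, as shown above). Both “stripTags” and “buildCaptionShortcode” are hypothetical helpers written for this illustration, the regex is only a rough stand-in for PHP's strip_tags, and the short code is written with the square brackets that WordPress short codes normally use.

```javascript
// Rough stand-in for PHP's strip_tags(): drop anything that looks like a tag.
function stripTags(text) {
  return text.replace(/<[^>]*>/g, '');
}

// Hypothetical helper mirroring the idea of image_add_caption: the short code
// gets the caption with markup, the image tag gets the plain-text alt.
function buildCaptionShortcode(id, align, width, caption, src) {
  const alt = stripTags(caption);
  const html = '<img src="' + src + '" alt="' + alt + '" width="' + width + '" />';
  return '[caption id="' + id + '" align="align' + align + '" width="' + width +
    '" caption="' + caption + '"]' + html + '[/caption]';
}

const shcode = buildCaptionShortcode('attachment_104', 'none', 300,
  'A <em>test</em> caption', 'http://host/imagethumb.jpg');
console.log(shcode);
```

The caption attribute keeps its <em> tags, while the alt attribute of the image only gets the plain text.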

Great! We are finally done. We can now have markup in our WordPress captions. Again, all the changes described on this page can be found in one place here: change set

