What is a Content Management System?

A survey of the features and functionality that define Web Content Management.

What advantages does a Web content management system offer compared to a simple, one-off relational database system for managing your content? What features should you be looking for when evaluating a Web content management system?

Content Management, Content Management System, Web Content Management

What does a Web content management system do?  And how does it differ from the common “relational database with an admin section” (RDBWAAS) that powers a lot of Web sites?  Finally, when does your site management tool graduate to the lofty and deserved title of “content management system?”

If we look at content management functionality as a continuum, there’s a graduated scale between two extremes.

  1. On the one side, you have something simple — an “articles” table with a couple of password-protected pages to update it.
  2. On the other side, you have a commercial CMS that you paid $50K for with all the bells and whistles.

Specifically, how are the two different?

Basic Data Management and Publishing Functionality

In terms of feature sets, the two models overlap fairly clearly in a number of very basic areas.

Content Modeling and Storage

This is the process by which you take the content you want to manage, and turn it into data the system can process and store.

Ironically, this is actually one place where the RDBWAAS system shines: there’s no more granular way to model content than a custom relational database. In this respect, most content management systems are a shadow of what you can do with an empty database and a copy of phpMyAdmin. (So why use a CMS at all? Well, to get everything else on this list…)

On the content management side, systems vary wildly in how well they model content. Some allow you to create custom XML files, others have object-oriented relational databases, and some simply let you create your own database and just tell the system how it works.
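As a rough illustration of the RDBWAAS end of the scale, here is a minimal sketch of an “articles” table modeled directly in a relational database. It uses Python's built-in sqlite3 module with an in-memory database; the column names and sample row are hypothetical, invented only for this example.

```python
import sqlite3

# Hypothetical schema: every field of the content type is its own typed column.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE articles (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        byline    TEXT,
        body      TEXT NOT NULL,
        published TEXT            -- ISO 8601 date, NULL while unpublished
    )
    """
)
conn.execute(
    "INSERT INTO articles (title, byline, body, published) VALUES (?, ?, ?, ?)",
    ("Hello, world", "A. Author", "First post.", "2008-07-31"),
)

# Because the model is granular, queries can target any individual field.
rows = conn.execute(
    "SELECT id, title FROM articles WHERE published IS NOT NULL ORDER BY published"
).fetchall()
print(rows)
```

The point is only that each attribute is individually typed and queryable, which is the granularity a packaged CMS has to approximate.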

Content Editing

At the risk of being too basic, this one is obvious — both extremes allow you to create new content, and edit and delete existing content.

We’ll also lump WYSIWYG editing under here, since the quality of the rich-text editing interface is usually completely independent and separable from the larger application. Even the most basic RDBWAAS system can sport a really advanced WYSIWYG editor without too much trouble.

Publishing and Templating

Both extremes allow you to present content at a URL for visitors to consume it.

While this seems obvious, it’s worth mentioning because there’s such a wide range of ways to do it, and there’s quite a range of how separate the two steps — templating and publishing — are. They may happen at the same instant as the content is requested (a PHP page that retrieves and formats database records, for instance), or your system may use a template to convert data into an output file that it then FTPs, file copies, or otherwise moves to a publishing location.   (Seth Gottlieb calls this "Baking vs. Frying.")
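The difference between the two approaches can be sketched in a few lines. This is an illustration only: the template, the article record, and the function names fry and bake are all made up for the example, and a real system would move the baked file to a publishing location rather than a temporary directory.

```python
from pathlib import Path
from string import Template
import tempfile

# Hypothetical template and content record, for illustration only.
PAGE = Template("<h1>$title</h1>\n<p>$body</p>\n")
ARTICLE = {"id": 348, "title": "Hello", "body": "First post."}


def fry(article: dict) -> str:
    """'Frying': render the template at the moment the page is requested."""
    return PAGE.substitute(title=article["title"], body=article["body"])


def bake(article: dict, publish_dir: Path) -> Path:
    """'Baking': render once, ahead of time, and write a static file."""
    target = publish_dir / f"article-{article['id']}.html"
    target.write_text(fry(article), encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    baked_file = bake(ARTICLE, Path(tmp))
    baked_html = baked_file.read_text(encoding="utf-8")

# Both approaches produce the same markup; they differ in when it is produced.
assert baked_html == fry(ARTICLE)
```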

Core Content Management Functionality

Moving past the shared functionality, we get into “higher level” content management functions.  We’ll start with the absolute “core” functionality — things which most everything calling itself a “content management system” should be able to do.

Versioning

In a versioning system, when content is updated, the older version is kept. If something needs to be rolled back, the older version can be restored, usually to a draft state, ready to be published as a new version. Versioning is usually simple and serial, but higher-end systems can have branching, merging, and all the other goodness you normally associate with source code management systems like Subversion.

Versioning is clearly necessary for any claim of “content management.” Managing content is largely concerned with keeping it safe, and making sure old versions are recoverable is a big part of that. Remember, if you don’t version, then there’s no functional difference between the permission to update content, and the permission to delete it.
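To make the simple, serial case concrete, here is a minimal sketch of a versioned content item. The class and method names are hypothetical and not taken from any particular CMS; the only behavior it tries to show is that a rollback restores an old version as a draft, and publishing that draft creates a new version rather than overwriting history.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedItem:
    """Keeps every saved version; rollback restores an old one as a draft."""
    versions: list = field(default_factory=list)  # serial history, oldest first
    published: Optional[int] = None               # index of the live version
    draft: Optional[str] = None

    def publish(self, body: str) -> int:
        self.versions.append(body)
        self.published = len(self.versions) - 1
        self.draft = None
        return self.published

    def rollback(self, version: int) -> None:
        # The old version becomes a draft; nothing in the history is lost.
        self.draft = self.versions[version]

    def publish_draft(self) -> int:
        if self.draft is None:
            raise ValueError("no draft to publish")
        return self.publish(self.draft)


item = VersionedItem()
item.publish("first text")       # version 0
item.publish("a bad edit")       # version 1
item.rollback(0)                 # restore version 0 as a draft
item.publish_draft()             # published again as version 2
```

Note that the bad edit is still in the history afterward, which is exactly why an update is not the same thing as a delete.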

Granular User Management

Most RDBWAAS systems have binary access — there’s a password, and if you know it, you can do anything. More advanced user management allows you to put users into groups, to which you can assign specific actions (edit, update, delete, etc.) on specific content.

For a system with a sufficiently large number of authors, permissions are everything. Josh Clark of Big Medium fame says that “the #1 category of feature requests I get is how to restrict people from doing x, y and z.”
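A group-based permission check can be sketched as follows. The group names, user names, sections, and the can function are all hypothetical; real systems usually attach permissions to individual content items or folders rather than to a flat section name.

```python
# Hypothetical groups, users, and content sections, for illustration only.
GROUP_PERMISSIONS = {
    "editors": {("news", "create"), ("news", "edit"), ("news", "delete")},
    "authors": {("news", "create"), ("news", "edit")},
    "reviewers": {("politics", "edit")},
}
USER_GROUPS = {
    "alice": {"editors"},
    "bob": {"authors", "reviewers"},
}


def can(user: str, action: str, section: str) -> bool:
    """True if any of the user's groups grants the action on the section."""
    return any(
        (section, action) in GROUP_PERMISSIONS.get(group, set())
        for group in USER_GROUPS.get(user, set())
    )


print(can("alice", "delete", "news"))    # True
print(can("bob", "delete", "news"))      # False
print(can("bob", "edit", "politics"))    # True
```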

Content Organization and Relation

Content doesn't exist in a vacuum.  Often, the ability to position content in larger organizational structures and in relation to other content is more important than modeling the content itself.  This ability to relate content items to each other is one of the most crucial functions of a CMS.

File and Image Management

In addition to textual data, content is often supported by binary files: images, PDFs, etc. A CMS needs to store these files somehow, preferably in relation to the content that uses them.

Multi-State Content

It’s awfully handy to be able to leave content in a half-finished state before publishing it, or to be able to “archive” content so it comes off the published site without actually deleting it from the repository. Even something as simple as an “active” checkbox can be a Godsend in a lot of cases.

Higher-level Content Management Functionality

Building further upon our definition, we now get into higher-level functions.  These are the competitive advantages a developed CMS offers over the RDBWAAS, and the features content management vendors use to separate their products from their competition.


Advanced Repository Services

While a simple relational database is just that, a CMS can provide some advanced services from its repository, including multiple access methods.  eZ publish, for instance, exposes a WebDAV server which Windows users can map as a drive in Windows Explorer, and then drag files into it.  Even at just its default install, Alfresco offers no fewer than four ways to access its repository: a Web client, FTP, WebDAV, and CIFS.

Additionally, repositories can offer programmatic access through APIs -- SOAP and REST access, for instance.

Workflow or Approval Chains

Workflow can get complicated and people usually think they need much more complicated workflow capabilities than they really do. Usually they just need serial approval chains -- a la Ektron and others -- to ensure one extra person reviews and approves content before it's exposed to the public.

I read a good quote in a book once: “Workflow is the most over-purchased aspect of content management.” (I think it was Bob Boiko’s book, but with 1,200 pages, the exact citation escapes me.)
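A serial approval chain of the simple kind described above needs very little machinery. The sketch below is a hypothetical model, not how Ektron or any other product implements it: content becomes public only after each named approver, in order, has signed off.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalChain:
    """Content goes live only after each approver, in order, signs off."""
    approvers: list                      # e.g. ["editor", "legal"]
    approved_by: list = field(default_factory=list)

    @property
    def next_approver(self):
        if len(self.approved_by) < len(self.approvers):
            return self.approvers[len(self.approved_by)]
        return None  # chain complete

    def approve(self, user: str) -> None:
        if user != self.next_approver:
            raise PermissionError(f"{user} is not the next approver")
        self.approved_by.append(user)

    @property
    def is_public(self) -> bool:
        return self.next_approver is None


chain = ApprovalChain(approvers=["editor"])
before = chain.is_public     # False: still waiting on the editor
chain.approve("editor")
after = chain.is_public      # True: the one extra review has happened
```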

Check In/Out

Got a lot of people working on your content at the same time? If you’re able to check content out, it means no one else can touch it while you’re in it, saving you from concurrency conflicts or the “last one to the submit button wins” problem.

(Some versioning systems can approximate this. The trick is determining when the new version is generated. eZ publish generates a new version when the “edit” button is pressed, not when the edits are saved. This is crucial, because it means that even if two people hit “edit” at the same time, they each get their own new version to work with. They both save those versions independently, at which time they can work out which one actually gets published.)
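The basic check-out behavior can be sketched as an exclusive lock per content item. The Repository class, its methods, and the CheckoutError exception are hypothetical names for this example, and a real system would also need lock timeouts and an administrative override.

```python
class CheckoutError(Exception):
    pass


class Repository:
    def __init__(self):
        self._locks = {}   # content id -> user holding the checkout

    def check_out(self, content_id: int, user: str) -> None:
        holder = self._locks.get(content_id)
        if holder is not None and holder != user:
            raise CheckoutError(f"item {content_id} is checked out by {holder}")
        self._locks[content_id] = user

    def check_in(self, content_id: int, user: str) -> None:
        if self._locks.get(content_id) != user:
            raise CheckoutError(f"{user} does not hold item {content_id}")
        del self._locks[content_id]


repo = Repository()
repo.check_out(348, "alice")
try:
    repo.check_out(348, "bob")
    bob_blocked = False
except CheckoutError:
    bob_blocked = True       # Bob must wait until Alice checks in
repo.check_in(348, "alice")
repo.check_out(348, "bob")   # now succeeds
```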

Extensibility and Integration

Oftentimes you need your system to do something just a little bit extra. Higher-level systems have hooks and filters in place for you to add functionality not built in, and programmatic APIs and other ways of getting into the repository to allow you to connect and manipulate content from other systems.  Published event models allow you to write code that executes in response to user actions.
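A published event model usually boils down to registering callbacks against named events. The sketch below is a generic illustration under that assumption; the on and fire helpers and the “content.published” event name are invented here and do not correspond to any specific product's API.

```python
from collections import defaultdict

_handlers = defaultdict(list)   # event name -> list of callbacks


def on(event: str):
    """Decorator that registers a function to run when an event fires."""
    def register(func):
        _handlers[event].append(func)
        return func
    return register


def fire(event: str, **payload) -> None:
    for handler in _handlers[event]:
        handler(**payload)


notifications = []


@on("content.published")
def notify_subscribers(content_id: int, title: str) -> None:
    # Custom code running in response to a system event.
    notifications.append(f"Published #{content_id}: {title}")


fire("content.published", content_id=348, title="Hello")
print(notifications)
```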

Shared User Directory

Having yet another database of users can be a real problem for some organizations, so you'll find a lot of systems that have LDAP integration (oftentimes specifically with Active Directory) or other methods of plugging into other authentication schemes to manage the user base.

Scheduled Publishing and Expiration

It’s handy to be able to schedule when content should automatically go live and when it should come down.

Ektron has a nice implementation that takes this a step further by defining what should happen when the content expires. Does it actually come off the site or does some other administrative action happen like the content getting added to an “Expired Content Report” or generating a task for someone?
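Here is one way the go-live and expiration logic might look, including a choice of what happens on expiration. The Schedule fields and the "unpublish" and "report" policy names are assumptions made for this sketch; they are not Ektron's actual settings.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Schedule:
    go_live: Optional[datetime] = None   # None means "live immediately"
    expires: Optional[datetime] = None   # None means "never expires"
    on_expire: str = "unpublish"         # hypothetical: "unpublish" or "report"


def is_live(schedule: Schedule, now: datetime) -> bool:
    if schedule.go_live is not None and now < schedule.go_live:
        return False                     # not yet published
    if schedule.expires is not None and now >= schedule.expires:
        # Expired content stays up if the policy only flags it for a report.
        return schedule.on_expire == "report"
    return True


promo = Schedule(go_live=datetime(2008, 8, 1), expires=datetime(2008, 9, 1))
print(is_live(promo, datetime(2008, 7, 31)))   # False
print(is_live(promo, datetime(2008, 8, 15)))   # True
print(is_live(promo, datetime(2008, 9, 2)))    # False
```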

Task Management and Collaboration

More and more systems are providing task management subsystems to handle the real “management” of the content — the discussions and preliminary communications that occur when teams are working on content together.

The benefit of having these systems integrated into your CMS is that you can bind discussions, tasks, and historical comments to specific content items, so they can be viewed in-context and retained for future reference regarding that content.

Image Manipulation

Some systems will allow you to store images in a “pure” form, then have them automatically manipulated by the system (usually re-sized) for delivery with content. So if you want to change the size of all the images with your news articles someday, you have the source image which the system can just re-manipulate in a new way.

This is fairly closely related to multi-format publishing (see below). In both cases, you have a “pure” object which is published in multiple “renditions.” The “pure” version stays...pure. The renditions are more or less disposable because they can be regenerated as necessary.

Auditing

This is versioning taken a step further. A lot of organizations need to know every single thing that has ever happened in the life of a piece of content: who created it, who has looked at it and when, who has edited it, and finally when it got deleted and by whom. The more regulated the company, the more important this becomes.

In-Context Editing

This is the anti-admin section. When logged in, the system provides controls to initiate actions on content from the public-facing side of the site, so you can browse the site like a visitor, and do things without having to wade through an admin section.

The actual interface varies. Often you just see menus and hyperlinks (“Edit this Content”) that are hidden from the general public. Ektron uses a right-click context menu.  Several versions ago, Documentum Web Publisher had special HTML comments in its templates that, when viewed through a proxy page, exposed editing menus. (Why the complexity? Since the logic was built into their proxy page, they could provide the same in-context editing no matter what templating language you used, from ASP.Net to Perl.)

Membership Management

Fewer and fewer systems these days are anonymous-only. More often than not, there are membership systems in place so that visitors to the site can be “known” to the system. This enables you to allow content on a subscription basis, tune analytics, and store preferences and other settings on a per-user basis.

The more graceful implementations of this make everyone a “user,” just giving some users more permissions than others (e.g., the ability to edit content).

Pre-Built Functionality or “Widgets”

CMS vendors push their pre-built functionality.  “What about discussion forums?” or “What about our RSS generator?” or “What about our ‘Get the Weather for your Neighborhood’ control?”

All this, and more, falls under the heading of “pre-built widgets.” There are a lot of them, and CMS vendors build them all the time as simple solutions to common problems. Sometimes they’re good, but a lot of the time they do everything you want, except that one little thing, so you end up re-writing them yourself anyway.

The bottom line: functionality is nice, but a system will live or die on architecture. Make sure the core architecture of your system is in good shape before you start writing little widgets.

In-Context Preview

If you have pending changes, some systems will allow you to “publish” them to your session only, so you — and only you — can browse your Web site as if all staged content was published. Ektron does this nicely — you can enter “Preview Mode” where you can browse your entire Web site through “publish-it-all glasses.”

A lesser implementation of this just allows you to see pending content rendered in its corresponding template, on a piece-by-piece basis.

Integrated Analytics

Integrated analytics is often not well-developed and simply thrown in as a bonus.  It's tough to find any package built in to a CMS that provides even 10% of the functionality that Google Analytics provides. Still, a lot of people want something simple and integrated with their CMS so they can view analytics from their admin interface.

Search

When you say “search,” you usually think full-text search. However, a core piece of architecture is the ability to fetch content from the repository, sometimes with finely-grained criteria (think a SELECT statement in SQL with lots of WHERE clauses). Some systems are better than others.

Integrated search has two problems: (1) the search is from the CMS perspective, rather than from the user’s perspective; and (2) you often want to search things contained outside the CMS.  For these reasons, Web site search is often accomplished via an external system, such as a search appliance or service that spiders the Web site from the user’s perspective.

(Of course, that method invites its own problems -- how do you ensure the indexing engine has access to all the content?  And how do you ensure that the search results are "cleansed" of content the searching user may not have access to view?  When the search engine isn't in direct communication with the CMS, it can't determine any special rules -- such as permissions -- which may apply to content.)

URL Management

If your system ever renders a link to content within itself, it needs to know the URL at which that content gets accessed.

For example, the CMS may know that your article ID is 348, but you have to somehow tell it that when it publishes a link to that article in the navigation, the template (and consequent URL) it should use is at “/article.php”…except for things in the “politics” section — they use “/politics/article.php,” etc.

As such, a CMS needs to be “URL-aware” so it knows the publicly-accessible URL that maps to the content contained within it.

Additionally, many systems allow you to alias URLs for usability and search engine optimization. (However, rewrite engines at the Web server level — mod_rewrite or ISAPI Rewrite — can usually solve this problem.)
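The article-348 example above can be sketched as a small lookup. Everything here is hypothetical: the routing table, the content index, the url_for function, and the “?id=” query-string convention are assumptions made to show the idea of a default template with per-section overrides.

```python
# Hypothetical routing table: section -> template path. "*" is the default.
TEMPLATES = {
    "*": "/article.php",
    "politics": "/politics/article.php",
}
# Hypothetical content index: article id -> section.
SECTIONS = {348: "news", 349: "politics"}


def url_for(article_id: int) -> str:
    """Return the public URL at which the given article is rendered."""
    section = SECTIONS[article_id]
    template = TEMPLATES.get(section, TEMPLATES["*"])
    return f"{template}?id={article_id}"


print(url_for(348))   # /article.php?id=348
print(url_for(349))   # /politics/article.php?id=349
```

With a function like this in one place, navigation templates never hard-code a path, so moving a section to a new template means changing one table entry.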

Multiple-Format Publishing

Though your primary publishing format is usually HTML, many systems will allow you to define other “renditions” of content, from a simple printer-friendly version to a full-blown PDF. In practice, however, I’ve found that the utility of this is limited. Printer-friendly versions are best handled with CSS these days, and alternate formats like low-bandwidth and mobile versions can be handled at the template level.

That said, one handy application of this is the automatic PDF generation of binary files. Users store a Word document, but a PDF gets published. This saves you from having to manage an explicit PDF rendition — the CMS creates a new one whenever a new Word document is published.

A good example of this is Cascade Server from Hannon Hill. At one point, you could view any page on their site in six different formats, from rich text to WML.

Localization

Same content, multiple languages. It sounds simple, but it gets awfully complex pretty quickly. What you find is that everything has to be multi-lingual: every piece of content, every bit of navigation, even publishing templates since they often have text built into them. (Not to mention the admin interface of the CMS itself.)

Most systems will let you select a default language. If content doesn’t exist in the requested language, the two options are (1) don’t show that content, or (2) show the content in the default language.

(What gets really complex is when you have content published in multiple formats and in multiple languages. One article published in six languages with HTML, printer friendly, PDF, and WML versions comes to…24 renditions of your single piece of content.)
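Both the fallback choice and the rendition arithmetic are easy to sketch. The language codes, the sample translations, and the localized function are invented for this example, and English as the default language is an assumption.

```python
from itertools import product
from typing import Optional

DEFAULT_LANGUAGE = "en"   # assumption: English is the site default

# Hypothetical translations of one article, keyed by language code.
TITLES = {"en": "Hello", "fr": "Bonjour"}


def localized(translations: dict, lang: str, fallback: bool = True) -> Optional[str]:
    """Option 2 (fallback=True) shows the default language; option 1 shows nothing."""
    if lang in translations:
        return translations[lang]
    return translations.get(DEFAULT_LANGUAGE) if fallback else None


print(localized(TITLES, "fr"))                  # Bonjour
print(localized(TITLES, "de"))                  # Hello
print(localized(TITLES, "de", fallback=False))  # None

# Every language/format pair is its own rendition: 6 x 4 = 24.
languages = ["en", "fr", "de", "es", "it", "ja"]
formats = ["html", "print", "pdf", "wml"]
renditions = list(product(languages, formats))
print(len(renditions))                          # 24
```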

Data Collection

Most every Web site has a form or two, whether it be a simple "Contact Us" form, or a complex, multi-step registration form.  Many systems allow users to design forms to collect and manage data from end-users.

Form development is considerably more complex than content publishing, so capabilities and ease-of-use vary wildly in this area.  Ektron has a well-developed form development tool.


So, there you have it — a brief survey of what content management systems do over the RDBWAAS systems we all start with.

(Note: this Knol is based on a blog posting from Gadgetopia entitled: "What Makes a Content Management System?," originally published in June 2007.)

Deane Barker
Content Management Practice Director
Sioux Falls, SD
Version: 11
Last edited: Jul 31, 2008 8:05 AM.