Shepley v. Auto-classification

December 15, 2011

We’ve been told for years now that auto-classification is here, or about to be here, and every time it turns out to be a lie. The road to working, housebroken auto-classification is littered with casualties (the Wal-Mart DVD recommendation engine is the highest-profile failure that comes to mind, but we’ve all witnessed demos and POCs that crashed and burned), while I know of only one real (public) success story, the DOD. But a government entity with no shareholders and massive resources and time at its disposal is not exactly the operating model corporations are looking for.

And for the last 18 months or so, my clients have been pretty silent on auto-classification; they’ve had other ECM things on their minds. But lately the buzz around auto-classification has been picking up, both at my clients and out in the wider world. So I figured it was high time I devoted some attention to it here.

I want to spend a few posts on auto-classification, from a number of angles:

  • What it is and what it isn’t – the very name auto-classification conjures up almost magical powers that can transform a gloppy, hulking mass of unstructured content into a highly structured, polished collection of tagged documents. As you might imagine, this is not entirely true.
  • How it works – not from a technical perspective, because this goes way beyond my knowledge. But I do know a bit about the people and process work these tools require to work properly, and the reality of it will likely surprise you.
  • Whether it works – I’m involved with a POC to test some of the auto-classification solutions out there against that most elusive of things: real client data. We’ve got an organization willing to share a chunk of their shared drive content as well as some vendors willing to use their tools to auto-classify that content. I won’t be identifying either the firm or the vendors here, but I will speak to auto-classification capabilities in general and what I saw working and not working during the POC.

I’m also hoping to hear from lots of you during the series: your thoughts on auto-classification, your experiences with it, your reactions to my posts, and so on. So get ready to jump in and get the conversation started.

My first post will be after the winter recess. In the meantime, I hope you all have safe and enjoyable holidays with family and friends, and I look forward to seeing you back here in January!

One Comment
  1. December 18, 2011 5:09 pm


    This has been one of those “on the cusp” technologies for years. People have hungered after this for Records Management auto-classification/categorization and the like, but it has never met its potential. In my experience, this only works when you restrict the domain. If you can focus the topic to “P&C Claims documentation” or “Clinical Trials,” then you are able to introspect and find keywords that help get you 80% of the way, and tuning can get you asymptotically near 100%. However, if one tries to use auto-classification for information that could be an HR doc, a multi-topic blog post, or a shipping invoice, the system breaks down.

    Interested in your thoughts here, as well as any experts out there from Autonomy, Google and the like… it’s their bread and butter…

    – DG
