- Document management: use labels (aka “tags”) to help find content.
- Content retention: use labels to decide when content should be deleted.
- Information protection: use labels to determine how content should be protected.
In our experience, each of these activities has been tough for customers to adopt. Why? Because, as a colleague told me, each activity requires creating labels. And no one wants to do that.
Sometimes users resist creating labels. The job just seems too big. In other cases, no one wants to go through document after document and decide what label to apply to it.
Fear not; machine learning may have fixed that problem.
Labels and Information Management
Let’s take a minute to understand how each of the activities above operates.
First, we create a set of labels. Next, we decide what action the system will take based on the label. For a retention policy, that action will be to delete the content or keep it. For information protection, the action might be to encrypt a document. Finally, we go through all our content and apply a label to each object.
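As a rough illustration of those three steps, here is a minimal Python sketch for the retention case. The label names, the retention periods, and the `should_delete` helper are all hypothetical, chosen only to make the idea concrete; they are not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Step 1: create a small set of labels (hypothetical names).
# Step 2: decide what the system does for each label. Here the action is a
# retention period in days; None means "never delete".
RETENTION_DAYS = {
    "keep-forever": None,
    "keep-7-years": 7 * 365,
    "keep-1-year": 365,
}


@dataclass
class Document:
    name: str
    created: date
    label: str  # Step 3: each document carries exactly one label


def should_delete(doc: Document, today: date) -> bool:
    """Return True when the document has outlived its label's retention period."""
    days = RETENTION_DAYS[doc.label]
    if days is None:
        return False
    return today - doc.created > timedelta(days=days)


docs = [
    Document("board-minutes.docx", date(2015, 3, 1), "keep-forever"),
    Document("lunch-menu.docx", date(2019, 6, 1), "keep-1-year"),
]
today = date(2021, 1, 1)
for doc in docs:
    print(doc.name, "delete" if should_delete(doc, today) else "keep")
```

The point of the design is that the decision lives in the label-to-action table, not in the documents. Change the table and every labeled document follows the new rule.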
Using Machine Learning to Automate Label Application
It’s this last step that my friend said everyone hated. And no wonder! We’ve been creating content like nobody’s business for quite a while. So, the pile of content that we must review and label is big. Really big. Who would want to review a stack of documents and decide what label applies to each one? I’d rather watch paint dry.
Thankfully, technology may be here to help us. We can scan documents and recognize words. That’s not new. What is new is that we’re teaching machines how to determine meaning by the presence of certain words in a document, or near one another, or present together with other characteristics.
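To make “near one another” concrete, here is a minimal Python sketch of a hand-written proximity rule. To be clear, this is not machine learning; a trained classifier would learn signals like this from examples rather than have them spelled out. The function name `terms_near`, the five-word window, and the sample sentences are assumptions for illustration.

```python
import re


def terms_near(text: str, first: str, second: str, window: int = 5) -> bool:
    """Return True if `first` and `second` occur within `window` words of each other."""
    # Lowercase and split into simple word tokens.
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    # Compare every pair of positions; any pair close enough counts as a hit.
    return any(
        abs(i - j) <= window for i in first_positions for j in second_positions
    )


close = "The employee salary review is due Friday."
far = "Ask any employee about parking. Later we cover the budget and salary bands."

print(terms_near(close, "employee", "salary"))  # True
print(terms_near(far, "employee", "salary"))    # False
```

Both sentences contain the same two words, but only the first puts them close enough together to suggest the document is about pay.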
Creating Labels is the First Step
Imagine that you have Post-it® Notes in three colors. You have a set of documents scattered on your desk. (Nowadays your cat is on your desk, because you’re working from home.) You’re going to put one and only one colored note on each document. I have literally done this exercise with customers.
The Post-it® Notes are analogous to creating labels. Putting a note on a document is assigning a label. Later, we define policies that specify what to do given a particular label.
How many labels do you want? The set should be small. Think 3 ± 1. For information protection, you might settle on:
- Confidential: exposure of this information could materially hurt the organization or an individual.
- Sensitive: this information is specific to our organization, but not materially harmful if released.
- General: there’s nothing here to bother with.
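Expressed in code, the three labels and the “one and only one” rule might look like the Python sketch below. The `Sensitivity` enum, the `PROTECTION` table, the specific actions (encrypt, allow external sharing), and the file names are illustrative assumptions, not settings from any product.

```python
from enum import Enum


class Sensitivity(Enum):
    CONFIDENTIAL = "Confidential"
    SENSITIVE = "Sensitive"
    GENERAL = "General"


# Hypothetical policy: what to do for each label.
PROTECTION = {
    Sensitivity.CONFIDENTIAL: {"encrypt": True, "allow_external_sharing": False},
    Sensitivity.SENSITIVE: {"encrypt": False, "allow_external_sharing": False},
    Sensitivity.GENERAL: {"encrypt": False, "allow_external_sharing": True},
}

# A dict keyed by document name holds exactly one label per document:
# assigning again replaces the old label rather than adding a second one.
labels = {}
labels["payroll-2020.xlsx"] = Sensitivity.CONFIDENTIAL
labels["team-picnic.docx"] = Sensitivity.SENSITIVE
labels["team-picnic.docx"] = Sensitivity.GENERAL  # relabel; still one label

for name, label in labels.items():
    print(name, label.value, PROTECTION[label])
```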
Let’s Get Scanning
OK, we’re done creating labels. Now, we want to scan our content and apply the labels. In the case of Microsoft 365 (fka Office 365) and Azure, we can do this via the Azure Information Protection unified labeling scanner. (Why the long name? Because Microsoft replaced several other scanners that worked in more limited circumstances with this one.) I’ll be covering how to do this in a future post.
Since a lot of our customers have file servers, SharePoint and OneDrive file repositories, using the Azure Information Protection scanner can put a big dent in the label application work. I’m still investigating tools that would do similar scanning for other repositories like Google Drive, Box and Dropbox.
Knowing what kind of content is stored and where is the first step. With this knowledge, you know the size of your information protection problem. And knowledge is power, yes? Let’s create some labels, scan our content and see where we stand!