Mining for Knowledge

In my doctoral research, I’ve been studying ways to improve knowledge capture and sharing methods, specifically within project teams, though the ideas can be disseminated throughout the organization.

One of the biggest issues I’ve found while working as a consultant is the amount of knowledge that I walk away with after a project is complete.  Sure, I try to share this knowledge in every way possible but converting tacit (i.e., internal) knowledge to explicit (i.e., external) knowledge is one of the most difficult things to do.

Let’s assume though, that some portion of the knowledge that I hold in my head is converted into some form of writing at various periods throughout a consulting project.  Where does that explicit knowledge live?  In an email?  In some document stored on a server?  In a knowledge repository somewhere?

In the past, this problem has been attacked using centralized knowledge repository platforms.  These systems require users to log in and ‘enter’ their knowledge into the system.  Many of these platforms have been well built and some have been successfully used in organizations, but the success stories are far outweighed by the stories of KM repositories sitting idle and unused.

So…how can we get that tidbit of knowledge from my brain into some form of knowledge repository without me logging in and ‘entering’ it into the system?

Web 2.0 as knowledge repository

The use of Web 2.0 tools (blogs, IM, wikis, etc.) has become ubiquitous. If incorporated into a project environment, these tools might allow an easy and efficient method for capturing and sharing knowledge throughout project teams and project organizations.

The key to retrieving knowledge from these tools is to make the user experience as seamless as possible. For example, an employee creates a blog on an organization’s intranet and then uses this blog to write about different topics, some that pertain to her project and some that don’t.

Perhaps this employee is participating in two projects within the organization and she writes about topics that might be of interest to a portion of the organization and to project team members. While she writes about interesting topics and, at times, about her experiences on the projects that she’s worked on, perhaps her blog posts aren’t widely read. This employee has attempted to convert a portion of her tacit knowledge to explicit knowledge, but few people on the project team or within the organization find this knowledge because it’s tucked away in the intranet site (which is rarely used anyway).

In the above scenario, knowledge was converted from tacit to explicit but few people are able to absorb this knowledge and make it their own (i.e., perform the conversion from explicit to tacit knowledge).  What would happen if this knowledge were indexed, searched and shared with the rest of the project team in something akin to a project knowledge ‘journal’?

Since Web 2.0 platforms are ubiquitous, why can’t we use these tools as our knowledge repository?  Employees and project team members are already using them…so can we find a way to ‘mine’ these platforms for knowledge?

Could a system be built that ‘mines’ these Web 2.0 platforms along with other unstructured data (documents, email, etc.) to ‘build’ a knowledge repository available to the entire organization?

Mining for Knowledge

I’m currently looking at ways to use text mining methods and techniques to mine for knowledge. Text mining looks to be a good approach to solving this problem because it allows for knowledge to be gathered without additional work by project team members.

There are other approaches that could be used for gathering knowledge from project team members, but all require additional work to input information. For example, a project team using a manual approach could ask team members to regularly update their blogs and to ‘tag’ their posts with a special project tag or keyword so that a non-intelligent aggregation system (RSS, etc.) could simply pull these tagged posts into a central repository. While this is a good approach, it relies on end users to tag their content accurately and in a timely manner. Tagging, and other categorization and taxonomic approaches, require the user to do something to allow their knowledge contribution to be categorized, indexed and found by aggregation systems and other users.
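To make the weakness of the tag-based approach concrete, here is a minimal sketch of such a ‘non-intelligent’ aggregator. The `Post` class, the `collect_tagged` function and the `project-x` tag are all hypothetical names made up for illustration; a real setup would more likely read an RSS feed than an in-memory list.

```python
from dataclasses import dataclass

@dataclass
class Post:
    # Hypothetical record for a blog post; not tied to any real blogging platform.
    author: str
    title: str
    tags: tuple = ()

def collect_tagged(posts, project_tag):
    """Return only the posts that carry the project tag (case-insensitive)."""
    wanted = project_tag.lower()
    return [p for p in posts if wanted in {t.lower() for t in p.tags}]

posts = [
    Post("alice", "Lessons from the data migration", ("project-x",)),
    Post("alice", "Vendor API quirks we hit last week"),  # relevant, but never tagged
    Post("bob", "Team lunch photos", ("social",)),
]

repository = collect_tagged(posts, "project-x")
for post in repository:
    print(post.author, "-", post.title)
```

The second post is exactly the kind of knowledge a project team would want, but it never reaches the repository because the author forgot the tag.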

Using text-mining methods against pre-existing tools and platforms avoids the human fallibility issues that come with current knowledge management repository platforms and with requiring a user to ‘tag’ a piece of content correctly, as described above.

Using text-mining and other data mining approaches, I’m looking at ways to build semi-autonomous systems to index and organize both structured and unstructured data pulled from blogs, email, IM, social networks, documents, spreadsheets and any other locations or data sources. Such a system could aggregate knowledge found via text mining and social network analysis and build a project knowledge ‘repository’ that aims to contain all knowledge for any specific project. This repository would be searchable and would contain both manually curated content (e.g., content uploaded by project team members) and automatically curated / generated content based on text-mining and indexing techniques.
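As a rough sketch of the indexing half of that idea (not the system I’m researching, just a toy), the snippet below builds a small inverted index over text pulled from different sources and ranks search results with a simple TF-IDF-style score. The class name `KnowledgeIndex`, the document ids, the stopword list and the sample texts are all assumptions made up for the example, and it uses only the Python standard library.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "we", "is", "was", "it"}

def tokenize(text):
    """Lowercase the text, split it into alphanumeric words and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

class KnowledgeIndex:
    def __init__(self):
        self.docs = {}                    # doc_id -> (source, text)
        self.term_counts = {}             # doc_id -> Counter of terms in that document
        self.postings = defaultdict(set)  # term -> ids of documents containing it

    def add(self, doc_id, source, text):
        """Index one piece of text no matter where it came from (blog, email, wiki...)."""
        counts = Counter(tokenize(text))
        self.docs[doc_id] = (source, text)
        self.term_counts[doc_id] = counts
        for term in counts:
            self.postings[term].add(doc_id)

    def search(self, query, limit=5):
        """Return document ids ranked by summed term frequency times a smoothed IDF."""
        scores = Counter()
        n_docs = len(self.docs)
        for term in tokenize(query):
            matching = self.postings.get(term, set())
            if not matching:
                continue
            idf = math.log(1 + n_docs / len(matching))  # rarer terms weigh more
            for doc_id in matching:
                scores[doc_id] += self.term_counts[doc_id][term] * idf
        return [doc_id for doc_id, _ in scores.most_common(limit)]

index = KnowledgeIndex()
index.add("blog-1", "blog", "Lessons learned from the database migration on the billing project")
index.add("mail-7", "email", "Reminder: team lunch on Friday")
index.add("wiki-3", "wiki", "Billing project rollout checklist and migration rollback steps")

results = index.search("migration rollback")
print(results)
```

Nobody had to tag anything or log in anywhere for the wiki page and the blog post to show up for the same query. Real text mining goes well beyond keyword matching, of course, and this sketch ignores the privacy question entirely.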

There are some major privacy issues here, of course. How can you mine a user’s email and find the relevant knowledge without truly invading their privacy?  Not sure you can, but I’m looking at it.

Trust & Mined Knowledge

One key element of this new inter-connected world that we live in is trust. How can I trust that the information I read on a web page is worthwhile, honest and accurate? If I want to know something about organizational behavior, do I go read a Wikipedia article on the subject or do I go look through the Harvard Business School’s Organizational Behavior faculty pages and find publications written by the faculty there?

Which of these two sources of knowledge would you trust to be more accurate?

The same can be said of knowledge captured and shared within an organization. How do you know that the white paper on your new API is true?  Is it because it was released? Is it because of the author(s) of the paper?   What if you had a knowledge-base generated by an autonomous agent using text-mining techniques…how would you know to trust the information contained in it?  Who wrote the content?  Where did it come from?

This is where trust comes into play. If you could ‘see’ the qualifications of the author or authors of the knowledge base articles, would you trust the content more?  If I knew that the world’s leading authority on organizational behavior wrote the Wikipedia article on the subject, I’d tend to trust that article more.

This is another aspect of my research…building trust into the mined knowledge using social network analysis (SNA) methods & techniques.  Using SNA techniques, can the background, profiles, connections and knowledge of the users within an organization be automatically (or semi-automatically) generated to provide some form of initial trust metric to show that mined knowledge can be trusted?
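One very simple way to picture an initial trust metric is in-degree centrality: count how many distinct colleagues turn to a given person, then normalize by the number of other people in the network. This is only an illustrative stand-in, not the metric from my research. The edge meaning (‘A asked B for advice’), the names and the function `in_degree_centrality` are all assumptions for the example.

```python
from collections import defaultdict

def in_degree_centrality(edges):
    """Score each person by the share of other people who point to them.

    edges: iterable of (from_person, to_person) pairs, for example
    'bob asked alice for advice' becomes ("bob", "alice").
    """
    people = set()
    incoming = defaultdict(set)
    for src, dst in edges:
        people.update((src, dst))
        if src != dst:  # ignore self-references
            incoming[dst].add(src)
    others = max(len(people) - 1, 1)  # avoid dividing by zero in a one-person network
    return {person: len(incoming[person]) / others for person in people}

# Hypothetical advice network inside a project team.
edges = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("alice", "carol"),
]

trust = in_degree_centrality(edges)
for person, score in sorted(trust.items(), key=lambda item: -item[1]):
    print(f"{person}: {score:.2f}")
```

Here alice scores highest because everyone else on the team goes to her, so mined content attributed to her could be shown with a higher initial trust score. Whether a number like this really reflects trustworthiness is exactly the open question.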

I don’t know if it can…but I’m looking into it 🙂

Next Steps?

So what are the next steps for me and this research?

I’m working on a research paper now that I hope will outline the research in more detail.

Lots of questions still exist and there is quite a bit of research left to do.  I do believe I’m headed in the right direction, as evidenced by an HBR video & blog titled How Knowledge Management Is Moving Away From the Repository as Goal, which discusses a similar topic.

Look for more on this topic from me in the coming months.
