Semantic Knowledge Management

I would like to share this article, which Patrik Scholler once published in my former online journal, Inside-Lifescience. I think it is worth preserving …

Semantic Knowledge Management

by Dr. Patrik Scholler, Sciconis GmbH

Do you believe that the internet is a powerful and gigantic database? It appears so. If you search Google for “genomics technology platform”, you will immediately receive 654 documents on the subject: truly a gigantic amount of information and, unfortunately, too much to read. That is interesting in a statistical sense, but does it help you answer the question you have in mind? You would probably try a different search string, start reading from the top, or simply continue working with your old genomics technology platform. Interestingly, somebody else might not even call it that, even if he had the same equipment. He might call his high-throughput robot a ‘liquid handling device’, one of fifteen possible synonyms, and would only have to deal with 23 documents. But what good are 23 documents when you miss the 654, plus many more, that contain information you need to make the best possible decision? In any case, getting exactly what you mean is difficult when using the internet with today’s search engines.

But it is obvious that an enormous amount of valuable knowledge is hidden in these petabytes of internet documents, and in the networks of links between them. Content is related by association: patents on genomic devices, for example, might contain information on companies producing pipetting robots. If only you could access the most useful documents and related links directly, the internet would be like a gigantic database or a reference library. If it answered your most complex questions without your having to read anything, it could even be called your personal consultant. That sounds futuristic, but the future becomes today every now and then. The hidden wealth of answers, the knowledge you are striving for, is like the message in between or beyond the rhyming lines of poetry. With enough associative power, this implicit knowledge can be accessed. Finding the documents you mean, and intelligently ranking and visualising the results so that you receive your answer directly without having to read them, are two important aspects of semantic knowledge management. In contrast to searching for strings of syntax, it is about finding needles in haystacks in virtually no time. This becomes more relevant with every day that the internet’s data keep exploding.

What can you achieve with semantic knowledge management today that was impossible in the past? You could, for instance, perform as many market studies as you want without paying for a time-consuming survey.

Imagine you have 24 hours and are asked to analyse 200 of the top pharma and biotech companies for their activities in 40 different fields of interest (e.g. protein design, SNP typing) in order to characterise their market potential for your products, to find a market niche, to identify the most common application, or to pick the companies that will be your top 30 customers. Checking all combinations means performing 200 × 40 = 8,000 internet searches. Each search would deliver somewhere between 0 and 10,000 internet pages (documents, news, quotes, patents, company homepages, online reports, etc.) in which the company name is associated with one or more applications. If you checked only the first 10 hits of each search, you would have to read up to 80,000 documents and manually extract the information to fill your Excel sheet. This would take one to two man-years and is simply impossible in 24 hours. With semantic knowledge management software, you can access the content of the first 100 hits of the search results for “company name and research”, automatically download the documents, and run a full-text analysis across the resulting 20,000 indexed documents (200 companies × 100 hits). Novel software tools (Sciconis) read through all documents for each combination of a company name (including all possible synonyms and abbreviations) and an application (including all possible synonyms in different languages), which is 8,000 combinations multiplied by at least 10 synonyms each. The number of hits per document is used to rank the companies according to the number of company-application collocations. For an overview and two-dimensional navigation, the results can be ordered and displayed in a 3-d diagram:

This diagram contains too much information to deduce manually the answer to every useful question, e.g. “Which group of companies is more active in drug-development applications than in basic-research-oriented ones?” By projecting the results with mathematical operations that condense the information with regard to your individual decision process, you receive the following plot:

Now you can immediately zoom into any zone of interest and make instant decisions on the basis of an information quality not achievable by any other means. Of course, you still have the documents, and you can decide to read as many as you want. But you could also start a different semantic search to answer your next question and simply boycott reading.
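For readers who want a feel for the mechanics, the counting, ranking and projection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Sciconis software: the company names, synonym lists, application groups and sample documents are invented, and a "collocation" is assumed here to mean a company synonym and an application synonym appearing in the same sentence. The article itself does not define the term that precisely.

```python
import re
from collections import defaultdict

# Hypothetical synonym lists; real lists would be far longer and multilingual.
COMPANIES = {
    "Acme Pharma": ["Acme Pharma", "Acme Pharmaceuticals"],
    "BioExample": ["BioExample", "BioExample AG"],
}
APPLICATIONS = {
    "SNP typing": ["SNP typing", "SNP genotyping"],
    "protein design": ["protein design", "protein engineering"],
}
# Hypothetical grouping of applications, used only for the projection step.
GROUPS = {"SNP typing": "drug development", "protein design": "basic research"}

# Invented sample documents standing in for the downloaded search hits.
DOCS = [
    "Acme Pharmaceuticals expands SNP genotyping. Acme Pharma also funds protein design.",
    "BioExample AG reports progress in protein engineering.",
    "Acme Pharma licenses a SNP typing platform.",
]


def synonym_pattern(synonyms):
    """Build one case-insensitive regex matching any synonym as a whole phrase.

    Longer synonyms come first so "BioExample AG" is not counted twice.
    """
    ordered = sorted(synonyms, key=len, reverse=True)
    body = "|".join(re.escape(s) for s in ordered)
    return re.compile(r"(?<!\w)(?:" + body + r")(?!\w)", re.IGNORECASE)


def collocation_matrix(documents, companies, applications):
    """Count sentences that mention both a company and an application."""
    comp_pat = {c: synonym_pattern(s) for c, s in companies.items()}
    app_pat = {a: synonym_pattern(s) for a, s in applications.items()}
    matrix = defaultdict(int)
    for doc in documents:
        # Naive sentence split on ., ! or ? followed by whitespace (assumption).
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            found_c = [c for c, p in comp_pat.items() if p.search(sentence)]
            found_a = [a for a, p in app_pat.items() if p.search(sentence)]
            for c in found_c:
                for a in found_a:
                    matrix[(c, a)] += 1
    return dict(matrix)


def rank_companies(matrix):
    """Rank companies by their total number of collocations, highest first."""
    totals = defaultdict(int)
    for (company, _app), n in matrix.items():
        totals[company] += n
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def project_by_group(matrix, groups):
    """Condense the company x application matrix into company x group totals."""
    projected = defaultdict(lambda: defaultdict(int))
    for (company, app), n in matrix.items():
        projected[company][groups[app]] += n
    return {c: dict(g) for c, g in projected.items()}


if __name__ == "__main__":
    m = collocation_matrix(DOCS, COMPANIES, APPLICATIONS)
    print(rank_companies(m))
    print(project_by_group(m, GROUPS))
```

The company-by-application matrix corresponds to the data behind the 3-d diagram, and the group totals are one simple example of a projection that answers the "drug development versus basic research" question. With 200 companies, 40 applications and 20,000 documents the same loops apply, only with much larger synonym lists.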

Originally published in November 2001 by Inside-Lifescience, ISSN 1610-0255.