Random Thoughts on search techniques, semantic web, inferencing and other topics which capture my interest

Monday, October 19, 2009

Cognition, Singularities, and Neuroscience

One interesting "Aha" moment in my research on Semantic Web technology has been the links that I see between this technology, the Singularity (the rise of Artificial Intelligence), Neuroscience and Cognition. Perhaps the best book I came across was "The Executive Brain" by Elkhonon Goldberg.

He provides an interesting holistic view of brain functions and cognitive functions that actually makes sense of the flexible nature of Neural Networks. This has led me to rethink Neural Networks as a model of cognition.

The Singularity deals with the eventual creation of an Artificial consciousness. There is a Singularity Institute that you should check out. They just had a conference in NYC.


Back to the blog...

It's been a while since I've posted here. Been a lot of changes in my life that are only relevant in that they kept me from posting until now.

But I'm back and getting into the Semantic Web in a big way. Finally getting around to using the technology and will update the site as I dive in and swim around.

First up....a sidebar and then a book or two of note...soon.


Thursday, May 18, 2006

Enterprise Applications of Semantic Web: Sweet Spot of Risk and Compliance

Interesting paper from Semagix (www.semagix.com) on the intersection of risk and compliance and the semantic web:

Semantic Web is in the transition from vision and research to reality. In this early state, it is important to study the technical capabilities in the context of real-world applications, and how applications built using the Semantic Web technology meet the real market needs. Beyond push from research, it is the market pull and the ability of the technology to meet real business needs that is a key to ultimate success of any technology. In this paper, we discuss the market of Risk and Compliance which presents unique market opportunity combined with challenging technical requirements. We discuss how the Semantic Web technology with an ontology driven approach is especially well suited to support the demanding requirements of the applications in this market. We also discuss the capabilities of a commercial semantic technology that has origins in academic research, as it is utilized in a significant Risk and Compliance application deployed at large financial institutions. Core capabilities of this technology include the ability to develop and maintain focused but large populated ontologies, automatic semantic metadata extraction supported by disambiguation techniques, ability to process heterogeneous information and provide semantic integration combined with link identification and analysis through rule specification and execution, as well as organization and domain specific scoring and ranking. These semantic capabilities are coupled with enterprise software capabilities which are necessary for success of an emerging technology for meeting the needs of demanding enterprise customers.

Semantic Web Search Engine Companies

The following companies are involved in the Semantic Web portion of the search engine market:

• OntologyWorks brings ontology-based information and enterprise software engineering tools to the commercial market.
• NetworkInference creates software products, and promotes the development of web standards, that together will power the advance of machine understanding and reduce the level of human processing involved in web-based applications.
• CognIT is the Norway-based provider of CORPORUM, a tool suite for Ontology Extraction, Semantic and Content Analysis, Summarising and Content Visualisation.
• Taalee provides semantics based search facilities.
• Invention-Machines also provides semantics based search facilities.
• Ontoprise develops Ontology Editors and Inference Engines.
• Intellidimension provides an RDF based information integration environment including an inference engine.
Open Source Search Engines:

• Lucene

• Nutch (subproject of Lucene)

DOD and Intel Search Engine Market

The Defense Technical Information Center (DTIC), which creates libraries and information retrieval systems for organizations throughout Defense, uses information retrieval software. In the case of GulfLink, a system that covers issues related to the Gulf War, DTIC developed a search and retrieval system that incorporates all three vendors' systems. To guide users, DTIC provides a checklist of capabilities and directs users to the most appropriate system based on the answers it receives.
More and more government entities will take advantage of advanced information retrieval systems. In a recent study, Delphi Group of Boston forecast that the market will expand at a rate in excess of 20 percent through 2004.
The problem with today's search technologies, experts say, is that often they don't scour the entire body of knowledge available to an agency. Although a typical search may gather valuable information from myriad sources, it may not include relevant data from systems outside the agency or unstructured data in the form of white papers, news reports or e-mail messages.
The problem gets worse in the case of cross-agency searches. Developing an intelligent search and retrieval system that works across agency boundaries is fraught with difficulties. Not only must such a system handle both structured and unstructured data, it must be monitored for constant updates and deal with the varied security clearances of federal employees seeking information.

FAST Corporate Overview

Leadership
– Founded in 1997
– Public company (OSE: 'FAST')
– Profitable and well capitalized
– Revenue growth = 50%
– > 1,500 customers
– Enterprise search position:
  • #1 in revenue growth
  • #2 in revenue size
  • #1 in market cap
Focus
– Internet business sold to Overture
– Acquired AltaVista & NPAG
Momentum
– Outperforming competitors in key financials
– Continued record revenues and new major customers each quarter
– 100% Customer Retention Rate; 97% Customer Satisfaction Rate

Locations shown on the slide's map: Tromsø, Toronto, Oslo, Stockholm, Chicago, Boston, Cologne, Salt Lake City, London, Munich, New York, Rome, Tokyo, San Francisco, Washington DC, Dubai, Singapore, Rio de Janeiro, Sydney, Sao Paulo, Melbourne

Free/Open Source Search Engines

• ASPSeek (free)
• BBDBot (free)
• Datapark Search (free)
• ebhath (free)
• Eureka (free)
• ht://Dig (free)
• ISearch (free and commercial versions)
• JXTA Search (free)
• Lucene (free engine, no crawler)
• MG (Managing Gigabytes book and system)
• + see also A quick guide to performing IR batch experiments with the MG system
• mnoGoSearch (free for Unix, low cost for Windows)
• MPS Information Server
• Namazu (free)
• Nutch (free)
• OpenFTS (free)
• PLWeb (free, partial open source)
• SWISH-E (free)
• SWISH++ (free)
• WAIS and freeWAIS (free, fairly obsolete)
• WebGlimpse (free)
• Xapian Code Library (free)
• XML Query Engine (GPL)
• Zebra (free, mainly Z39.50 server)

Search Engine Companies in the DOD market

• Google – Google Search Client for military and DOD
• Yahoo
• Oracle's UltraSearch
• Verity Inc. of Sunnyvale, Calif. -- State and Defense departments
• Convera (formerly Excalibur) of Vienna, Va. -- Social Security Administration, IRS, and the Agriculture, Defense and State departments, FBI, CIA, NSA
• Thunderstone Software LLC of Cleveland -- Defense, the National Weather Service and Agriculture
• OpenText Corp. of Waterloo, Ontario -- Air Force and Navy
• Quigo, IntelliSonar

Enterprise Search Market Vendor Breakdown

As noted in the CMSWatch Enterprise Search Report, the Vendors are broken down into the following groups:
"The Big 4"
• Autonomy: IDOL Server
• Convera: Retrievalware
• Fast Search & Transfer: Enterprise Search Platform (ESP)
• Verity: K2 Enterprise
Specialized High End Players
• Endeca: Profind
• InQuira: InQuira
• iPhrase: OneStep
• Lextek International: Onix
• Stratify: Discovery System
• TeraText: TeraText Suite
• Triplehop: MatchPoint
Infrastructure & ECM Suite Vendors
• Hummingbird: Hummingbird Search Server
• Microsoft: SharePoint Search Services
• OpenText: Livelink
• Oracle: Oracle Text
• SAP: TREX
Mid-Tier Challengers
• Arikus: Aire
• Coveo: Coveo Enterprise Search
• ISYS: ISYS Search Suite
• Speed of Mind: Speed of Mind Index Server
• Vivisimo: Vivisimo Clustering Engine
Search Appliances
• Google: Google Search Appliance
• Thunderstone: Thunderstone Search Appliance
Hosted Services
• Blossom: Blossom Enterprise Search
• WebSideStory (formerly Atomz): Search
Lower-cost, Web-oriented
• dtSearch: dtSearch
• Innerprise: ES.NET 2004
• Mondosoft: MondoSearch
• Verity: Ultraseek
• YourAmigo: YourAmigo Enterprise Search

Search Engine Marketing Overview

Search Engine Market Size and Description

The total available revenue from licensing search software in 2005 is estimated to be $4 billion from all market segments. The majority of the revenue is accounted for by Overture, Verity, Autonomy, Google, OpenText, and Microsoft. Most smaller search companies are not likely to survive unless they make a major market breakthrough like Google and Overture did in the last three years.

Financial Results of Search-Centric Firms:

Company            Revenues (000)   Total Net Income (000)   Comment
Autonomy           $51.3            $7.3
Google             $300.0           $50.0
Hummingbird, Ltd   $372.1           $4.9                     About 20% are search related revenue
OpenText           $158.7           $23.3                    About 30% are search related revenue
Overture           $569.2           $84.5                    About 60% of adjusted total
Verity             $96.0            $9.6
Total              $1,547.30        $179.60
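As a quick sanity check, the Total row can be recomputed from the individual company rows. This is only a sketch: the figures are copied as quoted in the table, and the variable names are my own.

```python
# (revenues, net income) per company, copied from the table above,
# in whatever units the table uses.
financials = {
    "Autonomy": (51.3, 7.3),
    "Google": (300.0, 50.0),
    "Hummingbird, Ltd": (372.1, 4.9),
    "OpenText": (158.7, 23.3),
    "Overture": (569.2, 84.5),
    "Verity": (96.0, 9.6),
}

# Round to one decimal place to avoid floating-point noise in the sums.
total_revenue = round(sum(rev for rev, _ in financials.values()), 1)
total_income = round(sum(inc for _, inc in financials.values()), 1)

print(f"Total revenues:   {total_revenue:,.1f}")
print(f"Total net income: {total_income:,.1f}")
```

Both sums agree with the Total row as quoted.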


Next generation search is, or has been, driven by military and intelligence initiatives. However, the technology that many start-ups describe as “state of the art” is anywhere from 2-4 years behind what the advanced laboratories are working on. The pipeline for many systems is 24 months because of how long the government process takes. Commercial products must run in a price-sensitive, stable, reliable environment. Research projects and government initiatives operate under different rules.

Also, a few companies (Verity, Convera and Autonomy) appear as examples of successful companies in multiple markets. In fact, the list of commercially viable search companies is very short. Even former industry leaders such as Fulcrum, OpenText and dtSearch are struggling to find customers, revenue and a sustainable competitive advantage.


Overall Search Engine Market Structure:

According to Comscore MediaMetrix, Google handled the largest percentage of searches performed by US web surfers in July 2005. The 4.8 billion web searches that month were split among the following key players:



• Google
• Yahoo (includes Yahoo, AltaVista, and Overture)
• MSN
• AOL (includes AOL and Netscape)
• InfoSpace (includes Dogpile)
• Ask (includes AskJeeves, Teoma, Excite Network, iWon, and MyWay)
• Others
Growth Over Time

Ownership
The major players can now be broken down as follows:
• YAHOO owns INKTOMI and OVERTURE, which owns ALTA VISTA, FAST and ALLTHEWEB
• AOL search owns NETSCAPE search, which owns OPEN DIRECTORY
• ASK JEEVES owns TEOMA
• LYCOS owns HOTBOT
• EXCITE owns IWON
Non-search engine owners
• GO is owned by Walt Disney
• Google is owned by Google
• Looksmart is owned by LOOK
• MSN is owned by Microsoft
Interlocking Relationships
The market has many relationships between search engines themselves and content providers and advertisers. The market seems to bifurcate itself between two groups centered around either Google or Yahoo. There is a smaller group centered around Microsoft.


Free/OpenSource Search Engines

There is a healthy supply of Free or OpenSource Search Engines that operate adequately for small/medium size enterprises (See Appendix xxx) and thus compete with the low end search engines.

Change in Business Model
The search market space is a very competitive one. The niche has high visibility. Somewhere between 65% and 80% of Internet users say search is the chief use of the Internet.

However, no company has been able to build a sustainable business with basic search software licensing regardless of the presence of advanced technology. There is always an extra feature such as for-fee services or commissions on content licenses that generate cash flow.

Companies have repositioned themselves, abandoning markets where sales were too costly or too small. When one company abandons a segment, others enter it. Most of the technologies used by these companies are generally well known and have been available for more than a few years.

Search Company: Old Positioning, New Positioning, Business Model

Applied Linguistics (formerly Oingo)
– Old Positioning: NLP tools
– New Positioning: Ontologies for Intranet content collections and services
– Business Model: License software and provide ontology consulting

Ask Jeeves
– Old Positioning: Easy Query
– New Positioning: Enterprise search and associate search results based on Teoma
– Business Model: License “engine” for enterprise portal. Charge for professional services and support.

Autonomy
– Old Positioning: Knowledge Management
– New Positioning: “portal in a box”
– Business Model: License “engine” for Intranet portal indexing and wireless device search tool.

Brightplanet
– Old Positioning: Web indexing
– New Positioning: Access to content in structured databases
– Business Model: License software and sell services to build text mining systems

Convera (formerly Excalibur Technologies)
– Old Positioning: ASP service plus site license
– New Positioning: Enterprise search and text mining
– Business Model: Obtain new investors and return to site license Business Model

divineInterventures (formerly Retrieval Technologies and Northern Light)
– Old Positioning: Indexed and filtered news feeds and Web indexing
– New Positioning: None
– Business Model: No viable business model

HNC Software
– Old Positioning: Intelligence tool
– New Positioning: Health care and enterprise intelligence
– Business Model: License software to organizations; fees for customization

iPhrase Inc.
– Old Positioning: Web content indexing
– New Positioning: Enterprise and Web content indexing
– Business Model: License software and provide professional services

OpenText
– Old Positioning: Web search and Intranet indexing
– New Positioning: Enterprise applications including knowledge management and collaboration
– Business Model: License tools to ecommerce sites wanting collaboration, database, services and search in a one-stop shop.

PLS/AOL
– Old Positioning: “Find a needle in a haystack”
– New Positioning: No cost OpenSource software
– Business Model: None. Out of the search and retrieval business

Verity
– Old Positioning: Enterprise search and OEM deals with other software products requiring search
– New Positioning: Enterprise search including access to structured database
– Business Model: Text and SQL search licenses, OEM deals plus professional services and maintenance

Yahoo!
– Old Positioning: Web directory of popular sites
– New Positioning: Acquire Inktomi and shift to for-fee directory listings and Inktomi-generated Web Search
– Business Model: Shift to for-fee advertising and subscription model

VALUE CHAIN

The above description of Business Models for Search Engine Companies is borne out by an analysis of the Value Chain for E-publishing. This Value Chain can be extended to any number of Internet markets where the most value is obtained by consumer-facing sites that offer services. The most value is not created by search engines per se.
Defense Department Search Engine Market
DOD and Intelligence Agencies’ Markets
The Defense Technical Information Center (DTIC), which creates libraries and information retrieval systems for organizations throughout Defense, uses information retrieval software. In the case of GulfLink, a system that covers issues related to the Gulf War, DTIC developed a search and retrieval system that incorporates all three vendors' systems. To guide users, DTIC provides a checklist of capabilities and directs users to the most appropriate system based on the answers it receives.
More and more government entities will take advantage of advanced information retrieval systems. In a recent study, Delphi Group of Boston forecast that the market will expand at a rate in excess of 20 percent through 2004.
The problem with today's search technologies, experts say, is that often they don't scour the entire body of knowledge available to an agency. Although a typical search may gather valuable information from myriad sources, it may not include relevant data from systems outside the agency or unstructured data in the form of white papers, news reports or e-mail messages.
The problem gets worse in the case of cross-agency searches. Developing an intelligent search and retrieval system that works across agency boundaries is fraught with difficulties. Not only must such a system handle both structured and unstructured data, it must be monitored for constant updates and deal with the varied security clearances of federal employees seeking information.
Enterprise Search Engine Market
The Enterprise Search Market is a growing segment of the Search Market, growing from $430 M in 2003 to over $1 B in 2006. Among the dominant factors causing the growth are:

• Sarbanes Oxley compliance, especially in e-mail messages
• Reducing duplicate data in the organization
• Making internal data accessible to employees with the proper authorizations to improve productivity
• Enhancing synergistic opportunities to develop new products, identify and exploit new markets, and improve productivity
As shown in Appendix I, the market has four “Big” vendors and a variety of vendors that can be grouped into sub-segments of the market. In particular, the market is fragmented and has different meanings to different vendors. While some vendors, like Google and Thunderstone, are attempting to commoditize the market, it still consists of many sub-segments that are evolving.
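To put the Enterprise Search growth figures quoted above ($430 M in 2003 to over $1 B in 2006) in perspective, here is a small sketch of the implied compound annual growth rate. The `cagr` helper is my own, and treating "over $1 B" as exactly $1,000 M is an assumption, so the result is a lower bound.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the post, in $ millions.
# Assumption: "over $1 B" is taken as exactly 1000.0, giving a lower bound.
rate = cagr(430.0, 1000.0, 2006 - 2003)

print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption the market would have been growing at roughly 32 percent a year.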


Online Advertising Market
Internet-based advertising remains a small portion of the total U.S. advertising market, which was $71 billion during the first half of 2005, according to research firm TNS Media Intelligence. According to the Boston Globe, U.S. Internet advertising revenues grew by 26% to a record $5.8 billion for the first half of 2005. Search ads made up 40 percent of the online ad revenues, the same share as the first half of 2004. But revenue totals for search jumped 27 percent to $2.3 billion, from $1.8 billion. Display ads made up 20 percent and classifieds 18 percent.
The top-tier publishers loosened their grip on revenues, writes ClickZ. The top 10, 25 and 50 sites lost a few percentage points as advertisers spent a bit more of their budgets on lower-tier publishers, likely including blogs. In 1H04, the top 10 sites accounted for 74 percent of spending; that proportion for 1H05 was 72 percent.
CPM and impression models grew to account for 48 percent, from 45 percent in the first half of 2004; performance deals reached 40 percent from 38 percent; and hybrid revenues were at 12 percent, down from 17 percent.
The largest players in the online Ad Serving Market are Google AdSense, Yahoo Overture and Microsoft AdCenter, which is projected to start in October 2005. Other players include LookSmart Listings, Yahoo Paid Inclusion, AltaVista Trusted Feed and DoubleClick.
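The advertising figures quoted in this section can be cross-checked against each other with a few lines of arithmetic. This is a rough sketch using only the numbers given above; the variable names are my own, and small gaps are expected because the inputs are already rounded.

```python
# All figures are the ones quoted in this section ($ billions, first half of 2005).
total_us_ads = 71.0      # total U.S. advertising market
internet_ads = 5.8       # U.S. Internet advertising revenues
search_share = 0.40      # search ads as a share of online ad revenue
search_1h05 = 2.3        # reported search revenue, first half of 2005
search_1h04 = 1.8        # reported search revenue, first half of 2004

internet_share = internet_ads / total_us_ads
implied_search = internet_ads * search_share
search_growth = search_1h05 / search_1h04 - 1

# Pricing-model shares (percent) for each half-year should add up to 100.
pricing_1h05 = {"CPM/impression": 48, "performance": 40, "hybrid": 12}
pricing_1h04 = {"CPM/impression": 45, "performance": 38, "hybrid": 17}

print(f"Internet share of U.S. ad market: {internet_share:.1%}")
print(f"Search revenue implied by 40% share: ${implied_search:.2f} billion")
print(f"Search growth from rounded totals: {search_growth:.1%}")
print(f"Pricing shares sum: {sum(pricing_1h05.values())} (1H05), "
      f"{sum(pricing_1h04.values())} (1H04)")
```

The 40 percent share lines up with the reported $2.3 billion. The growth rate recomputed from the rounded totals comes out slightly above the quoted 27 percent, which is what you would expect from rounding.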
Ontology Market
Relevancy Issues



Effects of Government Regulation
Swarming Solutions is subject to a number of foreign and domestic laws that affect companies conducting business on the Internet. In addition, because of the increasing popularity of the Internet and the growth of online services, laws relating to user privacy, freedom of expression, content, advertising, information security and intellectual property rights are being debated and considered for adoption by many countries throughout the world.

In the U.S., laws relating to the liability of providers of online services for activities of their users are currently being tested by a number of claims, which include actions for defamation, libel, invasion of privacy and other data protection claims, tort, unlawful activity, copyright or trademark infringement, or other theories based on the nature and content of the materials searched and the ads posted or the content generated by users. Likewise, other federal laws could have an impact on our business. For example, the Digital Millennium Copyright Act has provisions that limit, but do not eliminate, our liability for listing or linking to third-party web sites that include materials that infringe copyrights or other rights, as long as we comply with the statutory requirements of this act. The Child Online Protection Act and the Children’s Online Privacy Protection Act restrict the distribution of materials considered harmful to children and impose additional restrictions on the ability of online services to collect information from minors. In addition, the Protection of Children from Sexual Predators Act of 1998 requires online service providers to report evidence of violations of federal child pornography laws under certain circumstances.

In addition, the application of existing laws regulating or requiring licenses for certain businesses of our advertisers including, for example, distribution of pharmaceuticals, adult content, financial services, alcohol or firearms, can be unclear. Application of these laws in an unanticipated manner could expose Swarming Solutions to substantial liability and restrict the ability to deliver services to users. For example, some French courts have interpreted French trademark laws in ways that would, if upheld, limit the ability of competitors to advertise on generic keywords.

Also, regulations affecting export controls in sensitive technologies may be applicable.

Thursday, October 20, 2005

Flock, the preview

Just downloaded Flock, the social browser, and will start testing this puppy out.