
iAgent - an initial study
Ever wondered how information is gathered and retrieved on the Internet? When you type http://depzzy.blogspot.com, do you wonder how the blog is retrieved? This is an introduction to one of the earliest commercialized information processing systems: iAgent. (Early days? Think of altavista.com, excite.com and the first incarnation of yahoo.com!)

iAgent was developed by researchers at Kent Ridge Digital Labs, among them Dr. Kok F. Lai, co-founder of BuzzCity Pte. Ltd. The first version of the system was introduced in 1997 and was later adopted by AltaVista, a premier search engine in the years when the Internet first became popular. This write-up is based on the original 1997 engine.
It is a system developed for information collection and retrieval, especially in multilingual and networked environments. The original blueprint has since been expanded to accommodate major languages of the world, including Korean, Tamil, Chinese, Malay, Bahasa Indonesia, Thai and English.
The engine has three main parts: Information Gathering, the Freetext Engine (indexing and retrieval) and the WWW Interface.
Information Gathering:
iAgent works with well-known protocols, including HTTP, NNTP, POP and FTP, which allow the agent to collect information widely available on the Internet. HTTP retrieves web pages from servers addressed by URL, NNTP retrieves articles from news servers, POP retrieves messages from email servers, and FTP transfers files between network hosts.
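To make the idea of one agent working across several protocols concrete, here is a minimal sketch of how a gatherer might decide which protocol a source belongs to. It is not iAgent's actual code: the GATHERERS table, the pick_gatherer function and the example addresses are all assumptions made for illustration, and no network connection is opened.

```python
from urllib.parse import urlsplit

# Hypothetical table: URL scheme -> kind of item the gatherer would collect.
GATHERERS = {
    "http": "web page",
    "nntp": "newsgroup article",
    "pop": "mailbox message",
    "ftp": "remote file",
}


def pick_gatherer(url: str) -> str:
    """Return the kind of item to collect, based only on the URL scheme."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in GATHERERS:
        raise ValueError(f"unsupported protocol: {scheme!r}")
    return GATHERERS[scheme]
```

Calling pick_gatherer("http://depzzy.blogspot.com") selects the web page gatherer, while an address with an unlisted scheme is rejected. A real agent would then hand the address to a protocol-specific client.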
Freetext Engine:
After information gathering, the Freetext Engine creates an inverted freetext index to enable document retrieval later. An inverted index maps each word to the documents that contain it. Think of it as a library indexing machine. A library holds many types of material: audio and visual products, books, documents, graphical material, computers, etc. The Freetext Engine puts a label on each of these items so that they can be identified quickly.
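The inverted index can be sketched in a few lines. This is only an illustration under simple assumptions, not the engine's real design: tokenize and build_inverted_index are hypothetical names, documents are plain strings keyed by an ID, and the tokenizer only suits space-delimited text such as English. Languages like Chinese or Thai would need proper word segmentation.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index
```

Given two documents, 1: "Search engines index the web" and 2: "The web is large", the entry for "web" holds both IDs and the entry for "search" holds only ID 1. This is the "label" from the library analogy: looking up a word goes straight to the matching items without scanning every document.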
The retrieval process supports different kinds of query, such as keywords or Boolean operations (AND, OR, NOT). Results are presented in clusters, each with a document extract. Clustering and document extracts are among the most common ways of presenting retrieval results today. They let people form an initial opinion about a document before retrieving it.
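Boolean retrieval maps naturally onto set operations over an index that records, for each term, the IDs of the documents containing it. The sketch below assumes such an index is already available as a plain dictionary; the function names are hypothetical, the extract is simply the start of the text, and clustering is not covered.

```python
def search_and(index, *terms):
    """Documents that contain every term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Documents that contain at least one term."""
    return set().union(*(index.get(term, set()) for term in terms))


def search_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term, set())


def extract(text, length=40):
    """Short preview shown next to a result."""
    if len(text) <= length:
        return text
    return text[:length].rstrip() + "..."
```

With an index of {"search": {1, 2}, "engine": {2, 3}}, an AND query on both terms returns only document 2, an OR query returns documents 1, 2 and 3, and NOT "engine" over those three documents returns document 1.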
WWW Interface:
The WWW interface connects the engine to web browsers such as Microsoft's Internet Explorer, Firefox, Opera, etc. The results of the retrieval process are presented in HTML format. Once the user selects a document, it is retrieved via its URL.
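As a rough illustration of presenting results in HTML, the sketch below turns a list of hits into a linked list that a browser can display. The render_results function and the (url, title, snippet) tuple layout are assumptions for this example, not the interface the engine actually used.

```python
from html import escape


def render_results(hits):
    """Render (url, title, snippet) tuples as an HTML list of links."""
    items = []
    for url, title, snippet in hits:
        items.append(
            f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a>'
            f"<p>{escape(snippet)}</p></li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Each title links to the document's URL, so selecting a result makes the browser fetch the document itself. Escaping the text keeps characters such as < and & in titles or extracts from breaking the page.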
Study Notes:
We may take this simple engine for granted nowadays, but back in 1997 it was a great invention and helped shape the Internet technologies we know today. The engine has served as a blueprint for information collection and retrieval systems since then, especially those that emphasize multilingual text. A later development integrated pattern and graphical recognition. This will be studied in the next article.
More information about iAgent can be found in the Technical Paper.




