Galaxy Consulting Blog: Search Engine Technology

Thursday, May 9, 2013

Search Engine Technology

Modern web search engines are highly intricate software systems which employ technology that has evolved over the years. There are few categories of search engines that are applicable to specific browsing needs.

These include web search engines (e.g. Google), database or structured data search engines (e.g. Dieselpoint), and mixed search engines or enterprise search.

The more prevalent search engines such as Google and Yahoo! utilize hundreds of thousands of millions of computers to process trillions of web pages in order to return fairly well-aimed results. Due to this high volume of queries and text processing, the software is required to run in a highly dispersed environment with a high degree of superfluity.

Search Engine Categories

Web search engines

These are search engines that are specifically designed for searching web pages. They were developed to facilitate searching through a large amount of web pages. They are engineered to follow a multi-stage process: crawling the infinite number of pages to skim the figurative foam from their contents, indexing the foam/buzzwords in a sort of semi-structured form (for example a database), and returning mostly relevant as links to those skimmed documents or pages from the inventory.

Crawl

In the case of a wholly textual search, the first step in classifying web pages is to find an "index item" that might relate expressly to the "search term". Most search engines use sophisticated algorithms to "decide" when to revisit a particular page, to check its relevance. These algorithms range from constant visit-interval with higher priority for more frequently changing pages to adaptive visit-interval based on several criteria such as frequency of chance, popularity, and overall quality of site. The speed of the web server running the page as well as resource constraints like amount of hardware or bandwidth also figure in.

Link map

The pages that are discovered by web crawls are often distributed and fed into another computer that creates a veritable map of uncovered resources. This looks a little like a graph, on which different pages are represented as small nodes that are connected by links between the pages. The excess of data is stored in multiple data structures that allow quick access to this data by certain algorithms that compute the popularity score of pages on the web based on how many links point to a certain web page, which is how people can access any number of resources concerned with diagnosing psychosis.

Database Search Engines

Searching for text-based content in databases presents few special challenges from which a number of specialized search engines developed. Databases are slow when solving complex queries (with multiple logical or string matching arguments). Databases allow pseudo-logical queries which full-text searches do not use. There is no crawling necessary for a database since the data is already structured. However, it is often necessary to index the data in a more economized form designed to inspire a more expeditious search.

Mixed Search Engines

Sometimes, searched data contains both database content and web pages or documents. Search engine technology has developed to respond to both sets of requirements. Most mixed search engines are large Web search engines, like Google. They search both through structured and unstructured data sources. Pages and documents are crawled and indexed in a separate index. Databases are indexed also from various sources. Search results are then generated for users by querying these multiple indices in parallel and compounding the results according to "rules".

Galaxy Consulting Blog

Pages

Thursday, May 9, 2013

Search Engine Technology

No comments:

Post a Comment