Galaxy Consulting Blog

Friday, August 17, 2012

Content Management Systems Reviews - Documentum - Automatic Classification

Documentum has two tools for automatic classification: Content Intelligence Services (CIS) and EMC Captiva Dispatcher. The subject of today’s post is Content Intelligence Services (CIS). In my next post, I will describe EMC Captiva Dispatcher.

Content Intelligence Services (CIS) is an extension to the EMC Documentum content management platform that enables automatic classification and categorization of content in the Documentum repository. Its benefit is well organized, classified, and categorized content. With CIS, content is parsed and analyzed and classification rules are applied. The results of the classification can then be used for categorization as keywords to populate content metadata.

The capability of automatically creating keywords to populate content metadata can remove the burden from end users who would otherwise have to do it manually. Many users struggle to populate metadata consistently as content is being created, which significantly limits the content's future use, since metadata is what enables processing of the content.

CIS eliminates this dependency on users. CIS can propose metadata to users, who can accept or modify it as needed. CIS can support a combination of automatic and manual classification, with a special user interface for category owners. Category owners can make classification decisions manually in cases where the automatic rules cannot classify content with a preset certainty level. The user interface is built into every Documentum client, such as Webtop, and becomes active upon detection of CIS in the system.
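To illustrate the idea of a preset certainty level, here is a minimal sketch in Python. It is not the CIS API: the function name, the score format, and the 0.8 threshold are all assumptions made for the example. It only shows how a classification could be applied automatically when confidence is high and sent to a category owner otherwise.

```python
def route_classification(scores, threshold=0.8):
    """Pick the best-scoring category and decide who confirms it.

    scores: dict of category name -> confidence in [0, 1] (hypothetical format).
    threshold: the preset certainty level (0.8 is an arbitrary example value).
    """
    category, confidence = max(scores.items(), key=lambda item: item[1])
    if confidence >= threshold:
        return ("auto", category)          # classified without human input
    return ("owner_review", category)      # proposed to the category owner


print(route_classification({"Contracts": 0.93, "Invoices": 0.41}))
print(route_classification({"Contracts": 0.55, "Invoices": 0.52}))
```

The first call clears the threshold and is classified automatically; the second falls below it and is routed for manual review.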

With CIS, the results of the classification can be used for content categorization, which assigns content to appropriate categories. Typically, categories are represented by a folder structure to which content is linked. A category hierarchy – or taxonomy – is usually common to a department or an organization and allows all users to share the same navigation view for content in active projects or content that has been archived.

CIS comes with prepackaged taxonomies for various industries. These taxonomies can be used either out of the box or as a starting point for customization. Users can add categories and sub-categories to these taxonomies.

CIS supports major European languages. This enables the classification of local content in its native language against an enterprise-wide or local taxonomy. Using these multilingual capabilities, companies can deploy CIS globally, enhancing the globalization capabilities of Documentum, which include pervasive Unicode compatibility and localized user interfaces.

Next time: EMC Captiva Dispatcher.

Monday, August 13, 2012

SharePoint Libraries for Content Management

A library is a place on a site where team members can work together to create, update, and manage files. Each library displays a list of files and key information about the files.

Why work with libraries?

Storing your documents in a central location can help your team work on files together, especially if your files tend to be scattered among people's computers or in multiple shared folders on your network.

For example, the Marketing team uses a document library named Marketing Documents for managing its press releases, budget files, contracts, and other types of files. The library stores information that is relevant to the type of file, such as the name of the project that the file is associated with. The Marketing team also uses a slide library to share and reuse slides for presentations.

The Shared Documents library is created automatically when your team creates a new site. You can start using this library right away, customize it, or create other libraries. Your team can also create more specialized libraries, such as slide libraries, picture libraries, and form libraries.

The Marketing team tracks versions in its libraries, so the team has a history of how files have evolved and can restore a previous version if someone makes a mistake. Team members check out documents when they work on them, so that no one else can overwrite their changes.

If you want a workspace where you can coordinate work on a document or a small number of related documents, you can create a Document Workspace site. A Document Workspace site includes a document library in addition to a tasks list, schedules, and a list of workspace members. For storing your team's primary set of documents, which your team uses on a routine basis, use your team's Shared Documents library.

By default, people in the Members group can add files to and edit files in a library. If you don't have permission, contact the person who owns your site or library. If you are a site owner or designer, you can customize the library by changing how the files are displayed and managed.

Some key advantages of working with libraries

The following are some key features of libraries that enable your team to manage its files and work more efficiently. Advanced features for managing content, such as policies on how documents are used and shared, are explained in other topics.

Central location - a library is a central location where your team can update and manage documents. If your team members struggle to keep up with files stored on individual shares or sent in separate e-mail messages, a library can help reduce the chaos.

Checkout - you can check out a file to reserve it for your use so that others cannot change it while you are working on it. If you are using the 2007 Microsoft Office system, you can work with files on your computer, and even take them offline, when you check them out.

Versions - a library can track versions, which provides a version history and enables previous versions to be restored.

Alerts and RSS - you can set up e-mail alerts or subscribe to RSS Feeds so that you are updated on changes to files.

Views - your team can create views that show content in multiple ways that may be especially relevant or meaningful. For example, the Marketing team has views of files grouped by department and contracts that expire this month.

Search - libraries are searchable. For example, you can search on a title or property of a document, such as the document author.

Client integration - If you are running some 2007 Office release programs, such as Microsoft Office Word 2007, you can work with server features directly from the client, such as checking out files, updating server properties, or viewing a version history.

Approval - your library can be set up to require someone to approve files before they are displayed to others. This feature can be helpful if your library contains important guidelines or procedures that need to be final before others see them.

Content types - your team can set up content types for the types of documents it uses most often, such as marketing presentations, budget worksheets, and contracts. The content types include templates as a starting point, for formatting and any boilerplate text and for properties that apply to the documents of that type, such as department name or contract number.

Workflow - your group can apply business processes to its documents, known as workflows, which specify actions that need to be taken in a sequence, such as approving or translating documents.

Thursday, August 9, 2012

E-Discovery and Records Management

Discovery is the pre-trial phase in a lawsuit in which each party can request documents and other evidence from opposing parties. E-discovery deals with discovery of electronically stored information (ESI), including documents and e-mails.

E-discovery preparedness makes it imperative for organizations to develop an enterprise-wide strategy to manage the volume of electronic information. The discovery process affects many individuals in an organization: not just lawyers and others involved in discovery, but also IT professionals and records managers, who have to be prepared to produce electronic content for discovery and litigation.

For legal counsel, it means having a review process to determine what discovered content is relevant to the case. For an IT person, it means restoring backup tapes to show evidence on file shares, content management systems, e-mail systems, or other applications. But for records managers, this work will have begun long before any lawsuit with managing records for retention, placing legal holds, and finalizing disposition.

ESI presents special issues for discovery:

ESI can be replicated at a very low cost, resulting in tremendous volume;
Electronic content can be easily changed and deleted;
ESI can be backed up, creating more volume as content is copied;
Electronic content may require certain software to access and read;
ESI can reflect relationships based upon how it is distributed;
ESI may have associated metadata;
ESI can be searched.

E-discovery can be costly because it requires organizations to retrieve content from servers, archives, backup tapes, and other media.

In some cases, an organization is unable to execute a discovery order because it is unable to locate all content in a timely manner, or it is unable to place holds on all content and some of it is deleted during the lawsuit. The inability to do this correctly also has a cost, and it can be considerable.

To address these costs, many organizations are looking at e-discovery solutions that will enable them to review the found content and take it through litigation.

But organizations can also lower costs for archiving and restoring, legal review, and sanctions by simply cutting down how much content they retain. Less stored content means less content on which to perform discovery.

On the other hand, because all ESI is now discoverable, organizations may be tempted to destroy that information as soon as possible to reduce the cost of discovery. But some information must be kept for regulatory and compliance reasons. For example, many organizations are governed by regulatory bodies that require business information to be retained for a specific period of time. Some of that information might also be important to support the organization in case of litigation. Destroying the wrong information can lead to fines and unfavorable judicial decisions.

Some organizations may randomly pick through content to remove content that is deemed most risky. But in litigation, it will be necessary to prove that the deletion of this content was consistent with a policy that has been applied rigorously. Without audit trails and certificates of destruction, it can be difficult to prove compliance with an organization’s policies.

To avoid this situation, many organizations are simply choosing to keep everything. But experience proves that the cost of restoring backup and archive tapes, as well as the cost of discovery and the inability to identify content and place immediate holds, can make this policy economically disastrous in the event of litigation.

Developing a strategy and a plan of action for handling e-discovery will help organizations mitigate their risk and save them a significant amount of money in the event of litigation. Organizations need to have a retention policy to determine which content can be destroyed and at what time and which content should be kept and for how long. The key is to have a retention program that is flexible enough to keep content for the right retention period.

Retention periods are historically thought of in terms of calendar events. A document that was created in 2000 may no longer be required in 2012, and so it may be destroyed. In practice, however, retention periods for content are often driven by events, such as the length of a project, the duration of a contract, or the termination of an employee. The retention policies that match up to these content types must reflect the lifecycle of the content.

Organizations may choose to keep project information for x number of years after the end of the project. A workflow event that signals the end of a project, such as the publishing of a report, may commence the retention period for the associated e-mails and files. An organization may create a retention policy that a contract will be retained for x number of years after the end of the contract period. The end of the contract could then trigger a lifecycle action for that document.
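As a rough illustration of event-driven retention, the sketch below computes the date on which content becomes eligible for disposition from the date of the triggering event rather than from the creation date. The function names and the seven-year period are made up for the example; real retention periods come from your policy and your regulators.

```python
from datetime import date


def retention_end(trigger_date, years):
    """Return the date `years` after the triggering event (e.g. contract end)."""
    try:
        return trigger_date.replace(year=trigger_date.year + years)
    except ValueError:
        # The trigger fell on Feb 29 and the target year is not a leap year.
        return trigger_date.replace(year=trigger_date.year + years, day=28)


def is_eligible_for_disposition(trigger_date, years, today):
    """True once the retention period that started at the event has elapsed."""
    return today >= retention_end(trigger_date, years)


contract_ended = date(2012, 8, 9)   # the event that starts the clock
print(retention_end(contract_ended, 7))
print(is_eligible_for_disposition(contract_ended, 7, date(2015, 1, 1)))
```

Note that the clock starts at the event (the end of the contract), so a contract signed years earlier is not eligible for disposition any sooner.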

There are many types of events that could trigger a retention policy: content expired (e.g., a contract), usage statistics (e.g., a document has not been accessed in six months), business event (e.g., environmental impact filing), content lifecycle event (e.g., new revision checked in).

There are many actions an organization can take based upon the retention policy: delete, notify author, archive, move, delete revisions, revise. These different actions can be applied to retained content over the course of its lifecycle as it moves from its active use to inactive status to its deletion.

The best approach to records management is where authors create content using their familiar tools and systems, and retention management is enforced on that content where it lives, from a centralized place. This approach has a number of benefits:

retention policies are centrally administered through a single interface;
a catalog of discoverable content is created;
holds can be placed instantly across these different systems, ensuring that evidence is not deleted during litigation;
disposition can be performed from a central place.
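Here is a small sketch of what such a central catalog could look like. The `Record` and `Catalog` classes, the `matter` tag, and the sample data are all hypothetical; the point is only that a hold placed in one place covers content in several systems and blocks its disposition.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    record_id: str
    system: str            # where the content lives, e.g. "email", "file share"
    matter: str            # hypothetical tag tying the record to a project or case
    dispose_after: date
    on_hold: bool = False


class Catalog:
    """A single catalog of discoverable content across systems."""

    def __init__(self):
        self.records = []

    def add(self, record):
        self.records.append(record)

    def place_hold(self, matter):
        """Place a legal hold on every record for a matter, in any system."""
        held = [r for r in self.records if r.matter == matter]
        for record in held:
            record.on_hold = True
        return len(held)

    def due_for_disposition(self, today):
        """Records past their retention date that are not under a hold."""
        return [r.record_id for r in self.records
                if today > r.dispose_after and not r.on_hold]


catalog = Catalog()
catalog.add(Record("r1", "email", "project-a", date(2012, 1, 1)))
catalog.add(Record("r2", "file share", "project-b", date(2012, 1, 1)))
catalog.add(Record("r3", "cms", "project-a", date(2020, 1, 1)))
print(catalog.place_hold("project-a"))
print(catalog.due_for_disposition(date(2012, 8, 9)))
```

In this example, record r1 is past its retention date but stays out of the disposition list because it is under a hold.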

By categorizing content, creating a catalog of the content, creating a retention plan, implementing a hold methodology, and having disposition procedures, an organization will benefit in many ways. They include:

Decreased Risk – by keeping less content, an organization decreases the risk of adverse evidence being found;
Higher Productivity – by organizing content through a file plan, key information, such as regulatory filings, tax information, business licenses, invoices, and other content, can be more easily found;
Lower Discovery Costs – with less information available for discovery, an organization will reduce the cost of restoration of content and the cost of legal review;
Increased Flexibility – an organization will be prepared to present a catalog of discoverable content, which is a requirement in the case of litigation;
Stronger Legal Action – by knowing the evidence that an organization possesses, legal counsel can more quickly assess strategy and pursue a settlement, which can save a great deal of money;
Less Vulnerability – organizations that are unable to comply with electronic discovery requirements are beginning to see nuisance lawsuits. When an organization cannot comply with discovery requirements, it may set a cost threshold – stating, for instance, that any lawsuit under $100,000 is not worth the discovery effort and should be settled. This exposes the organization to nuisance lawsuits that are brought at just under the threshold.

If you have not already done so, now is the time to develop ESI retention programs. Now is the time to create committees within your organizations and to bring their expertise together with legal counsel and IT to prepare for e-discovery and litigation. And, now is the time to focus on one of any organization’s greatest assets, its information.

Friday, August 3, 2012

SharePoint and Collaboration

Most people spend the greater part of their work day involved in collaborative tasks. They share information, they work together in teams, and they manage projects. It can be a challenge to collaborate effectively if you do not have tools to easily communicate, share information, and coordinate project details and deadlines among a large group of people.

SharePoint can help you get your work done more efficiently because it provides organizations with a platform for sharing information and working together in teams. A SharePoint site offers specific kinds of tools and workspaces that you can use to communicate with team members, track projects, coordinate deadlines, and collaboratively create and edit documents.

Manage Projects More Efficiently

Users can create a site from the Team Site template to manage a range of team projects and document-related tasks. They can use their team site every day to create and manage documents, track issues and tasks, and share links and contacts. Because they have one location for these activities, members of a team can save time and enjoy increased productivity.

The site template for a team site includes:

Shared documents library;
Announcements list;
Calendar;
Team discussion list;
Tasks list;
Links list.

The site can store long-term routine information for a single department or short-term information for a special project that spans several departments. By creating a team site to use as a collaborative workspace, your team can become both more efficient and more productive and ultimately achieve better business results. You can also customize your site to meet the needs of your team or project by adding lists, libraries, or other features to the site. The calendar can be used for tracking events, meetings, and so on. Users can link the calendar to their personal calendars in Microsoft Office Outlook so that they can view this information along with their personal calendar information. Users can create a Project Tasks list to visualize and track the key phases of projects.

There are several different ways you can use a team site to manage projects more efficiently:

use built-in features such as the Project Tasks list template, which enables you to visualize task relationships and project status with automated Gantt charts;
coordinate the team's work with shared calendars, alerts, and notifications. You can connect a calendar on your SharePoint site to your calendar in Office Outlook 2007, where you can view and update it just as you do your personal calendar;
create Meeting Workspace sites to gather materials and documents related to a meeting.

Create, Review, and Share Documents

Groups of people can create and edit documents collaboratively. For example, team members can save general documents to a shared Documents library, where other team members can easily read them or check them out and edit them. The team can also use slide or picture libraries to save and reuse slides and pictures for various presentations. For special projects that involve only a few people, team members can create Document Workspace subsites on their team site. Document Workspace sites help users coordinate work on a single document or a group of documents.

There are several different ways to save and work on documents and other files on a team site:

use document libraries to store and manage important documents. Features such as versioning and check-out help you keep track of revisions to a document and to prevent multiple people from making changes at the same time;
create Document Workspace sites to coordinate the development of specific documents;
use Slide Libraries to share and reuse slides in a central location;
take document libraries offline to enable people to view and edit documents while they are not connected to the network;
use workflows to manage collaborative tasks such as document review or approval.

Capture and Share Team Knowledge

SharePoint provides organizations with a central location to capture best practices, share information, and promote standardized business processes. Teams can use both a wiki site and a blog site to capture and communicate information of interest to the team. A team can use a wiki to compile general information about company and team processes that will be helpful to new team members. Any member of the team can add information to the wiki or update the wiki posts. A team can also routinely post industry-related or marketing-related information to a blog site, where other team members can read the posts and comment on them. The blog provides team members with a forum to share new ideas, opinions, or inspiration.

Here are some ways you can use SharePoint to capture and share collective team knowledge or important information:

track updates and information with alerts or Really Simple Syndication (RSS);
use blogs to share or promote information;
capture community knowledge or document internal processes by using a wiki;
use surveys or discussions to gather information or encourage dialogue.

Tuesday, July 31, 2012

Content Management Systems Reviews - Open Text - ECM Suite - Portal

OpenText Portal (formerly Vignette Portal) is a part of the OpenText ECM Web Content Management solution. It enables you to create web sites with rich content and applications and customized user interactions. It provides a highly scalable and efficient means of aggregating content and applications for use across a variety of initiatives inside and outside the firewall.

It enables users to combine web services, repository data, and user interfaces in meaningful ways to create valuable business applications without IT help. Users can create web pages by simply selecting portlets from OpenText’s library of over 200 portlets.

Portal layout management allows users to easily apply a variety of page layouts via visually intuitive tools. There is interaction between portlets. Pages refresh only as needed. Portlets load separately, so the end user does not have to wait until the entire page loads. Pages are dynamic and load quickly.

Users can create a template from an existing site to be used later to create similar sites. Portlets can be embedded in any web site. Users can use pre-defined portlets that allow rapid creation of portals with common functionality that is integrated with existing applications and data. There are portlets that enable teams to share and publish portal documents as part of any business process.

Social features such as blogs, wikis, rating, ranking, and tagging can be integrated. Out-of-the-box federated search and taxonomy management tools are also available. Third-party search engines can be connected through custom integration.

All portals can be managed from a unified, permissions-based management console. Administration can also be delegated to individual portals, which allows diverse, multi-dialect administrators to manage virtually all of their portal objectives.

Content can be delivered to a customer's PDA, cell phone, or other device of choice. Content from different repositories can be targeted based on dynamic user segments.

User experience can be improved by enabling inter-portlet communications, web services integration and display of 3rd party portlets in an integrated, contextual way.

One can enhance security and auditing of the site activity via a native reporting interface that reports on site modifications.

Web presence is globalized with extensive internationalization for portal users and administrators to support diverse audiences.

Live sites can be updated faster via incremental item changes instead of an import/export of the entire navigation tree. Components, categories of components, and versions of components can be imported or exported using batch processes.

Friday, July 27, 2012

Content Management Systems Reviews - Open Text - ECM Suite - Web Content Management

The following products deliver the Web Content Management component of the OpenText ECM Suite:

OpenText Web Experience Management is a comprehensive solution for managing content in high-performance, scalable, transaction-oriented web applications.

OpenText Portal works in tandem with OpenText Web Experience Management to allow you to rapidly create mashups and composite applications built on Web services, repository data, and user interfaces.

OpenText Dynamic Portal for Third-Party Portals works in tandem with OpenText Web Experience Management to allow you to publish content directly into portals such as Liferay, IBM WebSphere, or Oracle WebCenter.

OpenText High Performance Web Delivery provides a unique, integrated combination of real-time caching and intelligent cache management capabilities. It improves Web site performance, makes Web sites more scalable, and in many cases reduces costs and manual overhead.

OpenText Semantic Navigation is a powerful and engaging technology that combines content analytics with information retrieval to automatically present Web site visitors with content that is relevant to what they’re looking for or viewing.

OpenText's Web Experience Optimization products give you the capabilities to optimize each phase of your on-line marketing campaign lifecycle and provide customers with a more relevant Web experience.

OpenText Campaign Management helps deliver highly personalized content to individual recipients through online and offline touch points. From simple campaigns to more sophisticated marketing programs, it enables the easy design, execution and measurement of multipart, results-driven communications across a variety of channels.

OpenText Business Integration Studio is a graphical development environment for rapidly integrating business applications, processes and information that facilitates the integration of OpenText's Web content management, social media, and portal management applications with disparate applications and content repositories inside and outside the enterprise.

Today's post is about OpenText Web Experience Management.

OpenText Web Experience Management (formerly Vignette Content Management) is a solution for creating and managing content for enterprise Internet, extranet, or intranet applications.

Users can:

create new sites from site templates derived from their existing sites or launch a sample site with out-of-the-box content types, workflows, and presentation assets;
apply graphical themes, page and region layouts to pages, layouts or whole sites;
browse content in contextual, multi-dimensional workspaces by site, content type, folder, category or explorer views.

It offers a user-friendly console, branded themes, and preferences that empower content owners to easily create and manage web content while automatically adjusting to day-to-day authoring actions. This enables users with no prior experience to edit pages and content using non-intrusive toolbars, improve productivity with contextual views that present the information users need when and where they need it, and publish in one click.

A powerful and easy-to-use site layout, theme, and content templating interface enables users to control how site content is presented and helps ensure consistent branding and communication to a variety of audiences, while reducing site development and maintenance costs.

Users can manage all content through an intuitive and configurable role-based management console. The console includes a ribbon menu and a properties toolbar for commonly used items, a content tracker, a task inbox, and content search with saved queries. There are ergonomic controls for faster editing, including language, time zone, filters, and page and content settings.

Content items can be reused across multiple sites. For example, one article can be published on 100+ sites with a single management workflow.

Vanity URLs can be completely automated or manually defined to help increase site rankings in major search engines and support marketing campaigns, promotions, and messaging that can help increase the number of visitors to your site.

It integrates well with social and collaboration sites.

Users can create content using their favorite tools and web forms. Content from other repositories can be dynamically integrated or migrated for full-cycle workflow and publishing management. Metadata management for images, podcasts, Adobe Flash files, and video allows editors to streamline approval, metadata tagging, and publishing of these assets.

Support for roles allows organizations to customize access for content creators, approvers, developers, and other users. This allows individuals to participate in selected processes automatically while standardizing and enforcing business practices that are exposed to users through delegated administration.

There is good workflow and content type modeling, and the sample site includes a best practices template. The content type modeler provides an intuitive interface to create and modify content objects such as articles, products, news, etc. Content type evolution allows you to make common modifications.

Content can be in any data format, including files, database records, XML documents, and rich media assets such as images, videos, and podcasts. There are library services such as check-in and check-out, version control, rollback, content history, security, content classification, metadata indexing, and search. Content can be in any language.

There are tools to optimize the staging and delivery of managed content through web sites, portals, and other applications. The application streamlines the retrieval of content items according to their multi-faceted taxonomies and then transforms the items to suit the intended delivery context, application or device.

Users can publish content through automated workflows that deliver content to multiple delivery applications (web servers, databases, application servers). The publishing engine manages content dependencies, so content retains its context throughout its lifecycle.

Search capabilities allow parametric search across content, content attributes, and metadata, both within the management console and for site search. They also include a framework for third-party search, enabling a high degree of search accuracy.

There is an ability to manage user access centrally including delegated administration based on LDAP standards.

I will continue describing other products of web content management component of the OpenText ECM Suite in my next posts. Follow me and stay tuned!

Wednesday, July 25, 2012

Automatic Classification

In my previous posts, I mentioned that a taxonomy is necessary to create navigation to content. If users know what they are looking for, they are going to search. If they don't know what they are looking for, they will look for ways to navigate to content, in other words, browse through content. Taxonomies can also be used as a method of filtering search results so that results are restricted to a selected node on the hierarchy.
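To make the filtering idea concrete, here is a minimal sketch. The taxonomy, the node names, and the result format are invented for the example. A result is kept when it is classified under the selected node or under any of that node's descendants.

```python
# Hypothetical taxonomy stored as child -> parent (None marks the root).
TAXONOMY = {
    "Products": None,
    "Hardware": "Products",
    "Laptops": "Hardware",
    "Software": "Products",
}


def is_under(node, ancestor):
    """True if `node` is `ancestor` or sits anywhere below it in the hierarchy."""
    while node is not None:
        if node == ancestor:
            return True
        node = TAXONOMY.get(node)
    return False


def filter_results(results, selected_node):
    """Keep only search results classified under the selected taxonomy node.

    results: list of (document title, taxonomy node) pairs.
    """
    return [title for title, node in results if is_under(node, selected_node)]


hits = [("Laptop spec sheet", "Laptops"),
        ("Installer guide", "Software"),
        ("Server rack manual", "Hardware")]
print(filter_results(hits, "Hardware"))
```

Selecting "Hardware" keeps the laptop and server documents and drops the software one; selecting "Products" would keep all three.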

Once documents have been classified, users can browse the document collection, using an expanding tree-view to represent the taxonomy structure.

When there are many documents involved, creating a taxonomy and classifying content against it can be time-consuming. There are a few tools on the market that provide automatic classification. Another use of automatic classification is to automatically tag content with controlled metadata (also known as Automatic Metadata Tagging) to increase the quality of the search results.

The tools that provide automatic classification are: Autonomy, ClearForest, Documentum, Interwoven, Inxight, Moxomine, Open Text, Oracle, SmartLogic.

These tools can classify any type of text documents. Classification is either performed on a document repository or on a stream of incoming documents.

Here is how this software works. Example: "International Business Machines today announced that it would acquire Widget, Inc. A spokesperson for IBM said: 'Big Blue will move quickly to ensure a speedy transition.'"

The software classifies concepts rather than words. Words are first stemmed, that is, they are reduced to their root form. Next, stop words are eliminated. These include words such as a, an, in, the - words that add little semantic information. Then, words with similar meanings are equated using a thesaurus. For example, the words IBM, International Business Machines, and Big Blue are treated as equivalent.

Next, the software will use statistical or language processing techniques to identify noun phrases or concepts such as "red bicycle". Further, using a thesaurus, these phrases are reduced to distinct concepts that will be associated with the document. In this example, there are 3 instances of IBM, 2 instances of acquisition (acquire, speedy transition), and 1 instance of Widget, Inc.
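The steps above can be sketched in a few lines of Python. This is a toy illustration, not how any of the listed products work internally: the stemmer is a crude suffix stripper standing in for a real one, the stop-word list goes beyond the four words named above, and the thesaurus is hand-written for this one example. With those assumptions, it should reproduce the concept counts given for the sample sentence.

```python
import re
from collections import Counter

# "a", "an", "in", "the" come from the text; the other stop words are assumed.
STOP_WORDS = {"a", "an", "in", "the", "that", "it", "for", "to", "will", "would"}

# Hand-written thesaurus (assumed): phrase -> concept.
THESAURUS = {
    "international business machines": "IBM",
    "ibm": "IBM",
    "big blue": "IBM",
    "acquire": "acquisition",
    "speedy transition": "acquisition",
    "widget inc": "Widget, Inc.",
}


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, tokenize, drop stop words, and stem what remains."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


# Normalize thesaurus phrases the same way so they match normalized text.
PHRASES = {tuple(normalize(p)): concept for p, concept in THESAURUS.items()}
MAX_LEN = max(len(key) for key in PHRASES)


def count_concepts(text):
    tokens = normalize(text)
    counts = Counter()
    i = 0
    while i < len(tokens):
        # Prefer the longest thesaurus phrase that starts at position i.
        for length in range(MAX_LEN, 0, -1):
            key = tuple(tokens[i:i + length])
            if key in PHRASES:
                counts[PHRASES[key]] += 1
                i += len(key)
                break
        else:
            i += 1  # no concept starts here; move on
    return counts


TEXT = ('International Business Machines today announced that it would acquire '
        'Widget, Inc. A spokesperson for IBM said: "Big Blue will move quickly '
        'to ensure a speedy transition".')
print(count_concepts(TEXT))
```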

Approaches to Classification

Manual - requires individuals to assign each document to one or more categories. It can achieve a high degree of accuracy. However, it is labor intensive and therefore more costly than automatic classification in the long run.

Rule-based - keywords or Boolean expressions are used to categorize a document. This is typically used when a few words can adequately describe a category. For example, if a collection of medical papers is to be classified by disease, the scientific, common, and alternative names of each disease can be used to define the keywords for each category.
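
A minimal Python sketch of the rule-based approach follows. The categories and keyword lists are invented for illustration, and a real rule set would normally also support Boolean expressions rather than plain keyword lists.

```python
import re

# Hypothetical rules: category -> scientific, common, and alternative names.
RULES = {
    "Influenza": ["influenza", "flu", "grippe"],
    "Hypertension": ["hypertension", "high blood pressure"],
}


def classify(text, rules):
    """Return every category that has at least one keyword in the text."""
    lowered = text.lower()
    matched = []
    for category, keywords in rules.items():
        # Word boundaries stop "flu" from matching inside "fluid".
        if any(re.search(r"\b" + re.escape(kw) + r"\b", lowered) for kw in keywords):
            matched.append(category)
    return sorted(matched)


print(classify("Seasonal flu vaccination and high blood pressure outcomes", RULES))
# ['Hypertension', 'Influenza']
print(classify("Fluid dynamics in pipes", RULES))
# []
```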

Supervised Learning - most approaches to automatic classification require a human expert to initiate a learning process by manually classifying or assigning a number of "training documents" to each category. The classification system first analyzes the statistical occurrences of each concept in the example documents and then constructs a model or "classifier" for each category that is used to classify subsequent documents automatically. The system refines its model, in a sense "learning" the categories as documents are processed.
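
To make the idea of training documents concrete, here is a deliberately simple Python sketch. It builds one word-frequency model per category and assigns a new document to the best-matching category. This scoring is a stand-in chosen for brevity, not the algorithm used by any of the vendors above, and the training data is made up.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(training_docs):
    """Build one word-frequency model per category from labeled examples."""
    models = {}
    for category, text in training_docs:
        models.setdefault(category, Counter()).update(tokenize(text))
    return models


def classify(text, models):
    """Pick the category whose training vocabulary best matches the text."""
    def score(model):
        total = sum(model.values())
        return sum(model[word] / total for word in tokenize(text))

    return max(models, key=lambda category: score(models[category]))


# Hypothetical training documents, labeled by a human expert.
training_docs = [
    ("finance", "quarterly earnings revenue profit"),
    ("finance", "revenue forecast and profit margin"),
    ("sports", "team wins championship game"),
    ("sports", "player scores in final game"),
]

models = train(training_docs)
print(classify("profit and revenue rose this quarter", models))  # finance
print(classify("the final game", models))  # sports
```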

Unsupervised Learning - these systems identify both groups or clusters of related documents as well as the relationship between these clusters. Commonly referred to as clustering, this approach eliminates the need for training sets because it does not require a preexisting taxonomy or category structure. However, clustering algorithms are not always good at selecting categories that are intuitive to users. On the other hand, clustering will often expose useful relationships and themes implicit in the collection that might be missed by a manual process. For these reasons, clustering generally works hand-in-hand with supervised learning techniques.
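
Clustering can be sketched in Python as well. The example below groups documents by word overlap (Jaccard similarity) in a single greedy pass. Both the similarity measure and the threshold are assumptions made for illustration; commercial tools use more sophisticated statistical methods.

```python
def jaccard(a, b):
    """Share of words two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.25):
    """Greedy single pass: join the first similar cluster or start a new one."""
    word_sets = [set(doc.lower().split()) for doc in docs]
    clusters = []  # each cluster is a list of document indexes
    for index, words in enumerate(word_sets):
        for members in clusters:
            seed = word_sets[members[0]]  # compare with the cluster's first document
            if jaccard(words, seed) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


docs = [
    "jet engine maintenance manual",
    "clinical trial results for new drug",
    "jet engine inspection checklist",
    "drug trial safety report",
]

print(cluster(docs))
# [[0, 2], [1, 3]]
```

No category names are supplied anywhere in this sketch. The groups emerge from the documents themselves, and naming them is left to a person.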

Each of these approaches is optimal for a different situation. As a result, classification vendors are moving to support multiple methods.

Most real world implementations combine search, classification, and other techniques such as identifying similar documents to provide a complete information retrieval solution. Organizations having document repositories will generally benefit from a customized taxonomy.

Once documents are clustered, an administrator can first rearrange, expand or collapse the auto-suggested clusters or categories, and then give them intuitive names. The documents in the cluster serve as initial training sets for supervised-learning algorithms that will be used subsequently to refine the categories. The end result is a taxonomy and a set of topic models that are fully customized for an organization's needs.

Building an extensive custom taxonomy can be a large expense. However, automated classification tools can reduce the taxonomy development and maintenance cost.

Organizations with document collections that span complex areas such as medicine, biotechnology, and aerospace will have a large taxonomy. However, there are ways to refine a taxonomy so it does not become an overwhelming task.

Together, enterprise search and classification provide an initial response to information overload.

Thursday, July 19, 2012

Content Management Systems Reviews - Joomla

Joomla is a free and open source content management framework (CMF) for publishing content on the World Wide Web and intranets. It includes features such as page caching, RSS feeds, printable versions of pages, news flashes, blogs, polls, search, and support for language internationalization.

Over 9,200 free and commercial extensions are available from the official Joomla! Extension Directory, and more are available from other sources. It is estimated to be the second most used CMS on the Internet after WordPress. Joomla won the Packt Publishing Open Source Content Management System Award in 2006, 2007, and 2011.

You can think of a Joomla! website as bringing together three elements:

your content, which is mainly stored in a database;
your template, which controls the design and presentation of your content (such as fonts, colors and layout);
Joomla!, which is the software that brings the content and the template together to produce webpages.

A Joomla template is a multifaceted Joomla extension which is responsible for the layout, design and structure of a Joomla powered web site. While the CMS itself manages the content, a template manages the look and feel of the content elements and the overall design of a Joomla driven web site.

The content and design of a Joomla template are separate and can be edited, changed and deleted separately. The template is where the design of the main layout for a Joomla site is set. This includes where users place different elements (components, modules, and plug-ins), which are responsible for the different types of content. If the template is designed to allow user customization, the user can change the content placement on the site, for example by putting the main menu on the right or left side of the screen.

Template Components

Layout

The template is where the design of the main layout is set for a Joomla site. This includes where users place different elements (components, modules, and plug-ins), which are responsible for different types of content.

Color Scheme

Using CSS within the template design, users can change the colors of the backgrounds, text, links or just about anything that they could using (X)HTML code.

Images and Effects

Users can also control the way images are displayed on the page and even create Flash-like effects such as drop-down menus.

Fonts

The same applies to fonts. They are set within the template's CSS file(s) to create a uniform look across the entire site, which makes it easy to change the whole look just by altering one or two files rather than every single page.

Joomla! is composed of a Platform and extensions.

Joomla! extensions

Joomla! extensions extend the capabilities of Joomla web sites. There are five types of extensions for Joomla: components, modules, plugins, templates, and languages. Each of these extensions handles a specific function.

Components: they are the largest and most complex extensions. They can be seen as mini-applications. Most components have two parts: a site part and an administrator part. Every time a Joomla page loads, one component is called to render the main page body. Components are the major portion of a page because a component is driven by a menu item and every menu item runs a component.

Plugins: they are more advanced extensions and are, in essence, event handlers. In the execution of any part of Joomla, a module or a component, an event can be triggered. When an event is triggered, plugins that are registered with the application to handle that event execute. For example, a plugin could be used to block user-submitted articles and filter out bad words.

Templates: a template describes the main design of the Joomla web site and is the extension that allows users to change the look of the site. Users will see modules and components on a template. Templates are customizable and flexible, and they determine the style of a website.

Modules: rendering pages flexibly in Joomla requires a module extension, which is then linked to Joomla components to display new content or new images. Joomla modules look like boxes – like the search or login module. However, they don’t require HTML to Joomla to work.

Languages: they are very simple extensions that can either be used as a core part or as an extension. Language and font information can also be used for PDF or PSD to Joomla conversions.

Joomla also has built-in extensions, which include components (Banner, Contacts, Joomla! Update, Messaging, Newsfeeds, Redirect, Search, Smart Search), Content, Menus, etc.
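
The description of plugins as event handlers can be illustrated with a small, language-neutral sketch. Joomla itself is written in PHP, so the Python below is not Joomla's API. The class, the event name, and the word list are hypothetical and only show the register-then-trigger pattern, using the bad-word filter mentioned above as the example plugin.

```python
class EventDispatcher:
    """Keeps a list of handlers per event and runs them when the event fires."""

    def __init__(self):
        self._handlers = {}

    def register(self, event_name, handler):
        self._handlers.setdefault(event_name, []).append(handler)

    def trigger(self, event_name, payload):
        # Each registered handler receives the output of the previous one.
        for handler in self._handlers.get(event_name, []):
            payload = handler(payload)
        return payload


BAD_WORDS = {"darn"}  # hypothetical word list


def filter_bad_words(article_text):
    """Example plugin: mask unwanted words in a submitted article."""
    return " ".join(
        "***" if word.lower() in BAD_WORDS else word
        for word in article_text.split()
    )


dispatcher = EventDispatcher()
dispatcher.register("article_submitted", filter_bad_words)

cleaned = dispatcher.trigger("article_submitted", "This darn template broke")
print(cleaned)
# This *** template broke
```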

Tuesday, July 17, 2012

Search Applications - Exalead

Exalead provides search platforms and search-based applications. Search-based applications (SBA) are software applications in which a search engine platform is used as the core infrastructure for information access and reporting. SBAs use semantic technologies to aggregate, normalize and classify unstructured, semi-structured and/or structured content across multiple repositories, and employ natural language technologies for accessing the aggregated information.

Exalead has a platform that uses advanced semantic technologies to bring structure, meaning, and accessibility to previously unused or under-utilized data in the disparate, heterogeneous enterprise information overload.

The system collects data from virtually any source, in any format, and transforms it into structured, pervasive, contextualized building blocks of business information that can be directly searched and queried, or used as the foundation for a new breed of lean, innovative information access applications.

Exalead products include the CloudView platform and the ii Solutions Suite of packaged SBAs, all built on the same powerful CloudView platform.

Exalead CloudView

Exalead CloudView enables organizations to meet demands for real time, in-context, accurately delivered information, accessed from diverse web and enterprise big data sources, yet delivered faster and at lower cost than with traditional application architectures. This platform is used for both online and enterprise SBAs as well as enterprise search.

Available for on-premises or cloud delivery, Exalead CloudView is the infrastructure that powers all Exalead solutions, including Exalead’s public web search engine, the company’s custom SBAs, and the Exalead ii Solution Suite of packaged, vertical SBAs.

Exalead ii Solutions Suite

Exalead ii ("information intelligence") applications are packaged, workflow specific SBAs that transform large volumes of heterogeneous, multi-source data into meaningful, real time information intelligence, and deliver that information intelligence in context to users to improve business processes.

On the data side, the Exalead information infrastructure uses semantic technologies to non-intrusively aggregate, align and enhance multi-source data to create a powerful base of actionable knowledge (i.e., information intelligence).

Exalead Advanced Search options appear as a drop-down menu below the search form, where users select the search criteria that will be entered directly into the search form. A different set of advanced search options is available for each search type.

Registered users can select the "Bookmark" option below for any search results to add these results to a list of saved sites, accessible on the Exalead homepage as a collection of image thumbnails.

Strengths:

truncation, proximity, and many other advanced operators not available from other search engines;
includes thumbnails of pages;
provides excellent narrowing options on right side.

Exalead appears to support Boolean operators and nested searching with the operators AND, OR, and NOT. Searching can be nested using parentheses. Operators must be in upper case. Exalead can also use a minus sign (-) for NOT, but only when it is not used along with the Boolean operators. In the Advanced Search, it also has drop-down choices for "containing," "not containing," and "preferably containing." There is also an OPT operator which means that the word following it is an optional word.

Phrase searching is available by using "double quotes" around a phrase. It also supports a NEXT operator for ordered proximity of one word (in other words, the same thing as a phrase search.) So "double quotes" should get the same results as double NEXT quotes. Exalead also supports the NEAR operator for 16 word proximity. You can change it to NEAR/5 (or any other number) to specify a different proximity value.
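
To show what a proximity operator such as NEAR/5 checks, here is a minimal Python sketch. It assumes that proximity means the difference between word positions, which may not be exactly how Exalead counts it. The function name and sample sentence are made up.

```python
def near(text, first, second, distance=16):
    """True if the two words occur within `distance` word positions of each other."""
    words = [word.strip('.,;:!?"()').lower() for word in text.split()]
    first_positions = [i for i, word in enumerate(words) if word == first.lower()]
    second_positions = [i for i, word in enumerate(words) if word == second.lower()]
    return any(
        abs(i - j) <= distance
        for i in first_positions
        for j in second_positions
    )


sentence = "Exalead supports truncation, proximity, and many other advanced operators"

print(near(sentence, "Exalead", "operators"))               # True (default of 16 words)
print(near(sentence, "Exalead", "operators", distance=5))   # False (8 positions apart)
print(near(sentence, "truncation", "proximity", distance=1))  # True (adjacent words)
```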

Exalead's Advanced Search also offers some unusual types of special searches:

phonetic spelling with the sounds like: operator;
approximate spelling with the spells like: operator;
regular expression using regex syntax.

On a search with two or more words, stemming is automatic. Exalead also supports truncation using an asterisk * symbol. Stemming is also controlled on the preferences page. Exalead searching is not case sensitive: lower, upper, or mixed case returns the same results. Exalead supports a title search.

Exalead has limits for language, country, file type, site, and date available on the Advanced Search page. The file type limit includes text, PDF, Word, Excel, Powerpoint, Rich Text Format, Corel WordPerfect, and Shockwave Macromedia Flash. You can place the file type search command into the search box. The Advanced Search page offers a site limit, which can be used to limit results to those from the specified domain. The language limit is available in the Advanced Search.

Some common words such as 'a,' 'the' and 'in' are ignored, but they can be searched with a + in front. Within a phrase search, all words are searched.

Results are sorted by a relevance algorithm. Pages are also clustered by site. Only one page per site will be displayed. Others are available via the yellow folder and domain name. The Advanced Search page used to include two date sort options, but those disappeared with the new interface in October 2006. They are still available via the field prefix of sort. Two options are available: sort:new and sort:old.
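
Clustering results by site, with one page shown per site, can be sketched as follows. This is only an illustration of the behavior described above, not Exalead's implementation. The URLs are placeholders, and the input list is assumed to be already sorted by relevance.

```python
from urllib.parse import urlparse


def cluster_by_site(ranked_urls):
    """Show the best-ranked page per site and keep the rest grouped by domain."""
    shown = []   # one page per site, in relevance order
    hidden = {}  # domain -> further pages from the same site
    seen = set()
    for url in ranked_urls:
        site = urlparse(url).netloc
        if site in seen:
            hidden.setdefault(site, []).append(url)
        else:
            seen.add(site)
            shown.append(url)
    return shown, hidden


ranked_urls = [
    "http://example.com/search-tips",
    "http://example.org/review",
    "http://example.com/operators",
]

shown, hidden = cluster_by_site(ranked_urls)
print(shown)
# ['http://example.com/search-tips', 'http://example.org/review']
print(hidden)
# {'example.com': ['http://example.com/operators']}
```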

Saturday, July 14, 2012

Methods and Techniques for Information Architecture Design

Yesterday, I described information architecture design patterns for web sites and best practices for this design. Today, I am going to describe methods and techniques for information architecture design.

There are a few different approaches commonly used for information architecture design.

Card Sorting

Card sorting is a low cost, simple way to figure out how best to group and organize your content based on user input. Card sorting works by writing each content set or page on an index card, and then letting users sort them into groups based on how they think the content should be categorized.

There are several types of card sorting methodologies. The basic method starts out with cards in random order and users sort them in the way they think they should be grouped. In reverse card sorting, the cards are pre-sorted into groups, and users are then given the task of rearranging them as they see fit. Open card sorting lets users name the groups they have created for the cards, while closed card sorting provides predefined group names into which participants place the cards.

Various methods can be used to analyze the data. The purpose of the analysis is to extract patterns from the population of test subjects, so that a common set of categories and relationships emerges. This common set is then incorporated into the design of the site, either for navigation or for other purposes.
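
One common way to extract such patterns is to count how often participants place two cards in the same group. The Python sketch below does this for made-up data. The card names and the three participants' sorts are hypothetical, and other analysis methods exist.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sorts):
    """Count how many participants put each pair of cards in the same group."""
    pairs = Counter()
    for groups in sorts:          # one participant's sort
        for cards in groups:      # one group made by that participant
            for pair in combinations(sorted(cards), 2):
                pairs[pair] += 1
    return pairs


# Hypothetical results from three participants.
sorts = [
    [["Pricing", "Plans"], ["Contact", "Support"]],
    [["Pricing", "Plans", "Contact"], ["Support"]],
    [["Pricing", "Plans"], ["Contact", "Support"]],
]

pairs = co_occurrence(sorts)
print(pairs.most_common(2))
# [(('Plans', 'Pricing'), 3), (('Contact', 'Support'), 2)]
```

Pairs with high counts are candidates for sharing a category in the site navigation, while pairs that rarely appear together probably belong apart.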

There are a number of tools available to perform card sorting activities with survey participants via the internet. The perceived advantage of remote card sorting is that it allows a larger group of participants to be reached at a lower cost. The software can also assist in the process of analyzing card sort results. The advantages of a remote card sort must be traded off against the lack of personal interaction between card sort participants and the card sort administrator, which may produce valuable insights.

Wireframes and Prototypes

Basic wireframes can do a lot more than just give an outline of the design layout of a site. They also inform us how content will be arranged, at least on a basic level. Putting content into wireframes and prototypes gives us a good sense of how the content is arranged in relation to other content and how well our information architecture achieves our goals.

When you are wireframing, and especially when you are prototyping, you should be working with content that at least resembles what the final content of the site will be.

Site Maps and Outlines

Site maps are quick and easy ways to visually denote how different pages and content relate to one another. Creating one is an important step that "mocks up" how content will be arranged.

These content outlines show how all the pages on your site are grouped, what order they appear in, and the relationships between parent and child pages. This is often a simple document to prepare, and may be created after a round or two of card sorting.
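
A content outline of this kind maps naturally onto a nested data structure. The Python sketch below prints a hypothetical site map as an indented outline. The page names are invented, and page order follows the order in which they are written.

```python
# Hypothetical site map: each page maps to a dictionary of its child pages.
SITE_MAP = {
    "Home": {
        "Products": {"Hardware": {}, "Software": {}},
        "About": {},
    }
}


def outline(tree, depth=0):
    """Return the pages as indented lines, children listed under their parent."""
    lines = []
    for page, children in tree.items():
        lines.append("  " * depth + page)
        lines.extend(outline(children, depth + 1))
    return lines


print("\n".join(outline(SITE_MAP)))
# Home
#   Products
#     Hardware
#     Software
#   About
```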

For existing sites or content that must be placed in a web site, a content inventory is usually the prelude to this phase.

Information Architecture Design Styles

There are two basic styles of information architecture: top-down and bottom-up. The thing that many designers must realize is that it is useful to look at a site from both angles to devise the most effective IA. Rather than approaching your projects only top-down or only bottom-up, look at them from both ends to see if there are any gaps in how things are organized.

Top-Down Architecture

Top-down architecture starts with a broad overview and understanding of the website’s strategy and goals, and creates a basic structure first. From there, content relationships are refined as the site architecture grows deeper, but it is all viewed from the overall high-level purpose of the site.

Bottom-Up Architecture

The bottom-up architecture model looks at the detailed relationships between content first. With this kind of architecture, you might start out with user personas and how those users will be going through the site. From there, you figure out how to tie it all together, rather than looking at how it all relates first.

Different websites require different types of information architecture. What works best will vary based on things like how often content is updated, how much content there is, and how visitors use the site.