Thursday, 05 January 2017

Paper for Meeting 10: Architecture

Semantic Web Architecture and Applications

Organizations will migrate to Semantic Web architecture and applications in the next 1-3 years, repeating the 1981-1984 user-led migration from centralized mainframe-terminal systems with rigid applications to distributed Intel/Microsoft/PC architecture and flexible VisiCalc applications.
First Generation – Keyword
Keyword technologies were originally used in IBM’s free text retrieval system in the late 1960s. These tools are based on a simple scan of a text document to find a keyword or the root stem of a keyword. This approach can find keywords in a document, and can list and rank documents containing them. But these tools have no ability to extract the meaning from the word or root stem, and no ability to understand the meaning of the sentence.
Advanced Search
Most keyword systems now include some form of Boolean logic (“AND” and “OR” functions) to narrow searches. This is often called “advanced search”. But using Boolean logic to exclude documents from a search is not “advanced”. It is an arbitrary and random means of shrinking the source database in order to reduce the number of documents retrieved.
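As a minimal sketch of how Boolean operators narrow a keyword search, the Python below builds an inverted index and answers “AND” and “OR” queries with set operations. The sample documents and the function names are assumptions made for illustration; they do not describe any particular search product.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in text.lower().split():
            index[word.strip('.,;:!?"()')].add(doc_id)
    return index

def search_and(index, *terms):
    """Documents containing every term (Boolean AND)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, *terms):
    """Documents containing at least one term (Boolean OR)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))

# Hypothetical sample documents
boolean_docs = [
    "Java coffee beans",
    "Java programming language",
    "Coffee from Sumatra",
]
boolean_index = build_index(boolean_docs)
print(search_and(boolean_index, "java", "coffee"))
print(search_or(boolean_index, "java", "coffee"))
```

The “AND” query only removes documents from the result; nothing in it reflects what the remaining document means.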
Applications
Keyword tools are appropriate for locating words or listing the documents that contain specific defined keywords and root stems.
Problems:
  • false negatives (no matches found because the word or stem is not exactly identical: “big” and “large”)
  • false positives (too many unrelated matches found because a root stem finds many unrelated words: “process” and “processor”)
  • scale factors (keyword search tools produce very long random lists of documents if the source database is large, and the relevance rankings are highly misleading)
Examples:
The most common examples of keyword tools are web site “Search” tools and the Microsoft “Find” function (Ctrl+F) in Microsoft Office applications.
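The false negatives and false positives of keyword tools can be reproduced with a few lines of Python. This is a sketch under simple assumptions: matching is done by checking whether a word starts with the root stem, and the three sample sentences are invented for the example.

```python
def keyword_search(stem, documents):
    """Return the indices of documents with a word that starts with `stem`."""
    stem = stem.lower()
    hits = []
    for i, text in enumerate(documents):
        words = (w.strip('.,;:!?"()') for w in text.lower().split())
        if any(w.startswith(stem) for w in words):
            hits.append(i)
    return hits

# Hypothetical sample documents
stem_docs = [
    "The process was approved by the board.",
    "A faster processor was installed.",
    "A large dog barked.",
]

# False positive: the stem "process" also matches "processor".
print(keyword_search("process", stem_docs))
# False negative: "big" finds nothing, although "large" means the same.
print(keyword_search("big", stem_docs))
```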


Second Generation – Statistical Forecasting
Statistical forecasting first finds keywords, and then calculates the frequency and distance of these keywords. Statistical forecasting tools now include many techniques for predictive forecasting, most often using inference theory. The frequency and distribution of words have some general value in understanding content.
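A rough sketch of the “frequency and distance” calculation is shown below. It assumes that distance means the smallest number of word positions between two keywords; real tools may define it differently. The function name and the sample text are hypothetical.

```python
def keyword_stats(text, first, second):
    """Count two keywords and find the smallest gap (in words) between them."""
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    gaps = (abs(i - j) for i in pos_first for j in pos_second)
    return {
        "freq_first": len(pos_first),
        "freq_second": len(pos_second),
        "min_distance": min(gaps, default=None),  # None if a keyword is absent
    }

sample_text = "The semantic web stores concepts. The web links concepts."
stats = keyword_stats(sample_text, "web", "concepts")
print(stats)
```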
Applications
Statistical forecasting tools are appropriate for performing simple document searches where the desired output is a list of documents which contain specific words which must then be read and classified and summarized manually by end users.
Problems
  • keyword limitations of false positives and false negatives
  • misunderstanding the meaning of words and sentences (“man bites dog” is the same as “dog bites man”)
  • lack of context: “Duke” could be Duke of Windsor or Duke of Earl or John Wayne
  • scale factors: a single statistical relevance ranking creates huge “Google” lists of many irrelevant documents (“you have 100,000 hits”).
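The “man bites dog” problem follows directly from counting words without regard to order. The sketch below uses a plain word count (a bag of words) as a stand-in for a statistical tool; it is an illustration, not the method of any specific product.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count word occurrences, ignoring word order."""
    return Counter(sentence.lower().split())

first_bag = bag_of_words("man bites dog")
second_bag = bag_of_words("dog bites man")

# Both sentences produce identical counts, so a purely
# frequency-based comparison cannot tell them apart.
print(first_bag == second_bag)
```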
Examples:
The most common statistical forecasting tool is “Google” and many other tools using inference theory and similar analysis and predictive algorithms.
Third Generation – Natural Language Processing
Natural language processors focus on the structure of language. They recognize that certain words in each sentence (nouns and verbs) play a different role (subject-verb-object) than others (adjectives, adverbs, articles). This understanding of grammar increases the understanding of keywords and their relationships.
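To show why grammatical roles matter, the sketch below assigns subject, verb and object roles to a sentence. It is deliberately naive: it assumes a three-word sentence in subject-verb-object order, which a real natural language processor would not require. The names `Triple` and `parse_svo` are hypothetical.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    verb: str
    obj: str

def parse_svo(sentence):
    """Naive parse: assumes exactly three words in subject-verb-object order."""
    words = sentence.lower().rstrip(".").split()
    if len(words) != 3:
        raise ValueError("this sketch only handles three-word sentences")
    return Triple(*words)

first_parse = parse_svo("Man bites dog")
second_parse = parse_svo("Dog bites man")

# Unlike a word count, the role-aware result distinguishes the sentences.
print(first_parse, second_parse, first_parse != second_parse)
```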
Applications
Natural language tools are appropriate for linguistic research and word-for-word translation applications where the desired output is a linguistic definition or a translation. These are not capable of understanding the meaning or context of sentences in documents, or integrating information within a database.

Problems
  • keyword limitations of false positives and false negatives
  • misunderstanding the context (does “I like Java” mean an island in Indonesia, a computer programming language or coffee?). Without understanding the broader context, a linguistic tool only has a dictionary definition of “Java” and does not know which “Java” is relevant or what other data is related to a specific “Java” concept.
Examples:
The most common natural language tools are translator programs, which use dictionary lookup tables to convert words and language-specific grammar rules to convert source languages to target languages.
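A word-for-word translator of this kind can be sketched as a lookup table. The tiny English-to-Indonesian table below is an assumption made for the example, and grammar handling is left out entirely.

```python
# Hypothetical lookup table (English to Indonesian); real tools use far
# larger dictionaries plus language-specific grammar rules.
LOOKUP = {"i": "saya", "like": "suka", "coffee": "kopi"}

def translate(sentence):
    """Translate word by word; unknown words are passed through unchanged."""
    words = sentence.lower().rstrip(".").split()
    return " ".join(LOOKUP.get(w, w) for w in words)

print(translate("I like coffee"))
# "java" is not resolved: the table cannot tell which Java is meant.
print(translate("I like java"))
```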
Fourth Generation – Semantic Web Architecture and Applications
  • Semantic Web architecture is the automated conversion and storage of unstructured text sources in a semantic web database.
  • Semantic Web applications automatically extract and process the concepts and context in the database in a range of highly flexible tools.
Architecture: Not Only Application
The Semantic Web is a complete database architecture, not only an application program. Semantic Web architecture is a two-step process.
  • a Semantic Web database is created from unstructured text documents
  • then Semantic Web applications run on the Semantic Web database, not the original source documents.
The Semantic Web architecture is created by first converting text files to XML and then analyzing these with a semantic processor. This process understands the meaning of the words and grammar of the sentence, and also the semantic relationships of the context.
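The first step, converting plain text to XML, could look roughly like the sketch below. The element names `document` and `sentence` and the split on full stops are assumptions for illustration; the semantic processor that analyzes the XML afterwards is not shown.

```python
import xml.etree.ElementTree as ET

def text_to_xml(doc_id, text):
    """Wrap each sentence of a plain-text document in an XML element."""
    root = ET.Element("document", id=doc_id)
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    for n, sentence in enumerate(sentences):
        node = ET.SubElement(root, "sentence", n=str(n))
        node.text = sentence
    return root

xml_doc = text_to_xml("doc1", "Java is an island. Jakarta is on Java.")
print(ET.tostring(xml_doc, encoding="unicode"))
```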
Semantic Web applications directly access the logical relationships in the Semantic Web database. Semantic Web applications can efficiently and accurately search, retrieve, summarize, analyze and report discrete concepts or entire documents from huge databases.
A search for “Java” links directly to the three Semantic Web logical clusters for “Java”: (island in Indonesia, a computer programming language, and coffee). The processor can now query the user for which “Java”, and then expand the search to all other concepts and documents related to the specific “Java”.
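The “Java” lookup can be sketched with a small dictionary of concept clusters. The related concepts listed for each meaning are assumptions chosen for the example, not data from a real Semantic Web database.

```python
# Illustrative concept clusters; the related concepts are assumptions.
CLUSTERS = {
    "java": {
        "island in Indonesia": {"Indonesia", "Jakarta"},
        "computer programming language": {"compiler", "class"},
        "coffee": {"beans", "espresso"},
    }
}

def senses(term):
    """List the known meanings of a term, so the user can be asked to choose."""
    return sorted(CLUSTERS.get(term.lower(), {}))

def expand(term, sense):
    """Return the concepts related to the meaning the user selected."""
    return CLUSTERS[term.lower()][sense]

print(senses("Java"))
print(expand("Java", "coffee"))
```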

Structured and unstructured data
Semantic Web architecture and applications handle both structured and unstructured data. Structured data is stored in relational databases with static classification systems, and also in discrete documents.
Much of the data we read, produce and share is now unstructured: emails, reports, presentations, media content, web pages. These documents are stored in many different formats: text, email files, Microsoft word processor, spreadsheet and presentation files, Lotus Notes, Adobe PDF, and HTML.
Dynamic and Automatic, Not Static and Manual
Semantic Web database architecture is dynamic and automated. Each new document which is analyzed, extracted and stored in the Semantic Web expands the logical relationships in all earlier documents.
Semantic Web architecture is different from relational database systems. Relational databases are manual and static because these are based on a manual process for maintaining a taxonomy, meta data tagging and document classification in static file structures.
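The dynamic behaviour can be illustrated with a small in-memory store in which documents are linked through shared concepts. This is a sketch only: the `ConceptStore` class and the document ids are hypothetical, and concept extraction is assumed to have happened already.

```python
from collections import defaultdict

class ConceptStore:
    """Links documents through the concepts they share."""

    def __init__(self):
        self._docs_by_concept = defaultdict(set)
        self._concepts_by_doc = {}

    def add(self, doc_id, concepts):
        """Store a document; no manual re-tagging of older documents is needed."""
        self._concepts_by_doc[doc_id] = set(concepts)
        for concept in concepts:
            self._docs_by_concept[concept].add(doc_id)

    def related(self, doc_id):
        """Return all other documents sharing at least one concept."""
        found = set()
        for concept in self._concepts_by_doc.get(doc_id, ()):
            found |= self._docs_by_concept[concept]
        found.discard(doc_id)
        return found

store = ConceptStore()
store.add("doc1", {"java", "coffee"})
store.add("doc2", {"xml"})
before = store.related("doc1")
store.add("doc3", {"coffee", "sumatra"})
after = store.related("doc1")
print(before, after)
```

Adding `doc3` changes what is known about `doc1`, although `doc1` itself was never touched.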
From Machine-Readable to Machine-Understandable
Semantic Web architecture and applications support both human and machine intelligence systems. Humans can use Semantic Web applications on a manual basis, and improve the efficiency of search, summary, analysis and reporting tasks.
Synthetic vs. Artificial Intelligence
Semantic Web technology is NOT “Artificial Intelligence”. AI was a mythical marketing goal to create “thinking” machines. The Semantic Web supports a much more limited and realistic goal. This is “Synthetic Intelligence”. The concepts and relationships stored in the Semantic Web database are “synthesized”, or brought together and integrated, to automatically create a new summary, analysis, report, email, alert; or launch another machine application.
Future of Information Management: Network Spreadsheets for Ideas
The future of information management will be based on Semantic Web architecture and applications. The most important issue is which technologies and firms take the immediate leadership to drive the migration, and therefore guide the information architecture of the future.
  1. Tidal Wave of information shifts power
    End users and corporations will drive the rapid expansion of Semantic Web architecture and applications to survive the tidal wave of data, and improve costs, speed and performance. IT management will resist or accelerate this trend.
  2. Migration to XML and RDF Standards
    Application programs will follow Microsoft’s migration to XML standards for document authoring and exchange. XML and RDF standards will become the dominant approach for capturing, understanding, storing and exchanging external document descriptions and document content.
  3. Universal Internet Web Portals
    Information access will migrate to web portals, both within organizations and among the general population, and web portals based on Semantic Web applications will become the central user application.
  4. Parallel legacy Database Integration
    Legacy databases will be extracted into parallel Semantic Web architecture databases to provide access to fragmented sources. Parallel architecture dramatically reduces the costs, risks, and schedules from the ERP “tear down and rebuild” Transparent Grid Architecture.
  5. Global and Language Expansion
    Information sources, users and entities will expand globally and support many languages. Because Semantic Web architectures and applications “learn and think” in the original language, the production and exchange of multi-language information between language domains will increase dramatically. Example: interactive Japanese-language sources on China, in English.
  6. Network Access and Distribution
    Networks will get better, faster, cheaper, wireless and distributed. Semantic Web architecture and applications will expand to link global data sources from mainframe servers, desktop workstations and laptops to handheld PDAs and cell phones. This includes voice-driven expert systems.
  7. Network Transactions and capacity
    Human transactions will grow slowly, while machine transactions will grow exponentially. The migration from human to machine intelligence transactions will rapidly take over the private and public networks. This rapid growth in capacity demand will force a major increase in network hardware investment and stimulate new value-added network services.
Nature of building specifications
A building specification is a central document in a building process. It traditionally sits between the design phase and the actual construction phase. A specification consists of both the specification drawings and the specification text.
Formal description
The formal specification is built up from a list of specification items. Historically, one specification item often deals with something that has to be budgeted. Such an item describes, for instance, the required end result, the required quality and the source material.
Conditions
Conditions (or regulations) give extra information on top of the plain technical data (like fire resistance = 30 min). Conditions can be technical or administrative and standard or additional. The standard (technical or administrative) conditions are typically valid for every building project.
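The structure of a specification item with its conditions can be sketched as a pair of Python dataclasses. The class names, field names and sample values below are assumptions for illustration; they are not taken from an existing specification system.

```python
from dataclasses import dataclass, field

@dataclass
class Condition:
    text: str
    kind: str              # "technical" or "administrative"
    standard: bool = True  # False for an additional, project-specific condition

@dataclass
class SpecificationItem:
    title: str
    end_result: str
    quality: str
    source_material: str
    properties: dict = field(default_factory=dict)   # plain technical data
    conditions: list = field(default_factory=list)   # extra information

# Hypothetical item
door_item = SpecificationItem(
    title="Doors on the ground floor",
    end_result="Installed interior doors",
    quality="Standard grade",
    source_material="Timber",
    properties={"fire_resistance_min": 30},
    conditions=[Condition("Placement according to drawing", "technical", standard=False)],
)

additional = [c.text for c in door_item.conditions if not c.standard]
print(door_item.properties["fire_resistance_min"], additional)
```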
Specification Structure: classification
  • One common way of subdividing the textual specification is subdivision into parts called chapters.
  • On the downside, much information gets scattered all over the place when there are specification items that impact more than one kind of work.
  • The structure is normally a classification (a subdivision in classes and subclasses, with the subdivision being done according to a specific view, to make it comfortable to be used by humans (Van Rees 2003)).
  • A second common way of subdividing is by following normal execution patterns. The reason for this is that detailed cost estimations (in the ground/water/road sector) are normally made that way. A good match between the cost estimation and the specification text is desired.
  • The specification classification is sometimes also used to structure other information. Links, made that way, are however on the chapter/section/subsection level, not really on the level of the actual specification units.
References to the specification drawings
Normally, the references to the accompanying specification drawings are not extensive. For instance, the “doors on the ground floor” are described. You can also describe a set of doors, mentioning “placement according to drawing”.
Nature of the semantic web
The web allows us to access a vast hoard of information. You search in Google almost before you ask a colleague for information, so the web is already firmly in place. The semantic web is a set of technologies that gives computer programs access to an equivalent richness of information.
Related research in the building industry
In order to be able to place the contents of this section in its proper perspective, we briefly show the work done in two recent EU-funded projects: eConstruct and e-cognos.
The goal of eConstruct (http://www.econstruct.org/) was to harness the possibilities of the Internet for the building industry, concentrating on the communication in the buying and selling phase. Conceptually, three things are needed for communication.
  • A taxonomy (sporting a specialisation hierarchy, property definitions and multi-linguality) was used as the vocabulary of terms.
  • The grammar (data format) was bcXML, a custom XML format.
  • The communication medium was the Internet, used to connect a few services (catalogue server, taxonomy server, etc.).
E-cognos (http://www.e-cognos.org/) started the moment eConstruct finished and took the development into the direction of knowledge management.
  • Multiple cooperating ontologies were used (e-cognos used the term ontology instead of taxonomy).
  • Data was exchanged in XML (partly re-using bcXML) and in RDF (combined with DAML+OIL), which is an XML format for ontologies and ontology-based data.
  • The Internet was, like in eConstruct, used to access the ontologies' information.
Both projects achieved good results, allowing us to suggest the following as best practice.
  • Store definitions of terms, vocabularies, etc. in widely accessible ontologies. This way, the terminology used is made explicit.
  • Use XML, or the more specific RDF, for information exchange.
  • Use the Internet as the basic communication medium.
Webify data
Webifying data means that every piece of useful data should have a URI. The success of the World Wide Web is entirely based on assigning a URI to every single webpage and image and enabling links between them (Prescod 2002).
For example, webifying data in this case does not mean having one human-readable page containing pictures and some text listing the available types.
URIs, standardised data formats and the standard HTTP protocol are what make the Internet work. As the building and construction industry is too fragmented, proprietary solutions will fail, so everything must have a URI, and XML and HTTP are mandatory (Van Rees et al. 2002).
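Giving every piece of data its own URI could be sketched as below. The base address `http://example.org/spec/` and the chapter/item layout are assumptions for the example; any stable naming scheme would serve.

```python
from urllib.parse import quote, urljoin

# Hypothetical base address for one building project's specification
BASE = "http://example.org/spec/"

def item_uri(chapter, item):
    """Build one URI per specification item instead of one page for all items."""
    path = f"{quote(chapter, safe='')}/{quote(item, safe='')}"
    return urljoin(BASE, path)

door_uri = item_uri("doors", "ground floor door")
print(door_uri)
```

Because each item has its own address, other data sources can link to the individual item rather than to a page that merely lists it.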
Ontology language
Webifying data is the first necessary step to enabling the semantic web for the building and construction industry. The next step is an ontology language, which should support:
  • Classes and properties and their relations
  • Subtype hierarchy (both for classes and for properties)
  • Textual information (labels and descriptions, multilingual)
  • Re-using classes and properties from other ontologies, allowing you to build on previous work and to use more generic high-level ontologies as a common basis for two ontologies that need to exchange information.
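These features can be pictured with a small Python model. It is only a sketch of the ideas (classes, a subtype hierarchy, multilingual labels); it is not an ontology language, and the class name `OntClass` and the example URIs are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class OntClass:
    uri: str
    labels: dict = field(default_factory=dict)    # language code -> label
    parents: list = field(default_factory=list)   # direct supertypes

    def is_a(self, other):
        """True if `other` is this class or one of its (indirect) supertypes."""
        if self is other:
            return True
        return any(parent.is_a(other) for parent in self.parents)

element = OntClass("http://example.org/onto#BuildingElement",
                   labels={"en": "building element"})
door = OntClass("http://example.org/onto#Door",
                labels={"en": "door", "nl": "deur"},
                parents=[element])
fire_door = OntClass("http://example.org/onto#FireDoor",
                     labels={"en": "fire door"},
                     parents=[door])

print(fire_door.is_a(element), element.is_a(door), door.labels["nl"])
```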
Implementation notes: Zope/Python
An implementation needs two components:
  • A web application server, providing a web server and a programmatic framework to drive it. A popular choice in the research community seems to be Apache’s Tomcat Java web application server (http://jakarta.apache.org/tomcat/).
  • A semantic data store, providing a means to store and query RDF files. A popular choice is Hewlett-Packard’s Jena (http://www.hpl.hp.com/semweb/jena.htm).

Development speed and ease-of-use
Python and Zope are attractive for web programming. Python (http://python.org/) is a high-level (scripting) language which is regarded by most as both elegant and powerful, suitable for programs both big and small. It is platform-independent (Windows, Unix, Mac; recent versions of Mac OS X even ship it as part of the operating system). Zope, a web application server written in Python, provides:
  • Built-in object database.
  • User management and flexible password protection
  • Through-the-web management interface. No need for changing files on the file system.
Reusable modules
Both Python and Zope have a big community that creates a lot of add-ons and modules. Most of them are open source and can be freely reused ("free" meaning both freedom to change and re-distribute and free of charge).
  • Rdflib (http://rdflib.net/): a simple RDF store that parses, stores, queries and exports RDF files. To store and query big data sets you can use Zope’s object database, which can handle big data sets efficiently.
  • Plone (http://plone.org/): an attractive (but changeable) user interface on top of Zope’s. With little effort a great result can be obtained (ideal for a time-strapped researcher). Recently, the possibility to generate web forms from UML diagrams added even more attractiveness to this solution.
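What an RDF store does (store statements and answer pattern queries) can be sketched in plain Python. The `TripleStore` class below is a stand-in written for this example; it is not the Rdflib API, and the `ex:` names are hypothetical.

```python
class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

triples = TripleStore()
triples.add("ex:door1", "rdf:type", "ex:Door")
triples.add("ex:door1", "ex:fireResistanceMin", "30")
triples.add("ex:door2", "rdf:type", "ex:Door")

all_doors = triples.query(predicate="rdf:type", obj="ex:Door")
print(all_doors)
```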
Architecture
A basic property of the architecture is that it caters for the exchange of information between different sources of information, each with its own goal, its own methods and its own peculiarities.
Ontologies
Each information source has its own view of the information. Such a view can be formally and explicitly described in an ontology. An ontology is a formal way of describing the set of concepts used in a certain field, from a certain viewpoint.
This means that, to describe the field of specifications and the terms used therein, a specification ontology could capture the concepts used to create specifications (chapters, specification units, regulation references), but also the concepts that form the actual contents of the specification (masonry, double glazed windows).

A generic ontology could be made more specific by branch-specific or application-specific ontologies. Application ontologies add the concepts needed for cost estimation, for instance, or for fire safety calculations. Branch ontologies further specify and add concepts from their branch of the construction industry.
The emerging picture here is that of multiple ontologies that cooperate to a greater or lesser degree.
OWL (the web ontology language, building on RDF) has built-in support for cooperating ontologies.
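The idea of cooperating ontologies can be sketched with two small dictionaries, where an application ontology for cost estimation builds on a class from a generic ontology. The prefixes `generic:` and `cost:` and the class names are assumptions for the example; in practice OWL would express these links.

```python
# Each ontology maps a class name to its direct parent (None for a root).
GENERIC = {
    "generic:BuildingElement": None,
    "generic:Door": "generic:BuildingElement",
}
# Hypothetical application ontology that re-uses a generic class
COST = {
    "cost:BudgetedDoor": "generic:Door",
}

def ancestors(name, *ontologies):
    """Follow parent links across all supplied ontologies."""
    merged = {}
    for ontology in ontologies:
        merged.update(ontology)
    chain = []
    parent = merged.get(name)
    while parent is not None:
        chain.append(parent)
        parent = merged.get(parent)
    return chain

combined = ancestors("cost:BudgetedDoor", GENERIC, COST)
alone = ancestors("cost:BudgetedDoor", COST)
print(combined, alone)
```

With only the cost ontology loaded, the chain stops at the generic class it refers to; loading both ontologies completes the hierarchy.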
Information Sources
  • What is presented in this section and the next is just one way of looking at the information sources, but it serves to illustrate the point.
  • When looking at a building project, you can distinguish four kinds of information in two dimensions.

Example

As an example, let us take the project description, as made by the initiator of the project. When taking the semantic web route, this should be available over the Internet and its concepts should be described in an ontology. It includes links to the textual specification, for instance.
















