Taming the Wild Web - An Introduction to the Semantic Web
Dr. Jeff Heflin, Computer Science and Engineering
Lehigh University
- More than 21% of humanity has used the web
- Web can be used for more than you think
- By tracking flu-related queries, Google can detect flu outbreaks faster than the CDC reports them.
- What if we had richer data?
Definition
- The semantic web is not a separate web, but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation (Berners-Lee et al., 2001).
Ontology
- a key component of the Semantic Web
- ontologies define the semantics of the terms used in semi-structured web pages
- identify context, provide shared definitions
- has a formal syntax and unambiguous semantics
- usually includes a taxonomy but typically much more
- inference algorithms can compute what logically follows
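To make "compute what logically follows" concrete, here is a minimal sketch that assumes a toy taxonomy. The class names and the `SUBCLASS_OF` table are hypothetical and do not come from any real ontology. It shows only the simplest kind of inference: an individual asserted to be in one class is also in every superclass of that class.

```python
# Hypothetical toy taxonomy: each class maps to its direct superclasses.
SUBCLASS_OF = {
    "Guitarist": {"Musician"},
    "Musician": {"Person"},
    "Person": set(),
}


def superclasses(cls, taxonomy):
    """Return every class that `cls` is a direct or indirect subclass of."""
    found = set()
    stack = list(taxonomy.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(taxonomy.get(parent, ()))
    return found


def inferred_types(asserted_type, taxonomy):
    """An individual belongs to its asserted class and to all its superclasses."""
    return {asserted_type} | superclasses(asserted_type, taxonomy)


print(sorted(inferred_types("Guitarist", SUBCLASS_OF)))
# prints ['Guitarist', 'Musician', 'Person']
```

Only "Guitarist" was stated; membership in Musician and Person follows from the taxonomy.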
W3C Recommendations
- RDF(S) (1999, revised 2004)
  - directed graphs labeled with URIs
  - XML serialization syntax
- OWL (2004)
  - extends RDF with more semantic primitives
  - based on description logics (DLs)
  - has a model theoretic semantics
<owl:Class rdf:ID="Band">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasMember" />
      <owl:allValuesFrom rdf:resource="#Musician" />
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
A Band is a subclass of the things that have only Musicians as members
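One consequence of this restriction can be sketched in a few lines: if something is a Band, every value of its hasMember property must be a Musician. The individuals below (TheBeatles, John, Paul, Acme, Bob) and the tuple-based triple store are assumptions made for illustration; this is not how an OWL reasoner is implemented.

```python
# Triples are (subject, predicate, object). All individuals are hypothetical.
triples = {
    ("TheBeatles", "rdf:type", "Band"),
    ("TheBeatles", "hasMember", "John"),
    ("TheBeatles", "hasMember", "Paul"),
    ("Acme", "hasMember", "Bob"),  # Acme is never declared to be a Band
}

# Encodes: Band rdfs:subClassOf (hasMember allValuesFrom Musician)
RESTRICTIONS = [("Band", "hasMember", "Musician")]


def apply_all_values_from(triples, restrictions):
    """Return the new rdf:type triples entailed by allValuesFrom restrictions."""
    inferred = set()
    for cls, prop, filler in restrictions:
        instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
        for s, p, o in triples:
            if p == prop and s in instances:
                inferred.add((o, "rdf:type", filler))
    return inferred - triples


for triple in sorted(apply_all_values_from(triples, RESTRICTIONS)):
    print(triple)
# prints ('John', 'rdf:type', 'Musician')
# prints ('Paul', 'rdf:type', 'Musician')
```

Nothing is inferred about Bob, because Acme is not known to be a Band. The restriction also does not require a Band to have any members at all.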
URI (Uniform Resource Identifier)
- Includes URLs
- also anything that you can design an identification scheme for
- helps to prevent collision of names
- all the "symbols" in RDF are either URIs or Literals
Namespace
- a mechanism for abbreviating URIs
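A short sketch of how a namespace prefix abbreviates a URI. The rdf, rdfs and owl namespace URIs are the standard W3C ones; the `ex` prefix, its URI, and the `expand` helper are hypothetical and exist only for this example.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "ex": "http://example.org/music#",  # hypothetical namespace
}


def expand(qname, prefixes=PREFIXES):
    """Expand an abbreviated name such as 'owl:Class' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local


# A triple whose three symbols are all URIs once expanded.
triple = tuple(expand(name) for name in ("ex:TheBeatles", "rdf:type", "ex:Band"))
print(expand("owl:Class"))
# prints http://www.w3.org/2002/07/owl#Class
```

Because each prefix maps to a distinct URI, two ontologies can both define a term called Band without the names colliding.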
Description Logic
- Form of knowledge representation
- Useful for formally defining classes
- Studied extensively in the 1990s
- mature reasoning software
  - e.g., FaCT, RACER, Pellet
- benefits
  - optimized computation of subsumption
  - calculate implicit subClassOf relations
  - ontology integration
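The idea of calculating implicit subClassOf relations can be illustrated with a deliberately tiny sketch. It assumes every class is defined as a plain conjunction of primitive concepts, so that one class subsumes another exactly when its set of conjuncts is contained in the other's. The definitions below are hypothetical. Reasoners such as FaCT, RACER and Pellet handle far more expressive logics with tableau-based algorithms, which this does not attempt to reproduce.

```python
# Hypothetical definitions: each class is a conjunction of primitive concepts.
DEFINITIONS = {
    "Person": frozenset({"Person"}),
    "Musician": frozenset({"Person", "PlaysInstrument"}),
    "Guitarist": frozenset({"Person", "PlaysInstrument", "PlaysGuitar"}),
    "Teacher": frozenset({"Person", "Teaches"}),
}


def subsumes(general, specific, definitions):
    """`general` subsumes `specific` if all of its conjuncts also define `specific`."""
    return definitions[general] <= definitions[specific]


def implicit_subclass_pairs(definitions):
    """List every (subclass, superclass) pair implied by the definitions."""
    return sorted(
        (c, d)
        for c in definitions
        for d in definitions
        if c != d and subsumes(d, c, definitions)
    )


for sub, sup in implicit_subclass_pairs(DEFINITIONS):
    print(f"{sub} subClassOf {sup}")
# prints Guitarist subClassOf Musician
# prints Guitarist subClassOf Person
# prints Musician subClassOf Person
# prints Teacher subClassOf Person
```

None of these subClassOf relations was stated directly; each follows from the class definitions.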
Level of Adoption?
- Open source Semantic Web tools
- Commercial software vendors (Oracle, Adobe)
- ~65 million Semantic Web documents (as of October 2009)
- Yahoo SearchMonkey uses RDF to present richer search results
- Google now indexes RDFa
- Semantic Web Enabled Sites
- BBC Music
- Harper's Magazine
- DBpedia
- LiveJournal uses FOAF
Linking Open Data Project
An Application: Hawkeye
- Requested 1.7 million real Semantic Web documents identified by Swoogle (swoogle.umbc.edu)
- Loaded 760,000 documents, 16,280 ontologies, and 166 million triples
- conversion of Citeseer, DBLP, NSF award data and various e-Gov sources
- Developed ontologies to map different schemas
- Developed sources that equate individuals from different sources
- Use OWL as mapping language
- Mapping Ontologies
- Individual Equivalence Statements
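Individual equivalence statements (in OWL, owl:sameAs) are symmetric and transitive, so a set of pairwise statements partitions identifiers into groups that each denote one real-world thing. The sketch below computes those groups with a union-find structure. The identifiers are hypothetical, and this is an illustration of the idea rather than a description of how Hawkeye is implemented.

```python
def equivalence_classes(same_as_pairs):
    """Group identifiers that are linked, directly or indirectly, by sameAs pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical identifiers from three different sources.
pairs = [
    ("citeseer:a1", "dblp:a9"),
    ("dblp:a9", "nsf:pi7"),
    ("citeseer:p3", "dblp:p5"),
]
print(equivalence_classes(pairs))
# prints [['citeseer:a1', 'dblp:a9', 'nsf:pi7'], ['citeseer:p3', 'dblp:p5']]
```

No statement links citeseer:a1 to nsf:pi7 directly; they end up in the same group through dblp:a9.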
Conclusion
- Web is a powerful tool
- Semantic Web is approaching critical mass
- We have demonstrated the feasibility of large-scale integration using OWL
- Integration can emerge via social web processes
Future work
- User-friendly interface
- improved performance
- support more complex reasoning
- support for updates