SWC:Semantic Data Transformation

From getSchema




Project Goals

  • Create a REST service (or a set of services) that automatically discovers structured data in text, under the assumption that the discovery service already knows the pattern (structure) it is going to fill.

Notes on implementation

  • Establish a flexible REST service architecture allowing easy extension with new discovery capabilities.
  • Add caching to improve performance.
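
One possible way to read the two notes above is a registry of recognizers, one per pattern, with a cache in front of the dispatch step. The sketch below is only an illustration under assumptions: the names RECOGNIZERS, recognizer and discover_json are hypothetical and not part of this project, and an in-process lru_cache stands in for whatever caching solution is eventually chosen.

```python
import json
from functools import lru_cache
from typing import Callable, Dict

# Hypothetical registry: URL path segment -> recognizer function.
RECOGNIZERS: Dict[str, Callable[[str], dict]] = {}


def recognizer(pattern: str):
    """Register a function as the recognizer for one pattern name."""
    def register(func: Callable[[str], dict]) -> Callable[[str], dict]:
        RECOGNIZERS[pattern] = func
        return func
    return register


@recognizer("postaladdress")
def recognize_postal_address(text: str) -> dict:
    # Placeholder only: a real recognizer would analyse the text here.
    return {}


@lru_cache(maxsize=1024)
def discover_json(pattern: str, text: str) -> str:
    """Dispatch to the registered recognizer and cache the serialized answer.

    The JSON string is cached (not the dict) so that callers cannot
    accidentally mutate a cached result.
    """
    if pattern not in RECOGNIZERS:
        raise LookupError(f"no recognizer registered for {pattern!r}")
    return json.dumps(RECOGNIZERS[pattern](text), sort_keys=True)
```

With this layout, adding a new discovery capability would amount to writing one more function decorated with @recognizer("place"), @recognizer("event") and so on; the dispatch and caching code stays unchanged.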

Scenario 1: Recognizing an Address

Define a service that receives a text, knows the PostalAddress object pattern, and transforms the text into a PostalAddress instance. The data transformation is not completely blind, because the service is told which pattern it has to discover.

A Schema.org PostalAddress object is defined, in brief, by properties such as streetAddress, addressLocality, addressRegion, postalCode and addressCountry (see PostalAddress for the full definition).

For example, such a service will take the (URL-encoded) version of the text below

Roots Restaurant at The Orchard Garden Hotel<br>
 466 Bush Street (at Grant Ave)<br>
 San Francisco, California 94108<br>
 Phone: 415.659.0349 <br>
 <a href="mailto:events@theorchardgardenhotel.com">Email Us by clicking here</a>

and return structured serializations such as JSON-LD, which could then be used in an HTML (microdata) solution like

<div itemscope itemtype="http://schema.org/PostalAddress">
  <h1>Address:</h1>
  <span itemprop="name">Roots Restaurant at The Orchard Garden Hotel</span><br/>
  <span itemprop="streetAddress">466 Bush Street</span> (at Grant Ave)<br/>
  <span itemprop="addressLocality">San Francisco</span>,
  <span itemprop="addressRegion">California</span>,
  <span itemprop="postalCode">94108</span><br/>
  Phone: <span itemprop="telephone">415.659.0349</span><br/>
  <a itemprop="email" href="mailto:events@theorchardgardenhotel.com">Email Us by clicking here</a>
</div>

The service therefore has to discover that "San Francisco" is a city (addressLocality), that "466 Bush Street" is a street address (streetAddress), and so on.

Hint: To perform such discovery, the service can invoke any useful open service on the Web.

Hint: An interesting service is http://www.geonames.org/, but it seems its data is available only for the US. Still, the ideas behind GeoNames can be used in the generic case.

Hint: http://www.gisgraphy.com/ is also worth investigating.
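
To make the discovery step in this scenario more concrete, here is a deliberately naive, rule-based sketch. It is not the project's implementation: the regular expressions, the function name recognize_postal_address and the heuristics (US-style "City, Region ZIP" lines, the first unmatched line taken as the name) are all assumptions chosen to fit the sample text, and a real service would more likely consult a gazetteer such as the ones in the hints above.

```python
import html
import re

# All patterns below are illustrative guesses tuned to US-style addresses.
CITY_LINE = re.compile(r"([^,]+),\s*([A-Za-z .]+?)\s+(\d{5}(?:-\d{4})?)")
STREET_LINE = re.compile(r"(\d+\s+[^()]+?)\s*(?:\([^)]*\))?")
PHONE_LINE = re.compile(r"\b(?:Phone|Tel)\.?\s*:\s*([\d .()+-]{7,})", re.IGNORECASE)
MAILTO = re.compile(r"mailto:([^\s\"'>]+)", re.IGNORECASE)


def recognize_postal_address(text: str) -> dict:
    """Return only the PostalAddress properties that could be recognized."""
    found = {}

    # The e-mail address lives in an attribute, so read it before stripping tags.
    mail = MAILTO.search(text)
    if mail:
        found["email"] = mail.group(1)

    # Turn <br> into line breaks, drop the remaining tags, decode entities.
    plain = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)
    plain = html.unescape(re.sub(r"<[^>]+>", "", plain))

    for line in (raw.strip() for raw in plain.splitlines()):
        if not line:
            continue
        city = CITY_LINE.fullmatch(line)
        street = STREET_LINE.fullmatch(line)
        phone = PHONE_LINE.search(line)
        if city:
            (found["addressLocality"],
             found["addressRegion"],
             found["postalCode"]) = city.groups()
        elif street:
            # A trailing parenthetical such as "(at Grant Ave)" is dropped.
            found["streetAddress"] = street.group(1)
        elif phone:
            found["telephone"] = phone.group(1).strip()
        elif "name" not in found:
            # Assumption: the first line that matches nothing else is the name.
            found["name"] = line
    return found
```

Applied to the sample text of this scenario, these rules pick out name, streetAddress, addressLocality, addressRegion, postalCode, telephone and email. They do not infer addressCountry; that would need an external lookup.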

Request and Response

A typical request will be like:

GET /postaladdress/?q=Roots%20Restaurant%20at%20The%20Orchard%20Garden%20Hotel%3Cbr%3E%0A%20466%20Bush%20Street%20(at%20Grant%20Ave)%3Cbr%3E%0A%20San%20Francisco%2C%20California%2094108%3Cbr%3E%0A%20Phone%3A%20415.659.0349%20%3Cbr%3E%0A%20%3Ca%20href%3D%22mailto%3Aevents%40theorchardgardenhotel.com%22%3EEmail%20Us%20by%20clicking%20here%3C%2Fa%3E

where the q parameter takes the URL-encoded text as its value[1]. A typical JSON answer would be:

{
  "name": "Roots Restaurant at The Orchard Garden Hotel",
  "streetAddress": "466 Bush Street",
  "addressLocality": "San Francisco",
  "addressRegion": "California",
  "postalCode": "94108",
  "telephone": "415.659.0349",
  "addressCountry": {"name": "US"}
}

If the service fails to recognize some property then it should not include it in the response. For information on the model of responses see Mapping Schema.org vocabulary to JSON Objects.
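
The request and response conventions above can be sketched from both sides. The snippet is a minimal illustration, assuming a hypothetical host (BASE_URL) and hypothetical helper names; only the /postaladdress/ path, the q parameter and the rule about omitting unrecognized properties come from this page.

```python
import json
from urllib.parse import quote

# Hypothetical host: this page specifies only the path and the q parameter.
BASE_URL = "http://example.org"


def build_request_url(text: str) -> str:
    """Client side: percent-encode the raw text and pass it as q."""
    # safe="" makes quote() encode "/" as well, so the whole text stays
    # inside the q value. Parentheses are also encoded, which is equally valid.
    return f"{BASE_URL}/postaladdress/?q={quote(text, safe='')}"


def build_response(recognized: dict) -> str:
    """Server side: serialize only the properties that were recognized."""
    # Properties whose value is None or empty are left out of the answer.
    return json.dumps({key: value for key, value in recognized.items() if value})
```

For example, build_response({"name": "Roots Restaurant", "postalCode": None}) would serialize only the name property.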

Scenario 2: Recognizing Places

Place is a concept closely related to PostalAddress, so its discovery may follow the same procedure.
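
One way to reuse the address procedure for places is to wrap the recognized address in a Place object, since Schema.org's Place type has an address property that can hold a PostalAddress. The sketch below is an assumption about how that might look: recognize_place is a hypothetical name, the address recognizer is passed in as a parameter, and moving name from the address up to the place is a design guess rather than something this page prescribes.

```python
from typing import Callable


def recognize_place(text: str, recognize_address: Callable[[str], dict]) -> dict:
    """Build a Place-shaped dict on top of an existing address recognizer."""
    # Copy the result so the recognizer's own dict is never modified.
    address = dict(recognize_address(text))
    place = {}
    # Assumption: a recognized name describes the place, not its address.
    if "name" in address:
        place["name"] = address.pop("name")
    # Omit the address entirely if nothing was recognized.
    if address:
        place["address"] = address
    return place
```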

Scenario 3: Recognizing Organization

See Organization.

Scenario 4: Recognizing an Event

See Event.

Scenario 5: Recognizing an Offer

See Offer.


  1. Simple URL Decoder/Encoder Service, http://meyerweb.com/eric/tools/dencoder/
