SWC:Semantic Data Transformation
Project Goals
- Create a (set of) REST services that enable automated discovery of structured data from text, under the assumption that the discovery service already knows the pattern structure it is going to fill.
Notes on implementation
- Establish a flexible REST service architecture allowing easy extension with new discovery capabilities.
- Provide caching solutions to improve performance.
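The two implementation notes above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `RECOGNIZERS` registry, the `register` decorator, the `discover` function and the placeholder recognizer are hypothetical names, not part of the project, and the in-process `functools.lru_cache` merely stands in for whatever caching solution is eventually chosen.

```python
import json
from functools import lru_cache
from typing import Callable, Dict

# Hypothetical registry: maps a pattern name (e.g. "postaladdress") to a
# function that turns raw text into a dict of discovered properties.
Recognizer = Callable[[str], dict]
RECOGNIZERS: Dict[str, Recognizer] = {}


def register(pattern: str) -> Callable[[Recognizer], Recognizer]:
    """Decorator that adds a recognizer under the given pattern name."""
    def wrap(func: Recognizer) -> Recognizer:
        RECOGNIZERS[pattern.lower()] = func
        return func
    return wrap


@register("postaladdress")
def recognize_postal_address(text: str) -> dict:
    # Placeholder only: a real recognizer would discover the properties here.
    return {"raw": text}


@lru_cache(maxsize=1024)
def discover(pattern: str, text: str) -> str:
    """Dispatch to the recognizer for `pattern` and return a JSON string.

    Identical (pattern, text) requests are answered from the cache.
    """
    recognizer = RECOGNIZERS.get(pattern.lower())
    if recognizer is None:
        raise KeyError(f"no recognizer registered for {pattern!r}")
    return json.dumps(recognizer(text))
```

With this layout, adding a new discovery capability (Place, Organization, Event, Offer) would only require registering one more function; the dispatch and caching code stays unchanged.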
Scenario 1: Recognizing an Address
Define a service that receives a text, knows the PostalAddress object pattern, and transforms the text into a PostalAddress instance. The data transformation is not completely blind, because the service is told which pattern it has to discover.
A Schema.org PostalAddress object is briefly defined by the properties below (see PostalAddress for the full definition):
- streetAddress:Text
- addressLocality:Text
- addressRegion:Text
- postalCode:Text
- postOfficeBoxNumber:Text
- addressCountry:Country
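As a rough sketch, the property list above could be modelled as a small data structure. The class layout is an assumption made for illustration (the project does not prescribe one); only the property names come from the list, and `Country` is reduced to a single `name` field.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Country:
    # Simplified stand-in for the Schema.org Country type (assumption).
    name: Optional[str] = None


@dataclass
class PostalAddress:
    # Every property is optional: the service may fail to discover some.
    streetAddress: Optional[str] = None
    addressLocality: Optional[str] = None
    addressRegion: Optional[str] = None
    postalCode: Optional[str] = None
    postOfficeBoxNumber: Optional[str] = None
    addressCountry: Optional[Country] = None


address = PostalAddress(
    streetAddress="466 Bush Street",
    addressLocality="San Francisco",
    addressCountry=Country(name="US"),
)
# asdict() converts nested dataclasses recursively into plain dicts.
address_dict = asdict(address)
```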
For example, such a service will take the URL-encoded version of the text below
Roots Restaurant at The Orchard Garden Hotel<br>
 466 Bush Street (at Grant Ave)<br>
 San Francisco, California 94108<br>
 Phone: 415.659.0349 <br>
 <a href="mailto:events@theorchardgardenhotel.com">Email Us by clicking here</a>
and return structured serializations such as JSON-LD, which could then be used in an HTML solution like
<div itemscope itemtype="http://schema.org/PostalAddress">
  <h1>Address:</h1>
  <span itemprop="name">Roots Restaurant at The Orchard Garden Hotel</span><br/>
  <span itemprop="streetAddress">466 Bush Street</span>(at Grant Ave)<br/>
  <span itemprop="addressLocality">San Francisco</span>,
  <span itemprop="addressRegion">California</span>,
  <span itemprop="postalCode">94108</span><br/>
  Phone: <span itemprop="telephone">415.659.0349</span><br/>
  <a itemprop="email" href="mailto:events@theorchardgardenhotel.com">Email Us by clicking here</a>
</div>
The service therefore has to understand/discover "San Francisco" as a city (addressLocality), "466 Bush Street" as a street address (streetAddress), and so on.
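To make the idea of pattern-informed discovery concrete, here is a deliberately naive sketch based on regular expressions. It is not the project's method: the function name and both regular expressions are assumptions, tuned only to US-style addresses like the example above, and a real service would need far more robust techniques.

```python
import re
from html import unescape

TAG = re.compile(r"<[^>]+>")
# "City, Region 12345" or "City, Region 12345-6789" (US-style assumption).
CITY_LINE = re.compile(
    r"^(?P<city>[^,]+),\s*(?P<region>[A-Za-z .]+?)\s+(?P<zip>\d{5}(?:-\d{4})?)$"
)
# A line starting with a house number, optionally followed by a "(...)" note.
STREET_LINE = re.compile(r"^(?P<street>\d+\s+[^(,]+?)\s*(?:\(.*\))?$")


def recognize_postal_address(text: str) -> dict:
    """Return only the PostalAddress properties that could be discovered."""
    # Treat <br> tags and newlines as line separators, then drop other tags.
    parts = re.split(r"<br\s*/?>|\n", text, flags=re.IGNORECASE)
    lines = [unescape(TAG.sub("", part)).strip() for part in parts]

    result = {}
    for line in filter(None, lines):
        city = CITY_LINE.match(line)
        if city and "addressLocality" not in result:
            result["addressLocality"] = city.group("city").strip()
            result["addressRegion"] = city.group("region").strip()
            result["postalCode"] = city.group("zip")
            continue
        street = STREET_LINE.match(line)
        if street and "streetAddress" not in result:
            result["streetAddress"] = street.group("street").strip()
    return result
```

Properties that match neither expression are simply absent from the returned dictionary, so a partial result is still usable.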
Request and Response
A typical request looks like:
GET /postaladdress/?q=Roots%20Restaurant%20at%20The%20Orchard%20Garden%20Hotel%3Cbr%3E%0A%20466%20Bush%20Street%20(at%20Grant%20Ave)%3Cbr%3E%0A%20San%20Francisco%2C%20California%2094108%3Cbr%3E%0A%20Phone%3A%20415.659.0349%20%3Cbr%3E%0A%20%3Ca%20href%3D%22mailto%3Aevents%40theorchardgardenhotel.com%22%3EEmail%20Us%20by%20clicking%20here%3C%2Fa%3E
with q:text as a parameter that takes URL-encoded text as its value[1]. A typical JSON answer would be:
{
  "name": "Roots Restaurant at The Orchard Garden Hotel",
  "streetAddress": "466 Bush Street",
  "addressLocality": "San Francisco",
  "addressRegion": "California",
  "postalCode": "94108",
  "telephone": "415.659.0349",
  "addressCountry": {"name": "US"}
}
If the service fails to recognize some property then it should not include it in the response. For information on the model of responses see Mapping Schema.org vocabulary to JSON Objects.
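The request and response conventions described in this section can be sketched as below. The helper names `build_request_path` and `build_response` are hypothetical; only the `/postaladdress/` path, the `q` parameter and the rule about omitting unrecognized properties come from the text above. Note that `quote` with `safe=""` also percent-encodes parentheses, which decode to the same text.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit


def build_request_path(text: str, endpoint: str = "/postaladdress/") -> str:
    """Percent-encode `text` and attach it as the `q` query parameter."""
    return f"{endpoint}?q={quote(text, safe='')}"


def build_response(found: dict) -> str:
    """Serialize discovered properties, leaving out those not recognized."""
    return json.dumps({key: value for key, value in found.items()
                       if value is not None})


path = build_request_path("466 Bush Street, San Francisco")
# The service side can recover the original text from the query string.
decoded = parse_qs(urlsplit(path).query)["q"][0]

body = build_response({
    "streetAddress": "466 Bush Street",
    "postOfficeBoxNumber": None,  # not recognized, so it is omitted
})
```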
Scenario 2: Recognizing Places
Place is a concept closely related to PostalAddress; therefore, its discovery may follow the same procedure.
Scenario 3: Recognizing Organization
See Organization.
Scenario 4: Recognizing an Event
See Event.
Scenario 5: Recognizing an Offer
See Offer.
References
- [1] Simple URL Decoder/Encoder Service, http://meyerweb.com/eric/tools/dencoder/