getSchema’s Microdata extractor is a REST web service that extracts RDF [1] data from Microdata [2] annotations and provides the semantic information as N-Triples [5], N3 [3] and JSON [4]. The service conforms to the W3C Microdata2RDF specification [11], but its generation algorithm may differ from the one proposed by the specification.
The service is powered by node.js [9] and uses the jsdom [10] library.
Use our test form to try the service: http://getschema.org/microdataextractor-test
Get some test examples from: http://getschema.org/microdata2rdf/examples/
The service endpoint is http://getschema.org/microdataextractor
The following parameters are required:
url: the address of the HTML page to extract Microdata from.
out: the output format, one of rdf, n3 and json. Any other value is treated as invalid and the service returns an error.
If any of the parameters is missing, the service returns an error.
The service allows only GET requests. Any other request type will return an error.
Test the service using this URL: http://getschema.org/microdataextractor?url=http%3A%2F%2Fgetschema.org%2Fmicrodata2rdf%2Fexamples%2Fexample.html&out=rdf
Requests sent to the API endpoint must be HTTP GET requests, with all arguments sent as query parameters.
All arguments must be URL-encoded as per RFC 3986 [7].
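As an illustration of these request rules, the following Python sketch builds a request from the two required parameters and percent-encodes the page address so that the result matches the test URL given above. It uses only the standard library; build_request_url and build_request are hypothetical helper names, and the client-side check on out simply mirrors the three values the service accepts (the service still performs its own validation).

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

ENDPOINT = "http://getschema.org/microdataextractor"
OUTPUT_FORMATS = {"rdf", "n3", "json"}


def build_request_url(page_url: str, out: str) -> str:
    """Hypothetical helper: build the GET URL; both parameters are required."""
    if not page_url:
        raise ValueError("the url parameter is required")
    if out not in OUTPUT_FORMATS:
        raise ValueError(f"out must be one of {sorted(OUTPUT_FORMATS)}, got {out!r}")
    # quote_via=quote with urlencode's default safe="" percent-encodes
    # reserved characters such as ':' and '/' inside the page address.
    query = urlencode({"url": page_url, "out": out}, quote_via=quote)
    return f"{ENDPOINT}?{query}"


def build_request(page_url: str, out: str) -> Request:
    # The service accepts only GET, so the method is set explicitly.
    return Request(build_request_url(page_url, out), method="GET")
```

To send the request, pass the returned object to urllib.request.urlopen and read the response body.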
Consider the following HTML example; the possible service responses for it are shown below.
(See http://getschema.org/microdata2rdf/examples/example.html)
```html
<!DOCTYPE HTML>
<html>
<head>
  <title>Untitled</title>
</head>
<body>
    <h3 itemprop="name">Star Wars Episode IV: A New Hope</h3>
    <div itemprop="description">
      Set <em>a long time ago in a galaxy far, far away</em>, the film follows a group of freedom fighters known as the Rebel Alliance as they plot to destroy the powerful Death Star space station, a devastating weapon created by the evil Galactic Empire...
    </div>
    <div>
      <strong>Directed by: </strong>
      <span itemprop="director">
        <span itemid="http://en.wikipedia.org/wiki/George_Lucas" itemscope itemtype="http://schema.org/Person">
          <span itemprop="name">George Lucas</span>
        </span>
      </span>;
      <strong>Produced by: </strong>
      <span itemprop="producer">
          Gary Kurtz
        </span>
      </span>;
      <strong>Music by: </strong>
      <span itemprop="musicBy">
          John Williams
        </span>
      </span>;
      <strong>Starring: </strong>
      <span itemprop="actors">
          Mark Hamill
        </span>
      </span>,
      <span itemprop="actors">
          Harrison Ford
        </span>
      </span>,
      <span itemprop="actors">
          <span itemprop="name">Carrie Fisher</span>
        </span>
      </span>,
      <span itemprop="actors">
          Peter Cushing
        </span>
      </span>,
      <span itemprop="actors">
          Alec Guinness
        </span>
      </span>
    </div>
    <div>
      <strong>Studio: </strong>
      <span itemprop="productionCompany">
          Lucasfilm
        </span>
      </span>;
      <strong>Distributed by: </strong>
      <span itemprop="provider">
          20th Century Fox
        </span>
      </span>;
      <strong>Release date(s): </strong>
      <meta itemprop="datePublished" content="1977-05-25" /> May 25, 1977;
      <strong>Duration: </strong>
      <meta itemprop="duration" content="P2H1M" /> 121 minutes;
      <strong>Country: </strong>
      <span itemprop="contentLocation">
          United States
        </span>
      </span>;
      <strong>Language: </strong>
      <span itemprop="inLanguage">English</span>
    </div>
  </div>
</body>
</html>
```
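To make the mapping from Microdata markup to triples more concrete, here is a toy Python converter built on the standard html.parser module. It is not getSchema's algorithm (which runs on node.js and jsdom) and covers only a small subset of Microdata: itemscope, itemtype, itemid, itemprop on text elements and on meta content, and nested items whose itemprop sits on the same element as itemscope. Several choices are assumptions made for the sketch: property IRIs are formed by appending the itemprop name to the itemtype up to its last "/" (so http://schema.org/Movie yields http://schema.org/name), blank nodes get the hypothetical labels _:b0, _:b1 and so on, and the markup is assumed to be well-formed. Because of these simplifications it will not necessarily reproduce the service's output for the example page.

```python
from html.parser import HTMLParser

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def literal(text: str) -> str:
    # Quote a string value, escaping backslashes and double quotes.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


class MicrodataSketch(HTMLParser):
    """Toy Microdata-to-triples converter (assumes well-formed markup)."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.items = []   # stack of (subject, vocabulary) for open itemscopes
        self.stack = []   # one entry per open non-void element
        self.bnodes = 0

    def _prop_iri(self, vocab, name):
        # Absolute property names are kept; short names join the vocabulary.
        return f"<{name}>" if ":" in name else f"<{vocab}{name}>"

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        owner = self.items[-1] if self.items else None
        prop = a.get("itemprop")
        entry = {"item": False, "prop": prop, "buf": None, "owner": owner}
        if "itemscope" in a and tag not in VOID:
            if a.get("itemid"):
                subject = f"<{a['itemid']}>"
            else:
                subject = f"_:b{self.bnodes}"
                self.bnodes += 1
            itemtype = a.get("itemtype")
            if itemtype:
                vocab = itemtype.rsplit("/", 1)[0] + "/"
                self.triples.append((subject, RDF_TYPE, f"<{itemtype}>"))
            else:
                vocab = owner[1] if owner else ""
            if prop and owner:
                # itemprop + itemscope on one element: the new item is the value.
                self.triples.append((owner[0], self._prop_iri(owner[1], prop), subject))
            self.items.append((subject, vocab))
            entry["item"] = True
        elif prop and owner:
            if tag == "meta":
                value = literal(a.get("content") or "")
                self.triples.append((owner[0], self._prop_iri(owner[1], prop), value))
            else:
                entry["buf"] = []  # collect text until the element closes
        if tag not in VOID:
            self.stack.append(entry)

    def handle_data(self, data):
        for entry in self.stack:
            if entry["buf"] is not None:
                entry["buf"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        entry = self.stack.pop()
        if entry["item"]:
            self.items.pop()
        if entry["buf"] is not None:
            text = " ".join("".join(entry["buf"]).split())  # normalize whitespace
            subject, vocab = entry["owner"]
            self.triples.append((subject, self._prop_iri(vocab, entry["prop"]), literal(text)))


def extract(html: str):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.triples
```

Each triple is a tuple of terms in N-Triples-style notation (<iri>, _:label or "literal"), so the list can be printed one statement per line with a trailing " ." if desired.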
Set the out parameter to the desired output format.
An N-Triples [5] response is sent when the out parameter is set to rdf (out=rdf). The response uses the Content-type text/plain.
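An N-Triples body carries one statement per line, so a client can read an out=rdf response line by line. The Python sketch below is a simplified reader, not a full N-Triples parser: it assumes IRIs, blank nodes and plain literals without escape sequences, language tags or datatypes, and parse_ntriples is a hypothetical name. Note that a blank node written in angle brackets, as in the N3 example below (<_:gs0>), would be read by this pattern as an IRI.

```python
import re

# Simplified term patterns (assumption: no escapes, language tags or datatypes).
IRI = r"<[^>]*>"
BNODE = r"_:[A-Za-z0-9]+"
LITERAL = r'"[^"\\]*"'
# subject predicate object, terminated by '.'
TRIPLE = re.compile(
    rf"^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.\s*$"
)


def parse_ntriples(text: str) -> list[tuple[str, str, str]]:
    """Split a simple N-Triples document into (subject, predicate, object) tuples."""
    triples = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: unsupported or malformed statement")
        triples.append(match.groups())
    return triples
```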
An N3 [3] response is sent when the out parameter is set to n3 (out=n3). The response uses the Content-type text/n3.
```n3
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix schema: <http://schema.org/>.

rdf:type <http://schema.org/Movie>;
    schema:name "Star Wars Episode IV: A New Hope";
    schema:description "Set a long time ago in a galaxy far, far away, the film follows a group of freedom fighters known as the Rebel Alliance as they plot to destroy the powerful Death Star space station, a devastating weapon created by the evil Galactic Empire...";
    schema:director <http://en.wikipedia.org/wiki/George_Lucas>;
    schema:producer <_:gs0>;
    schema:musicBy <_:gs1>;
    schema:actors <_:gs2>;
    schema:actors <http://en.wikipedia.org/wiki/Harrison_Ford>;
    schema:actors <_:gs3>;
    schema:actors <_:gs4>;
    schema:actors <_:gs5>;
    schema:productionCompany <_:gs6>;
    schema:provider <_:gs7>;
    schema:datePublished "1977-05-25";
    schema:duration "P2H1M";
    schema:contentLocation <_:gs8>;
    schema:inLanguage "English".

rdf:type <http://schema.org/Person>;
    schema:name "George Lucas".

<_:gs0> rdf:type <http://schema.org/Person>;
    schema:name "Gary Kurtz".

<_:gs1> rdf:type <http://schema.org/Person>;
    schema:name "John Williams".

<_:gs2> rdf:type <http://schema.org/Person>;
    schema:name "Mark Hamill".

rdf:type <http://schema.org/Person>;
    schema:name "Harrison Ford".

<_:gs3> rdf:type <http://schema.org/Person>;
    schema:name "Carrie Fisher".

<_:gs4> rdf:type <http://schema.org/Person>;
    schema:name "Peter Cushing".

<_:gs5> rdf:type <http://schema.org/Person>;
    schema:name "Alec Guinness".

<_:gs6> rdf:type <http://schema.org/Organization>;
    schema:name "Lucasfilm".

<_:gs7> rdf:type <http://schema.org/Organization>;
    schema:name "20th Century Fox".

<_:gs8> rdf:type <http://schema.org/Place>.
```
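In the N3 response, names such as schema:name are shorthand for full IRIs declared by the @prefix lines at the top. A small Python sketch of that expansion, assuming prefix declarations written as in the example (one <IRI> per @prefix, terminated by a period); read_prefixes and expand are hypothetical helper names, and this is not a general N3 parser.

```python
import re

# Matches declarations such as: @prefix schema: <http://schema.org/>.
PREFIX_DECL = re.compile(r"@prefix\s+([A-Za-z][\w-]*)?:\s*<([^>]*)>\s*\.")


def read_prefixes(n3_text: str) -> dict[str, str]:
    """Collect @prefix declarations, e.g. 'schema' -> 'http://schema.org/'."""
    return {prefix: iri for prefix, iri in PREFIX_DECL.findall(n3_text)}


def expand(curie: str, prefixes: dict[str, str]) -> str:
    """Turn a prefixed name such as schema:name into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no @prefix declared for {curie!r}")
    return prefixes[prefix] + local
```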
A JSON response is sent when the out parameter is set to json (out=json). The response format follows Talis RDF-JSON [6]: it is well-formed JSON delivered with the Content-type application/json.
```json
{
    { "type" : "uri" , } ],
    { "type" : "literal" , "value" : "Star Wars Episode IV: A New Hope" } ],
    { "type" : "literal" , "value" : "Set a long time ago in a galaxy far, far away, the film follows a group of freedom fighters known as the Rebel Alliance as they plot to destroy the powerful Death Star space station, a devastating weapon created by the evil Galactic Empire..." } ],
    { "type" : "uri" , } ],
    { "type" : "bnode" , "value" : "_:gs0" } ],
    { "type" : "bnode" , "value" : "_:gs1" } ],
    { "type" : "bnode" , "value" : "_:gs2" },
    { "type" : "uri" , },
    { "type" : "bnode" , "value" : "_:gs3" },
    { "type" : "bnode" , "value" : "_:gs4" },
    { "type" : "bnode" , "value" : "_:gs5" } ],
    { "type" : "bnode" , "value" : "_:gs6" } ],
    { "type" : "bnode" , "value" : "_:gs7" } ],
    { "type" : "literal" , "value" : "1977-05-25" } ],
    { "type" : "literal" , "value" : "P2H1M" } ],
    { "type" : "bnode" , "value" : "_:gs8" } ],
    { "type" : "literal" , "value" : "English" } ]
  },
    { "type" : "uri" , } ],
    { "type" : "literal" , "value" : "George Lucas" } ]
  },
  "_:gs0" : {
    { "type" : "uri" , } ],
    { "type" : "literal" , "value" : "Gary Kurtz" } ]
  },
  "_:gs1" : {
    { "type" : "uri" , } ],
    { "type" : "literal" , "value" : "John Williams" } ]
  },
  "_:gs2" : {
    { "type" : "uri" , } ],
    { "type" : "literal" , "value" : "Mark Hamill" } ]
  },
    { "type" : "uri" , } ],
    { "type" : "literal" , "value" : "Harrison Ford" } ]
  },
  "_:gs3" : {
    { "type" : "uri" , } ],
    { "type" : "literal" , "value" : "Carrie Fisher" } ]
  },
  "_:gs4" : {
    { "type" : "uri" , } ],
    { "type" : "literal" , "value" : "Peter Cushing" } ]
  },
  "_:gs5" : {
    { "type" : "uri" , } ],
    { "type" : "literal" , "value" : "Alec Guinness" } ]
  },
  "_:gs6" : {
    { "type" : "uri" , } ],
    { "type" : "literal" , "value" : "Lucasfilm" } ]
  },
  "_:gs7" : {
    { "type" : "uri" , } ],
    { "type" : "literal" , "value" : "20th Century Fox" } ]
  },
  "_:gs8" : {
    { "type" : "uri" , } ]
  }
}
```
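An RDF-JSON document nests values under subject and predicate keys: each subject (an IRI or a "_:" blank node label) maps to an object whose keys are predicate IRIs, and each predicate maps to a list of value objects with a type (uri, bnode or literal) and a value. A minimal Python sketch that flattens such a document into triples, assuming that layout; rdf_json_to_triples is a hypothetical name, and the optional lang and datatype keys of literals are ignored here.

```python
import json


def rdf_json_to_triples(doc: dict) -> list[tuple[str, str, str]]:
    """Flatten RDF-JSON ({subject: {predicate: [value, ...]}}) into triples.

    Terms come back in N-Triples-like notation: <iri>, _:bnode, "literal".
    """
    triples = []
    for subject, predicates in doc.items():
        s = subject if subject.startswith("_:") else f"<{subject}>"
        for predicate, values in predicates.items():
            for obj in values:
                kind, value = obj["type"], obj["value"]
                if kind == "uri":
                    o = f"<{value}>"
                elif kind == "bnode":
                    o = value
                elif kind == "literal":
                    o = json.dumps(value)  # quoted, with JSON-style escaping
                else:
                    raise ValueError(f"unknown value type {kind!r}")
                triples.append((s, f"<{predicate}>", o))
    return triples
```

A client would typically call rdf_json_to_triples(json.loads(body)) on the body of an out=json response.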
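All service errors are delivered in JSON format, for example when the out value is invalid, a required parameter is missing, or the request is not a GET request.
The service currently has the following limitations:
Iframes are not loaded.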
Scripts are loaded only when the script element is annotated with an itemtype attribute whose value is http://schema.org/WebPageElement/Script.
Since the service is still in beta, triple extraction may have other limitations, such as duplicate triples.
While itemid is supported, itemref is not.
The schema:additionalType property is not processed, and multiple item types for the same itemscope are not yet supported either.
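Page authors may want to know in advance which parts of their markup fall under these limitations. The following Python sketch is a hypothetical pre-flight check, not part of the service: it scans a page with the standard html.parser module and reports iframes, script elements without the http://schema.org/WebPageElement/Script itemtype, itemref attributes, additionalType properties, and itemscopes that declare more than one item type. It only inspects attributes, so it cannot detect the duplicate-triple issue noted above.

```python
from html.parser import HTMLParser

SCRIPT_ITEMTYPE = "http://schema.org/WebPageElement/Script"


class LimitationLinter(HTMLParser):
    """Hypothetical checker: flags markup the extractor is documented to skip."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def _warn(self, message):
        line, _offset = self.getpos()  # position of the tag being handled
        self.warnings.append(f"line {line}: {message}")

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "iframe":
            self._warn("iframe content is not loaded")
        if tag == "script" and a.get("itemtype") != SCRIPT_ITEMTYPE:
            self._warn("script is not loaded (no WebPageElement/Script itemtype)")
        if "itemref" in a:
            self._warn("itemref is ignored")
        if "additionalType" in (a.get("itemprop") or "").split():
            self._warn("schema:additionalType is not processed")
        if len((a.get("itemtype") or "").split()) > 1:
            self._warn("multiple item types on one itemscope are not supported")


def lint(html: str) -> list[str]:
    linter = LimitationLinter()
    linter.feed(html)
    linter.close()
    return linter.warnings
```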
RuleTheWeb! – A Firefox Extension consuming Schema.org Annotations
This service is offered free of charge by http://binarypark.org
You must follow any policies made available to you within the Services.
We trust you will not misuse this service and that you will find it helpful. However, just in case:
Using this service does not give you ownership of any intellectual property rights related to the service or the content
you access. You may not use content from our Services unless you obtain permission from its owner or are otherwise permitted
by law. These terms do not grant you the right to use any branding or logos used in this service. Don’t remove, obscure, or
alter any legal notices displayed in or along with the service.
This service provides content that is not owned by the service provider. This content is the sole responsibility of the entity that makes it available.
The terms of use can change at any time, and the provider is not responsible for informing you of changes.
Further information can be found at http://binarypark.org.
If you would like to learn more or to contribute to this service, please contact us at mtg(at)binarypark.org.
[1] Resource Description Framework (RDF), http://www.w3.org/RDF/
[2] HTML5 Microdata, http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html
[3] Notation3 (N3): A readable RDF syntax, http://www.w3.org/TeamSubmission/n3/
[4] JavaScript Object Notation, http://json.org/
[5] RDF N-Triples Syntax, http://www.w3.org/TR/rdf-testcases/#ntriples; see also http://www.w3.org/2011/rdf-wg/wiki/N-Triples-Format
[6] RDF-JSON Specification, http://docs.api.talis.com/platform-api/output-types/rdf-json
[7] Uniform Resource Identifier (URI): Generic Syntax (RFC3986), http://www.ietf.org/rfc/rfc3986.txt
[8] Web Application Description Language (WADL), http://www.w3.org/Submission/wadl/
[9] node.js, http://nodejs.org
[10] jsdom – A JavaScript implementation of the DOM, for use with node.js, https://github.com/tmpvar/jsdom
[11] Microdata to RDF: Transformation from HTML+Microdata to RDF, W3C Interest Group Note 08 March 2012, http://www.w3.org/TR/microdata-rdf/