Note: The scraping schema is optional.
If left empty, the system will automatically analyze the paper listing page and infer the appropriate schema.
If you want to provide a custom schema, it should be a JSON object with the following structure:
{
  "name": "Schema Description",
  "baseSelector": ".container-class",
  "fields": [
    {
      "name": "title",
      "type": "text",
      "selector": ".title-class"
    },
    {
      "name": "authors",
      "type": "list",
      "selector": ".author-link",
      "fields": [
        {
          "name": "author_name",
          "type": "text"
        }
      ]
    },
    {
      "name": "paper_url",
      "type": "attribute",
      "selector": "a.paper-link",
      "attribute": "href"
    }
  ]
}
name
- Description of the schema
baseSelector
- CSS selector for the container of each paper item
fields
- Array of field definitions:
  - type: "text" - Extract the text content of the element
  - type: "list" - Extract multiple items (such as authors)
  - type: "attribute" - Extract an attribute value (requires the "attribute" field)
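
The rules above can be checked before a custom schema is submitted. The following is a minimal sketch in Python, not part of the system itself: the names validate_schema, validate_field, ALLOWED_TYPES, and EXAMPLE_SCHEMA are hypothetical, and the checks cover only what this note states (the three top-level keys, the three field types, and the "attribute" requirement). The real system may accept or require more.

```python
import json

# The three field types described in this note (assumed to be the full set).
ALLOWED_TYPES = {"text", "list", "attribute"}

# The example schema from this note, parsed from JSON.
EXAMPLE_SCHEMA = json.loads("""
{
  "name": "Schema Description",
  "baseSelector": ".container-class",
  "fields": [
    {"name": "title", "type": "text", "selector": ".title-class"},
    {"name": "authors", "type": "list", "selector": ".author-link",
     "fields": [{"name": "author_name", "type": "text"}]},
    {"name": "paper_url", "type": "attribute",
     "selector": "a.paper-link", "attribute": "href"}
  ]
}
""")


def validate_field(field, path):
    """Return a list of problems found in one field definition."""
    if not isinstance(field, dict):
        return [f"{path}: field must be a JSON object"]
    problems = []
    name = field.get("name")
    if not isinstance(name, str) or not name:
        problems.append(f"{path}: missing 'name'")
    ftype = field.get("type")
    if not isinstance(ftype, str) or ftype not in ALLOWED_TYPES:
        problems.append(f"{path}: unknown type {ftype!r}")
    # An "attribute" field must say which attribute to read.
    if ftype == "attribute" and not field.get("attribute"):
        problems.append(f"{path}: type 'attribute' requires an 'attribute' key")
    # A "list" field may carry nested field definitions; check them recursively.
    if ftype == "list":
        nested = field.get("fields", [])
        if not isinstance(nested, list):
            problems.append(f"{path}: 'fields' must be an array")
        else:
            for i, sub in enumerate(nested):
                problems.extend(validate_field(sub, f"{path}.fields[{i}]"))
    return problems


def validate_schema(schema):
    """Return a list of problems; an empty list means the schema looks valid."""
    if not isinstance(schema, dict):
        return ["schema must be a JSON object"]
    problems = []
    for key in ("name", "baseSelector"):
        value = schema.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"schema: missing '{key}'")
    fields = schema.get("fields")
    if not isinstance(fields, list) or not fields:
        problems.append("schema: 'fields' must be a non-empty array")
        return problems
    for i, field in enumerate(fields):
        problems.extend(validate_field(field, f"fields[{i}]"))
    return problems


if __name__ == "__main__":
    for problem in validate_schema(EXAMPLE_SCHEMA):
        print(problem)
```

The sketch does not require "selector" on every field, because the nested "author_name" field in the example has none. Whether the system treats a missing selector as an error is not stated here.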