A dataset is a file, or group of files, containing JSON documents, one per line.
Note |
---|
Each line of the file is a JSON document, but the file itself is a plain text file that is read line by line. It is NOT a JSON array. |
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
{"id":"A0001","title":"This is a title","content":"Some text."}
{"id":"A0002","title":"This is a title","content":"Some text, more text after a comma.","non-import-text":"This will not be processed"}
{"id":"A0003","content":"Some text.\nMore text in a new line."}
{"id":"NON-IMPORTANT-ID","content":"Only the field configured on the metadata file will be used"}
{"id":"AAAA","content":"Some text for a dummy ID"}
{"id":"AAAA","content":"JSON documents"}
{"id":"AAAA","content":"JSON documents"}
{"id":"AAAA","content":"^^^^^^^^^^^^^^^JSON documents do not need to be unique"} |
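The one-document-per-line format can be read with a few lines of Python. The following is a minimal sketch, not the tool's actual reader: the function name `read_dataset` is hypothetical, and skipping blank lines and rejecting non-object lines are assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator


def read_dataset(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON document per non-empty line (hypothetical helper)."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        try:
            document = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
        if not isinstance(document, dict):
            # The file is a sequence of JSON objects, not a JSON array.
            raise ValueError(f"line {line_number} is not a JSON object")
        yield document


# Usage with in-memory lines; an open text file can be passed the same way.
sample_lines = [
    '{"id":"A0001","title":"This is a title","content":"Some text."}',
    '{"id":"AAAA","content":"JSON documents"}',
    '{"id":"AAAA","content":"JSON documents"}',
]
documents = list(read_dataset(sample_lines))
```

Because each line is parsed on its own, duplicate documents such as the two `AAAA` entries are read without any special handling.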
Every dataset must have a ".metadata" file that defines the information needed to process it.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
{ "processFields" : ["title","content"], "splitRegex" : "\n" } |
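To make the two metadata keys concrete, here is a minimal sketch of how they might be applied to a single document. It assumes that `processFields` lists the fields whose text is used and that `splitRegex` splits that text into segments; the source does not spell out the exact processing, and the function name `extract_segments` is hypothetical.

```python
import json
import re
from typing import List


def extract_segments(document: dict, metadata: dict) -> List[str]:
    """Collect text from the configured fields and split it (assumed behavior)."""
    pattern = re.compile(metadata["splitRegex"])
    segments: List[str] = []
    for field in metadata["processFields"]:
        value = document.get(field)
        if not isinstance(value, str):
            continue  # assumption: missing or non-text fields are skipped
        # Drop empty pieces produced by consecutive separators.
        segments.extend(part for part in pattern.split(value) if part)
    return segments


# The metadata from the example above; "\\n" here is the JSON escape for a newline.
metadata = json.loads('{"processFields": ["title", "content"], "splitRegex": "\\n"}')
document = {"id": "A0003", "content": "Some text.\nMore text in a new line."}
print(extract_segments(document, metadata))
# prints ['Some text.', 'More text in a new line.']
```

Fields that are not listed in `processFields`, such as `id` or `non-import-text`, never reach the output in this sketch.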
Creating a new dataset is straightforward: you only need to follow the format explained above. These are some recommendations and standards that would be nice to follow.