Learn more about the data model we use to represent pages in SamePage!
The data model we use is a variant of AtJson, a content format used by the media publisher Condé Nast. The data model is a JSON object that adheres to a specific schema, the definition of which can be found here. We'll break down each field and sub-field in the Schema.
All the bare content that a page uses is stored as a single string in the top-level content field. By bare, we mean the characters that the user sees after the host application has rendered the data to the screen.
For example, let's say our application is a standard Markdown editor and the user types in the following data:
Hello **World**
The text that a user would see on the screen would be "Hello World". So our content value here is "Hello World".
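Annotations are the core idea of our data schema and AtJson. Our schema says that all page data is composed of the content string and a list of annotations, each of which applies to a range of that content.
Each annotation has the following fields: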
start – the index within content where this annotation starts.
end – the index within content where this annotation ends.
type – the type of annotation it is. We go through all of the different annotation types below.
attributes – the set of properties associated with this annotation. These are universal cross-app properties, so they are expected to be implemented by each app's extension. The schema of attributes is predefined depending on the type of the annotation.
appAttributes – extensions can use this field to store attributes about a given annotation that their app cares about and no other app does. It's an object that maps the app's identifier to a key-value object.
Let's use our example to go through each one.
Remember, our markdown is Hello **World**, which resolves to a content field of Hello World.
Since our bolding surrounds the word World, we want our start and end values to use the indices of the content field. These indices are zero-based, and end points just past the last annotated character:
{ "start": 6, "end": 11}
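As a rough illustration of where these numbers come from, the sketch below extracts bold runs from a Markdown string and records their positions in the resulting content. The `parseBold` helper and the `Span` type are hypothetical names invented for this example and are not part of the SamePage packages; it only handles the `**` delimiter.

```typescript
// Hypothetical helper, not part of SamePage: finds **bold** runs in Markdown
// and reports their zero-based positions within the rendered content.
type Span = { start: number; end: number };

function parseBold(markdown: string): { content: string; spans: Span[] } {
  const spans: Span[] = [];
  const pattern = /\*\*(.+?)\*\*/g;
  let content = "";
  let cursor = 0; // position in the Markdown source consumed so far
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    // Copy the plain text that precedes this bold run.
    content += markdown.slice(cursor, match.index);
    const start = content.length;
    content += match[1]; // the bolded text, without its delimiters
    spans.push({ start, end: content.length });
    cursor = match.index + match[0].length;
  }
  content += markdown.slice(cursor);
  return { content, spans };
}

const parsed = parseBold("Hello **World**");
console.log(parsed.content); // Hello World
console.log(parsed.spans); // [ { start: 6, end: 11 } ]
console.log(parsed.content.slice(6, 11)); // World
```

Because end is exclusive, `content.slice(start, end)` returns exactly the annotated text.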
One important rule to keep in mind is that annotations are not allowed to be zero length, meaning the start value cannot equal the end value. The reason for this has to do with ambiguity: if you have one annotation at 2,2 and another at 2,3, it's unclear whether the first annotation would be inside or before the second one. Also, conceptually speaking, if an annotation has length zero, then it shouldn't be on the page.
The second important rule to note is that order matters. If two consecutive annotations have the same position, then the earlier one always surrounds the later one.
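The two rules can be sketched as small checks. This is only an illustration: `zeroLengthIndices` and `surrounds` are hypothetical helpers, and the `Annotation` type here is a trimmed-down assumption that keeps only the fields the rules need.

```typescript
// Hypothetical minimal annotation shape; only the fields the rules need.
type Annotation = { start: number; end: number; type: string };

// Rule 1: returns the indices of annotations whose start equals their end.
function zeroLengthIndices(annotations: Annotation[]): number[] {
  const bad: number[] = [];
  annotations.forEach((annotation, index) => {
    if (annotation.start === annotation.end) bad.push(index);
  });
  return bad;
}

// Rule 2: true when annotations[outer] surrounds annotations[inner].
// When both cover exactly the same range, the earlier one in the list wins.
function surrounds(annotations: Annotation[], outer: number, inner: number): boolean {
  const a = annotations[outer];
  const b = annotations[inner];
  if (a.start === b.start && a.end === b.end) return outer < inner;
  return a.start <= b.start && b.end <= a.end;
}

const annotations: Annotation[] = [
  { start: 6, end: 11, type: "bold" },
  { start: 6, end: 11, type: "italics" },
  { start: 2, end: 2, type: "highlighting" }, // invalid: zero length
];
console.log(zeroLengthIndices(annotations)); // [ 2 ]
console.log(surrounds(annotations, 0, 1)); // true
console.log(surrounds(annotations, 1, 0)); // false
```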
Next, our type field should mark what kind of annotation this is. It must be one of the types described below.
Continuing our example with the bold type:
{ "start": 6, "end": 11, "type": "bold", "attributes": { "delimiter": "**" }}
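To show how an extension might turn such an annotation back into Markdown, here is a minimal sketch. The `renderBold` function is a hypothetical helper, and falling back to `**` when no delimiter is given is an assumption made for this example rather than documented behavior.

```typescript
// Shape of the bold annotation from the example above.
type BoldAnnotation = {
  start: number;
  end: number;
  type: "bold";
  attributes?: { delimiter?: string };
};

// Hypothetical helper: wraps the annotated range in the annotation's delimiter.
function renderBold(content: string, annotation: BoldAnnotation): string {
  // Assumption: default to "**" when the annotation carries no delimiter.
  const delimiter = annotation.attributes?.delimiter ?? "**";
  return (
    content.slice(0, annotation.start) +
    delimiter +
    content.slice(annotation.start, annotation.end) +
    delimiter +
    content.slice(annotation.end)
  );
}

const bold: BoldAnnotation = {
  start: 6,
  end: 11,
  type: "bold",
  attributes: { delimiter: "**" },
};
console.log(renderBold("Hello World", bold)); // Hello **World**
```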
Here are the types and the attributes each one supports.
A block refers to a unit of content that can be organized hierarchically and contains various types of information, such as text, images, or media. It serves as a fundamental building block for structuring and managing notes efficiently.
Supported attributes:
"level": number,
"viewType": "bullet" | "numbered" | "document"
The bold type typically refers to adding font weight to the content it's annotating.
Supported attributes:
"open": boolean,
"delimeter": string
The italics type is used to emphasize certain words or phrases in a document.
Supported attributes:
"open": boolean,
"delimeter": string
The strikethrough type is used to indicate that a certain portion of text should be considered deleted or no longer valid, while still keeping it visible for reference.
Supported attributes:
"open": boolean,
"delimeter": string
The highlighting type refers to marking or colorizing specific portions of text to draw attention to them.
Supported attributes:
"open": boolean,
"delimeter": string
The inline type refers to a formatting style where specific elements, such as keywords or variables, are embedded within the surrounding text and distinguished by different typography or highlighting.
Supported attributes:
"open": boolean,
"delimeter": string
The code type refers to the practice of wrapping specific text in backticks (`) or other formatting to indicate that it represents code or commands.
Supported attributes:
"language": string,
"ticks": number
A link type is usually a reference or connection between the current document and an external resource.
Supported attributes:
"href": string
An image is a visual representation or graphic element that is embedded within a document.
Supported attributes:
"src": string
The custom annotation is an escape hatch for extensions to implement whichever other annotation would be useful for their app but isn't universally supported. This, combined with the appAttributes field below, should be ignored by extensions as updates are made.
Supported attributes:
"name": string
name – The name assigned to this custom annotation, so that apps can handle it accordingly.
The metadata type refers to descriptive information that provides additional context, attributes, or properties about a particular document, file, or piece of content.
Supported attributes:
"title": string,
"parent": string
A reference type is a SamePage connection that points internally to an app on our network.
Supported attributes:
"notebookPageId": string,
"notebookUuid": string
The appAttributes field is an escape hatch for applications to enter data that only pertains to their own application and no others on the network. Using our previous example, let's say our app has a special bolding syntax where Hello &&World&& also bolds the word World. For full data integrity, we need to know, when we see the bold annotation, whether to deserialize it as a pair of asterisks (**) or a pair of ampersands (&&). We could use appAttributes to denote this:
{ "start": 6, "end": 11, "type": "bold", "appAttributes": { "specialapp": { "kind": "&" } }}
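A minimal sketch of how an extension might read this field when writing the annotation back out. The `boldDelimiter` and `renderForApp` helpers are hypothetical, and so is the mapping they use: a `kind` of `"&"` under the app's own identifier selects `&&`, and anything else falls back to `**`.

```typescript
type Annotation = {
  start: number;
  end: number;
  type: string;
  attributes?: Record<string, unknown>;
  // Assumption: maps an app identifier to that app's own key-value pairs.
  appAttributes?: Record<string, Record<string, string>>;
};

// Hypothetical helper: picks the delimiter a given app should use for bold.
function boldDelimiter(annotation: Annotation, appId: string): string {
  const own = annotation.appAttributes ? annotation.appAttributes[appId] : undefined;
  // Only the app that wrote the attribute interprets it; others use "**".
  return own && own.kind === "&" ? "&&" : "**";
}

// Hypothetical helper: wraps the annotated range in the chosen delimiter.
function renderForApp(content: string, annotation: Annotation, appId: string): string {
  const delimiter = boldDelimiter(annotation, appId);
  return (
    content.slice(0, annotation.start) +
    delimiter +
    content.slice(annotation.start, annotation.end) +
    delimiter +
    content.slice(annotation.end)
  );
}

const bold: Annotation = {
  start: 6,
  end: 11,
  type: "bold",
  appAttributes: { specialapp: { kind: "&" } },
};
console.log(renderForApp("Hello World", bold, "specialapp")); // Hello &&World&&
console.log(renderForApp("Hello World", bold, "otherapp")); // Hello **World**
```

Keying the data by app identifier means other apps can carry the annotation along without needing to understand it.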
The contentType field is the last field used in the data schema. It helps identify which version of the schema the given page is using. The following is the latest supported value at the moment:
`application/vnd.atjson+samepage; version=2022-12-05`;
The version at the end is subject to change as we iterate on the schema. This makes it possible for extensions to detect data that uses an older schema and migrate it accordingly.
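As one possible way to detect an older schema, the sketch below pulls the version out of the value and compares it with the latest one. `parseVersion` and `needsMigration` are hypothetical helpers, and the sketch assumes every version is a date in YYYY-MM-DD form, as in the value shown above. The older version used in the usage lines is made up for illustration.

```typescript
const LATEST_VERSION = "2022-12-05";

// Hypothetical helper: returns the version string, or null if the value
// does not look like a SamePage content type.
function parseVersion(contentType: string): string | null {
  const match = /^application\/vnd\.atjson\+samepage; version=(\d{4}-\d{2}-\d{2})$/.exec(
    contentType
  );
  return match ? match[1] : null;
}

// Hypothetical helper: true when the data was written with an older schema.
function needsMigration(contentType: string): boolean {
  const version = parseVersion(contentType);
  if (version === null) {
    throw new Error("Unrecognized content type: " + contentType);
  }
  // Dates in YYYY-MM-DD form sort chronologically when compared as strings.
  return version < LATEST_VERSION;
}

const latest = "application/vnd.atjson+samepage; version=2022-12-05";
const older = "application/vnd.atjson+samepage; version=2022-08-17"; // made-up older version
console.log(parseVersion(latest)); // 2022-12-05
console.log(needsMigration(latest)); // false
console.log(needsMigration(older)); // true
```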
Putting our Hello &&World&& example together, we get the following final data representation:
{ "content": "Hello World", "annotations": [ { "start": 6, "end": 11, "type": "bold", "appAttributes": { "specialapp": { "kind": "&" } } } ], "contentType": "application/vnd.atjson+samepage; version=2022-12-05"}
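The shape of this final representation can be sketched as a TypeScript type with a small runtime check. The `Page` and `Annotation` types and the `isPage` guard are assumptions written for this example and are not the definitions shipped with SamePage. In particular, treating appAttributes values as string-to-string maps is a guess.

```typescript
type Annotation = {
  start: number;
  end: number;
  type: string;
  attributes?: Record<string, unknown>;
  appAttributes?: Record<string, Record<string, string>>;
};

type Page = {
  content: string;
  annotations: Annotation[];
  contentType: string;
};

// Hypothetical guard: checks only the required fields of each level.
function isPage(value: unknown): value is Page {
  if (typeof value !== "object" || value === null) return false;
  const page = value as Record<string, unknown>;
  return (
    typeof page.content === "string" &&
    typeof page.contentType === "string" &&
    Array.isArray(page.annotations) &&
    page.annotations.every(
      (a) =>
        typeof a === "object" &&
        a !== null &&
        typeof a.start === "number" &&
        typeof a.end === "number" &&
        typeof a.type === "string"
    )
  );
}

const raw =
  '{ "content": "Hello World", "annotations": [ { "start": 6, "end": 11, "type": "bold", ' +
  '"appAttributes": { "specialapp": { "kind": "&" } } } ], ' +
  '"contentType": "application/vnd.atjson+samepage; version=2022-12-05"}';
const parsed: unknown = JSON.parse(raw);
console.log(isPage(parsed)); // true
console.log(isPage({ content: "Hello World" })); // false
```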
There are a few important types to become familiar with:
SamePageSchema – An intermediary representation used by extensions to actually calculate and apply the related data. This type only contains the content and annotations fields.
LatestSchema – This is the latest version of the schema that is actually stored in IPFS. The data is wrapped by Automerge utilities to assist in conflict resolution and history management.
V*Schema – Previous versions of LatestSchema that can be found stored in IPFS.
Schema – The union of LatestSchema and all V*Schemas. This data type represents all of the possibilities stored in IPFS. The unwrapSchema utility helps convert this data into the SamePageSchema intermediate data type.