AwareNet Sync
From SiyaWiki
awareNet installations are arranged into a peer-to-peer network to co-operatively distribute content and to broadcast notifications of events between servers. The aim is to minimize use of external bandwidth and to maintain as much functionality as possible when an external internet connection is not available. awareNet users connect to a local awareNet installation on their own school network and can interact with users and content on other peers through their local server.
Communication between awareNet nodes is carried out by the exchange of XML documents to and from the /sync/ module. Authentication of requests is performed using custom HTTP headers.
To keep bandwidth use steady, content is not sent or received immediately but is stored in queues, with processes forked off to dispatch items three at a time until the queues are empty. These queues are:
- sync: records information to be broadcast to individual peers
- downloads: lists files to be downloaded from the network
Topology
awareNet nodes are arranged in a tree structure, with individual nodes joined in trust relationships to either upstream (closer to the hub) or downstream (further from the hub) peers. The Hub is the node with no upstream peers, and the central node of the network. Nodes may have as many downstream peers as needed and a maximum of one upstream peer.
In a recent addition to the protocol it is assumed that all awareNet servers can see all others on the network. This is required only for the chat feature of awareNet - all other data is passed only between immediate neighbours. Note that awareNet users may log in to any node on the network, and may be logged in to multiple nodes simultaneously.
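The tree arrangement described above can be modelled in a few lines. This is only an illustrative sketch: the `Node` class and its method names are hypothetical and do not come from the awareNet source.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """One awareNet installation in the tree (hypothetical model)."""
    url: str
    upstream: Optional["Node"] = None
    downstream: List["Node"] = field(default_factory=list)

    def add_downstream(self, child: "Node") -> None:
        # A node may have a maximum of one upstream peer.
        if child.upstream is not None:
            raise ValueError(f"{child.url} already has an upstream peer")
        child.upstream = self
        self.downstream.append(child)

    def is_hub(self) -> bool:
        # The hub is the node with no upstream peer.
        return self.upstream is None

    def peers(self) -> List["Node"]:
        # Immediate neighbours: the upstream peer (if any) plus all downstream peers.
        up = [self.upstream] if self.upstream is not None else []
        return up + self.downstream
```

With this model, a school node attached to the hub has exactly one immediate peer (the hub), while the hub's peers are all of its downstream nodes.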
Sync Queue (outgoing)
The sync queue holds information which is to be broadcast to peers - either generated in response to an event on the local server or received from another awareNet node. The queue of sync items (/modules/sync/models/sync.mod.php) is stored in the 'sync' database table and has the following structure:
| Field | Data Type | Comment |
|---|---|---|
| UID | varchar(30) | |
| source | varchar(30) | 'self' or the address of the peer this information was received from |
| type | varchar(50) | what type of communication this is |
| data | text | an XML document |
| peer | varchar(30) | to which this information is addressed |
| status | varchar(30) | may be 'new', 'failed', 'locked' or 'wait' |
| received | varchar(30) | when this information was generated or received from peer |
| timestamp | varchar(20) | when this was created or time of last attempt to send |
Type may be any of the following:
- dbUpdate (/sync/recvsql/): holds an updated copy of a record
- dbDelete (/sync/recvsqldelete/): holds details of a record being deleted
- fileCreate (/sync/recvfilecreate/): not used at present (live sync of all files)
- fileDelete (/sync/recvfiledelete/): not used at present (live sync of all files)
- notification (/sync/recvnotice/): a page notification
When a new sync item is created it is added to the queue and if the queue is not busy (no more than three processes are currently working at sending items from this queue) a new process is created to call /sync/send/%%UID%% on the local installation. This process will then attempt to POST the XML document (sync.data) to the appropriate URL on the peer specified in sync.peer, along with HTTP headers to authenticate.
If the /send/ succeeds (peer returns <ok/>) then the item is deleted from the queue. If not, the status of the sync item is set to 'failed' and it will be retried in about 5 minutes or whenever the sync queue is quiet enough.
On receiving one of these sync items from the network a new copy is made for all peers except the one it came from and added to the sync queue - ie, information is never sent back to the peer it was received from.
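The queue behaviour described above (one copy per peer except the source, deletion on `<ok/>`, retry after failure) might be sketched as follows. The function names, the in-memory dictionary standing in for the 'sync' table, the UID scheme and the exact retry interval are all assumptions made for illustration.

```python
import uuid
from typing import Dict, List

RETRY_AFTER_SECONDS = 5 * 60  # "about 5 minutes" in the text; exact value is an assumption


def fan_out(item_type: str, data: str, source: str,
            peers: List[str], now: int) -> List[Dict[str, str]]:
    """Create one queue row per peer, skipping the peer the item came from."""
    rows = []
    for peer in peers:
        if peer == source:
            continue  # information is never sent back to the peer it came from
        rows.append({
            "UID": uuid.uuid4().hex[:30],  # placeholder UID scheme (assumption)
            "source": source,              # 'self' or the sending peer's address
            "type": item_type,
            "data": data,
            "peer": peer,
            "status": "new",
            "received": str(now),
            "timestamp": str(now),
        })
    return rows


def record_send_result(queue: Dict[str, dict], uid: str,
                       response_body: str, now: int) -> None:
    """Delete the item if the peer answered <ok/>, otherwise mark it failed."""
    if response_body.strip() == "<ok/>":
        del queue[uid]
    else:
        queue[uid]["status"] = "failed"
        queue[uid]["timestamp"] = str(now)  # time of last attempt to send


def due_for_retry(item: dict, now: int) -> bool:
    """A failed item is eligible again once the retry interval has passed."""
    return (item["status"] == "failed"
            and now - int(item["timestamp"]) >= RETRY_AFTER_SECONDS)
```

An item generated locally would be passed in with `source="self"`, so it is queued for every peer.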
Authentication/Security
Every node has a password, which should be known only to itself and its immediate peers. When making HTTP requests to peers this password is used in the construction of a proof to validate the request. The following, non-standard headers are sent with HTTP requests to peers in order to authenticate the request.
Sync-Timestamp: 1258028147
Sync-Proof: 58ba56ba77289baba3aa8ca6a7ed911254f161d5
Sync-Source: http://somepeer/path/to/awarenet/
At present the proof is the SHA1 hash of the server's password and the timestamp. The proof is re-calculated on receipt of the request, and the request is ignored if the two do not match or if the timestamp is too old. Future versions of this scheme will use the entire message, including any data sent, to better prevent MITM and replay attacks, though using the full request caused problems in the test system that are yet to be worked out. The purpose of this scheme is to prevent tampered data from being stored in the database; it does not prevent data being copied or logged by an adversary. HTTPS should be used where practical to prevent sniffing.
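A minimal sketch of how such a proof could be built and checked is given below. The text does not say how the password and timestamp are combined or how old a timestamp may be, so the concatenation order and `MAX_AGE_SECONDS` are assumptions, as are the function names.

```python
import hashlib
import hmac
from typing import Dict

MAX_AGE_SECONDS = 300  # hypothetical tolerance; the real limit is not documented here


def make_proof(password: str, timestamp: int) -> str:
    # Assumption: the password and timestamp are simply concatenated before hashing.
    return hashlib.sha1(f"{password}{timestamp}".encode("utf-8")).hexdigest()


def build_headers(password: str, source_url: str, now: int) -> Dict[str, str]:
    """Headers a node would attach to a request to one of its peers."""
    return {
        "Sync-Timestamp": str(now),
        "Sync-Proof": make_proof(password, now),
        "Sync-Source": source_url,
    }


def verify(headers: Dict[str, str], password: str, now: int) -> bool:
    """Re-calculate the proof and reject mismatches or stale timestamps."""
    try:
        timestamp = int(headers["Sync-Timestamp"])
    except (KeyError, ValueError):
        return False
    if now - timestamp > MAX_AGE_SECONDS:
        return False  # timestamp too old
    expected = make_proof(password, timestamp)
    received = headers.get("Sync-Proof", "")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Because only the password and timestamp are hashed, a captured set of headers could be reused with different data until the timestamp expires, which is the weakness the text says a future version will address.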
Database
awareNet peers attempt to maintain a single, identical copy of synced tables on all members of the network. That is, database records are serializations of objects which should ideally be in the same state on all peers. Not all tables are synced, since some data apply only to the local awareNet installation. Those tables which are synced must have the following fields:
| Field | Data Type | Comment |
|---|---|---|
| UID | varchar(30) | Globally unique identifier |
| editedOn | datetime | Date and time the record was last changed |
| editedBy | varchar(30) | UID of user who last edited the record |
Creating and Updating Objects
When a record is created (dbCreate) or saved (dbSave) through the database wrapper (/core/mysql.inc.php) a copy of the record is passed to the sync module (/core/sync.inc.php). The sync module determines whether peers need to know about the new or changed record and if so dbUpdate items are created in the sync queue for all peers. On receiving the notice at their /sync/recvsql/ interface, peers create or update the record in their copy of the database and create sync items for their own peers.
Updates are passed as an XML document specifying the table and record to be updated. Record data is base64 encoded since the record may itself contain XML. For example, an update to an image record (some fields removed):
<?xml version="1.0" encoding="UTF-8" ?>
<update>
<table>images</table>
<fields>
<UID>MTYzMjk4NDA5MTIwOTM1NDU4</UID>
<refUID>MTA5MTg4NDg0MTAzNjI5Mzk4</refUID>
<refModule>bW9ibG9n</refModule>
<title>NDAwcHgtYXJhX21hY2FvXy1fdHdvX2F0X2xvd3J5X3Bhcmtfem9v</title>
<licence>dW5rbm93bg==</licence>
<attribName></attribName>
<attribURL>..truncated..</attribURL>
<fileName>ZGF0YS9pbWFnZXMvMS82LzMvMTYzMjk4NDA5MTIwOTM1NDU4LmpwZw==</fileName>
<format>anBn</format>
<weight>Mg==</weight>
<createdOn>MjAwOS0xMC0wNyAxNDo1NTozMg==</createdOn>
<createdBy>YWRtaW4=</createdBy>
<hitcount></hitcount>
<editedOn>MjAwOS0xMS0xMiAxNDo1NTo1NA==</editedOn>
<editedBy>YWRtaW4=</editedBy>
<recordAlias>NDAwcHgtYXJhLW1hY2FvLXR3by1hdC1sb3dyeS1wYXJrLXpvby5qcGc=</recordAlias>
</fields>
</update>
Note that the hitcount field is empty. Some fields apply only to the local node and changes to them are not synced.
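The encoding used in the example above can be reproduced with a short sketch. The function names and the `LOCAL_ONLY_FIELDS` set are hypothetical; only `hitcount` is known from the example to be sent empty.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

LOCAL_ONLY_FIELDS = {"hitcount"}  # assumption, based on the example above


def build_update(table: str, record: Dict[str, str]) -> str:
    """Serialize a record as an <update> document with base64 encoded values."""
    root = ET.Element("update")
    ET.SubElement(root, "table").text = table
    fields = ET.SubElement(root, "fields")
    for name, value in record.items():
        element = ET.SubElement(fields, name)
        # Local-only and empty fields are sent as empty elements.
        if name not in LOCAL_ONLY_FIELDS and value != "":
            element.text = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def parse_update(xml_text: str) -> Tuple[str, Dict[str, str]]:
    """Decode an <update> document back into a table name and record."""
    root = ET.fromstring(xml_text)
    record = {}
    for element in root.find("fields"):
        record[element.tag] = base64.b64decode(element.text or "").decode("utf-8")
    return root.findtext("table"), record
```

For instance, the value `YWRtaW4=` in the createdBy field above decodes to `admin`, and `anBn` in the format field decodes to `jpg`.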
Deleting Objects
When a record is deleted (dbDelete in /core/mysql.inc.php) from a synced table in the database, notice of the deletion is passed to peers as an HTTP POST to /sync/recvsqldelete/. Peers in turn will delete the record from their copy of the database and pass on the notification to their own peers, spreading the deletion across the network.
In addition, awareNet nodes keep a list of recently deleted items to prevent them from being recreated by peers which may not have received the deletion notice. This could occur if a peer was offline for the deletion but came back online before the next pulse, reporting a record which other peers don't have, or if an item was deleted and updated on different nodes in a short period of time. The object involved is /modules/sync/deleted.mod.php and stores itself in the delitems table:
| Field | Data Type | Comment |
|---|---|---|
| UID | varchar(30) | Globally unique identifier |
| refTable | varchar(50) | Table from which the item was deleted |
| refUID | varchar(30) | UID of the deleted item |
| timestamp | varchar(20) | when the record was deleted from the local database |
A deletion notification is an XML document with the following format:
<?xml version="1.0" encoding="UTF-8" ?>
<deletion>
<table>table name</table>
<uid>UID of deleted item</uid>
</deletion>
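How a receiving node might handle such a document, and use its list of recent deletions to refuse stale updates, is sketched below. The nested dictionary standing in for the database, the `delitems` mapping and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Database = Dict[str, Dict[str, dict]]   # table name -> UID -> record (stand-in)
DelItems = Dict[Tuple[str, str], int]   # (refTable, refUID) -> deletion timestamp


def parse_deletion(xml_text: str) -> Tuple[str, str]:
    """Extract the table name and UID from a <deletion> document."""
    root = ET.fromstring(xml_text)
    return root.findtext("table"), root.findtext("uid")


def apply_deletion(db: Database, delitems: DelItems, xml_text: str, now: int) -> None:
    """Remove the record locally and remember that it was deleted."""
    table, uid = parse_deletion(xml_text)
    db.get(table, {}).pop(uid, None)  # no error if we never had the record
    delitems[(table, uid)] = now


def accept_update(delitems: DelItems, table: str, uid: str) -> bool:
    """Refuse to recreate a record that was recently deleted."""
    return (table, uid) not in delitems
```

A real node would also forward the deletion to its other peers and eventually expire old delitems entries; neither step is shown here.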
Notifications
Notifications are ephemeral data passed between awareNet peers and from peers to their clients to signal events to which an HTML page may need to respond. For example, if a user adds a comment to a blog post, a page being viewed by a user on another node may need to update itself with the new comment. Notifications are passed between nodes and to clients on a best effort basis. They are not stored in the database unless a client is subscribed to the channel on which the notification is broadcast. Notifications are separate from the records they refer to. When a record is created a dbUpdate is broadcast before any notifications about that update. Not all notifications are broadcast to the network: some channels are not useful to other peers or may not be broadcast for reasons of privacy.
Chat
Chat messages are a special class of notification in that they are not broadcast to the entire network, but only to nodes where the recipient (a user) is logged in. Because such a node may not be an immediate peer, its password may not be known to the sender; the userlogin UID is used instead as a token to validate the chat message (user logins are noted in the database and synced across the whole network with a common UID).
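The routing and validation rule for chat could be sketched like this. The shape of the login records (login UID mapped to a user UID and node address) and both function names are assumptions; the real userlogin table is not described here.

```python
from typing import Dict, List, Tuple

# Hypothetical view of the synced login records: login UID -> (user UID, node URL)
Logins = Dict[str, Tuple[str, str]]


def chat_targets(logins: Logins, recipient: str) -> List[Tuple[str, str]]:
    """Return (node URL, login UID) for every node where the recipient is logged in."""
    return sorted(
        (node, login_uid)
        for login_uid, (user, node) in logins.items()
        if user == recipient
    )


def token_is_valid(logins: Logins, token: str, recipient: str) -> bool:
    """A chat message is accepted only if its token is a known login of the recipient."""
    entry = logins.get(token)
    return entry is not None and entry[0] == recipient
```

Since a user may be logged in to several nodes at once, `chat_targets` can return more than one destination for a single message.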
Files
User generated content such as images - distinct from files distributed with awareNet, as in the theme - is downloaded from peers only when requested by users on the local server. Images uploaded to awareNet are referenced as transforms - versions of the image at different sizes and cropped to different aspect ratios - and only the transforms requested by clients are downloaded to the local server. For example, a user on a remote server might upload a 1024x800px image attached to a blog post, but the transform of the image used in the content of the blog post may be only 570x445px. When a user on the local server views the blog post only the smaller version of the image is downloaded.
Files are not downloaded immediately (e.g. the first time the blog post is viewed on the local server), but are added to a download queue and a placeholder image is shown to users until the file is successfully copied to the local server. The items in the download queue are instances of /modules/sync/models/download.mod.php and are stored in the downloads table:
| Field | Data Type | Comment |
|---|---|---|
| UID | varchar(30) | Unique ID of this queue item |
| filename | varchar(255) | relative to installPath |
| hash | varchar(255) | hash of file as reported by a peer which has it |
| status | varchar(20) | 'searching' or 'wait' |
| timestamp | varchar(20) | time of last change to this item |
Usually a maximum of three worker threads will go through the download queue and search for files by querying peers at /sync/hasfile/. The peer may respond with an error message (not found) or with the SHA1 hash of the file in question. The worker thread will then download the file from the first peer it finds it on and check the downloaded file against the hash. If the hash matches then the file is saved to disk and the item removed from the download queue. If the file is corrupt or not found the download will be retried once the queue is no longer busy, at intervals of at least five minutes.
Querying a peer (/sync/hasfile/) which does not have the file in question will cause it to add the file to its own download queue, causing files to be pulled across the network as needed.
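The search-then-verify step performed by a download worker might look like the sketch below. The two callables stand in for the HTTP requests to a peer (the /sync/hasfile/ query and the file transfer itself); their names and signatures are assumptions.

```python
import hashlib
from typing import Callable, Iterable, Optional


def fetch_verified(
    filename: str,
    peers: Iterable[str],
    has_file: Callable[[str, str], Optional[str]],  # returns SHA1 hex, or None if not found
    download: Callable[[str, str], bytes],
) -> Optional[bytes]:
    """Download from the first peer that has the file and check it against the hash."""
    for peer in peers:
        expected = has_file(peer, filename)
        if expected is None:
            continue  # this peer does not have the file (it may now queue it itself)
        data = download(peer, filename)
        if hashlib.sha1(data).hexdigest() == expected:
            return data  # caller saves to disk and removes the queue item
        return None  # corrupt download: leave the item queued for a later retry
    return None  # not found on any peer: leave the item queued
```

Returning `None` in both failure cases mirrors the text: the item simply stays in the download queue until the next attempt.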
For security reasons files may only be downloaded to/from the /data/ directory. The download queue can be viewed from the admin console at /admin/downloads/
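One way to enforce the /data/ restriction is to normalize the requested path before checking its prefix, so that `..` segments cannot escape the directory. This is a sketch of the general technique, not awareNet's actual check; the function name is hypothetical and filenames are assumed to be relative to installPath, as in the downloads table.

```python
import posixpath


def is_under_data(filename: str) -> bool:
    """True if a path relative to installPath stays inside the data/ directory."""
    normalized = posixpath.normpath(filename)  # collapses "." and ".." segments
    return normalized.startswith("data/")
```

With this check a path such as `data/../core/mysql.inc.php` is rejected, because it normalizes to a location outside `data/`.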
Pulse and Manual Sync
Every once in a while a pulse is scheduled. This is a cascading event in which all nodes do a complete database sync with their upstream and downstream peers. This is done by comparing the UID, editedOn and editedBy fields of all records in all synced tables to ensure that all nodes have the same version of all records. A member of the 'admin' group can initiate a manual sync with one or more peers at any time from the 'Peers' section of the admin console. This is the same page used to manage a node's relationships to its peers.
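The comparison at the heart of a pulse can be sketched as a diff over (UID, editedOn, editedBy) triples. The rule used here to decide which side is out of date (the later editedOn wins, with ties kept local) is an assumption, as is the function name; the text states only which fields are compared.

```python
from typing import Dict, List, Tuple

Version = Tuple[str, str]  # (editedOn as 'YYYY-MM-DD HH:MM:SS', editedBy)


def compare_table(
    local: Dict[str, Version], remote: Dict[str, Version]
) -> Tuple[List[str], List[str]]:
    """Return (UIDs to fetch from the peer, UIDs to send to the peer)."""
    to_fetch, to_send = [], []
    for uid in sorted(set(local) | set(remote)):
        ours, theirs = local.get(uid), remote.get(uid)
        if ours == theirs:
            continue  # same editedOn and editedBy: nothing to do
        # Assumption: the record with the later editedOn is the current one.
        # Datetimes in this format sort chronologically as plain strings.
        if ours is None or (theirs is not None and theirs[0] > ours[0]):
            to_fetch.append(uid)
        else:
            to_send.append(uid)
    return to_fetch, to_send
```

A full implementation would also consult the list of recently deleted items before fetching or sending a record that one side is missing.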


