Anyone who is a member of one of my rooms should have noticed by now: something went terribly wrong. There was no malicious intent at any point that led to this, and while certain peculiarities of the Matrix spec may have had an influence, the fault was mainly a mistake on my side. Everything was related to server ACLs in these rooms, so that is where I will start.
What are server ACLs?
"ACL" is short for "Access Control List". ACLs are used to, well, control which entities are permitted to access which resources. In Matrix, rooms can have server ACLs in order to allow or prevent certain servers from interacting with other servers in that room. Like most things in Matrix, they are defined in JSON, with an "allow" array as a whitelist and a "deny" array as a blacklist. The former typically has a "*" element to allow any server to participate ("*" being a wildcard for an arbitrary string).
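To make the allow/deny logic concrete, here is a minimal Python sketch of how a server might evaluate such an ACL. It is a simplification based on my reading of the spec, not a real homeserver implementation: the function names (glob_matches, server_allowed) and the server names are made up for illustration, handling of IP literals is left out, and I assume that "deny" is checked before "allow" and that a missing list counts as empty.

```python
import re


def glob_matches(pattern: str, server_name: str) -> bool:
    """Match a server name against an ACL glob ('*' = any run of characters, '?' = one character)."""
    # Escape everything, then turn the two escaped wildcards back into regex wildcards.
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, server_name) is not None


def server_allowed(server_name: str, acl_content) -> bool:
    """Decide whether a server may participate, given the content of m.room.server_acl."""
    if acl_content is None:
        # Assumption: no ACL event in the room state means every server is allowed.
        return True
    # Assumption: a list that is not provided behaves like an empty list.
    deny = acl_content.get("deny", [])
    allow = acl_content.get("allow", [])
    if any(glob_matches(p, server_name) for p in deny):
        return False
    return any(glob_matches(p, server_name) for p in allow)


# Hypothetical ACL content: allow everyone except one server and its subdomains.
acl = {"allow": ["*"], "deny": ["evil.example", "*.evil.example"]}

print(server_allowed("good.example", acl))      # True
print(server_allowed("chat.evil.example", acl))  # False
print(server_allowed("good.example", None))      # True
```

Note that the "*" entry in "allow" is what keeps the room open: without any matching allow rule, the last line of server_allowed returns False for everyone.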
What did I do, and why?
I had recently configured a Mjolnir bot to help with moderating my rooms. Said bot wanted to configure a server ACL in my rooms, even though I had heard that this frequently caused performance issues, because the aforementioned whitelist rule has to be checked against every server. So I thought redacting the event it had sent would revert the room to its previous state, in which no ACL existed. Wrongly so, as I learned soon after.
I did not notice anything at first. It seemed like I could not prevent the bot from doing this, so I decided that, instead of using the more efficient Mjolnir rules, which we use in the Techlore room and which do not cause any server-side load, I would simply use server ACLs for what they were meant for. The bot was on my server; everything seemed to work fine, and I could even receive messages from other servers. Until MMJD told me that from his side it looked like Mjolnir had banned every other server… Contrary to what I had thought, the redaction only removed the content of the ACL, not the ACL itself, leaving it with an empty array for "allow". This caused every other server in the rooms to ignore any events (messages, settings changes and so on) in these rooms. A truly weird situation, as my server could receive things just fine. Only the other servers would refuse to do so, even though they still sent events out. They would even ignore the fact that I had changed the server ACL yet again…
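The effect of the redaction can be modelled in a few lines of Python. This is only a sketch under the assumptions described above: the redaction leaves the state event in place but empties its content, and a missing "allow" list is treated as empty. The helper names (is_server_allowed, redact) and the server name are hypothetical, and the ACL check is deliberately simplified.

```python
import re


def is_server_allowed(server_name: str, acl_content: dict) -> bool:
    """Simplified ACL check: a deny match wins, otherwise an allow rule must match."""
    def matches(pattern: str) -> bool:
        regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
        return re.fullmatch(regex, server_name) is not None

    if any(matches(p) for p in acl_content.get("deny", [])):
        return False
    # With no "allow" key at all, this iterates over nothing and returns False.
    return any(matches(p) for p in acl_content.get("allow", []))


def redact(event: dict) -> dict:
    """Model of the redaction as it played out here: the event stays, its content is emptied."""
    return {**event, "content": {}}


# What the bot had set: an ACL that allows every server.
acl_event = {
    "type": "m.room.server_acl",
    "state_key": "",
    "content": {"allow": ["*"], "deny": []},
}

redacted_event = redact(acl_event)

print(is_server_allowed("other.example", acl_event["content"]))       # True
print(is_server_allowed("other.example", redacted_event["content"]))  # False
```

The room still has an m.room.server_acl event after the redaction, just one that no longer allows anybody, which is quite different from the room never having had an ACL.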
I could manually patch servers back in by changing the ACL for these rooms on them as well. However, this would only work for servers that had a moderator for the room on them, as other servers, while still able to have users join the room, would not receive the fact that someone had been promoted in there… A state reset could also have solved it, but those are hard to come by nowadays, as Matrix has become more reliable… So my solution was to create new rooms, reclaim the aliases from the old ones for them, and invite every old user over, while trying to close the old rooms as well as possible (tombstoning, denying permissions, etc.; I am not sure how much of that arrived on the other servers, however). I hope everyone has found their way into the new rooms by now; if not, feel free to tell me.
NEVER EVER redact
m.room.server_acl events unless you are
exactly sure what you are doing. Actually, do not
redact any state events where you cannot be absolutely sure nothing