Did the AoC team invent a new language?
Not really. The input for today’s puzzle (Day 3 of AoC, i.e., 03.12.2024 🤣) looks a lot like primitive, minified code — but it has only one syntax token defined (so far). We get something like this:
xmul(2,4)%&mul[3,7]!@^do_not_mul(5,5)+mul(32,64]then(mul(11,8)mul(8,5))
In this code, we are interested in mul(X,Y) phrases. These represent multiplication operations between X and Y. Our goal is to parse this code and calculate the sum of all multiplication results.
State machines FTW
State machines are a great fit for this kind of problem. But what is a state machine?
A state machine is a structure with states and behavior that models state changes. Most commonly, states are represented by an enum or by string values.
We will parse each character of the input and change the state of the machine based on predefined rules.
Here are the states we’ll use:
class ParserState(Enum):
IDLE = "idle"
OPERATION = "operation"
PARAMETERS = "parameters"
READY = "ready"
FAILED = "failed"
- IDLE: The default state; we haven’t started parsing an operation.
- OPERATION: We’ve encountered a valid operation keyword and are parsing it.
- PARAMETERS: We’re parsing the operation’s arguments.
- READY: The operation has been successfully parsed.
- FAILED: Parsing failed.
These states will be used in two independent classes: Parser (for identifying operations) and Operation (for parsing parameters). For now, we only have one operation: mul.
Operation class
class Operation(ABC):
type: str = NotImplemented
state: ParserState = ParserState.PARAMETERS
args: list[int] = []
def __init__(self):
self.args = []
def __repr__(self):
return f"[{self.type}] {self.state}"
def consume(self, char: str) -> Operation | None:
if self.state == ParserState.PARAMETERS and char == ")":
self.state = ParserState.READY
return self
This basic structure holds the parsing state, the operation type, and any parsed arguments.
Parser
Here’s our abstract Parser class:
class Parser:
stack: list[str] = []
operation: Operation | None = None
state: ParserState = ParserState.IDLE
debug: bool = False
operations: list[Operation] = []
def __init__(self, operations: list[Operation], debug=False):
self.operations = operations
self.debug = debug
def restart(self):
if self.debug:
print()
self.state = ParserState.IDLE
self.stack.clear()
self.operation = None
return self
def __repr__(self):
return f"State {self.state} | Operation ({self.operation}) | Stack {self.stack}"
def consume(self, char: str) -> MultOperation | None:
if self.debug:
print(char, self)
if self.state == ParserState.IDLE:
self.stack.append(char)
if char != "(":
return
for operation in self.operations:
length = len(operation.type)
phrase = "".join(self.stack[-length - 1 : -1 :])
if phrase == operation.type:
self.restart()
self.state = ParserState.OPERATION
self.operation = operation()
return
elif self.state == ParserState.OPERATION:
if not self.operation:
return
res = self.operation.consume(char)
if not res:
return
self.restart()
if res.state == ParserState.READY:
return res
Example usage:
class MyOperation(Operation):
type: 'smth'
def consume(self, char: str):
pass
my_parser = Parser([MyOperation])
inp = "smth(1,2,3)and_other?"
for c in inp:
res = my_parser.consume(c)
if res and res.state == ParserState.READY:
# we have successfully parsed the operation!
pass
This structure lets us define operations independently and control how their arguments are parsed.
The important note here is that we start parsing operation, when we encounter operation keyword, followed by the open parenthesis sign (.
Because of that, we have to perform slicing [-length - 1 : -1 :] on the current character stack.
We start parsing mul, when we have [..., m, u, l, (] and go straight to the parameters parsing. This is arbitrary rule, that makes parsing our puzzle a bit simpler.
The mul operation
class MultOperation(Operation):
type = "mul"
stack: list[str] = []
state: ParserState = ParserState.PARAMETERS
def __repr__(self):
return f"MULT - State {self.state} | Args: {self.args}"
def restart(self):
self.stack.clear()
self.state = ParserState.FAILED
return self
def consume(self, char: str) -> Operation | None:
if self.state != ParserState.PARAMETERS:
return
if char.isdigit():
self.stack.append(char)
elif char == "," and len(self.stack) > 0:
number_as_str = int("".join(self.stack))
self.args.append(number_as_str)
self.stack.clear()
elif char == ")" and len(self.stack) > 0:
number_as_str = int("".join(self.stack))
self.args.append(number_as_str)
self.stack.clear()
if len(self.args) == 2:
self.state = ParserState.READY
return self
else:
return self.restart()
else:
return self.restart()
When we encounter a comma, we store the current number and continue parsing. When we see a closing parenthesis, we check if we have exactly two arguments; if so, the operation is ready.
Full solution (Part 1)
def solution_1(stream: str):
total = 0
parser = Parser(operations=[MultOperation], debug=False)
for char in stream:
res = parser.consume(char)
if res is None:
continue
if res.state != ParserState.READY or len(res.args) != 2:
continue
total += res.args[0] * res.args[1]
return total
We pass in a stream of characters, and parse each one by one. When we receive operation that was parsed successfully (READY state) and it has exactly two arguments, then we can perform the operation.
Analysis of the complexity
So this solution works in O(n) time complexity. We only have to iterate over each character exactly once. We don’t have to jump back and forth. That seems great!
The memory complexity is O(1), because we only store some minor helper arrays and values. After correctly parsing the operation, we can remove it from memory, and reset our parsing stack.
This solution seems quite efficient, considering also that we have the flexibility of extensive code, that allows us to implement other types of operations.
Part 2 - additional operations
In the second part we receive two additional operations - do and don't.
Our test input looks like this:
xmul(2,4)&mul[3,7]!^don't()_mul(5,5)+mul(32,64](mul(11,8)undo()?mul(8,5))
As we can se, we have:
- mul(2, 4)
- don’t()
- mul(5, 5)
- mul(11, 8)
- do()
- mul(8, 5)
Those operations basically toggle, when we should use mul operations in our sum.
Because we’ve defined default behaviour in our abstract Operation class (default meaning parsing exactly zero arguments), we can now simply define new operations:
class DoOperation(Operation):
type = "do"
class DontOperation(Operation):
type = "don't"
Solution:
def solution_2(stream: str):
operations: list[Operation] = []
parser = Parser(operations=[MultOperation, DoOperation, DontOperation], debug=False)
for char in stream:
res = parser.consume(char)
if res is not None:
operations.append(res)
total = 0
is_enabled = True
for op in operations:
if op.type == DoOperation.type:
is_enabled = True
elif op.type == DontOperation.type:
is_enabled = False
elif (
op.type == MultOperation.type
and op.state == ParserState.READY
and is_enabled
):
total += op.args[0] * op.args[1]
return total, operations
Now we parse the input, and store operations on the side.
Then we iterate over the operations, and use is_enabled toggle, to correctly calculate the sum. If we encounter do operation, we enable summing, if we encounter don't, we disable it.
Analysis of complexity (part 2)
Time complexity remains O(n), but memory usage is now O(n) since we store all parsed operations.
Conclusion
We’ve successfully parsed the input (or “memory” in puzzle terms) using a flexible state machine approach. It feels like writing a parser for a tiny programming language with C-like function calls. Maybe in the coming days we’ll build a full AST (Abstract Syntax Tree)?
We’ll see.
Thanks for reading, and see you in the next article!