Skip to content

Parser Documentation

The Squiid Parser is a Rust implementation of a mathematical expression parser, with support for implicit multiplication and negative signs. The parser is based on the Shunting Yard algorithm, which is used to convert mathematical expressions written in infix notation to postfix notation. The parser takes in a vector of tokens and returns a vector of string references.

Functions

lib::parse

This function lexes and parses a given string automatically. You won't need to call any other internal methods when this is run, as all required operations will be performed.

lexer::lex

This function automatically lexes a given input string into a Result<Vec<Token>, String> using the logos crate. This is meant to be used internally however it may be used externally if needed. No parsing is done to the lexed string.

parser::shunting_yard_parser

This function takes a vector of Tokens and returns a vector of string references. It uses an implementation of the Shunting Yard algorithm to parse the expression. The function creates two stacks, output_queue and operator_stack, to keep track of the output and the operators in the expression. It also uses a HashMap to store the precedence of each operator. The function then iterates through each token in the input vector and performs different actions based on the type of the token.

parser::parse_subtract_sign

This function takes a mutable reference to a vector of Tokens and parses whether the token - is a negative sign or a minus operator. It does this by iterating through each token in the vector and checking if the token is -. If the token is -, the function checks whether it is a negative sign or a minus operator based on the position of the token in the expression. If the token is a negative sign, the function replaces it with a Token::Negative token in the vector. If the token is a minus operator, the function does not modify the vector.

parser::parse_implicit_multiplication

This function takes a mutable reference to a vector of Tokens and parses whether implicit multiplication is needed between two tokens. Implicit multiplication happens when two tokens are adjacent to each other and there is no operator between them. The function uses two constant arrays, LEFT_SIDE_IMPLICIT and RIGHT_SIDE_IMPLICIT, to determine whether implicit multiplication is needed. If a token in the left side array is followed by a token in the right side array, the function inserts a Token::Multiply token between them in the vector.

Exposed FFI Functions

parse_exposed

The parse_exposed function takes a null-terminated C string input and returns a pointer to an array of null-terminated C strings. The length of the array is stored in the outlen variable, which must be passed as a pointer

free_string_array

To free the memory allocated for the array of C strings returned by parse_exposed, use the free_string_array function, passing in the pointer to the array and the length of the array as arguments.

Example in C:

// Compiled with `gcc file.c -o file -L. -l:libsquiid_parser.so`
#include <stdio.h>
#include <stdlib.h>

// Define function signatures of Rust functions
extern char** parse_exposed(const char* input, int* outlen);
extern void free_string_array(char** ptr, int len);

int main() {
    // create the string to parse
    const char* input = "3+4*5";
    // array length buffer
    int outlen = 0;

    // call Rust function from shared object
    char** result = parse_exposed(input, &outlen);
    if (result == NULL) {
        printf("Parsing failed.\n");
        return 1;
    }

    // print tokens in order
    for (int i = 0; i < outlen; i++) {
        printf("%s\n", result[i]);
    }

    // free the allocated memory
    free_string_array(result, outlen);
    return 0;
}

A similar result can be achieved in other languages which allow you to interact with shared object libraries via the FFI, such as Python, Go, and many others.

Tokens

Each variant of the Token enum represents a type of token. Here is an explanation of each variant:

  • Function(&'a str): A function name followed by an opening parenthesis.
  • Comma(&'a str): A comma.
  • VariableAssign(&'a str): A variable identifier.
  • VariableRecal(&'a str): A variable identifier preceded by $.
  • Constant(&'a str): A constant identifier preceded by #.
  • ScientificNotation(&'a str): A number in scientific notation format.
  • Float(&'a str): A floating-point number.
  • Int(&'a str): An integer.
  • PrevAns(&'a str): The symbol @ representing the previous answer.
  • LParen(&'a str): An opening parenthesis.
  • RParen(&'a str): A closing parenthesis.
  • Equal(&'a str): The equals symbol.
  • Power(&'a str): The power symbol.
  • Multiply(&'a str): The multiplication symbol.
  • Divide(&'a str): The division symbol.
  • Modulo(&'a str): The modulo symbol.
  • Add(&'a str): The addition symbol.
  • Subtract(&'a str): The subtraction symbol. This can represent either unary or binary subtraction.
  • Negative(&'a str): This is not a valid token, but it is used for differentiation between minus and negative later on in parsing.

The PartialEq implementation ignores the content of the Token enum and only compares the variants.