Converting Infix Notation to Postfix Notation
Introduction
In mathematical notation, expressions can be written in various formats, including infix, postfix, and prefix notations. Infix notation is the most common and intuitive (it's likely what you were taught), where operators are placed between operands (e.g., A + B). Postfix notation (also known as Reverse Polish Notation or RPN) places operators after their operands (e.g., A B +). Converting infix expressions to postfix notation can be useful for evaluation using stack-based algorithms.
Even though Fortran is a strongly typed language, many applications could find it useful to dynamically set equations. For example, imagine having a single program to solve the heat equation but you can change the initial conditions by defining a function as a character string through the command line. FEQParse aims to provide such dynamic equation support wherein you can define equations in infix notation using the Fortran character
type and evaluate the equation using floating point input.
FEQParse works by converting an infix expression into a postfix expression and running the evaluation using a stack-based algorithm. This page walks through sketches of the various algorithms that clean, parse & tokenize, convert, and evaluate equations in FEQParse.
Algorithm overview
The EquationParser
data type has a constructor routine and evaluation routines that make it easy to create equations that can be evaluated
program example
use feqparse
use iso_fortran_env
implicit none
type(EquationParser) :: eq
real(real32) :: xscalar, fscalar
real(real32) :: xarray(1:100), farray(1:100)
eq = EquationParser( 'f=x^2', ['x'] )
! Example evaluating with a scalar
xscalar = 1.0_real32
fscalar = eq % Evaluate(xscalar)
! Example evaluating with an array
xarray(1:100) = 1.0_real32
farray(1:100) = eq % Evaluate(xarray)
end program example
There are clearly two main algorithms at play here for (1) construction and (2) evaluation of the parser. Construction involves
- Cleaning the input equation
- Tokenizing the infix equation
- Conversion from infix notation to postfix notation
After tokenizing and converting, both the infix and postfix expressions are store in a stack of tokens. The postfix expression is used for equation evaluation using a stack-based algorithm, where intermediate results are processed through a stack; the type of data stored in the stack depends on the type of data passed as input for evalation (e.g. scalar float32, 4-d array float 64, etc.)
In case you're not familiar, a stack is a data structure that operates on the Last-In-First-Out (LIFO) principle, where the most recently added element is the first to be removed. It is typically implemented as a class with methods for common stack operations such as push (to add an element to the top), pop (to remove and return the top element), peek or top (to view the top element without removing it), and often additional methods like isEmpty (to check if the stack is empty) and size (to get the number of elements in the stack).
Cleaning the equation
Before stepping into tokenizing, parsing, and converting to postfix notation for evaluation, an input CHARACTER
that contains an equation in infix notation
is preprocessed. We need to clean and format the equation, ensuring it is in a suitable form for subsequent operations.
The algorithm for the FEQParse EquationParser
CleanEquation
type-bound procedure is broken down as follows
- Identify Dependent Variable Name:
- Locate the position of the equal sign (
=
) in the equation. -
If an equal sign is found, the part of the equation before the equal sign is considered the variable name and stored in
parser%variableName
. -
Replace Power Operator:
-
Replaces occurrences of the exponentiation operator
**
with the caret symbol^
in the equation using theReplaceStr
function. This is done to allow using either**
or^
for exponentiation. -
Extract Infix Formula:
- Extracts the part of the equation to the right of the equal sign, which represents the infix formula.
-
Left adjusts this extracted formula for further processing.
-
Remove Spaces:
- Iterates over the extracted infix formula and removes any spaces, effectively compacting the formula.
-
This is done by copying non-space characters to a new position in the string and adjusting the string length accordingly.
-
Finalization:
- Pads the remaining part of
parser%inFixFormula
with spaces to ensure there are no leftover characters. - Sets the
equationCleaned
flag to.true.
if the cleaning process completes successfully.
Tokenizing the infix formula
After the equation is cleaned and we are certain we have a valid equation in infix notation, we now "tokenize" the equation. Essentially, this process is used to break down an infix expression into individual tokens and classify each token as a function, variable, number, operator, monad, or parentheses.
The algorithm for the FEQParse EquationParser
Tokenize
type-bound procedure is broken down as follows
- Initialization:
- The subroutine initializes the
tokenized
flag to.false.
. -
Set up a token stack and calculates the maximum length of variable names from our list of independent variables
-
Tokenization Loop:
- The main loop processes the input infix formula one character at a time.
-
For each character (or sequence of characters), the subroutine determines the type of token (function, variable, number, separator, or operator).
-
Token Identification: Each token is assigned a type and a string.
- Function: Check if the current substring represents a function. If so, it adds the function token to the stack.
- Variable: Check if the current substring matches any of the independent variables. If it does, it adds the variable token to the stack.
- Number: Check if the current substring is a number. If so, it adds the number token to the stack, handling special cases like the constant
pi
. -
Separator: Identify and process separators (parentheses and operators), adding the appropriate token to the stack.
-
Error Handling:
- The subroutine checks for missing operators or closing parentheses after variables and numbers. If an error is found (such as missing a closing parentheses), print an
errorMsg
and exits. -
If an invalid token is encountered (we cannot classify it), print an
errorMsg
and exits. -
Special Case Handling:
-
Adjusts the token type for unary operators (monadic tokens), which can appear at the beginning of the expression or immediately after an opening parenthesis. Unary operators, or monadic tokens are typically either
+
or-
that indicate the sign of a number of variable, e.g. in the infix equationf = -2*x
, the-
is a monadic token. -
Finalization:
- Sets the
tokenized
flag to.true.
if the tokenization process completes successfully.
Conversion from infix to postfix
The conversion from infix to postfix notation involves the use of a stack to temporarily hold operators and ensure that the final postfix expression maintains the correct order of operations. The following steps outline the conversion algorithm:
- Initialize Structures:
- Create an empty stack to hold operators.
-
Create an empty list to output the postfix expression.
-
Process Each Token:
- Read the infix expression from left to right, one token (operand or operator) at a time.
-
Depending on the type of token, perform the following actions:
a. Operand: - Directly append operands (numbers, variables) to the output list.
b. Left Parenthesis (
(
): - Push the left parenthesis onto the stack.c. Right Parenthesis (
)
): - Pop operators from the stack and append them to the output list until a left parenthesis is encountered on the stack. - Discard the left parenthesis.d. Operator (e.g.,
+
,-
,*
,/
): - While the stack is not empty and the operator at the top of the stack has higher or equal precedence than the current operator, pop the operator from the stack and append it to the output list. - Push the current operator onto the stack.e. Monadic Token:
f. Functions (e.g
sin
,cos
,log
, etc.): - Push the function onto the stack -
Empty the Stack:
- After processing all tokens, pop any remaining operators from the stack and append them to the output list.
Operator Precedence and Associativity
Operators have different levels of precedence which affect the order in which operations are performed:
- Multiplication (*
), division (/
), and modulus (%
) have higher precedence than addition (+
) and subtraction (-
).
- Operators of equal precedence are evaluated based on their associativity (left-to-right or right-to-left).
In FEQParse.F90
, the Priority
function uses the token % tokenType
and token % tokenString
to determin the priority for evaluation.
Example Conversion
Consider the infix expression: A + B * (C - D)
- Initialize the stack and output list:
- Stack:
[]
-
Output:
[]
-
Process each token:
A
: Append to output. Output:[A]
+
: Push onto stack. Stack:[+]
B
: Append to output. Output:[A, B]
*
: Push onto stack (higher precedence than+
). Stack:[+, *]
(
: Push onto stack. Stack:[+, *, (]
C
: Append to output. Output:[A, B, C]
-
: Push onto stack. Stack:[+, *, (, -]
D
: Append to output. Output:[A, B, C, D]
)
: Pop and append until(
is found. Stack:[+, *]
, Output:[A, B, C, D, -]
- Pop
*
from stack (higher precedence than+
). Stack:[+]
, Output:[A, B, C, D, -, *]
-
Pop
+
from stack. Stack:[]
, Output:[A, B, C, D, -, *, +]
-
Final postfix expression:
A B C D - * +
Evaluation
There are multiple functions for evaluation of a parser. All of them follow the same algorithm and they are distinguished only by the type of data used to define the independent variables, which also matches the resulting output. For example, FEQParse provides evaluation routines for scalar real32
, scalar real64
, 1-d array real32
, etc.
Below is a rough breakdown of the evaluation algorithm:
- Initialization:
-
Construct a stack for evaluation and storing intermediate results
-
Evaluation Loop:
- Iterate through each token in the postfix expression (stored in
parser%postfix%tokens
). -
Depending on the type of each token (number, variable, operator, function, or monadic operator), perform specific actions using the stack.
-
Token Processing:
- Number Token:
- Numbers are pushed onto the stack. Special numbers, like
pi
are allowed to be used, so we check for these, also pushing onto the stack.
- Numbers are pushed onto the stack. Special numbers, like
- Variable Token:
- Match the variable name with the independent variables from the input and pushes the corresponding value onto the stack.
- Operator Token:
- When we encounter an operator (e.g
+
,-
,*
,/
,^
), pop the last two operands (last one isb
and second to last isa
) from the stack and apply the operationa op b
. - Push the result back onto the stack.
- When we encounter an operator (e.g
- Function Token:
- Pop the operand from the stack, applies the function (like sin, cos, etc.), and push the result back onto the stack.
-
Monadic Token:
- Handles unary operations (like negation), pop the operand, applies the operation, and pushes the result back onto the stack.
-
Final Result:
- After processing all tokens, the final result is popped from the stack and returned as the function result
f
.