Welcome! American children are taught to do arithmetic with pencil and paper, but children in Japan, China, and Russia are taught to use an abacus instead. Flicking beads is much faster than writing down digits, but it requires greater mental focus, because the lack of a paper trail makes mistakes harder to detect and fix.
I was surprised to learn that calculating square roots is only slightly more complicated than long division. Cube roots are hard: they are the final challenge for advanced students who have mastered all the more basic operations.
Here's a program to teach these techniques. All it requires is a C compiler with the standard libraries and a terminal that supports UTF-8 text and ANSI escape codes. Cut-and-paste and scrolling ability are helpful but not required. If linked with Readline, it supports command-line editing and history.
Compile with either "gcc abacus.c -lm" or "apt install libreadline-dev; gcc -DREADLINE abacus.c -lreadline -lm"
It walks the user through every operation from addition to cube roots, each one building on the ones before. It can also give you the product, square, or cube of a random number so you can try dividing or rooting it yourself. For each calculation, it shows the shortest abacus that can do it with no loss of precision.
It offers an alternate method of multiplication and division for students who have not yet memorized the 9x9 multiplication table. It displays the abacus as a series of digits; its methods are equally valid for all styles of abacus.
Each historic abacus design has its advantages. The Chinese suanpan supports bases up to eighteen (their system of weights was base-16). The Japanese soroban has the best bead economy. And the Russian schoty is the fastest and simplest to operate.
Bead economy doesn't matter with plastic pony beads, so I prefer the Russian design, but with thirty to fifty digits. With a ten-bead abacus, one swipe of one finger changes a digit (no pinching), and 6×4 is either 20+4 or 30-6, never e.g. 50-20-5-1 or 50-30+5-1.
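The "20+4 or 30-6" choice can be sketched in C. This is only an illustration, not code from the program above: the names struct move and plan_add are hypothetical, and it assumes a rod on a ten-bead abacus may show any value from 0 to 10.

```c
/* Bead changes for one addition: on this rod and on the rod to its left.
   Hypothetical helper type, not taken from abacus.c. */
struct move {
    int here;   /* beads added (positive) or removed (negative) on this rod */
    int left;   /* beads added on the next rod to the left */
};

/* Plan adding digit a (1..9) to a rod currently showing v (0..10).
   Direct move:     +a on this rod.
   Complement move: +1 on the left rod and -(10 - a) on this rod. */
static struct move plan_add(int v, int a)
{
    struct move m;
    if (v + a <= 10) {      /* enough free beads: swipe a beads */
        m.here = a;
        m.left = 0;
    } else {                /* not enough: go one up on the left, take back 10 - a */
        m.here = a - 10;
        m.left = 1;
    }
    return m;
}
```

Under these assumptions, adding the 4 of 6×4 = 24 to a units rod showing 3 is a direct move (the "20+4" case), while adding it to a rod showing 8 puts one extra bead on the tens rod and removes six from the units rod (the "30-6" case).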
The tenth bead is not absolutely necessary, but it means that any time you add e.g. four to a digit, you're either moving four beads down, or moving all but four beads up. It also allows delayed carry during addition. Any string of tens must be eliminated left-to-right, e.g. 4-10-10-3 becomes 5-0-10-3, then 5-1-0-3. (On a Chinese abacus, decimal digits could go up to 15 before carries must be resolved!)
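Here is a minimal C sketch of that left-to-right cleanup. It is my own illustration rather than a routine from the program, and the name resolve_carries is hypothetical. It assumes the rods are stored most significant first and that each holds 0 to 10.

```c
/* Resolve delayed carries on a ten-bead abacus, scanning left to right.
   rod[0] is the most significant rod; each rod holds 0..10 on entry.
   Returns 0 on success, or -1 if the leftmost rod ends at 10 or more,
   meaning the carry has nowhere to go on this frame. */
static int resolve_carries(int *rod, int n)
{
    for (int i = 1; i < n; i++) {
        /* Rods 1..i-1 hold at most 9 at this point, so a carry can raise
           one of them to 10 but never past it; ripple leftward if it does. */
        for (int j = i; j > 0 && rod[j] == 10; j--) {
            rod[j] = 0;
            rod[j - 1] += 1;
        }
    }
    return (n > 0 && rod[0] >= 10) ? -1 : 0;
}
```

Run on the rods 4-10-10-3, it clears the left ten first to give 5-0-10-3 and then the right one to give 5-1-0-3, matching the example above. Going right to left would push the middle rod to eleven, which ten beads cannot show.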
My personal records (setup time excluded, no mistakes allowed) are:
4:12 to multiply two 10-digit permutations ("perm" command)
3:40 to square a 10-digit permutation
4:46.5 to divide by one 10-digit permutation and get another ("perd")
4:13 to un-square a 10-digit permutation ("pers")
Square roots are faster than long division for the same reason squares are faster than multiplication -- lots of duplicate partial products!
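To put a rough number on the duplicates, here is a small C sketch of my own (the function names are hypothetical and nothing here is taken from the program). It assumes the schoolbook method, where multiplying two n-digit numbers takes n×n single-digit products. In a square, every cross product d[i]×d[j] with i ≠ j occurs twice, so it only has to be looked up once and then doubled.

```c
#include <stddef.h>

/* Single-digit products needed to multiply two n-digit numbers. */
static size_t products_multiply(size_t n) { return n * n; }

/* Distinct single-digit products needed to square an n-digit number:
   n squares plus n(n-1)/2 cross products, each cross product used twice. */
static size_t products_square(size_t n) { return n * (n + 1) / 2; }

/* Square the number with digits d[0..n-1] (most significant first).
   Column sums go to out[0..2n-2] with no carrying; out[k] has weight
   10^(2n-2-k). Returns how many digit products were actually computed. */
static size_t square_columns(const int *d, size_t n, long *out)
{
    size_t count = 0;
    for (size_t k = 0; k + 1 < 2 * n; k++)
        out[k] = 0;
    for (size_t i = 0; i < n; i++) {
        out[2 * i] += (long)d[i] * d[i];        /* square term */
        count++;
        for (size_t j = i + 1; j < n; j++) {
            out[i + j] += 2L * d[i] * d[j];     /* cross term, doubled */
            count++;
        }
    }
    return count;
}
```

For ten digits that is 55 distinct products instead of 100. For the digits 1, 2, 3 the column sums come out as 1, 4, 10, 12, 9, which carry down to 15129 = 123².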
You can even go beyond cube roots into transcendental functions. This program calculates sine, cosine, and exponential. This one calculates pi in decimal, and this one calculates pi in dozenal (base 12). If you're an old fogie like me who still uses xterm, this shell script adds the extra digits ↊ and ↋ to the basic xterm fonts.
Gallery of abacuses:
Foam board, 14mm spacing, no table saw needed.
Masonite with 1/8" nylon cord, 14mm spacing.
Plywood with 1/8" nylon cord, 13mm spacing.
Masonite pocket model, 12mm spacing.
Base-12 abacus
Always cut the base board and rails of the same material so they don't flex apart. Cut the notches deeper than the string requires so that pegs or pencaps can be shoved into them to mark locations on a long abacus.