Misconceptions comparing NodeJS to Python
This was originally a very different article. Many months ago, when I first stumbled upon the article below, I ran tests and Python beat Node each time. I don't know what I did differently then, but I couldn't recreate those results when I went to write this. I'm going to be better about documenting tests going forward.
NOTE: I'm not usually a fan of leaving things up when they're completely wrong, but it's worth noting that in the four years since I wrote this, Python has had major performance improvements across 3.12 - 3.14, with 3.15 poised to deliver another ~15% performance bump in certain areas. A lot of those areas are exactly the places this post glances at.
Every once in a while, there's an article I stumble upon that annoys me. Usually it's along the lines of "Python sucks, and you should never use it for this." I've seen this a couple of times over the years, especially when it comes to things like chatbots. Sometimes they're right.
.... Other times, they're right, but misguided about why it's bad. One of those articles was from Towards Data Science. Go take a few minutes to read it; I'll wait.
Ready? Alright, let's dive in.
Usually their site is pretty good when it comes to Python stuff, but this article bothered me. Hammering a server to constantly serve Fibonacci numbers back to you is a horrible workload for Python, but that's because of how Python is designed, not because it sucks at those things.
In their example, the issue is comparing an Express app to a Flask app: a comparison of two different design paradigms for serving web content. Flask is an old-school WSGI framework, while Node is asynchronous by design; Python's equivalent of that model is ASGI. ASGI, or Asynchronous Server Gateway Interface, builds on the traditional web server gateway interface and introduces asynchronous communication between client and server.
This allows the server to process requests from many different clients and return results about as fast as possible. In a traditional WSGI app, each request is handled in turn: if multiple requests come in at around the same time, the last one received is handled last. So what happens when 1,000 requests get processed 10 at a time? Things stack up ... a lot.
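For context, here's a minimal sketch of the Flask side of that comparison. This is my reconstruction, not the article's exact code, assuming the same naive recursive Fibonacci endpoint:

```python
# A minimal sketch of the Flask side of the comparison; my reconstruction,
# assuming the same naive recursive Fibonacci as the article (not verbatim).
from flask import Flask, jsonify

app = Flask(__name__)

def fib(n):
    # Deliberately naive recursion: fib(30) burns CPU on every request
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

@app.route("/")
def hello_world():
    return jsonify(data=fib(30))
```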
nagel@BANE:~$ python tester.py
On average each node request took 86 milliseconds.
On average each python request took 498 milliseconds.
# I copied the function and called it twice, with Node and Python running on different ports. His exact code, block for block.
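If you want to follow along, here's a rough reconstruction of what tester.py does. The ports, request count, and concurrency are my assumptions (1,000 requests, 10 in flight, per the setup above), not his exact script:

```python
# A rough reconstruction of tester.py, not his exact script.
# Assumptions: Node on port 3000, Python on port 5000, 1000 requests
# with 10 in flight at a time, reporting the mean latency per request.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

def average_ms(url, total=1000, concurrency=10):
    def one_request(_):
        start = time.perf_counter()
        requests.get(url)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        durations = list(pool.map(one_request, range(total)))
    return sum(durations) / len(durations) * 1000

print(f"On average each node request took {average_ms('http://localhost:3000/'):.0f} milliseconds.")
print(f"On average each python request took {average_ms('http://localhost:5000/'):.0f} milliseconds.")
```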
That's a huge difference. Surely it has to be more complicated than just ASGI vs WSGI, no? Kind of, but it's complicated. Even if we use something like gunicorn to add more parallelism to the Python code, it's still FAR behind what Node can do. Using async workers improves things a bit, though.
One async worker (gunicorn fib:app -k gevent)
On average each python request took 222 milliseconds.

4 async workers (gunicorn fib:app -k gevent -w 4)
On average each python request took 621 milliseconds.

The difference above is simple:
- The first run uses a single async worker, which gets us a huge gain in performance: gunicorn handles the requests, and the gevent worker spawns a lightweight greenlet for each one as it comes in.
- The second run says "I have 4 workers to handle the requests that come in," but they're all fighting over the same memory and CPU.
You can get slightly better performance if you preload the app into gunicorn's memory (--preload), so the async worker spends less time spinning up, but that's the difference between 222ms and 199ms. And 199ms is still more than double the time Node took, unfortunately.
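If you're following along, all of those gunicorn knobs can also live in a config file instead of on the command line. Here's a sketch of the settings used above (gunicorn.conf.py is the default config path gunicorn looks for):

```python
# gunicorn.conf.py -- a sketch of the settings tested above.
# These are standard gunicorn options; tune workers to your core count.
worker_class = "gevent"  # async greenlet-based workers (the -k gevent flag)
workers = 4              # the -w 4 run; drop to 1 for the single-worker run
preload_app = True       # load the app before forking (the --preload flag)
```

Run `gunicorn fib:app` from the same directory and gunicorn picks the file up automatically.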
Up until this point, we're still comparing WSGI to ASGI (yes, gunicorn is still WSGI; it just gives you more options on how you want to run the application). There's a newer web framework in Python that can bring us closer to even.
I present to you FastAPI, a newer ASGI Python framework in active development (and one that doesn't make stupid naming or architecture decisions, sorry Sanic). It uses something like gunicorn, called uvicorn, though uvicorn doesn't have the same kind of extendability. Personally, I think that's a smart choice.
from fastapi import FastAPI

app = FastAPI()

# Same naive recursive Fibonacci as the Flask version
def fib(n):
    if n < 2:
        return n
    else:
        fib1 = fib(n - 1)
        fib2 = fib(n - 2)
        return fib1 + fib2

@app.get("/")
def hello_world():
    f = fib(30)
    return {"data": f}

But it's not a magic bullet for this comparison. Let's look at the original numbers compared to running in FastAPI.
On average each flask python request took 498 milliseconds.
On average each python request took 493 milliseconds.

Well ... shit.
Yea, not much better, if at all. So what gives? Python's GIL, actually. Python includes a GIL, or Global Interpreter Lock, which impacts anything CPU- or memory-intensive. Essentially, within a single Python process, only one thread executes Python bytecode at a time; gunicorn's workers are separate processes, each with its own GIL, which is why adding workers only helps until you run out of cores. The link above goes into much greater detail than I can, but the short version is that the GIL limits how Python programs can handle multi-threaded workloads. The best way to visualize this issue is a different article that compares the different multithreading approaches in Python.
The async work in frameworks like gunicorn and uvicorn usually falls under that normal multithreading model, everything inside one process. Adding additional workers gets you closer to multiprocessing, but then your limit becomes the hardware you're running on. In my case, that was my 2c/2t server, which means I'm bottlenecked almost immediately. Running on my desktop, or a VM with more computational power, improves the numbers slightly, but not enough to mention. You can see the effect directly with the sketch below.
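Here's a quick sketch (mine, not from either article) that makes the GIL visible: run the same CPU-bound fib on a thread pool and on a process pool, and compare wall time.

```python
# Sketch: the same CPU-bound fib under threads vs processes.
# Threads share one GIL, so the calls effectively serialize; processes
# each get their own interpreter (and GIL) and can use separate cores.
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def timed(pool_cls, jobs=4):
    start = time.perf_counter()
    with pool_cls(max_workers=jobs) as pool:
        list(pool.map(fib, [30] * jobs))
    return time.perf_counter() - start

if __name__ == "__main__":  # guard required for process pools on some platforms
    print(f"threads:   {timed(ThreadPoolExecutor):.2f}s")
    print(f"processes: {timed(ProcessPoolExecutor):.2f}s")
```

Expect the thread pool to be barely faster than running the calls back to back, while the process pool scales with however many cores you actually have.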
Why Node is Still Faster and What Can Be Done
The simple answer is design. Node uses a non-blocking async model: a single process runs an event loop, with a small pool of worker threads behind it for things like file I/O. Every request that comes into Express is handled as a callback on that event loop rather than spawning its own process, so nothing sits around blocked. On top of that, V8 JIT-compiles hot code like our fib function down to machine code, while CPython interprets it, which accounts for a big chunk of the raw per-request gap.
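Python's closest analogue to that model is asyncio, and a tiny sketch (mine) shows both sides of the coin: the event loop interleaves I/O-bound handlers beautifully, but one CPU-bound fib call blocks the entire loop, which is why async alone doesn't rescue this benchmark:

```python
# Sketch: Python's event loop, asyncio. I/O-bound handlers interleave
# cheaply, but one CPU-bound fib() call blocks the entire loop.
import asyncio
import time

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

async def io_handler(i):
    await asyncio.sleep(0.1)  # cooperative: the loop runs others meanwhile
    return i

async def main():
    start = time.perf_counter()
    await asyncio.gather(*(io_handler(i) for i in range(100)))
    print(f"100 I/O handlers: {time.perf_counter() - start:.2f}s")  # ~0.1s total

    start = time.perf_counter()
    fib(30)  # nothing else can run while this chews CPU
    print(f"one fib(30):      {time.perf_counter() - start:.2f}s")

asyncio.run(main())
```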
Python doesn't have that option. There is work underway to make Python threadsafe and remove the GIL, but that's not landing anytime soon. In the meantime, you can get amazing speedups by switching to PyPy, whose JIT does for Python roughly what V8 does for JavaScript. Tests on the same hardware:
- 500ms per request with plain Flask
- 360ms per request with 2 sync gunicorn workers
- 136ms per request with 2 preloaded async gunicorn workers
Speed is greatly increased, but it's still slower than Node.
What else can you do if you're dead set on using Python for your computations?
The simplest answer is usually the best: change how you architect the solution that retrieves the data. Leverage serverless frameworks for intensive tasks (I hope to have something about this in the new year), or use a work manager or queue system.
The problem is that most people trying to serve complex computational data from an endpoint are going to have long-running processes. The best solution for that is one endpoint you hit to queue up the work, and another endpoint you hit to fetch the results. A rough sketch of that pattern is below.
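Here's a minimal sketch of that queue-and-poll pattern, sticking with FastAPI since it's already on the table. The in-memory jobs dict and BackgroundTasks are my simplification for illustration; for real workloads you'd reach for a proper queue like Celery or RQ:

```python
# Sketch: queue work on one endpoint, fetch results from another.
# The in-memory dict and BackgroundTasks are simplifications for
# illustration; use a real broker (Celery, RQ) for actual workloads.
import uuid

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
jobs = {}  # job_id -> result, None while the work is still running

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def run_job(job_id, n):
    jobs[job_id] = fib(n)  # the heavy work happens after the response is sent

@app.post("/fib/{n}")
def queue_fib(n: int, background_tasks: BackgroundTasks):
    job_id = str(uuid.uuid4())
    jobs[job_id] = None
    background_tasks.add_task(run_job, job_id, n)
    return {"job_id": job_id}

@app.get("/results/{job_id}")
def get_result(job_id: str):
    if job_id not in jobs:
        return {"status": "unknown"}
    if jobs[job_id] is None:
        return {"status": "pending"}
    return {"status": "done", "data": jobs[job_id]}
```

The client queues the work, gets a job_id back immediately, and polls the results endpoint until the computation finishes, so no request ever sits on a 500ms fib call.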