Does this situation sound familiar to you?
- You are a data scientist, you developed an ML model in Python (using
PyTorch, TensorFlow, or something like that), and you'd like your users to
interact with it,
- You would like to make either an API or a web interface to your model,
- Your model is big enough (and therefore slow) that you would prefer not to
load it from scratch every time a user wants to use it, and
- You know a thing or two about servers, but you lack a deep background,
the time and/or patience to get into it, the proper server administrator
rights, or a combination of all three.
If that's your situation, this post is for you.
Flask is a Python package for the quick and easy
creation of APIs that you can use to serve model predictions over the internet.
And if your users need a GUI,
Dash is a software package built on top
of Flask that allows you to quickly create web interfaces - if you are familiar
with R, Dash has been described as "ShinyApps for Python".
Typically you would use a "real" web server (Apache, NGINX, etc) to do the
heavy lifting, but this post focuses on how to use Flask alone to quickly
return results generated from an ML model.
In particular, we will focus on how to keep a model in memory between calls so
you don't need to restart your model at every turn.
WARNING: Flask is not designed to work this way.
The Flask documentation itself
tells you not to use
their integrated web server
for anything other than testing, and if you blindly expose this code to the
internet things can get ugly. It will also be much slower than using a proper
web server. And yet, I am painfully aware that sometimes you don't have the
resources to do things right, and people telling you "there's a way but I'm not
telling you because it's ugly" doesn't help. So remember: this solution is
meant for situations where you have low traffic, ideally inside an intranet
and/or behind a firewall, and where you don't have the technical help you'd
need to do it right. But be aware of its limitations!
Method 1: global variables for simple models
Let's start with the simplest of web servers. This code exposes a single
API endpoint, `/helloworld`, that receives a name and returns a greeting:
```python
from flask import Flask

app = Flask(__name__)

@app.route('/helloworld/<name>')
def hello_world(name):
    return f'Hello, {name}'

if __name__ == '__main__':
    app.run(debug=True)
```
If you send a request to `http://localhost:5000/helloworld/Test`, you would
get `Hello, Test` as a result.
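You can try this from Python itself, for instance with the `requests`
package (a minimal sketch; it assumes `requests` is installed and the server
above is already running locally):

```python
import requests

# Call the endpoint we just defined and print the greeting
response = requests.get('http://localhost:5000/helloworld/Test')
print(response.text)  # Hello, Test
```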
Let's say that you now want to return a counter of how many times you have
received a request - every time you get a request you simply increase a counter,
and then you return that counter. One simple solution is using a global
variable, like so:
```python
from flask import Flask

counter = 0
app = Flask(__name__)

@app.route('/helloworld/<name>')
def hello_world(name):
    global counter
    counter += 1
    return f'Hello, {name}, you are request number {counter}'

if __name__ == '__main__':
    app.run(debug=True)
```
This code only works on Linux (I think), but under some circumstances it
could be all you need - all that would remain is to replace the counter
initialization with code that loads your model into memory.
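The skeleton would look something like this minimal sketch, where
`load_model` and its `predict` method are hypothetical stand-ins for
whatever your ML library actually provides:

```python
from flask import Flask

# The slow initialization happens once, when the server starts,
# not on every request. `load_model` is a hypothetical stand-in
# for your library's loading function.
model = load_model('my_model.pt')
app = Flask(__name__)

@app.route('/predict/<text>')
def predict(text):
    # The model is already in memory, so this call is fast
    return str(model.predict(text))

if __name__ == '__main__':
    app.run(debug=True)
```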
You would use this method, for instance, when you need to perform a task with
a long enough startup time, such as parsing a long list of JSON files.
If that's your case, you can leave this server running like so:
- Disable debug mode and open the server to the world by changing the
`app.run` line to `app.run(debug=False, host='0.0.0.0')`.
- Install the `screen` utility (`tmux` is also good), start it by simply
typing `screen` in your console, run your server (`python script_name.py`),
and leave it running in the background (press Ctrl+A followed by D). The
server will keep running until the computer is restarted.
Unfortunately for some of you, this solution works neither under Windows nor
if you use a "real" web server instead of the one provided with Flask. More
importantly, it also tends to fail with some ML libraries that don't handle
concurrent access well.
If that's your case, your best solution is to create a sub-process (ugh) and
communicate with your model via IPC (double ugh).
Method 2: Inter-Process Communication (IPC)
Before we jump into the code, we need to understand who is going to talk to
whom, and how. It goes as follows: the ML model will run in its own process,
which we'll call the ML-process. This process can only be reached via a
multiprocessing queue, a first-in-first-out data structure where elements
are retrieved in the same order in which they were inserted and which can
be safely shared between multiple processes.
Whenever you make a request to the API, Flask creates a new process that we
will call a request-process. The first thing this process does is open a
Pipe. You can think of a pipe as a special pair of telephones that can only
talk to each other and where sound only comes out when someone is listening
- you can talk for hours into the receiver, but nothing will come out of the
other end until someone listens (in which case they'll get all of your
talking at once) or until the pipe is full. Whenever a request-process needs
to send a request to the ML-process, it does so as follows:
- As we said above, the request-process opens a Pipe. It has two ends, which
we'll call the 'source' and 'target' ends. Remember, though, that despite
their names communication can flow in both directions.
- The request-process puts some data in the 'source' end of the pipe.
Whenever someone picks up the 'target' end of the pipe they'll receive this
data.
- Next, the 'target' end of the pipe is put in the multiprocessing queue.
If we stick to our analogy, it would be the equivalent of having two
cell phones, putting one of them in a box, and mailing it to another person.
- And now, we wait.
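In code, and sticking to the standard library alone, the pipe half of this
handshake looks something like this minimal sketch:

```python
from multiprocessing import Pipe

# Create the two connected ends of the pipe
source_end, target_end = Pipe()

# Talking into one end does nothing visible yet...
source_end.send('some data')

# ...until someone listens on the other end
print(target_end.recv())  # prints 'some data'
```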
The ML-process is constantly monitoring the queue, and it will eventually
receive the 'target' end of the pipe that we put in it. I say "eventually"
because other processes are also trying to talk to the ML-process, and every
process has to wait for its turn. In our analogy, it is the equivalent of a
person receiving package after package, each one containing a cell phone.
Once the ML-process receives our 'target' end of the pipe, it extracts the
data, processes it, and sends the result back through the same 'target' end
it received earlier. Our request-process then retrieves the result from its
own end of the pipe and serves it back to the user who made the original
request.
The following code does exactly that:
```python
from flask import Flask
from multiprocessing import Process, SimpleQueue, Pipe

# This is the global queue that we use for communication
# between the ML-process and the request-processes
job_queue = SimpleQueue()

app = Flask(__name__)

# This is the process that will run the ML model
class MLProcess(Process):
    def __init__(self, queue):
        super(MLProcess, self).__init__()
        self.queue = queue
        # The slow initialization code should come here.
        # For this example, we just create a really bad cache
        self.cache = dict()

    def run(self):
        # Query the queue until we are told to stop
        stop = False
        while not stop:
            # Receive the next message
            incoming = self.queue.get()
            if incoming == 'shutdown':
                # We got the magic value that tells us to stop.
                # Make sure this value doesn't happen by accident!
                stop = True
            else:
                # `incoming` is a pipe and therefore I can read from it
                data = incoming.recv()
                # Do something with the data. In this case, we simply
                # convert it to lower case and store it in the cache,
                # but you would probably call an ML model here
                if data not in self.cache:
                    self.cache[data] = data.lower()
                # Send the result back to the process that requested it
                incoming.send(self.cache[data])
        # If your model requires any shutdown code, you would place it here.
        pass

# This is a normal API endpoint that will communicate with the ML-process
@app.route('/helloworld/<name>')
def hello_world(name):
    # Create both ends of a pipe
    my_end, other_end = Pipe()
    # Send the data through my end
    my_end.send(name)
    # Send the other end of the pipe via the queue
    job_queue.put(other_end)
    # This process will now wait forever for a reply
    # to come via its own end of the pipe
    result = my_end.recv()
    # Return the result from the model
    return 'Hello, {}'.format(result)

if __name__ == '__main__':
    ml_process = MLProcess(job_queue)
    ml_process.start()
    # Disable the reloader: with it, debug mode re-runs this script in a
    # child process and you would end up with two queues and two models
    app.run(debug=True, use_reloader=False)
    job_queue.put('shutdown')
    ml_process.join()
```
This code works well as long as there is perfect communication between all
moving parts. If the ML-process hangs, for instance, then no more data will
be returned and all request-processes will keep waiting forever for a reply
that will never come. The same will happen if you send the pipe to the
server but don't put any data in it. You can mitigate these problems with
the `poll` method of a Pipe (which checks whether there's any data to read
and returns immediately), but you should be aware that synchronization
errors are both common and mean to debug.
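As a minimal sketch of that mitigation (the 30-second timeout is an
arbitrary value of my choosing), the request-process could give up after a
while instead of blocking forever:

```python
# Wait up to 30 seconds for a reply instead of blocking forever
if my_end.poll(timeout=30):
    result = my_end.recv()
else:
    # No reply arrived in time - fail gracefully instead of hanging
    result = None
```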
Note also that we have a special value that we use for instructing the
ML-process to shut down - this is necessary to ensure that we clean up
everything before exiting our program, but make sure no client can send this
special value by accident!
Final thoughts
Is this a good idea? Probably not - if the developers of Flask themselves tell
you not to do something, then "don't do it" sounds like solid advice.
And we both know that there's nothing more permanent than a temporary solution.
Having said that, and as far as horrible hacks go, I like this one: if you are
a data scientist then you are not here to learn how web servers work nor to
have long discussions with your system administrator on what CGI means.
You are here to get stuff done and getting your ML model in front of users
as fast as possible is a great way to achieve that.
I used to know a way to extend this method to work with Apache (you know, a
"real" web server) but I honestly can't remember it right now. If you need some
help with that then reach out to me and I'll try to figure it out.
I have always wanted a smart home. I don't have a particular use for one; I
just think it's cool that I can yell at my living room and it will obey me.
And since next month I'll be moving to an empty flat, it is truly now or never.
This is the first post detailing what I hope will be a painless experience and
what I know will be a long list of frustrations. Today I will detail my general
plan, and in future entries I'll let you know how it all goes.
My aspirations for this first stage are modest: I want to be able to control
the lights in key spaces just by talking to a device. Not just turning them
on and off, mind you, but also dimming them to certain levels. I would also
like to have a smart mirror with my morning information (to-dos and weather,
mostly), but that is more of a stretch goal.
Starting from the top, I need smart light bulbs. Originally I planned on going
with Ikea's TRÅDFRI
light bulbs for their price, but I decided against them because they don't seem
to play too well with open platforms and because they require an extra hub.
I settled instead on the mid-priced Philips WiZ
because they connect directly over WiFi (unlike their more expensive cousins
from the Philips Hue line). I would have loved to use the cheap
Hama smart lighting options,
but my past experience with this company gives me little hope of their
protocols being open or, for that matter, good.
Another factor in favor of WiZ was that they are supported by
OpenHab thanks to the
heroic work of one volunteer.
Once everything is properly configured, I expect I'll be able to add
complex commands like "dim the lights to 60% after 17:00 if it's winter"
and the like.
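From what I've read, these bulbs listen for JSON commands on UDP port 38899
of the local network; assuming the community's reverse-engineered protocol
notes are right, dimming one directly would look something like this sketch
(the IP address is a placeholder, and the message format is an assumption):

```python
import json
import socket

# Placeholder address of a bulb on the local network
BULB_IP = '192.168.0.42'
WIZ_PORT = 38899  # the port WiZ bulbs reportedly listen on

# Reverse-engineered command: turn the bulb on at 60% brightness
command = {'method': 'setPilot', 'params': {'state': True, 'dimming': 60}}

# Send the command as a UDP datagram
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(json.dumps(command).encode('utf-8'), (BULB_IP, WIZ_PORT))
sock.close()
```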
The voice commands will be handled by Mycroft, the privacy-focused
alternative to Alexa and friends. I would really like to buy a Mark II,
but given their delivery times I fear that I'll have to install my own version
first (probably in my old notebook) and eventually migrate.
Lucky for me, Mycroft and OpenHab are good friends.
The final part is networking. If you are familiar with my blog, you may know
how much I care about privacy, which is typically a problem when you want to
install hardware that monitors your home 24/7. Therefore, all of the
above-mentioned services will run in their own isolated LAN with no
connection to the internet. Mycroft may get an exception depending on
whether I would like to ask it about the weather, but everything else will
stay isolated. This also guarantees that I don't lose control of my lights
when my internet is down.
I flashed my WiFi router with dd-wrt long ago, which allows me to set up
multiple networks and define who can talk to whom.
Progress so far
Given that I already have the light bulbs, I tried to set them up using the
Android WiZ app. This did not work: one key step of setting up the light bulbs
is to register them on the cloud (for whatever reason), and the closed network
made this impossible. I am fine in principle with the light bulbs phoning home
once and then never again (combined with a VPN, the information they would
expose would be minimal), but for that I would need internet and I still don't
have any.
I have also decided that two rooms will get "dumb" lights: the kitchen and the
bathroom. These rooms are not "chill" rooms but rather "be there with a purpose
and then leave" rooms, so there's no point in doing much with them.
And finally, one issue I haven't decided yet is what to do about the microphone.
Placing a Mycroft in the hallway would mean that I always need to yell at it,
but I don't like the idea of my neighbors knowing that I turned on my lights at
3 AM and I doubt my neighbors would like it either. My best alternative so far
is a small portable microphone - I read an interview sometime in the 90s about
how Bill Gates' mansion was controlled with pins you were supposed to wear, and
that seems reasonable enough to me. But I have yet to find something small enough.
Next steps
If I decide to go for the smart mirror,
this guide seems
like the way to go: I already have a Raspberry Pi I'm not using (it used to
be my NAS server) and an old laptop screen, and the guide's mirror film
approach is cheaper than those using two-way glass. The annoying part would
be finding the appropriate control circuits for the screen, which is a can
I've been kicking down the road for a couple of years now.
I would also like Mycroft to play my music, but that would require me to
install a NAS and set up all the networking correctly, which was not fun
with all devices in the same network and will probably not be fun here.
I'll let you know how this all works out.
Here's a movie idea no one asked for:
A group of teenagers have a dream of competing in the Paralympic Games, but
their dream is in danger when their beloved coach gets terribly sick.
And things don't look promising with their substitute coach either.
Sent there as punishment for his failures in the army, he is very strict,
belittles them constantly, and has no sense of humor. The coach's name?
Darth Vader.
Will the young athletes make it to the Space Paralympic Games?
Can the power of friendship prove to be stronger than the Dark Side?
Will Darth Vader find his humanity, with his failure at defending the
Death Star being the first falling domino in his eventual return to the Light Side?
I guess you'll have to watch the movie to find out!
(This is not the dumbest thing I've ever written, but it's probably Top 5)
In what must be one of the weirdest flexes ever, I can proudly say that I
keep my own email server. And as everyone who keeps their own email server
can attest, GMail is always annoyingly difficult to work with.
The latest version of this problem is that GMail stopped receiving my email
because I didn't have a PTR record, an error that's annoyingly difficult to
debug for two reasons: because the record is not really called "PTR", making
it slightly difficult to figure out what they want from me,
and because I know I've had one for years.
Long story short, the problem was that my server started using IPv6 and the
outgoing address didn't match the IPv4 PTR record I had.
So I solved it the way these problems are always solved:
I turned off IPv6, and everything works again.
I can't remember who gave me this advice for the first time, but I can confirm
that it works.
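In case you ever need to check this yourself, reverse DNS lookups are easy
enough to test from Python with `socket.gethostbyaddr` (a minimal sketch;
the addresses are placeholders for your server's real outgoing ones):

```python
import socket

# Placeholders for your mail server's outgoing addresses
addresses = ['203.0.113.7', '2001:db8::7']

for address in addresses:
    try:
        # gethostbyaddr performs the reverse (PTR) lookup
        hostname, _, _ = socket.gethostbyaddr(address)
        print(f'{address} -> {hostname}')
    except socket.herror:
        print(f'{address} has no PTR record')
```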
Now, if only I could manage to identify why Apache is not redirecting the
LetsEncrypt challenges properly...
Edit: This post is best read when listening to
Future Boy's "Computer Shop".
Warning: NSFW language!