Hope it's OK to jot down my notes about this issue here.
First of all, I appreciate the example in the OP a lot, because that is where I started as well - although it made me think `shared` is some built-in Python module, until I found a complete example at [Tutor] Global Variables between Modules ??.
However, when I looked for "sharing variables between scripts" (or processes) - besides the case when a Python script needs to use variables defined in other Python source files (but not necessarily running processes) - I mostly stumbled upon two other use cases:
- A script forks itself into multiple child processes, which then run in parallel (possibly on multiple processors) on the same PC
- A script spawns multiple other child processes, which then run in parallel (possibly on multiple processors) on the same PC
As such, most hits regarding "shared variables" and "interprocess communication" (IPC) discuss cases like these two; however, in both of these cases one can observe a "parent", to which the "children" usually have a reference.
What I am interested in, however, is running multiple invocations of the same script, run independently, and sharing data between those (as in Python: how to share an object instance across multiple invocations of a script), in a singleton/single instance mode. That kind of problem is not really addressed by the above two cases - instead, it essentially reduces to the example in the OP (sharing variables across two scripts).
Now, when dealing with this problem in Perl, there is IPC::Shareable, which "allows you to tie a variable to shared memory", using "an integer number or 4 character string[1] that serves as a common identifier for data across process space". Thus, there are no temporary files, nor networking setups - which I find great for my use case; so I was looking for the same in Python.
However, as the accepted answer by @Drewfer notes: "You're not going to be able to do what you want without storing the information somewhere external to the two instances of the interpreter"; or in other words: either you have to use a networking/socket setup - or you have to use temporary files (ergo, no shared RAM for "totally separate python sessions").
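To make the "temporary file" option concrete, here is a minimal sketch (written for Python 3, and not taken from any of the linked posts) of two independent invocations sharing an object through a pickled file. The helper names `save_state` and `load_state` and the file name are my own, hypothetical choices.

```python
import os
import pickle
import tempfile


def save_state(path, state):
    """Write `state` to `path`, replacing any previous contents."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump(state, fh)
    os.replace(tmp_path, path)  # rename, so readers never see a half-written file


def load_state(path, default=None):
    """Return the object stored at `path`, or `default` if no file exists yet."""
    try:
        with open(path, "rb") as fh:
            return pickle.load(fh)
    except FileNotFoundError:
        return default


if __name__ == "__main__":
    # Hypothetical location; every invocation of this script bumps the same counter.
    state_file = os.path.join(tempfile.gettempdir(), "shared_state_demo.pkl")
    state = load_state(state_file, default={"runs": 0})
    state["runs"] += 1
    save_state(state_file, state)
    print("this script has run", state["runs"], "time(s)")
```

Note that there is no locking here: if two invocations do the read-modify-write at the same moment, one update can be lost, so this only suits cases where that is acceptable.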
Now, even with these considerations, it is kinda difficult to find working examples (except for `pickle`) - also in the docs for `mmap` and `multiprocessing`. I have managed to find some other examples - which also describe some pitfalls that the docs do not mention:
- Usage of `mmap`: working code in two different scripts at Sharing Python data between processes using mmap | schmichael's blog
  - Demonstrates how both scripts change the shared value
  - Note that here a temporary file is created as storage for saved data - `mmap` is just a special interface for accessing this temporary file
- Usage of `multiprocessing`: working code at:
  - Python multiprocessing RemoteManager under a multiprocessing.Process - working example of `SyncManager` (via `manager.start()`) with shared `Queue`; server(s) write, clients read (shared data)
  - Comparison of the multiprocessing module and pyro? - working example of `BaseManager` (via `server.serve_forever()`) with shared custom class; server writes, client reads and writes
  - How to synchronize a python dict with multiprocessing - this answer has a great explanation of `multiprocessing` pitfalls, and is a working example of `SyncManager` (via `manager.start()`) with shared dict; server does nothing, client reads and writes
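As a rough illustration of the `mmap` approach from the first bullet (my own Python 3 sketch, not the code from the linked blog), the following maps one file twice and shows that a write through one mapping is visible through the other. The function names and the 8-byte integer layout are assumptions made for the example; in real use, each mapping would be opened by a different script.

```python
import mmap
import os
import struct
import tempfile

FORMAT = "<q"                      # one signed 64-bit integer (assumed layout)
SIZE = struct.calcsize(FORMAT)     # 8 bytes


def create_shared_file(path, value=0):
    """Create the backing file; mmap needs a file of non-zero length."""
    with open(path, "wb") as fh:
        fh.write(struct.pack(FORMAT, value))


def open_mapping(path):
    """Return (file object, mmap) for the backing file; the caller closes both."""
    fh = open(path, "r+b")
    return fh, mmap.mmap(fh.fileno(), SIZE)


def read_value(mapping):
    return struct.unpack_from(FORMAT, mapping, 0)[0]


def write_value(mapping, value):
    struct.pack_into(FORMAT, mapping, 0, value)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "shared_value.bin")
    create_shared_file(path, 1)
    fh_a, map_a = open_mapping(path)   # would live in script A
    fh_b, map_b = open_mapping(path)   # would live in script B
    write_value(map_a, 42)
    print("value seen through the second mapping:", read_value(map_b))
    for obj in (map_a, map_b, fh_a, fh_b):
        obj.close()
```

As the bullet above says, the file is still there as backing storage; `mmap` only changes how it is accessed.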
Thanks to these examples, I came up with an example, which essentially does the same as the `mmap` example, with approaches from the "synchronize a python dict" example - using `BaseManager` (via `manager.start()` through file path address) with shared list; both server and client read and write (pasted below). Note that:
- `multiprocessing` managers can be started either via `manager.start()` or `server.serve_forever()`
  - `serve_forever()` blocks - `start()` doesn't
  - There is an auto-logging facility in `multiprocessing`: it seems to work fine with `start()`ed processes - but seems to ignore the ones that `serve_forever()`
- The address specification in `multiprocessing` can be IP (socket) or temporary file (possibly a pipe?) path; in `multiprocessing` docs:
  - Most examples use `multiprocessing.Manager()` - this is just a function (not class instantiation) which returns a `SyncManager`, which is a special subclass of `BaseManager`; and uses `start()` - but not for IPC between independently run scripts; here a file path is used
  - Few other examples use the `serve_forever()` approach for IPC between independently run scripts; here an IP/socket address is used
  - If an address is not specified, then a temp file path is used automatically (see 16.6.2.12. Logging for an example of how to see this)
In addition to all the pitfalls in the "synchronize a python dict" post, there are additional ones in case of a list. That post notes:
All manipulations of the dict must be done with methods and not dict assignments (syncdict["blast"] = 2 will fail miserably because of the way multiprocessing shares custom objects)
The workaround to `dict['key']` getting and setting is the use of the `dict` public methods `get` and `update`. The problem is that there are no such public methods as an alternative for `list[index]`; thus, for a shared list, in addition we have to register the `__getitem__` and `__setitem__` methods (special methods of `list`, whose names start with an underscore and so are not exposed by default) as `exposed`, which means we also have to re-register all the public methods for `list` as well :/
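The effect of `exposed` can be tried in a single process before splitting things into two scripts. The following is a Python 3 sketch under my own assumptions: the server runs in a background thread via `get_server().serve_forever()` purely for demonstration, the two proxies stand in for two separate clients, and the names `ListManager`, `start_demo_server` and `connect_proxy` are hypothetical.

```python
import threading
from multiprocessing.managers import BaseManager

# Only the names listed here can be called through the proxy.
EXPOSED = ["__getitem__", "__setitem__", "append"]

_backing_list = []  # the real list; it lives on the serving side


class ListManager(BaseManager):
    pass


ListManager.register("syncarr", callable=lambda: _backing_list, exposed=EXPOSED)


def start_demo_server(authkey=b"demo"):
    """Serve the list from a daemon thread and return the address actually used."""
    manager = ListManager(address=("127.0.0.1", 0), authkey=authkey)
    server = manager.get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def connect_proxy(address, authkey=b"demo"):
    """Connect like a client script would and return a proxy to the list."""
    client = ListManager(address=address, authkey=authkey)
    client.connect()
    return client.syncarr()


if __name__ == "__main__":
    address = start_demo_server()
    first = connect_proxy(address)
    second = connect_proxy(address)
    first.append(140)
    first.__setitem__(0, 250)
    print("second proxy sees item 0 as:", second.__getitem__(0))
```

Dropping `"__getitem__"` from `EXPOSED` should make the last call fail, which is the pitfall described above.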
Well, I think those were the most critical things; these are the two scripts - they can just be run in separate terminals (server first); note that they were developed on Linux with Python 2.7:
`a.py` (server):

    import multiprocessing
    import multiprocessing.managers

    import logging
    logger = multiprocessing.log_to_stderr()
    logger.setLevel(logging.INFO)


    class MyListManager(multiprocessing.managers.BaseManager):
        pass


    syncarr = []
    def get_arr():
        return syncarr

    def main():

        # print dir([]) # cannot do `exposed = dir([])`!! manually:
        MyListManager.register("syncarr", get_arr, exposed=['__getitem__', '__setitem__', '__str__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'])

        manager = MyListManager(address=('/tmp/mypipe'), authkey='')
        manager.start()

        # we don't use the same name as `syncarr` here (although we could);
        # just to see that `syncarr_tmp` is actually a proxy object (AutoProxy), not a plain list,
        # so we also have to expose `__str__` method in order to print its list values!
        syncarr_tmp = manager.syncarr()
        print("syncarr (master):", syncarr, "syncarr_tmp:", syncarr_tmp)
        print("syncarr initial:", syncarr_tmp.__str__())

        syncarr_tmp.append(140)
        syncarr_tmp.append("hello")

        print("syncarr set:", str(syncarr_tmp))

        raw_input('Now run b.py and press ENTER')

        print
        print 'Changing [0]'
        syncarr_tmp.__setitem__(0, 250)

        print 'Changing [1]'
        syncarr_tmp.__setitem__(1, "foo")

        new_i = raw_input('Enter a new int value for [0]: ')
        syncarr_tmp.__setitem__(0, int(new_i))

        raw_input("Press any key (NOT Ctrl-C!) to kill server (but kill client first)".center(50, "-"))
        manager.shutdown()

    if __name__ == '__main__':
        main()

`b.py` (client):

    import time

    import multiprocessing
    import multiprocessing.managers

    import logging
    logger = multiprocessing.log_to_stderr()
    logger.setLevel(logging.INFO)


    class MyListManager(multiprocessing.managers.BaseManager):
        pass

    MyListManager.register("syncarr")

    def main():
        manager = MyListManager(address=('/tmp/mypipe'), authkey='')
        manager.connect()
        syncarr = manager.syncarr()

        print "arr = %s" % (dir(syncarr))

        # note here we need not bother with __str__
        # syncarr can be printed as a list without a problem:
        print "List at start:", syncarr
        print "Changing from client"
        syncarr.append(30)
        print "List now:", syncarr

        o0 = None
        o1 = None

        while 1:
            new_0 = syncarr.__getitem__(0) # syncarr[0]
            new_1 = syncarr.__getitem__(1) # syncarr[1]

            if o0 != new_0 or o1 != new_1:
                print 'o0: %s => %s' % (str(o0), str(new_0))
                print 'o1: %s => %s' % (str(o1), str(new_1))
                print "List is:", syncarr

                print 'Press Ctrl-C to exit'
                o0 = new_0
                o1 = new_1

            time.sleep(1)


    if __name__ == '__main__':
        main()

As a final remark, on Linux `/tmp/mypipe` is created - but is 0 bytes, and has attributes `srwxr-xr-x` (for a socket); I guess this makes me happy, as I neither have to worry about network ports, nor about temporary files as such :)
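The same kind of check can be done from Python. This is a small sketch of my own (Python 3, Unix-like systems only), which binds a throwaway Unix domain socket and reports how the filesystem describes it; `describe_socket_file` and the path used are hypothetical.

```python
import os
import socket
import stat
import tempfile


def describe_socket_file(path):
    """Bind a Unix domain socket at `path`; return (is_socket, mode string, size)."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(path)  # this is what creates the filesystem entry
        info = os.stat(path)
        return stat.S_ISSOCK(info.st_mode), stat.filemode(info.st_mode), info.st_size
    finally:
        server.close()
        if os.path.exists(path):
            os.unlink(path)  # bind() leaves the entry behind, so remove it


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.sock")
    is_socket, mode, size = describe_socket_file(demo_path)
    print("is a socket:", is_socket, "| mode:", mode, "| size in bytes:", size)
```

The leading `s` in the mode string is the same marker that `ls -l` shows for `/tmp/mypipe`.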
Other related questions:
- Python: Possible to share in-memory data between 2 separate processes (very good explanation)
- Efficient Python to Python IPC
- Python: Sending a variable to another script