Documentation
Executor
The primary object in pywren is an executor. The standard way to set everything up is to import pywren and call the default_executor function.
default_executor(**kwargs)
Initialize and return an executor object.
Parameters:
- config – Settings passed in here will override those in pywren_config. Default None.
- job_max_runtime – Max time per lambda. Default 200.
Returns: executor object.
- Usage
>>> import pywren
>>> pwex = pywren.default_executor()
default_executor() reads your pywren_config and returns an executor object that’s ready to go. We can use this executor to run map, which applies a function to a list of data in the cloud.
Executor.map(func, iterdata, extra_env=None, extra_meta=None, invoke_pool_threads=64, data_all_as_one=True, use_cached_runtime=True, overwrite_invoke_args=None, exclude_modules=None)
Parameters:
- func – the function to map over the data
- iterdata – An iterable of input data
- extra_env – Additional environment variables for lambda environment. Default None.
- extra_meta – Additional metadata to pass to lambda. Default None.
- invoke_pool_threads – Number of threads to use to invoke. Default 64.
- data_all_as_one – Upload the data as a single object. Default True.
- use_cached_runtime – Use cached runtime whenever possible. Default True.
- overwrite_invoke_args – Overwrite other args. Mainly used for testing.
- exclude_modules – Explicitly keep these modules from pickled dependencies.
Returns: A list with size len(iterdata) of futures for each job
Return type: list of futures.
- Usage
>>> futures = pwex.map(foo, data_list)
map returns a list of futures, each of which can return its result once the task is complete.
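As a fuller illustration of the calls above, the following sketch wraps them in a function. It is a minimal sketch, assuming pywren is installed and a pywren_config with AWS credentials is in place; add_one and run_on_pywren are hypothetical names, and the job_max_runtime argument is only an example of overriding a documented setting. The function is defined but not called here, since calling it launches real lambda jobs.

```python
def add_one(x):
    """Function applied to each input (hypothetical example)."""
    return x + 1


def run_on_pywren(data, job_max_runtime=200):
    """Map add_one over data with pywren and return one result per future.

    Assumes pywren is installed and configured; nothing here touches
    the cloud until this function is called.
    """
    import pywren  # imported lazily so this module loads without pywren

    pwex = pywren.default_executor(job_max_runtime=job_max_runtime)
    futures = pwex.map(add_one, data)      # one future per element of data
    return [f.result() for f in futures]   # blocks until each job finishes
```

Calling run_on_pywren([1, 2, 3]) would then submit three jobs, one per list element.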
class ResponseFuture(call_id, callset_id, invoke_metadata, storage_path)
Object representing the result of a PyWren invocation. Returns the status of the execution and the result when available.
result(timeout=None, check_only=False, throw_except=True, storage_handler=None)
Return the value returned by the call. If the call raised an exception, this method will raise the same exception. If the future is cancelled before completing, CancelledError will be raised.
Parameters:
- timeout – This method will wait up to timeout seconds before raising a TimeoutError if the function hasn’t completed. If None, wait indefinitely. Default None.
- check_only – Return None immediately if job is not complete. Default False.
- throw_except – Reraise the exception if the call raised. Default True.
- storage_handler – Storage handler to poll cloud storage. Default None.
Returns: Result of the call.
Raises:
- CancelledError – If the job is cancelled before it completes.
- TimeoutError – If the job is not complete after timeout seconds.
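Because result re-raises the exception from the remote call, one failed job can abort a loop over many futures. A minimal sketch of collecting successes and failures separately follows. collect is a hypothetical helper, and the futures in the demonstration are standard-library concurrent.futures.Future objects used as local stand-ins, not pywren ResponseFuture instances.

```python
from concurrent.futures import Future


def collect(futures, timeout=None):
    """Split futures into (results, errors) using only result(timeout=...)."""
    results, errors = [], []
    for f in futures:
        try:
            results.append(f.result(timeout=timeout))
        except Exception as exc:  # the call's own exception, or a timeout
            errors.append((f, exc))
    return results, errors


# Local stand-ins (assumption: they mimic one finished and one failed job).
ok = Future()
ok.set_result(3)
bad = Future()
bad.set_exception(ValueError("boom"))

results, errors = collect([ok, bad])
print(results)                      # [3]
print(type(errors[0][1]).__name__)  # ValueError
```

Keeping the future next to its exception makes it possible to tell which input failed.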
Waiting for the results
wait(fs, return_when=1, THREADPOOL_SIZE=64, WAIT_DUR_SEC=5)
Wait for the Future instances fs to complete. Returns a 2-tuple of lists. The first list contains the futures that completed (finished or cancelled) before the wait completed. The second contains uncompleted futures.
Parameters:
- fs – A list of futures.
- return_when – One of ALL_COMPLETED, ANY_COMPLETED, ALWAYS.
- THREADPOOL_SIZE – Number of threads to use. Default 64.
- WAIT_DUR_SEC – Time interval, in seconds, between each check. Default 5.
Returns: (fs_dones, fs_notdones) where fs_dones is a list of futures that have completed and fs_notdones is a list of futures that have not completed.
Return type: 2-tuple of lists
- Usage
>>> futures = pwex.map(foo, data)
>>> dones, not_dones = wait(futures, ALL_COMPLETED)
>>> # not_dones should be an empty list.
>>> results = [f.result() for f in dones]
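With ANY_COMPLETED, results can be consumed as jobs finish rather than after the whole batch. The loop below is a sketch of that pattern, assuming ANY_COMPLETED makes wait return as soon as at least one future is done. drain is a hypothetical helper that receives the wait function and the return_when constant as arguments, because this document does not show where pywren's wait and ANY_COMPLETED are imported from. The demonstration runs locally with the standard library's concurrent.futures.wait and FIRST_COMPLETED as stand-ins.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor
from concurrent.futures import wait as local_wait


def drain(futures, wait, any_completed):
    """Gather results in completion order by waiting repeatedly."""
    pending = list(futures)
    results = []
    while pending:
        # wait returns (done, not done); keep waiting on the remainder
        dones, pending = wait(pending, return_when=any_completed)
        results.extend(f.result() for f in dones)
    return results


# Local stand-in run: four small jobs on a thread pool.
with ThreadPoolExecutor(max_workers=2) as pool:
    local_futures = [pool.submit(pow, 2, n) for n in range(4)]
    drained = drain(local_futures, local_wait, FIRST_COMPLETED)

print(sorted(drained))  # [1, 2, 4, 8]
```

The results arrive in completion order, so the demonstration sorts them before printing.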
Alternatively, if you want to wait for everything to finish and then get all of the results, you can simply call get_all_results.
get_all_results(fs)
Take in a list of futures and block until they are completed. Call result on each one individually, and return those results.
Parameters: fs – a list of futures.
Returns: A list of the results of each future.
Return type: list
- Usage
>>> pwex = pywren.default_executor()
>>> futures = pwex.map(foo, data)
>>> results = get_all_results(futures)
Standalone Mode
To run pywren in standalone mode, run
pywren standalone launch_instances 2
This launches EC2 instances to run pywren. Once the instances are ready, you can run pywren as usual. The instance type is set in the pywren_config file, under the standalone key. By default, the instance type is set to m4.4xlarge.
>>> import pywren
>>> pwex = pywren.standalone_executor()
>>> futures = pwex.map(func, data)
If you set max_idle_time when launching, the EC2 instances will terminate themselves when idle. Otherwise, you need to shut them down explicitly.
pywren standalone terminate_instances
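When a script drives the whole lifecycle, forgetting the terminate step leaves instances running. One way to guard against that is sketched below, assuming the pywren command is on PATH. launch_command, TERMINATE_COMMAND, and standalone_instances are hypothetical names, and only the subcommands and the --max_idle_time option shown in this document are used. Nothing is executed until standalone_instances is entered.

```python
import subprocess
from contextlib import contextmanager

# Hypothetical constant wrapping the documented terminate command.
TERMINATE_COMMAND = ["pywren", "standalone", "terminate_instances"]


def launch_command(number, max_idle_time=None):
    """Build the argv list for `pywren standalone launch_instances`."""
    cmd = ["pywren", "standalone", "launch_instances"]
    if max_idle_time is not None:
        cmd += ["--max_idle_time", str(max_idle_time)]
    cmd.append(str(number))
    return cmd


@contextmanager
def standalone_instances(number, max_idle_time=None):
    """Launch instances on entry and always request termination on exit."""
    subprocess.run(launch_command(number, max_idle_time), check=True)
    try:
        yield
    finally:
        subprocess.run(TERMINATE_COMMAND, check=True)


print(launch_command(2))
# ['pywren', 'standalone', 'launch_instances', '2']
```

This document does not say whether launch_instances returns only once the instances are ready, so code inside the with block may still need to wait before submitting work.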
Standalone Commands
# Launch EC2 instances.
> pywren standalone launch_instances --help
Usage: pywren standalone launch_instances [OPTIONS] [NUMBER]
Options:
  --max_idle_time INTEGER         instance queue idle time before checking
                                  self-termination
  --idle_terminate_granularity INTEGER
                                  granularity of billing (sec)
  --pywren_git_branch TEXT        which branch to use on the stand-alone
  --pywren_git_commit TEXT        which git to use on the stand-alone
                                  (supercedes pywren_git_branch)
  --spot_price FLOAT              use spot instances, at this reserve price
# List all running EC2 instances.
> pywren standalone list_instances
# Shut down all running EC2 instances.
> pywren standalone terminate_instances