papermill | Parameterize, execute, and analyze notebooks | Machine Learning library
kandi X-RAY | papermill Summary
Top functions reviewed by kandi - BETA
- Run a papermill notebook
- Check if value is an integer
- Return True if value is a float
- Convert value to python type
- Execute a managed notebook
- Ends a cell
- Execute the notebook with kwargs
- Execute all cells
- Read the contents of a repository
- Translate a float value
- Translate a dictionary
- Generate code for a parameter
- Load entry points
- Write data to store
- Return the version number
- Write content to blob
- List all files in a given blob
- List objects in this bucket
- Reads a file into a list of lines
- Read blob from Blob service
- Translate a floating point value
- Write data to a file
- List files in S3
- Generator of lines from source
- Upload a file to S3
- Returns a list of all folders in the specified url
papermill Key Features
papermill Examples and Code Snippets
$ kbatch job submit --name=list-files \
--command='["ls", "-lh"]' \
--image=alpine
$ kbatch job submit --name=test \
--image="mcr.microsoft.com/planetary-computer/python" \
--command='["papermill", "notebook.ipynb"]' \
--file=note
/
├── NotebookScheduler.py
└── hourly/
    ├── notebook1.ipynb
    ├── notebook2.ipynb
    └── snapshots/
        ├── notebook1/
        │   └── .../
        │       └── notebook1.ipynb
        └── notebook2/
            └── ...
$ papersweep --help
usage: papersweep [-h] [--pm_params PM_PARAMS] [--sweep_id SWEEP_ID] [--pdb] [--xtra XTRA] input_nb sweep_config entity project
positional arguments:
input_nb Input notebook
sweep_config YAML file with t
jupyter nbconvert --to pdf --execute U-Run.ipynb
papermill U-Run.ipynb U-Run-2.ipynb
jupyter nbconvert --to pdf U-Run-2.ipynb
#%% tags=["parameters"]
my_text = ""
print(my_text)
conda install -c conda-forge papermill
import papermill as pm
from pathlib import Path
for nb in Path('./run_all').glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb,  # path to the executed output; here each notebook is overwritten in place
    )
t1.start() # <- Started
t2.start() # <- Started
# t1 and t2 executing concurrently
t1.join()
t2.join()
# wait for both to finish
t3.start()
t3.join()
/your/dir/>papermill "yournotebook.ipynb" "your notebook output.ipynb" --log-output --log-level DEBUG --progress-bar
papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
nbconvert --execute
/opt/anaconda/envs/yourenv/bin/jupyter nbconvert \
--execute \
dag = DAG()
get_data = PythonCallable(_get_data,
                          product=File('raw.parquet'),
                          dag=dag)
clean_data = PythonCallable(_clean_data,
                            product=File('clean.parquet'),
                            dag=dag)
Community Discussions
Trending Discussions on papermill
QUESTION
I am trying to migrate from Google Cloud Composer composer-1.16.4-airflow-1.10.15 to composer-2.0.1-airflow-2.1.4. However, we are running into difficulties with the libraries: each time I upload the libs, the scheduler fails to work.
Here is my requirements.txt:
...ANSWER
Answered 2022-Mar-27 at 07:04 We have found out what was happening. The root cause was the performance of the workers. To work properly, Composer expects the scanning of the DAGs to take less than 15% of the CPU resources. If it exceeds this limit, it fails to schedule or update the DAGs. We simply switched to bigger workers and that has worked well.
QUESTION
I have a multi-stage Dockerfile for a JupyterLab image. There are three stages:
- server
- kernel: mamba create -y -n /tmp/kernel-packages/myenv ...
- runner
ANSWER
Answered 2022-Mar-22 at 17:03 The --prefix argument is the equivalent - just that some Conda packages use hardcoded paths, hence the issue you encounter.
conda-prefix-replacement
To properly move a Conda environment to a new prefix via a COPY operation, one would need to run the conda-prefix-replacement tool (a.k.a. cpr) to ensure that all files with hardcoded paths get updated to the new location. Presumably, conda-pack does a similar thing, just under the hood.
For your purposes, you might consider pre-running cpr on the environment(s) in the kernel image so that they are ready to work in the deployed location. Though that would mean always COPYing to the same location.
See the cpr repository for details on its use.
QUESTION
I have a script in the Jupyter notebook, which creates interactive graphs for the data set that it is provided. I then convert the output as an HTML file without inputs to create a report for that dataset to be shared with my colleagues. I have also used papermill to parametrize the process, that I send it the name of the file and it creates a report for me. All the datasets are stored in Azure datalake.
Now it is all very easy when I am doing it in my local machine, but I want to automate the process to generate reports for the new incoming datasets every hour and store the HTML outputs in the azure datalake, I want to run this automation on the cloud.
I initially began with Automation accounts, but I did not know how to execute a Jupyter notebook in them, or where to store my .ipynb file. I have also looked at the JupyterHub server (VM) on Azure, but I am unable to understand how to automate it as well.
Can any one help me with a way to automate the entire process on the Azure Cloud in the cheapest way possible, because I have to generate a lot of reports.
Thanks!
...ANSWER
Answered 2022-Mar-03 at 05:31 Apart from Automation, you can use Azure Functions as mentioned in this document:
- To run a PowerShell-based Jupyter Notebook, you can use PowerShell in an Azure Function to call the Invoke-ExecuteNotebook cmdlet. This is similar to the technique described above for Automation jobs. For more information, see the Azure Functions PowerShell developer guide.
- To run a SQL-based Jupyter Notebook, you can use PowerShell in an Azure Function to call the Invoke-SqlNotebook cmdlet. For more information, see the Azure Functions PowerShell developer guide.
- To run a Python-based Jupyter Notebook, you can use Python in an Azure Function to call papermill. For more information, see the Azure Functions Python developer guide.
References: Run Jupyter Notebook on the Cloud in 15 mins #Azure | by Anish Mahapatra | Towards Data Science, How to run a Jupyter notebook with Python code automatically on a daily basis? - Stack Overflow and Scheduled execution of Python script on Azure - Stack Overflow
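For the Python option, a timer-triggered Azure Function can call papermill directly. The sketch below assumes the Python v1 programming model and placeholder notebook paths; it is an illustration, not code from the linked guides:
import azure.functions as func
import papermill as pm

def main(mytimer: func.TimerRequest) -> None:
    # Hypothetical paths: in practice the template notebook would be pulled
    # from the data lake and the executed output written back to it.
    pm.execute_notebook(
        "report_template.ipynb",
        "report_output.ipynb",
        parameters={"dataset_name": "latest"},  # placeholder parameter
    )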
QUESTION
I know that several similar questions exist on this topic, but to my knowledge all of them concern async code (wrongly) written by the user, while in my case it comes from a Python package.
I have a Jupyter notebook whose first cell is
...ANSWER
Answered 2022-Feb-22 at 08:27 Seems to be a bug in ipykernel 6.9.0 - options that worked for me:
- upgrade to 6.9.1 (latest version as of 2022-02-22), e.g. via pip install ipykernel --upgrade
- downgrade to 6.8.0 (if upgrading messes with other dependencies you might have), e.g. via pip install ipykernel==6.8.0
QUESTION
Is it possible to add papermill parameters to a Jupyter notebook manually, e.g. in an editor? Is it possible to add papermill parameters to a .py file and have them persist when converted to an .ipynb file?
Context:
I am working on running jupyter notebooks in an automated way via Papermill. I would like to add parameters to a notebook manually rather than using jupyter or jupyter lab interfaces. Ideally these parameters could be added to a python script .py file first. Upon converting the .py file to a .ipynb file the parameters would persist.
My desired workflow looks like this:
- Store generic notebook as a < notebook >.py file with parameters in version control repository
- Convert python script < notebook >.py to jupyter notebook < notebook >.ipynb
- Run < notebook >.ipynb via papermill and pass parameters to it
- Use nbconvert to produce output pdf with no code (using exclude_input argument)
Steps 1-3 will be run via a script that can be automated. I want to use jupytext to avoid storing the notebooks and all their associated metadata. Currently, the only way I can find to add parameters to a notebook is to add them via the Jupyter/JupyterLab interfaces. Then I can run the notebook with papermill. However, this doesn't work with the jupytext conversion.
*Note: I would have added the "jupytext" tag to this but it doesn't exist yet and I don't have enough rep to create it.
EDIT: gooseberry's answer appears to be the correct one.
However, it doesn't actually appear to be necessary to add a parameters tag to your notebook in order to inject parameters via papermill. While papermill will give a "no cell with tag parameters found in notebook" warning, it will still inject the parameters. Additionally, your output notebook from papermill will have a new cell:
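The injected cell typically looks something like the following (the names and values here are placeholders for illustration, not taken from the original question):
# Parameters (cell injected by papermill)
alpha = 0.6
ratio = 0.1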
ANSWER
Answered 2021-Oct-22 at 05:21 It depends which of the formats you chose for your .py files, but assuming you've chosen the "percent" format, where each new cell is marked with #%%, to add tags you write:
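For example, a parameters cell in a percent-format .py file can be tagged like this (a minimal sketch; alpha and ratio are placeholder names):
# %% tags=["parameters"]
alpha = 0.1   # default value, overridden by papermill at run time
ratio = 0.5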
QUESTION
I am trying to run jupyter notebooks in parallel by starting them from another notebook. I'm using papermill to save the output from the notebooks.
In my scheduler.ipynb I'm using multiprocessing, which is what some people have had success with. I create processes from a base notebook and this seems to always work the first time it's run. I can run 3 notebooks with sleep 10 in 13 seconds. If I have a subsequent cell that attempts to run the exact same thing, the processes that it spawns (multiple notebooks) hang indefinitely. I've tried adding code to make sure the spawned processes have exit codes and have completed, even calling terminate on them once they are done - no luck, my 2nd attempt never completes.
If I do:
...ANSWER
Answered 2021-Apr-20 at 15:50 Have you tried using the subprocess module? It seems like a better option for you than multiprocessing. It allows you to spawn sub-processes that run in parallel, and it can be used to invoke commands and programs as if you were using the shell. I find it really useful for writing Python scripts instead of bash scripts.
So you could use your main notebook to run your other notebooks as independent sub-processes in parallel with subprocess.run(your_command_with_papermill).
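A minimal sketch of that approach, assuming the papermill CLI is on the PATH and using placeholder notebook names:
import subprocess

notebooks = ["nb1.ipynb", "nb2.ipynb", "nb3.ipynb"]  # hypothetical notebooks

# Start each papermill run as an independent OS process; Popen returns
# immediately, so the runs execute in parallel.
procs = [
    subprocess.Popen(["papermill", nb, nb.replace(".ipynb", "-out.ipynb")])
    for nb in notebooks
]

# Wait for every run to finish and collect the exit codes.
exit_codes = [p.wait() for p in procs]
print(exit_codes)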
QUESTION
I am using the papermill library to run multiple notebooks using multiprocessing simultaneously.
This is occurring on Python 3.6.6, Red Hat 4.8.2-15 within a Docker container.
However, when I run the Python script, about 5% of my notebooks fail immediately (no Jupyter Notebook cells run) because I receive this error:
...ANSWER
Answered 2020-Jun-07 at 17:13 The problem clearly points towards ZeroMQ being unable to successfully .bind(). In case you have never worked with ZeroMQ, you may enjoy first looking at "ZeroMQ: Principles in less than Five Seconds" before diving into further details.
The error message zmq.error.ZMQError: Address already in use is the easier one to explain. While ZeroMQ access points can freely try to .connect() to many counterparts, one and only one can .bind() onto a particular transport-class address target.
There are three potential reasons for this happening:
1) accidentally calling some code (without knowing its internal details) via replicas spawned by { multiprocessing.Process | joblib.Parallel | Docker-wrapped | ... }, each of which tries to acquire ownership of the same ZeroMQ transport-class address; for obvious reasons, every attempt after the first successful one will fail.
2) a rather fatal situation, where some previously run process did not manage to release such a transport-class-specific address for further use (do not forget that ZeroMQ might be just one of several interested candidates - a configuration-management flaw), or where previous runs failed to gracefully terminate and left behind a Context() instance still listening (in some cases until an O/S reboot) for something that will never happen.
3) a genuinely bad engineering practice in the module's software design: not handling the documented ZeroMQ EADDRINUSE error/exception any less brutally than by crashing the whole circus (at all the associated costs).
The other error message, RuntimeError: Kernel died before replying to kernel_info, relates to a state where the notebook's kernel spent so long trying to establish all internal connections with its own components (pool peers) that it exceeded a configured or hardcoded timeout; the kernel process simply stopped waiting and threw the otherwise unhandled exception you observed and reported.
Check for any hanging address owners first, and reboot all nodes if in doubt. Next, verify there are no colliding attempts "hidden" inside your own code or { multiprocessing.Process() | joblib.Parallel() | ... } calls which, once distributed, may try to .bind() onto the same target. If none of these steps resolves the trouble within your domain of control, ask the support channel of the module you use to analyze and help you refactor and validate your still-colliding use case.
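As a quick check for the first two points (an illustration, not part of the original answer), you can probe whether an address is still owned by some process by trying to bind a throwaway socket to it:
import socket

def port_is_free(port, host="127.0.0.1"):
    # Try to bind a throwaway socket; an OSError (EADDRINUSE) means some
    # process still owns the address.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

print(port_is_free(5555))  # 5555 is just an example port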
QUESTION
I have three functions that execute 3 different jupyter notebooks using papermill and I want the first (job1) and second (job2) functions to run concurrently and the last function (job3) to run only once the first function (job1) has finished running without any errors. I'm not sure if it makes sense to create a new thread for the second function or how to use the join() method appropriately. I'm running on Windows and for some reason concurrent.futures and multiprocessing don't work, which is why I'm using the thread module.
...ANSWER
Answered 2020-Apr-23 at 16:23 I like to start off by visualizing the desired flow.
This means that t1 and t2 need to start concurrently and then you need to join on both:
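A minimal sketch of that flow, with placeholder notebook names standing in for the asker's three jobs:
import threading
import papermill as pm

def run_notebook(name):
    # Each job is one papermill execution of a hypothetical notebook.
    pm.execute_notebook(name + ".ipynb", name + "-output.ipynb")

t1 = threading.Thread(target=run_notebook, args=("job1",))
t2 = threading.Thread(target=run_notebook, args=("job2",))
t3 = threading.Thread(target=run_notebook, args=("job3",))

t1.start()
t2.start()   # t1 and t2 execute concurrently
t1.join()
t2.join()    # wait for both to finish
t3.start()   # job3 starts only after both joins return
t3.join()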
QUESTION
I have several notebooks which are run by a "driver" notebook using papermill. These notebooks use the scrapbook library to communicate information to the driver. The driver then passes this information as parameters to other notebooks. I want to use EMR Notebooks to optimize the execution efficiency of this "notebook pipeline". Does AWS EMR Notebooks support scrapbook and papermill, or will I need to refactor my notebooks?
...ANSWER
Answered 2020-Feb-06 at 17:09 As of now, nope. You can't do that directly. What you can do though (what we are doing) is as follows:
- Create a Python environment on your EMR master node using the hadoop user
- Install sparkmagic in your environment and configure all kernels as described in the README.md file for sparkmagic
- Copy your notebook to the master node, or use it directly from its s3 location
- Install papermill and run with papermill:
papermill s3://path/to/notebook/input.ipynb s3://path/to/notebook/output.ipynb -p param=1
QUESTION
I've been tasked with automating the scheduling of some daily notebooks that live on AI Platform Notebooks, using the Papermill operator, but actually doing this through Cloud Composer is giving me some trouble.
Any help is appreciated!
...ANSWER
Answered 2020-Jan-30 at 10:41 The first step is to create a JupyterLab notebook. If you want to use additional libraries, install them and restart the kernel (the Restart Kernel and Clear All Outputs option). Then define the processing inside your notebook.
When it's ready, remove all the runs, peeks and dry runs before you start the scheduling phase.
Now you need to set up the Cloud Composer environment (remember to install the additional packages that you defined in the first step). To schedule the workflow, go to JupyterLab and create a second notebook which generates the DAG from the workflow.
The final step is to upload the zipped workflow to the Cloud Composer DAGs folder. You can manage your workflow using the Airflow UI.
I recommend taking a look at this article.
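If you go the Airflow route, the Papermill provider ships a ready-made operator; the following is only a sketch with assumed paths and schedule, not the article's exact code:
from datetime import datetime
from airflow import DAG
from airflow.providers.papermill.operators.papermill import PapermillOperator

with DAG(
    dag_id="daily_notebook_run",        # placeholder DAG id
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_notebook = PapermillOperator(
        task_id="run_notebook",
        input_nb="gs://my-bucket/notebooks/report.ipynb",           # placeholder input path
        output_nb="gs://my-bucket/notebooks/report-{{ ds }}.ipynb",  # placeholder output path
        parameters={"run_date": "{{ ds }}"},
    )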
Another solution you can use is Kubeflow, which aims to make it simple to run ML workloads on Kubernetes. Kubeflow adds some resources to your cluster to assist with a variety of tasks, including training and serving models and running Jupyter Notebooks. You can find an interesting tutorial on codelabs.
I hope you find the above information useful.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install papermill
You can use papermill like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
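Once installed (for example with pip, or via the conda command shown above), a minimal parameterized run looks like this; the notebook names and parameters are placeholders:
import papermill as pm

pm.execute_notebook(
    "input.ipynb",      # notebook containing a cell tagged "parameters"
    "output.ipynb",     # executed copy, with injected parameters, written here
    parameters={"alpha": 0.6, "l1_ratio": 0.1},
)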