papermill | πŸ“š Parameterize, execute, and analyze notebooks | Machine Learning library

by nteract | Python | Version: 2.5.0 | License: BSD-3-Clause

kandi X-RAY | papermill Summary

papermill is a Python library typically used in Artificial Intelligence, Machine Learning, and Jupyter applications. papermill has no reported bugs and no reported vulnerabilities, has a build file available, has a Permissive license, and has high support. You can install it with 'pip install papermill' or download it from GitHub or PyPI.
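As a quick orientation, here is a minimal sketch of the library's core API; the notebook names and parameter values are placeholders, not from this page:

import papermill as pm

# Execute a notebook, injecting values into its "parameters"-tagged cell,
# and write the executed copy (with outputs) to a new file.
pm.execute_notebook(
    'input.ipynb',                            # notebook to run (placeholder)
    'output.ipynb',                           # executed copy (placeholder)
    parameters={'alpha': 0.6, 'ratio': 0.1},  # overrides the cell defaults
)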


kandi-Support Support

              papermill has a highly active ecosystem.
              It has 5255 star(s) with 393 fork(s). There are 92 watchers for this library.
              There were 2 major release(s) in the last 6 months.
There are 125 open issues and 246 have been closed. On average, issues are closed in 44 days. There are 6 open pull requests and 0 closed pull requests.
              It has a positive sentiment in the developer community.
The latest version of papermill is 2.5.0.

            kandi-Quality Quality

              papermill has 0 bugs and 0 code smells.

            kandi-Security Security

              papermill has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              papermill code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              papermill is licensed under the BSD-3-Clause License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

papermill does not publish GitHub release artifacts; a deployable package is available on PyPI.
A build file is available, so you can also build the component from source.
Installation instructions, examples, and code snippets are available.
              It has 4897 lines of code, 560 functions and 39 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

kandi has reviewed papermill and discovered the functions below as its top functions. This is intended to give you an instant insight into the functionality papermill implements, and to help you decide if it suits your requirements.
            • Run a papermill notebook
            • Check if value is an integer
            • Return True if value is a float
            • Convert value to python type
            • Execute a managed notebook
            • Ends a cell
            • Execute the notebook with kwargs
            • Execute all cells
            • Read the contents of a repository
            • Translate a float value
            • Translate a dictionary
            • Generate code for a parameter
            • Load entry points
            • Write data to store
            • Return the version number
            • Write content to blob
            • List all files in a given blob
            • List objects in this bucket
            • Reads a file into a list of lines
            • Read blob from Blob service
            • Translate a floating point value
            • Write data to a file
            • List files in S3
            • Generator of lines from source
            • Upload a file to S3
            • Returns a list of all folders in the specified url
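Several of the reviewed functions deal with turning CLI strings into typed parameters (the "is an integer", "is a float", and "convert to python type" entries above). The following is an illustrative sketch of that idea, not papermill's actual source:

def _is_int(value: str) -> bool:
    try:
        int(value)
        return True
    except ValueError:
        return False

def _is_float(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False

def cast_parameter(value: str):
    """Convert a raw CLI string (e.g. from `-p alpha 0.6`) to a Python type."""
    if value.lower() in ('true', 'false'):
        return value.lower() == 'true'
    if _is_int(value):
        return int(value)
    if _is_float(value):
        return float(value)
    return value  # fall back to a plain string

# cast_parameter("0.6") -> 0.6, cast_parameter("3") -> 3, cast_parameter("x") -> "x"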

            papermill Key Features

            No Key Features are available at this moment for papermill.

            papermill Examples and Code Snippets

kbatch: Usage: Submit a job
Python | Lines of Code: 35 | License: Permissive (MIT)
            $ kbatch job submit --name=list-files \
                --command='["ls", "-lh"] \
                --image=alpine
            
            $ kbatch job submit --name=test \
                --image="mcr.microsoft.com/planetary-computer/python" \
                --command='["papermill", "notebook.ipynb"]' \
                --file=note  
NotebookScheduler: Getting started
Python | Lines of Code: 32 | License: Permissive (MIT)
 /
 β”œβ”€β”€ NotebookScheduler.py
 β”œβ”€β”€ hourly/
 β”‚   β”œβ”€β”€ notebook1.ipynb
 β”‚   β”œβ”€β”€ notebook2.ipynb
 β”‚   └── snapshots/
 β”‚       β”œβ”€β”€ notebook1/
 β”‚       β”‚   └── .../
 β”‚       β”‚       └── notebook1.ipynb
 β”‚       └── notebook2/
 β”‚           └── .../
Papersweep: How to use
Jupyter Notebook | Lines of Code: 16 | License: Permissive (Apache-2.0)
            $ papersweep --help
            usage: papersweep [-h] [--pm_params PM_PARAMS] [--sweep_id SWEEP_ID] [--pdb] [--xtra XTRA] input_nb sweep_config entity project
            
            positional arguments:
              input_nb              Input notebook
              sweep_config          YAML file with t  
            jupyter run produces invalid JSON with single quotes
Python | Lines of Code: 5 | License: Strong Copyleft (CC BY-SA 4.0)
            jupyter nbconvert --to pdf --execute U-Run.ipynb 
            
            papermill U-Run.ipynb U-Run-2.ipynb 
            jupyter nbconvert --to pdf U-Run-2.ipynb 
            
            Add papermill parameters to notebook manually
Python | Lines of Code: 4 | License: Strong Copyleft (CC BY-SA 4.0)
            #%% tags=["parameters"]
            my_text = ""
            print(my_text)
            
            Is there a way to run all Jupyter Notebooks inside a directory?
Python | Lines of Code: 11 | License: Strong Copyleft (CC BY-SA 4.0)
            conda install -c conda-forge papermill
            
            import papermill as pm
            from pathlib import Path
            
for nb in Path('./run_all').glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb  # path to the executed output notebook
    )
            Running Multiple functions in the same Thread in a specific order
Python | Lines of Code: 11 | License: Strong Copyleft (CC BY-SA 4.0)
               t1.start() # <- Started 
               t2.start() # <- Started
               # t1 and t2 executing concurrently
            
               t1.join()
               t2.join()
               # wait for both to finish
            
               t3.start()
               t3.join()
            
            How do I print progress of a single cell to the console using papermill?
Python | Lines of Code: 2 | License: Strong Copyleft (CC BY-SA 4.0)
            /your/dir/>papermill "yournotebook.ipynb" "your notebook output.ipynb" --log-output --log-level DEBUG --progress-bar
            
            how to run a python jupyter notebook daily automatically
Python | Lines of Code: 11 | License: Strong Copyleft (CC BY-SA 4.0)
            papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
            
            nbconvert --execute
            
            /opt/anaconda/envs/yourenv/bin/jupyter nbconvert \
                                  --execute \
                 
            How to mass produce reports with jupyter notebooks?
Python | Lines of Code: 21 | License: Strong Copyleft (CC BY-SA 4.0)
            dag = DAG()
            
            get_data = PythonCallable(_get_data,
                                      product=File('raw.parquet'),
                                      dag=dag)
            
clean_data = PythonCallable(_clean_data,
                            product=File('clean.parquet'),
                            dag=dag)

            Community Discussions

            QUESTION

            The airflow scheduler stops working after updating pypi packages on google cloud composer 2.0.1
            Asked 2022-Mar-27 at 07:04

I am trying to migrate from Google Cloud Composer composer-1.16.4-airflow-1.10.15 to composer-2.0.1-airflow-2.1.4. However, we are running into some difficulties with the libraries: each time I upload the libs, the scheduler fails to work.

Here is my requirements.txt:

            ...

            ANSWER

            Answered 2022-Mar-27 at 07:04

We have found out what was happening. The root cause was the performance of the workers. To work properly, Composer expects the scanning of the DAGs to take less than 15% of CPU resources. If it exceeds this limit, it fails to schedule or update the DAGs. We simply switched to bigger workers and it has worked well.

            Source https://stackoverflow.com/questions/70684862

            QUESTION

            Is there a conda (mamba) equivalent to pip install's `--root` path option?
            Asked 2022-Mar-22 at 17:03
            Background

            I have a multi-stage Dockerfile for a JupyterLab image. There are three stages:

            • server
            • kernel: mamba create -y -n /tmp/kernel-packages/myenv ...
            • runner:
            ...

            ANSWER

            Answered 2022-Mar-22 at 17:03

The --prefix argument is the equivalent; it's just that some Conda packages use hardcoded paths, hence the issue you encounter.

            conda-prefix-replacement

To properly move a Conda environment to a new prefix via a COPY operation, one would need to run the conda-prefix-replacement tool (a.k.a. cpr) to ensure that all files with hardcoded paths get updated to the new location. Presumably, conda-pack does a similar thing under the hood.

            For your purposes, you might consider pre-running cpr on the environment(s) in the kernel image so that they are ready to work in the deployed location. Though that would mean always COPYing to the same location.

            See the cpr repository for details on use.

            Source https://stackoverflow.com/questions/71570442

            QUESTION

            Automating the execution of a Jupyter notebook in Azure
            Asked 2022-Mar-03 at 05:31

I have a script in a Jupyter notebook which creates interactive graphs for the dataset it is provided. I then convert the output to an HTML file, without inputs, to create a report for that dataset to share with my colleagues. I have also used papermill to parametrize the process, so that I send it the name of the file and it creates a report for me. All the datasets are stored in Azure Data Lake.

Now it is all very easy when I am doing it on my local machine, but I want to automate the process to generate reports for the new incoming datasets every hour and store the HTML outputs in Azure Data Lake. I want to run this automation in the cloud.

I initially began with Automation Accounts, but I did not know how to execute a Jupyter notebook there, or where to store my .ipynb file. I have also looked at a JupyterHub server (VM) on Azure, but I am unable to understand how to automate that as well.

Can anyone help me with a way to automate the entire process on Azure in the cheapest way possible, because I have to generate a lot of reports?

            Thanks!

            ...

            ANSWER

            Answered 2022-Mar-03 at 05:31

            Apart from Automation, you can use Azure Functions as mentioned in this document:

• To run a PowerShell-based Jupyter Notebook, you can use PowerShell in an Azure Function to call the Invoke-ExecuteNotebook cmdlet. This is similar to the technique described above for Automation jobs. For more information, see the Azure Functions PowerShell developer guide.

• To run a SQL-based Jupyter Notebook, you can use PowerShell in an Azure Function to call the Invoke-SqlNotebook cmdlet. For more information, see the Azure Functions PowerShell developer guide.

• To run a Python-based Jupyter Notebook, you can use Python in an Azure Function to call papermill, as sketched below. For more information, see the Azure Functions Python developer guide.
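As a rough illustration of the Python route, a timer-triggered Azure Function could call papermill directly. This is a sketch under assumptions: the entry point, trigger, and notebook paths below are placeholders, not part of the original answer:

import azure.functions as func
import papermill as pm

def main(mytimer: func.TimerRequest) -> None:
    # Hypothetical hourly report job; paths and parameters are placeholders.
    pm.execute_notebook(
        '/home/site/wwwroot/report.ipynb',   # notebook deployed with the app
        '/tmp/report-output.ipynb',          # executed copy with outputs
        parameters={'dataset': 'latest'},
    )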

References: Run Jupyter Notebook on the Cloud in 15 mins #Azure | by Anish Mahapatra | Towards Data Science; How to run a Jupyter notebook with Python code automatically on a daily basis? - Stack Overflow; and Scheduled execution of Python script on Azure - Stack Overflow

            Source https://stackoverflow.com/questions/71033953

            QUESTION

            iPyKernel throwing "TypeError: object NoneType can't be used in 'await' expression"
            Asked 2022-Feb-22 at 08:29

I know that several similar questions exist on this topic, but to my knowledge all of them concern async code (wrongly) written by the user, while in my case it comes from a Python package.

            I have a Jupyter notebook whose first cell is

            ...

            ANSWER

            Answered 2022-Feb-22 at 08:27

This seems to be a bug in ipykernel 6.9.0; options that worked for me:

            • upgrade to 6.9.1 (latest version as of 2022-02-22); e.g. via pip install ipykernel --upgrade
            • downgrade to 6.8.0 (if upgrading messes with other dependencies you might have); e.g. via pip install ipykernel==6.8.0

            Source https://stackoverflow.com/questions/71116130

            QUESTION

            Add papermill parameters to notebook manually
            Asked 2021-Oct-27 at 16:37

Is it possible to add papermill parameters to a Jupyter notebook manually, e.g. in an editor? Is it possible to add papermill parameters to a .py file and have them persist when converted to a .ipynb file?

            Context:

I am working on running Jupyter notebooks in an automated way via papermill. I would like to add parameters to a notebook manually rather than using the Jupyter or JupyterLab interfaces. Ideally these parameters could be added to a Python script (.py file) first; upon converting the .py file to a .ipynb file, the parameters would persist.

            My desired workflow looks like this:

1. Store a generic notebook as a <notebook>.py file, with parameters, in a version-control repository
2. Convert the Python script <notebook>.py to a Jupyter notebook <notebook>.ipynb
3. Run <notebook>.ipynb via papermill and pass parameters to it
4. Use nbconvert to produce an output PDF with no code (using the exclude_input argument)

Steps 1-3 will be run via a script that can be automated. I want to use jupytext to avoid storing the notebooks and all their associated metadata. Currently, the only way I can find to add parameters to a notebook is to add them via the Jupyter/JupyterLab interfaces. Then I can run the notebook with papermill. However, this doesn't work with the jupytext conversion.

*Note: I would have added the "jupytext" tag to this, but it doesn't exist yet and I don't have enough rep to create it.

            EDIT

            gooseberry's answer appears to be the correct one.

However, it doesn't actually appear to be necessary to add a parameters tag to your notebook in order to inject parameters via papermill. While papermill will give a "no cell with tag parameters found in notebook" warning, it will still inject the parameters. Additionally, your output notebook from papermill will have a new cell:

            ...

            ANSWER

            Answered 2021-Oct-22 at 05:21

            It depends which of the formats you chose for your .py files, but assuming you've chosen the "percent" format, where each new cell is marked with #%%, to add tags you write:
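The answer's snippet is truncated at this point. Based on the percent-format convention (the same one shown in the "Add papermill parameters to notebook manually" snippet earlier on this page), the tagged cell presumably looks like the following sketch; the parameter names and values are placeholders:

# %% tags=["parameters"]
alpha = 0.1   # defaults; papermill overrides these at execution time
ratio = 0.5

# %%
print(alpha, ratio)

Converting such a file with jupytext (for example, jupytext --to ipynb notebook.py) should preserve the cell tag, so papermill can locate the parameters cell in the generated .ipynb.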

            Source https://stackoverflow.com/questions/69652215

            QUESTION

            Hung cells: running multiple jupyter notebooks in parallel with papermill
            Asked 2021-Apr-20 at 15:50

            I am trying to run jupyter notebooks in parallel by starting them from another notebook. I'm using papermill to save the output from the notebooks.

In my scheduler.ipynb I'm using multiprocessing, which is what some people have had success with. I create processes from a base notebook, and this seems to always work the first time it's run. I can run 3 notebooks with sleep 10 in 13 seconds. If a subsequent cell attempts to run the exact same thing, the processes that it spawns (multiple notebooks) hang indefinitely. I've tried adding code to make sure the spawned processes have exit codes and have completed, even calling terminate on them once they are done; no luck, my second attempt never completes.

            If I do:

            ...

            ANSWER

            Answered 2021-Apr-20 at 15:50

Have you tried using the subprocess module? It seems like a better option for you than multiprocessing. It allows you to asynchronously spawn sub-processes that will run in parallel; this can be used to invoke commands and programs as if you were using the shell. I find it really useful for writing Python scripts instead of bash scripts.

So you could use your main notebook to run your other notebooks as independent sub-processes in parallel with subprocess, as in the sketch below.
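A minimal sketch of that pattern (notebook names are placeholders; note that the module is subprocess, and parallelism comes from starting several Popen processes before waiting on any of them):

import subprocess

notebooks = ['nb1.ipynb', 'nb2.ipynb', 'nb3.ipynb']  # placeholder names

# Start one papermill run per notebook without blocking...
procs = [
    subprocess.Popen(['papermill', nb, f'out-{nb}'])
    for nb in notebooks
]

# ...then wait for all of them to finish and check their exit codes.
for p in procs:
    p.wait()
    print(p.args, '->', p.returncode)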

            Source https://stackoverflow.com/questions/65571729

            QUESTION

            zmq.error.ZMQError: Address already in use, when running multiprocessing with multiple notebooks using papermill
            Asked 2020-Jun-07 at 17:13

            I am using the papermill library to run multiple notebooks using multiprocessing simultaneously.

            This is occurring on Python 3.6.6, Red Hat 4.8.2-15 within a Docker container.

However, when I run the Python script, about 5% of my notebooks do not work immediately (no Jupyter Notebook cells run) because I receive this error:

            ...

            ANSWER

            Answered 2020-Jun-07 at 17:13

The problem clearly points towards ZeroMQ being unable to successfully .bind().

We need a few details to establish a common language and get in sync:

If you have never worked with ZeroMQ, you may want to first look at "ZeroMQ: Principles in less than Five Seconds" before diving into further details.

The error message zmq.error.ZMQError: Address already in use is the easier one to explain. Whereas ZeroMQ AccessPoints can, for obvious reasons, freely try to .connect() to many counterparts, one and only one can .bind() onto a particular Transport Class address target.

There are three potential reasons for this happening:

1) An accidental call of some code (without knowing its internal details) via { multiprocessing.Process | joblib.Parallel | Docker-wrapped | ... }-spawned replicas, each of which tries to acquire ownership of the same ZeroMQ Transport Class address; for obvious reasons, every attempt after the first successful one will fail.

2) A rather fatal situation, where some "previously"-run process did not manage to release such a Transport-Class-specific address for further use (remember that ZeroMQ might be just one of several interested candidates; a configuration-management flaw), or where previous runs failed to gracefully terminate their resource usage and left behind a Context()-instance still listening (infinitely, in some cases until an O/S reboot) for something that will never happen.

3) An indeed bad engineering practice in the module's software design: handling the ZeroMQ-API-documented EADDRINUSE error/exception no less brutally than by crashing the whole circus (at all the associated costs of that).

The other error message, RuntimeError: Kernel died before replying to kernel_info, relates to a state where the notebook's kernel was trying for so long to establish all internal connections with its own components (pool peers) that it exceeded a configured or hardcoded timeout; the kernel process simply stopped waiting and threw itself into the otherwise unhandled exception you observed and reported.

Solution:

Check for any hanging address owners first, and reboot all nodes if in doubt. Next, verify there are no colliding attempts "hidden" inside your own code or { multiprocessing.Process() | joblib.Parallel() | ... and the like } calls that, once distributed, may try to .bind() onto the same target. If none of these steps salvages the trouble within your domain of control, ask the support of the modules you use to analyze and help you refactor and validate your still-colliding use case.
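To make the failure mode concrete, here is a minimal pyzmq sketch (socket type and address are arbitrary choices for illustration) in which a second .bind() on an already-owned address raises exactly this error:

import zmq

ctx = zmq.Context()

first = ctx.socket(zmq.REP)
first.bind('tcp://127.0.0.1:5555')       # first owner: succeeds

second = ctx.socket(zmq.REP)
try:
    second.bind('tcp://127.0.0.1:5555')  # second owner: not allowed
except zmq.error.ZMQError as exc:
    print(exc)                           # "Address already in use"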

            Source https://stackoverflow.com/questions/62231888

            QUESTION

            Running Multiple functions in the same Thread in a specific order
            Asked 2020-May-21 at 01:06

I have three functions that execute three different Jupyter notebooks using papermill, and I want the first (job1) and second (job2) functions to run concurrently and the last function (job3) to run only once the first function (job1) has finished running without any errors. I'm not sure if it makes sense to create a new thread for the second function, or how to use the join() method appropriately. I'm running on Windows, and for some reason concurrent.futures and multiprocessing don't work, which is why I'm using the threading module.

            ...

            ANSWER

            Answered 2020-Apr-23 at 16:23

I like to start off by visualizing the desired flow (shown as a diagram in the original answer).

This means that t1 and t2 need to start concurrently, and then you need to join on both; a sketch follows:
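A sketch of that structure (the job bodies are placeholders standing in for the papermill calls):

import threading
import time

def job1():          # placeholder for papermill.execute_notebook(...)
    time.sleep(1)

def job2():          # runs concurrently with job1
    time.sleep(1)

def job3():
    print('job3 starts only after job1 and job2 have finished')

t1 = threading.Thread(target=job1)
t2 = threading.Thread(target=job2)

t1.start()   # t1 and t2 execute concurrently
t2.start()
t1.join()    # block until both have finished
t2.join()

t3 = threading.Thread(target=job3)
t3.start()
t3.join()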

            Source https://stackoverflow.com/questions/61380956

            QUESTION

            Can I use Papermill and Scrapbook with AWS EMR Notebooks?
            Asked 2020-Feb-06 at 17:09

I have several notebooks which are run by a "driver" notebook using papermill. These notebooks use the scrapbook library to communicate information to the driver. The driver then passes this information as parameters to other notebooks. I want to use EMR Notebooks to optimize the execution efficiency of this "notebook pipeline". Does AWS EMR Notebooks support scrapbook and papermill, or will I need to refactor my notebooks?

            ...

            ANSWER

            Answered 2020-Feb-06 at 17:09

As of now, nope, you can't do that directly. What you can do, though (and what we are doing), is the following:

1. Create a Python environment on your EMR master node using the hadoop user
2. Install sparkmagic in your environment and configure all kernels as described in sparkmagic's README.md file
3. Copy your notebook to the master node, or use it directly from an s3 location
4. Install papermill and run with papermill:

              papermill s3://path/to/notebook/input.ipynb s3://path/to/notebook/output.ipynb -p param=1

            Source https://stackoverflow.com/questions/59902055

            QUESTION

            How do you schedule GCP AI Platform notebooks via Google Cloud Composer?
            Asked 2020-Jan-30 at 10:41

I've been tasked with automating the scheduling of some notebooks on AI Platform Notebooks that are run daily via the Papermill operator, but actually doing this through Cloud Composer is giving me some trouble.

            Any help is appreciated!

            ...

            ANSWER

            Answered 2020-Jan-30 at 10:41

The first step is to create a JupyterLab notebook. If you want to use additional libraries, install them and restart the kernel (the Restart Kernel and Clear All Outputs option). Then, define the processing inside your notebook.

When it's ready, remove all the runs, peeks, and dry runs before you start the scheduling phase.

Now you need to set up the Cloud Composer environment (remember to install the additional packages that you defined in the first step). To schedule the workflow, go to JupyterLab and create a second notebook which generates a DAG from the workflow.

The final step is to upload the zipped workflow to the Cloud Composer DAGs folder. You can manage your workflow using the Airflow UI.

I recommend taking a look at this article.

Another solution you can use is Kubeflow, which aims to make it simple to run ML workloads on Kubernetes. Kubeflow adds some resources to your cluster to assist with a variety of tasks, including training and serving models and running Jupyter notebooks. You can find an interesting tutorial on codelabs.

            I hope you find the above pieces of information useful.

            Source https://stackoverflow.com/questions/59975382

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install papermill

You can install papermill using 'pip install papermill' or download it from GitHub or PyPI.
You can use papermill like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.
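As a quick sanity check after installing, you can confirm that the package imports and report its version (a minimal check, nothing more):

import papermill

# If this prints a version string (e.g. 2.5.0), the install worked.
print(papermill.__version__)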

            Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            Install
          • PyPI

            pip install papermill

• Clone (HTTPS)

            https://github.com/nteract/papermill.git

• GitHub CLI

            gh repo clone nteract/papermill

• SSH URL

            git@github.com:nteract/papermill.git
