Popular New Releases in Data Manipulation
- numpy
- did_you_mean: v1.5.0
- numexpr: NumExpr 2.7.2
- nbconvert: 6.5.0
- hapi-fhir: HAPI FHIR 5.2.0 (Numbat)
Popular Libraries in Data Manipulation
- by numpy (Python) · 20101 · BSD-3-Clause: The fundamental package for scientific computing with Python.
- by menzi11 (JavaScript) · 15078 · NOASSERTION: Needed to generate some text to test whether my GUI rendering code is good or not, so I made this.
- by mortenjust (Swift) · 3291 · GPL-3.0: A text editor that will help you write clearer and simpler.
- by dollarshaveclub (JavaScript) · 2113 · MIT: 💈 Shave is a 0-dep JS plugin that truncates text to fit within an element based on a set max-height ✁
- by ruby (Ruby) · 1756 · MIT: The gem that has been saving people from typos since 2014.
- by pydata (Python) · 1687 · MIT: Fast numerical array expression evaluator for Python, NumPy, PyTables, pandas, bcolz and more.
- by benhurott (JavaScript) · 1514 · MIT: A pure JavaScript masked text and input text component for React Native.
- by ajcr (Jupyter Notebook) · 1499 · MIT: 100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete).
- by jupyter (Python) · 1293 · NOASSERTION: Jupyter Notebook Conversion.
Trending New Libraries in Data Manipulation
- by borisdayma (Python) · 830 · Apache-2.0: DALL·E Mini - Generate images from a text prompt.
- by seek-oss (TypeScript) · 815 · MIT: Flipping how we define typography in CSS.
- by theOehrly (Python) · 764 · MIT: FastF1 is a Python package for accessing and analyzing Formula 1 results, schedules, timing data and telemetry.
- by ricokahler (TypeScript) · 417 · MIT: A color parsing and manipulation lib served in roughly 2kB.
- by kashishgrover (JavaScript) · 307: Show a "read more", "see more", "read less", "see less" inline with your text in React Native.
- by rtosholdings (Python) · 305 · NOASSERTION: 64-bit multithreaded Python data analytics tools for NumPy arrays and datasets.
- by yugedata (Python) · 246 · BSD-3-Clause: Collecting, analyzing, visualizing & paper trading options market data.
- by razorpay (Go) · 209 · MIT: A Go port of numpy-financial functions and more.
- by ueberdosis (PHP) · 159: Online text file converter.
Top Authors in Data Manipulation
1. 12 Libraries · 645
2. 8 Libraries · 263
3. 6 Libraries · 81
4. 5 Libraries · 169
5. 5 Libraries · 112
6. 5 Libraries · 20596
7. 5 Libraries · 23
8. 5 Libraries · 47
9. 5 Libraries · 24
10. 5 Libraries · 154
Trending Kits in Data Manipulation
OpenCV is a library of programming functions mainly aimed at real-time computer vision. It is written in C++, with bindings for languages such as Python and Java, and runs on Windows, Linux, Android, and macOS. OpenCV is widely used in the field of computer vision for tasks such as object recognition, face detection, and image and video analysis. It has a large community of developers and users and is continuously updated and improved.
OpenCV provides a large collection of algorithms and functions for image and video processing, including:
- Image processing operations like filtering, morphological transformations, thresholding, etc.
- Object detection and recognition, including face detection and recognition, object tracking, etc.
- Image and video analysis, including edge detection, feature extraction, and optical flow.
- Camera calibration and 3D reconstruction.
- Machine learning algorithms, including support for deep learning frameworks like TensorFlow and Caffe.
You can divide an image into two equal parts vertically or horizontally using OpenCV by simply slicing the image array. Here's an example of how you could divide an image into two equal parts horizontally in Python using OpenCV.
This code splits the image into two equal parts horizontally. It first retrieves the shape of the image to get its height and width, then calculates the starting and ending row and column coordinates for the top and bottom halves. The image array is sliced accordingly, and each half is stored in the cropped_top and cropped_bot variables. Finally, each cropped image is displayed using the OpenCV function cv2.imshow() and shown until a key is pressed using cv2.waitKey(0).
Here is an example of how you can divide an image into two equal parts using OpenCV:
Preview of the output that you will get on running this code from your IDE
CODE
In this solution we use the imread function of OpenCV.
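The snippet itself isn't reproduced on this page, so here is a minimal sketch of the slicing approach described above; the image path input.jpg is a placeholder assumption.

```python
import cv2

# "input.jpg" is a placeholder; substitute your own image path.
img = cv2.imread("input.jpg")          # load the image as a BGR array
h, w = img.shape[:2]                   # image height and width in pixels

cropped_top = img[0:h // 2, 0:w]       # rows from the top edge to the midpoint
cropped_bot = img[h // 2:h, 0:w]       # rows from the midpoint to the bottom

cv2.imshow("cropped_top", cropped_top) # display each half in its own window
cv2.imshow("cropped_bot", cropped_bot)
cv2.waitKey(0)                         # keep the windows open until a key press
cv2.destroyAllWindows()
```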
- Copy the code using the "Copy" button above, and paste it in a Python file in your IDE.
- Modify the name and location of the image in the code.
- Run the file to divide the image into top and bottom halves.
I hope you found this useful. I have added the link to dependent libraries and version information in the following sections.
I found this code snippet by searching for "divide image into two equal parts python opencv" in kandi. You can try any such use case!
Environment Tested
I tested this solution in the following versions. Be mindful of changes when working with other versions.
- The solution is created and executed in Python version 3.7.15.
- The solution is tested on OpenCV 4.6.0.
- The solution is tested on numpy 1.21.6.
Using this solution, we are able to divide an image using the OpenCV library in Python with simple steps. This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us divide an image in Python.
Dependent Library
If you do not have OpenCV and numpy, which are required to run this code, you can install them by clicking on the above link and copying the pip install command from the OpenCV page in kandi.
You can search for any dependent library on kandi, like OpenCV and numpy.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
OpenCV is a computer vision library written in C++ and widely used for image and video processing. It offers a range of features for working with images and videos, including the ability to load and save images, apply filters, detect edges, and find and track objects. Python and OpenCV are frequently used together to build image and video processing applications. This combination enables you to develop solid and adaptable programs that can address various computer vision issues.
In our work as developers, we frequently need to read and transform the images in our applications for various image processing activities, such as recognition, upload, augmentation, training, and many more. Numerous Python libraries enable working with images, providing features for manipulating, enhancing, and creating images. Besides using other OpenCV functions to apply transformations such as scaling, cropping, and filtering, you can modify the angle of rotation and the image's size to get the desired effect.
Here is an example of how we can draw a line beyond the second point using OpenCV:
Preview of the output that you will get on running this code from your IDE
CODE
In this solution we use the numpy and OpenCV libraries.
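The original code isn't shown here, so below is a minimal sketch of one way to extend a line past its second point; the points, scale factor, and canvas size are illustrative assumptions.

```python
import cv2
import numpy as np

# Illustrative assumptions: a blank 400x400 canvas and two made-up points.
canvas = np.zeros((400, 400, 3), dtype=np.uint8)
p1 = np.array([100, 100])   # first point
p2 = np.array([200, 150])   # second point
scale = 2.0                 # how far along the p1->p2 ray to extend

# The end point lies on the p1->p2 ray, 'scale' times the original length,
# so the drawn line passes through p2 and continues beyond it.
end = p1 + scale * (p2 - p1)

cv2.line(canvas,
         (int(p1[0]), int(p1[1])),
         (int(end[0]), int(end[1])),
         (0, 255, 0), 2)
cv2.imshow("Extended line", canvas)
cv2.waitKey(0)
cv2.destroyAllWindows()
```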
- Copy the code using the "Copy" button above, and paste it in a Python file in your IDE.
- Import the OpenCV and numpy libraries.
- Modify the coordinates of the points and the length of the line as needed.
- Run the file to draw a line.
I hope you found this useful. I have added the link to dependent libraries and version information in the following sections.
I found this code snippet by searching for "Draw a line in open cv and python beyond given points" in kandi. You can try any such use case!
Dependent Library
If you do not have OpenCV and numpy, which are required to run this code, you can install them by clicking on the above link and copying the pip install command from the OpenCV page in kandi.
You can search for any dependent library on kandi, like OpenCV and numpy.
Environment Tested
I tested this solution in the following versions. Be mindful of changes when working with other versions.
- The solution is created and executed in Python version 3.7.15.
- The solution is tested on OpenCV 4.6.0.
- The solution is tested on numpy 1.21.6.
Using this solution, we are going to draw a line beyond the second given point using the OpenCV and numpy libraries in Python with simple steps. This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us draw an image in Python.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
Precision and recall are two commonly used metrics for evaluating the performance of a classification model. Precision measures the accuracy of the positive predictions, while recall measures the ability of the model to identify all relevant positive samples. Here, y_true is the list of true labels and y_pred is the list of predicted labels; the precision_score and recall_score functions calculate the precision and recall, respectively.
Precision is the fraction of true positive predictions out of all positive predictions made. It measures the accuracy of the positive predictions.
Recall is the fraction of true positive predictions out of all actual positive cases. It measures the completeness of the positive predictions.
- confusion_matrix: This function generates a confusion matrix given true labels and predicted labels.
- precision_score: This function calculates the precision score of a classification model given true labels and predicted labels.
- recall_score: This function calculates the recall score of a classification model given true labels and predicted labels.
- These libraries and functions can be used to evaluate the performance of a classification model.
Here is an example of how we can find the precision and recall scores using scikit-learn.
Preview of the output that you will get on running this code from your IDE
Code
In this solution we have used scikit-learn.
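Since the snippet isn't reproduced here, this is a minimal sketch of the computation described above; y_true and y_pred are made-up example labels.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [0, 1, 1, 0, 1, 1, 0, 1]   # ground-truth labels (illustrative)
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]   # model predictions (illustrative)

print(confusion_matrix(y_true, y_pred))   # 2x2 matrix of TN, FP, FN, TP
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
```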
- Copy the code using the "Copy" button above, and paste it in a Python file in your IDE.
- Run the file to get the output
I hope you found this useful. I have added the link to dependent libraries and version information in the following sections.
I found this code snippet by searching for "Need help finding the precision and recall for a confusion matrix" in kandi. You can try any such use case!
Dependent Library
Environment Tested
I tested this solution in the following versions. Be mindful of changes when working with other versions.
- The solution is created in Python version 3.7.15.
- The solution is tested on scikit-learn version 1.0.2.
Using this solution, we are able to learn how to find the precision and recall for a confusion matrix using the scikit-learn library in Python with simple steps. This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us find the precision and recall for a confusion matrix in Python.
If you do not have scikit-learn and numpy, which are required to run this code, you can install them by clicking on the above link and copying the pip install command from the scikit-learn page in kandi.
You can search for any dependent library on kandi, like scikit-learn and numpy.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
Converting RGB to YCbCr can provide better results for image and video compression, color space conversions, and HDR processing. There are several reasons why we might need to convert RGB to YCbCr:
- Compression efficiency: YCbCr provides better compression results compared to RGB, especially in preserving image quality after compression. This is because the human visual system is more sensitive to changes in brightness (luma, Y) than to changes in color (chroma, Cb and Cr).
- Color space conversion: Some image processing tasks, such as color correction and color space conversion, may require transforming the image from one color space to another. For example, many image sensors capture the image in the YCbCr color space, and it may be necessary to convert it to RGB for display purposes.
OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library, written in C++ and widely used for image and video processing. OpenCV provides a vast array of image and video processing functions that can be used in various domains such as:
- Object detection and recognition
- Image and video segmentation
- Face and feature detection
- Object tracking
- Image restoration and enhancement
- Stereoscopic vision
- Motion analysis and object tracking
- 3D reconstruction
RGB and YCbCr are color spaces used in digital image processing.
BGR stands for Blue, Green, Red and is simply a reordering of the RGB (Red, Green, Blue) color space. BGR is used in computer vision and image processing applications and is the default color format for the OpenCV library in Python.
YCbCr, on the other hand, stands for Luma (Y) and Chrominance (Cb, Cr), and is a color space used in digital video processing. YCbCr separates the brightness information (luma) from the color information (chroma), which allows for more efficient compression. YCbCr is used in many image and video compression standards, such as JPEG and MPEG. In summary, BGR is used in computer vision and image processing, while YCbCr is used in video processing and compression.
In this solution, we are going to learn how to convert an RGB image to YCbCr using OpenCV.
Preview of the output that you will get on running this code from your IDE
CODE
In this solution we use the imread function of OpenCV.
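The code isn't reproduced on this page, so here is a minimal sketch of the conversion, assuming a placeholder path input.jpg. Note that OpenCV loads images as BGR and names the target space YCrCb.

```python
import cv2

# "input.jpg" is a placeholder path; substitute your own image.
img = cv2.imread("input.jpg")                         # BGR image
YCrbCrImage = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)  # convert to Y/Cr/Cb
cv2.imwrite("output_ycrcb.jpg", YCrbCrImage)          # save the converted image
```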
- Copy the code using the "Copy" button above, and paste it in a Python file in your IDE.
- Import the OpenCV and numpy libraries.
- Modify the name and location of the image in the code.
- Run the file to get the output.
I hope you found this useful. I have added the link to dependent libraries, and version information in the following sections.
I found this code snippet by searching for "OpenCV Python converting color-space image to YCbCr" in kandi. You can try any such use case!
Note:-
If you want to display the output, use these commands:
cv2.imshow('after', YCrbCrImage)
cv2.waitKey(0)
Environment Tested
I tested this solution in the following versions. Be mindful of changes when working with other versions.
- The solution is created and executed in Python version 3.7.15.
- The solution is tested on OpenCV 4.6.0.
- The solution is tested on numpy 1.21.6.
Using this solution, we are going to convert a BGR image to YCbCr using the OpenCV library in Python with simple steps. This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us convert BGR to YCbCr in Python.
Dependent Library
If you do not have OpenCV and numpy, which are required to run this code, you can install them by clicking on the above link and copying the pip install command from the OpenCV page in kandi.
You can search for any dependent library on kandi, like OpenCV and numpy.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
In Python, the "where" condition is used in conjunction with Boolean indexing to filter the elements of an array, list, or DataFrame based on a specific condition. The condition is specified as a Boolean expression, and the elements that satisfy the condition are kept while the elements that do not are removed.
You can fetch the value of a particular column with a WHERE condition using a “SQL SELECT” statement.
- SQL SELECT: Using the SQL SELECT command, you may query a database and get specified data from one or more of its tables.
In the WHERE clause, you may also include several criteria by using logical operators like "AND," "OR," etc.
- AND: In a WHERE clause, several criteria can be combined using the SQL AND statement. When all criteria are true, rows from a table are returned using the AND statement.
- OR: In a WHERE clause, multiple conditions can be combined using the SQL OR statement. When at least one of the requirements is true, the OR statement is used to get rows from a table.
For a better understanding of fetching the value of a particular column with a WHERE condition, have a look at the code below.
Fig : Preview of the output that you will get on running this code from your IDE.
Code
In this solution we're using the Pandas and NumPy libraries.
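The snippet isn't shown on this page, so here is a minimal sketch of the idea; the column names and data are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name":  ["A", "B", "C", "D"],
    "score": [35, 72, 58, 91],
})

# Equivalent of: SELECT name FROM df WHERE score > 50
result = df.loc[df["score"] > 50, "name"]
print(result)

# Combining criteria, like SQL AND, uses & on boolean masks (| for OR).
both = df.loc[(df["score"] > 50) & (df["name"] != "D"), "name"]
print(both)

# np.where gives a vectorized if/else over the same kind of condition.
df["flag"] = np.where(df["score"] > 50, "pass", "fail")
print(df)
```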
Instructions
Follow the steps carefully to get the output easily.
- Install pandas in your IDE (any IDE of your choice).
- Copy the snippet using the 'Copy' button and paste it in your IDE.
- Add the required dependencies and import them in your Python file.
- Run the file to generate the output.
I hope you found this useful. I have added the link to dependent libraries and version information in the following sections.
I found this code snippet by searching for 'How to fetch value of particular column with where condition in pandas' in kandi. You can try any such use case!
Environment Tested
I tested this solution in the following versions. Be mindful of changes when working with other versions.
- The solution is created in PyCharm 2021.3.
- The solution is tested on Python 3.9.7.
- Pandas version v1.5.2.
- NumPy version v1.24.0.
Using this solution, we are able to fetch the value of a particular column with a WHERE condition in pandas with simple steps. This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us fetch the value of a particular column with a WHERE condition in pandas.
Dependent Libraries
You can also search for any dependent libraries on kandi like 'pandas' and 'numpy'.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
We will locate a specific group of words in a text using the SpaCy library, then replace those words with an empty string to remove them from the text.
Using SpaCy, it is possible to exclude words within a specific span from a text in the following ways:
- Text pre-processing: Removing specific words or phrases from text can be a useful step in pre-processing text data for NLP tasks such as text classification, sentiment analysis, and language translation.
- Document summarization: Keeping only the most crucial information by removing specific words or phrases helps construct a summary of a lengthy text.
- Data cleaning: Anonymization and data cleaning can both benefit from removing sensitive or useless text information, such as names and addresses.
- Text generation: Deleting specific words or phrases can help create new text by changing the context or meaning of the generated content.
- Text augmentation: Removing specific words or phrases and replacing them with new variations is a common text augmentation technique in NLP.
Here is how you can remove words in a span using SpaCy:
Preview of the output that you will get on running this code from your IDE
Code
In this solution we have used the spaCy library for Python.
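As the original snippet isn't reproduced here, below is a minimal sketch of one way to do this; the sentence and span indices are illustrative assumptions.

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # see the model note below
doc = nlp("The quick brown fox jumps over the lazy dog")

# Illustrative span: tokens 1..3 ("quick brown fox").
span = doc[1:4]

# Keep every token outside the span, preserving trailing whitespace.
kept = [t.text_with_ws for t in doc if t.i < span.start or t.i >= span.end]
print("".join(kept))                 # "The jumps over the lazy dog"
```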
- Copy the code using the "Copy" button above, and paste it in a Python file in your IDE.
- Enter the text.
- Run the code to remove the specific words from the text.
I hope you found this useful. I have added the link to dependent libraries and version information in the following sections.
I found this code snippet by searching for "Remove words in span from spacy" in kandi. You can try any such use case!
Note
In this snippet we are using a Language model (en_core_web_sm)
- Download the model using the command python -m spacy download en_core_web_sm.
- Paste it in your terminal to download the model.
Check your spaCy version using the pip show spacy command in your terminal.
- If the version is 3.0 or above, load the model using nlp = spacy.load("en_core_web_sm").
- If the version is less than 3.0, load it using nlp = spacy.load("en").
Environment Tested
I tested this solution in the following versions. Be mindful of changes when working with other versions.
- The solution is created in Python version 3.7.15.
- The solution is tested on spaCy version 3.4.3.
Using this solution, we can remove the words in a given span with the help of the spaCy library. This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us remove words in a span in Python.
Dependent Library
If you do not have spaCy and numpy, which are required to run this code, you can install them by clicking on the above link and copying the pip install command from the spaCy page in kandi.
You can search for any dependent library on kandi, like spaCy and numpy.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page
SpaCy is an open-source software library for advanced natural language processing. It assists you in creating programs that process and "understand" massive amounts of text because it was created expressly for use in production environments. The quick and effective tokenization offered by spaCy is one of its main advantages. SpaCy is frequently used for tasks including information extraction, machine translation, named entity recognition, part-of-speech tagging, and text summarization in business, academia, and government research projects.
Additionally, spaCy offers tools for standard tasks like text classification, language recognition, working with word vectors and similarity, and more. You can use spaCy's tokenizer to remove certain types of tokens from a text. You may use SpaCy in a few ways to get rid of tokens in text, including symbols, punctuation, and numerals. Some examples include:
- Eliminating common stop words: SpaCy has a built-in list of terms you can eliminate from your writing, like "and," "or," and "the."
- Eliminating punctuation: You may verify whether a token is punctuation using the Token.is_punct attribute and then delete it from the text.
- Removing numbers: To determine whether a token is a number and delete it from the text, use the Token.like_num attribute.
- Removing symbols: To determine whether a token is a symbol and delete it from the text, use the Token.is_alpha and Token.is_digit attributes.
Here is how you can remove tokens like symbols, punctuation, and numbers in SpaCy:
Preview of the output that you will get on running this code from your IDE
Code
In this solution we use the token attributes of the spaCy library.
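The snippet isn't reproduced here; this is a minimal sketch of the filtering described above, with a made-up sample sentence.

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # see the model note below
doc = nlp("In 2023, revenue grew 12% - great news!")

# Keep tokens that are alphabetic and neither punctuation nor number-like.
filtered = [t.text for t in doc
            if t.is_alpha and not t.is_punct and not t.like_num]
print(filtered)   # ['In', 'revenue', 'grew', 'great', 'news']
```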
- Copy the code using the "Copy" button above, and paste it in a Python file in your IDE.
- Enter the text.
- Run the file to remove symbols, numbers, and punctuation.
I hope you found this useful. I have added the link to dependent libraries and version information in the following sections.
I found this code snippet by searching for "How to filter tokens from spacy Document " in kandi. You can try any such use case!
Note
In this snippet we are using a Language model (en_core_web_sm)
- Download the model using the command python -m spacy download en_core_web_sm.
- Paste it in your terminal to download the model.
Check your spaCy version using the pip show spacy command in your terminal.
- If the version is 3.0 or above, load the model using nlp = spacy.load("en_core_web_sm").
- If the version is less than 3.0, load it using nlp = spacy.load("en").
Environment Tested
I tested this solution in the following versions. Be mindful of changes when working with other versions.
- The solution is created in Python version 3.7.15.
- The solution is tested on spaCy version 3.4.3.
- The solution is tested on numpy version 1.21.6.
Using this solution, we are able to delete or remove symbols, punctuation, and numbers in Python with the help of the spaCy library. This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us remove tokens in Python.
Dependent Library
If you do not have spaCy and numpy, which are required to run this code, you can install them by clicking on the above link and copying the pip install command from the spaCy page in kandi.
You can search for any dependent library on kandi, like spaCy and numpy.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
Here we attempt to create a new image from an input image by swapping the positions of the rows and columns using nested loops, and then write the resulting image to a file using the OpenCV library.
OpenCV (Open Source Computer Vision) and NumPy are two powerful libraries in Python that are widely used in computer vision, image processing, and machine learning applications. Here's a brief overview of how each library can be used. OpenCV provides a variety of computer vision algorithms and functions for image and video processing.
- These functions range from basic image filtering, resizing, and rotation to advanced feature detection, object recognition, and video analysis.
- OpenCV can read and write a variety of image and video formats, making it easy to work with different types of media.
- OpenCV has interfaces for several programming languages, including C++, Python, and Java.
The new array has shape (h, w, c), where h, w, and c represent the height, width, and number of color channels. A nested loop is a loop inside another loop; it is a common programming construct used to iterate over multiple levels of data, such as two-dimensional arrays or matrices. cv2.imwrite is a function provided by the OpenCV library that writes an image to a file on disk. The function takes two arguments: the filename of the image to be saved, and the image data to be written.
Here is an example of how to rotate the image:
Preview of the output that you will get on running this code from your IDE
CODE
In this solution we use the imread function of OpenCV.
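The snippet isn't shown here; below is a minimal sketch of the nested-loop approach the text describes, with input.jpg as a placeholder path. (In practice, cv2.rotate or np.transpose would be much faster than explicit Python loops.)

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")        # "input.jpg" is a placeholder path
h, w, c = img.shape                  # height, width, number of channels

rotated = np.zeros((w, h, c), dtype=img.dtype)  # note the swapped dimensions
for i in range(h):
    for j in range(w):
        rotated[j, i] = img[i, j]    # swap the row and column positions

cv2.imwrite("rotated.jpg", rotated)  # write the result to disk
```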
- Copy the code using the "Copy" button above, and paste it in a Python file in your IDE.
- Import the OpenCV and numpy libraries.
- Modify the name and location of the image to be rotated in the code.
- Run the file to rotate the image.
I hope you found this useful. I have added the link to dependent libraries and version information in the following sections.
I found this code snippet by searching for "Image rotation using OpenCV" in kandi. You can try any such use case!
Dependent Libraries
If you do not have OpenCV, which is required to run this code, you can install it by clicking on the above link and copying the pip install command from the OpenCV page in kandi.
You can search for any dependent library on kandi, like OpenCV and numpy.
Environment Tested
I tested this solution in the following versions. Be mindful of changes when working with other versions.
- The solution is created and executed in Python version 3.7.15.
- The solution is tested on OpenCV 4.6.0.
Using this solution, we are able to rotate an image using the OpenCV library in Python with simple steps. This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us rotate an image in Python.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
Indexing and slicing a tensor in PyTorch refers to selecting a specific part of a tensor, which can be done using a combination of indices and slices. This is useful for selecting tensor parts, such as a subset of rows or columns or a certain number of elements along a certain dimension. Indexing and slicing can be used to select and manipulate tensor parts, which can be used for various operations, such as creating sub-tensors from a larger tensor or applying certain operations to only a subset of elements in a tensor.
A tensor in Python is a multi-dimensional array used to store numerical data. It is a fundamental data structure in deep learning models like convolutional neural networks (CNNs). Tensors are usually represented as a matrix of numbers and can be manipulated using various operations such as addition, multiplication, and division.
Indexing and slicing of tensors in PyTorch are the same as indexing and slicing lists in Python.
- To retrieve a single tensor element, use the indexing operator [] with the corresponding indices.
- To slice a tensor, use the slicing operator: with the corresponding indices.
Here is an example of indexing and slicing a tensor in PyTorch.
Fig 1: Preview of the output that you will get on indexing a tensor in PyTorch.
Fig 2: Preview of the output that you will get on slicing a tensor in PyTorch.
Codes
In this solution, we use the torch.tensor function of the PyTorch library.
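Since the code isn't reproduced here, this is a short sketch of tensor indexing and slicing; the tensor values are illustrative.

```python
import torch

t = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

print(t[1, 2])      # single element: tensor(6)
print(t[0])         # first row: tensor([1, 2, 3])
print(t[:, 1])      # second column: tensor([2, 5, 8])
print(t[0:2, 1:3])  # sub-tensor of rows 0-1 and columns 1-2
```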
Instructions
Follow the steps carefully to get the output easily.
- Install Jupyter Notebook on your computer.
- Open a terminal and install the required libraries with the following commands.
- Install PyTorch: pip install torch.
- Copy the code using the "Copy" button above, and paste it into your IDE's Python file.
- Print the results of the indexing and slicing.
- Run the file to perform Indexing and slicing a tensor in PyTorch.
I hope you found this useful. I have added the link to dependent libraries and version information in the following sections.
I found this code snippet by searching for "Indexing and slicing a tensor in PyTorch" in kandi. You can try any such use case!
Dependent Libraries
If you do not have PyTorch, which is required to run this code, you can install it by clicking on the above link and copying the pip install command from the PyTorch page in kandi.
You can search for any dependent library on kandi, like PyTorch.
Environment Tested
I tested this solution in the following versions. Be mindful of changes when working with other versions.
- The solution is created in Python 3.9.6.
- The solution is tested on PyTorch version 2.0.0+cpu.
Using this solution, we are able to perform indexing and slicing of a tensor in PyTorch with simple steps. PyTorch is also used in Computer Vision and Generative Adversarial Networks.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
This code demonstrates how a simple linear regression model can be trained and used to make predictions in Python using the scikit-learn library. The LinearRegression class from the sklearn.linear_model module in scikit-learn is used to build and train linear regression models in Python.
Linear Regression is a supervised machine learning algorithm used for regression problems. In regression problems, the goal is to predict a continuous target variable based on one or more input variables. The linear regression algorithm fits a linear equation to the observed data between the dependent (target) and independent (predictor) variables. The equation is represented by a line that best captures the relationship between the variables.
The model.predict() method in scikit-learn's LinearRegression class is used to make predictions for new data based on a trained linear regression model.
Linear Regression is widely used for many applications, including forecasting, modeling, and understanding the relationship between variables.
Preview of the output that you will get on running this code from your IDE
Code
In this solution we have used LinearRegression.
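The snippet isn't reproduced on this page; here is a minimal sketch of fitting and predicting with LinearRegression, using made-up data points.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: one feature, target is exactly 2 * x.
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

model = LinearRegression()
model.fit(X, y)                              # fit the line to the data

print(model.predict(np.array([[6], [7]])))   # approximately [12. 14.]
```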
- Copy the code using the "Copy" button above, and paste it in a Python file in your IDE.
- Run the file to get the output
I hope you found this useful. I have added the link to dependent libraries and version information in the following sections.
I found this code snippet by searching for "use .predict() method in python for Linear regression" in kandi. You can try any such use case!
Environment Tested
I tested this solution in the following versions. Be mindful of changes when working with other versions.
- The solution is created in Python version 3.7.15.
- The solution is tested on scikit-learn version 1.0.2.
- The solution is tested on numpy version 1.21.6.
Using this solution, we are able to learn how to make predictions with a simple linear regression model using the scikit-learn library in Python with simple steps. This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us use the .predict() method for linear regression in Python.
Dependent Library
If you do not have scikit-learn and numpy, which are required to run this code, you can install them by clicking on the above link and copying the pip install command from the scikit-learn page in kandi.
You can search for any dependent library on kandi, like scikit-learn and numpy.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
Creating a Pandas DataFrame with a unique index can provide several benefits, including:
- Uniqueness: A unique index ensures that a unique label can identify each row in the DataFrame. This helps avoid issues when dealing with duplicate rows or merging data from multiple sources.
- Data Integrity: By using a unique index, you can help maintain the integrity of your data. This can make performing operations such as filtering, sorting, and aggregating data easier without affecting the underlying data structure.
- Efficiency: Using a unique index can make certain operations more efficient when working with large datasets. For example, when performing joins or merges between dataframes, using a unique index can speed up the process by allowing the data to be aligned more quickly.
In Python, NumPy is a library for numerical computing. It provides a powerful N-dimensional array object, as well as a variety of functions for performing mathematical operations on arrays. NumPy arrays are efficient and fast and can be used for various data analysis tasks, such as filtering, sorting, and aggregating data. Pandas is built on top of NumPy, providing a higher-level Python interface for data manipulation and analysis. The append() method is used to add rows of data to an existing DataFrame; it returns a new DataFrame containing the rows from the original DataFrame plus the appended rows.
Creating a Pandas DataFrame with a unique index can help ensure data integrity, improve efficiency, and make data analysis and manipulation easier and more intuitive.
Preview of the output that you will get on running this code.
Code
In this solution we have used the pandas append() function.
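The original code isn't shown here; below is a minimal sketch of the append() approach with made-up rows. Note that DataFrame.append is deprecated in newer pandas releases, where pd.concat is the replacement.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
new_rows = pd.DataFrame({"a": [3, 4]}, index=[0, 1])   # clashing labels

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index,
# so every row ends up with a unique label.
combined = df.append(new_rows, ignore_index=True)      # deprecated API
print(combined.index.is_unique)                        # True

# Equivalent in current pandas:
combined = pd.concat([df, new_rows], ignore_index=True)
```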
- Copy this code using the "Copy" button above and paste it in your Python IDE.
- Import the pandas and NumPy libraries.
- Run the code to get a unique index.
I hope you have found this useful. I have added the dependent library and version information in the following section.
I found this code snippet by searching "Create pandas dataFrame with unique index" in kandi. you can try any use case.
Dependent Library
If you do not have Pandas, which is required to run this code, you can install it by clicking on the above link and copying the pip install command from the pandas page in kandi. You can search for any dependent library on kandi, like Pandas.
Environment Tested
In this solution we have used the following versions. Be mindful of changes when working with other versions.
- This solution is created using Python version 3.7.15.
- This solution is tested using Pandas 1.5.2.
Using this solution, we are able to create a DataFrame with a unique index using the Pandas library in Python with simple steps. This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us create a DataFrame with a unique index in Python.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
A multi-label confusion matrix is a useful tool for evaluating the performance of multi-label classification models. It provides a detailed view of the true positive (TP), false positive (FP), false negative (FN), and true negative (TN) predictions made by the classifier for each label. This information can be used to evaluate several aspects of the classifier's performance, including:
- Accuracy: The overall accuracy of the classifier can be computed as the ratio of correct predictions to total predictions.
- Precision: Precision measures the fraction of correct positive predictions. It can be used to evaluate the quality of the positive predictions made by the classifier.
- Recall: Recall measures the fraction of actual positive instances correctly identified by the classifier. It can be used to evaluate the completeness of the positive predictions made by the classifier.
- F1-Score: The F1-score is the harmonic mean of precision and recall and provides a balance between precision and recall.
- Support: The support is the number of instances belonging to each class.
These performance metrics can be computed for each label and averaged across labels to give an overall view of the classifier's performance.
In addition to these performance metrics, the multi-label confusion matrix can also help identify specific areas for improvement in the classifier. For example, if the classifier has low precision for a particular label, it may indicate that it is making too many false positive predictions. On the other hand, if the classifier has low recall for a particular label, it may indicate that the classifier is missing many actual positive instances of that label. By identifying these specific areas for improvement, the multi-label confusion matrix can help guide further development and refinement of the classifier.
Preview of the output that you will get on running this code from your IDE
Code
In this solution we have used the scikit-learn library.
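The snippet isn't reproduced here; this is a minimal sketch using scikit-learn's multilabel_confusion_matrix, with made-up label indicator arrays.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# Illustrative multi-label data: three samples, three labels each.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 0]])

# One 2x2 matrix per label, laid out as [[TN, FP], [FN, TP]].
print(multilabel_confusion_matrix(y_true, y_pred))
```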
- Copy the code using the "Copy" button above, and paste it in a Python file in your IDE.
- Run the file to create the multi-label confusion matrix.
I hope you found this useful. I have added the link to dependent libraries and version information in the following sections.
I found this code snippet by searching for "Multi-label confusion matrix" in kandi. You can try any such use case!
Dependent Library
If you do not have scikit-learn, which is required to run this code, you can install it by clicking on the above link and copying the pip install command from the scikit-learn page in kandi.
You can search for any dependent library on kandi, like scikit-learn.
Environment Tested
I tested this solution in the following versions. Be mindful of changes when working with other versions.
- The solution is created in Python version 3.7.15.
- The solution is tested on scikit-learn version 1.0.2.
- The solution is tested on numpy version 1.21.6.
Using this solution, we are able to create a multi-label confusion matrix using the scikit-learn library in Python with simple steps. This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us build a multi-label confusion matrix in Python.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
Rendering text is an important part of web design and typography, as it allows text to be displayed in a way that is visually appealing and easy to read. Rendering text means taking text stored in a computer document and displaying it on a screen, often with formatting such as font size, font type, and color. This is typically done by a program such as a word processor or web browser.
Pygame is a set of Python modules designed for writing video games. It is free and open source, designed to make it easy to write fun games. It includes functions for creating graphics, playing sounds, handling mouse and keyboard input, and much more.
Rendering text with Pygame involves using the Pygame library to display text on the screen. This is done by creating a font object and using the render() method to draw the text to the screen. The font object can be customized with color and size, and the text can be drawn to the screen in any position.
Here is an example of rendering text with Pygame
Fig1: Preview of Code
Fig2: Preview of the Output
Code
In this solution, we use Pygame's font and rendering functions.
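The original snippet isn't shown on this page; here is a minimal sketch of rendering text with Pygame, where the window size, font size, colors, and text are illustrative assumptions.

```python
import pygame

pygame.init()
screen = pygame.display.set_mode((400, 200))   # illustrative window size
font = pygame.font.Font(None, 48)              # default font at size 48
text = font.render("Hello, Pygame!", True, (255, 255, 255))

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:          # close button ends the loop
            running = False
    screen.fill((0, 0, 0))                     # black background
    screen.blit(text, (50, 75))                # draw the rendered text surface
    pygame.display.flip()
pygame.quit()
```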
Instructions
- Install Jupyter Notebook on your computer.
- Open a terminal and install the required libraries with the following commands.
- Install Pygame: pip install pygame.
- Copy the snippet using the 'Copy' button and paste it into a Python file.
- Run the file using the Run button.
I hope you found this useful. I have added the link to dependent libraries and version information in the following sections.
I found this code snippet by searching for "Rendering text with Pygame" in kandi. You can try any such use case!
Dependent Libraries
If you do not have Pygame that is required to run this code, you can install it by clicking on the above link and copying the pip Install command from the Pygame page in kandi.
You can search for any dependent library on kandi like Pygame.
Environment Tested
I tested this solution in the following versions. Be mindful of changes when working with other versions.
- The solution is created in Python 3.9.6.
- The solution is tested on Pygame version 2.3.0.
Using this solution, we are able to render text with Pygame.
This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us render text with Pygame.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
Finding sublist sizes allows you to quickly determine the size of each sublist in a list. This can be useful in various situations where you must work with sublists of varying sizes. For example, use this script to check that a list of data has the expected structure, or to identify the smallest sublist in a list. Additionally, use this script to filter out sublists too small to be useful, or to perform calculations or manipulations on sublists of a particular size.
NumPy (short for "Numerical Python") is a powerful Python library for working with multi-dimensional arrays and matrices. It provides various mathematical functions for working with these arrays, including linear algebra, Fourier transforms, and random number generation.
hasattr is a built-in Python function that takes an object and a string. It returns True if the object has an attribute with the given string name and False otherwise. hasattr is often used in combination with other built-in functions, such as getattr and setattr, which retrieve and set the value of an attribute on an object. hasattr can be useful when working with complex data structures, such as objects or dictionaries, where we want to check whether a certain attribute or key exists before trying to access it.
This is a useful tool for working with lists of sublists and can save you time and effort when you need to analyze or manipulate data in this format.
Preview of the output that you will get on running this code.
Code
In this solution we have used the len() function in Python.
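The snippet isn't reproduced here; below is a short sketch of the idea with a made-up nested list.

```python
# Illustrative nested list; empty and single-element sublists are filtered out.
data = [[1, 2, 3], [], [4], [5, 6], [7, 8, 9, 10]]

sizes = [len(sub) for sub in data if len(sub) > 1]   # len() gives each size
print(sizes)   # [3, 2, 4]
```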
- Copy this code using the "Copy" button above and paste it in your Python IDE.
- Run the code to get the size of each sublist, ignoring empty and single-element sublists.
I hope you have found this useful. I have added the version information in the following section.
I found this code snippet by searching "Python find size of each sublist in a list" in kandi. you can try any use case.
Environment Tested
In this solution we have used the following versions. Be mindful of changes when working with other versions.
- This solution is created using Python version 3.7.15.
Using this solution, we are able to get the size of each sublist while ignoring empty and single-element sublists in Python with simple steps. This process also facilitates an easy-to-use, hassle-free method to create a hands-on working version of code that helps us get the size of each sublist in Python.
Dependent Library
If you do not have numpy, which is required to run this code, you can install it by clicking on the above link and copying the pip install command from the numpy page in kandi.
You can search for any dependent library on kandi, like numpy.
Support
- For any support on kandi solution kits, please use the chat
- For further learning resources, visit the Open Weaver Community learning page.
Trending Discussions on Data Manipulation
Vue - Wait for for loop to fetch all items asynchronously
How to get file name which have error using AWK command?
Declaring variable in R for DBI query to MS SQL
Check if azure databricks mount point exists from .NET
R basics: working with multiple variables at once and their output
I can't read excel file using dt.fread from datatable AttributeError
How can I use data.table in a package without importing all functions?
Conflicting object names within a solution
How to update shiny module with reactive dataframe from another module
R command `group_by`
QUESTION
Vue - Wait for for loop to fetch all items asynchronously
Asked 2022-Apr-17 at 17:32
I have an array of data to be fetched, so I have to use a for loop to fetch all the data, but I want to do it asynchronously (multiple calls at the same time). After the data has been fetched I also want to do some data manipulation, so I need to run code AFTER all the data has been fetched.
for (var e in this.dataTofetch) {
  axios
    .get("https://www.example.com/api/" + e)
    .then((response) => this.fetchedData.push(response.data));
}
this.manipulateData();
The problem is that whenever I reach the manipulateData function, fetchedData is empty.
I also tried doing it synchronously using await, and it works, but it becomes very slow when making multiple calls.
ANSWER
Answered 2022-Apr-17 at 17:21
The best approach I can think of is to use Promise.all(). You will leave out the .then handler, because axios.get() returns you a promise.
An exact implementation example can be found here at StackOverflow: Promise All with Axios.
QUESTION
How to get file name which have error using AWK command?
Asked 2022-Mar-12 at 17:18
I am using the SAC tool to read the header information, but some files have no header information and it prints an error. Is there any way to use AWK to print those files that have no header or that produce an error? I often use AWK for data manipulation but failed this time.
Here is my try:
saclst a f *2020-05*BHZ*
This is the output
saclst a f *2020-05*BHZ*
GS.GS043.2020-05-18T03:52.BHZ.sac 3.37
GS.GS043.2020-05-18T09:28.BHZ.sac 3.64
GS.GS043.2020-05-18T12:09.BHZ.sac 3.42
saclst: Error determining SAC header: GS.GS043.2020-05-18T14:36.BHZ.sac
GS.GS043.2020-05-18T16:25.BHZ.sac 2.92
GS.GS043.2020-05-18T18:51.BHZ.sac 3.66
Now I want to get the file name and print it, but it seems like AWK does not help:
saclst a f *2020-05*BHZ* | awk '{if ($2<0) print $1;}' > ../test.dat
My output file is empty and the terminal shows the errors below. Is there any way to save these errors so I can deal with those files later?
saclst: Error determining SAC header: SC.LZB.2020-05-21T10:46.BHZ.sac
saclst: Error determining SAC header: SC.LZB.2020-05-21T11:57.BHZ.sac
saclst: Error determining SAC header: SC.LZB.2020-05-26T11:23.BHZ.sac
saclst: Error determining SAC header: SC.LZB.2020-05-28T10:44.BHZ.sac
saclst: Error determining SAC header: SC.QSC.2020-05-12T06:49.BHZ.sac
ANSWER
Answered 2022-Mar-12 at 09:06
Here's what I think you are looking for:
# just for demo, pipe SAC tool to awk for your actual use case
$ cat ip.txt
GS.GS043.2020-05-18T03:52.BHZ.sac 3.37
GS.GS043.2020-05-18T09:28.BHZ.sac 3.64
GS.GS043.2020-05-18T12:09.BHZ.sac 3.42
saclst: Error determining SAC header: GS.GS043.2020-05-18T14:36.BHZ.sac
GS.GS043.2020-05-18T16:25.BHZ.sac 2.92
GS.GS043.2020-05-18T18:51.BHZ.sac 3.66

# filter lines with Error based on number of fields or `Error` in 2nd field
$ awk 'NF != 2' ip.txt
saclst: Error determining SAC header: GS.GS043.2020-05-18T14:36.BHZ.sac
$ awk '$2 == "Error"' ip.txt
saclst: Error determining SAC header: GS.GS043.2020-05-18T14:36.BHZ.sac

# print only last field
$ awk '$2 == "Error"{print $NF}' ip.txt
GS.GS043.2020-05-18T14:36.BHZ.sac
If the saclst command puts the lines with Error on stderr, you can use this:
$ saclst a f *2020-05*BHZ* 2> error.log
QUESTION
Declaring variable in R for DBI query to MS SQL
Asked 2022-Mar-08 at 14:11
I'm writing an R script that runs several SQL queries using the DBI package to create reports. To make this work, I need to be able to declare a variable in R (such as a Period End Date) that is then referenced from within the SQL query. When I run my query, I get an error.
If I simply use the field name (PeriodEndDate), I get the following error:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘dbGetQuery’ for signature ‘"Microsoft SQL Server", "character"’
If I use @ to access the field name (@PeriodEndDate), I get the following error:
Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]Must declare the scalar variable "@PeriodEndDate". [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. '
An example query might look like this:
library(DBI) # Used for connecting to SQL server and submitting SQL queries.
library(tidyverse) # Used for data manipulation and creating/saving CSV files.
library(lubridate) # Used to calculate end of month, start of month in queries

# Define time periods for queries.
PeriodEndDate <<- ceiling_date(as.Date('2021-10-31'),'month') # Enter Period End Date on this line.
PeriodStartDate <<- floor_date(PeriodEndDate, 'month')

# Connect to SQL Server.
con <- dbConnect(
  odbc::odbc(),
  driver = "SQL Server",
  server = "SERVERNAME",
  trusted_connection = TRUE,
  timeout = 5,
  encoding = "Latin1")

samplequery <- dbGetQuery(con, "
  SELECT * FROM [TableName]
  WHERE OrderDate <= @PeriodEndDate
")
I believe one way might be to use the paste function, like this:
samplequery <- dbGetQuery(con, paste("
  SELECT * FROM [TableName]
  WHERE OrderDate <=", PeriodEndDate))
However, that can get unwieldy if it involves several variables being referenced outside the query or in several places within the query.
Is there a relatively straightforward way to do this?
Thanks in advance for any thoughts you might have!
ANSWER
Answered 2022-Mar-08 at 14:11
The mechanism in most DBI-based connections is to use ?-placeholders[1] in the query and params= in the call to DBI::dbGetQuery or DBI::dbExecute.
Perhaps this:
samplequery <- dbGetQuery(con, "
  SELECT * FROM [TableName]
  WHERE OrderDate <= ?
", params = list(PeriodEndDate))
29
In general the mechanisms for including an R object as a data-item are enumerated well in https://db.rstudio.com/best-practices/run-queries-safely/. In the order of my recommendation:
- Parameterized queries (as shown above);
- glue::glue_sql;
- sqlInterpolate (which uses the same ?-placeholders as #1);
- The link also mentions "manual escaping" using dbQuoteString.
Anything else is in my mind more risky due to inadvertent SQL corruption/injection. I've seen many questions here on SO that try to use one of the following techniques: paste and/or sprintf using sQuote, or hard-coded paste0("'", PeriodEndDate, "'"). These are too fragile in my mind and should be avoided.
My preference for parameterized queries extends beyond this usability; it can also have non-insignificant impacts on repeated use of the same query, since DBMSes tend to analyze/optimize the query and cache this for the next use. Consider this:
### parameterized queries
DBI::dbGetQuery("select ... where OrderDate >= ?", params=list("2020-02-02"))
DBI::dbGetQuery("select ... where OrderDate >= ?", params=list("2020-02-03"))

### glue_sql
PeriodEndDate <- as.Date("2020-02-02")
qry <- glue::glue_sql("select ... where OrderDate >= {PeriodEndDate}", .con=con)
# <SQL> select ... where OrderDate >= '2020-02-02'
DBI::dbGetQuery(con, qry)
PeriodEndDate <- as.Date("2021-12-22")
qry <- glue::glue_sql("select ... where OrderDate >= {PeriodEndDate}", .con=con)
# <SQL> select ... where OrderDate >= '2021-12-22'
DBI::dbGetQuery(con, qry)
42
In the case of parameterized queries, the "query" itself never changes, so its optimized plan (internal to the server) can be reused. In the case of the glue_sql queries, the query itself changes (albeit by just a handful of characters), so most (all?) DBMSes will re-analyze and re-optimize the query. While they tend to do it quickly, and most analysts' queries are not complex, it is still unnecessary overhead, and it misses an opportunity in cases where your query and/or the indices require a little more work to optimize well.
Notes:

1. ? is used by most DBMSes, but not all. Others use $name or $1 or such. With odbc::odbc(), however, it is always ? (no name, no number), regardless of the actual DBMS.
2. Not sure if you are using this elsewhere, but the use of <<- (vice <- or =) can encourage bad habits and/or unreliable/unexpected results.
3. It is not uncommon to use the same variable multiple times in a query. Unfortunately, you will need to include the variable multiple times, and order is important. For example,
samplequery <- dbGetQuery(con, "
  SELECT * FROM [TableName]
  WHERE OrderDate <= ?
     OR (SomethingElse = ? AND OrderDate > ?)
", params = list(PeriodEndDate, 99, PeriodEndDate))
If you have a list/vector of values and want to use SQL's IN operator, then you have two options, my preference being the first (for the reasons stated above):

1. Create a string of question marks and paste it into the query. (Yes, this is pasteing into the query, but we are not dealing with the risk of incorrectly single-quoting or double-quoting. Since DBI does not support any other mechanism, this is what we have.)
MyDates <- c(..., ...)  # vector of dates, elided in the original
qmarks <- paste(rep("?", length(MyDates)), collapse = ",")
samplequery <- dbGetQuery(con, sprintf("
  SELECT * FROM [TableName]
  WHERE OrderDate IN (%s)
", qmarks), params = as.list(MyDates))
2. glue_sql supports expanding a vector internally:
MyDates <- c(..., ...)  # vector of dates, elided in the original
qry <- glue::glue_sql("
  SELECT * FROM [TableName]
  WHERE OrderDate IN ({MyDates*})", .con = con)
DBI::dbGetQuery(con, qry)
QUESTION
Check if azure databricks mount point exists from .NET
Asked 2021-Dec-14 at 08:44

I work on an app which does some kind of data engineering, and we use Azure ADLS for data storage and Databricks for data manipulation. There are two approaches for retrieving the data: the first uses the Storage Account and Storage account secret key, and the other uses a mount point. With the first approach I can successfully check, from .NET, whether the Storage account and its corresponding Secret key match, and return a message saying whether the credentials are right or not. However, I need to do the same thing with the mount point, i.e. determine whether the mount point exists in dbutils.fs.mounts() or anywhere in the storage (I don't know how a mount point works exactly, or whether it stores data in blob).

The flow for Storage account and Secret key is the following:

- Try to connect using the BlobServiceClient API from Microsoft;
- If it fails, return a message to the user that the credentials are invalid;
- If it doesn't fail, proceed further.

I'm not that familiar with /mnt/ and such because I mostly do .NET, but is there a way to check from .NET whether a mount point exists or not?
ANSWER
Answered 2021-Dec-14 at 08:44

A mount point is just a kind of reference to the underlying cloud storage. The dbutils.fs.mounts() command needs to be executed on some cluster - it's doable, but slow & cumbersome.

The simplest way to check is to use the List command of the DBFS REST API, passing the mount point name /mnt/<something> as the path parameter. If it doesn't exist, you'll get the error message RESOURCE_DOES_NOT_EXIST:
{
  "error_code": "RESOURCE_DOES_NOT_EXIST",
  "message": "No file or directory exists on path /mnt/test22/."
}
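As an illustration, here is a minimal sketch of that REST call from R with httr (the workspace URL and token are assumptions, read from hypothetical DATABRICKS_HOST/DATABRICKS_TOKEN variables; the same request is easy to reproduce from .NET with HttpClient):

library(httr)

host  <- Sys.getenv("DATABRICKS_HOST")  # e.g. "https://adb-1234567890123456.7.azuredatabricks.net"
token <- Sys.getenv("DATABRICKS_TOKEN")

# List the mount point's path; a 200 response means it exists.
resp <- GET(
  paste0(host, "/api/2.0/dbfs/list"),
  add_headers(Authorization = paste("Bearer", token)),
  query = list(path = "/mnt/test22")
)

if (status_code(resp) == 200) {
  message("Mount point exists")
} else {
  message(content(resp)$error_code) # e.g. RESOURCE_DOES_NOT_EXIST
}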
QUESTION
R basics: working with multiple variables at once and their output
Asked 2021-Nov-29 at 19:49

I have a survey dataset with 40 ordered factor variables. The variables are transformed into characters when the data is imported. Please correct me if I am wrong, as I am thinking of using the apply function here.

Below is my data manipulation:
### data
v1 <- as.character(c(1,4,2,4,3,1,3,4,5,2,2,3,6,5,4,6,5,4,5,6,6,2,4,3,4,5,6,1,6,3,5,6,3,2,4,5,3,2,4,5,3,2,4))
v2 <- as.character(c(3,4,1,4,5,1,3,1,5,6,4,3,4,5,6,3,3,5,4,3,3,5,6,3,4,3,4,6,3,1,1,3,4,5,6,1,3,6,4,3,1,6,5))

df <- data.frame(v1, v2)

### transform into ordered factor
df$v1.f <- as.factor(df$v1)
df$v1.f <- ordered(df$v1.f, levels = c("1", "2", "3", "4", "5", "6"))
The real levels are unsorted characters, which is why I included the step. I don't mind typing this for all variables, but it seems redundant.
My second issue is with the output. I would like to create a fancy report and know how to generate the numbers for it:
v1.freq <- table(df$v1.f)
v1.perc <- round(prop.table(v1.freq), 2) * 100
v1.med <- median(as.numeric(df$v1)) # median() needs numeric input; df$v1 is character
How can a table that contains all the information for all the variables at once be printed - especially when there are no answers for a level (see v2, where there is no response for level 2; table() simply skips over the level)?

How do I turn the R output into a table that has the levels as headers and frequencies and percentages as rows, for multiple variables? Copy/pasting the numbers into an Excel sheet seems - again - unnecessary and prone to errors.
ANSWER
Answered 2021-Nov-29 at 10:57

First, you might want to check whether your data import function has a stringsAsFactors option.

Then, as I understand it, you want to transform your variables into ordered factors, and this for all of them. You can wrap this into a dplyr sentence and use forcats to handle factors. Let's take your data:
library(tidyverse)
df %>%
  mutate(across(1:2, ~factor(.))) %>%
  mutate(across(1:2, ~ordered(.))) %>%
  str()
Output:
'data.frame': 43 obs. of 2 variables:
 $ v1: Ord.factor w/ 6 levels "1"<"2"<"3"<"4"<..: 1 4 2 4 3 1 3 4 5 2 ...
 $ v2: Ord.factor w/ 5 levels "1"<"3"<"4"<"5"<..: 2 3 1 3 4 1 2 1 4 5 ...
As you can see, the variables are transformed into ordered factors, with levels ordered alphabetically. To explain: mutate alters your variables, and across specifies which variables you want to change and how. Here, we mutate variables 1 to 2 and apply to them the function factor and then ordered. If the alphabetical levelling isn't the one desired, you can still mutate each column by itself and give the levels argument.
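For instance, a minimal sketch reusing the level set from the question (the point is only that levels fixes the order explicitly instead of relying on alphabetical sorting):

df %>%
  mutate(across(1:2, ~ ordered(.x, levels = c("1", "2", "3", "4", "5", "6"))))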
For the second question: as long as there is no level "2" for v2, unlike v1, you cannot merge the two variables unless you add a level for v2 with NA. You can still check janitor::tabyl to give you cross frequencies, and create one table per variable:
library(janitor)
df2 <- df %>%
  mutate(across(1:2, ~factor(.))) %>%
  mutate(across(1:2, ~ordered(.)))

map(df2, tabyl)
Output:
$v1
 .x[[i]]  n    percent
       1  3 0.06976744
       2  7 0.16279070
       3  8 0.18604651
       4 10 0.23255814
       5  8 0.18604651
       6  7 0.16279070

$v2
 .x[[i]]  n   percent
       1  7 0.1627907
       3 13 0.3023256
       4  9 0.2093023
       5  7 0.1627907
       6  7 0.1627907
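If you want the empty level 2 of v2 to show up with a zero count (so every variable shares the same headers), one option - a sketch going slightly beyond the original answer - is to impose the full level set before tabulating; tabyl reports unused factor levels by default:

df3 <- df %>%
  mutate(across(1:2, ~ ordered(.x, levels = as.character(1:6))))
table(df3$v2) # the missing level now appears with a zero count
#
#  1  2  3  4  5  6
#  7  0 13  9  7  7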
QUESTION
I can't read excel file using dt.fread from datatable AttributeError
Asked 2021-Nov-19 at 15:28

Hello, I'm trying to read an Excel file 'myFile.xlsx' using the datatable.fread (version 1.0.0) function to speed up data manipulation. The problem is I get an AttributeError: module 'xlrd' has no attribute 'xlsx'. The command I used is:
import datatable as dt
DT = dt.fread("myFile.xlsx")
I checked, and the module where the error occurs is the xls module of the datatable package:
def read_xls_workbook(filename, subpath):
    try:
        import xlrd
        # Fixes the warning
        # "PendingDeprecationWarning: This method will be removed in future
        # versions. Use 'tree.iter()' or 'list(tree.iter())' instead."
        xlrd.xlsx.ensure_elementtree_imported(False, None)  # Here
        xlrd.xlsx.Element_has_iter = True                   # and Here
Is there any solution to fix this issue, please?
ANSWER
Answered 2021-Nov-19 at 15:28

The issue is that the datatable package has not yet been updated to work with xlrd > 1.2.0 (xlrd 2.x dropped its xlsx module), so in order to make it work you have to install xlrd 1.2.0:

pip install xlrd==1.2.0
I hope it helped.
QUESTION
How can I use data.table in a package without importing all functions?
Asked 2021-Oct-27 at 22:46

I'm building an R package in which I would like to use dtplyr to perform various bits of data manipulation. My issue is that dtplyr seems to only work if I import the whole of data.table (i.e. using the roxygen tag #' @import data.table). Without this I get errors like:

Error in .(x = sum(x), y = sum(y), :
  could not find function "."

If I can solve this problem by only importing certain functions from data.table, that would be great, but there seems to be no function .() in the package. My knowledge of data.table is limited, but I can only assume it uses .() to edit parsed code (similar to the base R bquote()), and that dtplyr for some reason needs data.table to be loaded for this to work.

I've tried various things such as withr::with_package("data.table", code) and requireNamespace("data.table"), but so far importing the whole package is the only thing that seems to work. This is not a viable solution because it completely ruins the well-maintained namespace in the package I'm working on by importing so many functions from data.table.

NB: this package houses a project which will be worked on by many other analysts well into the future. While simply writing data.table code may be preferable in terms of performance and general good practice, using dtplyr to translate dplyr code gives a boost in readability and ease-of-use that is far more important in this context.
ANSWER
Answered 2021-Oct-27 at 22:46

The (documented) solution I found is to set .datatable.aware <- TRUE somewhere in the package source code. According to the documentation, if you're using data.table in a package without importing the whole thing, you should do this so that [.data.table() does not revert to calling [.data.frame(). From the docs:
...please define .datatable.aware = TRUE anywhere in your R source code (no need to export). This tells data.table that you as a package developer have designed your code to intentionally rely on data.table functionality even though it may not be obvious from inspecting your NAMESPACE file.
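In practice this is a single line in any file under R/ - a minimal sketch, assuming the conventional (but not required) file name zzz.R:

# R/zzz.R
# Declare data.table awareness without importing its whole namespace;
# the symbol does not need to be exported.
.datatable.aware <- TRUE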
QUESTION
Conflicting object names within a solution
Asked 2021-Oct-13 at 19:08

I have a project that does some file and data manipulation using several classes generated from elsewhere. I'm trying to use those generated classes in one place, but I'm running into issues when I add references in ProcessorProject to more than one of the "Item" projects, because the object names conflict with each other.
I know that this could be easily solved by wrapping the generated code within the "Item" classes in their projects' namespace, but I'm trying to avoid modifying those generated files in any way.
Is there any other way around this that I'm not thinking of? A way to add that generated code to the project namespace without actually modifying the files themselves? Something else?
Very simplified model:
ProcessorProject
  Processor.cs
    switch (color)
      case "Blue":
        BlueUtility.DoSomething();
        break;
      case "Red":
        RedUtility.DoSomething();
        break;

BlueItemProject
  BlueUtility.cs
    namespace BlueItem
      class BlueUtility
  BlueItem.cs [generated]
    partial class BlueItemInfo
      public ItemInfo Information
      public SomeOtherInformation MoreInformation
    partial class ItemInfo
    partial class SomeOtherInformation

RedItemProject
  RedUtility.cs
    namespace RedItem
      class RedUtility
  RedItem.cs [generated]
    partial class RedItemInfo
      public ItemInfo Information
      public SomeOtherInformation MoreInformation
    partial class ItemInfo
    partial class SomeOtherInformation
ANSWER
Answered 2021-Oct-13 at 19:08

Create an alias for each reference in the References properties window. Then, in the file where you use them, write something like this at the top:
extern alias NewAliasOfProject;
using NewAliasOfProject::NamespaceName;
QUESTION
How to update shiny module with reactive dataframe from another module
Asked 2021-Sep-27 at 09:22

The goal of this module is to create a reactive barplot that changes based on the output of a data selector module. Unfortunately, the barplot does not update; it's stuck at the first variable that's selected.
I've tried creating observer functions to update the barplot, to no avail. I've also tried nesting the selector server module within the barplot module, but I get the error: Warning: Error in UseMethod: no applicable method for 'mutate' applied to an object of class "c('reactiveExpr', 'reactive', 'function')"
I just need some way to tell the barplot module to update whenever the data it's fed changes.
Barplot Module:
#UI
barplotUI <- function(id) {
  tagList(plotlyOutput(NS(id, "barplot"), height = "300px"))
}

#Server
#' @param data Reactive element from another module: reactive(dplyr::filter(austin_map, var == input$var))
barplotServer <- function(id, data) {
  moduleServer(id, function(input, output, session) {
    #Data Manipulation
    bardata <- reactive({
      bar <-
        data |>
        mutate(
          `> 50% People of Color` = if_else(`% people of color` >= 0.5, 1, 0),
          `> 50% Low Income` = if_else(`% low-income` >= 0.5, 1, 0)
        )

      total_av <- mean(bar$value)
      poc <- bar |> filter(`> 50% People of Color` == 1)
      poc_av <- mean(poc$value)
      lowincome <- bar |> filter(`> 50% Low Income` == 1)
      lowincome_av <- mean(lowincome$value)
      bar_to_plotly <-
        data.frame(
          y = c(total_av, poc_av, lowincome_av),
          x = c("Austin Average",
                "> 50% People of Color",
                "> 50% Low Income")
        )

      return(bar_to_plotly)
    })

    #Plotly Barplot
    output$barplot <- renderPlotly({
      plot_ly(
        x = bardata()$x,
        y = bardata()$y,
        color = I("#00a65a"),
        type = 'bar'
      ) |>
        config(displayModeBar = FALSE)
    })
  })
}
EDIT : Data Selector Module
dataInput <- function(id) {
  tagList(
    pickerInput(
      NS(id, "var"),
      label = NULL,
      width = '100%',
      inline = FALSE,
      options = list(`actions-box` = TRUE,
                     size = 10),
      choices = list(
        "O3",
        "Ozone - CAPCOG",
        "Percentile for Ozone level in air",
        "PM2.5",
        "PM2.5 - CAPCOG",
        "Percentile for PM2.5 level in air")
    )
  )
}

dataServer <- function(id) {
  moduleServer(id, function(input, output, session) {
    austin_map <- readRDS("./data/austin_composite.rds")
    austin_map <- as.data.frame(austin_map)
    austin_map$value <- as.numeric(austin_map$value)

    list(
      var = reactive(input$var),
      df = reactive(austin_map |> dplyr::filter(var == input$var))
    )
  })
}
Simplified App
library(shiny)
library(tidyverse)
library(plotly)

source("barplot.r")
source("datamod.r")

ui = fluidPage(
  fluidRow(
    dataInput("data"),
    barplotUI("barplot")
  )
)

server <- function(input, output, session) {
  data <- dataServer("data")
  variable <- data$df

  barplotServer("barplot", data = variable())
}

shinyApp(ui, server)
ANSWER
Answered 2021-Sep-27 at 09:22As I wrote in my comment, passing a reactive dataset as an argument to a module server is no different to passing an argument of any other type.
Here's a MWE that illustrates the concept, passing either mtcars
or a data frame of random values between a selection module and a display module.
The critical point is that the selection module returns the reactive [data], not the reactive's value [data()], to the main server function; in turn, the reactive, not the reactive's value, is passed as a parameter to the plot module.
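Applied to the asker's simplified app, the same principle means two small changes - a sketch, assuming the modules are otherwise unchanged:

# Main server: pass the reactive itself, not its value.
server <- function(input, output, session) {
  data <- dataServer("data")
  barplotServer("barplot", data = data$df) # data$df, not data$df()
}
# Inside barplotServer, read the value with data() wherever it is used,
# e.g. bar <- data() |> mutate(...)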
library(shiny)
library(ggplot2)
library(tibble)   # tibble() is used in the select module but not loaded by shiny/ggplot2
library(magrittr) # provides %>%, used in the plot module

# Select module
selectUI <- function(id) {
  ns <- NS(id)
  selectInput(ns("select"), "Select a dataset", c("mtcars", "random"))
}

selectServer <- function(id) {
  moduleServer(
    id,
    function(input, output, session) {
      data <- reactive({
        if (input$select == "mtcars") {
          mtcars
        } else {
          tibble(x = runif(10), y = rnorm(10), z = rbinom(n = 10, size = 20, prob = 0.3))
        }
      })

      return(data)
    }
  )
}

# Barplot module
barplotUI <- function(id) {
  ns <- NS(id)

  tagList(
    selectInput(ns("variable"), "Select variable:", choices = c()),
    plotOutput(ns("plot"))
  )
}

barplotServer <- function(id, plotData) {
  moduleServer(
    id,
    function(input, output, session) {
      ns <- NS(id)

      observeEvent(plotData(), {
        updateSelectInput(
          session,
          "variable",
          choices = names(plotData()),
          selected = names(plotData()[1])
        )
      })

      output$plot <- renderPlot({
        # There's an irritating transient error as the dataset
        # changes, but handling it would
        # detract from the purpose of this answer
        plotData() %>%
          ggplot() + geom_bar(aes_string(x = input$variable))
      })
    }
  )
}

# Main UI
ui <- fluidPage(
  selectUI("select"),
  barplotUI("plot")
)

# Main server
server <- function(input, output, session) {
  selectedData <- selectServer("select")
  barplotServer <- barplotServer("plot", plotData = selectedData)
}

# Run the application
shinyApp(ui = ui, server = server)
QUESTION
R command `group_by`
Asked 2021-Sep-22 at 09:13

I am not able to understand exactly how this code works. I found it in a tutorial guide, Data Manipulation in R by Steph Locke; on page 133 there is an example that I am able to understand only partially.
library(tidyverse)
library(nycflights13)

flights %>%
  group_by(month, carrier) %>%
  summarise(n = n()) %>% ## sum of items;
  group_by(month) %>%
  mutate(prop = scales::percent(n/sum(n)), n = NULL) %>%
  spread(month, prop)


flights %>%
  group_by(month, carrier) %>% ## This is grouping by months and within the months by carrier;
  summarise(n = n()) %>% ## It is summing the items, giving for each month and each carrier the sum of items;
At this point there is another group_by(); it looks like it is nested inside group_by(month, carrier). Then:
mutate(prop = scales::percent(n/sum(n)), n = NULL) %>% ## Calculates the percentage of items over the total and stores it in "prop"
The last line creates the matrix, putting month in the columns and, inside, the values obtained from prop. I would like to understand better what exactly the second group_by(month) %>% is doing. Thank you in advance for every reply.
ANSWER
Answered 2021-Sep-22 at 09:04

The second group_by is not needed here, because by default the summarise step has the argument .groups = "drop_last". Therefore, after the first summarise, only a single grouping column, 'month', remains. We can change the code to
flights %>%
  group_by(month, carrier) %>%
  summarise(n = n()) %>%
  mutate(prop = scales::percent(n/sum(n)), n = NULL)
Suppose we change the default value of .groups to "drop": then it will drop all the grouping variables, and thus a new group_by statement is needed. Also, after the last grouping statement, if we are using mutate, it wouldn't drop the group attributes, and thus ungroup would be useful:
flights %>%
  group_by(month, carrier) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(month) %>%
  mutate(prop = scales::percent(n/sum(n)), n = NULL) %>%
  ungroup
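A quick way to confirm which grouping survives each step - a small sketch, assuming nycflights13 is installed - is dplyr::group_vars():

library(dplyr)
library(nycflights13)

# With the default .groups = "drop_last", summarise() peels off only the
# innermost grouping variable (carrier), leaving month.
flights %>%
  group_by(month, carrier) %>%
  summarise(n = n()) %>%
  group_vars()
#> [1] "month"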
Community Discussions contain sources that include Stack Exchange Network