
regression | Multivariable regression library in Go | Testing library

by sajari | Go | Version: v1.0.1 | License: MIT


kandi X-RAY | regression Summary

regression is a Go library typically used in testing applications. It has no reported bugs or vulnerabilities, carries a permissive MIT license, and has low support activity. You can download it from GitHub.
Multivariable regression library in Go

Support

  • regression has a low-activity ecosystem.
  • It has 311 stars, 49 forks, and 17 watchers.
  • It had no major release in the last 12 months.
  • There are 0 open issues and 14 closed issues; on average, issues are closed in 114 days. There are 3 open pull requests and 0 closed pull requests.
  • It has a neutral sentiment in the developer community.
  • The latest version of regression is v1.0.1.

Quality

  • regression has 0 bugs and 0 code smells.

Security

  • Neither regression nor its dependent libraries have any reported vulnerabilities.
  • regression code analysis shows 0 unresolved vulnerabilities.
  • There are 0 security hotspots that need review.

License

  • regression is licensed under the MIT License. This license is Permissive.
  • Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

  • regression releases are available to install and integrate.
  • Installation instructions, examples, and code snippets are available below.
  • It has 522 lines of code, 31 functions and 4 files.
  • It has high code complexity, which directly impacts the maintainability of the code.

regression Key Features

Multivariable regression library in Go

regression Examples and Code Snippets


Installation

    $ go get github.com/sajari/regression

Supports Go 1.8+
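
On module-aware versions of Go, the same command also records the dependency in your go.mod; to pin the tagged release shown above (standard go get syntax, not specific to this library):

    $ go get github.com/sajari/regression@v1.0.1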

example usage
-------------

Import the package, create a regression, and add data to it. You can use as many variables as you like; in the example below there are 3 variables for each observation.

```go
package main

import (
	"fmt"

	"github.com/sajari/regression"
)

func main() {
	r := new(regression.Regression)
	r.SetObserved("Murders per annum per 1,000,000 inhabitants")
	r.SetVar(0, "Inhabitants")
	r.SetVar(1, "Percent with incomes below $5000")
	r.SetVar(2, "Percent unemployed")
	r.Train(
		regression.DataPoint(11.2, []float64{587000, 16.5, 6.2}),
		regression.DataPoint(13.4, []float64{643000, 20.5, 6.4}),
		regression.DataPoint(40.7, []float64{635000, 26.3, 9.3}),
		regression.DataPoint(5.3, []float64{692000, 16.5, 5.3}),
		regression.DataPoint(24.8, []float64{1248000, 19.2, 7.3}),
		regression.DataPoint(12.7, []float64{643000, 16.5, 5.9}),
		regression.DataPoint(20.9, []float64{1964000, 20.2, 6.4}),
		regression.DataPoint(35.7, []float64{1531000, 21.3, 7.6}),
		regression.DataPoint(8.7, []float64{713000, 17.2, 4.9}),
		regression.DataPoint(9.6, []float64{749000, 14.3, 6.4}),
		regression.DataPoint(14.5, []float64{7895000, 18.1, 6}),
		regression.DataPoint(26.9, []float64{762000, 23.1, 7.4}),
		regression.DataPoint(15.7, []float64{2793000, 19.1, 5.8}),
		regression.DataPoint(36.2, []float64{741000, 24.7, 8.6}),
		regression.DataPoint(18.1, []float64{625000, 18.6, 6.5}),
		regression.DataPoint(28.9, []float64{854000, 24.9, 8.3}),
		regression.DataPoint(14.9, []float64{716000, 17.9, 6.7}),
		regression.DataPoint(25.8, []float64{921000, 22.4, 8.6}),
		regression.DataPoint(21.7, []float64{595000, 20.2, 8.4}),
		regression.DataPoint(25.7, []float64{3353000, 16.9, 6.7}),
	)
	r.Run()

	fmt.Printf("Regression formula:\n%v\n", r.Formula)
	fmt.Printf("Regression:\n%s\n", r)
}
```

Note: You can also add data points one by one.
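
Since Train is variadic, here is a minimal sketch of adding observations individually (the obs slice below is hypothetical, reusing two rows from the example above):

```go
// Hypothetical streaming source: each observation pairs the response y
// with its feature vector x.
obs := []struct {
	y float64
	x []float64
}{
	{11.2, []float64{587000, 16.5, 6.2}},
	{13.4, []float64{643000, 20.5, 6.4}},
}
for _, o := range obs {
	r.Train(regression.DataPoint(o.y, o.x)) // one data point per call
}
```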

Once the regression has run, you can print the model and inspect the R^2, variance, residuals, etc. You can also access the coefficients directly to use elsewhere, e.g.

```go
// Get the coefficient for the "Inhabitants" variable 0:
c := r.Coeff(0)
```
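
As a hedged sketch of inspecting fit quality after r.Run() (this assumes the model exposes R-squared as an exported R2 field; check the package docs for the exact names of the variance and residual accessors):

```go
// Goodness of fit; R2 is assumed to be an exported field on Regression.
fmt.Printf("R^2: %.4f\n", r.R2)
```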

You can also use the model to predict new data points:

```go
prediction, err := r.Predict([]float64{587000, 16.5, 6.2})
```
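
Predict also returns an error, for example when called before Run or with a feature vector of the wrong length (the exact error conditions are an assumption; see the package docs). A minimal handling sketch, assuming the standard library log package is imported:

```go
prediction, err := r.Predict([]float64{587000, 16.5, 6.2})
if err != nil {
	log.Fatalf("predict failed: %v", err)
}
fmt.Printf("predicted value: %.2f\n", prediction)
```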

Feature crosses are supported, so your model can capture fixed non-linear relationships:

```go

r.Train(
  regression.DataPoint(11.2, []float64{587000, 16.5, 6.2}),
)
// Add a new feature: the first variable (index 0) raised to the power of 2.
r.AddCross(regression.PowCross(0, 2))
r.Run()

```
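
The package also exports an interaction cross; a hedged sketch assuming MultiplierCross takes the indices of the variables to multiply:

```go
// Add an interaction feature x0*x1 before solving.
r.AddCross(regression.MultiplierCross(0, 1))
r.Run()
```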

ggplot2: Projecting points or distribution on a non-orthogonal (e.g., -45 degree) axis

library(tidyverse) # ggplot2, dplyr, tibble; df is the asker's data frame with columns initial, final, group

(my_hist <- df %>%
    mutate(gain = final - initial) %>% # gain would be better name
    ggplot(aes(gain)) +
    geom_density())
a <- ggplot_build(my_hist)
rot = pi * 3/4
diag_hist <- tibble(
  x = a[["data"]][[1]][["x"]],
  y = a[["data"]][[1]][["y"]]
) %>%
  # squish
  mutate(y = y*0.2) %>%
  # rotate 135 deg CCW
  mutate(xy = x*cos(rot) - y*sin(rot),
         dens = x*sin(rot) + y*cos(rot)) %>%
  # slide
  mutate(xy = xy - 0.7,  #  magic number based on plot range below
         dens = dens - 0.7)
ggplot(df, aes(x = initial, y = final, color = group)) +
  geom_point() + 
  geom_smooth(method = "lm", formula  =  y~x) +
  stat_ellipse(size = 1.2) +
  geom_abline(slope  =  1, color = "black", size = 1.2) +
  coord_fixed(clip = "off", 
              xlim = c(-0.7,1.6),
              ylim = c(-0.7,1.6), 
              expand = expansion(0)) +
  annotate("segment", x = -1.4, xend = 0, y = 0, yend = -1.4) +
  annotate("path", x = diag_hist$xy, y = diag_hist$dens) +
  theme_bw() +
  theme(legend.position = c(.15, .85), 
        plot.margin = unit(c(.1,.1,2,2), "cm")) 

Why does summary() show different standard errors than coeftest()?

fit <- glm(am ~ hp + cyl, family=binomial(link="logit"), mtcars)

summary(fit)
#             Estimate Std. Error z value Pr(>|z|)   
# (Intercept)  5.83220    2.06595   2.823  0.00476 **
# hp           0.02775    0.01366   2.031  0.04228 * 
# cyl         -1.70306    0.60286  -2.825  0.00473 **
library(sandwich); library(lmtest)

coeftest(fit, vcov.=vcov(fit))
#              Estimate Std. Error z value Pr(>|z|)   
# (Intercept)  5.832204   2.065949  2.8230 0.004757 **
# hp           0.027748   0.013664  2.0307 0.042282 * 
# cyl         -1.703064   0.602862 -2.8250 0.004729 **
(ct <- coeftest(fit, vcov.=vcovHC(fit, type='HC0')))
#              Estimate Std. Error z value Pr(>|z|)   
# (Intercept)  5.832204   2.139307  2.7262 0.006407 **
# hp           0.027748   0.012254  2.2643 0.023553 * 
# cyl         -1.703064   0.572045 -2.9772 0.002909 **
texreg::screenreg(fit)
# =========================
#                 Model 1  
# -------------------------
# (Intercept)       5.83 **
#                  (2.07)  
# hp                0.03 * 
#                  (0.01)  
# cyl              -1.70 **
#                  (0.60)  
# -------------------------
# AIC              34.63   
# BIC              39.02   
# Log Likelihood  -14.31   
# Deviance         28.63   
# Num. obs.        32      
# =========================
# *** p < 0.001; ** p < 0.01; * p < 0.05
texreg::screenreg(fit, 
                  override.pvalues=ct[, 4],
                  override.se=ct[, 3])
# =========================
#                 Model 1  
# -------------------------
# (Intercept)       5.83 **
#                  (2.73)  
# hp                0.03 * 
#                  (2.26)  
# cyl              -1.70 **
#                 (-2.98)  
# -------------------------
# AIC              34.63   
# BIC              39.02   
# Log Likelihood  -14.31   
# Deviance         28.63   
# Num. obs.        32      
# =========================
# *** p < 0.001; ** p < 0.01; * p < 0.05

Running single linear regressions across multiple variables, in groups

library(tidyverse)

ivs <- colnames(mtcars)[3:ncol(mtcars)]
names(ivs) <- ivs

mtcars %>% 
  group_by(cyl) %>% 
  group_modify(function(data, key) {
    map_df(ivs, function(iv) {
      frml <- as.formula(paste("mpg", "~", iv))
      lm(frml, data = data) %>% broom::glance()
      }, .id = "iv") 
  }) %>% 
  select(cyl, iv, r.squared, p.value)

# A tibble: 27 × 4
# Groups:   cyl [3]
     cyl iv    r.squared  p.value
   <dbl> <chr>     <dbl>    <dbl>
 1     4 disp  0.648      0.00278
 2     4 hp    0.274      0.0984 
 3     4 drat  0.180      0.193  
 4     4 wt    0.509      0.0137 
 5     4 qsec  0.0557     0.485  
 6     4 vs    0.00238    0.887  
 7     4 am    0.287      0.0892 
 8     4 gear  0.115      0.308  
 9     4 carb  0.0378     0.567  
10     6 disp  0.0106     0.826  
11     6 hp    0.0161     0.786  
# ...

logistic regression and GridSearchCV using python sklearn

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# logreg_cv is the fitted GridSearchCV object from the question
res = pd.DataFrame(logreg_cv.cv_results_)
res.iloc[:,res.columns.str.contains("split[0-9]_test_score|params",regex=True)]
 
                           params  split0_test_score  split1_test_score  split2_test_score
0   {'C': 0.001, 'penalty': 'l2'}           0.000000           0.000000           0.000000
1    {'C': 0.01, 'penalty': 'l2'}           0.000000           0.000000           0.000000
2     {'C': 0.1, 'penalty': 'l2'}           0.973568           0.952607           0.952174
3     {'C': 1.0, 'penalty': 'l2'}           0.863934           0.851064           0.836449
4    {'C': 10.0, 'penalty': 'l2'}           0.811634           0.769547           0.787838
5   {'C': 100.0, 'penalty': 'l2'}           0.789826           0.762162           0.773438
6  {'C': 1000.0, 'penalty': 'l2'}           0.781003           0.750000           0.763871
lr = LogisticRegression(C=0.01).fit(X_train_vectors_tfidf,y_train)
np.unique(lr.predict(X_train_vectors_tfidf))
array([0])
# expected probability
np.exp(lr.intercept_)/(1+np.exp(lr.intercept_))
array([0.41764462])

lr.predict_proba(X_train_vectors_tfidf)
 
array([[0.58732636, 0.41267364],
       [0.57074279, 0.42925721],
       [0.57219143, 0.42780857],
       ...,
       [0.57215605, 0.42784395],
       [0.56988186, 0.43011814],
       [0.58966184, 0.41033816]])

Closing subwindow with parent window on tkinter

def Destroy_subwindow(self, event):
    self.parent.destroy()
import tkinter as tk 

class MainApplication():
    def __init__(self, parent):
        # Main window
        self.parent = parent
        but1 = tk.Button(self.parent, text="Main window", width = 30, command=self.sub_window)
        but1.pack()
        self.parent.protocol("WM_DELETE_WINDOW", self.on_main_close)

    def sub_window(self):
        # Sub window(s)
        self.window = tk.Toplevel(self.parent)
        but2 = tk.Button(self.window, text="Sub window", width = 30)
        but2.pack()

    def on_main_close(self):
        print("Closing everything")
        self.parent.destroy()

root = tk.Tk()
app = MainApplication(root)
root.mainloop()

Gradient exploding problem in a graph neural network

model = keras.models.Sequential([
                     keras.layers.Flatten(input_shape=[28, 28]),
                     keras.layers.BatchNormalization(),
                     keras.layers.Dense(300, activation="elu", 
                     kernel_initializer="he_normal"),
                     keras.layers.BatchNormalization(),
                     keras.layers.Dense(100, activation="elu", 
                     kernel_initializer="he_normal"),
                     keras.layers.BatchNormalization(),
                     keras.layers.Dense(10, activation="softmax")
         ])
 model = keras.models.Sequential([
                     keras.layers.Flatten(input_shape=[28, 28]),
                     keras.layers.BatchNormalization(),
                     keras.layers.Dense(300, kernel_initializer="he_normal", use_bias=False),
                     keras.layers.BatchNormalization(),
                     keras.layers.Activation("elu"),
                     keras.layers.Dense(100, kernel_initializer="he_normal", use_bias=False),
                     keras.layers.Activation("elu"),
                     keras.layers.BatchNormalization(),
                     keras.layers.Dense(10, activation="softmax")
         ])
tf.debugging.check_numerics(layerN, "LayerN is producing nans!")
Traceback (most recent call last):
  File "trainer.py", line 506, in <module>
    worker.train_model()
  File "trainer.py", line 211, in train_model
    l, tmae = train_step(*batch)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 855, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2943, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 560, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  LayerN is producing nans! : Tensor had NaN values
layers.Dense(x, name="layer1", kernel_regularizer=regularizers.l2(1e-6), kernel_constraint=min_max_norm(min_value=1e-30, max_value=1.0))

Shiny app produces error: "arguments imply differing number of rows: 0, 1"

textInput("MIPa", label = HTML("MIP-1\\(\\alpha\\)") ##used in ui

input$MIP_1a ##used in server.

    input_list <- reactive({
    input_list <- list( Age=as.numeric(input$Age),
                Gender=ifelse(input$Gender == "Female", 1, 0), 
                GCS_Bestin24=as.numeric(input$GCS_Bestin24),
             Premorbid_depression=as.numeric(input$Premorbid_depression),
                Antidep_first6m=as.numeric(input$Antidep_first6m),
                MIP_1a = as.numeric(input$MIPa),
                MIP_3a = as.numeric(input$MIP_3a),
                RANTES = as.numeric(input$RANTES),
                sIL_6R = as.numeric(input$sIL_6R),
                ITAC = as.numeric(input$ITAC),
                IL_21 = as.numeric(input$IL_21),
                Fractalkine = as.numeric(input$Fractalkine),
                TNF_a = as.numeric(input$TNF_a),
                IL_1b = as.numeric(input$ILb),
                IL_7 = as.numeric(input$IL_7),
                IL_10 = as.numeric(input$IL0),
                GM_CSF = as.numeric(input$GM_CSF),
                MIP_1b = as.numeric(input$MIPb))
    input_list
  })
##input changed to input_list()
##this needs to be done for each instance where input is used on its own.
geom_vline(xintercept = pred_prob_func(input_list())$pred_prob)  
ui <- fluidPage(
                uiOutput('int_plot') #changed from plotlyOutput('int_plot')
               )

server <- (
output$int_plot <- renderUI({
    if(anyNA(input_list())){
     helpText("Input User data")
    }else{
      phats_graph$`Total Percentile` <- unlist(t(sapply(phats_graph$Phat, function(x) quantile_fun(value=x)))[,1])
      phats_graph$`PTD Percentile` <- unlist(t(sapply(phats_graph$Phat, function(x) quantile_fun(value=x)))[,2])
      phats_graph$`No PTD Percentile` <- unlist(t(sapply(phats_graph$Phat, function(x) quantile_fun(value=x)))[,3])
      
      
      
      int_plot <- ggplot(phats_graph, aes(ptd_per=`No PTD Percentile`)) + geom_density(aes(x=Phat, fill = `PTD Event Status`), alpha=0.5) +
        geom_vline(xintercept = input$thresholdslider, linetype = 'dashed') +
        geom_vline(xintercept = pred_prob_func(input_list())$pred_prob) +
        xlab('Threshold Percentage') + ylab('Density') +
        theme_minimal() + scale_fill_manual(values=c("#5D3A9B", "#E66100"), name="")
      
      
      renderPlotly( ggplotly(int_plot, tooltip=c("x", "ptd_per")))
    }


  })
)

How to fit polynomial model of all predictors using "." in R?

get_formula <- function(resp) {
  reformulate(
    sapply(setdiff(names(train_data), resp), function(x) paste0("poly(", x, ", 2)")),
    response = resp
  )
}


model <- get_formula("type")
model
# type ~ poly(npreg, 2) + poly(glu, 2) + poly(bp, 2) + poly(skin, 
#     2) + poly(bmi, 2) + poly(ped, 2) + poly(age, 2)
glm(model, data=train_data, family=binomial)

How to input matrix data into brms formula?

library(tidyverse)
signalregression.brms = brm(Density ~
                              s(cbind(d0_10_bin, d0_100_bin, d0_1000_bin, d0_10000_bin),
                                by = cbind(d0_10, d0_100, d0_1000, d0_10000),
                                k = 3),
                            data = Data %>%
                              mutate(d0_10_bin = 10,
                                     d0_100_bin = 100,
                                     d0_1000_bin = 1000,
                                     d0_10000_bin = 10000))
map_chr(unname(unlist(pacman::p_depends(brms)[c("Depends", "Imports")])), ~ paste(., ": ", pacman::p_version(.), sep = ""))
 [1] "Rcpp: 1.0.6"           "methods: 4.0.3"        "rstan: 2.21.2"         "ggplot2: 3.3.3"       
 [5] "loo: 2.4.1"            "Matrix: 1.2.18"        "mgcv: 1.8.33"          "rstantools: 2.1.1"    
 [9] "bayesplot: 1.8.0"      "shinystan: 2.5.0"      "projpred: 2.0.2"       "bridgesampling: 1.1.2"
[13] "glue: 1.4.2"           "future: 1.21.0"        "matrixStats: 0.58.0"   "nleqslv: 3.3.2"       
[17] "nlme: 3.1.149"         "coda: 0.19.4"          "abind: 1.4.5"          "stats: 4.0.3"         
[21] "utils: 4.0.3"          "parallel: 4.0.3"       "grDevices: 4.0.3"      "backports: 1.2.1"

Looping linear regression output in a data frame in R

library(tidyverse)

data %>%
  group_by(country, Area) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(amount ~ week, data = .x)), 
         result = map2(model, data, ~data.frame(predict(.x, newdata = .y,
                       interval = "prediction",level = 0.95)))) %>%
  ungroup %>%
  select(-model) %>%
  unnest(c(data, result)) 

#  country Area   week amount   fit     lwr   upr
#   <chr>   <chr> <dbl>  <dbl> <dbl>   <dbl> <dbl>
# 1 US      G         1     12  20.8 -27.7    69.3
# 2 US      G         2     23  21.7 -22.0    65.4
# 3 US      G         3     34  22.6 -19.4    64.6
# 4 US      G         4     32  23.5 -20.2    67.2
# 5 US      G         5     12  24.4 -24.1    72.9
# 6 US      I         1     12  20.8 -33.9    75.5
# 7 US      I         2     34  30.5 -18.8    79.8
# 8 US      I         3     45  40.2  -7.17   87.6
# 9 US      I         4     65  49.9   0.595  99.2
#10 US      I         5     45  59.6   4.90  114. 
#11 UK      A         1     45  36.6  -6.05   79.2
#12 UK      A         2     34  37.1  -1.34   75.5
#13 UK      A         3     23  37.6   0.667  74.5
#14 UK      A         4     43  38.1  -0.341  76.5
#15 UK      A         5     43  38.6  -4.05   81.2
library(purrr)
library(broom)

data %>%
  group_by(country, Area) %>%
  nest() %>%
  mutate(models = map(data, ~ lm(amount ~ week, data = .)), 
         aug = map(models, ~ augment(.x, interval = "prediction"))) %>%
  unnest(aug) %>%
  select(country, Area, amount, week, .fitted, .lower, .upper)

# A tibble: 15 x 7
# Groups:   country, Area [3]
   country Area  amount  week .fitted  .lower .upper
   <chr>   <chr>  <dbl> <dbl>   <dbl>   <dbl>  <dbl>
 1 US      G         12     1    20.8 -27.7     69.3
 2 US      G         23     2    21.7 -22.0     65.4
 3 US      G         34     3    22.6 -19.4     64.6
 4 US      G         32     4    23.5 -20.2     67.2
 5 US      G         12     5    24.4 -24.1     72.9
 6 US      I         12     1    20.8 -33.9     75.5
 7 US      I         34     2    30.5 -18.8     79.8
 8 US      I         45     3    40.2  -7.17    87.6
 9 US      I         65     4    49.9   0.595   99.2
10 US      I         45     5    59.6   4.90   114. 
11 UK      A         45     1    36.6  -6.05    79.2
12 UK      A         34     2    37.1  -1.34    75.5
13 UK      A         23     3    37.6   0.667   74.5
14 UK      A         43     4    38.1  -0.341   76.5
15 UK      A         43     5    38.6  -4.05    81.2
data <- data.frame(country = c("US","US","US","US","US","US","US","US","US","US","UK","UK","UK","UK","UK"),
                   Area = c("G","G","G","G","G","I","I","I","I","I","A","A","A","A","A"),
                   week = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),amount = c(12,23,34,32,12,12,34,45,65,45,45,34,23,43,43))

splitVar <- paste0(data$country,"-",data$Area)
dfList <- split(data,splitVar)
result <- do.call(rbind,lapply(dfList,function(x){
     model <- lm(amount ~ week, data = x)
     cbind(x,predict(model,newdata = x,interval = "prediction",level = 0.95))
}))
result
        country Area week amount  fit         lwr       upr
UK-A.11      UK    A    1     45 36.6  -6.0463638  79.24636
UK-A.12      UK    A    2     34 37.1  -1.3409128  75.54091
UK-A.13      UK    A    3     23 37.6   0.6671656  74.53283
UK-A.14      UK    A    4     43 38.1  -0.3409128  76.54091
UK-A.15      UK    A    5     43 38.6  -4.0463638  81.24636
US-G.1       US    G    1     12 20.8 -27.6791493  69.27915
US-G.2       US    G    2     23 21.7 -21.9985147  65.39851
US-G.3       US    G    3     34 22.6 -19.3841749  64.58417
US-G.4       US    G    4     32 23.5 -20.1985147  67.19851
US-G.5       US    G    5     12 24.4 -24.0791493  72.87915
US-I.6       US    I    1     12 20.8 -33.8985900  75.49859
US-I.7       US    I    2     34 30.5 -18.8046427  79.80464
US-I.8       US    I    3     45 40.2  -7.1703685  87.57037
US-I.9       US    I    4     65 49.9   0.5953573  99.20464
US-I.10      US    I    5     45 59.6   4.9014100 114.29859
library(tidyverse)

data %>% 
  mutate(CountryArea=paste0(country,Area) %>% factor %>% fct_inorder) %>% 
  split(.$CountryArea) %>% 
  map(~lm(amount~week, data=.)) %>% 
  map(predict, interval = "prediction",level = 0.95) %>% 
  reduce(rbind) %>% 
  cbind(data, .)

   country Area week amount  fit         lwr       upr
1       US    G    1     12 20.8 -27.6791493  69.27915
2       US    G    2     23 21.7 -21.9985147  65.39851
3       US    G    3     34 22.6 -19.3841749  64.58417
4       US    G    4     32 23.5 -20.1985147  67.19851
5       US    G    5     12 24.4 -24.0791493  72.87915
6       US    I    1     12 20.8 -33.8985900  75.49859
7       US    I    2     34 30.5 -18.8046427  79.80464
8       US    I    3     45 40.2  -7.1703685  87.57037
9       US    I    4     65 49.9   0.5953573  99.20464
10      US    I    5     45 59.6   4.9014100 114.29859
11      UK    A    1     45 36.6  -6.0463638  79.24636
12      UK    A    2     34 37.1  -1.3409128  75.54091
13      UK    A    3     23 37.6   0.6671656  74.53283
14      UK    A    4     43 38.1  -0.3409128  76.54091
15      UK    A    5     43 38.6  -4.0463638  81.24636


Community Discussions

Trending Discussions on regression
  • R two regressions from one table
  • ggplot2: Projecting points or distribution on a non-orthogonal (e.g., -45 degree) axis
  • Azure Auto ML JobConfigurationMaxSizeExceeded error when using a cluster
  • Why does summary() show different standard errors than coeftest()?
  • Running single linear regressions across multiple variables, in groups
  • logistic regression and GridSearchCV using python sklearn
  • Closing subwindow with parent window on tkinter
  • How are memory_order_seq_cst fences useful anymore in C++20?
  • Gradient exploding problem in a graph neural network
  • Shiny app produces error: "arguments imply differing number of rows: 0, 1"

QUESTION

R two regressions from one table

Asked 2022-Mar-19 at 16:01

I am trying to plot two different regression lines (with the formula: salary = beta0 + beta1*D3 + beta2*spending + beta3*(spending*D3) + w) in one scatter plot by dividing the data I have into two subsets, as seen in the following code:

salary = data$salary
spending = data$spending
D1 = data$North
D2 = data$South
D3 = data$West

subsetWest = subset(data, D3 == 1)
subsetRest = subset(data, D3 == 0)

abab = lm(salary ~ 1 + spending + 1*spending, data=subsetWest) #red line
caca = lm(salary ~ 0 + spending + 0*spending, data=subsetRest) #blue line


plot(spending,salary)

points(subsetWest$spending, subsetWest$salary, pch=25, col = "red")
points(subsetRest$spending, subsetRest$salary, pch=10, col = "blue")

abline(abab, col = "red")
abline(caca, col = "blue")

This is a sample of my data table: https://i.stack.imgur.com/LowYo.png

And this is the plot I get when running the code:

https://i.stack.imgur.com/It8ai.png

My problem is that the intercept for my second regression is wrong; in fact, I do not even get an intercept in the summary, unlike with the first regression.

Does anybody see where my problem is or does anybody know an alternative way of plotting the two regression lines?

Help would be much appreciated. Thank you very much!

This is the whole table:

structure(list(salary = c(39166L, 40526L, 40650L, 53600L, 58940L, 
53220L, 61356L, 54340L, 51706L, 49000L, 48548L, 54340L, 60336L, 
53050L, 54720L, 43380L, 43948L, 41632L, 36190L, 41878L, 45288L, 
49248L, 54372L, 67980L, 46764L, 41254L, 45590L, 43140L, 44160L, 
44500L, 41880L, 43600L, 45868L, 36886L, 39076L, 40920L, 42838L, 
50320L, 44964L, 41938L, 54448L, 51784L, 45288L, 49280L, 44682L, 
51220L, 52030L, 51576L, 58264L, 51690L), spending = c(6692L, 
6228L, 7108L, 9284L, 9338L, 9776L, 11420L, 11072L, 8336L, 7094L, 
6318L, 7242L, 7564L, 8494L, 7964L, 7136L, 6310L, 6118L, 5934L, 
6570L, 7828L, 9034L, 8698L, 10040L, 7188L, 5642L, 6732L, 5840L, 
5960L, 7462L, 5706L, 5066L, 5458L, 4610L, 5284L, 6248L, 5504L, 
6858L, 7894L, 5018L, 10880L, 8084L, 6804L, 5658L, 4594L, 5864L, 
7410L, 8246L, 7216L, 7532L), North = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), South = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), West = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-50L))

ANSWER

Answered 2022-Mar-19 at 14:50

My problem is that the intercept for my second regression is wrong; in fact, I do not even get an intercept in the summary, unlike with the first regression.

That is because your second model specifies no intercept, since you use ... ~ 0 + ...

Also, your first model doesn't make sense because it includes spending twice; the second entry for spending will be ignored by lm. To fit an ordinary intercept-plus-slope line in each subset, a plain salary ~ spending works for both models.

Source https://stackoverflow.com/questions/71539104

Community Discussions and Code Snippets contain sources from the Stack Exchange Network.

Vulnerabilities

No vulnerabilities reported

Install regression

You can download it from GitHub.

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on Stack Overflow.
