Finite Mixture of Logistic Regression Models in a Data Stream

Hi All,

While working on a project analyzing advertising clicks, we just put together an [R] S4 class that fits a finite mixture of logistic regression models to a data stream (i.e., row by row). It can be found under Downloads. Here is an example of its use: the code below generates a dataset and plots it. In the final lines we run through the dataset row by row and fit models with 1, 2, and 3 clusters:


## Usage examples:
source("onlineMixtureLogistic.R")
library(lattice)

# Create a dataset:
set.seed(12345)

n <- 1e5			# Number of subjects (100,000)
k <- 2				# Number of predictors (including intercept)
j <- 2				# Number of clusters
pj <- c(.3, .7) 	# Cluster probabilities

betas <- matrix( 
			c(
				c(3 , -2.5), 
				c(-2, 5)
			), nrow=j, byrow=TRUE)

X <- matrix(c(rep(1,n),runif((k-1) * n,-5,5)), ncol=k)
cluster <- sample(1:j, n, TRUE, pj)
y <- gen.mixture(X, betas, cluster)

# Plot the dataset (lattice is already loaded above):
xyplot(jitter(y) ~ X[,2], groups=cluster)

# Inspect the elements:
betas
prop.table(table(cluster))	# Observed cluster proportions


# Instantiate object (predictors, clusters)
oLM1 <- OnlineLogMixture(k,1)
oLM2 <- OnlineLogMixture(k,2)
oLM3 <- OnlineLogMixture(k,3)

for(i in seq_len(nrow(X))){	# Stream the data one row at a time
	oLM1 <- add.observation(oLM1, y[i], X[i,])
	oLM2 <- add.observation(oLM2, y[i], X[i,])
	oLM3 <- add.observation(oLM3, y[i], X[i,])
}

summary(oLM1)
summary(oLM2)
summary(oLM3)
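
For readers who want a feel for what a row-by-row fit can look like, here is a minimal sketch of one possible update rule: a stochastic-gradient step on the mixture log-likelihood. This is an assumption on our part for illustration only and is not necessarily what add.observation does in onlineMixtureLogistic.R. The function name sgd_mixture_step, the state list, and the learning rate lr are all hypothetical.

```r
# Hypothetical sketch, NOT the code in onlineMixtureLogistic.R.
# One stochastic-gradient step for a finite mixture of logistic regressions.
# state$betas: (clusters x predictors) matrix, state$p: mixing probabilities.
sgd_mixture_step <- function(state, y, x, lr = 0.05) {
  mu <- plogis(as.vector(state$betas %*% x))      # P(y = 1) under each cluster
  # Prior times Bernoulli likelihood; the small constant guards against 0/0
  lik <- state$p * dbinom(y, size = 1, prob = mu) + 1e-12
  resp <- lik / sum(lik)                          # responsibilities, sum to 1
  # Gradient of the log-likelihood w.r.t. beta_j is resp_j * (y - mu_j) * x
  state$betas <- state$betas + lr * outer(resp * (y - mu), x)
  # Move the mixing probabilities a small step toward the responsibilities
  state$p <- (1 - lr) * state$p + lr * resp
  state
}

# Toy run on simulated data with the same true parameters as the example above
set.seed(1)
true_betas <- rbind(c(3, -2.5), c(-2, 5))
state <- list(betas = matrix(rnorm(4, sd = 0.1), nrow = 2), p = c(0.5, 0.5))
for (i in seq_len(2000)) {
  x  <- c(1, runif(1, -5, 5))                     # intercept plus one predictor
  cl <- sample(1:2, 1, prob = c(0.3, 0.7))        # latent cluster of this row
  y  <- rbinom(1, 1, plogis(sum(true_betas[cl, ] * x)))
  state <- sgd_mixture_step(state, y, x)          # only the current row is used
}
state$p
state$betas
```

Each step touches only the current observation, so nothing has to be stored besides the coefficient matrix and the mixing probabilities. With a constant learning rate the estimates keep fluctuating, and the cluster labels are only identified up to permutation, so treat the output as a rough illustration.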