the bot will generate acronyms of 3-8 characters in length. it's your
job to find the "best" phrase fitting the acronym. work in progress,
but this is still playable
the module will drop your old tables if you have them, so if there's data there,
be sure to back them up and figure out some migration strategy (probably annoying
and probably having to script it).
the big change is that each line is associated to a context now, and channels
are also associated to contexts. this should allow for a better partitioning
of multiple brains, and changing which channels point to which brain.
also caught in the wake is some additional logging verbosity, and a change to
no longer lower() everything learned.
the script to dump a file into the database has also been updated with the above
changes
this only makes sense if we have a target word set, which we usually do.
start with the target word and go backwords, finding k2s that lead to it
(and that lead to that k2, and so on) until we get to the start-of-chain
value, when we know we're done working backwards. then resume the normal
appending logic
probably needs some work, probably a bit slow on huge databases. analysis
pending, but this appears to work
check interval is every 10 minutes, rows in markov_chatter_target
have a 1 in chance chance of leading to a line being generated,
every 10 minutes. (so an interval of 144 = 10 min * 6 * 24 = one line
per day, on average)
Markov, Twitter: switch to forking a thread ourselves, and check every
second whether or not to quit. this is the "better" part above, as
now we can instantly quit the thread rather than waiting for all
the timers to fire and expire
track all lines seen and all lines said by Markov. every 30 seconds,
if there have been more than 20 such lines, and Markov is responsible
for roughly half of them, then shut up for 30 seconds, because the
bot probably got stuck talking to another bot.
this should mean that such a reply infinite loop can't happen for
more than a minute.
i'm not entirely sure on the 30 sec/20 lines ratio. this may need
tuning.
when finding a key for (__start1,__start2), instead of fetcihng all
(which can be a lot, in chatty channels and/or over time), get the
max ID in the table, pick a random ID between 1,max, and pick the
first id >= to it, and use that. just as random, nowhere near as
intensive.