for simplicity's sake, this was added to the extlib/irclib rather
than subclassing. because i'm lazy. anyway, check that flag instead
of doing the event._target = None hack, since that hack was breaking
Markov.
for an unrelated reason (what to learn and not learn), update Markov
also remove an unused method that was getting in my way while coding this
this should result in no chains having a null context --- if no pre-existing
context is created, one is created for the channel/nick and used. this makes,
for example, arbitrary queries "private" to that nick (again unless that has
been overridden). shouldn't affect much of anything, but adding this made
the context-less learning code obsolete, which is fine since it was never used
anyway
the module will drop your old tables if you have them, so if there's data there,
be sure to back them up and figure out some migration strategy (probably annoying
and probably having to script it).
the big change is that each line is associated to a context now, and channels
are also associated to contexts. this should allow for a better partitioning
of multiple brains, and changing which channels point to which brain.
also caught in the wake is some additional logging verbosity, and a change to
no longer lower() everything learned.
the script to dump a file into the database has also been updated with the above
changes
this only makes sense if we have a target word set, which we usually do.
start with the target word and go backwords, finding k2s that lead to it
(and that lead to that k2, and so on) until we get to the start-of-chain
value, when we know we're done working backwards. then resume the normal
appending logic
probably needs some work, probably a bit slow on huge databases. analysis
pending, but this appears to work
check interval is every 10 minutes, rows in markov_chatter_target
have a 1 in chance chance of leading to a line being generated,
every 10 minutes. (so an interval of 144 = 10 min * 6 * 24 = one line
per day, on average)
Markov, Twitter: switch to forking a thread ourselves, and check every
second whether or not to quit. this is the "better" part above, as
now we can instantly quit the thread rather than waiting for all
the timers to fire and expire
track all lines seen and all lines said by Markov. every 30 seconds,
if there have been more than 20 such lines, and Markov is responsible
for roughly half of them, then shut up for 30 seconds, because the
bot probably got stuck talking to another bot.
this should mean that such a reply infinite loop can't happen for
more than a minute.
i'm not entirely sure on the 30 sec/20 lines ratio. this may need
tuning.
when finding a key for (__start1,__start2), instead of fetcihng all
(which can be a lot, in chatty channels and/or over time), get the
max ID in the table, pick a random ID between 1,max, and pick the
first id >= to it, and use that. just as random, nowhere near as
intensive.
a context is a meta-classification ('banter, 'secrets', whatever)
based on targets (channels or nicknames). when a line is being
learned from a known target, the chains are placed in that context.
this is for allowing one brain to have multiple personalities, in
a sense, for large networks or cases where there may be a more
sanitized set of channels and a couple channels where everyone lets
it rip. a later enhancement would have sentence creation choose from
context-less chains (and contexts matching the current target), but
i need to go back to the drawing board on that one a bit.
ramble ramble ramble
e.g. if i say 'dr_botzo: hello dude', he only learns 'hello dude'.
this is mainly being done because the bot's name being in the brain
so many times was getting kind of silly, especially in channels that
have lots of conversations with the bot
somehow a chain led us down a path where there are no values for
the keys in the chain. if that happens, just abort.
i'm not quite sure how this could happen
this eliminates the expensive database hit on every request for a line.
the cache is loaded when the module loads and learning new lines should
add the appropriate word to the list. seemed like a pretty good compromise
this keeps us from having the entire markov chain in memory and
having to do the pickling and so on. in many ways, this is a good
thing.
in one way, this is a bad thing. each line on irc will create a
__start1,__start2 item in the database, which means starting a
chain will be an expensive process. (approx 3 seconds, from irc
logs of 600,000 K lines). following selects run much faster, but
the first one is dog slow. a later commit should hopefully fix this.
more of a moving of the code, actually, it now exists in (an overridden)
_handle_event, so that recursions happen against irc events directly,
rather than an already partially interpreted object.
with this change, modules don't need to implement do() nor do we have a
need for the internal_bus, which was doing an additional walk of the
modules after the irc event was already handled and turned into text. now
the core event handler does the recursion scans.
to support this, we bring back the old replypath trick and use it again,
so we know when to send a privmsg reply and when to return text so that
it may be chained in recursion. this feels old hat by now, but if you
haven't been following along, you should really look at the diff.
that's the meat of the change. the rest is updating modules to use
self.reply() and reimplementing (un)register_handlers where appropriate
if the end of a chain has been reached via __end, but min_size
has not been satisfied, discard the last couple elements in the
chain and try again. use min_search_tries so we don't do this
forever.
note that this isn't guaranteed, if the chain is such that the
current tuple has nowhere to go but to the end of the line, then
it will follow it --- it doesn't try to go back and rebuilt a different
chain or anything.
yeah, we have MegaHAL, but i can't find a good implementation in
python that actually works and is stable, so we'll implement a
simple thing ourselves. works pretty much like MegaHAL does, but
without the string corruption.
original code provided by ape, care of mike bloy