Once upon a time, you would create an e-mail account and use it for a
long time without receiving spam. In fact, whenever you received your
first spam message, you'd know exactly who to blame: that one cousin of
yours who'd send you every single motivational PowerPoint she came
across, along with a list of 1500 other e-mail addresses. We could argue
about who's the spammer in this situation, but that discussion will have
to wait.
That kind of control over your account is no longer possible: even if
you never share your address with anyone, you will at some point get
spam. It's just the way things are, the "background radiation" of the
internet. Luckily for us, things got so bad that a lot of smart people
sat down to think really hard about this and came up with Bayesian
filtering, a technique so effective that most of us don't even bother
checking our Spam folders anymore.
So we^1 succeeded once. It's a good thing to remember, because
we have a much harder battle to fight now: trolling, and its ugly
cousin, online harassment.
Let's say you post a message on an online board. These are some of the
things that could happen, in no particular order:
- You could get an interesting, well-thought-out reply (note that
"well-thought-out" doesn't mean "agrees with you"). It happens.
- You could be modded down by people who disagree with what you just
posted, even if the rules say they shouldn't.
- You could be flooded by negative messages, because a certain group
decided to impose their point of view. This is called brigading, by
the way, and it's usually not personal - they oppose your point of
view, but not you.
- You could be flooded by negative messages, because a group has
decided to target you online for something you said, or did, or
are.
- You could be posting on behalf of a company, speaking in favor of
its products while posing as anything but an employee. This is
called being a shill, and most websites either pretend that it
doesn't happen or don't care.
- You could be trying to derail a discussion, in order to make sure a
certain point is not brought to light, or is drowned in the noise.
This usually implies that you work for a government agency; it's
being done right now, and it works.
We used to believe that everyone on the internet would eventually behave
nicely, and that we could build our services based on trusting the 95%
of users who have no hidden agenda. This is sadly not so, because
- ... people have not behaved nicely on the Internet since September
1993.
- ... a loud 5% of users is a lot more noticeable than the quiet
95%. A post-mortem of a DARPA Challenge showed that a single
person can sabotage the work of thousands of well-meaning
volunteers.
In the follow-up articles I'm going to comment on what I perceive to be
the three main angles from which this issue could be attacked. They are:
- Anonymity: there's no way of taking measures against a person,
only against a user. This is by design, and I'm not arguing that
we should get rid of anonymity. We should instead focus on
identifying toxic users, which I think can be done by implementing
user groups.
- Flamewars: derailing discussions in order to kill them. This may
be a job for pattern matching, identifying when the shape of a
discussion is tending towards known anti-patterns. We might also
want to add clustering, in order to identify brigades.
- Harassment: perhaps the hardest one, it requires sentiment analysis
techniques to identify negative comments and kill them before they
reach their destination.
In the follow-up essays I'll present some papers about how one would go
about attacking each point. I have no reason to believe that these
techniques are unknown (some of them are already implemented), but I
post them hoping that, much like Bayesian filtering, someone will read
them and have an "oh, wait" moment.
Coming up next: anonymous users and user groups.
Footnotes
^1 Of course, by "we" I mean "the computer science community in
general". I did not create Bayesian filtering.
I'm the proud owner of a Genius MousePen i608 graphics tablet (also
known as UC-LOGIC Tablet WP8060U). This tablet is quite old and cheap,
which is more often than not a recipe for headaches.
One very specific problem that I have: my tablet has an aspect ratio of
4:3, like old computer monitors did, but both my desktop's and my
laptop's screens have an aspect ratio of 16:9. Why is this a problem?
Because my computer assumes that the tablet and the screen have the
same aspect ratio, so whenever I draw a circle on my tablet it shows up
on screen as an oval.
There are two possible solutions to this issue. One is changing my
screen's resolution to match the 4:3 aspect ratio, which is annoying: I
have to change the screen settings, then fiddle with my actual, physical
screen so it doesn't stretch the image, and then I have to undo both
steps once I'm done. The second solution requires a few more
calculations, but it's the right way: we'll configure the tablet in
such a way that Linux takes the difference in ratios into account.
To be more precise: we will define a rectangle with the same height as
the screen and a proportional width (keeping the 4:3 ratio between
width and height), position that rectangle in the center of the
screen, and map all movements on the tablet to that section of the
screen only. Movements on the tablet will then translate to this
rectangle without distortion, and if we need to interact with the
screen outside this area we can still use our mouse.
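As a worked example of that rectangle, assuming (as the script below
does) that the Coordinate Transformation Matrix acts on screen
coordinates normalised to the range 0 to 1: write w and h for the
screen's width and height in pixels, so the mapped area is 4h/3 pixels
wide. The four parameters c0 to c3 that the script computes then sit in
the matrix as shown. The 1920x1080 resolution is only an illustrative
choice.

```latex
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} =
\begin{pmatrix} c_0 & 0 & c_1 \\ 0 & c_2 & c_3 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
c_0 = \frac{4h/3}{w}, \quad
c_1 = \frac{w - 4h/3}{2w}, \quad
c_2 = 1, \quad
c_3 = 0

% Example: w = 1920, h = 1080, so 4h/3 = 1440
c_0 = \frac{1440}{1920} = 0.75, \qquad
c_1 = \frac{1920 - 1440}{2 \cdot 1920} = 0.125
```

In this example the tablet's full width lands on the band from 0.125 to
0.875 of the screen width (pixels 240 to 1680), and its full height
covers the whole screen.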
The following code will run all the numbers for us. In essence, it
calculates the required set of parameters and then modifies the
tablet's Coordinate Transformation Matrix property through xinput
accordingly:
# Get the current screen resolution (if several monitors are connected,
# keep only the first mode that xrandr marks as current)
resolution=`xrandr | grep '*' | head -n 1 | cut -f 4 -d ' '`
width=`echo ${resolution} | cut -f 1 -d 'x'`
height=`echo ${resolution} | cut -f 2 -d 'x'`
# Get the width of the screen area used by the tablet, according to the
# 4:3 proportion (dc's default precision is 0, so this is integer math)
tablet_width=`echo "${height} 3 / 4 * n" | dc`
# We need to calculate four parameters c0, c1, c2, c3. For that, we use the
# 'dc' utility, which uses postfix notation (i.e., you write "7/3" as "7 3 /").
#
# Note: if you want to move the usable section of the screen left or right,
# take a look at the 'x offset' value. Also note that, since we are using the
# entire height of the screen, the 'y offset' is simply 0.
# Touch area width / width ("7 k" sets 7 decimal places; dc prints values
# below 1 without a leading zero, so we prepend it)
c0=0`echo "7 k ${tablet_width} ${width} / n" | dc`
# Touch area x offset / width
c1=0`echo "7 k ${width} ${tablet_width} - 2 / ${width} / n" | dc`
# Touch area height / height
c2=1.0
# Touch area y offset / height
c3=0.0
# Obtain the device ID for the graphics tablet. Note that "UC-LOGIC" is part
# of my device's name, but yours may be different
device=`xinput | grep UC-LOGIC | head -n 1 | cut -f 2 -d '=' | cut -f 1`
# Set the Coordinate Transformation Matrix (3x3, given row by row)
xinput set-prop ${device} --type=float "Coordinate Transformation Matrix" ${c0} 0 ${c1} 0 ${c2} ${c3} 0 0 1
And that's it! Quite often the transformation doesn't take effect
straight away, in which case unplugging the tablet and plugging it back
in solves the problem. A second issue that comes up after every
reinstall is that the X server sometimes refuses to recognize my
tablet. I solved that problem by adding the following lines to the
/etc/X11/xorg.conf file:
Section "InputClass"
Identifier "evdev tablet catchall"
MatchIsTablet "on"
MatchDevicePath "/dev/input/event*"
Driver "evdev"
EndSection
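If you want to check the numbers before touching your X session, the
same computation can be run as a dry run. This is a minimal sketch and
not part of the setup above. It uses awk instead of dc, so it works in
floating point, and for screen heights not divisible by 3 its values can
differ slightly from the dc version, which truncates. The function name
tablet_matrix is hypothetical. It prints the nine matrix values in the
order xinput expects them.

```shell
#!/bin/sh
# Hypothetical helper: print the Coordinate Transformation Matrix for a
# 4:3 tablet mapped onto a WIDTH x HEIGHT screen, without touching X.
# Uses awk floating point instead of dc (an assumption, not the original).
tablet_matrix() {
    awk -v w="$1" -v h="$2" 'BEGIN {
        tw = h * 4 / 3           # width of the mapped area, 4:3 ratio
        c0 = tw / w              # horizontal scale
        c1 = (w - tw) / 2 / w    # horizontal offset that centers the area
        # Row-major 3x3 matrix; vertical scale 1 and vertical offset 0
        printf "%.7f 0 %.7f 0 1 0 0 0 1\n", c0, c1
    }'
}

tablet_matrix 1920 1080
# prints: 0.7500000 0 0.1250000 0 1 0 0 0 1
```

The output could then be fed to the same xinput call, e.g.
xinput set-prop ${device} --type=float "Coordinate Transformation Matrix" $(tablet_matrix 1920 1080)
where the command substitution is left unquoted on purpose, so that it
splits into nine separate arguments.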
There's a popular song by the Argentinean musician Charly García
called "Los Dinosaurios" ("The Dinosaurs"). The song was released in
1983 on the album "Clics Modernos", and you can listen to it in all its
vinyl glory here.
For me, this song poses an interesting problem: it is by far my
favourite song by this artist, and I would like to listen to something
similar. But so far all recommendation systems have failed me. Here are
some of the reasons why.
A first approach could be to pick something by the same artist, or
even from the same album. Sadly, this approach doesn't work: while
Charly García is certainly prolific, with 41 published records and
countless guest appearances, his main style leans towards electronic
music, and it doesn't really fit the style of this specific song. If
anything, this song fits better with his earlier albums, which limits
us quite a bit: out of those 41 solo albums, "Clics Modernos" is the
second one.
We could instead assume that this song was written in a certain context,
and that by looking at artists from a similar context we can find
similar music. Again, this doesn't entirely work: if we pick
"Argentinean songs from the '80s", we would end up with a list of songs
that perfectly fit the style of the other 8 songs on this album, but not
this one specifically^1. Grouping the song under "Latin American
music", as some systems do, only exacerbates the problem: there is no
relation at all between this song and, say, a Dominican bachata.
If we look at the actual lyrics, the situation gets even worse: "Los
dinosaurios" is a thinly veiled critique of the military dictatorship
that ruled the country between 1976 and 1983. A lyrics-based system
would most likely fail in one of two ways: either it wouldn't
understand the references made in the song and would label it as
"nonsense/fantastical", or it would understand them and recommend
politically charged songs. Neither approach seems really right: while
"The Times They Are a-Changin'" could be a viable candidate for a
recommendation, neither "The Revolution Will Not Be Televised" nor
"Redemption Song" fits the bill.
All of these approaches fail for the same reason: they apply a
network-oriented measure to a song that fits neither the popular rhythm
of the time and place in which it was produced nor the overall style of
its author.
So what exactly am I looking for? A non-technical answer would be "I
need a song with simple vocals, a piano as its main instrument, and
rising tension towards the end". Or, in the words of the author, a song
that "adapts the English sound to Tango". As far as I know, the only
system that applies a similarity measure capable of detecting this
would be Pandora, but with their service closed to Europe, I cannot
tell whether it works or not.
Footnotes
^1 How to obtain a digital archive of Argentinean songs from
the early '80s is left as an exercise for the reader.
Related reading:
The Napoleon Dynamite problem.
For the longest time, I thought that elections all over the world were
more or less the same. Finding out that this is not the case was
surprising, but not as much as what came afterwards: the Argentinean
voting system manages to avoid most of the pitfalls people typically
complain about. So here's a quick overview.
Technicalities
First of all, and this should be a no-brainer, election day in Argentina
is a public holiday and always takes place on a Sunday. The act of voting
itself can take somewhere between 5 minutes and an hour, depending on
whether you show up during rush hour or not. Voting is mandatory, too,
and it always takes place as close to home as possible, so you really
have no excuse not to vote.
On election day you show up with your ID to your designated place
(usually the nearest school), and you queue at your assigned table.
Eventually you present your ID and receive in return an empty envelope.
Then you enter the so-called "dark room" alone, and close the door.
Inside you'll find piles of ballots with the name and photo of each
candidate. You pick one, seal it in the envelope, put the envelope in the
ballot box, receive your ID back, and off you go.
Running the voting process is a (paid!) civic duty, and as such anyone
over the age of 21 can be chosen. If you are selected, you get a letter
informing you of your role (either President, Vice President, or
backup), and where and when to receive a free training course. Aside
from these voting authorities (three per table), there's an unspecified
number of volunteers from political parties. These volunteers' main job
is to make sure everything is run fair and square, but the table
President can kick them out if they are out of line. Should (s)he need
it, the army is always there to lend a hand.
What goes well, what doesn't
What is there to like about this system? A lot. Mandatory voting, public
holiday, and short-ish queues ensure that everyone can vote. And while
jerk bosses exist everywhere, it's an accepted part of the culture that,
on election day, you vote. The system is also simple enough that
anyone can understand it.
Large-scale fraud is notoriously difficult, and expensive. Voter
suppression is also reasonably dealt with, as evidenced by the 80%
turnout rate. Fun fact: every election, a non-zero number of fugitives
end up arrested when they show up to their assigned voting place.
The Achilles' heel of this system is the ballots themselves. For
starters, they can be stolen (and they are), so political parties must
print ten times the required number to account for this (and even then,
at times they just give up). They can also be replaced by fake ones,
which are then thrown away as counterfeits during the vote count.
There's literally no good reason for not implementing a single ballot,
which is probably why new projects introducing this change are shot
down over and over by the ruling parties. At a more general level, the
Argentinean political system exhibits all the known defects of the
"first past the post" system.
In conclusion
I'm really impressed by the vision the founding fathers of Argentina
showed in several aspects, and elections are one of them (the other
main one being the Constitution). There is no doubt that Argentinean
politicians are not the best the country has to offer, but that's
hardly the voting procedure's fault.
And then again, who hasn't elected a clown for President here and there?