Archive for October, 2010

Recorded Future analyses streaming Web data to predict the future

Saturday, October 30th, 2010

Recorded Future is a Boston-based startup with backing from Google and In-Q-Tel uses sophisticated linguistic and statistical algorithms to extract time-related information from streams of Web data about entities and events. Their goal is to help their clients to understand how the relationships between entities and events of interest are changing over time and make predictions about the future.

Recorded Future system architecture

A recent Technology Review article, See the Future with a Search, describes it this way.

“Conventional search engines like Google use links to rank and connect different Web pages. Recorded Future’s software goes a level deeper by analyzing the content of pages to track the “invisible” connections between people, places, and events described online.
   ”That makes it possible for me to look for specific patterns, like product releases expected from Apple in the near future, or to identify when a company plans to invest or expand into India,” says Christopher Ahlberg, founder of the Boston-based firm.
   A search for information about drug company Merck, for example, generates a timeline showing not only recent news on earnings but also when various drug trials registered with the website clinicaltrials.gov will end in coming years. Another search revealed when various news outlets predict that Facebook will make its initial public offering.
   That is done using a constantly updated index of what Ahlberg calls “streaming data,” including news articles, filings with government regulators, Twitter updates, and transcripts from earnings calls or political and economic speeches. Recorded Future uses linguistic algorithms to identify specific types of events, such as product releases, mergers, or natural disasters, the date when those events will happen, and related entities such as people, companies, and countries. The tool can also track the sentiment of news coverage about companies, classifying it as either good or bad.”

Pricing for access to their online services and API starts at $149 a month, but there is a free Futures email alert service through which you can get the results of some standing queries on a daily or weekly basis. You can also explore the capabilities they offer through their page on the 2010 US Senate Races.

“Rather than attempt to predict how the the races will turn out, we have drawn from our database the momentum, best characterized as online buzz, and sentiment, both positive and negative, associated with the coverage of the 29 candidates in 14 interesting races. This dashboard is meant to give the view of a campaign strategist, as it measures how well a campaign has done in getting the media to speak about the candidate, and whether that coverage has been positive, in comparison to the opponent.”

Their blog reveals some insights on the technology they are using and much more about the business opportunities they see. Clearly the company is leveraging named entity recognition, event recognition and sentiment analysis. A short A White Paper on Temporal Analytics has some details on their overall approach.

Chinese Tianhe-1A is fastest supercomputer

Thursday, October 28th, 2010

Tianhe-1AChina’s Tianhe-1A is being recognized as the world’s fastest supercomputer. It has 7168 NVIDIA Tesla GPUs and achieved a Linpack score of 2.507 petaflops, a 40% speedup over Oak Ridge National Lab’s Jaguar, the previous top machine. Today’s WSJ has an article,

“Supercomputers are massive machines that help tackle the toughest scientific problems, including simulating commercial products like new drugs as well as defense-related applications such as weapons design and breaking codes. The field has long been led by U.S. technology companies and national laboratories, which operate systems that have consistently topped lists of the fastest machines in the world.

But Nvidia says the new system in Tianjin—which is being formally announced Thursday at an event in China—was able to reach 2.5 petaflops. That is a measure of calculating speed ordinarily translated into a thousand trillion operations per second. It is more than 40% higher than the mark set last June by a system called Jaguar at Oak Ridge National Laboratory that previously stood at No. 1 on a twice-yearly ranking of the 500 fastest supercomputers.”

The NYT and HPCwire also have good overview articles. The HPC article points out that the Tianhe-1A has a relatively low Linpack efficiency compaed to the Jaguar.

“Although the Linpack performance is a stunning 2.5 petaflops, the system left a lot of potential FLOPS in the machine. Its peak performance is 4.7 petaflops, yielding a Linpack efficiency of just over 50 percent. To date, this is a rather typical Linpack yield for GPGPU-accelerated supers. Because the GPUs are stuck on the relatively slow PCIe bus, the overhead of sending calculations to the graphics processors chews up quite a few cycles on both the CPUs and GPUs. By contrast, the CPU-only Jaguar has a Linpack/peak efficiency of 75 percent. Even so, Tianhe-1A draws just 4 megawatts of power, while Jaguar uses nearly 7 megawatts and yields 30 percent less Linpack.

The (unofficial) “official” list of the fastest supercomputers is TOP500 which seems to be inaccessible at the moment, due no doubt to the heavy load caused by the news stories above. The TOP500 list is due for a refresh next month.

How Rapleaf is eroding our privacy on the Web

Monday, October 25th, 2010

RapLeaf knows what you did last summer.

The Wall Street Journal continues its exploration of how our privacy is eroding on the Web in new article by Emily Steel — A Web Pioneer Profiles Users by Name. The article profiles the San Francisco startup RapLeaf, which defines its vision as follows.

“We want every person to have a meaningful, personalized experience – whether online or offline. We want you see the right content at the right time, every time. We want you to get better, more personalized service. To achieve this, we help Fortune 2000 companies gain insight into their customers, engage them more meaningfully, and deliver the right message at the right time. We also help consumers understand their online footprint.”

RapLeaf ties email address to profiles with information about people and uses the profiles to target advertisements for clients. The articles shows the information collected for one person, Linda Twombly of Nashua NH, and what some of the coded information means.

Rapleaf does allow you to see the information it has collected about you, but you have to create a RapLeaf account to see it. You might be surprised about how well it knows you. Visit this page to see if your browser has RapLeaf cookies. You can also use it to opt out your email addresses from the RapLeaf system.

To be fair, RapLeaf and other companies are not doing anything illegal and mainly collect information that people choose to make public on the Web. However, their use of cookies does allow them to aggregate and integrate information about individuals and to associate that information with email addresses, Facebook UIDs and dozens of other identifiers. The information can be used to help Web-based systems serve you better — but their idea of serving you better is likely to involve peppering you with targeted ads.

How RapLeaf collects information about Web users

WSJ: many Facebook apps transmit user IDs to advertising and tracking companies

Sunday, October 17th, 2010

This Wall Street Journal article says that many of the most popular of the 550,000 Facebook apps (!) have been transmitting identifying information about users and their friends to dozens of advertising and Internet tracking companies.

“The apps reviewed by the Journal were sending Facebook ID numbers to at least 25 advertising and data firms, several of which build profiles of Internet users by tracking their online activities.

Defenders of online tracking argue that this kind of surveillance is benign because it is conducted anonymously. In this case, however, the Journal found that one data-gathering firm, RapLeaf Inc., had linked Facebook user ID information obtained from apps to its own database of Internet users, which it sells. RapLeaf also transmitted the Facebook IDs it obtained to a dozen other firms, the Journal found.

RapLeaf said that transmission was unintentional. “We didn’t do it on purpose,” said Joel Jewitt, vice president of business development for RapLeaf.”

Update: Facebook responds.

New Facebook Groups Considered Somewhat Harmful

Thursday, October 7th, 2010

I always think of things I should have added in the hour after making a post. Sigh. Here goes…

The situation is perhaps not so different from mailing lists, Google groups or any number of similar systems. I can set up one of those and add people to them without their consent — even people who are are not my friends. Even people whom I don’t know and who don’t know me. Such email-oriented lists can also have public membership lists. The only check on this is that most mailing lists frameworks send a notice to people being added informing them of the action. But many frameworks allow the list owner to suppress such notifications.

But still, Facebook seems different, based on the how the rest of it is configured and on how people use it. I believe that a common expectation would be that if you are listed as a member of an open or private group, that you are a willing member.

When you get a notification that you are now a member of the Facebook group Crazy people who smell bad, you can leave the group immediately. llBut we have Facebook friends, many of them in fact, who only check in once a month or even less frequently. Notifications of their being added to a group will probably be missed.

Facebook should fix this by requiring that anyone added to a group confirm that they want to be in the group before they become members. After fixing it, there’s lots more that can be done to make Facebook groups a powerful way for assured information sharing.

New Facebook Groups Considered Harmful

Thursday, October 7th, 2010

Facebook has rolled out a new version of groups announced on the Facebook blog.

“Until now, Facebook has made it easy to share with all of your friends or with everyone, but there hasn’t been a simple way to create and maintain a space for sharing with the small communities of people in your life, like your roommates, classmates, co-workers and family.

Today we’re announcing a completely overhauled, brand new version of Groups. It’s a simple way to stay up to date with small groups of your friends and to share things with only them in a private space. The default setting is Closed, which means only members see what’s going on in a group.”

There are three kinds of groups: open, closed and secret. Open groups have public membership listings and public content. Private ones have public membership but public but private content. For secret groups, both the membership and content are private.

A key part of the idea is that the group members collectively define who is in the group, spreading the work of setting up and maintaining the group over many people.

But a serious issue with the new Facebook group framework is that a member can unilaterally add any of their friends to a group. No confirmation is required by the person being added. This was raised as an issue by Jason Calacanis.

The constraint that one can only add Facebook friend to a group he belongs to does offer some protection against ending up in unwanted groups (e.g., by spammers). But it could still lead to problems. I could, for example, create a closed group named Crazy people who smell bad and add all of my friends without their consent. Since the group is not secret like this one, anyone can see who is in the group. Worse yet, I could then leave the group. (By the way, let me know if you want to join any of these groups).

While this might just be an annoying prank, it could spin out of control — what might happen if one of your so called friends adds you to the new, closed “Al-Queda lovers” group?

The good news is that this should be easy to fix. After all, Facebook does require confirmation for the friend relation and has a mechanism for recommending that friends like pages or try apps. Either mechanism would work for inviting others to join groups.

We have started working with a new group-centric secure information sharing model being developed by Ravi Sandhu and others as a foundation for better access and privacy contols in social media systems. It seems like a great match.

See update.

How the DC Internet voting pilot was hacked

Wednesday, October 6th, 2010

University of Michigan professor J. Alex Halderman explains how his research group compromised the Washington DC online voting pilot in his blog post, Hacking the D.C. Internet Voting Pilot.

“The District of Columbia is conducting a pilot project to allow overseas and military voters to download and return absentee ballots over the Internet. Before opening the system to real voters, D.C. has been holding a test period in which they’ve invited the public to evaluate the system’s security and usability. … Within 36 hours of the system going live, our team had found and exploited a vulnerability that gave us almost total control of the server software, including the ability to change votes and reveal voters’ secret ballots. In this post, I’ll describe what we did, how we did it, and what it means for Internet voting.”

The problem was a shell-injection vulnerability that involved the procedure used to upload absentee ballots. Halderman concludes

“The specific vulnerability that we exploited is simple to fix, but it will be vastly more difficult to make the system secure. We’ve found a number of other problems in the system, and everything we’ve seen suggests that the design is brittle: one small mistake can completely compromise its security. I described above how a small error in file-extension handling left the system open to exploitation. If this particular problem had not existed, I’m confident that we would have found another way to attack the system.”

Stuxnet worm update

Tuesday, October 5th, 2010

From slashdot earlier today:

“Numerous Stuxnet related stories continue to flow through my bin today, so brace yourself: Unsurprisingly, Iran blames Stuxnet on a plot set up by the west designed to infect its nuclear facilities. A Symantec researcher analyzed the code and put forth attack scenarios. A threatpost researcher writes about the sophistication of the worm. Finally, Dutch multinationals have revealed that the worm is also attacking them. We may never know what this thing was really all about.”

Stuxnet questions and answers from F-Secure

Friday, October 1st, 2010

If you are interested in the Stuxnet worm, take a look at this blog post from F-secure Labs, Stuxnet Questions and Answers. It’s relatively free of over ventilation and speculation. F-secure is a Finnish company specializing in anti-virus and computer security software. Here’s an intriguing example from the post that does speculate a bit.

Q: How does Stuxnet know it has already infected a machine?
A: It sets a Registry key with a value “19790509″ as an infection marker.

Q: What’s the significance of “19790509″?
A: It’s a date. 9th of May, 1979.

Q: What happened on 9th of May, 1979?
A: Maybe it’s the birthday of the author? Then again, on that date a Jewish-Iranian businessman called Habib Elghanian was executed in Iran. He was accused to be spying for Israel.

Hat tip HN.

update: Another good resource is SYmantec’s W32.Stuxnet Dossier.

“While the bulk of analysis is complete, Stuxnet is an incredibly large and complex threat. The authors expect to make revisions to this document shortly after release as new information is uncovered or may be publicly disclosed. This paper is the work of numerous individuals on the Symantec Security Response team over the last three months well beyond the cited authors. Without their assistance, this paper would not be possible.”