ElectroEuro: A Virtual Coin that Enables the Exchange of Green Energy, Driving a Cleaner and Decarbonized Europe


In 2017, General Electric (GE), one of the largest American companies, with businesses spanning oil and gas, healthcare, aviation, and software development, partnered with Eurelectric, the union of the electric industry in Europe, to create an Ecomagination Challenge hackathon. Ecomagination refers to "GE's growth strategy to enhance resource productivity and reduce environmental impact at a global scale through commercial solutions for our customers and through our own operations." The focus was on building digital solutions to help decarbonize energy and transportation in Europe. The hackathon was held in Berlin on June 12-13, where over 100 participants from around the world came together to compete on the two challenges presented: Electrification and Advanced Manufacturing.

 

For the Electrification challenge, the target was to come up with solutions in which things are powered by electricity. Sample solutions spanned renewable energy resource siting, analysis of electric heating and conversion to heat pumps, electric vehicle charger siting, and renewable energy integration. For the Advanced Manufacturing challenge, the goal was to optimize existing manufacturing processes. Sample solutions included forecasting manufacturing delays based on parts complexity, detecting delay drifts, and optimizing critical production rescheduling. Both challenges sought solutions that drive the decarbonization of Europe, and the platform for both was Predix, GE's industrial Internet of Things (IoT) software platform for the collection and analysis of data from industrial machines.

 

Our team competed in Electrification, which was considered the greater of the two challenges.


 

Ira Blekhman, from GE Digital in Israel, decided to promote the challenge and posted a notice about it on the Facebook page of FemTech, a community in Israel for women in technology with over 1,000 members. Talia Kohen was the founder and CEO of FemTech, and when she saw the advertisement she immediately thought of Sheryl Sandberg's words, "What would you do if you were not afraid?", and decided to join her first competitive hackathon and serve as the CEO of her team! Talia is a master's student in Computer Science at Bar Ilan University (and a Cornell alumna), and was a finalist for the Anita Borg Scholarship. Talia was also a Microsoft Woman of Excellence and a Google Outstander. She hopes to one day serve as the CEO of her own startup.


 

The formation of her team was a bit unorthodox. Before she formally selected the other members of her team, Ran Koretzki, a Computer Science graduate student at the Technion in Israel, asked to join her, and they were the original core of the team: truly a Cornell-Technion alliance. Ran, whose technical experience includes summer internships with both Google and Facebook, served as the team's CTO and developed most of the solution's technical architecture.

 


 

Talia then added Idan Nesher, a UX designer, to the team, knowing that the key to winning is a compelling presentation. As much as architecting an energy bank and virtual currency system would win the hearts of the judges, the presentation would need to be at the same professional level. Idan studied product design at the Avni Institute in Tel Aviv and subsequently moved to Berlin, where he began working as a freelance designer. He dove quickly into the world of user experience (UX) design, believing that UX will be the future of all products since it is centered on innovation.

 

She added two other developers to the team who would be able to do front-end development, so that a live demo would be ready to show at the hackathon, since seeing is believing: Haim Bender and Isaack Rasmussen. Haim, like Ran and Idan, came from Israel. He studied Math and Computer Science at Tel Aviv University. Isaack, the only non-Israeli on the team, is a software developer with more than a decade of experience, originally from Africa and now residing in Denmark.

 

Talia was privileged to have been mentored by another master's student at Bar Ilan, Micah Shlain, who taught her the principles of software development, and it was through this process that she was able to guide her team from idea to design to implementation.

 

We designed a decentralized virtual currency known as the ElectroEuro for trading energy through an energy bank in Europe, driving a low-carbon economy. The use of greener energies would promote decarbonization, and the monetization would make it accessible and practical. The concept was to unite Europe through electricity, just as the Euro unites it through currency. The ElectroEuro is similar to Bitcoin in that it is universal and exists in a finite quantity. Energy transactions are carried out through it, and it can be purchased with goods that do not promote carbonization.

 

The energy bank consists of eight sources of energy, each ranked by its green factor and its stability. The price of the energy is based on two metrics: 1) the distance over which the energy must be transported (a fixed price), and 2) the quantity. A market is generated based on the surplus of energy per country and per energy source. Machine learning is used to predict consumption, production, and cost from a set of sensors that detect features for each type of energy. For example, a set of sensors detects weather, and this feeds into the prediction of the availability of solar energy. The other features include location, cost of operation, availability, difficulty of harnessing, volume, waste, risk, failures, pollution, and cost of production. An intricate bidding process is then performed at regular intervals, in which green and stable energies are promoted. The machine learning, which is the core logic of the system, is implemented on Predix, and the rest is still in the design phase.
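To make the pricing and bidding logic concrete, here is a minimal Python sketch. It is not the Predix implementation: the transport rate, the weights in the bid score, and the green-factor and stability values are all illustrative assumptions.

from dataclasses import dataclass

TRANSPORT_RATE = 0.0001  # assumed fixed transport cost, EUR per kWh per km

@dataclass
class Offer:
    country: str
    source: str          # e.g. "solar", "wind", "coal"
    surplus_kwh: float   # predicted surplus available this trading interval
    distance_km: float   # distance the energy must travel to the buyer
    green_factor: float  # 0..1, higher is greener (illustrative score)
    stability: float     # 0..1, higher is more stable (illustrative score)

def price_per_kwh(offer: Offer, base_price: float) -> float:
    """Price per kWh: a base price plus a fixed distance component."""
    return base_price + offer.distance_km * TRANSPORT_RATE

def total_price(offer: Offer, base_price: float) -> float:
    """Cost of buying the entire surplus, i.e. the quantity component."""
    return price_per_kwh(offer, base_price) * offer.surplus_kwh

def bid_score(offer: Offer, base_price: float) -> float:
    """Rank offers so that greener, more stable, cheaper energy wins the interval."""
    # Illustrative weights; a production system would learn or tune these.
    return 2.0 * offer.green_factor + 1.0 * offer.stability - price_per_kwh(offer, base_price)

offers = [
    Offer("Spain", "solar", surplus_kwh=1000, distance_km=900, green_factor=0.9, stability=0.4),
    Offer("Poland", "coal", surplus_kwh=1000, distance_km=600, green_factor=0.1, stability=0.9),
]
ranked = sorted(offers, key=lambda o: bid_score(o, base_price=0.05), reverse=True)
print([(o.country, o.source) for o in ranked])  # the greener solar offer ranks first here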

 

This solution offers significant technological benefits: hybridization, mobility, decentralization, big data optimality, and efficiency. Hybridization is achieved by combining different energy sources. Mobility is realized by creating a mechanism that allows a country to obtain energy from a nearby EU neighbor rather than from OPEC. Decentralization follows from the virtual currency, which gives rise to a free market. Big data optimality is materialized through a large sensor network that generates ample data for analysis. Lastly, efficiency is achieved because the collected data allows resources to be replaced or moved throughout the network.

 

Additionally, it offers political and economic benefits: it makes green energy cheap, generates a free market, promotes production driven by revenue, and promotes autonomy for individual countries. Green energy becomes more affordable because there is a concerted effort to make trading easier for both the producer and the consumer. For the consumer this means lowering the price of green energy, creating flexible trading rules, and allowing for delayed payments. For the supplier this entails forgiving debts, issuing small loans, and imposing penalties for trading polluting energies. A free market is generated because there is less reliance on OPEC. Additionally, unlike OPEC, where production is heavily driven by politics rather than centered around revenue, this configuration is less politically driven and more idealistic. Lastly, countries are more autonomous than under OPEC. OPEC regulated countries very strictly: it penalized countries that under-produced by limiting their negotiation power, and fined countries that overproduced. Every country was individually inclined to cheat by discounting its prices and exceeding quotas.

 

We presented our pitch at the hackathon on a boat to a panel of judges, and several hours later, near the conference venue for Minds + Machines (which was within walking distance of the boat), we found out that we had won first place in Electrification, worth 10K Euro for the team! Everyone was surprised and delighted at the same time. We also found out that we had won 3K for second place in the category of Predix development.

 


 

We further won up to 10K for travel expenses to Portugal to present our winning solution at the Eurelectric conference. This project was slated to be a one-off, but GE is interested in continuing to support it to see it validated and developed.

The Power of WordNet and How to Use It in Python

In this post, I am going to talk about the relations in WordNet (https://wordnet.princeton.edu) and how you can use these in a Python project. WordNet is a database of English words with different relations between the words.

Take a look at the next four sentences.

  1.  “She went home and had pasta.”
  2. “Then she cleaned the kitchen and sat on the sofa.”
  3. “A little while later, she got up from the couch.”
  4. “She walked to her bed and in a few minutes she was snoring loudly.”

In Natural Language Processing, we try to use computer programs to find the meaning of sentences. In the above four sentences, with the help of WordNet, a computer program will be able to identify the following –

  1. “pasta” is a type of dish.
  2. “kitchen” is a part of “home”.
  3. “sofa” is the same thing as “couch”.
  4. “snoring” implies “sleeping”.

Let’s get started with using WordNet in Python. It is included as a part of the NLTK (http://www.nltk.org/) corpus. To use it, we need to import it first.

>>> from nltk.corpus import wordnet as wn
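As a quick preview of the four relations listed above, the calls below query hypernyms, part-whole relations, synonym sets, and verb entailments. The exact synsets returned depend on the WordNet version bundled with NLTK, so the comments only indicate what you can typically expect.

>>> wn.synsets('pasta')                          # all senses of the word "pasta"
>>> wn.synset('pasta.n.01').hypernyms()          # hypernym: a more general concept, a kind of dish
>>> wn.synset('kitchen.n.01').part_holonyms()    # part-whole relations (coverage varies by synset)
>>> wn.synset('sofa.n.01').lemma_names()         # synonyms sharing the synset, e.g. 'sofa', 'couch'
>>> wn.synset('snore.v.01').entailments()        # verb entailment: snoring entails sleeping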


Natural Language Understanding : Let’s Play Dumb

What is the meaning of the word "understanding"? This was a question posed during a particularly enlightening lecture given by Dr. Anupam Basu, a professor in the Department of Computer Science and Engineering at IIT Kharagpur, India.

Understanding something probably relates to being able to answer questions based on it, maybe form an image or a flow chart in your head. If you can make another human being comprehend the concept with the least amount of effort, well that means you do truly understand what you are talking about. But what about a computer? How does it understand?

Let’s take a look at some sentences written in various languages:

年は学校に行く; (Japanese)

The boy is going to school; (English)

Le garçon va à l’école; (French)

लड़का स्कूल जा रहा है; (Hindi)

El niño va a la escuela; (Spanish)

What is happening in the sentences above? If you know any two of these languages, you will probably be able to give the right answer. If you know only one, say English, you will probably answer with respect to that language. Taking a closer look, you will notice that two of the sentences seem to have the same structure: the ones in Spanish and French. So you take an intelligent guess and say that yes, they mean the same thing. Since those two sentences mean the same thing, and the one in English talks about a boy with respect to a school, maybe they all mean the same thing.

But how many of these sentences have you really understood? For me, only three of them make sense, forming a picture in my head of a boy directed toward a school, connected by the present continuous tense of the verb "go." English (a language I am well versed in), Hindi (my national language), and French (a language I am learning) all plant the same image in my head. The other translations I have taken from Google Translate, so the Japanese is just symbols to me. If you ask me questions in Japanese based on that sentence, I won't be able to answer them, and definitely not in Japanese. If you ask me a question in any of the other three languages, however, I will be able to answer quite fluently and even translate between the three without much effort.

During the lecture, Professor Basu introduced the concept of frames, invented by Charles J. Fillmore.

“The idea behind frame semantics is that speakers are aware of possibly quite complex situation types, packages of connected expectations, that go by various names—frames, schemas, scenarios, scripts, cultural narratives, memes—and the words in our language are understood with such frames as their presupposed background.”  (Fillmore 2012, p. 712)  Source/Read more at Computational Linguistics-Charles J. Fillmore

Consider the following sentence:

Tom is going to school in the evening with a basket of muffins from home.

What are the possible ways one can represent this in a computer program?

Let us consider a schema, which defines the various attributes of the verb “go.” The verb go will generally have attributes like:

go {Agent: Tom; Source: home; Destination: school; Time: evening; Object: basket of muffins; …}

Now suppose, someone asks the question “What is Tom carrying?”

How can your algorithm answer this? Carrying, or to carry, is an altogether different verb. Let us look again, now that we have a little more information.

Tom is a boy. Mary is a girl. Tom likes muffins. John cannot dance. Tom loves engineering. Jane is beautiful. Tom is going to school in the evening with a basket of muffins from home. Tom sleeps early.

(With more complexity, more attributes can be added; right now I am simply considering the simplest possible sentences.)

For this particular data, when the question "What is Tom carrying?" is asked, we can form the case frame for each verb that has Tom as its agent, thus ignoring the other possible agents, that is, Mary, John, and Jane:

to be      {Agent: Tom; Object: boy; …}
to like    {Agent: Tom; Object: muffins; …}
to love    {Agent: Tom; Object: engineering; …}
to sleep   {Agent: Tom; Object: none; Time: early; …}
go         {Agent: Tom; Object: basket of muffins; …}

Now, based on the generic case frame of the verb "to carry," it will be easy to answer the question, as carry cannot be related to engineering, for example. Further related questions, like "Where is he going with the basket?", may also be answered. Since the verb is the key here, and not the agent, we will just look at the case frame of the verb "to go."

This is a pretty basic example, and there are of course quite a few rough edges, like the size of the frame and actually relating the verbs to their objects. There is a need for word embeddings and other things that I am assuming are taken care of at the moment, but it puts forth a method to make your algorithm "understand."
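To make the idea tangible, here is a toy Python sketch (my own illustration, not Professor Basu's or Fillmore's formalism) that stores case frames as dictionaries and answers a question by mapping its verb onto a related frame. The verb-relatedness table is hard-coded here; a real system would derive it from a resource such as WordNet or from word embeddings.

# Case frames extracted from the example sentences, keyed by verb.
frames = {
    "be":    {"Agent": "Tom", "Object": "boy"},
    "like":  {"Agent": "Tom", "Object": "muffins"},
    "love":  {"Agent": "Tom", "Object": "engineering"},
    "sleep": {"Agent": "Tom", "Object": None, "Time": "early"},
    "go":    {"Agent": "Tom", "Object": "basket of muffins",
              "Source": "home", "Destination": "school", "Time": "evening"},
}

# Hand-written verb relatedness; a real system would derive this from
# WordNet, embeddings, or a generic case frame for each verb.
related_verbs = {"carry": ["go"], "travel": ["go"], "enjoy": ["like", "love"]}

def answer(question_verb, agent, role="Object"):
    """Answer 'What is <agent> <verb>ing?' by looking up a matching case frame."""
    for verb in [question_verb] + related_verbs.get(question_verb, []):
        frame = frames.get(verb)
        if frame and frame.get("Agent") == agent and frame.get(role):
            return frame[role]
    return "unknown"

print(answer("carry", "Tom"))                   # -> basket of muffins
print(answer("go", "Tom", role="Destination"))  # -> school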

Now consider the following set of sentences:

Father gave a birthday gift to the boy. The boy opened the packet. He found a story book inside.

It was his birthday. Father bought him a packet. The boy opened it. There was a book within. It had pictures.

A book was wrapped in lace. It was his birthday present. The boy’s father presented it to him.

Question: What is the birthday present? The answer is very simple: the book. Yet the sentences vary in structure, and the number of words that can be used to describe the same book is very large. Imagine this to be a major news incident, or something that goes viral on Facebook. Each website or user will have their own set of words, interviews, opinions, and grammatical structures, correct or incorrect.

Now imagine a search engine trying to crawl through it all to find you the right answer. Or is it the closest match? Did you go through the entire process of forming a case frame before deciding the answer was the book? Or was it a simple image that flashed through your mind?

Father -> Gift -> Boy -> Birthday -> Book

So, our algorithm must be built on background knowledge similar to that possessed by humans, also called an ontology. Just as a human child keeps adding to a huge database of words, syntax, and semantics subconsciously throughout his or her life, the algorithm should be able to keep learning too. However, designing a schema that, like your brain, is independent of the language it represents is still a challenge. It will require a powerful dependency parser and a very large dictionary of words to actually train the algorithm.

Here, I would like to mention the chatbot ELIZA, an early natural language processing program created by Joseph Weizenbaum in the 1960s at the MIT Artificial Intelligence Laboratory. ELIZA was created to demonstrate the superficiality of communication between man and machine: it was capable of engaging in discourse, but it could not converse with true understanding, even though many early users were convinced of its intelligence.

This once again brings us to the question: is your machine or algorithm really able to "understand" a language and respond without rote learning? Making the algorithm independent of the data set and able to handle unexpected problems is of paramount importance today. For example, an online chatbot designed to solve customer issues through text should comprehend what the customer is asking. However, there is a silent killer lurking, ready to destabilize your algorithm in such situations: when a person introduces a touch of sarcasm into the conversation, the chatbot is unable to detect that subtlety of language. That is one of the toughest problems for any NLP algorithm to handle today, as no data set can teach sarcasm. Language, like the world and the news of the day, continuously evolves. So how can any algorithm keep up?
This is not a new issue. The question was famously posed by Alan Turing in his 1950 paper "Computing Machinery and Intelligence," which opens by asking "Can machines think?" A machine that shows true comprehension continues to intrigue computer scientists, who are still searching for a universal answer.

Notes from IIT Kharagpur ACM Summer School on ML and NLP

[Image: IIT KGP campus. Entrance to the library and academic area.]

[This entry has been edited for clarity. An example given discussing the similarity of words in French and English was incorrect. The following sentence has been removed: “The next question addressed by Bhattacharya was the ambiguity that may arise in languages with similar origins, for example in French ‘magazine’ actually means shop while in English, well it is a magazine.”]

Today is June 14th, so I am 14 days into summer school; 7 more days are left, and we are all already feeling saddened by the idea of leaving Kharagpur soon. In India, an IIT is a dream for 90% of the 12th graders who join IIT coaching classes. The competition is high, so not everyone gets in. I'm one of those who didn't get in. So when I saw there was an ACM Summer School opportunity at the largest and oldest IIT in India, I obviously grabbed it. By sheer luck, I was selected to actually attend the school. Over the course of 21 days, we have been tasked to learn about machine learning and natural language processing.

How 1 Million App Calls can Tell you a Bit About Malware – Part 2

In my previous blog post, I described some of my findings regarding malicious mobile apps. In summary, I observed that there are POSIX abstractions that are popular only among malicious apps. The findings were derived from a study that I did with some colleagues on POSIX (Portable Operating System Interface) abstractions. Recall that part of our study involved examining the POSIX calls used by both benign Android applications (~1 million) coming from the Google Play Store and malicious Android applications (about 1,260 of them) taken from a well-known dataset, which you can download from here.

Figure 1: Potentially malicious apps. The identification was based on an SVM model.

Table 1: Indicative potentially malicious apps classified by the SVM model. These apps were identified as malicious by more than 15 antiviruses.

We performed a further analysis on these results to check whether we could create a more robust filter for detecting malicious apps than the simple filter described in my previous post (recall that this filter was based on the three abstractions that are most unpopular among benign applications and at the same time popular among malicious ones). Our attempt involved the following: we fed a set of benign apps (the 500 most popular apps of the Google Play Store) and the aforementioned dataset of malicious apps to an SVM (Support Vector Machine), a binary classifier that builds a model based on given features (abstractions in our case) to separate the two classes. In this way the classifier can label a new app as malicious or not. By applying the model to the same set of apps that we examined in the previous case, 1,283 apps were identified as suspicious. Based again on the antiviruses provided by the VirusTotal website, we found that of these apps, 232 (18%) are potentially malicious. Even if the approach seems less robust than the previous one, Figure 1 illustrates that there are more cases of apps that were flagged as malicious by more than one antivirus. Table 1 presents applications that were filtered out by the SVM model and were identified as malicious by more than 15 antiviruses.
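For readers who want to see the shape of such a classifier, here is a minimal sketch using scikit-learn, under the assumption that each app is represented by a binary vector recording which POSIX abstractions it uses. The toy data and feature layout are illustrative only, not our actual pipeline.

import numpy as np
from sklearn.svm import SVC

# Toy feature matrix: one row per app, one column per POSIX abstraction;
# 1 means the app uses that abstraction, 0 means it does not (illustrative data).
X_train = np.array([
    [1, 0, 0, 1, 0],   # benign app
    [1, 1, 0, 0, 0],   # benign app
    [0, 1, 1, 0, 1],   # malicious app
    [0, 0, 1, 1, 1],   # malicious app
])
y_train = np.array([0, 0, 1, 1])  # 0 = benign, 1 = malicious

# Train a linear SVM that separates the two classes in feature space.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Classify previously unseen apps by the abstractions they use.
X_new = np.array([[0, 1, 1, 1, 1],
                  [1, 0, 0, 0, 0]])
print(clf.predict(X_new))  # e.g. [1 0]: the first app looks suspicious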

Figure 2: Potentially malicious apps. The identification was based on the obfuscated libraries.

Table 2: Indicative potentially malicious apps containing obfuscated libraries. These apps were identified as malicious by more than 22 antiviruses.

Through our experiments, we came across a number of Android apps that included obfuscated libraries (991 apps in total). Given that obfuscation techniques are extensively encountered while analyzing Android malware, we decided to examine all the apps that contained such libraries by using the 54 antiviruses of the VirusTotal website. Surprisingly, almost half of the apps (481 in total, or 48.53%) were classified as suspicious. An interesting observation is that the majority of these apps were flagged as potentially malicious by a large number of antiviruses; see Figure 2. Table 2 presents indicative apps that were identified as malicious by more than 22 antiviruses.
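The post does not describe how the obfuscated libraries were identified, but a common, simple heuristic, shown below purely as an illustration (it is not the detection logic used in the study), is to flag packages whose path segments are mostly one or two characters long, a typical footprint of name-mangling tools such as ProGuard.

def looks_obfuscated(package_name, threshold=0.5):
    """Flag a package whose path segments are mostly one or two characters long,
    a typical footprint of name-mangling obfuscators such as ProGuard."""
    segments = package_name.split(".")
    short = sum(1 for s in segments if len(s) <= 2)
    return len(segments) > 1 and short / len(segments) >= threshold

print(looks_obfuscated("com.example.photoeditor.ui"))  # False
print(looks_obfuscated("com.a.b.c.d"))                 # True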

Clearly, a malware detector cannot be based solely on observations like the ones above. However, such findings could be useful for the development of more complex filters that help find malicious software.