HH Tu's blog

What is memcache?


Today I will introduce a useful technique for speeding up database access: Memcache. It is a distributed memory caching system with which we can build a highly efficient cloud system. The basic idea is to use a key-based structure to store data in memory and fetch it back. The idea comes from Brad Fitzpatrick, who used it to speed up LiveJournal.com in 2003. Many websites use this method: LiveJournal, Wikipedia, Flickr, Twitter, YouTube, Digg, WordPress.com, etc. It can eliminate most of the time spent loading from the database, since the database is only accessed when a Memcache miss happens, which also makes better use of the database's resources. It offers a key-based cache and a distributed memory object caching system, but access control is left to the user.
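The hit-or-miss flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SimpleCache`, `slow_db_lookup`, and the `DATABASE` dictionary are hypothetical stand-ins invented for this example, not part of Memcache or any client library.

```python
# Hypothetical stand-in for a slow database table (made-up data).
DATABASE = {"user:1": "Alice", "user:2": "Bob"}


class SimpleCache:
    """An in-process dictionary standing in for a Memcache server."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        # Returns None when the key is not cached.
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def slow_db_lookup(key):
    """Pretend database query (hypothetical helper)."""
    return DATABASE.get(key)


def fetch(cache, key):
    """Look in the cache first; go to the database only on a miss."""
    value = cache.get(key)
    if value is not None:
        cache.hits += 1
        return value
    cache.misses += 1
    value = slow_db_lookup(key)
    if value is not None:
        # Store the result so the next request is served from memory.
        cache.set(key, value)
    return value


cache = SimpleCache()
first = fetch(cache, "user:1")   # miss: read from the database
second = fetch(cache, "user:1")  # hit: read from memory
```

Only the first request touches the database; the second is answered from memory, which is where the savings come from.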

It pays to store frequently used information so that it does not have to be retrieved again. The simplest example is browsing the Internet: most of a website's content is downloaded into a local folder, which speeds things up the next time you visit the same site. Memcache works on the same principle. It sets aside part of your computer's memory for faster access, can be deployed and accessed from anywhere over a network, and lets you create as many caches as you want (provided you have enough memory). Even better, it treats all of the caches as one single node, which means you can combine the memory of several computers and use it together! What a wonderful mechanism. All operations should run in O(1) time.
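One way to picture several machines acting as a single cache is to hash each key and let the hash decide which machine holds it. The sketch below assumes a simple hash-modulo scheme; the `CachePool` class and the node names are hypothetical, and real clients may pick nodes differently (for example with consistent hashing).

```python
import hashlib


class CachePool:
    """Several in-memory dictionaries presented as one logical cache."""

    def __init__(self, node_names):
        self.names = list(node_names)
        # Each "node" is just a dict here; in practice it would be a server.
        self.nodes = {name: {} for name in self.names}

    def _node_for(self, key):
        # A stable hash of the key chooses the node, so the same key
        # always maps to the same node.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.names[int(digest, 16) % len(self.names)]

    def set(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)


pool = CachePool(["node-a", "node-b", "node-c"])
for i in range(100):
    pool.set(f"item:{i}", i)

value = pool.get("item:42")
total_items = sum(len(store) for store in pool.nodes.values())
```

Both the hash and the dictionary lookup take constant time on average, which matches the O(1) expectation. The drawback of plain modulo is that adding or removing a node remaps most keys.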

What is Machine Learning?


Nowadays, if a programmer wants to solve a word parsing problem, he writes a program for it. First he reads in a file and writes code instructions to parse it; the program then collects the useful information and outputs it. This is simple, but unfortunately this approach cannot solve every problem in the world. Humans can easily tell whether an e-mail is spam or ham, but it is not easy to find an algorithm that does the same.

Spam comes in many forms and is therefore very difficult to identify. Even the human brain cannot remember or recognize every possible case. If we could rely on computers to collect data for us, automatically extract the useful information, order the results the way we want, and even learn by themselves to give us a prediction, it would be great! The point is that we don't have a direct algorithm, but we do have data.

Assume we have thousands of clients around the world and receive tens of millions of e-mails every day. If we want to decide whether an e-mail is spam or ham, we can look at previous mails and make an approximate prediction. Doing this with traditional statistical analysis would take a substantial amount of time and money! Furthermore, spam behavior changes over time, and mail types vary by location around the world. If we just follow fixed, traditional rules, they will fail some day. From another perspective, if we knew whether an e-mail was sent by a spammer to broadcast advertisements or by a general manager to issue an order, we could handle it easily. You could write code instructions to quickly filter it out or keep it, but getting this information is not an easy job.
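To make "no direct algorithm, but we have data" concrete, here is a toy sketch that decides spam or ham from labeled examples instead of hand-written rules. Everything in it is an assumption made for illustration: the four training mails are invented, and the word-count vote is a deliberately crude stand-in for a real learning method.

```python
from collections import Counter

# Tiny made-up training set of (text, label) pairs.
TRAINING = [
    ("win a free prize now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]


def train(examples):
    """Count how often each word appears in spam and in ham."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def classify(counts, text):
    """Label a mail by which class has seen its words more often."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    # Ties fall back to "ham" so that unknown mail is not discarded.
    return "spam" if spam_score > ham_score else "ham"


model = train(TRAINING)
prediction_a = classify(model, "free prize offer")
prediction_b = classify(model, "project meeting on monday")
```

No rule about the word "free" was ever written by hand; the behavior comes from the examples, so retraining on newer mail changes the predictions as spam itself changes.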