Blogs

Login with Facebook and retrieve user identities

Hank Chen's picture

Facebook (abbreviated FB) is a social-networking website, and many websites integrate FB for promotion. How do you integrate FB into your own website? The steps are as follows.
1.Your options for API:
The official Facebook API can be divided into two parts: web programming, implemented in JavaScript or PHP, and mobile programming for iOS or Android. Because FB is so popular, APIs for FB have also been developed for many other platforms and languages, such as Silverlight, Flash, .NET and Java. This article uses JavaScript as its example.
2.Before developing:
2.1.Sign up for a Facebook account.
2.2.Log in to your Facebook account and sign up as a “Developer”. You can find more information at the following URL:
http://sofree.cc/fb-app-1/
3.Start creating your Facebook app:
3.1.After completing Step 2, click the “Create New App” button to get your App ID and App Secret (the red box in the screenshot). Then fill out the “Basic Info” form (the blue box in the screenshot). For example, our test website is “me.cellopoint.com”, so we enter “me.cellopoint.com” in the “App Domain” field and “http://me.cellopoint.com” in the “URL” field. Finally, click “Save” to finish.

3.2.In the folder “me.cellopoint.com”, create two new files, “channel.html” and “index.html”. Open “channel.html” and add the following JavaScript code:

<script src="//connect.facebook.net/en_US/all.js"></script>
3.3.Open “index.html”, paste the following code into it, and edit the two lines that are followed by a “replace” comment (the App ID and the channel file URL). Note that the JavaScript code must be pasted after the <body> tag.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Login</title>
</head>
<body>
<div id="fb-root"></div>
<script src="//connect.facebook.net/en_US/all.js"></script>
<script>
window.fbAsyncInit = function() {
FB.init({
appId : 'xxxxxxxxxxx',
// App ID: replace with the App ID you retrieved in Step 3.1
channelUrl : '//me.cellopoint.com/channel.html',
// Channel file: replace with the real URL of the channel.html you set up in Step 3.2

status : true, // check login status
cookie : true, // enable cookies to allow the server to access the session
oauth : true, // enable OAuth 2.0
xfbml : true // parse XFBML
});
};
</script>
</body>
</html>
3.4.Log in and retrieve the member’s personal information
Add the following JavaScript code. The relevant parts of the callback are:
a.userID: the ID of the logged-in user, read from response.authResponse.userID in the code below. If your website integrates FB for login, you can identify your members by this ID.
b.To retrieve the user’s email, you have to request the email permission by passing {scope: 'email'} to FB.login, as in the code below.
c.You can retrieve the user’s name and email by calling FB.api with “/me”, as in the code below.
function FBLogin(){
FB.login(function(response) {
if (response.authResponse) {
alert('Success');
alert('UserID' + response.authResponse.userID);
FB.api('/me', function(response) {
alert('UserName' + response.name);
alert('UserEmail' + response.email);
});
} else {
alert('Failed');
}
}, {scope: 'email'});
}
3.5.Log out: add the following JavaScript code.
function FBLoginOut(){
FB.logout(function(response) {
alert('Logged out.');
});
}
4.If you want to remove this application, find the “Apps” option on your personal page, locate this application, and remove it.
5.Reference: http://developers.facebook.com/docs/reference/javascript/
 

Libgtop

David Lee's picture

How can we get the resource usage of a Linux system, such as memory and CPU utilization, while a process is running? We can read the file /proc/<process id>/stat, or we can use the “top” command in the shell. However, both approaches require extra effort, because the file or the command output has to be parsed before we can use it. Here is another way to get resource usage for the entire system or for a specific process: Libgtop, an open source library written in C.

Libgtop is a library of the GNOME project, used to implement the “top” functionality of the desktop environment. It depends on Glib, another GNOME library. At the time of writing, the latest version of Libgtop is 2.28. Note that Glib 2.6.0 and Intltool 0.35.0 (or later versions) need to be installed before Libgtop.

In general, CPU utilization is calculated from the time the CPU spends in different modes, usually divided into user mode, nice mode, system (kernel) mode and idle mode. The Libgtop API returns the CPU time (in clock ticks) spent in each mode since system boot. For example, the source code below can be used to calculate the CPU utilization.

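#include <stdio.h>
#include <unistd.h>        /* sleep() */
#include <glibtop.h>
#include <glibtop/cpu.h>

int main(void)
{
    double cpu_rate;
    guint64 dt, du, dn, ds;
    glibtop_cpu cpu_begin, cpu_end;

    glibtop_init();
    glibtop_get_cpu(&cpu_begin);    /* first sample */
    sleep(1);
    glibtop_get_cpu(&cpu_end);      /* second sample, one second later */

    /* Tick deltas between the two samples */
    dt = cpu_end.total - cpu_begin.total;
    du = cpu_end.user - cpu_begin.user;
    dn = cpu_end.nice - cpu_begin.nice;
    ds = cpu_end.sys - cpu_begin.sys;
    cpu_rate = 100.0 * (du + dn + ds) / dt;
    printf("CPU utilization: %.1f%%\n", cpu_rate);
    return 0;
}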
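
The arithmetic behind the C code can be checked without Libgtop. The following is a minimal Python sketch of the same formula; the function name cpu_utilization and the tick values in the two samples are made up for illustration and do not come from a real system.

```python
def cpu_utilization(begin, end):
    """Return the busy percentage between two tick samples.

    Each sample is a dict with cumulative tick counts since boot,
    mirroring the total/user/nice/sys fields used in the C code.
    """
    dt = end["total"] - begin["total"]
    if dt <= 0:
        raise ValueError("samples must be taken at different times")
    # Busy time is everything that is not idle: user + nice + system.
    busy = sum(end[mode] - begin[mode] for mode in ("user", "nice", "sys"))
    return 100.0 * busy / dt


# Illustrative samples only (hypothetical numbers).
begin = {"total": 1000, "user": 100, "nice": 0, "sys": 50}
end = {"total": 1100, "user": 130, "nice": 5, "sys": 65}

rate = cpu_utilization(begin, end)
print("CPU utilization: %.1f%%" % rate)
```

Because the counters are cumulative since boot, a single sample only gives the average since boot; the difference of two samples gives the utilization over the interval between them.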

Note that we need the clock tick count at two different points in time, so glibtop_get_cpu is called twice. Monitoring memory utilization, on the other hand, is much simpler:

#include <glibtop.h>
#include <glibtop/mem.h>

double mem_rate;
glibtop_mem memory;
glibtop_get_mem(&memory);
mem_rate = 100.0 * memory.used / memory.total;

Libgtop can monitor many types of resources. In addition to the system-wide CPU and memory utilization described above, these include the CPU and memory utilization of a specific process, swap, file systems, network interfaces, and so on. For the detailed API and data structures, refer to GNOME’s official website: http://developer.gnome.org/libgtop/

Introduction of Google File System

David Lee's picture

Why can Google dominate the search engine market? One important reason is the excellent performance of its file system. To meet its huge storage demand, Google designed its own distributed file system, known as the Google File System (GFS). Google did not release GFS as open source software, but it did release some technical details, including an official paper.

There are two main differences between GFS and traditional distributed file systems. First, component failures are the norm rather than the exception. Failures can be caused by application bugs, operating system bugs, human errors, and hardware or network problems. Since even expensive hard disks cannot rule out all failures, Google simply builds its storage machines from many inexpensive commodity parts, and guards against failures by integrating constant monitoring, error detection, fault tolerance, and automatic recovery into GFS.

Second, most files are mutated by appending new data rather than by overwriting or removing existing data. Once written, data usually only needs to be read, not rewritten. Most reads are “large streaming reads”, where individual operations typically read hundreds of KBs, more commonly 1 MB or more. Note that the system stores a modest number of large files, each typically 100 MB or larger in size. GFS supports small files, but does not optimize for them.

The architecture of GFS follows a supernode (master) and distributed nodes (chunkservers) approach. The real data is stored on the chunkservers, which report their state to the master periodically. When a client wants to read a file, it asks the master which chunkservers hold the chunk it needs, and the master responds with their locations. The client then requests the chunk data directly from a chunkserver.
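
The read path described above can be sketched as a toy, in-memory model. This is only an illustration under stated assumptions, not Google's implementation: the class and function names (Master, Chunkserver, client_read) are hypothetical, the 64 MB default chunk size is taken from the GFS paper, and reads that cross a chunk boundary are not handled.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as described in the GFS paper


class Master:
    """Holds only metadata: which chunk lives on which chunkservers."""

    def __init__(self):
        self.chunks = {}  # (filename, chunk_index) -> (handle, [server names])

    def register(self, filename, index, handle, locations):
        self.chunks[(filename, index)] = (handle, list(locations))

    def lookup(self, filename, index):
        return self.chunks[(filename, index)]


class Chunkserver:
    """Holds the real data, keyed by chunk handle."""

    def __init__(self):
        self.data = {}  # handle -> bytes

    def read(self, handle, offset, length):
        return self.data[handle][offset:offset + length]


def client_read(master, servers, filename, offset, length, chunk_size=CHUNK_SIZE):
    # 1. Translate the byte offset into a chunk index.
    index = offset // chunk_size
    # 2. Ask the master for the chunk handle and replica locations.
    handle, locations = master.lookup(filename, index)
    # 3. Fetch the bytes directly from one of the chunkservers.
    server = servers[locations[0]]
    return server.read(handle, offset % chunk_size, length)


# Tiny demo with a 4-byte chunk size so the data fits on screen.
master = Master()
servers = {"cs1": Chunkserver(), "cs2": Chunkserver()}
servers["cs1"].data["h0"] = b"abcd"
servers["cs2"].data["h1"] = b"efgh"
master.register("/logs/a", 0, "h0", ["cs1"])
master.register("/logs/a", 1, "h1", ["cs2"])

result = client_read(master, servers, "/logs/a", offset=5, length=2, chunk_size=4)
print(result)
```

The point of the split is that the master only answers small metadata questions, while the bulk data flows between clients and chunkservers.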

GFS supports the large data volumes and traffic of the Google search engine. In addition, BigTable, a database system used by a number of Google applications such as Gmail, Google Maps, YouTube and other cloud services, is also built on GFS. We can say that GFS is a killer technology of the cloud generation.

More details can be found in the GFS paper: http://labs.google.com/papers/gfs.html

Semaphore functions in PHP

Ruby Lin's picture

A semaphore is a variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment.

A useful way to think of a semaphore is as a record of how many units of a particular resource are available, coupled with operations to safely (i.e. without race conditions) adjust that record as units are required or become free, and if necessary wait until a unit of the resource becomes available.

Semaphores are a useful tool in the prevention of race conditions and deadlocks; however, their use is by no means a guarantee that a program is free from these problems. Semaphores which allow an arbitrary resource count are called counting semaphores, whilst semaphores which are restricted to the values 0 and 1 (or locked/unlocked, unavailable/available) are called binary semaphores.
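
To make the idea of a counting semaphore concrete, here is a minimal sketch using Python's standard threading module rather than the PHP functions. It assumes a resource with two available units (MAX_UNITS is an arbitrary value chosen for the example) and six workers competing for them.

```python
import threading
import time

MAX_UNITS = 2                          # units of the shared resource (arbitrary)
pool = threading.Semaphore(MAX_UNITS)  # counting semaphore
lock = threading.Lock()                # protects the two counters below
in_use = 0
peak = 0


def worker():
    global in_use, peak
    with pool:                         # acquire one unit; blocks if none is free
        with lock:
            in_use += 1
            peak = max(peak, in_use)
        time.sleep(0.01)               # pretend to use the resource
        with lock:
            in_use -= 1
    # the unit is released when the "with pool" block exits


threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrent holders:", peak)
```

However the threads are scheduled, the number of simultaneous holders never exceeds MAX_UNITS. Setting MAX_UNITS to 1 turns this into a binary semaphore.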

The following are semaphore functions in PHP:
int ftok (string $pathname, string $proj) - Convert a pathname and a project identifier to a System V IPC key.

bool sem_acquire (resource $sem_identifier) - Acquire a semaphore.

resource sem_get (int $key [,int $max_acquire = 1 [,int $perm = 0666 [,int $auto_release = 1]]]) - Get a semaphore id.

bool sem_release (resource $sem_identifier) - Release a semaphore.

bool sem_remove (resource $sem_identifier) - Remove a semaphore.

resource shm_attach (int $key [, int $memsize [, int $perm]]) - Creates or opens a shared memory segment.

bool shm_detach (resource $shm_identifier) - Disconnects from shared memory segment.

mixed shm_get_var (resource $shm_identifier, int $variable_key) - Returns a variable from shared memory.

bool shm_has_var (resource $shm_identifier, int $variable_key) - Check whether a specific entry exists.

bool shm_put_var (resource $shm_identifier, int $variable_key, mixed $variable) - Inserts or updates a variable in shared memory.

bool shm_remove_var (resource $shm_identifier, int $variable_key) - Removes a variable from shared memory.

bool shm_remove (resource $shm_identifier) - Removes shared memory from Unix systems.

http://php.net/manual/en/book.sem.php

GNU libextractor

Shawn Lin's picture

Introduction

GNU libextractor is GNU’s library for extracting meta data from files. Meta data includes format information (such as mime type, image dimensions, color depth, recording frequency), content descriptions (such as document title or document description) and copyright information (such as license, author and contributors). Currently, libextractor supports the following formats: HTML, PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, FLAC, MP3 (ID3v1 and ID3v2), NSF(E) (NES music), SID (C64 music), OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, S3M (Scream Tracker 3), XM (eXtended Module), IT (Impulse Tracker), FLV, REAL, RIFF (AVI), MPEG, QT and ASF. Also, various additional MIME types are detected.

Libextractor is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License. GNU libextractor uses plugins to handle various file formats. Technically a plugin can support multiple file formats; however, most plugins only support one particular format. By default, GNU libextractor will use all plugins that are available and found in the plugin installation directory. Applications can request the use of only specific plugins or the exclusion of certain plugins.

Example for using dynamic library

// hello.c
#include <extractor.h>
int main()
{
struct EXTRACTOR_PluginList *el;
el = EXTRACTOR_plugin_add_defaults (EXTRACTOR_OPTION_DEFAULT_POLICY);
// ...
EXTRACTOR_plugin_remove_all (el);
return 0;
}

You can then compile the example using
$ gcc -I/Library/Frameworks/Extractor.framework/Versions/Current/include \
-o hello hello.c \
-L/Library/Frameworks/Extractor.framework/Versions/Current/lib \
-lextractor

Plugin management

C Struct: EXTRACTOR_PluginList

A plugin list represents a set of GNU libextractor plugins.

Function: void EXTRACTOR_plugin_remove_all (struct EXTRACTOR_PluginList *plugins)

Unload all of the plugins in the given list.

Function: struct EXTRACTOR_PluginList * EXTRACTOR_plugin_remove (struct EXTRACTOR_PluginList *plugins, const char*name)

Unloads a particular plugin. The given name should be the short name of the plugin, for example “mime” for the mime-type extractor or “mpeg” for the MPEG extractor.

Function: struct EXTRACTOR_PluginList * EXTRACTOR_plugin_add (struct EXTRACTOR_PluginList *plugins, const char* name, const char* options, enum EXTRACTOR_Options flags)

Loads a particular plugin.

Function: struct EXTRACTOR_PluginList * EXTRACTOR_plugin_add_config (struct EXTRACTOR_PluginList *plugins, const char* config, enum EXTRACTOR_Options flags)

Loads and unloads plugins based on a configuration string, modifying the existing list, which can be NULL. The string has the format “[-]NAME(OPTIONS){:[-]NAME(OPTIONS)}*”. Prefixing the plugin name with a “-” means that the plugin should be unloaded.

Function: struct EXTRACTOR_PluginList * EXTRACTOR_plugin_add_defaults (enum EXTRACTOR_Options flags)

Loads all of the plugins in the plugin directory.

Example for a minimal extract method

The following example shows how a plugin can return the mime type of a file.

#include <string.h>      /* memcmp, strlen */
#include <extractor.h>

int
EXTRACTOR_mymime_extract
(const char *data,
size_t data_size,
EXTRACTOR_MetaDataProcessor proc,
void *proc_cls,
const char * options)
{
if (data_size < 4)
return 0;
if (0 != memcmp (data, "\177ELF", 4))
return 0;
if (0 != proc (proc_cls,
"mymime",
EXTRACTOR_METATYPE_MIMETYPE,
EXTRACTOR_METAFORMAT_UTF8,
"text/plain",
"application/x-executable",
1 + strlen("application/x-executable")))
return 1;
/* more calls to 'proc' here as needed */
return 0;
}

Internal utility functions

‘convert_numeric.h’ defines various conversion functions for numbers (in particular, byte-order conversion for floating point numbers).

‘unzip.h’ defines an API for accessing compressed files.

‘pack.h’ provides an interpreter for unpacking structs of integer numbers from streams and converting from big or little endian to host byte order at the same time.

‘convert.h’ provides a function for character set conversion described below.

Function: char * EXTRACTOR_common_convert_to_utf8(const char *input, size_t len, const char * charset)
It can be used to easily convert text from any character set to UTF-8.

Conclusion

In short, for those of us who analyze messages frequently, GNU libextractor not only helps us find the correct file format; we can also write plugins of our own to analyze various text formats and find the corresponding MIME types. For email processing, it can be said to be an indispensable tool.

What is memcache?

HH Tu's picture

Today I will introduce a useful technique for speeding up database access: Memcache. It is a distributed memory caching system with which we can build a highly efficient cloud system. The basic idea is to use a key-based structure to store data in memory and fetch it back. The idea comes from Brad Fitzpatrick, who used it to speed up LiveJournal.com (2003). Many websites use this method: LiveJournal, Wikipedia, Flickr, Twitter, YouTube, Digg, WordPress.com, etc. It can remove most of the database load time, and the database is accessed only when a Memcache miss happens, so its resources are used more efficiently. Memcache is a key-based, distributed memory object caching system, but authentication has to be handled by its users.

It pays to store frequently used information so that it does not have to be retrieved again. The simplest example is web browsing: most of a website's content is downloaded into a local folder, which makes the same website load faster on your next visit. Memcache follows the same idea. It sets aside part of a computer's memory as a fast-access cache that can be deployed and accessed from anywhere over a network. You can create as many caches as you want (provided you have enough memory), and, even better, it treats all of the caches as one single node, which means you can combine the memory of several computers and use it together. What a wonderful mechanism! All operations should run in O(1) time.

Here is a simple example to illustrate the usefulness of combining memory. We fetch data from a server every day and want to speed things up, so we add one more server. How do we use it efficiently?

In Pic(1), we have two servers, each with its own memory. To ensure the same result no matter which server you store to or retrieve from, every piece of data has to be copied into the other server's memory. That wastes time and memory, so it is not a good setup. In Pic(2), we use Memcache, so every web server stores to and retrieves from the same location, and no inconsistency can occur. More memory, more cache!
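
One way to picture how several machines' memory can act as a single cache is client-side key hashing: every web server applies the same function to a key to decide which cache node holds it. The sketch below is only an illustration of that idea. The server names are hypothetical, and the choice of CRC32 modulo the server count is an assumption; real clients may use other schemes.

```python
import zlib

# Hypothetical cache nodes; 11211 is the default memcached port.
SERVERS = ["cache-a:11211", "cache-b:11211"]


def pick_server(key, servers=SERVERS):
    """Map a key to one cache node, identically on every web server."""
    index = zlib.crc32(key.encode("utf-8")) % len(servers)
    return servers[index]


for key in ("user:1", "user:2", "page:/home"):
    print(key, "->", pick_server(key))
```

Because the mapping depends only on the key, each item is stored once, on one node, and every web server knows where to find it. Nothing has to be copied between the servers.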

When should we use Memcache? When you issue lots of “SELECT * from XXX” queries against the database and the same results are likely to be used again and again, you can use it happily. As a simple analysis, when you start to use Memcache, consider the following: 1. Search timing (how often is the data requested?) 2. Hit timing (how often will a request hit the cache?) 3. Validity (how long does cached data stay valid?). Of course, handling this takes a little extra work, which should be taken into account.

Here is a brief procedure for using Memcache together with your database, assuming you have many servers that share their memory through it. A simple example flow: your clients ask the servers for data, and the servers ask Memcache first. If Memcache does not have the data, the server fetches it from the database. Once you have the new data, remember to store it in Memcache to increase the hit rate next time.
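
The flow just described (ask the cache first, fall back to the database on a miss, then store the result) can be sketched as follows. This is a minimal illustration, not a real Memcache client: a plain dict stands in for Memcache, and CacheAside and fake_db are hypothetical names invented for the example.

```python
class CacheAside:
    """Ask the cache first; on a miss, load from the database and store."""

    def __init__(self, load_from_db):
        self._cache = {}           # stand-in for Memcache
        self._load = load_from_db  # function that queries the database
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load(key)    # slow path: go to the database
        self._cache[key] = value   # remember it for next time
        return value


db_calls = []


def fake_db(key):
    """Pretend database lookup that records every call it receives."""
    db_calls.append(key)
    return "row-for-" + key


store = CacheAside(fake_db)
first = store.get("user:1")   # miss: goes to the database
second = store.get("user:1")  # hit: served from the cache
print(first, second, "db calls:", len(db_calls))
```

A real deployment would also set an expiry time on each entry, which corresponds to the “how long is the data valid” question above.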

As the flow above shows, Memcache is implemented as a network daemon. Most people use PHP or C/C++ to communicate with it. I use it on Linux; if you want to use it with C/C++, you need to install some basic packages: 1. libevent 2. Memcache 3. libmemcache.

Further details can be found on the official website: http://memcached.org/

What is Node.js?

Paul Chien's picture

JavaScript has traditionally only run in the web browser, but recently there has been considerable interest in bringing it to the server side as well, thanks to the CommonJS project. Other server-side JavaScript environments include Jaxer and Narwhal. However, Node.js is a bit different from these solutions, because it is event-based rather than thread based. Web servers like Apache that are used to serve PHP and other CGI scripts are thread based because they spawn a system thread for every incoming request. While this is fine for many applications, the thread based model does not scale well with many long-lived connections like you would need in order to serve real-time applications like Friendfeed or Google Wave.

Node.js uses an event loop instead of threads, and is able to scale to millions of concurrent connections. It takes advantage of the fact that servers spend most of their time waiting for I/O operations, like reading a file from a hard drive, accessing an external web service or waiting for a file to finish being uploaded, because these operations are much slower than in-memory operations. Every I/O operation in Node.js is asynchronous, meaning that the server can continue to process incoming requests while the I/O operation is taking place. JavaScript is extremely well suited to event-based programming because it has anonymous functions and closures which make defining inline callbacks a cinch, and JavaScript developers already know how to program in this way. This event-based model makes Node.js very fast, and makes scaling real-time applications very easy.
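
The event-loop idea is not specific to JavaScript. As a rough analogy only (this is Python's asyncio, not Node.js), the sketch below runs three simulated I/O operations on a single thread. The operation names and delays are made up, and asyncio.sleep stands in for a slow disk or network call.

```python
import asyncio


async def fake_io(name, delay, finished):
    # While this operation "waits", the event loop is free to run the others.
    await asyncio.sleep(delay)
    finished.append(name)


async def main():
    finished = []
    # Start all three operations at once on one thread.
    await asyncio.gather(
        fake_io("upload", 0.15, finished),
        fake_io("db-query", 0.05, finished),
        fake_io("file-read", 0.10, finished),
    )
    return finished


order = asyncio.run(main())
print(order)
```

The operations complete in order of their delays rather than in the order they were started, and the total running time is close to the longest delay rather than the sum of all three. This is the property that lets one event loop serve many waiting connections.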

http://docs.pylonsproject.org/projects/pyramid/1.0/narr/introduction.html

LDAP

David Lee's picture

Consider two different problems. First, a huge organization has thousands of members, many departments and many IT resources. How can it maintain an up-to-date, accessible online address book? Second, an MIS staff member needs to maintain separate sets of usernames and passwords for a number of different systems (Linux login, Apache, Samba, mail service, etc.). How can this work be made easier? The two problems seem unrelated, but they can be served by the same solution: LDAP (Lightweight Directory Access Protocol).

LDAP is a protocol for accessing online directory services, based on X.500. It omits many complicated details of the X.500 protocol to become a flexible, lightweight application protocol built on IP networks. For the first problem above, LDAP's flexible design lets us catalog different types of resources in a distributed online database. For the second, it provides a standardized interface that different applications can refer to, so the configurations of those applications can be integrated easily.

From the macro perspective, LDAP organizes its data into a tree structure called the DIT (Directory Information Tree). A DIT can be divided into many sub-trees, each of which can be stored on a different LDAP server to achieve a distributed architecture. Each record in the DIT is identified by a unique distinguished name (DN). Like an absolute path in a file system, a DN identifies a location in the DIT.

From the micro perspective, each record in LDAP conforms to a schema and can be converted to LDIF (LDAP Data Interchange Format) for human readability (note that the data may actually be stored in binary form). In LDIF, each record has multiple “attributes”, and each attribute can have multiple values. Which attributes a record can have is defined by its “objectClass”. For example, a record with objectClass “employee” may have attributes such as name, department, and email address, while another record with objectClass “department” may have attributes such as administrator and member. Every record has at least two attributes, DN and objectClass, while the other required and optional attributes are determined by the value of objectClass.

To retrieve information from an LDAP server, we can package a query in the LDAP URL format:

"ldap://" [ <host> ] "/" <dn> [ "?" <attributes> [ "?" <scope> "?" <filter> ] ]
<host> ::= <hostname>[ ":" <port> ]
<attributes> ::= NULL | <attributelist>
<attributelist> ::= <attributetype>| <attributetype>[ "," <attributelist> ]
<scope> ::= "base" | "one" | "sub"
● host: host name or IP address of the server (optionally followed by a port)
● dn: DN of the search starting point
● attributes: which attributes of matching entry will be returned
● scope: search scope (single node, the first generation of child nodes, or entire sub-tree)
● filter: search criteria


For example, ldap://cellopoint.com/ou=rd,ou=unit,ou=company,dc=cellopoint,dc=com?mail?sub?uid=david returns the email address of every employee in the RD department of Cellopoint whose uid is david.
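
To see how the parts of such a URL line up with the grammar above, here is a minimal parsing sketch. The helper parse_ldap_url is hypothetical and deliberately simplified: it splits on “/” and “?” only, and it ignores percent-encoding and any further URL extensions.

```python
def parse_ldap_url(url):
    """Split an LDAP URL into host, dn, attributes, scope and filter."""
    prefix = "ldap://"
    if not url.startswith(prefix):
        raise ValueError("not an LDAP URL: " + url)
    # Everything up to the first "/" is the host (and optional port).
    host, _, rest = url[len(prefix):].partition("/")
    # The remaining fields are separated by "?" in a fixed order.
    parts = rest.split("?")
    parts += [""] * (4 - len(parts))   # pad the fields that were left out
    dn, attributes, scope, ldap_filter = parts[:4]
    return {
        "host": host,
        "dn": dn,
        "attributes": attributes.split(",") if attributes else [],
        "scope": scope,
        "filter": ldap_filter,
    }


query = parse_ldap_url(
    "ldap://cellopoint.com/ou=rd,ou=unit,ou=company,dc=cellopoint,dc=com"
    "?mail?sub?uid=david"
)
for field, value in query.items():
    print(field, "=", value)
```

Reading the result from top to bottom gives the search in words: on host cellopoint.com, start at the given DN, search the whole sub-tree, keep entries matching uid=david, and return only their mail attribute.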

Currently, the most popular LDAP implementations are OpenLDAP and Microsoft Active Directory. Since the former is open source software, anyone can try it to experience the convenience of LDAP.

Web Application Frameworks

June Huang's picture

Due to the growing use of the Web and web services, web sites in the Web 2.0 era no longer support only static content. Site content has become dynamic so that users can perform real-time tasks such as checking and sending mail. The scale of our web projects becomes vast and it becomes complex to maintain as new features are continually added.

Web application frameworks provide a software architectural model that helps us organize and manage the different components of our web application. They also provide useful libraries for tasks such as accessing the database, rendering templates and managing sessions.

Many web application frameworks use a Model-View-Controller (MVC) architecture that defines the logical components of the web application. The following are the explanations of each model, view and controller:

Model
The application model is used to handle the data of the system. In other words, it includes the data and functions that are used to manipulate the data. Controllers and views obtain and change data with the model.

View
The view is the rendered component of the application that is seen by the user, in other words, the user interface. The user uses the user interface to interact with the application.

Controller
Controllers handle requests from the user and return responses to the user. A controller obtains the required data from the model, prepares it in a suitable format, inserts the data into the view and renders the view for the user.

A typical request to the server happens as follows: The user interacts with user interface and a request is sent to the server. The main controller handles the request by determining the appropriate delegate controller and passes the control to that controller. The delegate controller interacts with the model to gather or update data for the view, renders the view and returns the control to the main controller. The main controller responds with the rendered view. The cycle repeats when the user interacts with the user interface and sends a new request.
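
The request cycle above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the design of any particular framework: all class names, the “/inbox” route and the sample messages are invented for the example.

```python
class Model:
    """Owns the data and the functions that manipulate it."""

    def __init__(self):
        self._inbox = {"alice": ["Hello", "Meeting at 3pm"]}  # sample data

    def messages_for(self, user):
        return list(self._inbox.get(user, []))


def inbox_view(user, messages):
    """View: turns data into what the user sees."""
    lines = ["Inbox for " + user] + ["- " + m for m in messages]
    return "\n".join(lines)


class InboxController:
    """Delegate controller: gathers data from the model and renders the view."""

    def __init__(self, model):
        self.model = model

    def handle(self, params):
        user = params["user"]
        messages = self.model.messages_for(user)
        return inbox_view(user, messages)


class MainController:
    """Main controller: picks the delegate controller for each request."""

    def __init__(self, routes):
        self.routes = routes

    def dispatch(self, path, params):
        controller = self.routes.get(path)
        if controller is None:
            return "404 Not Found"
        return controller.handle(params)


app = MainController({"/inbox": InboxController(Model())})
page = app.dispatch("/inbox", {"user": "alice"})
print(page)
```

Each piece can change independently: a new storage backend only touches Model, a new page layout only touches the view function, and a new URL only adds an entry to the routes.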


References:
[1] Web application framework. (2011, May 28). In Wikipedia, The Free Encyclopedia. Retrieved 15:23, May 30, 2011, from http://en.wikipedia.org/w/index.php?title=Web_application_framework&oldid=431373642
[2] Model–view–controller. (2011, May 26). In Wikipedia, The Free Encyclopedia. Retrieved 17:12, May 30, 2011, from http://en.wikipedia.org/w/index.php?title=Model%E2%80%93view%E2%80%93controller&oldid=430946706

Protocol Buffers

Shawn Lin's picture

 Introduction

  • flexible, efficient, automated mechanism for serializing structured data.
  • think XML, but smaller, faster, and simpler.
  • use special generated source code to easily write and read your structured data.
  • update your data structure without breaking deployed programs that are compiled against the "old" format.

Why not just use XML?
Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

* are simpler
* are 3 to 10 times smaller
* are 20 to 100 times faster
* are less ambiguous
* generate data access classes that are easier to use programmatically

How do they work?

  • You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files.
  • Each protocol buffer message is a small logical record of information, containing a series of name-value pairs.

Example:
package tutorial;
message Person {
required string name = 1;
required int32 id = 2;
optional string email = 3;

enum PhoneType {
MOBILE = 0;
HOME = 1;
WORK = 2;
}
message PhoneNumber {
required string number = 1;
optional PhoneType type = 2 [default = HOME];
}
repeated PhoneNumber phone = 4;
}
message AddressBook {
repeated Person person = 1;
}

Three field rules

  • required: a value for the field must be provided, otherwise the message will be considered "uninitialized"
  • optional: the field may or may not be set. If an optional field value isn't set, a default value is used.
  • repeated: the field may be repeated any number of times (including zero).

Start working

  • Run the protocol buffer compiler: protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/addressbook.proto
  • This generates the following files in your specified destination directory:
  • addressbook.pb.h, the header which declares your generated classes.
  • addressbook.pb.cc, which contains the implementation of your classes.

Example:
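// Both fragments belong inside a function and assume:
//   #include <fstream>, #include <iostream>, #include "addressbook.pb.h",
//   using namespace std; using tutorial::Person;

// Write a message to a file.
Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("jdoe@example.com");
fstream output("myfile", ios::out | ios::trunc | ios::binary);
person.SerializeToOstream(&output);
output.close();

// Read the message back from the file.
fstream input("myfile", ios::in | ios::binary);
Person person2;
person2.ParseFromIstream(&input);
cout << "Name: " << person2.name() << endl;
cout << "E-mail: " << person2.email() << endl;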

Each message class also provides methods that operate on the entire message, including:

  • bool IsInitialized() const;: checks if all the required fields have been set.
  • string DebugString() const;: returns a human-readable representation of the message, particularly useful for debugging.
  • void CopyFrom(const Person& from);: overwrites the message with the given message's values.
  • void Clear();: clears all the elements back to the empty state.
  • bool SerializeToString(string* output) const;: serializes the message and stores the bytes in the given string. Note that the bytes are binary, not text; we only use the string class as a convenient container.
  • bool ParseFromString(const string& data);: parses a message from the given string.
  • bool SerializeToOstream(ostream* output) const;: writes the message to the given C++ ostream.
  • bool ParseFromIstream(istream* input);: parses a message from the given C++ istream.

http://code.google.com/intl/zh-TW/apis/protocolbuffers/docs/overview.html
http://www.cppprog.com/2010/0908/207_4.html

CodeIgniter 2.0.2 Released

Ruby Lin's picture

There are many PHP frameworks available today. Some of the top PHP frameworks used by developers include the Zend Framework, CakePHP, Symfony, CodeIgniter, Seagull and Yii. These frameworks bring a number of benefits to your PHP development, for example:

1. MVC(Model-View-Controller) architecture

2. Separate PHP from HTML

3. User-friendly URL namespaces

4. Rapid development

These frameworks each have their own strengths and weaknesses. Each programmer has a different style and different priorities when it comes to adopting a toolkit for building apps. CodeIgniter is an open source web application framework that helps you write incredible PHP programs, and it is well known for the following features:

1. a small footprint

2. exceptional performance

3. ease-of-use

4. clear, thorough documentation

5. nearly zero configuration

6. no command line

7. no large-scale monolithic library

CodeIgniter attracts me because it is easy to understand and easy to extend. It has a number of supporting helpers, libraries, and plug-ins that you can use. All the tools you need are in one little package, and if that is not enough, you can create your own libraries. CodeIgniter also has some security tools. For both users and developers, security is a key question. Cross Site Scripting (XSS) is one of the most common application-layer web attacks. CodeIgniter comes with a Cross Site Scripting Hack prevention filter which can either run automatically to filter all POST and COOKIE data that is encountered, or you can run it on a per item basis.

CodeIgniter 2.0.2 has been released. This is a security maintenance release; the security fix patches a small vulnerability in the cross site scripting filter.

http://codeigniter.com/

What is Machine Learning?

HH Tu's picture

Nowadays, if a programmer wants to solve a word parsing problem, he writes a program to solve it. First he reads in a file and writes some code to parse it; then the program collects the useful information and outputs it. This is simple, but unfortunately this approach cannot solve every problem in the world. Humans can easily tell whether an e-mail is spam or ham, but it is not easy to find an algorithm that does the same.

Spam mails can be different and thus very difficult to identify. Even the human brain cannot remember or identify every possible case. Today if we can rely on computers to help us collect data, auto-extract useful information, order the results to what we want, etc. and even self-learn to give us a prediction, it would be great!! The point is we don't have a direct algorithm but we have data.

Assume we have thousands of clients around the world and handle tens of millions of e-mails every day. If we want to identify whether an e-mail is spam or ham, we can look at previous mails and give an approximate prediction. With traditional statistical analysis, this takes substantial time and money. Furthermore, spam behavior changes over time, and mail types differ from one location to another. If we just follow fixed traditional rules, they will fail some day. From another perspective, if we knew whether an e-mail was sent by a spammer to broadcast advertisements or by a general manager to issue an order, we could easily handle it: you could write code to filter it out or let it through. But that information is not easy to get.

We still have hope! Spam mails are not totally random. If we collect and collate enough information and make reasonable assumptions through analysis, we can expect to find a prediction that closely approximates the real answer. Perfectly correct prediction is not possible (unless you are God), but we can rely on computers to automatically collect data from existing e-mail sources and output useful information. This is the value of machine learning.

Machine learning is currently applied to network traffic identification, bank credit ranking, stock prediction, clinical data analysis, studies of the biological nervous system, and even space planning. A well-known case is the computer that plays chess against a human. With all of these examples, we can say that machine learning is already part of our lives. The question is, how do we let the computer learn? We start from the way people think.

How do people identify e-mails as spam or ham? Can you say why? You might answer that spam is simply not the same as normal mail: it contains weird content, comes from unfamiliar senders, and so on. None of these are standard criteria. Another example is face recognition. Can you explain how you recognize your father's or your mother's face? It is because you have seen them since birth. They are not strangers, and you see them every day, so they are familiar; but this is still not a standard rule. In both examples we are actually building up characteristics: a person's outline and the first impression of an e-mail; the symmetry of a face and the words in an e-mail; people have eyes, a nose, and a mouth, and an e-mail has a recipient, a sender, and attachments. This is what we already do in our lives, but we use so many rules that it is not easy to just write an algorithm for it. This is where machine learning comes in: it can collect data, analyse the attributes, and find out which attributes are useful for our purpose (e.g. mail prediction).

Modern machine learning involves a lot of statistics and calculus, because we have to find correlations and optimize to achieve the goal. Machine learning can be divided into two parts. The first part is learning: going through a large amount of data and optimizing to produce a representative model. The second part is prediction: we feed future input to that model, which estimates the result and gives a useful prediction. In practice, continuous learning is another important issue. Things change, so we use adaptive learning to keep the model up to date.
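To make the two parts concrete, here is a minimal sketch of a learning step and a prediction step for the spam/ham problem. It uses a naive Bayes word-count model, which is just one possible technique and not something the post prescribes; the function names and the toy training mails are invented for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Learning step: count how often each word appears in spam and in ham.

    `messages` is a list of (text, label) pairs, label being "spam" or "ham".
    """
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def predict(model, text):
    """Prediction step: return the label with the higher log-probability."""
    word_counts, label_counts = model
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Start from the prior: how common this label was in the training data.
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, invented for illustration only.
training_data = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the monday report", "ham"),
]
model = train(training_data)
print(predict(model, "free money"))
print(predict(model, "monday meeting"))
```

Adaptive learning, in this sketch, would simply mean calling `train` again as newly labelled mails arrive. A real filter would of course need far more data and better features than single words.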

Pyramid Introduction

Paul Chien's picture

Pyramid is a general, open source, Python web application development framework. Its primary goal is to make it easier for a developer to create web applications. The type of application being created could be a spreadsheet, a corporate intranet, or a social networking platform; Pyramid’s generality enables it to be used to build an unconstrained variety of web applications.
The first release of Pyramid’s predecessor (named repoze.bfg) was made in July of 2008. We have worked hard to ensure that Pyramid continues to follow the design and engineering principles that we consider to be the core characteristics of a successful framework:

Simplicity
Pyramid takes a “pay only for what you eat” approach. This means that you can get results even if you have only a partial understanding of Pyramid. It doesn’t force you to use any particular technology to produce an application, and we try to keep the core set of concepts that you need to understand to a minimum.

Minimalism
Pyramid concentrates on providing fast, high-quality solutions to the fundamental problems of creating a web application: the mapping of URLs to code, templating, security and serving static assets. We consider these to be the core activities that are common to nearly all web applications.

Documentation
Pyramid’s minimalism means that it is relatively easy for us to maintain extensive and up-to-date documentation. It is our goal that no aspect of Pyramid remains undocumented.

Speed
Pyramid is designed to provide noticeably fast execution for common tasks such as templating and simple response generation. Although the “hardware is cheap” mantra may appear to offer a ready solution to speed problems, the limits of this approach become painfully evident when one finds him or herself responsible for managing a great many machines.

Reliability
Pyramid is developed conservatively and tested exhaustively. Where Pyramid source code is concerned, our motto is: “If it ain’t tested, it’s broke”. Every release of Pyramid has 100% statement coverage via unit tests.

Openness
As with Python, the Pyramid software is distributed under a permissive open source license.


http://docs.pylonsproject.org/projects/pyramid/1.0/narr/introduction.html

Web Crawlers - Crawling Policies

June Huang's picture

Continuing from my last blog entry on web crawlers, let me now explain in more detail how web crawlers traverse the Web. Web crawlers use a combination of policies to determine their crawling behavior: a selection policy, a revisit policy, a politeness policy and a parallelization policy. I shall discuss each of these in turn.

As only a fraction of the Web can be downloaded, a web crawler must use a selection policy to determine which resources are relevant enough to download. This is more useful than downloading a random portion of the Web. An example of a selection policy is the PageRank policy (Google), where the importance of a page is determined by the links pointing to that page. Other selection policies are based on the context of the page or the resource’s MIME type.

Web crawlers use a revisit policy to minimize the cost of holding an outdated copy of a resource. This is important because resources on the Web are continually created, updated or deleted, all within the time it takes a web crawler to finish its crawl through the Web, and it is undesirable for the search engine to return an outdated copy of a resource. The cost functions are based on freshness and age: freshness measures whether or not the local copy is the current copy of the resource, and age measures how long the local copy has been outdated.
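Freshness and age can be written down as two small functions. The sketch below follows one common formulation (freshness is 1 or 0, age is the time since the remote page changed); the timestamps and page names are made up, and it assumes the crawler somehow knows when the remote page was modified, which a real crawler can only estimate.

```python
def freshness(local_fetched_at, remote_modified_at):
    """Return 1 if the local copy is still current, 0 if the page changed since."""
    return 1 if remote_modified_at <= local_fetched_at else 0


def age(now, local_fetched_at, remote_modified_at):
    """Return how long the local copy has been outdated (0 while it is fresh)."""
    if remote_modified_at <= local_fetched_at:
        return 0
    return now - remote_modified_at


def revisit_order(pages, now):
    """Sort page URLs so that the copies that have been stale longest come first."""
    return sorted(
        pages,
        key=lambda url: age(now, pages[url]["fetched"], pages[url]["modified"]),
        reverse=True,
    )


# Hypothetical pages with timestamps in seconds, invented for illustration.
pages = {
    "/news": {"fetched": 100, "modified": 150},
    "/about": {"fetched": 100, "modified": 50},
    "/blog": {"fetched": 100, "modified": 120},
}
print(revisit_order(pages, now=200))
```

Revisiting the oldest copy first is only one way to use these numbers; a crawler could just as well try to maximize average freshness instead.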

The politeness policy ensures that the performance of a site is not heavily affected while the web crawler downloads a portion of the site. The server may otherwise be overloaded, as it has to handle requests from the site's visitors as well as from the web crawler. Two solutions have been proposed to alleviate the load: an interval between requests that keeps the web crawler from flooding the server, and the robots exclusion protocol, with which administrators indicate which portions of the site are not to be accessed by the crawler.
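Both solutions can be tried out with Python's standard `urllib.robotparser` module. The following is only a sketch: the robots.txt content, the `examplebot` user agent and the example.com URLs are invented, and `crawl_delay` requires Python 3.6 or later.

```python
import urllib.robotparser

# Hypothetical robots.txt content, invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a rule object the crawler can consult."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def seconds_to_wait(last_request_at, now, delay):
    """Return how long the crawler should still sleep before its next request."""
    return max(0.0, delay - (now - last_request_at))


rules = build_rules(ROBOTS_TXT)
print(rules.can_fetch("examplebot", "http://example.com/index.html"))
print(rules.can_fetch("examplebot", "http://example.com/private/report.html"))
print(rules.crawl_delay("examplebot"))
```

A crawler would call `can_fetch` before every download and sleep for `seconds_to_wait(...)` between two requests to the same server.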

Parallelization policies are used to coordinate multiple web crawlers crawling the same Web space. The goal is to maximize the download rate of resources while preventing the web crawlers from downloading the same page more than once.
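One simple way to keep crawlers from fetching the same page twice is to assign every URL to exactly one crawler by hashing its host name. This static assignment is an assumption chosen for illustration, not the only parallelization policy; the function name and URLs are hypothetical.

```python
import hashlib
from urllib.parse import urlparse


def assign_crawler(url, crawler_count):
    """Map a URL to one crawler index by hashing its host name."""
    host = urlparse(url).netloc.lower()
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    return int.from_bytes(digest[:8], "big") % crawler_count


first = assign_crawler("http://example.com/a.html", 4)
second = assign_crawler("http://EXAMPLE.com/b.html", 4)
print(first == second)
```

Because all pages of a host land on the same crawler, that crawler can also enforce the politeness interval for the host on its own.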

[1] Web crawler. (2011, February 22). In Wikipedia, The Free Encyclopedia. Retrieved 16:24, March 4, 2011, from http://en.wikipedia.org/w/index.php?title=Web_crawler&oldid=415343979

Design Patterns in JavaScript

Paul Chien's picture

 The fact that JavaScript is so expressive allows you to be very creative in how design patterns are applied to your code. There are three main reasons why you would want to use design patterns in JavaScript:

  1. Maintainability: Design patterns help to keep your modules more loosely coupled. This makes it easier to refactor your code and swap out different modules. It also makes it easier to work in large teams and to collaborate with other programmers.
  2. Communication: Design patterns provide a common vocabulary for dealing with different types of objects. They give programmers shorthand for describing how their systems work. Instead of long explanations, you can just say, “It uses the factory pattern.” The fact that a particular pattern has a name means you can discuss it at a high level, without having to get into the details.
  3. Performance: Some of the patterns covered in the book [1] are optimization patterns. They can drastically improve the speed at which your program runs and reduce the amount of code you need to transmit to the client. The flyweight and proxy patterns are the most important examples of this.

There are two reasons why you might not want to use design patterns:

  1. Complexity: Maintainability often comes at a cost, and that cost is that your code may be more complex and less likely to be understood by novice programmers.
  2. Performance: While some patterns improve performance, most of them add a slight performance overhead to your code. Depending on the specific demands of your project, this overhead may range from unnoticeable to completely unacceptable.

Implementing patterns is the easy part; knowing which one to use (and when) is the hard part. Applying design patterns to your code without knowing the specific reasons for doing so can be dangerous. Make an effort to ensure that the pattern you select is the most appropriate and won’t degrade performance below acceptable limits.

[1] Ross Harmes and Dustin Diaz (2008). Pro JavaScript Design Patterns

Parallel programming language Erlang!

Shawn Lin's picture

Telecommunication companies like Nortel Networks and T-Mobile develop their systems with Erlang to achieve concurrency and fault tolerance. In addition, multi-core and Hyper-Threading (HT) processors are very good environments for the Erlang language.

Erlang solves one of the most pressing problems facing developers today: how to write reliable, concurrent, high-performance systems. It's used worldwide by companies who need to produce reliable, efficient, and scalable applications.

Moore's Law is the observation that the amount you can do on a single chip doubles every two years, but Moore's Law is taking a detour. Rather than producing faster and faster processors, companies such as Intel and AMD are producing multi-core devices: single chips containing two, four, or more processors. If your programs aren't concurrent, they'll only run on a single processor at a time. Your users will think that your code is slow.

Erlang is a programming language designed for building highly parallel, distributed, fault-tolerant systems. It has been used commercially for many years to build massive fault-tolerant systems that run for years with minimal failures.

Erlang programs run seamlessly on multi-core computers: this means your Erlang program should run a lot faster on a quad-core processor than on a single core processor, all without you having to change a line of code.

Developing systems with Erlang has the following benefits:

  • Write a program once, move it to a multi-core environment, and it will naturally run faster (linear speed-up may even be possible: n cores giving an n-fold improvement).
  • You can write fault-tolerant systems that restart after a crash.
  • You can write "hot-swappable" systems, upgrading your code while it is running, without suspending it.
  • Programs are remarkably concise.


Erlang's Mnesia provides a database management system (DBMS). Mnesia is integrated with the language and can be accessed quickly. It can be configured to replicate data across a number of separate nodes to provide fault-tolerant operation.
In addition to Mnesia, you will almost always use the OTP libraries when developing systems with Erlang. OTP is a set of open source Erlang libraries and programs that help you build industrial-grade applications. OTP is Erlang’s source of power; with OTP it is quite easy to write a solid server.

http://www.erlang.org/doc/

Web Crawlers

June Huang's picture

Looking up information on the Internet has become a daily task for many of us. Thanks to the invention of search engines, it is not laborious to do. Search engines are convenient to use as they produce immediate results from countless sources. From Web pages to images and videos, we are able to search through almost everything anyone can find on the Web. To be able to return results, search engines first make use of a computer program called a web crawler that explores the resources on the Web. Web crawlers look at the pages’ contents and store information about each page so that when the user requests something, the search engine can find related resources and return them to the user. In this article I shall give a brief introduction to how search engines manage and find what we are interested in.

To begin with, the web crawler is given a list of URLs. The crawler visits a page and identifies keywords and links. It then determines which pieces of information are worth adding or updating. The web crawler will then download a portion of that page and index some metadata, for example the page’s URL, for future searches. The newly found links are then added to the list of URLs for the crawler to continue exploring.
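The loop described above can be sketched in a few lines. To keep it runnable without network access, the "Web" here is a hypothetical dictionary that maps each URL to the links found on that page; `crawl`, `fetch_links` and the stored metadata fields are names invented for this example.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
FAKE_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def crawl(seed_urls, fetch_links, max_pages=100):
    """Visit pages starting from the seed list and index some metadata."""
    frontier = deque(seed_urls)  # the list of URLs still to explore
    index = {}
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        if url in index:
            continue  # already visited
        links = fetch_links(url)
        # Store a little metadata about the page for future searches.
        index[url] = {"url": url, "outgoing_links": len(links)}
        for link in links:
            if link not in index:
                frontier.append(link)  # newly found links join the list
    return index


index = crawl(["http://example.com/"], lambda url: FAKE_WEB.get(url, []))
print(list(index))
```

A real crawler would replace the lambda with an HTTP download plus HTML link extraction, and would decide per page whether the information is worth adding or updating.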

A web crawler has to select which pages it should visit, because there is an enormous number of pages on the Internet and pages are constantly added, modified or deleted. Policies are used to determine whether a page is worth visiting, as it is impractical to visit every single page on the Web, let alone visit it multiple times to check for updates. An example is Google’s PageRank policy, which weighs the importance of a page by the links pointing to it and the PageRank of the linking pages. The number of pages that link to a specific page reflects the page’s importance and therefore contributes to its PageRank. The higher the PageRank, the more the page is worth indexing. Distributed web crawling is also used to share the URL exploration and page downloads among several crawlers so as to optimise the crawl through the Web.
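As a rough illustration of how the rank of a page depends on the ranks of the pages linking to it, here is a simplified PageRank iteration. It is a textbook-style sketch, not Google's actual implementation; the damping factor of 0.85, the iteration count and the four-page link graph are assumptions, and every link target must itself be a key of `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate a rank for every page in `links`.

    `links` maps each page to the list of pages it links to.
    """
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page receives a small base share regardless of its links.
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outgoing in links.items():
            # A page without links spreads its rank evenly over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D all link to C, and C links back to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
print(max(ranks, key=ranks.get))
```

Page C collects links from three pages and ends up with the highest rank; A comes second because its only inbound link comes from the highly ranked C.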

References:
[1] Web crawler. (2010, December 22). In Wikipedia, The Free Encyclopedia. Retrieved 11:21, December 29, 2010, from http://en.wikipedia.org/w/index.php?title=Web_crawler&oldid=403711331
[2] PageRank. (2011, January 2). In Wikipedia, The Free Encyclopedia. Retrieved 11:15, January 6, 2011, from http://en.wikipedia.org/w/index.php?title=PageRank&oldid=405547279

Drupal Introduction

Ricky Wu's picture

Drupal is one of the best Content Management Systems (CMS). It is written in PHP and requires a database such as MySQL. Its basic installation can be easily turned into many different types of web sites - from simple web logs to large online communities.

Here is a list of the Drupal benefits:

  • Easy to install;
  • Easy to use - no programming knowledge needed;
  • Lots of features including Search Engine Friendly URLs(SEF), categories, search function and many more;
  • Lots of modules to extend your site's functionality;
  • Flexibility - you can easily turn your Drupal installation into a forum, blog, wiki and many other types of web sites;
  • Free to use, open source. You can freely install Drupal and you can modify the source code to fit your needs;
  • Lots of users and a large community - easy to find solutions to your problems.

By enabling and configuring individual modules, an administrator can design a unique site which can be used for knowledge management, web publishing, community interaction purposes, etc.
Here are some typical Drupal usages:

  • Content management - Via a simple, browser-based interface, members can publish stories, blogs, polls, images, forums, etc. Administrators can easily customize the design of their Drupal installation.
  • The Drupal classification system allows hierarchical ordering, cross-indexing of posts and multiple category sets for most content types. Access to content is controlled through administrator-defined user roles. A search option is also available.
  • Weblog - A single installation can be configured as an individual personal weblog site or multiple individual weblogs. Drupal supports the Blogger API, provides RSS feeds for each individual blog and can be set to ping weblog directories when new content is posted on the home page.
  • Discussion-based community - A Drupal web site can be successfully used as a discussion forum. Comment boards, attached to most content types, make it simple for members to discuss new posts. Administrators can control whether content and comments are posted without approval, with administrator approval or through community moderation. With the built-in news aggregator, communities can subscribe to and then discuss content from other sites.
  • Collaboration - Used for managing the construction of Drupal, the project module is suitable for supporting other open source software projects. The wiki-like collaborative book module includes version control, making it simple for a group to create, revise and maintain documentation or any other type of text.

Drupal is a powerful, developer-friendly tool for building complex sites. Like most powerful tools, it requires some expertise and experience to operate, so it is not very friendly to beginners.

Reference:

Drupal - Official Website

SimpleDB

Griffith's picture

Amazon SimpleDB is another service from Amazon that uses its Dynamo technology. With SimpleDB, Amazon has at last made a database part of the company's web services. SimpleDB is currently available to the public as a beta service, with several technical limitations, including:

  • A single query times out after 5 seconds.
  • Strings are the only available data type.
  • Values in queries, writes and retrievals are all cast to strings.
  • A string cannot have more than 1,024 characters.
  • An item can have a maximum of 256 attributes.
  • During the open beta, a SimpleDB domain is capped at 10GB.

SimpleDB is not an RDBMS (relational database management system); it operates in a far simpler fashion. Amazon SimpleDB's data is stored as Domain → PKeys, PKeys → Attributes, and within each attribute, Key → Value. For instance:

Key:1
Attributes:
Category: Company
Name: Cellopoint
Website: http://www.cellopoint.com

Being a database, SimpleDB comes with its own querying API.
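Before looking at the real library, the Domain → item → attribute layout and two of the limits listed above can be modelled with plain dictionaries. This is only a local sketch to show the shape of the data: it does not call the SimpleDB service, it treats every attribute as single-valued for simplicity, and `put_attributes`, `query` and the `Stock` attribute are names invented for the example.

```python
MAX_VALUE_LENGTH = 1024        # characters per string value
MAX_ATTRIBUTES_PER_ITEM = 256  # attributes per item


def put_attributes(domain, item_name, attributes):
    """Store attributes on an item, casting every value to a string."""
    item = dict(domain.get(item_name, {}))
    for name, value in attributes.items():
        text = str(value)  # strings are the only data type
        if len(text) > MAX_VALUE_LENGTH:
            raise ValueError(f"value of {name!r} is longer than {MAX_VALUE_LENGTH} characters")
        item[name] = text
    if len(item) > MAX_ATTRIBUTES_PER_ITEM:
        raise ValueError(f"item {item_name!r} would exceed {MAX_ATTRIBUTES_PER_ITEM} attributes")
    domain[item_name] = item


def query(domain, name, value):
    """Return the names of all items whose attribute `name` equals `value`."""
    return sorted(key for key, attrs in domain.items() if attrs.get(name) == str(value))


# A domain is modelled as a dict: item name -> {attribute name -> value}.
my_domain = {}
put_attributes(my_domain, "1", {"Category": "Company", "Name": "Cellopoint"})
put_attributes(my_domain, "Product01", {"Category": "Device", "Name": "D01", "Stock": 5})
print(query(my_domain, "Category", "Device"))
```

Note how the number 5 comes back as the string "5": any numeric comparison or sorting has to be handled by the application.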

Installing the PHP Library for Amazon SimpleDB:
1. Download the PHP Library for Amazon SimpleDB from the Amazon Web Services (AWS) website.
2. Extract the file.
3. Request and retrieve your Access Key ID and Secret Access Key from AWS.

Client connection testing:
1. cd to the PHP Library for Amazon SimpleDB directory.
2. vim test_client.php
<?php
// Fill in your own credentials from AWS.
$AWS_ACCESS_KEY_ID = …
$AWS_SECRET_ACCESS_KEY = …

// Option A: a mock service for offline testing.
// require_once('Amazon/SimpleDB/Mock.php');
// $service = new Amazon_SimpleDB_Mock();

// Option B: the real client, which talks to Amazon SimpleDB.
require_once('Amazon/SimpleDB/Client.php');
$service = new Amazon_SimpleDB_Client($AWS_ACCESS_KEY_ID, $AWS_SECRET_ACCESS_KEY);
?>
3. Run test_client.php

Inserting data:
1. PHP application
$domain = "MyDomain";
$item = "Product01";

$attr1 = new Amazon_SimpleDB_Model_ReplaceableAttribute();
$attr1->withName('Category')->withValue('Device');

$attr2 = new Amazon_SimpleDB_Model_ReplaceableAttribute();
$attr2->withName('Name')->withValue('D01');

$attrArr = array($attr1, $attr2);

$request = new Amazon_SimpleDB_Model_PutAttributesRequest();
$request->withDomainName($domain)->withItemName($item)->setAttribute($attrArr);

invokePutAttributes($service, $request);
2. Service response

Service Response
==============================
PutAttributesResponse
ResponseMetadata
RequestId
...
BoxUsage

Retrieving data:
1. PHP application
$request = new Amazon_SimpleDB_Model_QueryRequest();
$request->setDomainName('MyDomain');
$request->setQueryExpression("['Category' = 'Device']");

invokeQuery($service, $request);
2. Service response
GetAttributesResponse
GetAttributesResult
Attribute
Name
Category
Value
Device

ResponseMetadata
RequestId

BoxUsage
..

Amazing Graphical Scripting Language

Shawn Lin's picture

Sikuli is a visual technology to automate and test graphical user interfaces (GUI) using images (screenshots). Sikuli includes Sikuli Script, a visual scripting API for Python, and Sikuli IDE, an integrated development environment for writing visual scripts with screenshots easily. Sikuli Script automates anything you see on the screen without internal API's support. You can programmatically control a web page, a desktop application running on Windows/Linux/Mac OS X, or even an iPhone application running in an emulator.

Sikuli, which reads much like a Japanese name, is in fact an innovative programming language that a student at MIT (originally from Taiwan) and his friends spent more than three years researching and building.

It is built on a new concept: using image recognition to automate many complex instructions.

As Vgod said: “The most important revolution of Sikuli is code readability and ease of use. With screenshots placed directly inside the code, people can directly ‘see’ what they want to control, which no one had ever thought about. Previously, only programmers were able to write programs, using mysterious alien languages.”

From the point of view of automation tools, Sikuli's purpose is not so unique, but the method it uses is. Programming languages are fairly mature technology, so programmers are used to the idea that languages are difficult to use, or have hypnotized themselves into saying things like ".NET has become more useful!" or "Wow! Delphi has added a super useful component for the Windows API."

Before the advent of Sikuli, no one had overturned these old concepts and come up with such a new and creative way of programming. Sikuli is a real innovation. Screenshots stand in for objects, so you can control window components without knowing the Windows API libraries. Although it has not yet developed to the point where one can write stand-alone applications, it can be used as a desktop automation tool, and it points programmers around the world toward a new way of designing programs.

First, of course, you must download Sikuli and install the Java Runtime Environment (JRE) on your computer. You can then follow the method shown in the video to easily customize your own automation.

http://www.youtube.com/watch?v=FxDOlhysFcM

Enter Ext JS: The Best of JavaScript Libraries

Paul Chien's picture

A long time ago in a galaxy far, far away (more precisely, early 2006, the planet Earth), a gentleman by the name of Jack Slocum developed a set of extension utilities to the YUI library. These utilities rapidly gained popularity within the YUI community and were quickly organized into an independent library called YUI-Ext. In fall 2006, Jack released the .33 version of this new library under the terms of the Berkeley Software Distribution (BSD) license.

After a while, before 2006 was over in fact, the name of the library was changed to Ext, because it was starting to develop independently of YUI at that point. In fact, support for other libraries was beginning to be developed within Ext.

In 2007, Jack formed a company to further advance the development of Ext, which at some point thereafter began to be known as Ext JS. On April 1, 2007, Ext JS 1.0 was released.

In a short period of time, Ext JS evolved from a handy set of extensions to a popular library into what many people, including yours truly, feel is the most mature JavaScript UI development library available today.

Ext JS is focused on allowing you to create great user interfaces in a web app. In fact, it is best known for its top-notch collection of UI widgets. It allows you to create web apps that mimic the look and feel of desktop native applications, and it also allows you to mimic most of the functionality those applications provide. In short, Ext JS enables you to build true RIAs.

It’s not all about widgets, though, as we’ll see. There are a lot of useful utility-type functions in Ext JS as well. Need some Ajax functionality? Check. Need some string manipulation functions? It’s got those too. Need a data abstraction layer? Ext JS has you covered.

[1] Frank W. Zammetti (2009). Practical Ext JS Projects with Gears

A Search Engine for Your Personal Cloud

June Huang's picture

Accessing the myriad of information on the Web has been made possible with web search engines. Nowadays, cloud technology changes the way people interact with the Web, for example social networking and data storage. Our personal cloud grows and keeping track of what goes on and where things occur becomes a relevant issue. Consider all the information that you and your social networks create in one day. E-mails, calendar events and conversations are all such examples of your social streams and remembering everything that happens is impossible. How can we effectively find things in our own cloud? The answer is: a search engine for your personal cloud.

Greplin and Introspectr are two such services that allow users to filter through their personal data. They offer indexing of your social-networking services like Facebook and Twitter, mailboxes like Gmail (including attachments and links) and even file-sharing services such as Dropbox and Google Docs. To use their services, simply type in your query and they will return all the occurrences of your search regardless of which streams they appeared from.

The main difference between Greplin and Introspectr is that Greplin re-indexes automatically, approximately every 20 minutes, whereas with Introspectr you have to update the index manually. There is also a known issue where Greplin does not index the contents of external URLs in tweets, whereas Introspectr does [2]. Both Greplin and Introspectr allow you to index a variety of services; however, log-in information is required for each service you want indexed, so safety and privacy obviously become a concern. Greplin states that they use OAuth to retrieve only the data and that they do not have access to your log-in information [3].

Greplin and Introspectr offer a convenient and centralized way for users to filter through the contents of their cloud. Their services can be accessed on almost all devices with an Internet connection, and searching through social feeds becomes just as easy as an e-mail or hard-drive search.

[1] Arrington, M (Aug 31, 2010). The Other Half Of Search: Greplin Is A Personal Search Engine For Your Online Life. Retrieved on October 26, 2010, from http://techcrunch.com/2010/08/31/greplin-ycombinator-personal-search/
[2] Schonfeld, E. (Oct 12, 2010). Introspectr Searches Your Social Streams. Retrieved on October 26, 2010, from http://techcrunch.com/2010/10/12/introspectr-search-social/
[3] Greplin: https://www.greplin.com/
[4] Introspectr: https://www.introspectr.com/

Google Omaha

Griffith's picture

Google client products are capable of updating themselves to a newer version without end-user intervention. This is known as “auto-updating.”

Most Google client products have the auto-update feature, but with different implementations. Some have their own auto-update solutions, while others share a common auto-updater code base and a common auto-updater server.

Google wanted its client products to minimize code duplication and to avoid maintaining multiple servers that perform essentially identical functions. As a result, it considered unifying all client products under a single auto-update solution. This decision was influenced by the evolution of Microsoft Windows, which made the unification almost mandatory. Windows Vista has a strict security model that restricts the ability of most applications on a machine to perform system-changing activities, including modifying the Windows registry, writing to the Program Files directory, and in some cases writing any persistent change to the system at all. Updating an installed program requires all of these modifications, so traditional auto-updating would fail on Windows Vista.

The lack of shared code brought another issue: each auto-update implementation had its own subset of desired features. Many of them lacked multiple update tracks or a consistent versioning mechanism. Code unification allowed Google to deploy a rich set of auto-update capabilities to all its client applications.

As the number of Google applications increased, improving the overall installation experience became increasingly desirable. Traditionally, the browser would prompt the end user with a series of technical and confusing dialogs, which encouraged the user to abandon the installation. Then the user was led to a wizard filled with choices that they did not need to know about or did not know how to decide between. This resulted in a bad user experience during product installation.

In order to meet all the requirements and challenges mentioned above, Google developed a shared client infrastructure that handles all installation and auto-updating tasks for Google Windows client products. This client communicates with a single Google auto-update server. The server, together with the client, is named Google Update, and the project is known as Omaha.

Server and Desktop Virtualization

Ricky Wu's picture

In computing, virtualization is a broad term that refers to the abstraction of computer resources. It hides the physical characteristics of computing resources from their users, be they applications or end users. This includes making a single physical resource appear to function as multiple virtual resources; it can also include making multiple physical resources appear as a single virtual resource.
In recent discussions, the term virtualization has been applied to a number of concepts, including:

  • Server Virtualization
  • Desktop Virtualization
  • Network Virtualization
  • Storage Virtualization
  • Application Infrastructure Virtualization

In this article, we’ll focus on server and desktop (client) virtualization.

Server virtualization and desktop virtualization can be confusing if we don’t know the inherent differences between the two technologies. Server virtualization is defined as the partitioning of a physical server into smaller virtual servers. One common use of this technology is in Web servers: instead of requiring a separate computer for each server, many virtual servers can co-exist on the same computer. Server virtualization has numerous benefits:

  • Increased hardware utilization – hardware and energy savings, reduced administration overhead.
  • Security – virtual machines can be stored as images and used to restore a compromised system.
  • Development – environments are easy to set up in a repeatable fashion.

Desktop virtualization, sometimes referred to as client virtualization, is defined as a virtualization technology that is used to separate a computer desktop environment from the physical computer. Wikipedia defines desktop virtualization as: ‘Desktop Virtualization is a server-centric computing model that borrows from the traditional thin-client model but is designed to give administrators and end users the best of both worlds: the ability to host and centrally manage desktop virtual machines in the data center while giving end users a full PC desktop experience.’
Desktop virtualization offers advantages over computers operating as individual units, because each desktop does not require its own hardware, operating system and software. There are also many other benefits of desktop virtualization:

  • Offline virtual desktops without a host OS, which allow you to take your virtual desktop with you
  • Check-in and check-out of virtual desktops, which allows server-based VMs to be updated. This ensures that your desktop virtual machines remain current and stay in the backup rotation
  • Local VM snapshots for recovery and rollback, which keep your users doing their jobs and not on the phone with support
  • Encryption and profile-based security, which ensure that your customer’s data is safe and secure whether in the datacenter or on the go
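
The snapshot and rollback benefit can be illustrated with a toy model. This is a minimal sketch, assuming a desktop's state can be represented as a plain dictionary; the `VirtualDesktop` class and its methods are hypothetical and do not correspond to any particular virtualization product.

```python
import copy


class VirtualDesktop:
    """Toy desktop whose whole state can be saved and restored."""

    def __init__(self, name):
        self.name = name
        self.state = {"installed": [], "settings": {}}
        self._snapshots = {}

    def snapshot(self, label):
        # Store an independent copy so later changes do not alter the snapshot.
        self._snapshots[label] = copy.deepcopy(self.state)

    def rollback(self, label):
        # Restore a copy, so the same snapshot can be reused again later.
        self.state = copy.deepcopy(self._snapshots[label])


desktop = VirtualDesktop("alice-desktop")
desktop.state["installed"].append("office")
desktop.snapshot("known-good")

desktop.state["installed"].append("broken-driver")  # a change that goes wrong
desktop.rollback("known-good")                      # recover without calling support
print(desktop.state["installed"])
```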

References:

Wikipedia – Desktop virtualization
Wikipedia – Hardware virtualization
WEBOPEDIA – The Difference Between Server and Client Virtualization
InfoQ – An Introduction to Virtualization

RabbitMQ

RabbitMQ is a message queue server that implements the Advanced Message Queuing Protocol (AMQP). It is written in Erlang, and it provides client APIs for C, Python, PHP and Java. Its features include:
• Persistent messages
• Transactions
• Virtual hosts
• Clustering

Message Producer:
It creates a message with a routing key and sends it to an exchange. The routing key is used to determine which Message Queue the message should be sent to.

Exchange:
It accepts messages and routes them to Message Queues. A binding defines the relation between an exchange and a Message Queue.
There are three commonly used types of exchange:

• Fanout exchange
• Direct exchange
• Topic exchange
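
How the exchange types differ in routing can be sketched with a small in-memory model. This is not the RabbitMQ client API: the `route` and `topic_matches` functions and the queue names are hypothetical, and the topic rules assume the usual AMQP convention that `*` matches exactly one word and `#` matches zero or more words.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match on dot-separated words."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' absorbs zero words, or one word and then tries again.
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])

    return match(pattern.split("."), routing_key.split("."))


def route(exchange_type, bindings, routing_key):
    """Return the sorted queue names that would receive a message.

    bindings is a list of (queue_name, binding_key) pairs.
    """
    if exchange_type == "fanout":
        matched = [q for q, _ in bindings]  # routing key is ignored
    elif exchange_type == "direct":
        matched = [q for q, key in bindings if key == routing_key]
    elif exchange_type == "topic":
        matched = [q for q, key in bindings if topic_matches(key, routing_key)]
    else:
        raise ValueError(f"unknown exchange type: {exchange_type}")
    return sorted(set(matched))


bindings = [("errors", "log.error"), ("all_logs", "log.#"), ("any_level", "log.*")]
print(route("direct", bindings, "log.error"))
print(route("fanout", bindings, "log.error"))
print(route("topic", bindings, "log.error.db"))
```

In this model the direct exchange delivers only to the queue whose binding key equals the routing key, the fanout exchange delivers to every bound queue, and the topic exchange delivers to queues whose binding pattern matches.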

Message Queue:
It holds messages and delivers them to the message consumer.

Message Consumer:
It does the work according to each incoming message.
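
The four roles above can be tied together in a toy, in-memory pipeline. This is a sketch for illustration only, not RabbitMQ itself: the `DirectExchange` class, the `consume` helper, and the routing keys are hypothetical, and a real broker would deliver messages over the network through a client library.

```python
from collections import deque


class DirectExchange:
    """Toy exchange: delivers to queues whose binding key equals the routing key."""

    def __init__(self):
        self.bindings = []  # list of (binding_key, queue) pairs

    def bind(self, queue, binding_key):
        self.bindings.append((binding_key, queue))

    def publish(self, routing_key, body):
        # The producer's side: hand a message and its routing key to the exchange.
        for key, queue in self.bindings:
            if key == routing_key:
                queue.append(body)


def consume(queue, handler):
    """The consumer's side: take messages in FIFO order and do the work."""
    results = []
    while queue:
        results.append(handler(queue.popleft()))
    return results


exchange = DirectExchange()
resize_queue = deque()  # message queues hold messages until they are consumed
email_queue = deque()
exchange.bind(resize_queue, "image.resize")
exchange.bind(email_queue, "email.send")

exchange.publish("image.resize", "photo1.png")
exchange.publish("email.send", "welcome@example.com")
exchange.publish("image.resize", "photo2.png")

done = consume(resize_queue, lambda body: f"resized {body}")
print(done)
```

Note that the producer never names a queue directly; it only supplies a routing key, and the bindings decide where the message ends up.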

Reference:
http://notes.variogr.am/post/143623387/broadcasting-your-logs-with-rabbitmq-and-python
http://barryp.org/software/py-amqplib/
http://www.infoq.com/articles/AMQP-RabbitMQ
http://www.infoq.com/cn/articles/AMQP-RabbitMQ
http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes