Sunday, September 29, 2013

Higher Order functions in Scala

The other day I was experimenting with functional programming in Scala and the concept of first-class functions.

A programming language is said to support first class functions if

  1. Functions can be passed as parameters to another function.
  2. Functions can be returned from another function.
  3. Functions can be assigned to a variable in the same way as any other first-class object or primitive value.
  4. Functions can be defined within another function.
A corollary is that a language with first-class functions ends up supporting closures, since it has to define clearly which variables and values an inner function refers to.
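
To make the closure point concrete, here is a minimal sketch (the names ClosureDemo, makeCounter and makeAdder are made up for illustration). Each returned function keeps access to a variable from the scope that created it, even after that scope has returned.

```scala
object ClosureDemo {
  // Returns a function that closes over the local variable `count`.
  // Each call to makeCounter creates a fresh, independent `count`.
  def makeCounter(): () => Int = {
    var count = 0
    () => { count += 1; count }
  }

  // Returns a function that closes over the parameter `base`.
  def makeAdder(base: Int): Int => Int = (x: Int) => base + x
}
```

Two counters created by separate calls to makeCounter do not share state, which suggests the closure captures the variable itself rather than a copy of its value at creation time.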

Exploring all of this in Scala led me to write some sample code.
Below is a Scala function that demonstrates these concepts.

  // 1. Takes a function as input,
  // 2. creates a nested function and assigns it to a variable,
  // 3. applies the input function to the input variables,
  // 4. returns a function.
  def firstclassfunc(inputfunc: Int => Int, inputparam: Int): (Int, Int) => Int = {
    // A function literal assigned to a val; it closes over inputfunc and inputparam.
    val innerfunctionliteral = (x: Int, y: Int) => inputfunc(inputparam + x + y)
    // A nested method with the same behavior.
    def anotherinnerfunc(x: Int, y: Int): Int = inputfunc(inputparam + x + y)
    // The same function written out as an explicit Function2 instance.
    val innerfunctionvalclassrepresentation = new Function2[Int, Int, Int] {
      def apply(x: Int, y: Int): Int = inputfunc(inputparam + x + y)
    }
    // The trailing underscore turns the nested method into a function value.
    val innerfuncdefvalpartiallyapplied = anotherinnerfunc _
    // Calling the method yields an Int, not a function, so this one cannot be returned.
    val innerfuncdefvalinstantiated = anotherinnerfunc(1, 2)
    // All of the Scala mechanisms below work. As long as we return a function that still
    // accepts two Int parameters, we conform to the declared return type (Int, Int) => Int.
    innerfunctionliteral
    // anotherinnerfunc _
    // innerfunctionvalclassrepresentation
    // innerfuncdefvalpartiallyapplied
    // anotherinnerfunc(_, _)
  }
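
As a rough usage sketch, the returned function can be stored in a val and called like any other function. The block below restates firstclassfunc in condensed form (only the function literal variant) so that it runs on its own; the names FirstClassDemo, doubled and result are assumptions for illustration.

```scala
object FirstClassDemo {
  // Condensed restatement of firstclassfunc from the listing above.
  def firstclassfunc(inputfunc: Int => Int, inputparam: Int): (Int, Int) => Int =
    (x: Int, y: Int) => inputfunc(inputparam + x + y)

  // Pass a doubling function and a base value of 10; get a two-argument function back.
  val doubled: (Int, Int) => Int = firstclassfunc(_ * 2, 10)

  // Calling the returned function computes (10 + 1 + 2) * 2.
  val result: Int = doubled(1, 2)
}
```

The returned function still sees inputfunc and inputparam after firstclassfunc has returned, which is the closure behavior described earlier.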

Monday, August 23, 2010

Big Data - Now and the future

Data, data and more data!! The era of big data is upon us. Terabyte data sets are slowly becoming commonplace, and petabyte and exabyte data sets are expected soon.

What are the underlying trends that caused the explosion of big data, or more aptly, semi-structured big data? On the web, the first is the rise of web search and the second is the rise of social networking.

Search companies like Google needed a way to index the entire web on their machines. Google came up with MapReduce, a data processing framework that runs on commodity machines, to do this cost-effectively. An open source implementation of MapReduce, named Hadoop, soon followed to solve these data processing problems. Social networking also required the Facebooks and LinkedIns of the world to store huge amounts of user-generated data arriving at a very high rate. They then had to index it, analyze it and generate insights from it to drive further user adoption and virality. A lot of this data was semi-structured (it did not fit neatly in a database) and required a lot more computation to generate insights than the traditional BI model.
This is leading to the rise of the so-called Big Data Stack at consumer internet companies, and it has seven major components:

Big Data Storage: NoSQL databases (Cassandra, Voldemort), HDFS, HBase
Big Data Indexing and index storage: Lucene, Katta or NoSQL stores like the above; Zoie (real-time indexing from LinkedIn); Bobo for faceted search
Big Data Processing and Analytics: Hadoop, Hive, Pig
Big Data Workflows: Oozie (Yahoo), Azkaban (LinkedIn), Cascading (Chris Wensel)
Big Data and Big Log transportation: Chukwa, Flume, Scribe etc.
Big Data Intelligence: Mahout (a machine learning framework that can run on top of Hadoop)
Big Data Sharding: Gizzard (a middleware sharding framework developed by Twitter)

(The exact use cases of the above stack and its variations at various internet companies merit their own discussion and are outside the scope of this article; I will address them in another post.)

Traditional Fortune 500 enterprises have long relied on an enterprise architecture stack consisting of RDBMS and BI software running on high-end servers. However, there was no good way to handle unstructured and semi-structured data until recently. As ideas like user-generated data percolate from the consumer internet into the enterprise, enterprises are beginning to see the same big data issues that were first experienced in the consumer internet space. There is also a growing realization that data can now be processed cost-effectively to generate hidden insights and drive competitive advantage.

However, today's CIOs lack the tools needed to manage this data. Even though this new stack and its frameworks are maturing, the skill level IT staff currently need to handle them is very high. And every CIO is pressed on budget and under pressure to deliver value to the business with minimal staff. I think we will see a lot of tools and processes develop around big data to ease the transition to the enterprise.

It should be an interesting space to watch!!

Friday, January 29, 2010

Current perspectives on Scalability - A buffet from various Internet scale companies

Dark Launch
Use servers based on (functional) concurrency-supporting languages for applications that map more naturally to a parallel environment.
Use straightforward HTTP web servers for request-response style requests.
Use C++ whenever efficiency/logging is required.

Develop/use NoSQL-based approaches (Cassandra) for semi-structured/unstructured data that can tolerate relaxed consistency.

Develop your own storage system for photos (one that does not need all the metadata and inode entries required by general POSIX file systems) to get rid of expensive CDNs.

Scribe - Their own distributed/reliable Logging System.

Do not use too many fine-grained services - I have seen this problem in companies where too many fine-grained services result in a drop order on deployment day (pretty painful).

No service-private schemas (then how do they make changes to databases in an isolated way?).
Swim Lanes

Thursday, January 21, 2010

Business Skills for Technology executives

I was reading one of the blog posts from AKF Partners the other day, where they talk about the business skills that need to be acquired by Technology Executives.
It is quite a coincidence that I started on this exact quest some time ago.

The approach is pretty simple.
1) Go to the recommended reading list.
2) Read one book on each topic, whenever you have time.
3) So far I have understood Competitive Strategy, Positioning, a very basic way to read financial reports and a number of other skills too.
4) I felt this was the best approach in Silicon Valley, short of a full-time MBA program.

Tuesday, January 19, 2010

An Architecture to learn from: Facebook Chat

Many times a programming language is just a tool. Sometimes it is a differentiator.

At Facebook, they have used Erlang mostly for its lightweight, actor-model concurrency (Erlang calls them processes; Scala calls them actors).

This has real implications for how many machines Facebook has to buy to support chat. I am sure they cut their hardware requirements by at least half from what they would have needed with a traditional request/response model. It shows how a good architecture means real money saved for high-traffic sites.

If you want to learn how Facebook uses Erlang as a differentiator, go through this presentation.
There is also a PDF somewhere that explains the architecture in more detail.

My only question is this: could a Java-based NIO approach have delivered similar results for Facebook? Is Java's threading model so heavy, and are the semantics of shared memory so ill-suited, for Facebook chat?

Ramping up on Scala

I was getting up to speed on Scala on and off, but never made a concerted effort to get the entire hang of it.

Yesterday I finally got hold of Martin Odersky's Programming in Scala book and am going through it at a good pace.

I would love to use more and more functional programming features in my future career.


Friday, August 14, 2009

Competitive Strategy: Analyzing your career and any industry

I am in the middle of an in-depth study of Michael Porter's Competitive Strategy book, as part of my effort to understand a number of business concepts.

I have already gone through a number of marketing books such as
Positioning (Ries/Trout), All Marketers Are Liars (Seth Godin) etc.

However, Porter's book is in a class of its own. It gives you a framework to analyze any industry using five forces: suppliers, buyers, threat of new entrants, substitutes and industry rivalry.

Some of it is common sense, and it seems more applicable to late-stage or mature companies than to startups.

I am analyzing two entities based on what I learned by studying this framework.
The first is a software engineer's career in the USA.
The second is my current employer's industry.

Will be following up soon with posts on these subjects.