blog.humaneguitarist.org

discoveries in digital audio, music notation, and information encoding

Archive for the ‘sinuses’ tag

an improved PHP includes to DOT graph script by Carl

leave a comment

Almost two years ago, I posted this about making a DOT graph by linking PHP files via include() and require() commands.

The idea was to have a simple way to visualize dependencies of one PHP file to another, etc. So I wrote a little script to try and do that.

The idea of having a blog is, of course, about sharing information and putting things out there. But it's always better when people write in and comment on a post, share their ideas, and sometimes their code!

Carl wrote in last year with an improved script to make the DOT file. Unfortunately, because the indentation didn't get completely preserved by the WordPress comments, Python was returning errors in regard to the indentation.

For a long time I kept meaning to message Carl and ask if he could send along the script via an email attachment, so I could re-post it.

Well last night I decided to finally email him and this morning I woke up to a reply with the script attached.

The script still works as it did before in terms of command line arguments.

$ python analyseRequires.py path-to-my-PHP-script-folder 1 my-output-file.dot

You can read the former post to learn more about the arguments.

Anyway, thanks a bunch to Carl for sharing his code and for reading the blog.

I've pasted the code below with Carl's original notes about the changes he made.

# analyseRequires.py

"""
* removing duplicates
* working with absolute paths to identify file identities even if references through different relative paths
* improved regular expressions (sometimes the first letter of a file was missing)
* simplified file existence verification (to respect php's simple fallback for includes: if the path doesn't exist, the working directory is tried and the directories in the php.ini's include_path) 
"""

#####
#importing modules
import glob, re, sys, os, fnmatch
br = "\n"
tab = "\t"
#regex components
anything = ".*"
quotes = "(\"|')"
dependency = "(require|require_once|include|include_once)"
#####
#exiting if all 3 arguments are not passed via command line
def fail():
    print ("ERROR: " + str(len(sys.argv)-1) + " of 3 required arguments provided.")
    sys.exit()
 
 
#####
#getting arguments passed via command line
    
#testing for root DIRECTORY string
try: myDir = sys.argv[1]
except: fail()

rootDir = os.path.abspath(myDir)+"/"

#testing for RECURSION boolean
try: myRec = sys.argv[2]
except: fail()
 
#testing for OUTPUT filename string
try: myFile = sys.argv[3]
except: fail()
 
 
print "--------------------------------------------"
print "gathering files"
print "--------------------------------------------"

#####
#making list of PHP files within DIRECTORY
if myRec == "0": #without recursion
    myDir2 = myDir + "/*.php"
    PHP_list = glob.glob(myDir2)
elif myRec == "1": #with recursion
    PHP_list = []
    for dirname, dirnames, filenames in os.walk(myDir):
        for filename in filenames:
            if fnmatch.fnmatch (filename,("*.php")):
                match = os.path.join(dirname,filename)
                PHP_list.append(match)
 
for p in PHP_list:
    print os.path.abspath(p)

print "--------------------------------------------"
print "searching includes"
print "--------------------------------------------"

#make an empty list;
#tuples will go in the list;
#each tuple will contain a PHP filename and a PHP filename it includes
includeList = set()
 
#iterate through each PHP file and place tuples in the list
for phpFile in PHP_list:
    fileOpen = open(phpFile, "r")
    print "processing " + phpFile
    #for each line in a PHP file
    for line in fileOpen:
            m = re.match(anything+dependency+anything+"\((.*)\)", line)
            if m:
                inBracketExpression = m.group(2)
        m = re.match(".*"+quotes+"(.*)"+quotes, inBracketExpression)
        if m:
           targetBasename = m.group(2)
        else:
           continue

        # print tab + targetBasename

                if targetBasename[-4::] == ".php": #only PHP files
                    source = os.path.abspath(phpFile)
            source = source.replace(rootDir,"")
            # the include statement checks the path given, otherwise goes for the current working directory and finally checks the                 include_path in the php.ini
            # assume that the given path is valid
            targetFile = os.path.abspath(os.path.dirname(phpFile)+"/"+targetBasename)
            # if it is not try simulating php's fallback to current working dir methods
            if( not os.path.exists(targetFile)):
            print tab+" >>> ! >>> "+ targetFile + " doesn't exist, using"
                targetFile = os.path.abspath(rootDir+targetBasename)    
            print tab+targetFile+" instead"
                if( not os.path.exists(targetFile) ):
                print tab+"no success. skipping include."
                continue
            targetFile = targetFile.replace(rootDir,"")
                    includeList.add( (source, targetFile) )
            else: pass
 
print "--------------------------------------------"
print "writing output"
print "--------------------------------------------"

#creating DOT file
dot = open(myFile, "w")
 
#writing to DOT file
dot.write("digraph dependencies {" + br)
for dependency in includeList:
    print dependency
    dot.write(tab)
    dot.write("\"")
    dot.write(dependency[0])
    dot.write("\"")
    dot.write(" -> ")
    dot.write("\"")
    dot.write(dependency[1])
    dot.write("\"")
    dot.write(";")
    dot.write(br)
dot.write("}")
dot.close()
 
 
#####
#exiting
sys.exit()
--------------

Related Content:

Written by nitin

June 23rd, 2013 at 2:41 pm

making a DOT graph for PHP include statements

4 comments

A couple of months ago, I posted about my experience with making a Python dependency graph.

Of course, as the post states, I was originally looking for a way to make a graph showing the relationship among PHP files in regard to "include" statements.

Well, I'm home sick and after a few hours of trying to find an easy, out-of-box solution I gave up and rolled my own Python script to make me a DOT graph file.

I didn't have anything better to do.

:(

The results are pretty simplistic, but I'm happy enough with it for now.

The Python script takes three arguments: the directory in which the PHP files exist, whether to search recursively or not (0=no, 1=yes), and the name of the output file as such:

$ python makeDOT.py blog/wordpress 1 wordpressIncludes.dot

#####
#importing modules
import glob, re, sys, os, fnmatch
br = "\n"
tab = "\t"


#####
#exiting if all 3 arguments are not passed via command line
def fail():
    print ("ERROR: " + str(len(sys.argv)-1) + " of 3 required arguments provided.")
    sys.exit()


#####
#getting arguments passed via command line
   
#testing for root DIRECTORY string
try: myDir = sys.argv[1]
except: fail()

#testing for RECURSION boolean
try: myRec = sys.argv[2]
except: fail()

#testing for OUTPUT filename string
try: myFile = sys.argv[3]
except: fail()


#####
#making list of PHP files within DIRECTORY
if myRec == "0": #without recursion
    myDir2 = myDir + "/*.php"
    PHP_list = glob.glob(myDir2)
elif myRec == "1": #with recursion
    PHP_list = []
    for dirname, dirnames, filenames in os.walk(myDir):
        for filename in filenames:
            if fnmatch.fnmatch (filename,("*.php")):
                match = os.path.join(dirname,filename)
                PHP_list.append(match)

#make an empty list;
#tuples will go in the list;
#each tuple will contain a PHP filename and a PHP filename it includes
includeList = []

#iterate through each PHP file and place tuples in the list
for phpFile in PHP_list:
    fileOpen = open(phpFile, "r")
    #for each line in a PHP file
    for line in fileOpen:
            m = re.match(r"(.*)include(.*\()(.*)\)", line) #for include(),include_once()
            if m:
                matchFile = m.group(3)[1:-1]
                if matchFile[-4::] == ".php": #only PHP files
                    phpFile = phpFile.replace("\\","/")
                    matchFile = matchFile.replace("\\","/")
                    matchFile = matchFile.replace("\"","")
                    matchFile = matchFile.replace('\'',"")
                    includeList.append([phpFile[len(myDir)+1:], matchFile])
            else: pass

            m = re.match(r'(.*)require(.*\()(.*)\)', line) #for require(), require_once()
            if m:
                matchFile = m.group(3)[1:-1]
                if matchFile[-4::] == '.php': #only PHP files
                    phpFile = phpFile.replace("\\","/")
                    matchFile = matchFile.replace("\\","/")
                    matchFile = matchFile.replace("\"","")
                    matchFile = matchFile.replace('\'',"")
                    includeList.append([phpFile[len(myDir)+1:], matchFile])
            else: pass


#####
#creating DOT file
dot = open(myFile, "w")

#writing to DOT file
dot.write("digraph {" + br)
for a,b in includeList:
    dot.write(tab)
    dot.write("\"")
    dot.write(a)
    dot.write("\"")
    dot.write(" -> ")
    dot.write("\"")
    dot.write(b)
    dot.write("\"")
    dot.write(";")
    dot.write(br)
dot.write("}")
dot.close()


#####
#exiting
sys.exit()

I ran the Python script on the PHP scripts for MXMLiszt.

Then I used the "circo" layout engine in Graphviz – specifically the Gvedit.exe application – on this resultant DOT file.

Here's the result:


--------------

Related Content:

Written by nitin

July 30th, 2011 at 1:03 pm